This article introduces a spline Hermite quasi-interpolation technique for two preprocessing operations on univariate time series: imputation and smoothing. The constructed model is then applied to forecasting and to anomaly detection. For anomaly detection in particular, the authors propose algorithms that combine quasi-interpolation, dynamic copulas and clustering. Numerical results are included to show the effectiveness of the presented techniques.
Figure 13. Scatterplot of $ x(t)-s(t) $ vs $ s^\prime(t) $ for QIH-I-LOF (a) and of $ x(t)-s(t) $ vs Kendall's tau for QIH-I-DC-DBSCAN (b), both for problem A4Benchmark-TS10. Computed normal points: magenta bullets. Computed anomalies: green diamonds. True normal behavior: blue +. True anomalies: yellow x.
Table 1. Statistics computed on the results for the livestock sheep dataset
| Method | NRMSE | KS-Test $ \texttt{statistic} $ | KS-Test $ \texttt{p-value} $ | Theil's $ U_2 $ |
| SES | $ 1.20 $ | $ 0.71 $ | $ 0.053 $ | $ 1.47 $ |
| DES | $ 0.63 $ | $ 0.43 $ | $ 0.58 $ | $ 0.998 $ |
| QIH-LSQ | $ 0.59 $ | $ 0.43 $ | $ 0.42 $ | $ 0.69 $ |
| QIH-I-LSQ | $ 0.60 $ | $ 0.43 $ | $ 0.42 $ | $ 0.76 $ |
Table 2. Statistics computed on the results for the PS time series
| Method | NRMSE | KS-Test $ \texttt{statistic} $ | KS-Test $ \texttt{p-value} $ | Theil's $ U_2 $ |
| SES | $ 0.93 $ | $ 0.625 $ | $ 0.087 $ | $ 1.129 $ |
| DES | $ 0.99 $ | $ 0.625 $ | $ 0.087 $ | $ 0.84 $ |
| QIH-LSQ | $ 0.98 $ | $ 0.625 $ | $ 0.087 $ | $ 0.86 $ |
| QIH-I-LSQ | $ 1.10 $ | $ 0.5 $ | $ 0.282 $ | $ 0.749 $ |
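For readers who want to reproduce statistics of the kind reported in Tables 1 and 2, the following is a minimal sketch and not the authors' implementation. It assumes that NRMSE is the RMSE divided by the standard deviation of the observations (other normalizations, such as the range or the mean, are also in use), that the KS statistic is the two-sample one, and that Theil's $ U_2 $ uses the common relative-change form in which a value of 1 corresponds to the naive "no change" forecast. The function names `nrmse`, `ks_statistic` and `theil_u2` are hypothetical.

```python
import math
from bisect import bisect_right


def nrmse(observed, predicted):
    """RMSE divided by the population standard deviation of the observations.

    The choice of normalization is an assumption of this sketch.
    """
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    mean = sum(observed) / n
    var = sum((o - mean) ** 2 for o in observed) / n
    return math.sqrt(mse / var)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Largest absolute gap between the two empirical distribution functions,
    evaluated at every pooled sample point.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def theil_u2(observed, predicted):
    """Theil's U2 in the relative-change form (assumed definition).

    Values below 1 mean the forecast beats the naive forecast
    y_hat[t+1] = y[t]; the observations must be nonzero.
    """
    steps = range(len(observed) - 1)
    num = sum(((predicted[t + 1] - observed[t + 1]) / observed[t]) ** 2 for t in steps)
    den = sum(((observed[t + 1] - observed[t]) / observed[t]) ** 2 for t in steps)
    return math.sqrt(num / den)


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    naive = [1.0, 1.0, 2.0, 3.0]  # each forecast repeats the previous observation
    print(nrmse(y, naive), ks_statistic(y, naive), theil_u2(y, naive))
```

The sketch does not compute the KS p-value; in practice `scipy.stats.ks_2samp` returns both the statistic and the p-value.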
Table 3. Mean values of the recall, overall accuracy and ROC-AUC for the A4Benchmark for the compared algorithms
| Algorithm | RECALL | OA | ROC-AUC |
| QIH-I-DBSCAN | 0.917 | 0.942 | 0.929 |
| QIH-I-DC-DBSCAN | 0.939 | 0.980 | 0.959 |
| DBSCAN | 0.984 | 0.067 | 0.523 |
| QIH-I-LOF | 0.973 | 0.955 | 0.964 |
| QIH-I-DC-LOF | 0.897 | 0.984 | 0.940 |
| LOF | 0.208 | 0.991 | 0.601 |
| QIH-I-IF | 0.940 | 0.791 | 0.865 |
| QIH-I-DC-IF | 0.958 | 0.882 | 0.920 |
| IF | 0.620 | 0.728 | 0.674 |
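The three scores in Table 3 can be computed from binary labels along the following lines. This is a hedged sketch, not the evaluation code used for the table: it assumes that label 1 marks an anomaly, that each detector outputs hard 0/1 labels rather than continuous scores, and that ROC-AUC is therefore taken from the single operating point of the ROC curve, which gives the mean of recall and specificity. The function name `detection_scores` is hypothetical.

```python
def detection_scores(y_true, y_pred):
    """Return (recall, overall accuracy, ROC-AUC) for binary 0/1 labels.

    Label 1 is treated as "anomaly" (assumption). With hard labels the ROC
    curve has one operating point, so the area under it is
    (recall + specificity) / 2.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal points kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    overall_accuracy = (tp + tn) / len(pairs)
    roc_auc = 0.5 * (recall + specificity)
    return recall, overall_accuracy, roc_auc


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 0, 0]
    found = [1, 0, 0, 0, 0, 1]
    print(detection_scores(truth, found))
```

Reporting recall alongside overall accuracy matters when anomalies are rare: a detector that flags nothing can reach a high overall accuracy with a recall near zero, and one that flags everything does the opposite.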
Results on "tsAirgap": the dots in blue are the known data, the magenta diamonds the missing values and the yellow squares are the imputed values
Three random walk patterns are generated. The quadratic continuous model
The same time series are now approximated with the quadratic continuous model obtained by using QIH-LSQ
Given the original three random walk patterns, the use of QIH-I-LSQ
We see a cubic, smooth continuous model
Smoothing and forecast of livestock, sheep in Asia. Comparison of SES, DES, QIH-LSQ
PS time series regularized with QIH-LSQ
Zoom in of the forecasting task performed on the last
A1Benchmark-real19 time series and Kendall's tau time-varying copula
A1Benchmark-real25 time series and Kendall's tau time-varying copula
A4Benchmark-TS10 time series and Kendall's tau time-varying copula
A4Benchmark-TS11 time series and Kendall's tau time-varying copula