
A bidirectional weighted boundary distance algorithm for time series similarity computation based on optimized sliding window size

Corresponding author: Zhaohui Tang
The existing practice of setting the size of a time-series sliding window from empirical values has problems that urgently need to be solved. When the original measurement data collected from industrial equipment carry a large amount of information at high density, an empirically sized window cannot maximally retain the important information in the data, and the computational complexity is high. Therefore, by studying the effect of the sliding window on time-series similarity techniques in practical applications, an algorithm for determining the initial size of the sliding window is proposed. Upper and lower boundary curves with a higher degree of fit are constructed, and trend weighting is introduced into the $LB\_Hust$ distance calculation to reduce the difficulty of mathematical modeling and improve the efficiency of data similarity computation.
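To make the ingredients named in the abstract concrete (a sliding window with a size and a step, an upper and a lower boundary per window, and a distance evaluated in both directions), here is a minimal Python/NumPy sketch. It is not the paper's algorithm: the boundaries are plain per-window maxima and minima rather than the fitted boundary curves the paper constructs, the distance is a squared-exceedance bound in the spirit of LB_Keogh rather than $LB\_Hust$, and no trend weighting is applied. All function names and the rule "take the larger of the two one-way distances" are assumptions made for illustration.

```python
import numpy as np


def window_starts(n, w, step):
    """Start indices of all length-w windows that fit inside a series of length n."""
    if not (1 <= w <= n) or step < 1:
        raise ValueError("need 1 <= w <= n and step >= 1")
    return list(range(0, n - w + 1, step))


def window_bounds(x, w, step):
    """Per-window upper/lower boundary of series x.

    Hypothetical stand-in: plain max/min per window, not the fitted
    boundary curves described in the paper.
    """
    x = np.asarray(x, dtype=float)
    starts = window_starts(len(x), w, step)
    upper = np.array([x[s:s + w].max() for s in starts])
    lower = np.array([x[s:s + w].min() for s in starts])
    return starts, upper, lower


def one_way_boundary_distance(a, b, w, step):
    """Sum over windows of the squared amount by which a leaves b's boundaries."""
    a = np.asarray(a, dtype=float)
    starts, upper, lower = window_bounds(b, w, step)
    total = 0.0
    for k, s in enumerate(starts):
        seg = a[s:s + w]
        above = np.clip(seg - upper[k], 0.0, None)  # excess above b's upper boundary
        below = np.clip(lower[k] - seg, 0.0, None)  # shortfall below b's lower boundary
        total += float(np.sum(above ** 2 + below ** 2))
    return total


def bidirectional_boundary_distance(a, b, w, step):
    """Assumed bidirectional rule: larger of the two one-way sums, then square root."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return float(np.sqrt(max(one_way_boundary_distance(a, b, w, step),
                             one_way_boundary_distance(b, a, w, step))))
```

For any series `x` of length at least 8, `bidirectional_boundary_distance(x, x, w=8, step=4)` returns 0.0, because every point lies inside its own window's boundaries. Overlapping windows (step smaller than `w`) count shared points more than once, which is one reason the choice of window size and step affects both the distance values and the cost of computing them.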

Mathematics Subject Classification: Primary: 58F15, 58F17; Secondary: 53C35.


Figure 1.  Distance between three series

Figure 2.  The sliding window principle

Figure 3.  The normalized states of the six types of time series

Figure 4.  Quadratic distribution of window size

Figure 5.  Steplength range

Figure 6.  Time series of the three generators

Figure 7.  Clustering result comparison

Figure 8.  Clustering error rate with different weight coefficients

Figure 9.  Precision of five methods on 5 data sets

Figure 10.  Clustering error rate with different weight coefficients

Figure 11.  Precision of five methods on 5 data sets

Figure 12.  Runtime of five methods on 5 data sets

Table 1.  Dataset attributes

| Type | Items | Status |
|---|---|---|
| 1 | 1-100 | Normal |
| 2 | 101-200 | Cyclic |
| 3 | 201-300 | Increasing trend |
| 4 | 301-400 | Decreasing trend |
| 5 | 401-500 | Upward shift |
| 6 | 501-600 | Downward shift |

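Table 2.  The combination values of ${w_s}$ and ${L_s}$ and the corresponding distance ${D_T}$

| ${w_s}$ \ ${L_s}$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 49.1 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 3 | 55.4 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 4 | 54.8 | 60.6 | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 5 | 58.7 | 61.2 | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 6 | 64.3 | 65.3 | 60.0 | - | - | - | - | - | - | - | - | - | - | - | - |
| 7 | 62.5 | 63.9 | 61.3 | - | - | - | - | - | - | - | - | - | - | - | - |
| 8 | 64.8 | 65.6 | 64.0 | 62.6 | - | - | - | - | - | - | - | - | - | - | - |
| 9 | 62.5 | 61.2 | 63.5 | 62.3 | - | - | - | - | - | - | - | - | - | - | - |
| 10 | 63.4 | 63.9 | 59.1 | 62.9 | 60.8 | - | - | - | - | - | - | - | - | - | - |
| 11 | 58 | 61.9 | 62.1 | 58.5 | 60.1 | - | - | - | - | - | - | - | - | - | - |
| 12 | 61.3 | 61.2 | 58.3 | 61.1 | 60.9 | 59.1 | - | - | - | - | - | - | - | - | - |
| 13 | 60.2 | 59.9 | 59.1 | 49.3 | 58.2 | 49.3 | - | - | - | - | - | - | - | - | - |
| 14 | 49.9 | 59.6 | 60.2 | 60.8 | 59.1 | 60.1 | 61.7 | - | - | - | - | - | - | - | - |
| 15 | 55.9 | 56.2 | 53.2 | 48.2 | 60.2 | 49.2 | 58.7 | - | - | - | - | - | - | - | - |
| 16 | 60.9 | 55.2 | 58.3 | 58.0 | 52.1 | 50.0 | 56.1 | 49.0 | - | - | - | - | - | - | - |
| 17 | 49.2 | 53.1 | 54.4 | 55.2 | 60.8 | 59.4 | 51.7 | 53.3 | - | - | - | - | - | - | - |
| 18 | 53.7 | 54.4 | 52.0 | 52.3 | 52.8 | 60.1 | 59.2 | 49.2 | 49.2 | - | - | - | - | - | - |
| 19 | 58.3 | 60.8 | 55.1 | 50.3 | 58.7 | 58.0 | 58.1 | 53.0 | 51.9 | - | - | - | - | - | - |
| 20 | 50.7 | 53.3 | 55.8 | 50.1 | 42.3 | 45.7 | 50.0 | 51.2 | 51.2 | 46.3 | - | - | - | - | - |
| 21 | 49.6 | 50.2 | 46.9 | 46.5 | 47.2 | 47.1 | 47.5 | 46.0 | 48.2 | 43.9 | - | - | - | - | - |
| 22 | 51.3 | 49.9 | 48.2 | 46.7 | 50.0 | 45.0 | 40.0 | 39.2 | 39.5 | 40.7 | 39.1 | - | - | - | - |
| 23 | 39.9 | 54.2 | 50.0 | 49.8 | 49.0 | 36.8 | 36.5 | 38.1 | 42.5 | 43.9 | 35.9 | - | - | - | - |
| 24 | 46.8 | 51.9 | 48.3 | 45.0 | 38.2 | 42.7 | 39.4 | 50.2 | 41.9 | 38.4 | 43.3 | 51.8 | - | - | - |
| 25 | 51.6 | 40.0 | 58.7 | 43.1 | 40.0 | 39.4 | 35.0 | 45.3 | 45.9 | 41.2 | 38.1 | 40.9 | - | - | - |
| 26 | 47.3 | 48.6 | 50.3 | 39.6 | 42.6 | 55.2 | 42.0 | 36.1 | 35.0 | 42.0 | 43.8 | 39.5 | 47.8 | - | - |
| 27 | 47.5 | 51.3 | 40.0 | 41.6 | 39.5 | 35.0 | 50.0 | 49.2 | 39.4 | 38.4 | 35.6 | 39.2 | 49.9 | - | - |
| 28 | 45 | 40.4 | 38.4 | 35.0 | 35.7 | 46.2 | 50.6 | 45.2 | 39.1 | 39.6 | 42.1 | 48.2 | 40.0 | 38.9 | - |
| 29 | 50.1 | 48.3 | 40.2 | 41.6 | 35.9 | 36.1 | 40.3 | 39.4 | 50.1 | 46.3 | 39.6 | 35.9 | 35.9 | 35.0 | - |
| 30 | 42.2 | 49.8 | 45.0 | 39.2 | 40.0 | 38.9 | 40.4 | 39.3 | 37.5 | 38.6 | 36.3 | 36.9 | 35.0 | 36.2 | 35.2 |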
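In Table 2, a distance ${D_T}$ is reported only when ${L_s} \le \lfloor {w_s}/2 \rfloor$, giving 225 filled cells for ${w_s} = 2, \dots, 30$. The short Python sketch below reproduces that candidate grid and counts how many windows each pair generates on a series of length n, one simple proxy for how the step length affects computation cost. The constraint is read off the pattern of filled cells, and the window-count formula assumes every window must lie fully inside the series; neither is stated explicitly in the source, and the function names are hypothetical.

```python
def candidate_pairs(w_min=2, w_max=30):
    """(w_s, L_s) pairs with 1 <= L_s <= w_s // 2, matching the filled cells of Table 2."""
    return [(w, step) for w in range(w_min, w_max + 1) for step in range(1, w // 2 + 1)]


def num_windows(n, w, step):
    """Number of length-w windows with the given step that fit fully in a series of length n."""
    if not (1 <= w <= n) or step < 1:
        raise ValueError("need 1 <= w <= n and step >= 1")
    return (n - w) // step + 1


pairs = candidate_pairs()
print(len(pairs))             # 225, the number of filled cells in Table 2
print(num_windows(60, 8, 4))  # 14 windows for a 60-point series with w_s = 8, L_s = 4
```

Halving the step roughly doubles the number of windows, so for a fixed window size the larger admissible step lengths are cheaper to evaluate.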

Table 3.  Five groups of datasets

| Dataset | Samples | Categories | Attributes |
|---|---|---|---|
| temperature | 148 | 3 | 2 |
| pressure | 169 | 4 | 12 |
| position | 327 | 10 | 17 |
| concentration | 112 | 6 | 16 |
| flow rate | 236 | 5 | 7 |

Table 4.  Cross-validation results

| Dataset | Optimal ${w_s}$ | Optimal ${w_n}$ | Optimal ${w_p}$ | Average precision (test set) | Average precision (training set) |
|---|---|---|---|---|---|
| temperature | 8 | 0.6 | 0.4 | 90% | 92% |
| pressure | 9 | 0.6 | 0.4 | 89% | 91% |
| position | 8 | 0.6 | 0.4 | 91% | 92% |
| concentration | 8 | 0.6 | 0.4 | 90% | 91% |
| flow rate | 8 | 0.6 | 0.4 | 90% | 92% |
