
Supervised time series classification for anomaly detection in subsea engineering

  • *Corresponding author: Halvor Snersrud Gustad
Abstract
  • Time series classification is of significant importance in monitoring structural systems. In this work, we investigate the use of supervised machine learning classification algorithms on simulated data based on a physical system with two states: Intact and Broken. We provide a comprehensive discussion of the preprocessing of temporal data, using measures of statistical dispersion and dimension reduction techniques. We present an intuitive baseline method and discuss its efficiency. We conclude with a comparison of the various methods based on different performance metrics, showing the advantage of using machine learning techniques as a tool in decision making.

    Mathematics Subject Classification: Primary: 62M10, Secondary: 62P30, 68T07.

  • Figure 1.  Stack with sensors and corresponding data

    Figure 2.  Two 1-hour simulations from the dataset comparing a broken and an intact well under similar conditions. Plots are given for the $ x $ and $ y $ components of the different physical measurements. The two top rows give the time series, while the bottom row shows phase plots

    Figure 3.  Pair plot showing the scatter and distribution of the data after a standard deviation transform (left). Visualization of the transformed data in three dimensions (right)
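
    The STD transform reduces each windowed multichannel series to one dispersion value per channel. A minimal sketch of this step, assuming the raw data is an array of shape (simulations, timesteps, channels); the array values, channel names and labels below are synthetic placeholders, not the authors' code:

    ```python
    import numpy as np
    import pandas as pd
    import seaborn as sns

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3600, 3))    # stand-in for the simulated 1-hour series
    state = rng.integers(0, 2, size=200)   # 0 = intact, 1 = broken (synthetic labels)

    features = X.std(axis=1)               # one standard deviation per channel

    df = pd.DataFrame(features, columns=["acc_x", "acc_y", "bm_x"])
    df["state"] = np.where(state == 1, "broken", "intact")
    sns.pairplot(df, hue="state")          # scatter matrix with marginal densities
    ```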

    Figure 4.  Pair plot of the data after applying the aforementioned covariance transform. For certain combinations, the broken and intact cases separate quite well
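
    The covariance transform instead keeps the pairwise second moments between channels. A sketch of one plausible implementation (the function name is ours); note that with 6 channels it yields 21 features, consistent with the COV(21) row of Table 4:

    ```python
    import numpy as np

    def cov_transform(X):
        """Map each series of shape (timesteps, channels) to the upper-triangular
        entries of its channel covariance matrix: C*(C+1)/2 features per series."""
        n, _, c = X.shape
        iu = np.triu_indices(c)
        feats = np.empty((n, iu[0].size))
        for i, series in enumerate(X):
            feats[i] = np.cov(series, rowvar=False)[iu]
        return feats

    rng = np.random.default_rng(0)
    print(cov_transform(rng.normal(size=(10, 3600, 6))).shape)  # -> (10, 21)
    ```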

    Figure 5.  Ratio of the total variance explained by each principal component
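
    The ratios in Figure 5 correspond to what scikit-learn exposes as `explained_variance_ratio_`; a minimal sketch, assuming a feature matrix from one of the transforms above (synthetic here):

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    F = rng.normal(size=(500, 21))                 # stand-in for COV-transformed features

    pca = PCA().fit(F)
    print(pca.explained_variance_ratio_)           # ratio each component explains
    print(pca.explained_variance_ratio_.cumsum())  # variance retained by the first k PCs
    ```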

    Figure 6.  Left: visualization of the lines capturing the relation between the standard deviation of accelerations in the flex-joint and wellhead bending moments, using linear regression. The lines are meant to be displayed on a vessel's monitor and gradually fade over time, highlighting the most recent behaviour. Right: distribution of the data for the baseline method
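
    A hedged sketch of the baseline idea: fit a line relating the standard deviation of flex-joint accelerations to that of wellhead bending moments, then monitor how new windows fall relative to it. All numbers below are synthetic; the slope and noise level are illustrative assumptions, not values from the paper:

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    acc_std = rng.uniform(0.1, 1.0, size=(200, 1))           # std of FJ acceleration
    bm_std = 3.2 * acc_std[:, 0] + rng.normal(0, 0.05, 200)  # std of WH bending moment

    line = LinearRegression().fit(acc_std, bm_std)
    print(line.coef_[0], line.intercept_)  # parameters of the line shown on the monitor
    ```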

    Figure 7.  Classification of the STD data from the Noise 1 data set with 3 principal components

    Figure 8.  Classification of the COV data from the Noise 1 data set with 4 principal components. The 3D visualization is made with 3 components
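
    Figures 7 and 8 (and Table 2 below) come from a logistic regression fitted on the leading principal components. A minimal sketch of such a LogR-PCA pipeline on synthetic stand-in features; the scaling step and data are assumptions, not the authors' exact setup:

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    F = rng.normal(size=(1000, 21))                # stand-in for COV features
    y = (F[:, 0] + 0.3 * rng.normal(size=1000) > 0).astype(int)

    F_tr, F_te, y_tr, y_te = train_test_split(F, y, test_size=0.25, random_state=0)
    logr_pca = make_pipeline(StandardScaler(), PCA(n_components=4), LogisticRegression())
    logr_pca.fit(F_tr, y_tr)
    print(f"test accuracy: {logr_pca.score(F_te, y_te):.4f}")
    ```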

    Figure 9.  Example of a horizontal decision tree with depth 3. Node 1 is the parent node of nodes 2 and 3

    Figure 10.  Decision tree algorithm illustrated as in [15]

    Figure 11.  The effect of post-pruning on the reduction of overfitting. Scenario: Noise 50, COV-PCA(4), Gini (bottom-right block of Table 3)

    Figure 12.  The effect of $ \texttt{ccp_alpha}$ on the structure and the accuracy of the tree. Scenario: Noise 1, COV-PCA(4), Entropy (marked in bold in Table 3)
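
    In scikit-learn, `ccp_alpha` controls minimal cost-complexity (post-)pruning, and the candidate alphas can be read off the fully grown tree. A sketch on a synthetic placeholder data set:

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(3)
    F = rng.normal(size=(1000, 4))                 # stand-in for COV-PCA(4) features
    y = (F[:, 0] > 0).astype(int)
    F_tr, F_te, y_tr, y_te = train_test_split(F, y, random_state=0)

    full = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(F_tr, y_tr)
    alphas = full.cost_complexity_pruning_path(F_tr, y_tr).ccp_alphas  # candidate alphas

    pruned = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.01,
                                    random_state=0).fit(F_tr, y_tr)
    print(pruned.get_depth(), pruned.tree_.node_count, pruned.score(F_te, y_te))
    ```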

    Figure 13.  DT generated with entropy as splitting criterion on the data set consisting of the first four PCs of the COV data. Blue and orange are used for intact and broken, respectively. A light colour indicates a high entropy, an intense colour a low entropy.

    Figure 14.  The same DT as in Figure 13, post-pruned with $ \texttt{ccp_alpha}$ = 0.01

    Figure 15.  Performance of the linear SVM on the data set with the STD transform (left column) or the COV transform and 3 PCs (right column). Both are created from a subset of the data set containing only one physical direction
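
    The linear and RBF results of Table 4 can be reproduced in spirit with scikit-learn's SVC; `n_support_` yields the support-vector counts reported in the SV columns. A sketch on synthetic stand-in features:

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    rng = np.random.default_rng(4)
    F = rng.normal(size=(1000, 3))                 # stand-in for 3 principal components
    y = (np.linalg.norm(F, axis=1) > 1.6).astype(int)
    F_tr, F_te, y_tr, y_te = train_test_split(F, y, random_state=0)

    for kernel in ("linear", "rbf"):
        svm = SVC(kernel=kernel).fit(F_tr, y_tr)
        print(kernel, round(svm.score(F_te, y_te), 3), svm.n_support_.sum())
    ```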

    Figure 16.  A typical one-dimensional CNN architecture

    Figure 17.  The figures illustrate the transformation of the input data by the CNN in both the training and test sets, under the three different noise scenarios. Prior to the output layer, which predicts the class, each individual time series is converted into a two-dimensional vector and can be visually represented as a point on a plane. In the case of Noise 1 and Noise 10, the data points belonging to the two categories form separate clusters

    Figure 18.  Confusion matrix used to evaluate the performance of the classification techniques
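
    The confusion matrix of Figure 18 underlies the precision, recall and F1 scores of Table 6. A minimal sketch with hypothetical label vectors (1 = broken, 0 = intact):

    ```python
    from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

    y_true = [0, 0, 1, 1, 1, 0]                # hypothetical ground truth
    y_pred = [0, 1, 1, 1, 0, 0]                # hypothetical predictions

    print(confusion_matrix(y_true, y_pred))    # rows: true class, columns: predicted
    print(precision_score(y_true, y_pred),
          recall_score(y_true, y_pred),
          f1_score(y_true, y_pred))            # the metrics reported in Table 6
    ```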

    Figure 19.  Number of analyses for fixed configurations. In red is the combination of configurations that we analyze in this work

    Figure 20.  Left: pair plot of the data after applying the aforementioned standard deviation transform on wells with a tight wellhead housing. For certain combinations, the broken and intact cases separate quite well. Right: a 3D plot showing the spread of the data

    Figure 21.  Left: pair plot of the data after applying the aforementioned standard deviation transform on wells with a slack wellhead housing. For certain combinations, the broken and intact cases separate quite well. Right: a 3D plot showing the spread of the data

    Figure 22.  Pair plot of the data after applying the aforementioned covariance transform on wells with a tight wellhead housing. For certain combinations, the broken and intact cases separate quite well

    Figure 23.  Pair plot of the data after applying the aforementioned covariance transform on wells with a slack wellhead housing. For certain combinations, the broken and intact cases separate quite well

    Listing 1.  Architecture of the CNN used in the experiments of Section 7
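
    A minimal PyTorch sketch in the spirit of Figure 16 and Listing 1. The channel counts, kernel sizes and pooling below are illustrative assumptions, not the authors' exact architecture; the two-dimensional embedding before the output layer follows the description of Figure 17:

    ```python
    import torch
    import torch.nn as nn

    class CNN1D(nn.Module):
        def __init__(self, in_channels=6, n_classes=2):  # 6 sensor channels assumed
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(in_channels, 16, kernel_size=7, padding=3),
                nn.LeakyReLU(),
                nn.MaxPool1d(4),
                nn.Conv1d(16, 32, kernel_size=5, padding=2),
                nn.LeakyReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            # Two-dimensional embedding before the output layer, as in Figure 17.
            self.embed = nn.Linear(32, 2)
            self.out = nn.Linear(2, n_classes)

        def forward(self, x):                      # x: (batch, channels, time)
            z = self.features(x).squeeze(-1)
            return self.out(self.embed(z))

    model = CNN1D()
    print(model(torch.randn(8, 6, 3600)).shape)    # -> torch.Size([8, 2])
    ```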

    Table 1.  List of abbreviations and notations

    Nomenclature
    accx, accy: x and y components of the acceleration
    ASM: Attribute Selection Measure
    bmx, bmy: x and y components of the bending moment
    BOP: Blowout Preventer
    CNN: Convolutional Neural Network
    DAS: Data Acquisition System
    DT: Decision Tree
    DWS: Deep Water Strain
    FJ: Flex Joint
    LogR: Logistic Regression
    ML: Machine Learning
    MLP: Multi-layer Perceptron
    PCA: Principal Component Analysis
    SMU: Subsea Motion Units
    STD: Standard Deviation
    SVD: Singular Value Decomposition
    SVM: Support Vector Machine
    TSC: Time Series Classification
    WLR: Wire Load Relief

    Table 2.  Accuracy of LogR-PCA applied to the STD and COV data from the different data sets with different numbers of PCs. The scenarios reported in Table 6 for comparison purposes are marked in bold

    Data set   Transform                Accuracy (%) by number of PCs
                          1       2       3       4       5       6       7
    Noise 1    STD    55.99   54.53   69.26   69.17   98.46   98.62       -
               COV    55.24   55.56   65.88   99.69   99.84     100     100
    Noise 10   STD    55.66   54.53   69.17   69.17   98.14   98.14       -
               COV    55.56   55.87   64.16   99.53   99.84   99.84   99.84
    Noise 50   STD    54.29   54.21   68.77   69.01   89.97   91.26       -
               COV    55.56   56.81   54.93   79.34   91.06   95.62   96.09

    Table 3.  Performance of DTs tested on different scenarios. The scenarios reported in Table 6 for comparison purposes are marked in bold

    Entries per noise level: depth / # of nodes / train acc. (%) / test acc. (%)

    Transform    Criterion  Prun.  Noise 1                   Noise 10                  Noise 50
    STD          Entropy    no     15 / 454 / 100 / 93.69    16 / 480 / 100 / 93.45    19 / 1018 / 100 / 84.39
                            pre    13 / 416 / 99.47 / 93.61  13 / 428 / 99.37 / 93.93  12 / 782 / 97.17 / 84.87
                            post   12 / 154 / 95.57 / 91.59  12 / 142 / 95.31 / 91.99  10 / 128 / 87.15 / 83.5
                 Gini       no     15 / 530 / 100 / 93.45    14 / 600 / 100 / 93.37    19 / 1078 / 100 / 83.25
                            pre    13 / 520 / 99.84 / 93.61  13 / 556 / 99.55 / 93.28  10 / 686 / 95.47 / 83.25
                            post   10 / 116 / 93.75 / 89.56  10 / 106 / 91.42 / 90.13  9 / 82 / 85.17 / 81.96
    COV          Entropy    no     6 / 48 / 100 / 98.28      8 / 54 / 100 / 98.9       11 / 118 / 100 / 94.99
                            pre    5 / 44 / 99.57 / 98.28    5 / 44 / 98.98 / 98.75    5 / 50 / 95.54 / 91.86
                            post   6 / 22 / 98.83 / 97.97    6 / 26 / 98.94 / 98.59    6 / 24 / 95.61 / 92.8
                 Gini       no     7 / 73 / 100 / 98.9       9 / 80 / 100 / 98.59      10 / 136 / 100 / 95.62
                            pre    6 / 60 / 99.61 / 99.06    6 / 62 / 99.41 / 98.28    6 / 76 / 97.65 / 95.77
                            post   6 / 26 / 98.63 / 98.44    5 / 24 / 98.16 / 98.28    7 / 32 / 97.06 / 95.15
    COV-PCA(4)   Entropy    no     9 / 48 / 100 / 99.22      8 / 44 / 100 / 98.75      26 / 792 / 100 / 78.72
                            pre    5 / 30 / 99.61 / 99.06    7 / 40 / 99.92 / 98.75    6 / 102 / 83.95 / 82.79
                            post   4 / 12 / 99.26 / 98.75    4 / 14 / 98.86 / 98.9     10 / 66 / 83.4 / 82.0
                 Gini       no     9 / 50 / 100 / 98.9       7 / 58 / 100 / 99.37      20 / 820 / 100 / 77.93
                            pre    5 / 34 / 99.65 / 98.9     7 / 58 / 100 / 99.37      7 / 206 / 87.67 / 80.44
                            post   4 / 12 / 99.14 / 98.75    4 / 12 / 98.94 / 99.06    6 / 24 / 81.4 / 79.97

    Table 4.  Accuracy of the linear SVM and the RBF SVM applied to the noisy test sets. The number of support vectors for the SVM with the RBF kernel is given in the SV columns. An asterisk (*) indicates that only one physical direction was used from the sensors. The scenarios reported in Table 6 for comparison purposes are marked in bold

                        Noise 1                Noise 10               Noise 50
    Data trans. (#PCs)  Linear   RBF     SV    Linear   RBF     SV    Linear   RBF     SV
    STD(3)*             0.940    0.950   1264  0.866    0.874   1866  0.650    0.668   3591
    COV(3)*             0.986    0.986   465   0.974    0.987   568   0.928    0.923   1066
    COV(4)*             0.983    0.990   418   0.988    0.984   441   0.927    0.940   980
    COV(6)*             0.994    0.999   364   0.983    0.994   444   0.933    0.942   954
    STD(6)              0.978    0.983   969   0.926    0.942   1345  0.682    0.726   3239
    COV(6)              0.988    0.993   621   0.982    0.994   616   0.946    0.958   992
    COV(7)              0.993    0.998   484   0.993    0.996   481   0.953    0.970   853
    COV(21)             0.999    1.000   462   0.996    0.998   519   0.947    0.972   923

    Table 5.  Combination of hyperparameters yielding the best results in each scenario, corresponding to the plots in Figure 17, after conducting 100 trials with $ \texttt{Optuna}$

    Selected hyperparameters  Noise 1                  Noise 10                 Noise 50
    activation function       LeakyReLU                LeakyReLU                Swish
    learning rate $ \eta $    $2.562 \cdot 10^{-2}$    $2.102 \cdot 10^{-3}$    $1.017 \cdot 10^{-2}$
    weight decay              $1.243 \cdot 10^{-5}$    $1.221 \cdot 10^{-5}$    $1.520 \cdot 10^{-7}$
    batch size                30                       10                       30
    MSE train                 $8.856 \cdot 10^{-6}$    $5.968 \cdot 10^{-5}$    $6.068 \cdot 10^{-4}$
    MSE test                  $2.815 \cdot 10^{-5}$    $3.054 \cdot 10^{-4}$    $2.427 \cdot 10^{-3}$

    Table 6.  Performance of the methods. Given the high scores of the classical ML algorithms on the full data set, they are compared here using 4 PCs of the COV-transformed data set

    Data set   Method     Precision  Recall  F1 score  Train time (ms)  Test time (ms)
    Noise 1    LogR-PCA   0.997      0.997   0.997     10.195           0.990
               DT-PCA     0.997      0.987   0.992     6.662            0.998
               SVM-PCA    0.990      0.990   0.990     133.799          51.615
               CNN        1.000      1.000   1.000     ~3 min           30.535
    Noise 10   LogR-PCA   0.997      0.994   0.995     12.408           1.001
               DT-PCA     1.000      0.987   0.993     5.207            0.999
               SVM-PCA    0.988      0.988   0.988     24.639           3.003
               CNN        1.000      1.000   1.000     ~3 min           27.133
    Noise 50   LogR-PCA   0.808      0.750   0.778     11.026           1.016
               DT-PCA     0.830      0.808   0.819     10.910           0.994
               SVM-PCA    0.940      0.940   0.940     212.493          106.985
               CNN        0.995      1.000   0.998     ~4 min           49.181

    Table 7.  Hyperparameter ranges for the pre-pruning and the choice of $ \alpha $ for the post-pruning of the DTs, used to obtain the results reported in Table 3. *Except for the Noise 50 data set, where $ \alpha $ = 0.003. A grid-search sketch over these ranges follows the table

                                          Pre-pruning                                  Post-pruning
    Transform    Criterion   Hyperparameter                  Range                     $\alpha$
    STD          Entropy     $\texttt{max_depth}$            $[2, 13]\cap\mathbb{N}$
                             $\texttt{min_samples_split}$    $[2, 4]\cap\mathbb{N}$    0.003
                             $\texttt{min_samples_leaf}$     $[1, 2]\cap\mathbb{N}$
                 Gini        $\texttt{max_depth}$            $[2, 13]\cap\mathbb{N}$
                             $\texttt{min_samples_split}$    $[2, 4]\cap\mathbb{N}$    0.002
                             $\texttt{min_samples_leaf}$     $[1, 2]\cap\mathbb{N}$
    COV          Entropy     $\texttt{max_depth}$            $[2, 5]\cap\mathbb{N}$
                             $\texttt{min_samples_split}$    $[2, 4]\cap\mathbb{N}$    0.01
                             $\texttt{min_samples_leaf}$     $[1, 2]\cap\mathbb{N}$
                 Gini        $\texttt{max_depth}$            $[2, 6]\cap\mathbb{N}$
                             $\texttt{min_samples_split}$    $[2, 4]\cap\mathbb{N}$    0.003
                             $\texttt{min_samples_leaf}$     $[1, 2]\cap\mathbb{N}$
    COV-PCA(4)   Entropy     $\texttt{max_depth}$            $[2, 8]\cap\mathbb{N}$
                             $\texttt{min_samples_split}$    $[2, 4]\cap\mathbb{N}$    0.01*
                             $\texttt{min_samples_leaf}$     $[1, 2]\cap\mathbb{N}$
                 Gini        $\texttt{max_depth}$            $[2, 8]\cap\mathbb{N}$
                             $\texttt{min_samples_split}$    $[2, 4]\cap\mathbb{N}$    0.003
                             $\texttt{min_samples_leaf}$     $[1, 2]\cap\mathbb{N}$
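
    The pre-pruning ranges of Table 7 can be searched exhaustively; below is a sketch for the COV-PCA(4)/entropy block, using a synthetic placeholder data set in place of the real features:

    ```python
    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(5)
    F = rng.normal(size=(600, 4))              # stand-in for COV-PCA(4) features
    y = (F[:, 0] * F[:, 1] > 0).astype(int)

    grid = {"max_depth": list(range(2, 9)),    # [2, 8] ∩ N
            "min_samples_split": [2, 3, 4],    # [2, 4] ∩ N
            "min_samples_leaf": [1, 2]}        # [1, 2] ∩ N
    search = GridSearchCV(DecisionTreeClassifier(criterion="entropy", random_state=0),
                          grid, cv=5).fit(F, y)
    print(search.best_params_)
    ```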

    Table 8.  Range of values allowed for each hyperparameter in the experiments with CNNs, with the third column describing how the values were explored using $ \texttt{Optuna}$. A search sketch follows the table

    Hyperparameter        Range                                        Distribution
    activation function   {Tanh, Swish, Sigmoid, ReLU, LeakyReLU}      discrete uniform
    learning rate         $[1 \cdot 10^{-4}, 1 \cdot 10^{-1}]$         log uniform
    weight decay          $[1 \cdot 10^{-7}, 5 \cdot 10^{-4}]$         log uniform
    batch size            $\{10, 30, 50, 100\}$                        discrete uniform
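
    Table 8's search space maps directly onto Optuna's suggest API; a sketch with a placeholder objective, where `train_and_eval` is a hypothetical stand-in for the real CNN training loop:

    ```python
    import optuna

    def train_and_eval(act, lr, wd, bs):
        # Hypothetical stand-in for training the CNN and returning validation MSE.
        return lr * wd * bs

    def objective(trial):
        act = trial.suggest_categorical(
            "activation", ["Tanh", "Swish", "Sigmoid", "ReLU", "LeakyReLU"])
        lr = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
        wd = trial.suggest_float("weight_decay", 1e-7, 5e-4, log=True)
        bs = trial.suggest_categorical("batch_size", [10, 30, 50, 100])
        return train_and_eval(act, lr, wd, bs)

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=100)    # 100 trials, as in Table 5
    print(study.best_params)
    ```
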
  • [1] T. Akiba, S. Sano, T. Yanase, T. Ohta and M. Koyama, Optuna: A next-generation hyperparameter optimization framework, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
    [2] D. Bank, N. Koenigstein and R. Giryes, Autoencoders, preprint, 2021.
    [3] M. A. Belay, S. S. Blakseth, A. Rasheed and P. S. Rossi, Unsupervised anomaly detection for IoT-based multivariate time series: Existing solutions, performance analysis and future directions, Sensors, 2023.
    [4] J. Berkson, Application of the Logistic Function to Bio-Assay, Journal of the American Statistical Association, 39 (1944), 357-365. 
    [5] B. E. Boser, I. M. Guyon and V. N. Vapnik, A training algorithm for optimal margin classifiers, in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT '92, Association for Computing Machinery, New York, NY, USA (1992), 144-152.
    [6] M. M. Bronstein, J. Bruna, T. Cohen and P. Veličković, Geometric deep learning: Grids, Groups, Graphs, Geodesics, and Gauges, 2021.
    [7] J. Cadima and I. T. Jolliffe, Principal component analysis: A review and recent developments, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374 (2016), 20150202.  doi: 10.1098/rsta.2015.0202.
    [8] J. Cervantes, F. Garcia-Lamont, L. Rodríguez-Mazahua and A. Lopez, A comprehensive survey on support vector machine classification: Applications, challenges and trends, Neurocomputing, 408 (2020), 189-215.
    [9] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, 20 (1995), 273-297.  doi: 10.1023/A:1022627411411.
    [10] H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar and P. A. Muller, Deep learning for time series classification: A review, Data Mining and Knowledge Discovery, 33 (2019), 917-963. doi: 10.1007/s10618-019-00619-1.
    [11] F. E. Harrell, Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis, vol. 608, Springer, 2001.
    [12] C. F. Higham and D. J. Higham, Deep learning: An introduction for applied mathematicians, SIAM Review, 61 (2019), 860-891.  doi: 10.1137/18M1165748.
    [13] H. Hotelling, Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology, 24 (1933), 417-441.
    [14] D. W. Hosmer Jr., S. Lemeshow and R. X. Sturdivant, Applied Logistic Regression, vol. 398, John Wiley & Sons, 2013.
    [15] H. Kim, Artificial Intelligence for 6G, Springer International Publishing, Cham, 2022.
    [16] D. Kingma and J. Ba, Adam: A method for stochastic optimization, in International Conference on Learning Representations (ICLR), San Diego, CA, USA, 2015.
    [17] Y. LeCun, Y. Bengio and G. Hinton, Deep learning, Nature, 521 (2015), 436-444.
    [18] D. Lee, S. Malacarne and E. Aune, Vector quantized time series generation with a bidirectional prior model, in Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, (eds. F. Ruiz, J. Dy and J.-W. van de Meent), vol. 206 of Proceedings of Machine Learning Research, PMLR, (2023), 7665-7693.
    [19] R. J. Lewis, An Introduction to Classification and Regression Tree (CART) Analysis, in Annual Meeting of the Society for Academic Emergency Medicine in San Francisco, California, vol. 14, 2000.
    [20] F. T. Liu, K. M. Ting and Z.-H. Zhou, Isolation forest, in 2008 Eighth IEEE International Conference on Data Mining, (2008), 413-422.
    [21] S. Menard, Logistic Regression: From Introductory to Advanced Concepts and Applications, Sage, 2010.
    [22] F. Mola and R. Siciliano, A fast splitting procedure for classification trees, Statistics and Computing, 7 (1997), 209-216. 
    [23] J. N. Morgan and J. A. Sonquist, Problems in the Analysis of Survey Data, and a Proposal, Journal of the American Statistical Association, 58 (1963), 415-434. 
    [24] Orcina Ltd, Orcaflex, 2023, https://www.orcina.com/orcaflex/.
    [25] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai and S. Chintala, PyTorch: An Imperative Style, High-Performance Deep Learning Library, in Advances in Neural Information Processing Systems, vol. 32, Curran Associates, Inc., 2019.
    [26] K. Pearson, On lines and planes of closest fit to systems of points in space, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2 (1901), 559-572. 
    [27] V. Podgorelec, P. Kokol, B. Stiglic and I. Rozman, Decision trees: An overview and their use in medicine, Journal of Medical Systems, 26 (2002), 445-463.
    [28] J. R. Quinlan, Induction of decision trees, Machine Learning, 1 (1986), 81-106. 
    [29] L. E. Raileanu and K. Stoffel, Theoretical Comparison between the Gini Index and Information Gain Criteria, Annals of Mathematics and Artificial Intelligence, 41 (2004), 77-93.  doi: 10.1023/B:AMAI.0000018580.96245.c6.
    [30] L. Rokach and O. Maimon, Top-Down Induction of Decision Trees Classifiers-A Survey, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 35 (2005), 476-487. 
    [31] B. Schölkopf, R. Williamson, A. Smola, J. Shawe-Taylor and J. Platt, Support vector method for novelty detection, in Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS'99, MIT Press, Cambridge, MA, USA, (1999), 582-588.
    [32] CTi Sensors, TILT-57A Dynamic Inclinometer, Three-Axis Accelerometer, Three-Axis Gyroscope, 2024, https://ctisensors.com/products/tilt-5x-dynamic-inclinometer/.
    [33] R. H. Shumway and D. S. Stoffer, Time Series Analysis and Its Applications, Springer Cham, 2017. doi: 10.1007/978-3-319-52452-8.
    [34] Y. Y. Song and L. Ying, Decision tree methods: applications for classification and prediction, Shanghai Archives of Psychiatry, 27 (2015), 130. 
    [35] S. Tangirala, Evaluating the Impact of GINI Index and Information Gain on Classification using Decision Tree Classifier Algorithm, International Journal of Advanced Computer Science and Applications, 11 (2020), 612-619. 
    [36] A. Venkatasubramaniam, J. Wolfson, N. Mitchell, T. Barnes, M. Jaka and S. French, Decision trees in epidemiological research, Emerging Themes in Epidemiology, 14 (2017), 1-12.
    [37] Det Norske Veritas, Recommended Practice, Technical Report DNV-RP-E104, 2019.
