Distribution | Gussian | Uniform | Rayleigh | Exponential |
$ \rho = \bar{L}/\tilde{L} $ | $ 1.154 \pm 0.012 $ | $ 1.000 \pm 0.010 $ | $ 1.136 \pm 0.013 $ | $ 1.265 \pm 0.016 $ |
We introduce a novel method for Additive Noise Analysis for Persistence Thresholding (ANAPT) which separates significant features in the sublevel set persistence diagram of a time series based on a statistics analysis of the persistence of a noise distribution. Specifically, we consider an additive noise model and leverage the statistical analysis to provide a noise cutoff or confidence interval in the persistence diagram for the observed time series. This analysis is done for several common noise models including Gaussian, uniform, exponential, and Rayleigh distributions. ANAPT is computationally efficient, does not require any signal pre-filtering, is widely applicable, and has open-source software available. We demonstrate the functionality of ANAPT with both numerically simulated examples and an experimental data set. Additionally, we provide an efficient $ \Theta(n\log(n)) $ algorithm for calculating the zero-dimensional sublevel set persistence homology.
Citation: |
Figure 2. Sublevel set persistence applied to $ x(t) $ of a single variable function or time series with and without additive noise $ \epsilon $ from $ \mathcal{N} $, shown in red and blue, respectively. This demonstrates the stability of persistent homology with the time series (left) with and without additive noise and the small effect on the resulting persistence diagrams (right). In addition, the light red region separates the significant features from those associated to additive noise
Figure 6. Numeric function fitting of Equationk__ge (52) to the mean of the median lifetime $ \tilde{L} $ of $ f_i(t) $ for $ i \in [1,3] $ where $ \mathcal{N} $ is unit variance Gaussian additive noise with $ \delta \in [0,2] $ being incremented to understand the effects of signal on the median lifetime
Figure 10. Numerical simulations with $ 10^5 $ data points for each of the four investigated noise models with probability densities on the top row and lifetime probability densities on the bottom row with their associated cutoff $ C_\alpha $. The distribution parameters were set as $ \sigma = \Delta = \lambda = \sigma = 1 $ and were estimated as $ \sigma \approx 0.996 $, $ \Delta \approx 1.013 $, $ \lambda \approx 1.009 $, and $ \sigma \approx 0.998 $ for the Gaussian, uniform, Rayleigh, and exponential distributions, respectively
Figure 11. Example cutoff calculation for time series $ x(t) = A(\sin(\pi t) + \sin(t)) + \epsilon $, where $ \epsilon $ is additive noise from a Gaussian distribution with zero mean and four different standard deviations as $ \sigma \in [0.2, 1, 2, 4] $. The resulting sublevel set persistence diagrams with cutoff $ C_\alpha^* $ are shown to the right
Figure 13. Histogram of the highlighted section of signal in Fig. 12 with fitted Rayleigh distribution
Table 1.
Ratios
Distribution | Gussian | Uniform | Rayleigh | Exponential |
$ \rho = \bar{L}/\tilde{L} $ | $ 1.154 \pm 0.012 $ | $ 1.000 \pm 0.010 $ | $ 1.136 \pm 0.013 $ | $ 1.265 \pm 0.016 $ |
Table 2. Constants of Equation (52) for each distribution type investigated in this work with associated uncertainty from ten trials
Distribution | Gussian | Uniform | Rayleigh | Exponential |
$ c_1 $ | $ 0.845 \pm 0.029 $ | $ 0.880 \pm 0.017 $ | $ 0.726 \pm 0.026 $ | $ 0.436 \pm 0.036 $ |
$ c_2 $ | $ 0.809 \pm 0.061 $ | $ 0.639 \pm 0.026 $ | $ 0.605 \pm 0.054 $ | $ 0.393 \pm 0.075 $ |
Table Algorithm 1. Zero-Dimensional Persistence Algorithm
Data: Array A of n Real Numbers Result: Persistence Diagram |
$\begin{array}{l} {\bf{begin}}\\ \;\;\left| \begin{array}{l} \;\;\;\;M = {\rm{list}}\;{\rm{of}}\;{\rm{local}}\;{\rm{extrema}}\;{\rm{in}}\;A\left( {{\rm{non - endpoint}}\;{\rm{maxima}}\;{\rm{and}}\;{\rm{minima}}} \right);\\ \;\;\;\;Q = {\rm{priority}}\;{\rm{queue}}\;{\rm{of}}\;{\rm{pointers}}\;{\rm{to}}\;{\rm{elements}}\;{\rm{of}}\;M;\\ \;\;\;\;{\bf{while}}\;\left| M \right| > 3\;{\bf{do}}\\ \;\;\;\;\;\;\left| \begin{array}{l} m \leftarrow Q.pop;\\ b \leftarrow {\rm{min}}\left\{ {m.height,m.next.height} \right\};\\ d \leftarrow {\rm{max}}\left\{ {m.height,m.next.height} \right\};\\ {\rm{Add}}\;(b,d){\rm{ to}}\;pairs;\\ {\rm{Update}}\;Q\;{\rm{and}}\;M; \end{array} \right.\\ \;\;\;\;{\bf{end}}\\ \;\;\;\;{\bf{return}}\;pairs \end{array} \right.\\ {\bf{end}} \end{array}$ |
[1] | N. Atienza, R. Gonzalez-Diaz and M. Rucco, Persistent entropy for separating topological features from noise in vietoris-rips complexes, Journal of Intelligent Information Systems, 52 (2017), 637-655. doi: 10.1007/s10844-017-0473-4. |
[2] | G. E. P. Box, G. M. Jenkins, G. C. Reinsel and G. M. Ljung, Time Series Analysis: Forecasting and Control, John Wiley & Sons, 2016. |
[3] | P. Bubenik, Statistical topological data analysis using persistence landscapes, Journal of Machine Learning Research, 16 (2015), 77-102. |
[4] | P. Bühlmann and P. Buhlmann, Sieve bootstrap for time series, Bernoulli, 3 (1997), 123-148. doi: 10.2307/3318584. |
[5] | G. Carlsson, J. Gorham, M. Kahle and J. Mason, Computational topology for configuration spaces of hard disks,, Physical Review E, 85 (2012), 011303. doi: 10.1103/PhysRevE. 85.011303. |
[6] | F. Chazal, V. De Silva, M. Glisse and S. Oudot, The Structure and Stability of Persistence Modules, SpringerBriefs in Mathematics. Springer, [Cham], 2016. doi: 10.1007/978-3-319-42545-0. |
[7] | F. Chazal, B. T. Fasy, F. Lecci, B. Michel, A. Rinaldo and L. Wasserman, Robust topological inference: Distance to a measure and kernel distance, J. Mach. Learn. Res., 18 (2018), Paper No. 159, 40 pp. |
[8] | F. Chazal, B. T. Fasy, F. Lecci, A. Rinaldo, A. Singh and L. Wasserman, On the bootstrap for persistence diagrams and landscapes, arXiv preprint, arXiv: 1311.0376, 2013. |
[9] | S. Chowdhury and F. Mémoli, Convergence of hierarchical clustering and persistent homology methods on directed networks, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 1152–1169, SIAM, Philadelphia, PA, 2018. doi: 10.1137/1.9781611975031.75. |
[10] | D. Cohen-Steiner, H. Edelsbrunner and J. Harer, Stability of persistence diagrams, Discrete & Computational Geometry, 37 (2006), 103-120. doi: 10.1007/s00454-006-1276-5. |
[11] | S. Czesla, T. Molle and J. H. M. M. Schmitt, A posteriori noise estimation in variable data sets, Astronomy & Astrophysics, 609 (2018), A39. doi: 10.1051/0004-6361/201730618. |
[12] | C. J. A. Delfinado and H. Edelsbrunner, An incremental algorithm for Betti numbers of simplicial complexes on the 3-sphere, Computer Aided Geometric Design, 12 (1995), 771-784. doi: 10.1016/0167-8396(95)00016-Y. |
[13] | M. Dindin, Y. Umeda and F. Chazal, Topological data analysis for arrhythmia detection through modular neural networks, In Advances in Artificial Intelligence, 2020, 177–188. doi: 10.1007/978-3-030-47358-7_17. |
[14] | H. Edelsbrunner and J. Harer, Computational Topology - an Introduction, American Mathematical Society, Providence, RI, 2010. doi: 10.1090/mbk/069. |
[15] | H. Edelsbrunner and J. Harer, Persistent homology-a survey, Contemporary Mathematics, 453 (2008), 257-282. doi: 10.1090/conm/453/08802. |
[16] | H. Edelsbrunner, D. Letscher and A. Zomorodian, Topological persistence and simplification, Discrete & Computational Geometry, 28 (2002), 511-533. doi: 10.1007/s00454-002-2885-2. |
[17] | B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan, A. Singh and et al., Confidence sets for persistence diagrams, The Annals of Statistics, 42 (2014), 2301-2339. doi: 10.1214/14-AOS1252. |
[18] | S. Gholizadeh and W. Zadrozny, A short survey of topological data analysis in time series and systems analysis, arXiv preprint, arXiv: 1809.10745, 2018. |
[19] | J. Hu, J. B. Gao and K. D. White, Estimating measurement noise in a time series by exploiting nonstationarity, Chaos, Solitons & Fractals, 22 (2004), 807-819. doi: 10.1016/j.chaos.2004.02.061. |
[20] | F. A. Khasawneh and E. Munch, Utilizing topological data analysis for studying signals of time-delay systems, Time Delay Systems, Adv. Delays Dyn., Springer, Cham, 7 (2017), 93–106. |
[21] | F. A. Khasawneh and E. Munch, Topological data analysis for true step detection in periodic piecewise constant signals, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science, 474 (2018), 20180027, 24pp. doi: 10.1098/rspa. 2018.0027. |
[22] | F. A. Khasawneh, E. Munch and J. A. Perea, Chatter classification in turning using machine learning and topological data analysis, IFAC-PapersOnLine, 51 (2018), 195–200. doi: 10.1016/j. ifacol. 2018.07.222. |
[23] | P. Lawson, A. B. Sholl, J. Quincy Brown, B. Terese Fasy and C. Wenk, Persistent homology for the quantitative evaluation of architectural features in prostate cancer histology, Scientific Reports, 9 (2019), Article number: 1139. doi: 10.1038/s41598-018-36798-y. |
[24] | H. Lee, H. Kang, M. K. Chung, B. -N. Kim and D. S. Lee, Persistent brain network homology from the perspective of dendrogram, IEEE Transactions on Medical Imaging, 31 (2012), 2267–2277. doi: 10.1109/TMI. 2012.2219590. |
[25] | E. Munch, A user's guide to topological data analysis, Journal of Learning Analytics, 4, 2017. doi: 10.18608/jla. 2017.42.6. |
[26] | A. Myers and F. A. Khasawneh, On the automatic parameter selection for permutation entropy, Chaos: An Interdisciplinary Journal of Nonlinear Science, 30 (2020), 033130, 17 pp. doi: 10.1063/1.5111719. |
[27] | A. Myers, E. Munch and F. A Khasawneh, Persistent homology of complex networks for dynamic state detection, Phys. Rev. E, 100 (2019), 022314, 14 pp. doi: 10.1103/physreve. 100.022314. |
[28] | A. Otto, M. C. Yesilli and F. A. Khasawneh, Topological feature vectors for chatter detection in turning processes, The International Journal of Advanced Manufacturing Technology, 2022. arXiv: 1905.08671. doi: 10.1007/s00170-021-08242-5. |
[29] | J. A. Perea, A brief history of persistence., |
[30] | J. A. Perea and J. Harer, Sliding windows and persistence: An application of topological methods to signal analysis, Foundations of Computational Mathematics, 15 (2015), 799-838. doi: 10.1007/s10208-014-9206-z. |
[31] | D. Petrushenko and F. A. Khasawneh, Uncertainty propagation of system parameters to the dynamic response: An application to a benchtop pendulum, ASME 2017 International Mechanical Engineering Congress and Exposition, 2018, 10 pages. doi: 10.1115/IMECE2017-71105. |
[32] | D. N. Politis and J. P. Romano, The stationary bootstrap, Journal of the American Statistical Association, 89 (1994), 1303-1313. doi: 10.1080/01621459.1994.10476870. |
[33] | M. Rucco, F. Castiglione, E. Merelli and M. Pettini, Characterisation of the idiotypic immune network through persistent entropy, In Proceedings of ECCS 2014, Springer International Publishing, 2016, 117–128. doi: 10.1007/978-3-319-29228-1_11. |
[34] | F. Takens, Detecting strange attractors in turbulence, Dynamical Systems and Turbulence, Warwick 1980 (Coventry, 1979/1980), 366–381, Lecture Notes in Math., 898, Springer, Berlin-New York, 1981. |
[35] | C. J. Tralie and J. A. Perea, (Quasi)periodicity quantification in video data, using topology, SIAM Journal on Imaging Sciences, 11 (2018), 1049–1077. doi: 10.1137/17M1150736. |
[36] | K. Urbanowicz and J. A. Hołyst, Noise-level estimation of time series using coarse-grained entropy, Physical Review E, 67 (2003), 046218. doi: 10.1103/PhysRevE.67.046218. |
[37] | X. Wan, W. Wang, J. Liu and T. Tong, Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range, BMC Medical Research Methodology, 14 (2014), Article number: 135. doi: 10.1186/1471-2288-14-135. |
[38] | G. R. Wood and B. P. Zhang, Estimation of the lipschitz constant of a function, Journal of Global Optimization, 8 (1996), 91-103. doi: 10.1007/BF00229304. |
Persistence diagram summarizing the sublevel sets of a function
Sublevel set persistence applied to
Histograms
Additive noise probability distributions
Example time series showing sample
Numeric function fitting of Equationk__ge (52) to the mean of the median lifetime
Demonstration of distribution parameter
Comparison between the persistent homology, bootstrap resampling, and ANAPT methods for separating noise from topological signal in the sublevel set persistence diagram. The example signal is generated from a chaotic Lorenz system
Comparison of distribution parameter estimation techniques (low-Pass filter residuals, cubic spline residuals, and sublevel set persistence) for estimating
Numerical simulations with
Example cutoff calculation for time series
Experimental time series from free drop response of single pendulum with corresponding cutoffs in the persistence diagram and time-order lifetimes plot
Histogram of the highlighted section of signal in Fig. 12 with fitted Rayleigh distribution