Topological data analysis, and in particular persistence diagrams, are gaining popularity as tools for extracting topological information from noisy point cloud and digital data. Persistence diagrams track topological features in the form of $ k $-dimensional holes in the data. Here, we construct a new, automated approach for identifying persistence diagram points that represent robust long-life features. These features may be used to provide a more accurate estimate of Betti numbers for the underlying space. This approach extends the established practice of using a lifespan cutoff on the features in order to take advantage of the observation that noisy features typically appear in clusters in the persistence diagram. We show that this approach offers more flexibility in partitioning features in the persistence diagram, resulting in greater accuracy in computed Betti numbers, especially in the case of high noise levels and varying image illumination. This work is motivated by 3-dimensional Micro-CT imaging of ice core samples, and is applicable for separating noise from robust signals in persistence diagrams from noisy data.
Citation: |
Figure 1. Samples of firn Micro-CT images at different depths. Lighter grey regions represent ice-space and darker regions represent pore-space in each image. In this work, there are 14 firn samples (not columns) in total, the depths of which range from 7 to 78 m in the firn column. Only select samples from that range of depths are shown here. Each sample consists of over 900 cross-sectional slices of 2D grayscale images of size $ 400\times400 $ pixels. In other words, each sample is a 3D image of dimension $ 400\times 400\times 900 $ voxels. We refer readers to our github page (https://github.com/azlawson/Firn) for a visualization of these 3D images. Only the 300th, 450th, and 600th slices from each sample's stack of slices are shown here. Counts of contiguous ice and pore-space regions, respectively, as determined by an expert in firn, are shown in parentheses
Figure 2. Motivational example of sublevel set filtration and the corresponding persistence diagram. (a) The original grayscale image $ f $. (b)-(i) Sublevel set of $ f $ at different values of threshold $ t $. We color the sublevel set as white. (j) Labels for the rice grains. We used GUDHI[9] to calculate persistence and track generators. Red points are the generators corresponding to the purple points in (k), and the green one is the infinite generator in (k). (k) 0-dimensional persistence diagram, $ D_0 $. The purple points correspond to features with red labels in (j). (l) Illustration of the lifespan cutoff with selected points in purple. (m) Illustration of the PD Thresholding method with selected points in purple
Figure 4. Application of DPFD to the rice example shown in Figure 2. The parameters are $ M = 20 $, $ q = 0.1 $, and $ L_0 = 50 $. Purple (darker) points are detected by the DPFD, and the exact 79 features are detected
Figure 5. Application of lifespan cutoff to the rice example shown in Figure 2. Purple (darker) points are $ D_0^{83} $. $ \#D_0^{83} = 78 $. Missing and mislabeled features are highlighted by green circles
Figure 10. An example of DPFD on a firn image. The image is tagged based on the selected diagram points. From left to right are: the persistence diagram computed on the original image, the persistence diagram computed on the preprocessed image, and the original image tagged by points corresponding to the features identified by DPFD
Figure 12. Left: The average count of ice regions estimated by the DPFD model with a band of one standard deviation in green as a function of depth. Middle: The average count of ice regions estimated by the Lifespan model with a band of one standard deviation in green as a function of depth. Right: The difference between the true number of ice regions in the test images, as defined by an expert, and the estimates given by the DPFD and Lifespan models
Figure 13. Left: The average count of pore-space regions estimated by the DPFD model with a band of one standard deviation in green as a function of depth. Middle: The average count of pore-space regions estimated by the Lifespan model with a band of one standard deviation in green as a function of depth. These values represent the number of contiguous pore-space regions with depth. Right: The difference between the true number of pore-space regions in the test images, as defined by an expert, and the estimates given by the DPFD and Lifespan models
[1] | R. J. Adler, O. Bobrowski, M. S. Borman, E. Subag and S. Weinberger, Persistent homology for random fields and complexes, in Borrowing Strength: Theory Powering Applications–A Festschrift for Lawrence D. Brown, Inst. Math. Stat. (IMS) Collect., 6, Inst. Math. Statist., Beachwood, OH, 2010,124–143. doi: 10.1214/10-IMSCOLL609. |
[2] | N. Atienza, R. Gonzalez-Diaz and M. Rucco, Persistent entropy for separating topological features from noise in vietoris-rips complexes, J. Intelligent Information Systems, 52 (2019), 637-655. doi: 10.1007/s10844-017-0473-4. |
[3] | N. Atienza, R. Gonzalez-Diaz and M. Rucco, Separating topological noise from features using persistent entropy, in Software Technologies: Applications and Foundations, Lecture Notes in Computer Science, 9946, Springer, Cham, 2016, 3–12. doi: 10.1007/978-3-319-50230-4_1. |
[4] | G. Carlsson, Topology and data, Bull. Amer. Math. Soc.(N.S.), 46 (2009), 255-308. doi: 10.1090/S0273-0979-09-01249-X. |
[5] | F. Chazal, V. de Silva and S. Oudot, Persistence stability for geometric complexes, Geom. Dedicata, 173 (2014), 193-214. doi: 10.1007/s10711-013-9937-z. |
[6] | F. Chazal and B. Michel, An introduction to Topological Data Analysis: Fundamental and practical aspects for data scientists, preprint, arXiv: 1710.04019. |
[7] | Y.-M. Chung and S. Day, Topological fidelity and image thresholding: A persistent homology approach, J. Math. Imaging Vision, 60 (2018), 1167-1179. doi: 10.1007/s10851-018-0802-4. |
[8] | D. Cohen-Steiner, H. Edelsbrunner and J. Harer, Stability of persistence diagrams, Discrete Comput. Geom., 37 (2007), 103-120. doi: 10.1007/s00454-006-1276-5. |
[9] | P. Dlotko, Cubical complex, in GUDHI User and Reference Manual, GUDHI Editorial Board, 3.3.0 edition, 2020. Available from: https://gudhi.inria.fr/. |
[10] | H. Edelsbrunner and J. L. Harer, Computational Topology. An Introduction, American Mathematical Society, Providence, RI, 2010. doi: 10.1090/mbk/069. |
[11] | M. Ester, H.-P. Kriegel, J. Sander and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in KDD-96 Proceedings, 1996,226–231. |
[12] | B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan and A. Singh, Confidence sets for persistence diagrams, Ann. Statist., 42 (2014), 2301-2339. doi: 10.1214/14-AOS1252. |
[13] | A. Garin and G. Tauzin, A topological "reading" lesson: Classification of MNIST using TDA, 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, 2019, 1551–1556. doi: 10.1109/ICMLA.2019.00256. |
[14] | R. Ghrist, Barcodes: The persistent topology of data, Bull. Amer. Math. Soc., 45 (2008), 61-75. doi: 10.1090/S0273-0979-07-01191-3. |
[15] | J. F. Jardine, Data and homotopy types, preprint, arXiv: 1908.06323. |
[16] | T. Kaczynski, K. Mischaikow and M. Mrozek, Computational Homology, Applied Mathematical Sciences, 157, Springer-Verlag, New York, 2004. doi: 10.1007/b97315. |
[17] | K. Keegan, M. R. Albert and I. Baker, The impact of ice layers on gas transport through firn at the North Greenland Eemian Ice Drilling (NEEM) site, Greenland, The Cryosphere, 8 (2014), 1801-1806. doi: 10.5194/tc-8-1801-2014. |
[18] | A. Landais, J. M. Barnola, K. Kawamura, N. Caillon and M. Delmotte, et al., Firn-air $\delta$15n in modern polar sites and glacial–interglacial ice: A model-data mismatch during glacial periods in Antarctica?, Quaternary Science Reviews, 25 (2006), 49-62. doi: 10.1016/j.quascirev.2005.06.007. |
[19] | M. Lesnick and M. Wright, Interactive visualization of 2-D persistence modules, preprint, arXiv: 1512.00180. |
[20] | J. M. D. Lundin, C. M. Stevens, R. Arthern, C. Buizert and A. Orsi, et al., Firn model intercomparison experiment (FirnMICE), J. Glaciology, 63 (2017), 401-422. doi: 10.1017/jog.2016.114. |
[21] | I. Obayashi, Y. Hiraoka and M. Kimura, Persistence diagrams with linear machine learning models, J. Appl. Comput. Topol., 1 (2018), 421-449. doi: 10.1007/s41468-018-0013-5. |
[22] | N. Otter, M. A. Porter, U. Tillmann, P. Grindrod and H. A. Harrington, A roadmap for the computation of persistent homology, EPJ Data Science, 6 (2017). doi: 10.1140/epjds/s13688-017-0109-5. |
[23] | A. Patania, F. Vaccarino and G. Petri, Topological analysis of data, EPJ Data Science, 6 (2017). doi: 10.1140/epjds/s13688-017-0104-x. |
[24] | J. Schwander and B. Stauffer, Age difference between polar ice and the air trapped in its bubbles, Nature, 311 (1984), 45-47. doi: 10.1038/311045a0. |
[25] | L. Wasserman, Topological data analysis, Annu. Rev. Stat. Appl., 5 (2018), 501-535. doi: 10.1146/annurev-statistics-031017-100045. |
[26] | A. Zomorodian, Topological data analysis, in Advances in Applied and Computational Topology, Proc. Sympos. Appl. Math., 70, Amer. Math. Soc., Providence, RI, 2012, 1–39. doi: 10.1090/psapm/070/587. |
Motivational example of sublevel set filtration and the corresponding persistence diagram. (a) The original grayscale image
Illustration of the DPFD method. The yellow ellipses represent clusters that are labelled to be noise, while the purple cluster is labelled as features since its minimum lifespan is above the designated cluster cutoff
Application of DPFD to the rice example shown in Figure 2. The parameters are
Application of lifespan cutoff to the rice example shown in Figure 2. Purple (darker) points are
Precision and Recall plots for various
Rice grain count errors for various
The process for the DPFD model. From left to right we have the original image, the image resulting from the application of median blur and the morphological opening, the persistence diagram of the preprocessed image, and finally the application of DPFD
A plot of the computed optimal
An example of DPFD on a firn image. The image is tagged based on the selected diagram points. From left to right are: the persistence diagram computed on the original image, the persistence diagram computed on the preprocessed image, and the original image tagged by points corresponding to the features identified by DPFD
An example of DPFD over-labelling ice on a firn image
Left: The average count of ice regions estimated by the DPFD model with a band of one standard deviation in green as a function of depth. Middle: The average count of ice regions estimated by the Lifespan model with a band of one standard deviation in green as a function of depth. Right: The difference between the true number of ice regions in the test images, as defined by an expert, and the estimates given by the DPFD and Lifespan models
Left: The average count of pore-space regions estimated by the DPFD model with a band of one standard deviation in green as a function of depth. Middle: The average count of pore-space regions estimated by the Lifespan model with a band of one standard deviation in green as a function of depth. These values represent the number of contiguous pore-space regions with depth. Right: The difference between the true number of pore-space regions in the test images, as defined by an expert, and the estimates given by the DPFD and Lifespan models