doi: 10.3934/fods.2021012

A density-based approach to feature detection in persistence diagrams for firn data

1. 

Program of Informatics and Analytics, University of North Carolina at Greensboro, Greensboro, NC 27402, USA

2. 

Department of Mathematics, University of Maryland, College Park, MD 20742, USA

3. 

Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, NC 27402, USA

4. 

Department of Geological Sciences and Engineering, University of Nevada, Reno, Reno, NV 89557, USA

5. 

Department of Mathematics, William & Mary, Williamsburg, VA 23185, USA

* Corresponding author: Austin Lawson

Received  January 2021 Revised  April 2021 Published  May 2021

Fund Project: Hoffman and Chung were partially supported by NSF Grant DMS-1950549. Chung, Keegan, and Day were partially supported by the Army Research Office under Grant Number W911NF-20-1-0131. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein

Topological data analysis, and in particular persistence diagrams, are gaining popularity as tools for extracting topological information from noisy point cloud and digital data. Persistence diagrams track topological features in the form of $ k $-dimensional holes in the data. Here, we construct a new, automated approach for identifying persistence diagram points that represent robust long-life features. These features may be used to provide a more accurate estimate of Betti numbers for the underlying space. This approach extends the established practice of using a lifespan cutoff on the features in order to take advantage of the observation that noisy features typically appear in clusters in the persistence diagram. We show that this approach offers more flexibility in partitioning features in the persistence diagram, resulting in greater accuracy in computed Betti numbers, especially in the case of high noise levels and varying image illumination. This work is motivated by 3-dimensional Micro-CT imaging of ice core samples, and is applicable for separating noise from robust signals in persistence diagrams from noisy data.

Citation: Austin Lawson, Tyler Hoffman, Yu-Min Chung, Kaitlin Keegan, Sarah Day. A density-based approach to feature detection in persistence diagrams for firn data. Foundations of Data Science, doi: 10.3934/fods.2021012
References:
[1]

R. J. Adler, O. Bobrowski, M. S. Borman, E. Subag and S. Weinberger, Persistent homology for random fields and complexes, in Borrowing Strength: Theory Powering Applications–A Festschrift for Lawrence D. Brown, Inst. Math. Stat. (IMS) Collect., 6, Inst. Math. Statist., Beachwood, OH, 2010,124–143. doi: 10.1214/10-IMSCOLL609.  Google Scholar

[2]

N. AtienzaR. Gonzalez-Diaz and M. Rucco, Persistent entropy for separating topological features from noise in vietoris-rips complexes, J. Intelligent Information Systems, 52 (2019), 637-655.  doi: 10.1007/s10844-017-0473-4.  Google Scholar

[3]

N. Atienza, R. Gonzalez-Diaz and M. Rucco, Separating topological noise from features using persistent entropy, in Software Technologies: Applications and Foundations, Lecture Notes in Computer Science, 9946, Springer, Cham, 2016, 3–12. doi: 10.1007/978-3-319-50230-4_1.  Google Scholar

[4]

G. Carlsson, Topology and data, Bull. Amer. Math. Soc.(N.S.), 46 (2009), 255-308.  doi: 10.1090/S0273-0979-09-01249-X.  Google Scholar

[5]

F. ChazalV. de Silva and S. Oudot, Persistence stability for geometric complexes, Geom. Dedicata, 173 (2014), 193-214.  doi: 10.1007/s10711-013-9937-z.  Google Scholar

[6]

F. Chazal and B. Michel, An introduction to Topological Data Analysis: Fundamental and practical aspects for data scientists, preprint, arXiv: 1710.04019. Google Scholar

[7]

Y.-M. Chung and S. Day, Topological fidelity and image thresholding: A persistent homology approach, J. Math. Imaging Vision, 60 (2018), 1167-1179.  doi: 10.1007/s10851-018-0802-4.  Google Scholar

[8]

D. Cohen-SteinerH. Edelsbrunner and J. Harer, Stability of persistence diagrams, Discrete Comput. Geom., 37 (2007), 103-120.  doi: 10.1007/s00454-006-1276-5.  Google Scholar

[9]

P. Dlotko, Cubical complex, in GUDHI User and Reference Manual, GUDHI Editorial Board, 3.3.0 edition, 2020. Available from: https://gudhi.inria.fr/. Google Scholar

[10]

H. Edelsbrunner and J. L. Harer, Computational Topology. An Introduction, American Mathematical Society, Providence, RI, 2010. doi: 10.1090/mbk/069.  Google Scholar

[11]

M. Ester, H.-P. Kriegel, J. Sander and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in KDD-96 Proceedings, 1996,226–231. Google Scholar

[12]

B. T. FasyF. LecciA. RinaldoL. WassermanS. Balakrishnan and A. Singh, Confidence sets for persistence diagrams, Ann. Statist., 42 (2014), 2301-2339.  doi: 10.1214/14-AOS1252.  Google Scholar

[13]

A. Garin and G. Tauzin, A topological "reading" lesson: Classification of MNIST using TDA, 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, 2019, 1551–1556. doi: 10.1109/ICMLA.2019.00256.  Google Scholar

[14]

R. Ghrist, Barcodes: The persistent topology of data, Bull. Amer. Math. Soc., 45 (2008), 61-75.  doi: 10.1090/S0273-0979-07-01191-3.  Google Scholar

[15]

J. F. Jardine, Data and homotopy types, preprint, arXiv: 1908.06323. Google Scholar

[16]

T. Kaczynski, K. Mischaikow and M. Mrozek, Computational Homology, Applied Mathematical Sciences, 157, Springer-Verlag, New York, 2004. doi: 10.1007/b97315.  Google Scholar

[17]

K. KeeganM. R. Albert and I. Baker, The impact of ice layers on gas transport through firn at the North Greenland Eemian Ice Drilling (NEEM) site, Greenland, The Cryosphere, 8 (2014), 1801-1806.  doi: 10.5194/tc-8-1801-2014.  Google Scholar

[18]

A. LandaisJ. M. BarnolaK. KawamuraN. Caillon and M. Delmotte, Firn-air $\delta$15n in modern polar sites and glacial–interglacial ice: A model-data mismatch during glacial periods in Antarctica?, Quaternary Science Reviews, 25 (2006), 49-62.  doi: 10.1016/j.quascirev.2005.06.007.  Google Scholar

[19]

M. Lesnick and M. Wright, Interactive visualization of 2-D persistence modules, preprint, arXiv: 1512.00180. Google Scholar

[20]

J. M. D. LundinC. M. StevensR. ArthernC. Buizert and A. Orsi, Firn model intercomparison experiment (FirnMICE), J. Glaciology, 63 (2017), 401-422.  doi: 10.1017/jog.2016.114.  Google Scholar

[21]

I. ObayashiY. Hiraoka and M. Kimura, Persistence diagrams with linear machine learning models, J. Appl. Comput. Topol., 1 (2018), 421-449.  doi: 10.1007/s41468-018-0013-5.  Google Scholar

[22]

N. Otter, M. A. Porter, U. Tillmann, P. Grindrod and H. A. Harrington, A roadmap for the computation of persistent homology, EPJ Data Science, 6 (2017). doi: 10.1140/epjds/s13688-017-0109-5.  Google Scholar

[23]

A. Patania, F. Vaccarino and G. Petri, Topological analysis of data, EPJ Data Science, 6 (2017). doi: 10.1140/epjds/s13688-017-0104-x.  Google Scholar

[24]

J. Schwander and B. Stauffer, Age difference between polar ice and the air trapped in its bubbles, Nature, 311 (1984), 45-47.  doi: 10.1038/311045a0.  Google Scholar

[25]

L. Wasserman, Topological data analysis, Annu. Rev. Stat. Appl., 5 (2018), 501-535.  doi: 10.1146/annurev-statistics-031017-100045.  Google Scholar

[26]

A. Zomorodian, Topological data analysis, in Advances in Applied and Computational Topology, Proc. Sympos. Appl. Math., 70, Amer. Math. Soc., Providence, RI, 2012, 1–39. doi: 10.1090/psapm/070/587.  Google Scholar

show all references

References:
[1]

R. J. Adler, O. Bobrowski, M. S. Borman, E. Subag and S. Weinberger, Persistent homology for random fields and complexes, in Borrowing Strength: Theory Powering Applications–A Festschrift for Lawrence D. Brown, Inst. Math. Stat. (IMS) Collect., 6, Inst. Math. Statist., Beachwood, OH, 2010,124–143. doi: 10.1214/10-IMSCOLL609.  Google Scholar

[2]

N. AtienzaR. Gonzalez-Diaz and M. Rucco, Persistent entropy for separating topological features from noise in vietoris-rips complexes, J. Intelligent Information Systems, 52 (2019), 637-655.  doi: 10.1007/s10844-017-0473-4.  Google Scholar

[3]

N. Atienza, R. Gonzalez-Diaz and M. Rucco, Separating topological noise from features using persistent entropy, in Software Technologies: Applications and Foundations, Lecture Notes in Computer Science, 9946, Springer, Cham, 2016, 3–12. doi: 10.1007/978-3-319-50230-4_1.  Google Scholar

[4]

G. Carlsson, Topology and data, Bull. Amer. Math. Soc.(N.S.), 46 (2009), 255-308.  doi: 10.1090/S0273-0979-09-01249-X.  Google Scholar

[5]

F. ChazalV. de Silva and S. Oudot, Persistence stability for geometric complexes, Geom. Dedicata, 173 (2014), 193-214.  doi: 10.1007/s10711-013-9937-z.  Google Scholar

[6]

F. Chazal and B. Michel, An introduction to Topological Data Analysis: Fundamental and practical aspects for data scientists, preprint, arXiv: 1710.04019. Google Scholar

[7]

Y.-M. Chung and S. Day, Topological fidelity and image thresholding: A persistent homology approach, J. Math. Imaging Vision, 60 (2018), 1167-1179.  doi: 10.1007/s10851-018-0802-4.  Google Scholar

[8]

D. Cohen-SteinerH. Edelsbrunner and J. Harer, Stability of persistence diagrams, Discrete Comput. Geom., 37 (2007), 103-120.  doi: 10.1007/s00454-006-1276-5.  Google Scholar

[9]

P. Dlotko, Cubical complex, in GUDHI User and Reference Manual, GUDHI Editorial Board, 3.3.0 edition, 2020. Available from: https://gudhi.inria.fr/. Google Scholar

[10]

H. Edelsbrunner and J. L. Harer, Computational Topology. An Introduction, American Mathematical Society, Providence, RI, 2010. doi: 10.1090/mbk/069.  Google Scholar

[11]

M. Ester, H.-P. Kriegel, J. Sander and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in KDD-96 Proceedings, 1996,226–231. Google Scholar

[12]

B. T. FasyF. LecciA. RinaldoL. WassermanS. Balakrishnan and A. Singh, Confidence sets for persistence diagrams, Ann. Statist., 42 (2014), 2301-2339.  doi: 10.1214/14-AOS1252.  Google Scholar

[13]

A. Garin and G. Tauzin, A topological "reading" lesson: Classification of MNIST using TDA, 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, 2019, 1551–1556. doi: 10.1109/ICMLA.2019.00256.  Google Scholar

[14]

R. Ghrist, Barcodes: The persistent topology of data, Bull. Amer. Math. Soc., 45 (2008), 61-75.  doi: 10.1090/S0273-0979-07-01191-3.  Google Scholar

[15]

J. F. Jardine, Data and homotopy types, preprint, arXiv: 1908.06323. Google Scholar

[16]

T. Kaczynski, K. Mischaikow and M. Mrozek, Computational Homology, Applied Mathematical Sciences, 157, Springer-Verlag, New York, 2004. doi: 10.1007/b97315.  Google Scholar

[17]

K. KeeganM. R. Albert and I. Baker, The impact of ice layers on gas transport through firn at the North Greenland Eemian Ice Drilling (NEEM) site, Greenland, The Cryosphere, 8 (2014), 1801-1806.  doi: 10.5194/tc-8-1801-2014.  Google Scholar

[18]

A. LandaisJ. M. BarnolaK. KawamuraN. Caillon and M. Delmotte, Firn-air $\delta$15n in modern polar sites and glacial–interglacial ice: A model-data mismatch during glacial periods in Antarctica?, Quaternary Science Reviews, 25 (2006), 49-62.  doi: 10.1016/j.quascirev.2005.06.007.  Google Scholar

[19]

M. Lesnick and M. Wright, Interactive visualization of 2-D persistence modules, preprint, arXiv: 1512.00180. Google Scholar

[20]

J. M. D. LundinC. M. StevensR. ArthernC. Buizert and A. Orsi, Firn model intercomparison experiment (FirnMICE), J. Glaciology, 63 (2017), 401-422.  doi: 10.1017/jog.2016.114.  Google Scholar

[21]

I. ObayashiY. Hiraoka and M. Kimura, Persistence diagrams with linear machine learning models, J. Appl. Comput. Topol., 1 (2018), 421-449.  doi: 10.1007/s41468-018-0013-5.  Google Scholar

[22]

N. Otter, M. A. Porter, U. Tillmann, P. Grindrod and H. A. Harrington, A roadmap for the computation of persistent homology, EPJ Data Science, 6 (2017). doi: 10.1140/epjds/s13688-017-0109-5.  Google Scholar

[23]

A. Patania, F. Vaccarino and G. Petri, Topological analysis of data, EPJ Data Science, 6 (2017). doi: 10.1140/epjds/s13688-017-0104-x.  Google Scholar

[24]

J. Schwander and B. Stauffer, Age difference between polar ice and the air trapped in its bubbles, Nature, 311 (1984), 45-47.  doi: 10.1038/311045a0.  Google Scholar

[25]

L. Wasserman, Topological data analysis, Annu. Rev. Stat. Appl., 5 (2018), 501-535.  doi: 10.1146/annurev-statistics-031017-100045.  Google Scholar

[26]

A. Zomorodian, Topological data analysis, in Advances in Applied and Computational Topology, Proc. Sympos. Appl. Math., 70, Amer. Math. Soc., Providence, RI, 2012, 1–39. doi: 10.1090/psapm/070/587.  Google Scholar

https://github.com/azlawson/Firn) for a visualization of these 3D images. Only the 300th, 450th, and 600th slices from each sample's stack of slices are shown here. Counts of contiguous ice and pore-space regions, respectively, as determined by an expert in firn, are shown in parentheses">Figure 1.  Samples of firn Micro-CT images at different depths. Lighter grey regions represent ice-space and darker regions represent pore-space in each image. In this work, there are 14 firn samples (not columns) in total, the depths of which range from 7 to 78 m in the firn column. Only select samples from that range of depths are shown here. Each sample consists of over 900 cross-sectional slices of 2D grayscale images of size $ 400\times400 $ pixels. In other words, each sample is a 3D image of dimension $ 400\times 400\times 900 $ voxels. We refer readers to our github page (https://github.com/azlawson/Firn) for a visualization of these 3D images. Only the 300th, 450th, and 600th slices from each sample's stack of slices are shown here. Counts of contiguous ice and pore-space regions, respectively, as determined by an expert in firn, are shown in parentheses
9] to calculate persistence and track generators. Red points are the generators corresponding to the purple points in (k), and the green one is the infinite generator in (k). (k) 0-dimensional persistence diagram, $ D_0 $. The purple points correspond to features with red labels in (j). (l) Illustration of the lifespan cutoff with selected points in purple. (m) Illustration of the PD Thresholding method with selected points in purple">Figure 2.  Motivational example of sublevel set filtration and the corresponding persistence diagram. (a) The original grayscale image $ f $. (b)-(i) Sublevel set of $ f $ at different values of threshold $ t $. We color the sublevel set as white. (j) Labels for the rice grains. We used GUDHI[9] to calculate persistence and track generators. Red points are the generators corresponding to the purple points in (k), and the green one is the infinite generator in (k). (k) 0-dimensional persistence diagram, $ D_0 $. The purple points correspond to features with red labels in (j). (l) Illustration of the lifespan cutoff with selected points in purple. (m) Illustration of the PD Thresholding method with selected points in purple
Figure 3.  Illustration of the DPFD method. The yellow ellipses represent clusters that are labelled to be noise, while the purple cluster is labelled as features since its minimum lifespan is above the designated cluster cutoff
Figure 2. The parameters are $ M = 20 $, $ q = 0.1 $, and $ L_0 = 50 $. Purple (darker) points are detected by the DPFD, and the exact 79 features are detected">Figure 4.  Application of DPFD to the rice example shown in Figure 2. The parameters are $ M = 20 $, $ q = 0.1 $, and $ L_0 = 50 $. Purple (darker) points are detected by the DPFD, and the exact 79 features are detected
Figure 2. Purple (darker) points are $ D_0^{83} $. $ \#D_0^{83} = 78 $. Missing and mislabeled features are highlighted by green circles">Figure 5.  Application of lifespan cutoff to the rice example shown in Figure 2. Purple (darker) points are $ D_0^{83} $. $ \#D_0^{83} = 78 $. Missing and mislabeled features are highlighted by green circles
Figure 6.  Precision and Recall plots for various $ (q,M) $ pairs
Figure 7.  Rice grain count errors for various $ (q,M) $ pairs
Figure 8.  The process for the DPFD model. From left to right we have the original image, the image resulting from the application of median blur and the morphological opening, the persistence diagram of the preprocessed image, and finally the application of DPFD
Figure 9.  A plot of the computed optimal $ q $ and $ M $ over depth
Figure 10.  An example of DPFD on a firn image. The image is tagged based on the selected diagram points. From left to right are: the persistence diagram computed on the original image, the persistence diagram computed on the preprocessed image, and the original image tagged by points corresponding to the features identified by DPFD
Figure 11.  An example of DPFD over-labelling ice on a firn image
Figure 12.  Left: The average count of ice regions estimated by the DPFD model with a band of one standard deviation in green as a function of depth. Middle: The average count of ice regions estimated by the Lifespan model with a band of one standard deviation in green as a function of depth. Right: The difference between the true number of ice regions in the test images, as defined by an expert, and the estimates given by the DPFD and Lifespan models
Figure 13.  Left: The average count of pore-space regions estimated by the DPFD model with a band of one standard deviation in green as a function of depth. Middle: The average count of pore-space regions estimated by the Lifespan model with a band of one standard deviation in green as a function of depth. These values represent the number of contiguous pore-space regions with depth. Right: The difference between the true number of pore-space regions in the test images, as defined by an expert, and the estimates given by the DPFD and Lifespan models
[1]

Antonio Rieser. A topological approach to spectral clustering. Foundations of Data Science, 2021, 3 (1) : 49-66. doi: 10.3934/fods.2021005

[2]

George Siopsis. Quantum topological data analysis with continuous variables. Foundations of Data Science, 2019, 1 (4) : 419-431. doi: 10.3934/fods.2019017

[3]

Tyrus Berry, Timothy Sauer. Consistent manifold representation for topological data analysis. Foundations of Data Science, 2019, 1 (1) : 1-38. doi: 10.3934/fods.2019001

[4]

Zhi Guo Feng, K. F. Cedric Yiu, K.L. Mak. Feature extraction of the patterned textile with deformations via optimal control theory. Discrete & Continuous Dynamical Systems - B, 2011, 16 (4) : 1055-1069. doi: 10.3934/dcdsb.2011.16.1055

[5]

Qiang Yin, Gongfa Li, Jianguo Zhu. Research on the method of step feature extraction for EOD robot based on 2D laser radar. Discrete & Continuous Dynamical Systems - S, 2015, 8 (6) : 1415-1421. doi: 10.3934/dcdss.2015.8.1415

[6]

Baolan Yuan, Wanjun Zhang, Yubo Yuan. A Max-Min clustering method for $k$-means algorithm of data clustering. Journal of Industrial & Management Optimization, 2012, 8 (3) : 565-575. doi: 10.3934/jimo.2012.8.565

[7]

Daniel Mckenzie, Steven Damelin. Power weighted shortest paths for clustering Euclidean data. Foundations of Data Science, 2019, 1 (3) : 307-327. doi: 10.3934/fods.2019014

[8]

Michael Herty, Lorenzo Pareschi, Giuseppe Visconti. Mean field models for large data–clustering problems. Networks & Heterogeneous Media, 2020, 15 (3) : 463-487. doi: 10.3934/nhm.2020027

[9]

Danuta Gaweł, Krzysztof Fujarewicz. On the sensitivity of feature ranked lists for large-scale biological data. Mathematical Biosciences & Engineering, 2013, 10 (3) : 667-690. doi: 10.3934/mbe.2013.10.667

[10]

Yunmei Lu, Mingyuan Yan, Meng Han, Qingliang Yang, Yanqing Zhang. Privacy preserving feature selection and Multiclass Classification for horizontally distributed data. Mathematical Foundations of Computing, 2018, 1 (4) : 331-348. doi: 10.3934/mfc.2018016

[11]

Evariste Sanchez-Palencia, Jean-Pierre Françoise. Topological remarks and new examples of persistence of diversity in biological dynamics. Discrete & Continuous Dynamical Systems - S, 2019, 12 (6) : 1775-1789. doi: 10.3934/dcdss.2019117

[12]

Frédéric Grognard, Frédéric Mazenc, Alain Rapaport. Polytopic Lyapunov functions for persistence analysis of competing species. Discrete & Continuous Dynamical Systems - B, 2007, 8 (1) : 73-93. doi: 10.3934/dcdsb.2007.8.73

[13]

Junying Hu, Xiaofei Qian, Jun Pei, Changchun Tan, Panos M. Pardalos, Xinbao Liu. A novel quality prediction method based on feature selection considering high dimensional product quality data. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021099

[14]

Jiangchuan Fan, Xinyu Guo, Jianjun Du, Weiliang Wen, Xianju Lu, Brahmani Louiza. Analysis of the clustering fusion algorithm for multi-band color image. Discrete & Continuous Dynamical Systems - S, 2019, 12 (4&5) : 1233-1249. doi: 10.3934/dcdss.2019085

[15]

Jian Zhao, Fang Deng, Jian Jia, Chunmeng Wu, Haibo Li, Yuan Shi, Shunli Zhang. A new face feature point matrix based on geometric features and illumination models for facial attraction analysis. Discrete & Continuous Dynamical Systems - S, 2019, 12 (4&5) : 1065-1072. doi: 10.3934/dcdss.2019073

[16]

Daniel Roggen, Martin Wirz, Gerhard Tröster, Dirk Helbing. Recognition of crowd behavior from mobile sensors with pattern analysis and graph clustering methods. Networks & Heterogeneous Media, 2011, 6 (3) : 521-544. doi: 10.3934/nhm.2011.6.521

[17]

Pooja Bansal, Aparna Mehra. Integrated dynamic interval data envelopment analysis in the presence of integer and negative data. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021023

[18]

Zhouchen Lin. A review on low-rank models in data analysis. Big Data & Information Analytics, 2016, 1 (2&3) : 139-161. doi: 10.3934/bdia.2016001

[19]

Pankaj Sharma, David Baglee, Jaime Campos, Erkki Jantunen. Big data collection and analysis for manufacturing organisations. Big Data & Information Analytics, 2017, 2 (2) : 127-139. doi: 10.3934/bdia.2017002

[20]

Runqin Hao, Guanwen Zhang, Dong Li, Jie Zhang. Data modeling analysis on removal efficiency of hexavalent chromium. Mathematical Foundations of Computing, 2019, 2 (3) : 203-213. doi: 10.3934/mfc.2019014

 Impact Factor: 

Article outline

Figures and Tables

[Back to Top]