1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
original | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
k=60 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 |
k=120 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
k=180 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
k=240 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
k=300 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
k=360 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Persistent homology is a tool within topological data analysis to detect different dimensional holes in a dataset. The boundaries of the empty territories (i.e., holes) are not well-defined and each has multiple representations. The proposed method, Empty Territory (EmT), provides representations of different dimensional holes with a specified level of complexity of the territory boundary. EmT is designed for the setting where persistent homology uses a Vietoris-Rips complex filtration, and works as a post-analysis to refine the hole representation of the persistent homology algorithm. In particular, EmT uses alpha shapes to obtain a special class of representations that captures the empty territories with a complexity determined by the size of the alpha balls. With a fixed complexity, EmT returns the representation that contains the most points within the special class of representations. This method is limited to finding 1D holes in 2D data and 2D holes in 3D data, and is illustrated on simulation datasets of a homogeneous Poisson point process in 2D and a uniform sampling in 3D. Furthermore, the method is applied to a 2D cell tower location geography dataset and 3D Sloan Digital Sky Survey (SDSS) galaxy dataset, where it works well in capturing the empty territories.
Citation: |
Figure 1. The three loops are equivalent in the sense that they all encircle the same hole. However, it can be desirable to find a representation that captures the empty territory, with no data point inside. The first two loops contain data points inside them, while the third one is empty. EmT is a method for finding the third type of representation at a specified level of complexity
Figure 2. For a VR complex, if there are pairwise intersections between $ k $ points, the corresponding $ k $-simplex is added to the simplicial complex. (a) three circles with radius $ \frac{\delta}{2} $ that intersect pairwise, and $ a, b, c $ are within a distance of $ \delta $ from each other. The $ VR(X, \delta) $ includes three points $ a, b, c $, the edges $ \{ab, bc, ca\} $, and the triangle $ \{abc\} $. (b) the resulting $ VR(X, \delta) $
Figure 3. (a) The same point cloud as in Fig. 1 is used to generate the persistence diagram of (b). The farther a point is from the diagonal, the longer it appears in the filtration. There are 40 $ H_0 $ generators and one prominent $ H_1 $ generator, which is consistent with the point cloud
Figure 4. (a) The green circles occupy the area around the data points. The remaining area is the resulting alpha hull, outlined by the red curves. By straightening the arcs, it becomes an alpha shape, shown in blue lines. (b) A concave hull based on the alpha shape (blue polygon). (c) An inner shell (blue polygon) based on the same alpha shape as (a)
Figure 6. The same dataset from previous examples is used here to illustrate EmT. (a) Convex hull of the dataset in green lines. The red circles are set $ S_X $ and green crosses are set $ S_{X'} $. (b) The alpha hull in read lines and alpha shape in blue lines. Green circles are $ \alpha $ balls and green points are $ C_{\alpha} $. (c) Blue lines are $ CH_{\alpha} $ and red pluses are circle centers that are inside $ CH_{\alpha} $. (d) The comparison between the EmT output $ \widetilde{IS_{\alpha}} $ as blue diamonds and the original representation point set $ S_X $ returned by persistent homology as red circles
Figure 7. The same outer loop is used in the four examples displayed in (a), (b), (c), and (d), but with different points inside. The black points are data points, the red circles are the raw representations from the persistent homology algorithm, the blue triangles are representations of the EmT algorithm, and in (c) the cyan curves are the corresponding alpha hull. In (a), the raw representation of persistent homology is not able to detect the inner points, while the EmT representation includes the inner points. In (b), both the raw representation and EmT have the inner points in their representations. In (c), while the raw representation contains the larger loop with the inner points, the EmT representation includes two separate loops, as indicated by the corresponding alpha hulls. In (d), due to the amount of scattered points inside the larger loop, the raw representation is not able to identify it; therefore the EmT representation is not able to find the larger loop either since EmT takes the raw representation as input
Figure 8. A dataset from the homogeneous Poisson point process with intensity $ 300 $ in $ [0, 1]\times[0, 1] $. (a) Persistence diagram of the dataset. (b) The eight $ H_1 $ generators with the largest persistences reported by persistent homology in different colors. (c) The representations of the eight $ H_1 $ generators obtained by the EmT approach, where the boundary of the empty regions are better captured
Figure 9. (a) A dataset generated from the uniform distribution in $ [0, 1]\times[0, 1]\times[0, 1] $; points inside the red regions are deleted from the dataset. (b) The persistence diagram of the dataset. (c) The raw representations from the persistent homology algorithm (blue and green points). The representations are displayed in blue and green shading, where the red points are observations inside the shaded volumes. (d) The representations from the EmT approach (blue and green points). The representations are displayed in blue and green shading, and there is no point inside the shading volumes
Figure 10. (a) Cell tower locations in Minnesota where each of the black point indicates a cell tower. (b) The loop overlaps with the Red Lake Indian Reservation. (c) The persistence diagram of the dataset. (d), (e) Representations of eight $ H_1 $ generators with the largest persistences from persistent homology and the EmT approach, respectively
Figure 12. Persistence diagrams of the centers of k-means clustering for different k's. The k-means are made on the dataset shown in Fig. 11a
Figure 13. (a) Cosmic voids representations ($ H_2 $ generators) reported by the persistent homology algorithm. (b) Cosmic voids representations from the EmT approach. As a qualitative comparison, the representations of EmT trace the empty volumes, while the raw representations reported by persistent homology do not define well the empty regions. (c) The raw representation of a particular cosmic void is displayed in the blue shading and the red points are galaxy inside the volume. (d) The EmT representation of a particular cosmic is displayed in the blue shading and there is no galaxy inside. The axis units are $ Mpc/h $, which megaparsec divided by a parameter $ h $ (which accounts for the uncertainty about the expansion rate of the universe). One parsec is approximately 3.26 lightyears
Table 1. Summary table indicating which of the eight void regions is detected for different k values; 1 = detected and 0 = not detected. The corresponding labeled void regions are shown in Fig. 11a
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
original | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
k=60 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 |
k=120 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
k=180 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
k=240 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
k=300 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
k=360 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
[1] | S. Asaeedi, F. Didehvar and A. Mohades, α-concave hull, a generalization of convex hull, Theoretical Computer Science, 702 (2017), 48-59. doi: 10.1016/j.tcs.2017.08.014. |
[2] | P. Bendich, J. S. Marron, E. Miller, A. Pieloch and S. Skwerer, Persistent homology analysis of brain artery trees, The annals of applied statistics, 10 (2016), 198-218. doi: 10.1214/15-AOAS886. |
[3] | G. Carlsson, A. Zomorodian, A. Collins and L. J. Guibas, Persistence barcodes for shapes, International Journal of Shape Modeling, 11 (2005), 149-187. |
[4] | F. Chazal, B. Fasy, F. Lecci, B. Michel, A. Rinaldo, A. Rinaldo and L. Wasserman, Robust topological inference: Distance to a measure and kernel distance, The Journal of Machine Learning Research, 18 (2017), Paper No. 159, 40 pp. |
[5] | S. N. Chiu, D. Stoyan, W. S. Kendall and J. Mecke, Stochastic Geometry and Its Applications, John Wiley & Sons, 2013. doi: 10.1002/9781118658222. |
[6] | D. Cohen-Steiner, H. Edelsbrunner and J. Harer, Stability of persistence diagrams, Discrete & Computational Geometry, 37 (2007), 103-120. doi: 10.1007/s00454-006-1276-5. |
[7] | D. Kahle and H. Wickham, ggmap: Spatial Visualization with ggplot2, The R Journal, 5 (2013), 144-161. |
[8] | V. De Silva and G. E. Carlsson, Topological estimation using witness complexes., SPBG, 4 (2004), 157-166. |
[9] | Edelsbrunner, Letscher and Zomorodian, Topological persistence and simplification, Discrete & Computational Geometry, 28 (2002), 511–533, URL https://doi.org/10.1007/s00454-002-2885-2. |
[10] | H. Edelsbrunner and J. Harer, Persistent homology-a survey, Contemporary Mathematics, 453 (2008), 257-282. doi: 10.1090/conm/453/08802. |
[11] | H. Edelsbrunner and J. Harer, Computational Topology: An Introduction, American Mathematical Soc., 2010. |
[12] | H. Edelsbrunner, D. Kirkpatrick and R. Seidel, On the shape of a set of points in the plane, IEEE Transactions on information theory, 29 (1983), 551-559. doi: 10.1109/TIT.1983.1056714. |
[13] | H. Edelsbrunner and D. Morozov, Persistent homology: Theory and practice, European Congress of Mathematics, 31–50, Eur. Math. Soc., Zürich, 2013. |
[14] | H. Edelsbrunner and E. P. Mücke, Three-dimensional alpha shapes, ACM Transactions on Graphics (TOG), 13 (1994), 43-72. |
[15] | S. Emrani, T. Gentimis and H. Krim, Persistent homology of delay embeddings and its application to wheeze detection, IEEE Signal Processing Letters, 21 (2014), 459-463. |
[16] | B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan and A. Singh et al., Confidence sets for persistence diagrams, The Annals of Statistics, 42 (2014), 2301-2339. doi: 10.1214/14-AOS1252. |
[17] | A. Galton, Pareto-optimality of cognitively preferred polygonal hulls for dot patterns, in Spatial Cognition VI. Learning, Reasoning, and Talking about Space (eds. C. Freksa, N. S. Newcombe, P. Gärdenfors and S. Wölfl), Springer Berlin Heidelberg, Berlin, Heidelberg, 45 (2008), 409–425. |
[18] | R. Ghrist, Barcodes: the persistent topology of data, Bulletin of the American Mathematical Society, 45 (2008), 61-75. doi: 10.1090/S0273-0979-07-01191-3. |
[19] | J. A. Hartigan and M. A. Wong, Algorithm as 136: A k-means clustering algorithm, Journal of the Royal Statistical Society. Series C (Applied Statistics), 28 (1979), 100-108. |
[20] | Y. Hoffman, O. Metuki, G. Yepes, S. Gottlöber, J. E. Forero-Romero, N. I. Libeskind and A. Knebe, A kinematic classification of the cosmic web, Monthly Notices of the Royal Astronomical Society, 425 (2012), 2049-2057. |
[21] | V. Icke, R. Weygaert et al., The galaxy distribution as a voronoi foam, Quarterly Journal of the Royal Astronomical Society, 32 (1991), 85. |
[22] | F.-S. Kitaura and R. E. Angulo, Linearization with cosmological perturbation theory, Monthly Notices of the Royal Astronomical Society, 425 (2012), 2443-2454. |
[23] | S. Lloyd, Least squares quantization in pcm, IEEE Transactions on Information Theory, 28 (1982), 129-137. doi: 10.1109/TIT.1982.1056489. |
[24] | J. MacQueen et al., Some methods for classification and analysis of multivariate observations, in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, Oakland, CA, USA, 1967,281–297. |
[25] | A. Moreira and M. Y. Santos, Concave hull: A k-nearest neighbours approach for the computation of the region occupied by a set of points. |
[26] | M. C. Neyrinck, ZOBOV: A parameter-free void-finding algorithm, Monthly Notices of the Royal Astronomical Society, 386 (2008), 2101-2109. |
[27] | P. Peebles, The void phenomenon, The Astrophysical Journal, 557 (2001), 495. |
[28] | A. Pisani, P. Sutter, N. Hamaus, E. Alizadeh, R. Biswas, B. D. Wandelt and C. M. Hirata, Counting voids to probe dark energy, Physical Review D, 92 (2015), 083531. |
[29] | R. R. Rojas, M. S. Vogeley, F. Hoyle and J. Brinkmann, Photometric properties of void galaxies in the sloan digital sky survey, The Astrophysical Journal, 617 (2004), 50. |
[30] | T. Sousbie, The persistent cosmic web and its filamentary structure–i. theory and implementation, Monthly Notices of the Royal Astronomical Society, 414 (2011), 350-383. |
[31] | M. A. Strauss, D. H. Weinberg, R. H. Lupton, V. K. Narayanan, J. Annis, M. Bernardi, M. Blanton, S. Burles, A. Connolly, J. Dalcanton et al., Spectroscopic target selection in the Sloan Digital Sky Survey: The main galaxy sample, The Astronomical Journal, 124 (2002), 1810. |
[32] | P. Sutter, G. Lavaux, B. D. Wandelt and D. H. Weinberg, A public void catalog from the sdss dr7 galaxy redshift surveys based on the watershed transform, The Astrophysical Journal, 761 (2012), 44. |
[33] | R. van de Weygaert, Fragmenting the universe. 3: The constructions and statistics of 3-d voronoi tessellations, Astronomy and Astrophysics, 283 (1994), 361-406. |
[34] | L. Vietoris, Über den höheren Zusammenhang kompakter Räume und eine Klasse von zusammenhangstreuen Abbildungen, Mathematische Annalen, 97 (1927), 454-472. doi: 10.1007/BF01447877. |
[35] | H. Wagner, C. Chen and E. Vuçini, Efficient computation of persistent homology for cubical data, in Topological Methods in Data Analysis and Visualization II, Springer, 2012, 91–106. doi: 10.1007/978-3-642-23175-9_7. |
[36] | L. Wasserman, Topological data analysis, Annual Review of Statistics and Its Application, 5 (2018), 501-535. doi: 10.1146/annurev-statistics-031017-100045. |
[37] | K. Xia and G.-W. Wei, Persistent homology analysis of protein structure, flexibility, and folding, International Journal for Numerical Methods in Biomedical Engineering, 30 (2014), 814-844. doi: 10.1002/cnm.2655. |
[38] | X. Xu, J. Cisewski-Kehe, S. B. Green and D. Nagai, Finding cosmic voids and filament loops using topological data analysis, Astronomy and Computing. |
[39] | X. Zhu, Persistent homology: An introduction and a new text representation for natural language processing., in IJCAI, 2013, 1953–1959. |
[40] | A. Zomorodian, Fast construction of the Vietoris-Rips complex, Computers & Graphics, 34 (2010), 263-271. |
[41] | A. Zomorodian and G. Carlsson, Computing persistent homology, Discrete & Computational Geometry, 33 (2005), 249-274. doi: 10.1007/s00454-004-1146-y. |
The three loops are equivalent in the sense that they all encircle the same hole. However, it can be desirable to find a representation that captures the empty territory, with no data point inside. The first two loops contain data points inside them, while the third one is empty. EmT is a method for finding the third type of representation at a specified level of complexity
For a VR complex, if there are pairwise intersections between
(a) The same point cloud as in Fig. 1 is used to generate the persistence diagram of (b). The farther a point is from the diagonal, the longer it appears in the filtration. There are 40
(a) The green circles occupy the area around the data points. The remaining area is the resulting alpha hull, outlined by the red curves. By straightening the arcs, it becomes an alpha shape, shown in blue lines. (b) A concave hull based on the alpha shape (blue polygon). (c) An inner shell (blue polygon) based on the same alpha shape as (a)
(a), (b) and (c) are three different ways of occupying the area from the interior of the dataset with
The same dataset from previous examples is used here to illustrate EmT. (a) Convex hull of the dataset in green lines. The red circles are set
The same outer loop is used in the four examples displayed in (a), (b), (c), and (d), but with different points inside. The black points are data points, the red circles are the raw representations from the persistent homology algorithm, the blue triangles are representations of the EmT algorithm, and in (c) the cyan curves are the corresponding alpha hull. In (a), the raw representation of persistent homology is not able to detect the inner points, while the EmT representation includes the inner points. In (b), both the raw representation and EmT have the inner points in their representations. In (c), while the raw representation contains the larger loop with the inner points, the EmT representation includes two separate loops, as indicated by the corresponding alpha hulls. In (d), due to the amount of scattered points inside the larger loop, the raw representation is not able to identify it; therefore the EmT representation is not able to find the larger loop either since EmT takes the raw representation as input
A dataset from the homogeneous Poisson point process with intensity
(a) A dataset generated from the uniform distribution in
(a) Cell tower locations in Minnesota where each of the black point indicates a cell tower. (b) The loop overlaps with the Red Lake Indian Reservation. (c) The persistence diagram of the dataset. (d), (e) Representations of eight
(a) A Voronoi foam simulation. The eight large red points are the void seeds that generate eight void regions. The green crosses highlight one of the eight void regions. (b) Persistence diagram of the original dataset with 1200 points
Persistence diagrams of the centers of k-means clustering for different k's. The k-means are made on the dataset shown in Fig. 11a
(a) Cosmic voids representations (