On the limits of topological data analysis for statistical inference

  • *Corresponding author: Siddharth Vishwanath

  • Topological data analysis has emerged as a powerful tool for extracting the metric, geometric and topological features underlying the data as a multi-resolution summary statistic, and has found applications in several areas where data arises from complex sources. In this paper, we examine the use of topological summary statistics through the lens of statistical inference. We investigate necessary and sufficient conditions under which valid statistical inference is possible using topological summary statistics. Additionally, we provide examples of models that demonstrate invariance with respect to topological summaries.

    Mathematics Subject Classification: Primary: 62R40, 62F30; Secondary: 94A16.

  • Figure 1.  Illustration of different asymptotic regimes

    Figure 2.  Scatterplot and Betti curve for point clouds $ \mathbb{X}_n $ obtained from two different distributions in $ \{f_{\boldsymbol{\theta }}: \boldsymbol{\theta } \in \Theta\} $. (Left) $ \mathbb{X}_n \sim f_{\boldsymbol{\theta }} $ for $ \boldsymbol{\theta } = (0.46, 0.47, 0.03, 0.04) $ in blue, (Center) $ \mathbb{X}_n \sim f_{\boldsymbol{\theta }} $ for $ \boldsymbol{\theta } = (0.17, 0.29, 0.21, 0.24) $ in orange, and (Right) the (normalized) Betti curve $ r \mapsto \beta_1(\mathbb{X}_n, r)/n $ for the two point clouds

    Figure 3.  Betti curves and the Betti numbers in the thermodynamic regime for $ \{f_{\boldsymbol{\theta }}: \boldsymbol{\theta } \in \Theta\} $ from Example 3.7

    Figure 4.  Illustration of $ f_{\theta }(x, y) $ from Example 4.2 for $ \theta \in \{\pi/15, \pi/4, 2\pi/3\} $

    Figure 5.  The sets $ \left\{\boldsymbol{x} \in \mathbb{R}^2: f_\rho(\boldsymbol{x}) \geq t\right\} $ for $ \rho \in \{-0.99, -0.5, 0.99\} $ respectively. For a fixed level $ t $, all three of them have the same mass. In general, $ \mathbb{P}_{\rho}\big(\{\boldsymbol{x} \in \mathbb{R}^2 : f_\rho(\boldsymbol{x}) \ge t\}\big) $ is the same for each $ |\rho|<1 $

    Figure 6.  Superlevel sets of the probability density function $ f_{\alpha } $ on $ \mathcal{X} $ from Example 5.9 when $ \alpha \in \{0, -5, 8\} $. For a fixed level $ t \ge 0 $, the mass of the superlevel set $ \{\boldsymbol{x} \in \mathcal{X} : f_{\alpha }(\boldsymbol{x}) \ge t\} $ is the same for each $ \alpha $

    Figure 7.  Illustration of $ \mathcal{F}$-equivalence when $ g \sim 0.5 \Gamma(10, 5) + 0.5 \Gamma(10, 2) $ is a mixture of Gamma distributions. For different values of $ \theta \in \Theta $, the excess mass functions $ \hat{f}_{\theta}(t) $ are identical, as shaded in orange for $ t = 0.006 $

    Figure 8.  Illustration of the family of distributions from Example 5.9

    Figure 9.  Superlevel sets of the density function $ f_{\alpha } $ on $ \mathcal{X}= \mathbb{S}^2 \setminus \{\boldsymbol{p}, -\boldsymbol{p}\} $ from Example A.3 when $ g \sim \mathrm{Beta}(10, 10) $ is the density function of a Beta distribution on $ (0, 1) $, and for $ \alpha\in \{0, 1, -2, -5\} $
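The normalized Betti curve $ r \mapsto \beta_1(\mathbb{X}_n, r)/n $ plotted in Figure 2 can be evaluated for small point clouds with a minimal, dependency-free sketch like the one below. It assumes a Vietoris-Rips complex (an edge $ \{i, j\} $ whenever $ \|x_i - x_j\| \le r $, a triangle whenever all three of its edges are present) and coefficients in $ \mathbb{Z}/2 $; the paper's complex and radius convention may differ (for instance a Čech complex, or a ball radius rather than a distance threshold). The function names `rips_beta1` and `normalized_betti_curve` are hypothetical.

```python
import itertools
import math


def rips_beta1(points, r):
    """First Betti number over GF(2) of a Vietoris-Rips complex at scale r.

    Assumption (illustrative): edge {i, j} present iff dist(x_i, x_j) <= r,
    triangle present iff all three of its edges are present.
    """
    n = len(points)
    edges = [
        (i, j)
        for i, j in itertools.combinations(range(n), 2)
        if math.dist(points[i], points[j]) <= r
    ]
    edge_index = {e: k for k, e in enumerate(edges)}

    # rank(d1) = n - (number of connected components); count components by union-find.
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    components = n
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    rank_d1 = n - components

    # rank(d2) over GF(2): each triangle boundary is a bitmask over edge indices,
    # reduced against stored pivot vectors keyed by their highest set bit.
    pivots = {}
    rank_d2 = 0
    for i, j, k in itertools.combinations(range(n), 3):
        if (i, j) in edge_index and (i, k) in edge_index and (j, k) in edge_index:
            vec = (1 << edge_index[(i, j)]) | (1 << edge_index[(i, k)]) | (1 << edge_index[(j, k)])
            while vec:
                top = vec.bit_length() - 1
                if top not in pivots:
                    pivots[top] = vec  # new independent boundary
                    rank_d2 += 1
                    break
                vec ^= pivots[top]  # eliminate the leading bit

    # beta_1 = dim ker(d1) - rank(d2) = (|E| - rank(d1)) - rank(d2)
    return len(edges) - rank_d1 - rank_d2


def normalized_betti_curve(points, radii):
    """Evaluate r -> beta_1(X_n, r) / n on a grid of scales."""
    n = len(points)
    return [rips_beta1(points, r) / n for r in radii]
```

For the four corners of the unit square, `normalized_betti_curve(square, [0.5, 1.0, 1.5])` returns `[0.0, 0.25, 0.0]`: the 4-cycle appears once the sides (length 1) enter the complex and is filled in once the diagonals (length $ \sqrt{2} $) do. Bitmask elimination keeps the sketch self-contained but scales poorly; point clouds of the size shown in Figure 2 would normally be handled by a dedicated persistent homology library.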

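The invariance stated in the caption of Figure 5 can be checked numerically for one concrete family. The sketch below assumes, hypothetically, that $ f_\rho $ is the bivariate normal density with covariance $ \Sigma_\rho = (1-\rho^2)^{-1/2}\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix} $, scaled so that $ \det\Sigma_\rho = 1 $; this need not be the family used in the paper. Under this assumption $ f_\rho(\boldsymbol{X}) = e^{-Q/2}/(2\pi) $ with $ Q = \boldsymbol{X}^\top\Sigma_\rho^{-1}\boldsymbol{X} \sim \chi^2_2 $, so $ \mathbb{P}_{\rho}\big(\{\boldsymbol{x} : f_\rho(\boldsymbol{x}) \ge t\}\big) = 1 - 2\pi t $ for $ 0 < t \le 1/(2\pi) $, whatever the value of $ \rho $. The normalization matters: for the unit-variance correlated normal the same computation gives $ 1 - 2\pi\sqrt{1-\rho^2}\,t $, which does depend on $ \rho $. The function names are illustrative.

```python
import math
import random


def density(x1, x2, rho):
    """Hypothetical f_rho: the N(0, Sigma_rho) density with
    Sigma_rho = [[1, rho], [rho, 1]] / sqrt(1 - rho^2), so det(Sigma_rho) = 1."""
    # Quadratic form x^T Sigma_rho^{-1} x, using
    # Sigma_rho^{-1} = [[1, -rho], [-rho, 1]] / sqrt(1 - rho^2).
    q = (x1 * x1 - 2 * rho * x1 * x2 + x2 * x2) / math.sqrt(1 - rho * rho)
    # Normalizing constant 1 / (2 * pi * sqrt(det Sigma_rho)) reduces to 1 / (2 * pi).
    return math.exp(-q / 2) / (2 * math.pi)


def sample(rho, rng):
    """Draw one point from N(0, Sigma_rho) using its 2x2 Cholesky factor."""
    scale = math.sqrt(1 / math.sqrt(1 - rho * rho))  # square root of the common variance
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return scale * z1, scale * (rho * z1 + math.sqrt(1 - rho * rho) * z2)


def superlevel_mass_mc(rho, t, n=100_000, seed=0):
    """Monte Carlo estimate of P_rho({x : f_rho(x) >= t})."""
    rng = random.Random(seed)
    hits = sum(density(*sample(rho, rng), rho) >= t for _ in range(n))
    return hits / n


def superlevel_mass_exact(t):
    """Closed form 1 - 2*pi*t, clipped to [0, 1]; note that rho does not appear."""
    return min(1.0, max(0.0, 1 - 2 * math.pi * t))
```

Evaluating `superlevel_mass_mc(rho, 0.1)` for $ \rho \in \{-0.99, -0.5, 0.99\} $ gives values that agree with `superlevel_mass_exact(0.1)` (about 0.372) up to Monte Carlo error, mirroring the three panels of Figure 5.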