Foundations of Data Science
March 2021 , Volume 3 , Issue 1
Select all articles
In prior work [
Rankability paper [
This paper considers a new approach to using Markov chain Monte Carlo (MCMC) in contexts where one may adopt multilevel (ML) Monte Carlo. The underlying problem is to approximate expectations w.r.t. an underlying probability measure that is associated to a continuum problem, such as a continuous-time stochastic process. It is then assumed that the associated probability measure can only be used (e.g. sampled) under a discretized approximation. In such scenarios, it is known that to achieve a target error, the computational effort can be reduced when using MLMC relative to i.i.d. sampling from the most accurate discretized probability. The ideas rely upon introducing hierarchies of the discretizations where less accurate approximations cost less to compute, and using an appropriate collapsing sum expression for the target expectation. If a suitable coupling of the probability measures in the hierarchy is achieved, then a reduction in cost is possible. This article focused on the case where exact sampling from such coupling is not possible. We show that one can construct suitably coupled MCMC kernels when given only access to MCMC kernels which are invariant with respect to each discretized probability measure. We prove, under verifiable assumptions, that this coupled MCMC approach in a ML context can reduce the cost to achieve a given error, relative to exact sampling. Our approach is illustrated on a numerical example.
We propose two related unsupervised clustering algorithms which, for input, take data assumed to be sampled from a uniform distribution supported on a metric space
Persistent homology (PH) is one of the most popular tools in topological data analysis (TDA), while graph theory has had a significant impact on data science. Our earlier work introduced the persistent spectral graph (PSG) theory as a unified multiscale paradigm to encompass TDA and geometric analysis. In PSG theory, families of persistent Laplacian matrices (PLMs) corresponding to various topological dimensions are constructed via a filtration to sample a given dataset at multiple scales. The harmonic spectra from the null spaces of PLMs offer the same topological invariants, namely persistent Betti numbers, at various dimensions as those provided by PH, while the non-harmonic spectra of PLMs give rise to additional geometric analysis of the shape of the data. In this work, we develop an open-source software package, called highly efficient robust multidimensional evolutionary spectra (HERMES), to enable broad applications of PSGs in science, engineering, and technology. To ensure the reliability and robustness of HERMES, we have validated the software with simple geometric shapes and complex datasets from three-dimensional (3D) protein structures. We found that the smallest non-zero eigenvalues are very sensitive to data abnormality.
Add your name and e-mail address to receive news of forthcoming issues of this journal:
[Back to Top]