# American Institute of Mathematical Sciences

eISSN:
2639-8001

All Issues

## Foundations of Data Science

December 2021 , Volume 3 , Issue 4

Select all articles

Export/Reference:

2021, 3(4): 677-700 doi: 10.3934/fods.2021017 +[Abstract](943) +[HTML](564) +[PDF](2061.67KB)
Abstract:

The \begin{document}$p$\end{document}-persistent \begin{document}$q$\end{document}-combinatorial Laplacian defined for a pair of simplicial complexes is a generalization of the \begin{document}$q$\end{document}-combinatorial Laplacian. Given a filtration, the spectra of persistent combinatorial Laplacians not only recover the persistent Betti numbers of persistent homology but also provide extra multiscale geometrical information of the data. Paired with machine learning algorithms, the persistent Laplacian has many potential applications in data science. Seeking different ways to find the spectrum of an operator is an active research topic, becoming interesting when ideas are originated from multiple fields. In this work, we explore an alternative approach for the spectrum of persistent Laplacians. As the eigenvalues of a persistent Laplacian matrix are the roots of its characteristic polynomial, one may attempt to find the roots of the characteristic polynomial by homotopy continuation, and thus resolving the spectrum of the corresponding persistent Laplacian. We consider a set of simple polytopes and small molecules to prove the principle that algebraic topology, combinatorial graph, and algebraic geometry can be integrated to understand the shape of data.

2021, 3(4): 701-727 doi: 10.3934/fods.2021020 +[Abstract](930) +[HTML](377) +[PDF](1448.29KB)
Abstract:

We study the problem of diffeomorphometric geodesic landmark matching where the objective is to find a diffeomorphism that, via its group action, maps between two sets of landmarks. It is well-known that the motion of the landmarks, and thereby the diffeomorphism, can be encoded by an initial momentum leading to a formulation where the landmark matching problem can be solved as an optimisation problem over such momenta. The novelty of our work lies in the application of a derivative-free Bayesian inverse method for learning the optimal momentum encoding the diffeomorphic mapping between the template and the target. The method we apply is the ensemble Kalman filter, an extension of the Kalman filter to nonlinear operators. We describe an efficient implementation of the algorithm and show several numerical results for various target shapes.

2021, 3(4): 729-767 doi: 10.3934/fods.2021024 +[Abstract](991) +[HTML](313) +[PDF](5721.65KB)
Abstract:

Topological Data Analysis (TDA) provides novel approaches that allow us to analyze the geometrical shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which allows us to perform dimension reduction and visualization for high-dimensional datasets on a torus using persistent cohomology. In this paper, we propose a method to adapt the circular coordinate framework to take into account the roughness of circular coordinates in change-point and high-dimensional applications. To do that, we use a generalized penalty function instead of an \begin{document}$L_{2}$\end{document} penalty in the traditional circular coordinate algorithm. We provide simulation experiments and real data analyses to support our claim that circular coordinates with generalized penalty will detect the change in high-dimensional datasets under different sampling schemes while preserving the topological structures.

2021, 3(4): 769-791 doi: 10.3934/fods.2021028 +[Abstract](513) +[HTML](212) +[PDF](2487.2KB)
Abstract:

The task of predicting missing entries of a matrix, from a subset of known entries, is known as matrix completion. In today's data-driven world, data completion is essential whether it is the main goal or a pre-processing step. Structured matrix completion includes any setting in which data is not missing uniformly at random. In recent work, a modification to the standard nuclear norm minimization (NNM) for matrix completion has been developed to take into account sparsity-based structure in the missing entries. This notion of structure is motivated in many settings including recommender systems, where the probability that an entry is observed depends on the value of the entry. We propose adjusting an Iteratively Reweighted Least Squares (IRLS) algorithm for low-rank matrix completion to take into account sparsity-based structure in the missing entries. We also present an iterative gradient-projection-based implementation of the algorithm that can handle large-scale matrices. Finally, we present a robust array of numerical experiments on matrices of varying sizes, ranks, and level of structure. We show that our proposed method is comparable with the adjusted NNM on small-sized matrices, and often outperforms the IRLS algorithm in structured settings on matrices up to size \begin{document}$1000 \times 1000$\end{document}.

2021, 3(4): 793-824 doi: 10.3934/fods.2021030 +[Abstract](639) +[HTML](265) +[PDF](2154.76KB)
Abstract:

We present an ensemble filtering method based on a linear model for the precision matrix (the inverse of the covariance) with the parameters determined by Score Matching Estimation. The method provides a rigorous covariance regularization when the underlying random field is Gaussian Markov. The parameters are found by solving a system of linear equations. The analysis step uses the inverse formulation of the Kalman update. Several filter versions, differing in the construction of the analysis ensemble, are proposed, as well as a Score matching version of the Extended Kalman Filter.