# American Institute of Mathematical Sciences

eISSN:
2639-8001

All Issues

## Foundations of Data Science

March 2022 , Volume 4 , Issue 1

Select all articles

Export/Reference:

2022, 4(1): 1-36 doi: 10.3934/fods.2021033 +[Abstract](948) +[HTML](278) +[PDF](9090.56KB)
Abstract:

One approach to understanding complex data is to study its shape through the lens of algebraic topology. While the early development of topological data analysis focused primarily on static data, in recent years, theoretical and applied studies have turned to data that varies in time. A time-varying collection of metric spaces as formed, for example, by a moving school of fish or flock of birds, can contain a vast amount of information. There is often a need to simplify or summarize the dynamic behavior. We provide an introduction to topological summaries of time-varying metric spaces including vineyards [19], crocker plots [55], and multiparameter rank functions [37]. We then introduce a new tool to summarize time-varying metric spaces: a crocker stack. Crocker stacks are convenient for visualization, amenable to machine learning, and satisfy a desirable continuity property which we prove. We demonstrate the utility of crocker stacks for a parameter identification task involving an influential model of biological aggregations [57]. Altogether, we aim to bring the broader applied mathematics community up-to-date on topological summaries of time-varying metric spaces.

2022, 4(1): 37-70 doi: 10.3934/fods.2021034 +[Abstract](495) +[HTML](215) +[PDF](2814.07KB)
Abstract:

The classical Langevin Monte Carlo method looks for samples from a target distribution by descending the samples along the gradient of the target distribution. The method enjoys a fast convergence rate. However, the numerical cost is sometimes high because each iteration requires the computation of a gradient. One approach to eliminate the gradient computation is to employ the concept of "ensemble." A large number of particles are evolved together so the neighboring particles provide gradient information to each other. In this article, we discuss two algorithms that integrate the ensemble feature into LMC, and the associated properties.

In particular, we find that if one directly surrogates the gradient using the ensemble approximation, the algorithm, termed Ensemble Langevin Monte Carlo, is unstable due to a high variance term. If the gradients are replaced by the ensemble approximations only in a constrained manner, to protect from the unstable points, the algorithm, termed Constrained Ensemble Langevin Monte Carlo, resembles the classical LMC up to an ensemble error but removes most of the gradient computation.

2022, 4(1): 71-122 doi: 10.3934/fods.2021036 +[Abstract](356) +[HTML](176) +[PDF](5811.84KB)
Abstract:

Given an undirected measurement graph \begin{document}$G = ([n], E)$\end{document}, the classical angular synchronization problem consists of recovering unknown angles \begin{document}$\theta_1, \dots, \theta_n$\end{document} from a collection of noisy pairwise measurements of the form \begin{document}$(\theta_i - \theta_j) \mod 2\pi$\end{document}, for each \begin{document}$\{i, j\} \in E$\end{document}. This problem arises in a variety of applications, including computer vision, time synchronization of distributed networks, and ranking from preference relationships. In this paper, we consider a generalization to the setting where there exist \begin{document}$k$\end{document} unknown groups of angles \begin{document}$\theta_{l, 1}, \dots, \theta_{l, n}$\end{document}, for \begin{document}$l = 1, \dots, k$\end{document}. For each \begin{document}${\left\{{{i, j}}\right\}} \in E$\end{document}, we are given noisy pairwise measurements of the form \begin{document}$\theta_{\ell, i} - \theta_{\ell, j}$\end{document} for an unknown \begin{document}$\ell \in \{1, 2, \ldots, k\}$\end{document}. This can be thought of as a natural extension of the angular synchronization problem to the heterogeneous setting of multiple groups of angles, where the measurement graph has an unknown edge-disjoint decomposition \begin{document}$G = G_1 \cup G_2 \ldots \cup G_k$\end{document}, where the \begin{document}$G_i$\end{document}'s denote the subgraphs of edges corresponding to each group. We propose a probabilistic generative model for this problem, along with a spectral algorithm for which we provide a detailed theoretical analysis in terms of robustness against both sampling sparsity and noise. The theoretical findings are complemented by a comprehensive set of numerical experiments, showcasing the efficacy of our algorithm under various parameter regimes. Finally, we consider an application of bi-synchronization to the graph realization problem, and provide along the way an iterative graph disentangling procedure that uncovers the subgraphs \begin{document}$G_i$\end{document}, \begin{document}$i = 1, \ldots, k$\end{document} which is of independent interest, as it is shown to improve the final recovery accuracy across all the experiments considered.

2022, 4(1): 123-136 doi: 10.3934/fods.2021037 +[Abstract](582) +[HTML](199) +[PDF](897.63KB)
Abstract:

Computing the gradient of a function provides fundamental information about its behavior. This information is essential for several applications and algorithms across various fields. One common application that requires gradients are optimization techniques such as stochastic gradient descent, Newton's method and trust region methods. However, these methods usually require a numerical computation of the gradient at every iteration of the method which is prone to numerical errors. We propose a simple limited-memory technique for improving the accuracy of a numerically computed gradient in this gradient-based optimization framework by exploiting (1) a coordinate transformation of the gradient and (2) the history of previously taken descent directions. The method is verified empirically by extensive experimentation on both test functions and on real data applications. The proposed method is implemented in the \begin{document}$\texttt{R}$\end{document} package \begin{document}$\texttt{smartGrad}$\end{document} and in C\begin{document}$\texttt{++}$\end{document}.

2022, 4(1): 137-163 doi: 10.3934/fods.2022001 +[Abstract](453) +[HTML](184) +[PDF](3919.55KB)
Abstract:

This paper presents a new mathematical signal transform that is especially suitable for decoding information related to non-rigid signal displacements. We provide a measure theoretic framework to extend the existing Cumulative Distribution Transform [29] to arbitrary (signed) signals on \begin{document}$\overline {\mathbb{R}}$\end{document}. We present both forward (analysis) and inverse (synthesis) formulas for the transform, and describe several of its properties including translation, scaling, convexity, linear separability and others. Finally, we describe a metric in transform space, and demonstrate the application of the transform in classifying (detecting) signals under random displacements.