# American Institute of Mathematical Sciences

eISSN: 2639-8001

## Foundations of Data Science

June 2020, Volume 2, Issue 2

2020, 2(2): 83-99 doi: 10.3934/fods.2020006
Abstract:

This article studies how to form CUR decompositions of low-rank matrices primarily via random sampling, though deterministic methods from previous works are illustrated as well. The primary problem is to determine when a column submatrix of a rank $k$ matrix also has rank $k$. For random column sampling schemes, there is typically a tradeoff between the number of columns that must be chosen and the complexity of determining the sampling probabilities. We discuss several sampling methods and their complexities, as well as the stability of the method under perturbations of both the probabilities and the underlying matrix. As an application, we give a high probability guarantee of the exact solution of the Subspace Clustering Problem via CUR decompositions when columns are sampled according to their Euclidean lengths.
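
For orientation, the exact CUR decomposition that the abstract refers to can be written as below. This is a sketch in notation assumed here rather than taken from the listing: $A$ is the rank $k$ matrix, $I$ and $J$ are the selected row and column index sets, and $\dagger$ denotes the Moore-Penrose pseudoinverse.

$$
C = A(:, J), \quad R = A(I, :), \quad U = A(I, J), \qquad A = C\,U^{\dagger} R \iff \operatorname{rank}(U) = \operatorname{rank}(A).
$$

One common reading of sampling "according to Euclidean lengths", assumed here and not confirmed by the abstract, draws column $j$ with probability $p_j = \|A(:, j)\|_2^2 / \|A\|_F^2$.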

2020, 2(2): 101-121 doi: 10.3934/fods.2020007
Abstract:

Microscopy imaging of plant cells allows the elaborate analysis of sub-cellular motions of organelles. The large video data set can be efficiently analyzed by automated algorithms. We develop a novel, data-oriented algorithm, which can track organelle movements and reconstruct their trajectories on stacks of image data. Our method proceeds in three steps: (ⅰ) identification, (ⅱ) localization, and (ⅲ) linking. This method combines topological data analysis and Ensemble Kalman Filtering, and does not assume a specific motion model. Application of this method to simulated data sets shows agreement with ground truth. We also successfully test our method on real microscopy data.

2020, 2(2): 123-154 doi: 10.3934/fods.2020008
Abstract:

This paper describes a hierarchical learning strategy for generating sparse representations of multivariate datasets. The hierarchy arises from approximation spaces considered at successively finer scales. A detailed analysis of the stability, convergence and behavior of error functionals associated with the approximations is presented, along with a well-chosen set of applications. Results show the performance of the approach as a data reduction mechanism for both synthetic datasets (univariate and multivariate) and a real geo-spatial dataset. The sparse representation generated is shown to efficiently reconstruct data and minimize error in prediction. The approach is also shown to generalize well to unseen samples, extending its prospective application to statistical learning problems.

2020, 2(2): 155-172 doi: 10.3934/fods.2020009
Abstract:

This article introduces a Bayesian nonparametric method for quantifying the relative evidence in a dataset in favour of the dependence or independence of two variables conditional on a third. The approach uses Pólya tree priors on spaces of conditional probability densities, accounting for uncertainty in the form of the underlying distributions in a nonparametric way. The Bayesian perspective provides an inherently symmetric probability measure of conditional dependence or independence, a feature particularly advantageous in causal discovery and not employed in existing procedures of this type.

2020, 2(2): 173-205 doi: 10.3934/fods.2020010
Abstract:

The stochastic variational approach for geophysical fluid dynamics was introduced by Holm (Proc Roy Soc A, 2015) as a framework for deriving stochastic parameterisations for unresolved scales. This paper applies the variational stochastic parameterisation in a two-layer quasi-geostrophic model for a $\beta$-plane channel flow configuration. We present a new method for estimating the stochastic forcing (used in the parameterisation) to approximate unresolved components using data from the high-resolution deterministic simulation, and describe a procedure for computing physically consistent initial conditions for the stochastic model. We also quantify uncertainty of coarse grid simulations relative to the fine grid ones in homogeneous (teeming with small-scale vortices) and heterogeneous (featuring horizontally elongated large-scale jets) flows, and analyse how the spread of stochastic solutions depends on different parameters of the model. The parameterisation is tested by comparing it with the true eddy-resolving solution that has reached some statistical equilibrium and the deterministic solution modelled on a low-resolution grid. The results show that the proposed parameterisation significantly depends on the resolution of the stochastic model and gives good ensemble performance for both homogeneous and heterogeneous flows, and the parameterisation lays solid foundations for data assimilation.