Foundations of Data Science
June 2022 , Volume 4 , Issue 2
Select all articles
We establish a new theory which unifies various aspects of topological approaches for data science, by being applicable both to point cloud data and to graph data, including networks beyond pairwise interactions. We generalize simplicial complexes and hypergraphs to super-hypergraphs and establish super-hypergraph homology as an extension of simplicial homology. Driven by applications, we also introduce super-persistent homology.
We study the deformation of the input space by a trained autoencoder via the Jacobians of the trained weight matrices. In doing so, we prove bounds for the mean squared errors for points in the input space, under assumptions regarding the orthogonality of the eigenvectors. We also show that the trace and the product of the eigenvalues of the Jacobian matrices is a good predictor of the mean squared errors on test points. This is a dataset independent means of testing an autoencoder's ability to generalize on new input. Namely, no knowledge of the dataset on which the network was trained is needed, only the parameters of the trained model.
We introduce a novel method for Additive Noise Analysis for Persistence Thresholding (ANAPT) which separates significant features in the sublevel set persistence diagram of a time series based on a statistics analysis of the persistence of a noise distribution. Specifically, we consider an additive noise model and leverage the statistical analysis to provide a noise cutoff or confidence interval in the persistence diagram for the observed time series. This analysis is done for several common noise models including Gaussian, uniform, exponential, and Rayleigh distributions. ANAPT is computationally efficient, does not require any signal pre-filtering, is widely applicable, and has open-source software available. We demonstrate the functionality of ANAPT with both numerically simulated examples and an experimental data set. Additionally, we provide an efficient
Nowadays, neural networks are widely used in many applications as artificial intelligence models for learning tasks. Since typically neural networks process a very large amount of data, it is convenient to formulate them within the mean-field and kinetic theory. In this work we focus on a particular class of neural networks, i.e. the residual neural networks, assuming that each layer is characterized by the same number of neurons
We consider the problem of static Bayesian inference for partially observed Lévy-process models. We develop a methodology which allows one to infer static parameters and some states of the process, without a bias from the time-discretization of the afore-mentioned Lévy process. The unbiased method is exceptionally amenable to parallel implementation and can be computationally efficient relative to competing approaches. We implement the method on S & P 500 log-return daily data and compare it to some Markov chain Monte Carlo (MCMC) algorithm.
Add your name and e-mail address to receive news of forthcoming issues of this journal:
[Back to Top]