Foundations of Data Science
December 2020 , Volume 2 , Issue 4
We introduce a new multilevel ensemble Kalman filter method (MLEnKF) which consists of a hierarchy of independent samples of ensemble Kalman filters (EnKF). This new MLEnKF method is fundamentally different from the preexisting method introduced by Hoel, Law and Tempone in 2016, and it is suitable for extensions towards multi-index Monte Carlo based filtering methods. Robust theoretical analysis and supporting numerical examples show that under appropriate regularity assumptions, the MLEnKF method has better complexity than plain vanilla EnKF in the large-ensemble and fine-resolution limits, for weak approximations of quantities of interest. The method is developed for discrete-time filtering problems with finite-dimensional state space and linear observations polluted by additive Gaussian noise.
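As background for the abstract above, the analysis step of a single standard (perturbed-observation) EnKF can be sketched as follows. This is a minimal illustration for a scalar state with a linear observation and additive Gaussian noise; it is not the multilevel method of the article, and the names `enkf_update`, `H`, and `gamma` are assumptions made for this example.

```python
import random

def enkf_update(forecast, y, H, gamma, rng):
    """One perturbed-observation EnKF analysis step for a scalar state.

    forecast: list of forecast ensemble members
    y:        observed value, modelled as y = H * x + noise
    H:        scalar observation operator (hypothetical name)
    gamma:    observation-noise variance (hypothetical name)
    """
    n = len(forecast)
    mean = sum(forecast) / n
    # Unbiased sample variance of the forecast ensemble.
    cov = sum((v - mean) ** 2 for v in forecast) / (n - 1)
    # Kalman gain built from the ensemble covariance.
    gain = cov * H / (H * cov * H + gamma)
    # Each member assimilates its own perturbed copy of the observation.
    return [
        v + gain * (y + rng.gauss(0.0, gamma ** 0.5) - H * v)
        for v in forecast
    ]

rng = random.Random(0)
prior = [rng.gauss(0.0, 2.0) for _ in range(2000)]  # prior variance 4
posterior = enkf_update(prior, y=2.0, H=1.0, gamma=1.0, rng=rng)
```

In this linear-Gaussian toy setting the exact Kalman posterior has mean 1.6 and variance 0.8, so the updated ensemble statistics should approach those values as the ensemble grows.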
In computational fluid dynamics, there is an inevitable trade-off between accuracy and computational cost. In this work, a novel multi-fidelity deep generative model is introduced for the surrogate modeling of high-fidelity turbulent flow fields given the solution of a computationally inexpensive but inaccurate low-fidelity solver. The resulting surrogate is able to generate physically accurate turbulent realizations at a computational cost orders of magnitude lower than that of a high-fidelity simulation. The deep generative model developed is a conditional invertible neural network, built with normalizing flows, with recurrent LSTM connections that allow for stable training of transient systems with high predictive accuracy. The model is trained with a variational loss that combines both data-driven and physics-constrained learning. This deep generative model is applied to non-trivial high Reynolds number flows governed by the Navier-Stokes equations, including turbulent flow over a backward-facing step at different Reynolds numbers and the turbulent wake behind an array of bluff bodies. For both of these examples, the model is able to generate unique yet physically accurate turbulent fluid flows conditioned on an inexpensive low-fidelity solution.
We study two methods for differentially private analysis of bounded data and extend these to nonnegative queries. We first recall that for the Laplace mechanism, boundary inflated truncation (BIT) applied to nonnegative queries and truncation both lead to strictly positive bias. We then consider a generalisation of BIT using translated ramp functions. We explicitly characterise the optimal function in this class for worst-case bias. We show that applying any square-integrable post-processing function to a Laplace mechanism leads to a strictly positive maximal absolute bias. A corresponding result is also shown for a generalisation of truncation, which we refer to as restriction. We also briefly consider an alternative approach based on multiplicative mechanisms for positive data and show that, without additional restrictions, these mechanisms can lead to infinite bias.
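The positive bias mentioned in the abstract above can be illustrated with a small sketch. It assumes that BIT means replacing any negative output of the Laplace mechanism with the boundary value 0; under that assumption, for a true answer q >= 0 and noise scale b, the bias E[max(0, q + X)] - q works out to (b/2) exp(-q/b), which is strictly positive. The function names below are hypothetical and chosen only for this example.

```python
import math
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def bit_release(q, scale, rng):
    # Laplace mechanism followed by clamping at the lower boundary 0
    # (our assumed reading of boundary inflated truncation).
    return max(0.0, q + laplace_noise(scale, rng))

def bit_bias_closed_form(q, scale):
    # E[max(0, q + X)] - q for X ~ Laplace(0, scale) and q >= 0.
    return 0.5 * scale * math.exp(-q / scale)

def bit_bias_monte_carlo(q, scale, n, rng):
    # Empirical bias of the clamped release over n independent draws.
    return sum(bit_release(q, scale, rng) for _ in range(n)) / n - q
```

In the usual Laplace mechanism the scale is the query sensitivity divided by the privacy parameter epsilon, so under this formula the bias grows as the privacy guarantee is strengthened and is largest for true answers near the boundary.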
We study the problem of estimating linear response statistics under external perturbations using time series of unperturbed dynamics. Based on the fluctuation-dissipation theory, this problem is reformulated as an unsupervised learning task of estimating a density function. We consider a nonparametric density estimator formulated by the kernel embedding of distributions with "Mercer-type" kernels, constructed based on the classical orthogonal polynomials defined on non-compact domains. While the resulting representation is analogous to Polynomial Chaos Expansion (PCE), the connection to the reproducing kernel Hilbert space (RKHS) theory allows one to establish the uniform convergence of the estimator and to systematically address the practical question of identifying the PCE basis for a consistent estimation. We also provide practical conditions for the well-posedness of not only the estimator but also the underlying response statistics. Finally, we provide a statistical error bound for the density estimation that accounts for the Monte Carlo averaging over non-i.i.d. time series and the biases due to a finite basis truncation. This error bound provides a means to understand the feasibility as well as the limitations of the kernel embedding with Mercer-type kernels. Numerically, we verify the effectiveness of the estimator on two stochastic dynamics with known yet non-trivial equilibrium densities.
Many techniques for data science and uncertainty quantification demand efficient tools to handle Gaussian random fields, which are defined in terms of their mean functions and covariance operators. Recently, parameterized Gaussian random fields have gained increased attention due to their higher degree of flexibility. However, especially if the random field is parameterized through its covariance operator, classical random field discretization techniques fail or become inefficient. In this work, we introduce and analyze a new and certified algorithm for the low-rank approximation of a parameterized family of covariance operators, which represents an extension of the adaptive cross approximation method for symmetric positive definite matrices. The algorithm relies on an affine linear expansion of the covariance operator with respect to the parameters, which needs to be computed in a preprocessing step using, e.g., the empirical interpolation method. We discuss and test our new approach for isotropic covariance kernels, such as Matérn kernels. The numerical results demonstrate the advantages of our approach in terms of computational time and confirm that the proposed algorithm provides the basis of a fast sampling procedure for parameter-dependent Gaussian random fields.
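For orientation, the classical building block that the abstract above extends can be sketched for a single, fixed covariance matrix. For symmetric positive definite matrices, adaptive cross approximation with diagonal pivoting coincides with a pivoted Cholesky factorization, and the trace of the residual serves as a computable error indicator. The sketch below is not the parameterized algorithm of the article; the function names, the Matérn-3/2 kernel choice, and the correlation length `ell` are assumptions made for this example.

```python
import math

def matern32(x, y, ell=0.5):
    # Matérn kernel with smoothness 3/2 and correlation length ell.
    r = math.sqrt(3.0) * abs(x - y) / ell
    return (1.0 + r) * math.exp(-r)

def aca_spd(kernel, points, tol=1e-6, max_rank=None):
    """Diagonally pivoted cross approximation of the kernel matrix.

    Returns (factors, residual_trace), where the matrix is approximated
    by the sum of outer products f f^T over the vectors f in `factors`.
    """
    n = len(points)
    max_rank = n if max_rank is None else max_rank
    diag = [kernel(p, p) for p in points]  # diagonal of the residual
    factors = []
    while len(factors) < max_rank and sum(diag) > tol:
        piv = max(range(n), key=lambda i: diag[i])
        if diag[piv] <= 0.0:
            break
        scale = math.sqrt(diag[piv])
        # Residual column at the pivot, scaled by the pivot's square root.
        vec = [
            (kernel(points[i], points[piv])
             - sum(f[i] * f[piv] for f in factors)) / scale
            for i in range(n)
        ]
        factors.append(vec)
        diag = [max(d - v * v, 0.0) for d, v in zip(diag, vec)]
    return factors, sum(diag)

points = [i / 49 for i in range(50)]
factors, residual_trace = aca_spd(matern32, points, tol=1e-6)
```

Only the diagonal and one column per step are evaluated, so the full matrix is never assembled. Given the factors, a zero-mean sample of the approximated field can be drawn as a linear combination of the factor vectors with independent standard normal coefficients.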