# American Institute of Mathematical Sciences

eISSN: 2639-8001

## Foundations of Data Science

September 2021, Volume 3, Issue 3

Special issue on Data Assimilation

Managing Guest Editor: Christopher Jones¹
Guest Editors: Marc Bocquet², Jana de Wiljes³, John Harlim⁴, Matthias Morzfeld⁵, Elaine Spiller⁶, Xin T. Tong⁷

¹ RENCI, University of North Carolina at Chapel Hill, USA
² CEREA, École des Ponts and EDF R&D, Île-de-France, France
³ Institute for Mathematics, University of Potsdam, Germany
⁴ Department of Mathematics, Department of Meteorology and Atmospheric Science, Institute for Computational and Data Sciences, The Pennsylvania State University, USA
⁵ Cecil H. and Ida M. Green Institute of Geophysics and Planetary Physics, Scripps Institution of Oceanography, University of California, San Diego, USA
⁶ Mathematical and Statistical Sciences, Marquette University, USA
⁷ National University of Singapore, Singapore

2021, 3(3): 305-330 doi: 10.3934/fods.2020015
Abstract:

The reconstruction of the dynamics of an observed physical system as a surrogate model has been brought to the fore by recent advances in machine learning. To deal with partial and noisy observations in that endeavor, machine learning representations of the surrogate model can be used within a Bayesian data assimilation framework. However, these approaches require long time series of observational data that are meant to be assimilated all together. This paper investigates the possibility of learning both the dynamics and the state online, i.e. of updating their estimates at any time, in particular when new observations are acquired. The estimation is based on the ensemble Kalman filter (EnKF) family of algorithms using a rather simple representation for the surrogate model and state augmentation. We consider the implication of learning dynamics online through (ⅰ) a global EnKF, (ⅱ) a local EnKF and (ⅲ) an iterative EnKF, and we discuss in each case issues and algorithmic solutions. We then demonstrate numerically the efficiency and assess the accuracy of these methods using one-dimensional, one-scale and two-scale chaotic Lorenz models.
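As a reader's illustration of state augmentation (not the authors' implementation), the following minimal sketch runs a stochastic EnKF on augmented members `(x, a)`. The toy model `x_next = a * x + 1`, the function name `enkf_augmented_step`, and all numerical values are assumptions chosen only to keep the example small; the paper itself works with Lorenz models and a richer surrogate.

```python
import random


def enkf_augmented_step(ensemble, y_obs, obs_var, rng):
    """One forecast plus one stochastic EnKF analysis on augmented members (x, a).

    Hypothetical toy model: x_next = a * x + 1, with only x observed.
    """
    # Forecast: each member advances its state with its own parameter value.
    forecast = [(a * x + 1.0, a) for x, a in ensemble]
    n = len(forecast)
    mean_x = sum(x for x, _ in forecast) / n
    mean_a = sum(a for _, a in forecast) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in forecast) / (n - 1)
    cov_ax = sum((a - mean_a) * (x - mean_x) for x, a in forecast) / (n - 1)
    # Kalman gains: the parameter is updated only through its sample
    # covariance with the observed state component.
    gain_x = var_x / (var_x + obs_var)
    gain_a = cov_ax / (var_x + obs_var)
    analysis = []
    for x, a in forecast:
        # Perturbed observation, as in the stochastic EnKF.
        innovation = y_obs + rng.gauss(0.0, obs_var ** 0.5) - x
        analysis.append((x + gain_x * innovation, a + gain_a * innovation))
    return analysis


if __name__ == "__main__":
    rng = random.Random(0)
    a_true, x_true, obs_var = 0.8, 0.0, 0.01
    members = [(rng.gauss(0.0, 1.0), rng.gauss(0.5, 0.2)) for _ in range(100)]
    for _ in range(50):
        x_true = a_true * x_true + 1.0
        y = x_true + rng.gauss(0.0, obs_var ** 0.5)
        members = enkf_augmented_step(members, y, obs_var, rng)
    print("ensemble mean of a:", sum(a for _, a in members) / len(members))
```

Each call assimilates a single new observation, which is the online setting the abstract describes. Localization and iteration, which the paper also discusses, are not shown.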

2021, 3(3): 331-369 doi: 10.3934/fods.2021011
Abstract:

This paper provides a unified perspective of iterative ensemble Kalman methods, a family of derivative-free algorithms for parameter reconstruction and other related tasks. We identify, compare and develop three subfamilies of ensemble methods that differ in the objective they seek to minimize and the derivative-based optimization scheme they approximate through the ensemble. Our work emphasizes two principles for the derivation and analysis of iterative ensemble Kalman methods: statistical linearization and continuum limits. Following these guiding principles, we introduce new iterative ensemble Kalman methods that show promising numerical performance in Bayesian inverse problems, data assimilation and machine learning tasks.

2021, 3(3): 371-411 doi: 10.3934/fods.2020018
Abstract:

Ensemble Kalman Inversion (EnKI) [23] and Ensemble Square Root Filter (EnSRF) [36] are popular sampling methods for obtaining a target posterior distribution. They can be seen as one step (the analysis step) in the data assimilation method Ensemble Kalman Filter [17,3]. Despite their popularity, they are, however, not unbiased when the forward map is nonlinear [12,16,25]. Importance Sampling (IS), on the other hand, yields unbiased samples at the expense of a large variance of the weights, leading to slow convergence of high moments.

We propose WEnKI and WEnSRF, the weighted versions of EnKI and EnSRF, in this paper. They follow the same gradient flow as EnKI/EnSRF, with weight corrections. Compared to the classical methods, the new methods are unbiased, and compared with IS, they have bounded weight variance. Both properties are proved rigorously in this paper. We further discuss the stability of the underlying Fokker-Planck equation. This partially explains why EnKI, despite being inconsistent, performs well occasionally in nonlinear settings. Numerical evidence is presented at the end.
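The weight-variance issue of plain importance sampling mentioned above can be seen in a small sketch. It does not implement WEnKI or WEnSRF; it only computes self-normalized importance weights and the effective sample size for a Gaussian proposal and a shifted Gaussian target. The helper names and the choice of densities are assumptions made for illustration.

```python
import math
import random


def normalized_importance_weights(samples, log_target, log_proposal):
    """Self-normalized importance weights, computed stably in log space."""
    log_w = [log_target(x) - log_proposal(x) for x in samples]
    shift = max(log_w)  # subtract the maximum to avoid overflow in exp
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2); equals the sample count only for uniform weights."""
    return 1.0 / sum(w * w for w in weights)


def log_normal_density(mean):
    """Unnormalized log density of a unit-variance Gaussian (illustrative)."""
    return lambda x: -0.5 * (x - mean) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    for target_mean in (0.0, 1.0, 3.0):
        w = normalized_importance_weights(
            samples, log_normal_density(target_mean), log_normal_density(0.0)
        )
        print(target_mean, effective_sample_size(w))
```

As the target moves away from the proposal, a few samples carry most of the weight and the effective sample size drops, which is the behavior that motivates bounding the weight variance.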

2021, 3(3): 413-477 doi: 10.3934/fods.2021001
Abstract:

This work demonstrates the efficiency of using iterative ensemble smoothers to estimate the parameters of an SEIR model. We have extended a standard SEIR model with age-classes and compartments of sick, hospitalized, and dead. The data conditioned on are the daily numbers of accumulated deaths and the number of hospitalized. Also, it is possible to condition the model on the number of cases obtained from testing. We start from a wide prior distribution for the model parameters; then, the ensemble conditioning leads to a posterior ensemble of estimated parameters yielding model predictions in close agreement with the observations. The updated ensemble of model simulations has predictive capabilities and includes uncertainty estimates. In particular, we estimate the effective reproductive number as a function of time, and we can assess the impact of different intervention measures. By starting from the updated set of model parameters, we can make accurate short-term predictions of the epidemic development assuming knowledge of the future effective reproductive number. Also, the model system allows for the computation of long-term scenarios of the epidemic under different assumptions. We have applied the model system to data sets from several countries and regions, namely the four European countries Norway, England, The Netherlands, and France; the province of Quebec in Canada; the South American countries Argentina and Brazil; and the four US states Alabama, North Carolina, California, and New York. These countries and states all have vastly different developments of the epidemic, and we could accurately model the SARS-CoV-2 outbreak in all of them. We realize that more complex models, e.g., with regional compartments, may be desirable, and we suggest that the approach used here should be applicable also for these models.
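For readers unfamiliar with the underlying model class, here is a minimal sketch of a standard SEIR model integrated with forward Euler. It omits the age-classes and the sick, hospitalized, and dead compartments that the paper adds. The function names, the parameter values, and the relation "transmission rate = reproductive number / infectious time" are assumptions of this simplified sketch.

```python
def seir_step(state, r_number, t_inc, t_inf, dt):
    """Advance (S, E, I, R) by one forward-Euler step of length dt (in days)."""
    s, e, i, r = state
    n = s + e + i + r
    # Assumed simple SEIR relation: transmission rate = r_number / t_inf.
    new_exposed = r_number / t_inf * s * i / n * dt
    new_infectious = e / t_inc * dt
    new_recovered = i / t_inf * dt
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


def simulate_seir(initial_state, r_number, t_inc, t_inf, dt, n_steps):
    """Return the trajectory as a list of states, including the initial one."""
    trajectory = [initial_state]
    for _ in range(n_steps):
        trajectory.append(seir_step(trajectory[-1], r_number, t_inc, t_inf, dt))
    return trajectory


if __name__ == "__main__":
    # Illustrative values only: 10 initial infectious in a population of 1e6.
    traj = simulate_seir((999990.0, 0.0, 10.0, 0.0), r_number=2.5,
                         t_inc=5.0, t_inf=7.0, dt=0.1, n_steps=1000)
    print("final state:", traj[-1])
```

In an ensemble setting, each ensemble member would run such a simulation with its own sampled parameters (for example its own `r_number`), and the conditioning step would adjust those parameters to match the observed data.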

2021, 3(3): 479-541 doi: 10.3934/fods.2021022
Abstract:

The disparity in the impact of COVID-19 on minority populations in the United States has been well established in the available data on deaths, case counts, and adverse outcomes. However, critical metrics used by public health officials and epidemiologists, such as a time dependent viral reproductive number ($R_t$), can be hard to calculate from these data, especially for individual populations. Furthermore, disparities in the availability of testing, record keeping infrastructure, or government funding in disadvantaged populations can produce incomplete data sets. In this work, we apply ensemble data assimilation techniques which optimally combine model and data to produce a more complete data set providing better estimates of the critical metrics used by public health officials and epidemiologists. We employ a multi-population SEIR (Susceptible, Exposed, Infected and Recovered) model with a time dependent reproductive number and age stratified contact rate matrix for each population. We assimilate the daily death data for populations separated by ethnic/racial groupings using a technique called Ensemble Smoothing with Multiple Data Assimilation (ESMDA) to estimate model parameters and produce an $R_t(n)$ for the $n^{th}$ population. We do this with three distinct approaches: (1) using the same contact matrices and prior $R_t(n)$ for each population, (2) assigning contact matrices with increased contact rates for working age and older adults to populations experiencing disparity, and (3) as in (2) but with a time-continuous update to $R_t(n)$. We make a study of 9 U.S. states and the District of Columbia, providing a complete time series of the pandemic in each and, in some cases, identifying disparities not otherwise evident in the aggregate statistics.
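The ESMDA idea named in this abstract can be sketched for a single scalar parameter. This is a toy illustration under stated assumptions, not the study's multi-population setup: the function `esmda`, the linear forward map, and all numbers are hypothetical. The sketch uses a constant inflation coefficient equal to the number of assimilation steps, so that the reciprocals of the coefficients sum to one.

```python
import random


def esmda(prior_ensemble, forward, y_obs, obs_var, n_steps, rng):
    """Scalar ESMDA sketch: assimilate the same datum n_steps times.

    Each step uses the observation-error variance inflated by alpha = n_steps,
    so the sum over steps of 1 / alpha equals 1.
    """
    ensemble = list(prior_ensemble)
    alpha = float(n_steps)
    n = len(ensemble)
    for _ in range(n_steps):
        predicted = [forward(m) for m in ensemble]
        mean_m = sum(ensemble) / n
        mean_d = sum(predicted) / n
        cov_md = sum((m - mean_m) * (d - mean_d)
                     for m, d in zip(ensemble, predicted)) / (n - 1)
        var_d = sum((d - mean_d) ** 2 for d in predicted) / (n - 1)
        gain = cov_md / (var_d + alpha * obs_var)
        noise_std = (alpha * obs_var) ** 0.5
        # Each member sees the datum perturbed with the inflated error variance.
        ensemble = [m + gain * (y_obs + rng.gauss(0.0, noise_std) - d)
                    for m, d in zip(ensemble, predicted)]
    return ensemble


if __name__ == "__main__":
    rng = random.Random(0)
    prior = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    posterior = esmda(prior, lambda m: 2.0 * m, 2.0, 1.0, 4, rng)
    print("posterior mean estimate:", sum(posterior) / len(posterior))
```

For this linear-Gaussian toy problem the exact posterior mean is 0.8 and the variance is 0.2, so the ensemble statistics should land near those values up to sampling error. With a nonlinear forward map such as an SEIR simulation, the repeated smaller updates are what distinguish ESMDA from a single ensemble smoother update.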

2021, 3(3): 543-561 doi: 10.3934/fods.2021018
Abstract:

The purpose of this paper is to describe the feedback particle filter algorithm for problems where there are a large number ($M$) of non-interacting agents (targets) with a large number ($M$) of non-agent specific observations (measurements) that originate from these agents. In its basic form, the problem is characterized by data association uncertainty whereby the association between the observations and agents must be deduced in addition to the agent state. In this paper, the large-$M$ limit is interpreted as a problem of collective inference. This viewpoint is used to derive the equation for the empirical distribution of the hidden agent states. A feedback particle filter (FPF) algorithm for this problem is presented and illustrated via numerical simulations. Results are presented for the Euclidean and the finite state-space cases, both in continuous-time settings. The classical FPF algorithm is shown to be the special case (with $M = 1$) of these more general results. The simulations help show that the algorithm well approximates the empirical distribution of the hidden states for large $M$.

2021, 3(3): 563-588 doi: 10.3934/fods.2021003
Abstract:

Consider the class of Ensemble Square Root filtering algorithms for the numerical approximation of the posterior distribution of nonlinear Markovian signals, partially observed with linear observations corrupted with independent measurement noise. We analyze the asymptotic behavior of these algorithms in the large ensemble limit both in discrete and continuous time. We identify limiting mean-field processes on the level of the ensemble members, prove corresponding propagation of chaos results and derive associated convergence rates in terms of the ensemble size. In continuous time we also identify the stochastic partial differential equation driving the distribution of the mean-field process and perform a comparison with the Kushner-Stratonovich equation.

2021, 3(3): 589-614 doi: 10.3934/fods.2021019
Abstract:

Many recent advances in sequential assimilation of data into nonlinear high-dimensional models are modifications to particle filters which employ efficient searches of a high-dimensional state space. In this work, we present a complementary strategy that combines statistical emulators and particle filters. The emulators are used to learn and offer a computationally cheap approximation to the forward dynamic mapping. This emulator-particle filter (Emu-PF) approach requires a modest number of forward-model runs, but yields well-resolved posterior distributions even in non-Gaussian cases. We explore several modifications to the Emu-PF that utilize mechanisms for dimension reduction to efficiently fit the statistical emulator, and present a series of simulation experiments on an atypical Lorenz-96 system to demonstrate their performance. We conclude with a discussion on how the Emu-PF can be paired with modern particle filtering algorithms.

2021, 3(3): 615-645 doi: 10.3934/fods.2021023
Abstract:

Control-type particle filters have been receiving increasing attention over the last decade as a means of obtaining sample based approximations to the sequential Bayesian filtering problem in the nonlinear setting. Here we analyse one such type, namely the feedback particle filter and a recently proposed approximation of the associated gain function based on diffusion maps. The key purpose is to provide analytic insights on the form of the approximate gain, which are of interest in their own right. These are then used to establish a roadmap to obtaining well-posedness and convergence of the finite $N$ system to its mean field limit. A number of possible future research directions are also discussed.

2021, 3(3): 647-675 doi: 10.3934/fods.2021025
Abstract:

This paper shows that the nonlinear filter in the case of deterministic dynamics is stable with respect to the initial conditions, provided that the observations are sufficiently rich, in the context of both continuous-time and discrete-time filters. Earlier works on the stability of nonlinear filters are in the context of stochastic dynamics and assume conditions such as a compact state space or a time-independent observation model, whereas we prove filter stability for deterministic dynamics under more general assumptions on the state space and observation process. We give several examples of systems that satisfy these assumptions. We also show that the asymptotic structure of the filtering distribution is related to the dynamical properties of the signal.