# American Institute of Mathematical Sciences

March  2020, 2(1): 55-80. doi: 10.3934/fods.2020004

## Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization

1. CEREA, joint laboratory École des Ponts ParisTech and EDF R&D, Université Paris-Est, Champs-sur-Marne, France
2. Nansen Environmental and Remote Sensing Center, Bergen, Norway, and Sorbonne University, CNRS-IRD-MNHN, LOCEAN, Paris, France
3. Department of Meteorology, University of Reading and NCEO, United Kingdom, and Mathematical Institute, Utrecht University, The Netherlands
4. Nansen Environmental and Remote Sensing Center, Bergen, Norway

* Corresponding author: Marc Bocquet

Published  March 2020

The reconstruction from observations of high-dimensional chaotic dynamics such as geophysical flows is hampered by (i) the partial and noisy observations that can realistically be obtained, (ii) the need to learn from long time series of data, and (iii) the unstable nature of the dynamics. To achieve such inference from observations over long time series, it has been suggested to combine data assimilation and machine learning in several ways. We show how to unify these approaches from a Bayesian perspective using expectation-maximization and coordinate descent. In doing so, the model, the state trajectory, and the model error statistics are all estimated together. Implementations and approximations of these methods are discussed. Finally, we successfully test the approach numerically on two relevant low-order chaotic models with distinct identifiability.
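The alternation described in the abstract — estimate the state trajectory, then the model, then the model error statistics — can be illustrated on a deliberately simple stand-in. The sketch below runs expectation-maximization on a scalar linear-Gaussian toy (an AR(1) process observed with noise); the toy model, the parameter values, and the variable names are illustrative assumptions, not the paper's setup, but the E-step/M-step structure mirrors the coordinate-descent scheme the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a scalar AR(1) process observed with additive noise, standing
# in for the chaotic model and observation operator (illustrative values).
K, a_true, q_true, r = 300, 0.9, 0.16, 0.25
x_true = np.zeros(K)
for k in range(1, K):
    x_true[k] = a_true * x_true[k - 1] + rng.normal(0.0, np.sqrt(q_true))
y = x_true + rng.normal(0.0, np.sqrt(r), K)

a, q = 0.5, 1.0  # initial guesses: dynamics parameter, model-error variance
idx = np.arange(K - 1)
for _ in range(50):
    # E-step: Gaussian posterior of the whole trajectory given (a, q).
    # The precision H combines the observation and model-error terms.
    D = np.zeros((K - 1, K))
    D[idx, idx] = -a
    D[idx, idx + 1] = 1.0
    H = np.eye(K) / r + D.T @ D / q
    P = np.linalg.inv(H)  # posterior covariance
    x = P @ (y / r)       # posterior mean
    # M-step: expected sufficient statistics update the dynamics
    # parameter first, then the model-error variance (coordinate order).
    a = (x[1:] @ x[:-1] + np.trace(P[1:, :-1])) / \
        (x[:-1] @ x[:-1] + np.trace(P[:-1, :-1]))
    res = x[1:] - a * x[:-1]
    q = np.mean(res ** 2 + P[idx + 1, idx + 1]
                - 2.0 * a * P[idx + 1, idx] + a * a * P[idx, idx])
```

In the linear-Gaussian toy the E-step posterior is exact; for the chaotic, neural-network surrogate case treated in the paper it must be approximated, e.g. with an ensemble smoother.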

Citation: Marc Bocquet, Julien Brajard, Alberto Carrassi, Laurent Bertino. Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization. Foundations of Data Science, 2020, 2 (1) : 55-80. doi: 10.3934/fods.2020004
Figure 1.  From top to bottom: representation of the flow rate $\boldsymbol{\phi}_\mathbf{A}$ with a NN, integration of the flow rate into $\mathbf{f}_\mathbf{A}$ using an explicit integration scheme (here a second-order Runge-Kutta scheme), and ${N_\mathrm{c}}$-fold composition up to the full resolvent $\mathbf{F}_\mathbf{A}$. $\delta t$ is the integration time step corresponding to the resolvent $\mathbf{f}_\mathbf{A}$.
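The construction in this caption — a flow rate, one explicit second-order Runge-Kutta step giving the one-step resolvent, and its $N_\mathrm{c}$-fold composition — can be sketched as follows. A simple linear tendency stands in for the trained neural network $\boldsymbol{\phi}_\mathbf{A}$; the names `phi`, `f_step`, and `resolvent` are hypothetical, chosen to echo the caption's notation.

```python
import numpy as np

def phi(x, A):
    # Stand-in for the neural-network flow rate phi_A: here a simple
    # linear tendency A @ x (a rotation), purely for illustration.
    return A @ x

def f_step(x, A, dt):
    # One second-order Runge-Kutta (midpoint) step: the resolvent f_A
    # advancing the state over one integration time step dt.
    k1 = phi(x, A)
    k2 = phi(x + 0.5 * dt * k1, A)
    return x + dt * k2

def resolvent(x, A, dt, n_c):
    # N_c-fold composition of f_A, yielding the full resolvent F_A that
    # maps the state between consecutive observation times.
    for _ in range(n_c):
        x = f_step(x, A, dt)
    return x

x0 = np.array([0.5, -0.2])
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
x1 = resolvent(x0, A, dt=0.05, n_c=4)
```

Because the composition is built from differentiable steps, gradients with respect to the parameters $\mathbf{A}$ can be propagated through the whole resolvent, which is what makes the architecture trainable.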
Figure 2.  On the left-hand side: properties of the surrogate model obtained from full but noisy observation of the L96 model in the nominal configuration ($L = 4$, $K = 5000$, $\sigma_y = 1$, ${N_\mathrm{y}} = {N_\mathrm{x}} = 40$). On the right-hand side: the same for the L05Ⅲ model in its nominal configuration ($L = 4$, $K = 5000$, $\sigma_y = 1$, ${N_\mathrm{y}} = {N_\mathrm{x}} = 36$). From top to bottom are plotted the FS (NRMSE as a function of lead time in Lyapunov time), the LS (all exponents), and the PSD (in log-log scale). A total of $10$ experiments have been performed for both configurations. The curves corresponding to each member are drawn with thin blue lines, while the mean of each indicator over the ensemble is drawn as a thick dashed orange line.
Figure 3.  Same as Figure 2 but for several values of the training window length $K$. Each curve is the mean over $10$ experiments with different sets of observations. The LS and PSD of the reference models are also plotted for comparison.
Figure 4.  On the left-hand side: properties of the surrogate model obtained from full but noisy observation of the L96 model in the nominal configuration ($L = 4$, $K = 5000$, ${N_\mathrm{y}} = {N_\mathrm{x}} = 40$) for several values of $\sigma_y$. On the right-hand side: the same for the L05Ⅲ model in its nominal configuration ($L = 4$, $K = 5000$, ${N_\mathrm{y}} = {N_\mathrm{x}} = 36$) for several values of $\sigma_y$. From top to bottom are plotted the FS (NRMSE as a function of lead time in Lyapunov time) and the PSD (in log-log scale), averaged over an ensemble of $10$ samples.
Figure 5.  On the left-hand side: properties of the surrogate model obtained from partial and noisy observation of the L96 model in the nominal configuration ($L = 4$, $K = 5000$, $\sigma_y = 1$, ${N_\mathrm{x}} = 40$) where ${N_\mathrm{y}}$ is varied. On the right-hand side: the same for the L05Ⅲ model in its nominal configuration ($L = 4$, $K = 5000$, $\sigma_y = 1$, ${N_\mathrm{x}} = 36$) where ${N_\mathrm{y}}$ is varied. From top to bottom are plotted the mean FS (NRMSE as a function of lead time in Lyapunov time), the mean LS (all exponents), and the mean PSD (in log-log scale). A total of $10$ experiments have been performed for both configurations.
Table 1.  Scalar indicators for the nominal experiments based on L96 and L05Ⅲ. Key hyperparameters are recalled. The statistics of the indicators are obtained over $10$ samples.
| Model | ${N_\mathrm{y}}$ | $\sigma_y$ | $K$ | $L$ | $\pi_{1/2}$ | $\sigma_q$ | $\lambda_1$ |
|-------|------------------|------------|-----|-----|-------------|------------|-------------|
| L96   | $40$ | $1$ | $5000$ | $4$ | $4.56 \pm 0.06$ | $0.08790 \pm 2\times 10^{-5}$ | $1.66 \pm 0.02$ |
| L05Ⅲ  | $36$ | $1$ | $5000$ | $4$ | $4.06 \pm 0.21$ | $0.07720 \pm 2\times 10^{-5}$ | $1.03 \pm 0.05$ |
Table 2.  Scalar indicators for L96 and L05Ⅲ in their nominal configurations, using either the full or the approximate scheme. The statistics of the indicators are obtained over $10$ samples.
| Model | Scheme | $\pi_{1/2}$ | $\sigma_q$ | $\lambda_1$ |
|-------|--------|-------------|------------|-------------|
| L96   | Approximate | $4.56 \pm 0.06$ | $0.08790 \pm 2\times 10^{-5}$ | $1.66 \pm 0.02$ |
| L96   | Full        | $4.24 \pm 0.07$ | $0.09152$                     | $1.66 \pm 0.02$ |
| L05Ⅲ  | Approximate | $4.06 \pm 0.21$ | $0.07720 \pm 2\times 10^{-5}$ | $1.03 \pm 0.05$ |
| L05Ⅲ  | Full        | $3.97 \pm 0.17$ | $0.08024$                     | $1.03 \pm 0.04$ |
