Multilevel Ensemble Kalman Filtering based on a sample average of independent EnKF estimators

We introduce a new multilevel ensemble Kalman filter method (MLEnKF) which consists of a hierarchy of independent samples of ensemble Kalman filters (EnKF). This new MLEnKF method is fundamentally different from the preexisting method introduced by Hoel, Law and Tempone in 2016, and it is suitable for extensions towards multi-index Monte Carlo based filtering methods. Robust theoretical analysis and supporting numerical examples show that under appropriate regularity assumptions, the MLEnKF method has better complexity than plain vanilla EnKF in the large-ensemble and fine-resolution limits, for weak approximations of quantities of interest. The method is developed for discrete-time filtering problems with finite-dimensional state space and linear observations polluted by additive Gaussian noise.

1. Introduction. We develop a new multilevel ensemble Kalman filter method (MLEnKF) for the setting of finite-dimensional state space and discrete-time partial observations polluted by additive Gaussian noise. Our method makes use of recent hierarchical variance-reduction techniques [20,14,18] to improve the asymptotic efficiency of weak approximations of filtering distributions compared to standard ensemble Kalman filtering (EnKF). We consider settings where solutions of nonlinear dynamics models must be approximated by numerical methods.
The MLEnKF method introduced herein consists of a hierarchy of independent samples of pairwise-coupled EnKF estimators; in particular, the Kalman gain of each EnKF sample is therefore also independent of the others. Our method is fundamentally different from the original MLEnKF [22], which consists of a hierarchy of coupled ensembles on different resolution levels, all sharing one global "multilevel" Kalman gain.
The motivations for developing the new MLEnKF method are threefold. First, the method is closer to classic EnKF, and we therefore believe it will be easier to implement for practitioners. Second, imposing slightly stricter regularity assumptions, we can prove better asymptotic efficiency results for this method than for the original one. And third, creating a rigorous convergence theory for the new MLEnKF method is a stepping stone towards a multi-index ensemble Kalman filtering (MIEnKF) method; see [19] for highly efficient approximations of McKean-Vlasov dynamics by the multi-index Monte Carlo method, and Appendix D for a sketch of this extension to MIEnKF.
The main theoretical contributions of this work are Theorems 1 and 2, which derive L^p-convergence rates for weak approximations in the large-ensemble and fine-numerical-resolution limits for EnKF and MLEnKF, respectively. Theorem 2 is novel, and while Theorem 1 is similar to [22, Theorem 3.11], it is, to the best of our knowledge, the first fully proved convergence result for EnKF in the said limits (i.e., in both limits simultaneously). Estimates of the asymptotic computational cost versus accuracy of EnKF and MLEnKF are provided in Corollaries 1 and 2, respectively. From these estimates we conclude that MLEnKF asymptotically outperforms EnKF whenever Assumptions 1 and 2 hold.
1.1. Literature review. The EnKF method was first introduced in the seminal work [12], and due to its ease of use and impressive performance in high dimensions, it quickly became a popular method for weather prediction, ocean-atmosphere science, and oil reservoir simulations [29,26,1]. The standard version of EnKF with perturbed observations, which is the method we will study and extend to the multilevel setting in this work, first appeared in [25], and the ensuing analysis [6] showed that adding artificial noise to the observations may be viewed as a consistency step for avoiding covariance deflation of the empirical filtering distribution. L^p-convergence of the first and second sample moments in the large-ensemble limit for EnKF was first treated by a short argument in [37] for the linear model setting, and the result was subsequently extended to a set of nonlinear filtering problems by a more technical argument in [36] (in the sense of deriving L^p-convergence rates for weak approximations of sufficiently smooth quantities of interest).
The multilevel Monte Carlo (MLMC) method was introduced for efficient weak approximations of random fields in [20] and for stochastic differential equations in [14]. The MLEnKF method was developed for finite-dimensional settings in [22] and countable, infinite-dimensional settings in [7]. In [13] a multilevel hybrid EnKF method was developed for solving reservoir history matching problems. Multilevel particle filters using a multilevel-coupled resampling algorithm were introduced in [27]. The multilevel transform particle filter, applying optimal transportation mapping in the analysis step, was treated in [17,16]. In the context of Bayesian inverse problems, multilevel sequential Monte Carlo methods have been studied in [3,2,38,34], and a multilevel Markov chain Monte Carlo method was developed in [10]. The multi-fidelity Monte Carlo method [39] is a recent and close kin of MLMC that differs from MLMC by having fixed its estimator's finest resolution level and by coupling samples more extensively than only pairwise. This often reduces the estimator's statistical error even more effectively than MLMC. A multi-fidelity Monte Carlo EnKF method was developed in [40].
Under sufficient regularity, EnKF converges to the so-called mean-field Kalman filter in the large-ensemble limit. In nonlinear-dynamics or observation settings, however, the mean-field Kalman filter is not equal to the Bayes filter [35,36,11,41]. Due to this discrepancy between EnKF and the Bayes filter even in the large-ensemble limit, due to the large uncertainty in the estimation of the model error, and due to the constraints imposed by challenging high-dimensional problems and limited computational budgets, a considerable number of works have, instead of studying the large-ensemble limit, focused on the large-time and/or continuous-time limit of fixed-ensemble-size EnKF, cf. [30,46,42,43,5,9,33]. However, a recurring problem for fixed-ensemble-size EnKF is to determine how large the ensemble ought to be to equilibrate the model error with the other error contributions (statistical error and bias). The large-ensemble-limit convergence rates for EnKF and MLEnKF presented in this work may be helpful for determining the ensemble size of a finite-ensemble EnKF method.

1.2. Organization of this work. Section 2 describes the problem, the MLEnKF method, and L^p-convergence results for EnKF and MLEnKF towards the mean-field EnKF. Section 3 presents numerical performance studies of EnKF and MLEnKF for two different problems. Appendix A presents a brief overview of the mean-field EnKF method. Appendix B contains proofs of the main theoretical results, Appendix C describes an algorithm for obtaining pseudo-reference solutions for nonlinear filtering problems, and Appendix D sketches an extension from MLEnKF to multi-index EnKF.
2. Problem setting and main results.
2.1. Problem setting. Let (Ω, (F_t), F = F_∞, P) denote a complete filtered probability space, and for any k ∈ N and p ≥ 1, let L^p_t(Ω, R^k) denote the space of F_t\B^k-measurable functions u : Ω → R^k such that E[|u|^p] < ∞. Here, B^k represents the Borel σ-algebra on R^k, and u is said to be F_t\B^k-measurable if and only if u^{-1}(B) ∈ F_t for all B ∈ B^k. For a given state-space dimension d ∈ N and initial data u_0 ∈ ∩_{p≥2} L^p_0(Ω, R^d), we consider the discrete stochastic dynamics

u_{n+1}(ω) = Ψ_n(u_n, ω), ω ∈ Ω, n = 0, 1, . . . , (1)

for a sequence of mappings Ψ_n : R^d × Ω → R^d. The dynamics is associated with the stochastic differential equation (SDE)

du = a(u) dt + b(u) dW, (2)

with coefficients a : R^d → R^d and b : R^d → R^{d×m}, and with W denoting an m-dimensional Wiener process. We further assume the coefficients a and b are sufficiently smooth so that Ψ_n(u_n, ·) ∈ ∩_{p≥2} L^p_{n+1}(Ω, R^d), and we note, from now on suppressing the dependence on ω whenever confusion is not possible, that u_{n+1} may be expressed as the n-fold composition of Ψ with itself evaluated at u_0: u_{n+1} = Ψ_n ∘ Ψ_{n−1} ∘ · · · ∘ Ψ_0(u_0). Associated with the dynamics (1), there exists a series of noisy observations

y_n = Hu_n + η_n, n = 1, 2, . . . , (3)

where H ∈ R^{d_O×d} and η_1, η_2, . . . is a sequence of independent and identically distributed (iid) random variables satisfying η_1 ∼ N(0, Γ) with positive definite covariance matrix Γ ∈ R^{d_O×d_O}. The filtration is defined by F_n := σ̄(σ(u_0, W, η_1, . . . , η_n)), with σ̄ denoting the completion of σ.
The series of observations up to time k is denoted by Y_k := (y_1, y_2, . . . , y_k). For notational convenience, and since no observations have been made at time 0, we define Y_0 = ∅. In consistency with the property that Y_0 holds no information, we write P_{u_0|Y_0} := P_{u_0}. The Bayes filter is a sequential procedure for determining the (conditional) prediction P_{u_k|Y_{k−1}} and update P_{u_k|Y_k}. Assuming that the densities of said distributions exist, the stochastic dynamics (1) and the conditional independence (u_k|u_{k−1}) ⊥ Y_{k−1} yield the proportionality

π_{u_k|Y_{k−1}}(u) ∝ ∫_{R^d} π_{u_k|u_{k−1}}(u|v) π_{u_{k−1}|Y_{k−1}}(v) dv,

and Bayesian inference implies that

π_{u_k|Y_k}(u) ∝ L(y_k|u) π_{u_k|Y_{k−1}}(u),

where the likelihood function is given by

L(y_k|u) ∝ exp(−|Γ^{−1/2}(y_k − Hu)|²/2).

In other words, if the updated distribution P_{u_{k−1}|Y_{k−1}} is known and suitable regularity assumptions hold, then the prediction and updated distribution at the next time can be computed up to a proportionality constant (although this step may be computationally intractable). In settings where u_0 is a Gaussian random variable and the dynamics Ψ is linear with additive Gaussian noise, the Kalman filter [28] solves the above filtering problem exactly. When Ψ is nonlinear, however, it is often not possible to solve the filtering problem exactly, and one must resort to approximation methods. EnKF is a nonlinear, ensemble-based filtering method that tends to be particularly efficient when the "true" dimension of a filtering problem is far smaller than the state-space dimension.
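For the linear-Gaussian setting mentioned above, one predict-update cycle of the Kalman filter can be sketched as follows. This is a minimal illustration in Python, not the implementation used in this paper; the matrices A and Q, standing in for the linear dynamics and its model-noise covariance, are generic placeholders rather than notation from the text.

```python
import numpy as np

def kalman_step(m, C, y, A, Q, H, Gamma):
    """One predict-update cycle of the Kalman filter for linear dynamics
    u_{n+1} = A u_n + xi_n, xi_n ~ N(0, Q), with observation y = H u + eta,
    eta ~ N(0, Gamma). Returns the updated mean and covariance."""
    # Prediction: push the mean and covariance through the linear dynamics.
    m_pred = A @ m
    C_pred = A @ C @ A.T + Q
    # Update: Kalman gain K = C H^T (H C H^T + Gamma)^{-1}.
    S = H @ C_pred @ H.T + Gamma
    K = C_pred @ H.T @ np.linalg.inv(S)
    m_upd = m_pred + K @ (y - H @ m_pred)
    C_upd = (np.eye(len(m)) - K @ H) @ C_pred
    return m_upd, C_upd
```

For nonlinear Ψ, no such closed-form recursion for the conditional mean and covariance is available, which is what motivates the ensemble-based approximation discussed next.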
For linear-Gaussian filtering problems, the empirical measure of EnKF converges towards the exact (Bayes filter) density P_{u_k|Y_k} in the large-ensemble limit [37]. More generally, for instance when Ψ is nonlinear, EnKF converges to the mean-field EnKF in the large-ensemble limit, cf. Section A. As a consequence of EnKF employing a Gaussian-like update of its particles, the mean-field EnKF will in many cases not be equal to the Bayes filter [35,11]. But, to the best of our knowledge, there does not exist any thorough scientific comparison of the mean-field EnKF and the Bayes filter, and Figure 1 shows that for the nonlinear dynamics Ψ defined by the SDE

du = −(u + π cos(πu/5)/5) dt + σ dW (5)

and (1), the dissipative/contractive properties of the associated Fokker-Planck equation can produce prediction densities for the respective filtering methods that are indistinguishable to the naked eye.

Objective. For a given quantity of interest (QoI) ϕ : R^d → R, our objective is to construct an efficient filtering method for computing

μ̄_n[ϕ], (6)

where μ̄_n denotes the mean-field EnKF updated measure at time n, cf. Appendix A. Ideally, one would rather seek a method computing the exact expectation of ϕ with respect to the Bayes filter posterior, i.e., E[ϕ(u_n) | Y_n], but since the said posterior P_{u_n|Y_n} is generally not attainable by EnKF-based filtering methods, and since both EnKF and MLEnKF converge weakly towards the mean-field EnKF, cf. Theorems 1 and 2, we will in this work focus on the simpler (but still very challenging) goal (6).

Notation 1.
• The notation f ≲ g means that f ≤ Cg for some constant C > 0 that is independent of the limit parameters, and f ≂ g means that both f ≲ g and g ≲ f.

HÅKON HOEL, GAUKHAR SHAIMERDENOVA AND RAÚL TEMPONE

• For r, s ∈ N, |x| denotes the Euclidean norm of a vector x ∈ R^s, and for A ∈ R^{r×s}, |A|_2 := sup_{|x|=1} |Ax|.
• For F\B^d-measurable functions u : Ω → R^d and p ≥ 1, ‖u‖_p := E[|u|^p]^{1/p}.
• For any κ ∈ N_0^r, with N_0 := N ∪ {0}, and any sufficiently smooth functions of the form f : R^r → R and g : R^r × Ω → R, the respective ∂^κ-partial derivatives are defined by

∂^κ f(x) := ∂^{|κ|_1} f / (∂x_1^{κ_1} · · · ∂x_r^{κ_r})(x) and ∂^κ g(x, ω) := ∂^{|κ|_1} g / (∂x_1^{κ_1} · · · ∂x_r^{κ_r})(x, ω)

for all x ∈ R^r and P-almost all ω ∈ Ω, where |κ|_1 := Σ_{i=1}^r κ_i. This extends to vector-valued mappings of the form f : R^r → R^s by component-wise partial derivatives.
• C^k_B(R^r, R^s) denotes the space of k-times continuously differentiable functions η : R^r → R^s for which η and all of its partial derivatives of order up to and including k are uniformly bounded.
• C k P (R r , R s ) denotes the space of k-times continuously differentiable functions whose partial derivatives of order up to and including k have polynomial growth.
• For a mapping η : R^r → R^s, its Jacobian is denoted Dη, and its Hessian is D²η.
• R_r[x_1, . . . , x_d] denotes the set of polynomials in d variables that are of total degree smaller than or equal to r ∈ N_0 and have coefficients in R.

2.2. Numerical approximation of the stochastic dynamics. We assume that for every time n ≥ 0, there exists a collection of progressively more accurate numerical solvers {Ψ^N_n : R^d × Ω → R^d}_{N∈N} satisfying the following assumptions:

Assumption 1. The initial distribution u_0 ∈ ∩_{p≥2} L^p_0(Ω, R^d) and for any n ∈ N_0,
(i) for any p ≥ 2, there exists a c_p > 0 such that, for all |κ|_1 ≤ 1 and N ∈ N ∪ {∞},

‖∂^κ Ψ^N_n(u, ·)‖_p ≤ c_p (1 + ‖u‖_p);

(ii) there exists an α > 0 and a set of observables F ⊂ C²_P(R^d, R) such that if

|E[ϕ(u) − ϕ(u^N)]| ≤ c_ϕ N^{−α}

for some u ∈ ∩_{p≥2} L^p_n(Ω, R^d) and {u^N}_N ⊂ ∩_{p≥2} L^p_n(Ω, R^d) and (observable-dependent constant) c_ϕ > 0, then there exists another (observable-dependent constant) c̃_ϕ > 0 such that

|E[ϕ(Ψ_n(u)) − ϕ(Ψ^N_n(u^N))]| ≤ c̃_ϕ N^{−α};

(iii) for any p ≥ 2, there exists a c_p > 0 such that

‖Ψ^N_n(u, ·) − Ψ^N_n(v, ·)‖_p ≤ c_p ‖u − v‖_p for all N ∈ N ∪ {∞};

(iv) there exists a c_3 > 0 such that the computational cost of the numerical solution Ψ^N_n is P-almost surely bounded by c_3 N.

Remark 1. If Ψ^N_n denotes the Euler-Maruyama numerical solution of (2) using N uniform timesteps, then Assumption 1 holds with α = 1 and F = C^4_P(R^d, R), cf. [15, Chapter 7] and [31, Thm 14.5.2]. Noting that the uniformly-bounded constraint is only imposed on partial derivatives of the coefficients, and not the coefficients themselves, the stated convergence rates for Euler-Maruyama apply for instance to the nonlinear dynamics (5).
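As a concrete illustration of such a hierarchy of solvers, the following sketch implements an Euler-Maruyama approximation of a unit observation interval, specialized to the scalar dynamics (5). This is our own minimal example, not the paper's code; exposing the Brownian increments as an argument is what later enables pairwise coupling of neighboring resolution levels via a shared driving path.

```python
import numpy as np

def drift(u):
    # Drift of the nonlinear SDE (5): a(u) = -(u + pi*cos(pi*u/5)/5).
    return -(u + np.pi * np.cos(np.pi * u / 5) / 5)

def psi_em(u0, N, sigma, dW=None, rng=None):
    """Euler-Maruyama approximation Psi^N of one unit-time step of the
    dynamics, using N uniform timesteps. Passing the Brownian increments
    dW explicitly allows coupling of levels through a shared path."""
    dt = 1.0 / N
    if dW is None:
        rng = rng or np.random.default_rng()
        dW = np.sqrt(dt) * rng.standard_normal(N)
    u = u0
    for k in range(N):
        u = u + drift(u) * dt + sigma * dW[k]
    return u
```

A coarse path coupled to a fine path with twice the resolution reuses the fine increments, e.g. `dW_coarse = dW_fine.reshape(-1, 2).sum(axis=1)` when the fine level has twice as many timesteps.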

2.3. EnKF. For a solver Ψ^N with fixed resolution N ≥ 1 and a fixed ensemble size P ≥ 1, the EnKF method consists of an ensemble of P particles that are iteratively simulated forward and updated in a way that can be viewed as a nonlinear extension of Kalman filtering. The evolution of the EnKF ensemble over the times n = 0, 1, . . . can be described as follows: We denote the updated ensemble at time n by v̂^{N,P}_{n,1:P} := {v̂^{N,P}_{n,i}}_{i=1}^P and the prediction ensemble at the same time by v^{N,P}_{n,1:P} := {v^{N,P}_{n,i}}_{i=1}^P. The initial ensemble, v̂^{N,P}_{0,1:P}, consists of P independent and P_{u_0|Y_0}-distributed particles, where we assume the initial distribution can be sampled exactly. The empirical measure induced by the ensemble v̂^{N,P}_{0,1:P} may be viewed as the EnKF approximation of P_{u_0|Y_0}. Given an updated ensemble at time n ≥ 0, the prediction ensemble at time n + 1 is computed by simulating each particle forward,

v^{N,P}_{n+1,i} = Ψ^N_n(v̂^{N,P}_{n,i}) for i = 1, 2, . . . , P.

Thereafter, the updated ensemble at time n + 1 is computed through assimilating the new measurement y_{n+1} through the Kalman-filter-like and particle-wise formula

v̂^{N,P}_{n+1,i} = (I − K^{N,P}_{n+1} H) v^{N,P}_{n+1,i} + K^{N,P}_{n+1} ỹ_{n+1,i} for i = 1, 2, . . . , P.

Here K^{N,P}_{n+1} = C^{N,P}_{n+1} H^T (H C^{N,P}_{n+1} H^T + Γ)^{−1} denotes the Kalman gain,

C^{N,P}_{n+1} := (1/P) Σ_{i=1}^P (v^{N,P}_{n+1,i} − m^{N,P}_{n+1})(v^{N,P}_{n+1,i} − m^{N,P}_{n+1})^T, with m^{N,P}_{n+1} := (1/P) Σ_{i=1}^P v^{N,P}_{n+1,i}, (7)

denotes the biased sample covariance of the prediction ensemble v^{N,P}_{n+1,1:P}, and ỹ_{n+1,i} = y_{n+1} + η_{n+1,i}, for i = 1, 2, . . . , P, are iid perturbed observations with η_{n+1,1} ∼ N(0, Γ) and η_{j,i} ⊥ u_k for all j ≥ 1, k ≥ 0. Perturbed observations were originally introduced in [6] to correct an unwanted covariance-deflation feature in the original formulation of EnKF [12].
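One prediction-update iteration of the EnKF scheme just described can be sketched as follows. The function names are ours and the example is a hedged illustration rather than the paper's implementation; `forward` plays the role of Ψ^N_n.

```python
import numpy as np

def enkf_step(ens, y, forward, H, Gamma, rng):
    """One EnKF prediction-update iteration with perturbed observations.
    `ens` is a (P, d) array of updated particles at time n; `forward`
    maps a particle to its next-time prediction state."""
    P = ens.shape[0]
    # Prediction: propagate every particle through the dynamics.
    pred = np.array([forward(v) for v in ens])
    # Biased sample covariance of the prediction ensemble, cf. (7).
    dev = pred - pred.mean(axis=0)
    C = dev.T @ dev / P
    # Kalman gain K = C H^T (H C H^T + Gamma)^{-1}.
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + Gamma)
    # Update with iid perturbed observations y + eta_i, eta_i ~ N(0, Gamma).
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), Gamma, size=P)
    return pred + (y_pert - pred @ H.T) @ K.T
```

The update line is the particle-wise formula v̂ = (I − KH)v + Kỹ written in vectorized form, v + K(ỹ − Hv).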
We note that for all filtering methods considered in this work, the iteratively augmented observation sequence y_1, y_2, y_3, . . . is assumed to be given, i.e., non-random. The sources of randomness in the EnKF filter are therefore the driving noise in the dynamics and the perturbed observations. For EnKF with dynamics resolution N, randomness thus enters through {Ψ^N_n}_n and {η_{n,i}}_{n,i}. Observe further that since the particles v̂^{N,P}_{n,1}, v̂^{N,P}_{n,2}, . . . , v̂^{N,P}_{n,P} are identically distributed, the distribution of v̂^{N,P}_{n,1} will depend on the ensemble size P and the model resolution N. If Assumption 1 holds, then Theorem 1 implies that v̂^{N,P}_{n,1:P} ⊂ ∩_{p≥2} L^p_n(Ω, R^d) for any P ∈ N, N ∈ N, and n ∈ N_0. This ensures the existence of the EnKF empirical measure

μ^{N,P}_n(dv) := (1/P) Σ_{i=1}^P δ(dv; v̂^{N,P}_{n,i}),

with the Dirac measure δ(dv; y) := 1 if y ∈ dv and 0 otherwise.
Notation 2. The expectation of a QoI ϕ : R^d → R with respect to a probability measure μ on (R^d, B^d) is denoted μ[ϕ] := ∫_{R^d} ϕ(v) μ(dv). Note that the expectation with respect to the EnKF empirical measure takes the form of a sample average and that μ^{N,P}_n[ϕ] in fact is a random variable, since it is a function of the ensemble's P particles v̂^{N,P}_{n,i} (where P obviously is a finite number). For the mean-field measure μ̄_n = μ^{∞,∞}_n, on the other hand, μ̄_n[ϕ] is a deterministic value; this is a consequence of P = ∞.

2.4. MLEnKF. The MLEnKF method employs hierarchies of resolution levels {N_ℓ}_{ℓ=0}^L and ensemble sizes {P_ℓ}_{ℓ=0}^L. Here, L refers to the finest resolution level of the MLEnKF estimator, and we impose the following exponential-growth constraints on the resolution and ensemble-size sequences: P_{ℓ+1} = 2P_ℓ and N_ℓ ≂ 2^{sℓ} for some s > 0.
Before writing the explicit form of the MLEnKF estimator, note that since our goal, the value we want to approximate, is deterministic, cf. Notation 2, the following equality holds: μ̄_n[ϕ] = E[μ̄_n[ϕ]]. This motivates the approximation

μ̄_n[ϕ] ≈ E[μ^{N_L,P_L}_n[ϕ]] = Σ_{ℓ=0}^L E[μ^{N_ℓ,P_ℓ}_n[ϕ] − μ^{N_{ℓ−1},P_{ℓ−1}}_n[ϕ]],

with the convention μ^{N_{−1},P_{−1}}_n[ϕ] := 0, where the last equality holds due to the linearity of the expectation operator. In the expectations for ℓ ≥ 1 we seek to employ pairwise coupling between the fine-level estimator μ^{N_ℓ,P_ℓ}_n[ϕ] and the coarse-level estimator μ^{N_{ℓ−1},P_{ℓ−1}}_n[ϕ] through associating the driving noise and perturbed observations for each particle on level ℓ uniquely to one particle on level ℓ − 1. As the numbers of particles on the neighboring levels are not equal (P_ℓ = 2P_{ℓ−1}), this is clearly not possible in the current form of expectations of differences, so we replace μ^{N_{ℓ−1},P_{ℓ−1}}_n[ϕ] by the average of two identically distributed random variables μ^{N_{ℓ−1},P_{ℓ−1},1}_n[ϕ] and μ^{N_{ℓ−1},P_{ℓ−1},2}_n[ϕ], which implies that the following equality holds:

E[μ^{N_ℓ,P_ℓ}_n[ϕ] − μ^{N_{ℓ−1},P_{ℓ−1}}_n[ϕ]] = E[μ^{N_ℓ,P_ℓ}_n[ϕ] − (μ^{N_{ℓ−1},P_{ℓ−1},1}_n[ϕ] + μ^{N_{ℓ−1},P_{ℓ−1},2}_n[ϕ])/2].

Approximating the ℓ-th expectation by a sample average of M_ℓ iid pairwise-coupled samples yields the MLEnKF estimator

μ^{ML}_n[ϕ] := Σ_{ℓ=0}^L (1/M_ℓ) Σ_{m=1}^{M_ℓ} (μ^{N_ℓ,P_ℓ}_{n,m}[ϕ] − (μ^{N_{ℓ−1},P_{ℓ−1},1}_{n,m}[ϕ] + μ^{N_{ℓ−1},P_{ℓ−1},2}_{n,m}[ϕ])/2). (8)
As for classic MLMC estimators, the degrees of freedom L and {M_ℓ}_{ℓ=0}^L ⊂ N are determined through a constrained optimization problem where one seeks to equilibrate the variance contribution from all of the sample averages and the bias error, cf. Corollary 2. The purpose of pairwise-coupled particles is to reduce the variance of the MLEnKF estimator and thereby improve the efficiency of the method by reducing the number of samples {M_ℓ} needed to control the approximation error.
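As a rough illustration of such a constrained optimization, the standard MLMC sample-allocation heuristic chooses M_ℓ proportional to sqrt(V_ℓ/C_ℓ) from per-level variance and cost estimates. This is a generic MLMC recipe under our own simplifying assumptions, not the paper's exact optimization problem.

```python
import math

def mlmc_sample_sizes(V, C, eps2):
    """Standard MLMC allocation: choose M_l proportional to sqrt(V_l / C_l)
    so that sum_l V_l / M_l <= eps2 at (approximately) minimal total cost
    sum_l M_l * C_l. V and C are per-level variance and cost estimates."""
    lam = sum(math.sqrt(v * c) for v, c in zip(V, C)) / eps2
    return [max(1, math.ceil(lam * math.sqrt(v / c))) for v, c in zip(V, C)]
```

With decaying per-level variances and growing per-level costs, this allocation concentrates most samples on the cheap, high-variance coarse levels.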
To describe the coupling between μ^{N_ℓ,P_ℓ}_n and (μ^{N_{ℓ−1},P_{ℓ−1},1}_n, μ^{N_{ℓ−1},P_{ℓ−1},2}_n), we first introduce the following notation for the particles of the three underlying EnKF ensembles: v̂^{ℓ,f}_{n,i} refers to the i-th updated particle at time n of the EnKF measure μ^{N_ℓ,P_ℓ}_n, and v̂^{ℓ,c_k}_{n,i} refers to the i-th updated particle at time n of the EnKF measure μ^{N_{ℓ−1},P_{ℓ−1},k}_n, for k = 1, 2. The superscript f refers to fine-level particles simulated with numerical resolution N_ℓ, whereas c refers to the coarse-level particles simulated with numerical resolution N_{ℓ−1}. The set of all the fine-level particles is denoted by v̂^{ℓ,f}_{n,1:P_ℓ} := {v̂^{ℓ,f}_{n,i}}_{i=1}^{P_ℓ}, with the EnKF ensemble v̂^{ℓ,f}_{n,1:P_ℓ} inducing the empirical measure μ^{ℓ,f}_n := μ^{N_ℓ,P_ℓ}_n. The set of all the coarse-level particles is a union of two EnKF ensembles with P_{ℓ−1} particles in each, namely v̂^{ℓ,c}_{n,1:P_ℓ} = v̂^{ℓ,c_1}_{n,1:P_{ℓ−1}} ∪ v̂^{ℓ,c_2}_{n,1:P_{ℓ−1}}, where v̂^{ℓ,c_1}_{n,1:P_{ℓ−1}} := {v̂^{ℓ,c}_{n,i}}_{i=1}^{P_{ℓ−1}} and v̂^{ℓ,c_2}_{n,1:P_{ℓ−1}} := {v̂^{ℓ,c}_{n,i}}_{i=P_{ℓ−1}+1}^{P_ℓ}. The EnKF ensemble v̂^{ℓ,c_k}_{n,1:P_{ℓ−1}} induces the empirical measure μ^{ℓ,c_k}_n := μ^{N_{ℓ−1},P_{ℓ−1},k}_n for k = 1, 2. By defining μ^{ℓ,c}_n := (μ^{ℓ,c_1}_n + μ^{ℓ,c_2}_n)/2 and imposing the convention μ^{0,c}_n := 0, the MLEnKF estimator (8) can be written

μ^{ML}_n[ϕ] = Σ_{ℓ=0}^L (1/M_ℓ) Σ_{m=1}^{M_ℓ} (μ^{ℓ,f}_{n,m} − μ^{ℓ,c}_{n,m})[ϕ]. (9)

The initial ensemble v̂^{ℓ,f}_{0,1:P_ℓ} for ℓ = 0, . . . , L is independent and identically P_{u_0|Y_0}-distributed, and it is pairwise coupled to the coarse-level initial ensemble through setting v̂^{ℓ,f}_{0,i} = v̂^{ℓ,c}_{0,i}. For pairwise-coupled updated-state particles v̂^{ℓ,f}_{n,i} and v̂^{ℓ,c}_{n,i} at time n, we denote the next-time prediction states respectively by v^{ℓ,f}_{n+1,i} and v^{ℓ,c}_{n+1,i}. The prediction state is obtained by simulation of the model dynamics on neighboring resolutions:

v^{ℓ,f}_{n+1,i} = Ψ^{N_ℓ}_n(v̂^{ℓ,f}_{n,i}), v^{ℓ,c}_{n+1,i} = Ψ^{N_{ℓ−1}}_n(v̂^{ℓ,c}_{n,i}). (10)

The simulations are pairwise coupled through sharing the same driving noise W. Then the sample covariance matrices and Kalman gains for the respective ensembles are computed by

K^{ℓ,f}_{n+1} = C^{ℓ,f}_{n+1} H^T (H C^{ℓ,f}_{n+1} H^T + Γ)^{−1} and K^{ℓ,c_k}_{n+1} = C^{ℓ,c_k}_{n+1} H^T (H C^{ℓ,c_k}_{n+1} H^T + Γ)^{−1}, k = 1, 2,

where the sample covariances C^{ℓ,f}_{n+1} and C^{ℓ,c_k}_{n+1} are defined over the respective prediction ensembles in a similar fashion as in (7) for EnKF. (11) And the new observation y_{n+1} is assimilated into the filter in the update step:

v̂^{ℓ,f}_{n+1,i} = (I − K^{ℓ,f}_{n+1} H) v^{ℓ,f}_{n+1,i} + K^{ℓ,f}_{n+1} ỹ^ℓ_{n+1,i}, i = 1, . . . , P_ℓ, (12)
v̂^{ℓ,c}_{n+1,i} = (I − K^{ℓ,c_k}_{n+1} H) v^{ℓ,c}_{n+1,i} + K^{ℓ,c_k}_{n+1} ỹ^ℓ_{n+1,i}, with k = 1 for i ≤ P_{ℓ−1} and k = 2 otherwise,

where ỹ^ℓ_{n+1,i} = y_{n+1} + η^ℓ_{n+1,i} and {η^ℓ_{n,i}}_{n,ℓ,i} are iid N(0, Γ)-distributed random variables.
We note that the pairwise coupling of fine- and coarse-level particles is obtained through them sharing the same initial condition, driving noise W in the dynamics (10), and perturbed observations. To further elaborate on the relation between pairwise-coupled particles and pairwise-coupled EnKF estimators, the numerator in the (ℓ, m)-th summand/sample of (9) takes the following form when represented in terms of its particles:

(μ^{ℓ,f}_{n,m} − μ^{ℓ,c}_{n,m})[ϕ] = (1/P_ℓ) Σ_{i=1}^{P_ℓ} (ϕ(v̂^{ℓ,f}_{n,m,i}) − ϕ(v̂^{ℓ,c}_{n,m,i})).

The motivation for pairwise-coupled particles is primarily to reduce the variance of the above sample/summand numerators, and ultimately to improve the efficiency of the MLEnKF estimator: provided the variance of these terms is substantially reduced, the number of samples {M_ℓ} needed to achieve a given accuracy may be lowered substantially. See Figure 2 for an illustration of one prediction-update iteration of the full MLEnKF estimator. Green and pink ovals represent fine- and coarse-level prediction-state particles, respectively, sharing the same initial condition and driving noise ω, and the respective squares represent fine- and coarse-level updated-state particles sharing the perturbed observations. The MLEnKF estimator is obtained by iid copies of pairwise-coupled samples, cf. (9). In the original MLEnKF method [22], on the other hand, all the particles in the full estimator are updated using the one shared Kalman gain. Therefore, the particles in the original MLEnKF method are all correlated, while all particles from different (ℓ, m)-samples in the herein-introduced estimator are independent. The more extensive independence properties of the new MLEnKF estimator simplify the proofs of asymptotic convergence results and in some settings also improve theoretical complexity bounds. See Corollary 2 and Remark 6 for a comparison of the theoretical complexity of the two methods.
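Structurally, the estimator (9) is a sum over levels of sample averages of iid fine/coarse differences. The following sketch shows that skeleton with a toy stand-in for the coupled pair of estimators; the sampler is hypothetical, and in the real method each call would run a pairwise-coupled EnKF triple up to time n.

```python
import numpy as np

def mlenkf_estimate(phi_pair_sampler, M, rng):
    """Multilevel estimator skeleton, cf. (8)-(9): a telescoping sum over
    levels l = 0..L of sample averages of M[l] iid differences between a
    fine-level estimate and the (averaged) coarse-level estimate.
    `phi_pair_sampler(l, rng)` returns (fine, coarse), with the convention
    coarse = 0 at level 0; each call must use fresh, independent noise."""
    total = 0.0
    for l, M_l in enumerate(M):
        diffs = [np.subtract(*phi_pair_sampler(l, rng)) for _ in range(M_l)]
        total += np.mean(diffs)
    return total

def toy_sampler(l, rng):
    # Hypothetical stand-in for a pairwise-coupled estimator pair: a shared
    # noise term z makes the fine/coarse difference small, mimicking the
    # variance reduction obtained by coupling.
    z = rng.standard_normal()
    fine = 1.0 + z / 2 ** (l + 1)
    coarse = 0.0 if l == 0 else 1.0 + z / 2 ** l
    return fine, coarse
```

By construction the toy levels telescope to the target value 1, and the shared noise z makes the per-level difference variance decay geometrically, so few samples are needed on the deeper levels.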

2.5. Main results. We now present the main convergence and complexity results for EnKF and MLEnKF. The proofs for these results are provided in Appendix B. We will need the following assumptions:

Assumption 2. For any exponentially-growing sequence {N_ℓ}_ℓ of natural numbers, there exist constants c_Ψ > 0 and β > 0 such that for all ℓ ∈ N_0 ∪ {∞}, n ∈ N_0, and p ≥ 2,
(i) for all |κ|_1 ≤ 1,

‖∂^κ Ψ^{N_ℓ}_n(u, ·) − ∂^κ Ψ^{N_{ℓ−1}}_n(u, ·)‖_p ≤ c_Ψ (1 + ‖u‖_p) N_ℓ^{−β/2}.

Remark 3. If the dynamics (1) is given by the SDE in Remark 1 and Ψ^N_n denotes the corresponding Euler-Maruyama numerical solution, then Assumption 2 holds with β = 1, cf. [45,23,24,21].
Theorem 1 (Convergence of EnKF). If Assumption 1 holds, then for any ϕ ∈ F, p ≥ 2 and n ∈ N_0,

‖μ^{N,P}_n[ϕ] − μ̄_n[ϕ]‖_p ≲ N^{−α} + P^{−1/2},

where the hidden constant in ≲ depends on p, n and d (but not on N and P).
Corollary 1. If Assumption 1 holds, then the choices N ≂ ε^{−1/α} and P ≂ ε^{−2} ensure that

‖μ^{N,P}_n[ϕ] − μ̄_n[ϕ]‖_p ≲ ε (13)

for any ϕ ∈ F, p ≥ 2 and n ∈ N_0. The resulting computational cost is ≲ NP ≂ ε^{−(2+1/α)}.

Remark 4. The proof of Theorem 1 shows that Theorem 1 and Corollary 1 apply.

Theorem 2 (Convergence of MLEnKF). If Assumptions 1 and 2 hold, then for any L ≥ 1 and triplet of sequences {M_ℓ}, {N_ℓ}, {P_ℓ} ⊂ N with P_ℓ ≂ 2^ℓ and exponentially-growing N_ℓ, it holds for any ϕ ∈ F, n ≥ 0, and p ≥ 2 that

‖μ^{ML}_n[ϕ] − μ̄_n[ϕ]‖_p ≲ N_L^{−α} + P_L^{−1} + M_0^{−1/2} + (Σ_{ℓ=1}^L M_ℓ^{−1} N_ℓ^{−β})^{1/2}.

Remark 5. In Appendix B, Theorems 1 and 2 are, for simplicity, proved for EnKF and MLEnKF methods using biased sample covariances, precisely as introduced in (7) and (11), respectively. However, the proofs can easily be extended, without any changes in convergence rates, to methods that instead use unbiased sample covariances, cf. Remark 7.
Corollary 2. If Assumptions 1 and 2 hold, then for any ε, s > 0 there is a configuration of L and {M_ℓ}_{ℓ=0}^L ensuring that

‖μ^{ML}_n[ϕ] − μ̄_n[ϕ]‖_p ≲ ε

for any ϕ ∈ F, n ≥ 0 and p ≥ 2. Moreover, if one sets the parameter s so as to equilibrate the error and cost contributions across levels, then the cost of the MLEnKF estimator is bounded accordingly. The optimization of the parameter s is stated in terms of inclusion sets in (16), as that may be useful for implementations. For further details on this optimization and the computational cost of the MLEnKF estimator, see Appendix B.3. The a priori cost-versus-accuracy results of Corollaries 1 and 2 show that in settings where both rates are sharp, MLEnKF will asymptotically have better complexity than EnKF.
where μ^{ML,ORG}_n[ϕ] denotes the original MLEnKF estimator. To achieve the same O(ε)-accuracy for the two MLEnKF methods, the theoretical bounds imply that for any n ≥ 2 and any admissible (α, β)-values with α ≥ 1, the cost bound of the new method is smaller. In other words, if our theoretical bounds are sharp for both methods, then the new MLEnKF method asymptotically outperforms the original one in terms of computational cost versus accuracy. We note, however, that the regularity assumptions imposed for the new method's hierarchy of numerical solvers, cf. Assumption 2, are more restrictive than those needed for the original method, and that we have previously conjectured that the error bounds for the original method are not sharp, cf. [22, Remark 3].

3. Numerical examples. In this section, we numerically compare the performance of MLEnKF and EnKF and verify the results predicted by our theory. We consider two test problems with dynamics of the form

du = −V′(u) dt + σ dW, (20)

where the potential V ∈ C^∞_P(R) satisfies V″ ∈ C^∞_B(R), the diffusion coefficient is constant, σ = 0.5, and the observations are linear, cf. (3), with H = 1 and Γ = 0.1. For any N ≥ 1, Ψ^N will in this section denote the Milstein numerical scheme with uniform timestep ∆t = 1/N. This yields the convergence rates α = 1 and β = 2.
For any x ∈ R, let Round(x) denote the nearest integer to x. Given an input RMSE constraint ε > 0, we seek to equilibrate the magnitude of all error terms (cf. (13) for EnKF and Corollary 2 for MLEnKF) by setting the degrees of freedom to P = Round(8ε^{−2}) and N = Round(ε^{−1}) for EnKF, while for MLEnKF we set L = Round(log_2(ε^{−1})) − 1, N_ℓ = 2^{ℓ+1}, P_ℓ = 10 × 2^ℓ, and {M_ℓ} such that the statistical error is equilibrated with the bias. Over a sequence of iterations n = 1, . . . , N, we compare the performance of the methods in terms of error versus computer runtime (wall-clock time), where the error is measured in time-averaged root-mean-squared error (RMSE); for a given ε > 0, with α = 1 and β = 2, the corresponding asymptotic cost rates follow from Corollaries 1 and 2, respectively.

Reference solutions and computer architecture. For the first test problem we consider, the dynamics Ψ is linear. Therefore, we compute the exact reference solution μ̄_n[ϕ] straightforwardly using the Kalman filter. The second test problem we consider has nonlinear dynamics, and we need to approximate the reference solution. A pseudo-reference solution is then obtained by the deterministic mean-field EnKF (DMFEnKF) algorithm described in [35], cf. Appendix C.
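The parameter choices stated above can be collected as follows. This is our own helper sketch; the sample sizes M_ℓ are set by a formula not reproduced here, so they are omitted.

```python
import math

def enkf_config(eps):
    """Degrees of freedom equilibrating the EnKF error terms as stated in
    the text: P = Round(8 eps^-2) and N = Round(eps^-1)."""
    return round(8 * eps ** -2), round(eps ** -1)

def mlenkf_levels(eps):
    """MLEnKF hierarchy from the text: L = Round(log2(1/eps)) - 1,
    N_l = 2^(l+1), P_l = 10 * 2^l for l = 0, ..., L. (The per-level sample
    sizes M_l are determined separately and omitted here.)"""
    L = round(math.log2(1.0 / eps)) - 1
    return [(2 ** (l + 1), 10 * 2 ** l) for l in range(L + 1)]
```

For example, an RMSE tolerance of ε = 1/8 yields (P, N) = (512, 8) for EnKF and the three-level hierarchy (N_ℓ, P_ℓ) = (2, 10), (4, 20), (8, 40) for MLEnKF.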
The numerical simulations were computed in parallel on five cores on a Mac Pro with Quad-Core Intel Xeon X-5550 8-core processor with 16 GB of RAM. The computer code is written in the Julia programming language [4], and it can be downloaded from https://github.com/GaukharSH/mlenkf.

3.1. Ornstein-Uhlenbeck process. We consider the SDE (20) with V(u) = u²/2 and the initial condition u_0 ∼ N(0, Γ). This is an Ornstein-Uhlenbeck (OU) process with the exact solution

u(t) = e^{−t} u(0) + σ ∫_0^t e^{−(t−s)} dW(s).

3.2. Double-well model problem. We next consider the SDE (20) with a double-well potential V and the initial condition u_0 ∼ N(0, Γ). This SDE describes the metastable motion of particles between the two wells, cf. [8,44]. Figure 4 shows the dynamics of a particle over N = 20 observation times. We observe that the particle remains in either one of the wells for a relatively long time; in other words, transitions between wells happen quite rarely. Figure 5 illustrates the well transition of an EnKF ensemble over a few predict-update iterations when the observations and the majority of the ensemble's particles are located in opposite wells for the first few iterations. Using the same input sequences as in the preceding example, we study the performance of the three methods in terms of runtime versus RMSE over timeframes of N = 10 and N = 20 observation times in Figure 6. Once again, the observed work rates are in agreement with theory (Corollaries 1 and 2, and Remark 6); the new MLEnKF method performs similarly to the original MLEnKF method, and both methods asymptotically outperform EnKF.

Appendix A. Mean-field EnKF. The large-ensemble limit of EnKF is referred to as the mean-field EnKF (MFEnKF). In the mean-field limit, the EnKF Kalman gain becomes a deterministic matrix. Consequently, there is no mixing between ensemble particles and the particles become iid. To describe the dynamics of MFEnKF, it thus suffices to consider a single particle. For fixed dynamics resolution Ψ^N, let v̄̂^N_n denote the updated state of a mean-field particle at time n. The MFEnKF prediction and update dynamics is given by

v̄^N_{n+1} = Ψ^N_n(v̄̂^N_n), v̄̂^N_{n+1} = (I − K̄^N_{n+1} H) v̄^N_{n+1} + K̄^N_{n+1} ȳ_{n+1},

where K̄^N_{n+1} = C̄^N_{n+1} H^T (H C̄^N_{n+1} H^T + Γ)^{−1} denotes the deterministic mean-field Kalman gain, C̄^N_{n+1} = Cov[v̄^N_{n+1}] is the mean-field prediction covariance, and ȳ_{n+1} = y_{n+1} + η̄_{n+1} denotes the perturbed observation, with {η̄_n}_{n=1}^N being independent and N(0, Γ)-distributed random variables satisfying η̄_j ⊥ u_k for all j, k ≥ 0.
By Assumption 1 (i) with |κ|_1 = 0, it holds that

v̄̂^N_n ∈ ∩_{p≥2} L^p(Ω, R^d) for all N ∈ N ∪ {∞} and n ∈ N_0. (23)

This ensures the existence of the mean-field measure μ̄^N_n, i.e., the distribution v̄̂^N_n ∼ μ̄^N_n satisfies v̄̂^N_n ∈ ∩_{p≥2} L^p(Ω, R^d) for all N ∈ N ∪ {∞} and n ∈ N_0. For evaluating the expectation of a QoI ϕ : R^d → R with respect to the mean-field measure μ̄^N_n, we recall from Notation 2 that μ̄^N_n[ϕ] = E[ϕ(v̄̂^N_n)]. For a subset of nonlinear dynamics Ψ with additive Gaussian noise, it has been shown that in the large-ensemble limit, EnKF with perturbed observations converges to the Kalman filtering distribution with the standard rate O(P^{−1/2}), cf. [36]. That is,

‖μ^{∞,P}_n[ϕ] − μ̄_n[ϕ]‖_p = O(P^{−1/2})

in L^p for all QoI ϕ : R^d → R with sufficient regularity, n ∈ N_0, and p ≥ 2, where μ̄_n := μ̄^∞_n. The result was extended to a subset of fully non-Gaussian models [35] with numerical approximations of the dynamics in [22], i.e., μ^{N,P}_n[ϕ] → μ̄_n[ϕ] as P, N → ∞ (see also Theorem 1).
Appendix B. Theoretical proofs.
B.1. EnKF. In this section, we present a collection of theoretical results for EnKF that culminates with the proof of Theorem 1.
Notation. The sample average of a QoI ϕ ∈ F applied to an ensemble of P identically distributed particles {v̂^{N,P}_{n,i}}_{i=1}^P is defined as

μ^{N,P}_n[ϕ] := (1/P) Σ_{i=1}^P ϕ(v̂^{N,P}_{n,i}). (24)

Lemma 1 (Moment bounds). If Assumption 1 holds, then for any p ≥ 2, n ∈ N_0 and N, P ∈ N, the particles v̂^{N,P}_{n,1} and v̄̂^N_n have p-th moments that are bounded uniformly in N and P.

Proof. The proof of the first inequality is analogous to [22, Lemma 3.4], and the latter inequality follows from v̄̂^N_n ∈ ∩_{p≥2} L^p(Ω, R^d), uniformly in N ≥ 1, cf. (23).
We next bound the difference between the EnKF sample covariance C^{N,P}_n and its mean-field counterpart C̄^N_n.

Lemma 2 (Difference between EnKF and MFEnKF covariance matrices). If Assumption 1 holds, then for any p ≥ 2, ‖C^{N,P}_n − C̄^N_n‖_p ≲ P^{−1/2}.

Proof. In order to bound ‖C^{N,P}_n − C̄^N_n‖_p, we introduce the auxiliary mean-field ensemble v̄^{N,P}_{n,1:P} = {v̄^{N,P}_{n,i}}_{i=1}^P consisting of P identically distributed particles whose prediction/update dynamics is given by the mean-field Kalman gain, namely

v̄^{N,P}_{n+1,i} = Ψ^N_n(v̄̂^{N,P}_{n,i}), v̄̂^{N,P}_{n+1,i} = (I − K̄^N_{n+1} H) v̄^{N,P}_{n+1,i} + K̄^N_{n+1} ỹ_{n+1,i}. (25)

For the first term, recalling the notation (24), the first term in the last equality is bounded by Lemma 2, and the bound for the second term follows from v̄̂^{N,P}_n ∈ L^p(Ω, R^d).

Remark 7. In the proof of Theorem 1, the error contribution from the sample covariance, be that a biased or an unbiased one, only enters through Lemma 2. This implies that the theorem holds and the convergence rate is not affected when replacing biased with unbiased sample covariances in the EnKF method.
Moreover, in the proof of Theorem 2 for MLEnKF, the error contribution from sample covariances only enters through Corollary 3 and Lemma 8. By a similar argument as above, the rates of the said corollary and lemma are not affected by replacing biased sample covariances by unbiased ones, and Theorem 2 then also holds with the same convergence rate.

Lemma 3. If Assumption 1 holds, then for any p ≥ 2 and n ∈ N_0, ‖v̂^{N,P}_n − v̄̂^{N,P}_n‖_p ≲ P^{−1/2}.

Proof. Since v̂^{N,P}_0 = v̄̂^{N,P}_0, the first half of the statement holds for n = 0. Assume that for some n ≥ 1, ‖v̂^{N,P}_{n−1} − v̄̂^{N,P}_{n−1}‖_p ≲ P^{−1/2} holds for all p ≥ 2. Assumption 1(iii) then implies that the same bound carries over to the prediction states at time n. By Hölder's inequality and Lemma 1, the corresponding Kalman-gain difference satisfies an analogous bound. Since v̄̂^N_n, ỹ_n ∈ ∩_{r≥2} L^r(Ω), inequalities (29) and (28) yield the claimed bound at time n. The argument holds for any p ≥ 2, and the proof follows by induction.
Before proving the main convergence result for EnKF, we introduce the notation µ N,P n [ϕ] for the average of the QoI ϕ over the empirical measure associated with the auxiliary mean-field ensemble {v̄ N n,i } P i=1 , cf. (25), and we recall that μ̄ N n [ϕ] denotes the evaluation at the QoI ϕ of the measure associated with the MFEnKF v̄ N n , cf. Section A.
Proof of Theorem 1. By the triangle inequality,

By Assumption 1(ii), ϕ ∈ F ⊂ C 2 P (R d , R), and Lemma 3 implies that

For the second term, the Marcinkiewicz–Zygmund (M-Z) inequality, the fact that ϕ(v̄ N n ) ∈ ∩ p≥2 L p (Ω, R), and the fact that v̄ N,P n,1 , . . . , v̄ N,P n,P are iid with v̄ N,P n,i D = v̄ N n imply that

For the last term, we have that

and it remains to prove by induction that the right-hand side is

Assume that for some k ≥ 1, there exists an observable-dependent constant c ϕ > 0 such that

Then Assumption 1(ii) implies there exists a c̃ ϕ > 0 such that

Let η̃ k ∼ N (0, Γ) and introduce the functions φ̃ N , φ̃ ∈ F defined by

It then follows by the mean-value theorem that

From [22, Lemma 3.4], we have that

which, since Γ is positive definite and thus |(

Since all monomials of degree 1 and 2 are contained in F, the last inequality follows from (31) and the equivalence of the Euclidean and Frobenius norms in any dimension d < ∞. It holds by induction that for any n ≥ 0,

B.2. MLEnKF. In this section, we present a collection of theoretical results for MLEnKF, including the proof of Theorem 2.
In order to obtain a connection between {v ℓ,f n,i } P ℓ i=1 and {v ℓ,c n,i } P ℓ i=1 ,

The random variable v ℓ,fj n has the same driving noise and perturbed observations as v ℓ,cj n , and we note that

We further introduce the auxiliary mean-field MLEnKF ensemble, denoted by {v̄ ℓ,f n,1:P ℓ , v̄ ℓ,c n,1:P ℓ } L ℓ=0 , where the dynamics of v̄ ℓ,f n,i and v̄ ℓ,c n,i are coupled through driving-noise-sharing prediction dynamics and perturbed-observation-sharing updates

where K̄ ℓ,f n := K̄ N ℓ n and K̄ ℓ,c n := K̄ N ℓ−1 n , cf. Section A, and with initial conditions identical to those of MLEnKF:

The particles (v̄ ℓ,f n,i , v̄ ℓ,c n,i ) are thus coupled through sharing the same initial condition, driving noise W, and perturbed observations. The particle pair is also coupled to the MLEnKF pair (v ℓ,f n,i , v ℓ,c n,i ) in all the same ways; in other words, (v̄ ℓ,f n,i , v̄ ℓ,c n,i ) shadows (v ℓ,f n,i , v ℓ,c n,i ). The ensemble v̄ ℓ,f n,1:P ℓ = v̄ ℓ,f1 n,1:P ℓ−1 ∪ v̄ ℓ,f2 n,1:P ℓ−1 induces an empirical measure μ̄ ℓ,f n = (μ̄ ℓ,f1 n + μ̄ ℓ,f2 n )/2, and v̄ ℓ,c n,1:P ℓ = v̄ ℓ,c1 n,1:P ℓ−1 ∪ v̄ ℓ,c2 n,1:P ℓ−1 induces μ̄ ℓ,c n = (μ̄ ℓ,c1 n + μ̄ ℓ,c2 n )/2, with the convention that μ̄ 0,c n = μ̄ 0,c1 n = μ̄ 0,c2 n = 0 for all n ≥ 0. Employing sequences {M ℓ }, {N ℓ } ⊂ N and L ∈ N with exactly the same values as those used in the MLEnKF estimator (9) that we seek to shadow, we define the auxiliary mean-field MLEnKF estimator by

By (9), Jensen's inequality and Lemma 11,

For the second term in (35), note first that

Applying the M-Z inequality twice (first in M ℓ and thereafter in P ℓ ) together with Lemma 6 yields

For the third term in (35), it follows by (31) that
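The driving-noise-sharing coupling of a fine/coarse particle pair can be sketched in a few lines. The following is an illustrative Euler–Maruyama stand-in for the paper's solver Ψ^{N ℓ} (the drift function and step counts are our own assumptions), showing how both resolution levels consume the same Brownian increments:

```python
import numpy as np

def coupled_pair_step(v_f, v_c, drift, n_f, rng):
    """Advance a coupled fine/coarse particle pair over one unit interval.

    Both levels use Euler-Maruyama and are driven by the SAME Brownian
    increments: the fine level takes n_f steps of size dt, the coarse level
    n_f/2 steps of size 2*dt built from summed increments. `drift` and the
    step counts are illustrative stand-ins, not the paper's solver.
    """
    dt = 1.0 / n_f
    dW = rng.standard_normal(n_f) * np.sqrt(dt)      # shared driving noise
    for k in range(n_f):                             # fine resolution
        v_f = v_f + drift(v_f) * dt + dW[k]
    for k in range(0, n_f, 2):                       # coarse resolution
        v_c = v_c + drift(v_c) * 2 * dt + dW[k] + dW[k + 1]
    return v_f, v_c
```

Because the pair shares its randomness, |v_f − v_c| is small (of the order of the discretization error) rather than O(1), which is what keeps the variance of the multilevel differences small; in MLEnKF the analysis step additionally reuses the same perturbed observations for both members of the pair.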

Lemma 4 (Continuity of mean-field Kalman gains). It holds that
Proof. The proof is analogous to [22,Lemma 3.4].
In the remaining part of this section, we will assume that Assumptions 1 and 2 hold and that the sequences {N ℓ }, {P ℓ } ⊂ N satisfy the constraints given in Section 2.4 (namely, P ℓ = 2P ℓ−1 and {N ℓ } exponentially increasing).
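For concreteness, these constraints can be realized as follows; the base values P0, N0 and the number of levels L below are placeholders chosen only to fix ideas, not values from the paper:

```python
# Illustrative parameter sequences satisfying the Section 2.4 constraints:
# the ensemble size doubles at each level, P_l = 2 * P_{l-1}, and the
# resolution parameters N_l grow exponentially.
P0, N0, L = 50, 4, 5
P = [P0 * 2**l for l in range(L + 1)]   # ensemble sizes P_0, ..., P_L
N = [N0 * 2**l for l in range(L + 1)]   # resolutions N_0, ..., N_L
```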
Lemma 6 (Stability of mean-field particles). For any n ≥ 0 and p ≥ 2, it holds that for multilevel mean-field prediction and update particles defined as in (32) and (33).
Corollary 3 (Continuity of mean-field and EnKF covariance matrices). For any n ≥ 0 and p ≥ 2, it holds that for multilevel prediction particles defined as in (10) and (32).
, the result follows from Lemma 2.
Corollary 4 (Distance between ensembles II). For any n ≥ 0 and p ≥ 2, the following asymptotic inequality holds for multilevel update particles defined as in (12) and (33).
Proof. Since v̂

Proof. From the proof of [22, Lemma 3.4], one may deduce that

The above equations imply that

By the positive definiteness of Γ, Lemmas 4, 5 and 6, and Corollaries 3 and 4,

We next show how to bound the first term on the right-hand side of (36).
Lemma 8 (Continuity of covariance matrix double differences). For any n ≥ 0 and p ≥ 2, the following asymptotic inequality holds:

Proof. Let us first recall that

and introduce the following covariance matrices for the auxiliary mean-field multilevel EnKF ensemble:

Using that

we obtain

For the first term, Lemma 6 and Corollary 4 yield

For the second term, the identity aa T − bb T = a(a − b) T + (a − b)b T leads to the splitting =: I 121 + I 122. The term I 121 can be bounded in a fashion similar to I 11, and to bound the second term we employ the same identity, Jensen's inequality and Corollary 4:

Consequently,

For the second term in (38), the equation

together with Hölder's inequality, Lemma 6 and the M-Z inequality implies that

(the equality E[v̄ ℓ,c1 n − v̄ ℓ,c2 n ] = 0 was used in the last inequality).
Lemma 9. For any n ≥ 0 and p ≥ 2, denoting the Jacobian of ϕ by Dϕ, it holds that

Proof. Let ỹ ℓ,j n denote the perturbed observation associated with (v ℓ,fj n , v ℓ,cj n ). By the update equations (12) and (32) for (v ℓ,fj n , v ℓ,cj n ) and (v̄ ℓ,fj n , v̄ ℓ,cj n ), we obtain the representation

v ℓ,fj n − v ℓ,cj n − v̄ ℓ,fj n + v̄ ℓ,cj n

By (39) and Hölder's inequality,

The properties ϕ ∈ F ⊂ C 2 P (R d , R) and v̄ ℓ,c n ∈ ∩ r≥2 L r (Ω, R d ) imply Dϕ(v̄ ℓ,cj n ) 2p < ∞, and Lemmas 4, 5 and 6 and Corollaries 3 and 4 yield that

For the last term, we use K ℓ,c1 n = (K ℓ,c1 n + K ℓ,c2 n )/2 + (K ℓ,c1 n − K ℓ,c2 n )/2, the Frobenius scalar product ·, · F on R d×d O × R d×d O , and the shorthand D n := Dϕ(v̄ ℓ,c1 n )

Here, the second-to-last inequality follows from the M-Z inequality applied to the right argument of the scalar product (as it is a sample average of P ℓ−1 iid, mean-zero random matrices). The last inequality follows by Lemmas 7 and 8 and (37).

Lemma 11 shows that Theorem 2 is achieved through bounding
To obtain this bound, we introduce the following sequence of random d × d matrices: for ℓ ≥ 0, r, s ∈ {1, . . . , n} and j ∈ {1, 2},

if r > s. Moreover, for j = 1, 2,

We note that |A ℓ,j r,s | ∈ ∩ q≥2 L q (Ω) for all index values.

Corollary 5. For any n ≥ 2, k ≤ n − 1 and p ≥ 2, it holds that

Sketch of proof. Proceeding as in the proof of Lemma 9, we obtain three terms that are respectively similar to J 1,ℓ , J 2,ℓ and J 3,ℓ in (40), but now with the prefactor

The proof is obtained through the equality

Corollary 6. For any n ≥ 1, k ≤ s ≤ n − 1 and p ≥ 2, it holds that

Sketch of proof. The statement of this corollary is similar to Corollary 5, but with an additional prefactor. As the latter factor is also bounded (it is an element of ∩ r≥2 L r (Ω, R d×d )), the proof follows by an argument similar to that of Corollary 5. However, since the prefactor in this case is a d × d matrix rather than a vector, the term in (41) that corresponds to J 3,ℓ in the proof of Lemma 9 is treated with the shorthand G j k :

HÅKON HOEL, GAUKHAR SHAIMERDENOVA AND RAÚL TEMPONE
Here (A ℓ,j k+1,s DΨ N k (v ℓ,cj k )) i denotes the i-th row vector of the matrix. Hence, we have d terms to bound that are similar to the J 3,ℓ -term in Corollary 5.
Corollary 7. For any n ≥ 1, k ≤ s ≤ n − 1, r ≤ n and p ≥ 2, it holds that

Sketch of proof. The statement of this corollary is similar to Corollary 5, but here with the additional 1 × d postfactor (v̂ ℓ,f r ) T . The only technicality this introduces is in bounding the term in (42) that corresponds to J 3,ℓ in the proof of Lemma 9. That is,

Here (v̂ ℓ,fj r ) T i denotes the i-th component of the row vector. We thus have d terms to bound that are similar to the J 3,ℓ -term in Corollary 5.
Lemma 10. For any n ≥ 1, k ≤ s ≤ n − 1, r ≤ n and p ≥ 2, it holds that

and

Proof. The mean-value theorem yields the expansion

where for

The expansion, Assumption 2 and Corollary 5 yield (43). Similarly, Corollaries 6 and 7 yield (44) and (45), respectively.

This yields
Cost(s)

Appendix C. DMFEnKF algorithm. This section describes the algorithm for the density-based deterministic approximation of the MFEnKF, which iteratively computes the prediction density ρ v̂ n and the updated density ρ v̄ n for n = 1, 2, . . . . Each iteration cycle consists of two steps: a transition from ρ v̄ n to ρ v̂ n+1 governed by the Fokker–Planck equation (FPE), and a transition from ρ v̂ n+1 to ρ v̄ n+1 via an affine transformation and a convolution with a Gaussian density. For simplicity, we present the algorithm for the one-dimensional state-space case, i.e., d = 1.
Let S t ρ denote the solution at time t of the FPE with initial condition p(·, 0) = ρ. Note that for the mean-field dynamics (22) with N = ∞, which satisfies the SDE (20), it holds that Ψ(v̄ n ) ∼ S 1 ρ v̄ n for any n ≥ 0. Since the updated mean-field state can be viewed as the sum of the independent random variables

X := (I − K̄ n H)v̂ n + K̄ n y n and Y := K̄ n η̃ n ,

the updated density can be written

ρ v̄ n = ρ X * ρ Y ,

where * denotes the convolution operator.

Algorithm 1: DMFEnKF
Input: The initial updated density ρ v̄ 0 = ρ u0|Y0 , the number of time steps N t , the number of spatial steps N x , the discretization interval [x 0 , x 1 ], the simulation length N.
Output: The prediction and updated densities ρ v̂ n and ρ v̄ n , respectively.
1. ∆t = 1/N t , ∆x = (x 1 − x 0 )/N x .
2. for n = 1 : N do
3.   Compute the prediction density ρ v̂ n (x) = S 1 ρ v̄ n−1 by the Crank–Nicolson numerical method with the discretization steps (∆t, ∆x).
4.   Compute the prediction covariance Ĉ n = ∫ x 2 ρ v̂ n (x) dx − (∫ x ρ v̂ n (x) dx) 2 using a quadrature rule.
5.   Compute the Kalman gain K̄ n = Ĉ n H T (H Ĉ n H T + Γ) −1 .
6.   Compute the updated density ρ v̄ n = ρ X * ρ Y by discrete convolution of the two functions represented on the spatial mesh.
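Steps 4–6 of Algorithm 1 can be sketched in NumPy for d = 1 as follows. The Crank–Nicolson FPE solve of step 3 is assumed done elsewhere and the prediction density is taken as given; the grid and the values of H and Γ are illustrative, and simple Riemann sums stand in for the quadrature rule:

```python
import numpy as np

def dmfenkf_update(x, rho_pred, y, H=1.0, Gamma=1.0):
    """Steps 4-6 of Algorithm 1 for d = 1: covariance by quadrature, Kalman
    gain, and updated density as the convolution rho_X * rho_Y.

    x : uniform symmetric grid (odd length, centered at 0);
    rho_pred : prediction density on x; y : scalar observation.
    """
    dx = x[1] - x[0]
    m = np.sum(x * rho_pred) * dx                    # prediction mean
    C = np.sum(x**2 * rho_pred) * dx - m**2          # prediction covariance
    K = C * H / (H * C * H + Gamma)                  # Kalman gain, step 5
    a = 1.0 - K * H                                  # X = a * vhat + K * y
    rho_X = np.interp((x - K * y) / a, x, rho_pred, left=0.0, right=0.0) / abs(a)
    var_Y = K**2 * Gamma                             # Y = K * eta, eta ~ N(0, Gamma)
    rho_Y = np.exp(-x**2 / (2 * var_Y)) / np.sqrt(2 * np.pi * var_Y)
    rho_upd = np.convolve(rho_X, rho_Y, mode="same") * dx   # step 6
    return rho_upd / (np.sum(rho_upd) * dx)          # renormalize on the grid
```

With a Gaussian prediction density, this update reproduces the exact Kalman posterior up to discretization and truncation error, which gives a quick consistency check of the convolution representation.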
Remark 8. The discretization interval [x 0 , x 1 ] must be chosen such that the truncation error pertaining to the integral on the complement of said interval is close to zero, i.e.,