On the Incorporation of Box-Constraints for Ensemble Kalman Inversion

The Bayesian approach to inverse problems is widely used in practice to infer unknown parameters from noisy observations. In this framework, ensemble Kalman inversion has been successfully applied for the quantification of uncertainties in various areas of application. In recent years, a complete analysis of the method has been developed for linear inverse problems adopting an optimization viewpoint. However, many applications require the incorporation of additional constraints on the parameters, e.g. arising due to physical constraints. We propose a new variant of ensemble Kalman inversion that includes box constraints on the unknown parameters, motivated by the theory of projected preconditioned gradient flows. Based on the continuous time limit of the constrained ensemble Kalman inversion, we present a complete convergence analysis for linear forward problems. We adopt techniques from filtering, such as variance inflation, which are crucial in order to improve the performance and establish a correct descent. These benefits are highlighted through a number of numerical examples on various inverse problems based on partial differential equations.


Introduction
Inverse problems aim to recover a quantity of interest from perturbed, noisy measurements. Traditionally, inverse problems were solved in a least-squares manner, where the solution corresponds to the minimizer of a suitable functional [14]. A more recent approach is statistical: the quantity of interest is a probability distribution constructed via Bayes' theorem, known as the Bayesian approach, see e.g. [19,35]. The advantage of the latter is that it aids in quantifying the uncertainty in the solution through statistical properties. Common algorithms designed for Bayesian inversion include sampling-based methods such as Markov chain Monte Carlo (MCMC) methods, but also variational Bayes' methods. One recent method aimed at solving Bayesian inverse problems through data assimilation methodologies [24,29] is ensemble Kalman inversion (EKI).
EKI can be viewed as the application of the ensemble Kalman filter (EnKF) [16,15,21,25] to inverse problems. It was first proposed by Iglesias et al. [18,26] as a derivative-free method for inverse problems, offering a cheaper approximation of the solution compared to traditional methods. EKI provides a unique perspective: despite being aimed at Bayesian inversion, its methodology is not based on sampling like MCMC, but is instead variational in nature. Since its formulation, a number of research directions have been considered, such as applications, building theory and applying uncertainty quantification techniques [8,9,11,17,31].
However, the basic version of the EKI does not allow one to incorporate additional constraints on the parameters, which often arise in applications due to additional knowledge of the system. In the following, we will focus on the efficient incorporation of box constraints. An example of this type of constraint is hierarchical EKI [11], where the hyperparameters are often defined through a uniform distribution. It is well known that the EKI may lead to estimates of the unknown parameters which are infeasible, i.e. the estimates do not satisfy the box constraints imposed by the uniform distribution on the hierarchical parameters. As a result, there is strong motivation to study the incorporation of constraints for the EKI.
Introducing constrained optimization for the EnKF has been a challenge which has drawn increasing attention in recent years. A literature overview of existing methods for the treatment of linear and nonlinear constraints for Kalman-based methods can be found in [1,34]. The projection of the estimates onto the feasible set is a very common approach, see e.g. [20,37], which can be generalized to nonlinear constraints by linearization ideas. Most of the variants are motivated by interpreting the Kalman-based updates as solutions of a corresponding optimization problem, see [1] for more details. This viewpoint allows one to straightforwardly include constraints on the parameters and states. In [13], the authors suggest a new approach to handle linear equality and inequality constraints for the EnKF and EKI by reparameterizing the solution of the optimization problem in the range of the covariance, i.e. by seeking the solution of the optimization problem in a subspace defined by the initial ensemble. In this work, we will adopt the optimization viewpoint, i.e. we will view the EKI as a derivative-free optimizer for the least-squares misfit functional (cp. [31]), and motivate the incorporation of additional constraints on the unknown parameters via projection-based methods onto the feasible set. To illustrate the idea, we focus in the following on the incorporation of box constraints, i.e. we will introduce a variant of the EKI ensuring that the estimate of the unknown parameter remains within a box of the parameter domain. Our work builds on the theory of Bertsekas [6,7] and others [32,33] for projected preconditioned gradient methods. For inverse problems, box-constrained optimization has been applied before, but in a different context and for different applications; we refer the reader to [28].
We will show that a simple projection of the EKI estimate onto the feasible set does not necessarily lead to a convergent scheme. To accommodate this, we propose using techniques from data assimilation, such as variance inflation [2,3,22], to ensure that one can attain the correct descent direction. From this, our analysis will consist of an existence and uniqueness result, a quantification of the ensemble collapse and the convergence of the residuals. We aim to compare this modified projected method to the simple projection of its continuous-time limit. The limit leads to an ordinary differential equation with discontinuous right hand side. The presented theory is based on a smoothed version of the limit, borrowing ideas from barrier methods.
In addition to the EKI based on perturbed observations, we will also consider the ensemble square root filter (ESRF) applied to inverse problems. The ESRF [23,27,36] is a modification of the EnKF, with the key difference of being deterministic, as there is no inclusion of perturbed observations. We emphasize that our work is primarily focused on the EnKF for inverse problems; we will note throughout the paper whether the various results can be generalized to the ESRF. In order to justify the proposed projected EKI, we provide numerical examples demonstrating the improvements, both for the original projected EKI and for the modified projected EKI with variance inflation.
The main contributions of our work can be summarized as follows:
• We propose a new variant of the EKI and the ESRF for inverse problems which allows one to incorporate box constraints on the unknown parameters. The modification of the original EKI is motivated by viewing the algorithm as a preconditioned gradient-like flow. The scheme introduces a tailored variance inflation to ensure descent directions with respect to the misfit functional in each iteration.
• We derive a complete convergence analysis for the EKI with perturbed observations and the ESRF for inverse problems, based on the continuous time limit of the suggested variant in the case of a linear forward problem.
• The validity of the obtained results is highlighted through numerical examples. All examples are based on partial differential equations (PDEs) with linear and nonlinear forward operators.
1.1. Outline. The article is structured as follows: in Section 2 we provide an introduction to box-constrained optimization. We present this in a general setting before discussing its application to EKI, which we also review. Section 3 is dedicated to the derivation of the projected continuous time limit and its gradient flow structure. Convergence and accuracy results are presented in the linear setting, where we also introduce the notion of variance inflation. Section 4 is devoted to numerical experiments on PDE-constrained inverse problems. The purpose of this section is to verify the theoretical findings on both linear and nonlinear problems. Finally, in Section 5 we conclude with a number of remarks and avenues of future work.

Background on EKI and box-constrained optimization
The goal of computation for this work is to infer unknown parameters u ∈ X from noisy data y ∈ R^K of the form
y = G(u) + η, η ∼ N (0, Γ), (2.1)
where G : X → R^K is our parameter-to-observation mapping, K ∈ N denotes the number of observations and Γ is the covariance of the observational noise. For simplicity, we will assume that the parameter space is finite dimensional, i.e. X = R^n. Most of the results presented in the following sections can be straightforwardly generalized to the infinite-dimensional setting under suitable assumptions on the forward operator G. To avoid the technicalities arising from the analysis of infinite-dimensional differential equations and optimization problems, we work under the assumption that the parameter space is finite dimensional (possibly after applying a finite-dimensional approximation of the unknown quantity u).
2.1. Continuous time limit of the EKI. We briefly describe the EKI algorithm below and refer to [16,18,31] for more details on the derivation of the method. EKI works by updating an initial ensemble of J particles {u_0^(j)}_{j=1}^J through a two-step procedure. The EKI estimate of (2.1) is given by the usual EnKF update formula, also known as the analysis step, where we include perturbed observations y_{n+1}^(j) = y + ξ_{n+1}^(j):
u_{n+1}^(j) = u_n^(j) + C^{up}(u_n)(C^{pp}(u_n) + h^{-1}Γ)^{-1}(y_{n+1}^(j) − G(u_n^(j))). (2.2)
Letting the step size h go to zero leads to the continuous time limit
du_t^(j)/dt = (1/J) Σ_{k=1}^J ⟨G(u_t^(k)) − Ḡ_t, y − G(u_t^(j))⟩_Γ (u_t^(k) − ū_t), (2.3)
with t ∈ (0, T], j ∈ {1, . . . , J} and initial condition u_0^(j), where ū_t and Ḡ_t denote the sample means of the particles and of their forward evaluations. Here ⟨·, ·⟩_Γ := ⟨·, Γ^{-1}·⟩ denotes the weighted Euclidean inner product in R^K. Note that the limit is derived by neglecting the perturbations of the observations in each iteration, see [31] for more details. The analysis of the stochastic differential equation corresponding to the limit of (2.2) including the perturbed observations will be subject to future work.
In the case of a linear forward problem, the limit (2.3) can be equivalently viewed as a preconditioned gradient flow
du_t^(j)/dt = −C(u_t) ∇_u Φ(u_t^(j); y), (2.4)
with ∥·∥_Γ := ∥Γ^{-1/2} ·∥ denoting the weighted Euclidean norm in R^K, misfit functional Φ(u^(j); y) = (1/2)∥y − G(u^(j))∥²_Γ and empirical covariance C(u).

2.2. Continuous time limit of the ESRF for inverse problems. Using a deterministic transformation of the ensemble satisfying the Kalman equations leads to the ESRF for inverse problems, see [30]. Based on the results from Reich et al. [4,5], the continuous time limit of the ESRF for inverse problems is given by
du_t^(j)/dt = −C^{up}(u_t) Γ^{-1} ((G(u_t^(j)) + Ḡ_t)/2 − y), (2.7)
with t ∈ (0, T], j ∈ {1, . . . , J} and initial condition u_0^(j); in the linear setting it admits an analogous gradient flow structure.
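For a linear forward map, the preconditioned gradient flow above can be integrated with a simple explicit Euler scheme. The following sketch is our own minimal illustration, not the paper's setup: the forward map, noise covariance, step size and all names are illustrative choices. It shows the total least-squares misfit decreasing along the flow:

```python
import numpy as np

def eki_flow_step(U, A, y, Gamma_inv, dt):
    """One explicit Euler step of the continuous-time EKI limit: for a linear
    forward map G(u) = A u this is the preconditioned gradient flow
    du^(j)/dt = -C(U) grad Phi(u^(j)) = C(U) A^T Gamma^{-1} (y - A u^(j))."""
    ubar = U.mean(axis=1, keepdims=True)      # ensemble mean, shape (n, 1)
    E = U - ubar
    C = E @ E.T / U.shape[1]                  # empirical covariance C(U)
    drift = C @ A.T @ Gamma_inv @ (y[:, None] - A @ U)
    return U + dt * drift

rng = np.random.default_rng(0)
n, K, J = 4, 4, 10
A = rng.standard_normal((K, n))
y = A @ rng.standard_normal(n)                # noise-free synthetic data
Gamma_inv = np.eye(K)                         # illustrative unit noise covariance

def misfit(U):
    r = y[:, None] - A @ U
    return 0.5 * np.sum(r * (Gamma_inv @ r))  # sum of Phi over the ensemble

U = rng.standard_normal((n, J))
m0 = misfit(U)
for _ in range(2000):
    U = eki_flow_step(U, A, y, Gamma_inv, dt=1e-3)
m1 = misfit(U)                                # misfit decreases along the flow
```

Since the empirical covariance is positive semi-definite, the flow is a descent flow for the misfit, which the small-step Euler discretization inherits.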
Both variants of the ensemble Kalman inversion satisfy the well-known subspace property, which states that the estimate will be a linear combination of the initial ensemble members.
Lemma 2.1. [31] Let A be the linear span of the initial ensemble {u_0^(j)}_{j=1}^J. Then u_t^(j) ∈ A for all j ∈ {1, . . . , J}, n ∈ N and t ∈ [0, T] for the solutions of (2.4) and (2.7), respectively.
The preconditioned gradient flow structure opens up the perspective to include additional constraints such as box constraints or, more generally, nonlinear constraints. The remainder of this section is devoted to the introduction of box-constrained optimization. We will focus in the following on the linear case, i.e., assuming that G(·) = A· with A ∈ L(X , R^K), to illustrate the basic ideas. Then, the optimization problem consists of a linear least-squares problem, i.e. Φ(u; y) = (1/2)∥y − Au∥²_Γ. We will modify the differential equations (2.4) and (2.7), respectively, to include the constraints using the ideas introduced below. We will see that we can interpret the suggested modifications as a variance inflation technique. In particular, this viewpoint implies that the subspace property no longer holds for the modified versions of the EKI.

2.3. The linear box-constrained optimization problem. Motivated by the optimization perspective on the EKI, we consider the following optimization problem. The objective function consists of the least-squares function Φ : X → R, Φ(u; y) = (1/2)∥y − Au∥²_Γ. We further define the constraints by
h_j(u) = ⟨c_j, u⟩ + δ_j, j = 1, . . . , m,
for c_j ∈ R^n, δ_j ∈ R, i.e. the feasible set Ω is given by Ω = {u ∈ X : h_j(u) ≤ 0, j = 1, . . . , m}. The constrained optimization problem is then given by
min_{u∈Ω} Φ(u; y). (2.9)
Note that the optimization problem (2.9) is convex, which implies that necessary optimality conditions are also sufficient [7]. Let u* be a Karush-Kuhn-Tucker (KKT) point for (2.9), i.e. there exists λ* ∈ R^m with λ*_j ≥ 0 such that
∇Φ(u*; y) + Σ_{j=1}^m λ*_j c_j = 0, λ*_j h_j(u*) = 0, j = 1, . . . , m;
then u* is a global minimizer of (2.9). Box constraints are a special instance of the feasible set Ω, i.e. c_j = ±e_j, where e_j is the j-th unit vector, and δ_j are the lower or upper bounds on the components of u.

Remark 2.2. In case Aᵀ Γ^{-1} A is symmetric and positive definite, the optimization problem (2.9) is strictly convex. In particular, there exists at most one stationary point u*, which is the global minimum of the problem (2.9) (cp. [7], Proposition 2.1.1). However, please note that the assumption on the regularity of Aᵀ Γ^{-1} A is in general not satisfied for inverse problems due to the ill-conditioning of the problem.
2.3.1. The projected gradient method for box-constraints. We briefly discuss the projected gradient method to numerically solve (2.9) and refer the reader, e.g., to the work of Bertsekas [7] for a more detailed description of the method (in the discrete time setting).
Let B = {u ∈ R^n : a_i ≤ u_i ≤ b_i, i = 1, . . . , m}, m ≤ n, denote a box. We can define a general projection P_B : R^n → B, mapping the space of the ensemble to the specific box. Since we will fix the box B for the following part, we will write P instead of P_B. We define the projection P componentwise as
P(u)_i = min(b_i, max(a_i, u_i)), i = 1, . . . , m, P(u)_i = u_i, i = m + 1, . . . , n.
The projected gradient method with step size α_k > 0 is based on the iteration
u_{k+1} = P(u_k − α_k ∇Φ(u_k; y)). (2.10)
One can derive a continuous time limit by letting α_k go to zero. By using directional derivatives we obtain
du_t/dt = lim_{α→0} (P(u_t − α∇Φ(u_t; y)) − u_t)/α. (2.11)
More details on the continuous time limit of the projected EKI can be found in Section 3.1.
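As a minimal illustration of the projected gradient iteration for a box, consider the following sketch; the quadratic, the bounds and the step size are our own illustrative choices, not data from the paper:

```python
import numpy as np

def project_box(u, lo, hi):
    """Componentwise projection P onto the box {lo <= u <= hi}."""
    return np.minimum(hi, np.maximum(lo, u))

def projected_gradient(grad, u0, lo, hi, alpha=0.1, iters=500):
    """Projected gradient iteration u_{k+1} = P(u_k - alpha * grad(u_k))."""
    u = u0.copy()
    for _ in range(iters):
        u = project_box(u - alpha * grad(u), lo, hi)
    return u

# least-squares misfit Phi(u) = 0.5 * ||y - A u||^2 (Gamma = I for simplicity)
A = np.array([[2.0, 0.0], [0.0, 1.0]])
y = np.array([2.0, 3.0])                 # unconstrained minimizer is u = (1, 3)
grad = lambda u: A.T @ (A @ u - y)
lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])

u_star = projected_gradient(grad, np.zeros(2), lo, hi)
# the infeasible component is clipped to the bound: u_star -> (1, 1)
```

Because the problem is separable here, the constrained minimizer is simply the clipped unconstrained one; for non-diagonal A the projection and the minimization interact nontrivially, which is exactly the difficulty discussed below for preconditioned variants.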
Remark 2.3. Since the right hand side (RHS) of (2.11) is discontinuous, it is not obvious that a solution to this system exists. To ensure unique existence, we consider a smoothed version of (2.11), approximating the limit with ideas inspired by barrier methods. We introduce the parametrized, convex optimization problems with parameter ι > 0 and inequality constraints h_j(u) ≤ 0, j = 1, . . . , m, given by
min_u Φ(u; y) − (1/ι) Σ_{j=1}^m log(−h_j(u)). (2.13)
As ι → ∞, the log-barrier terms become closer to the indicator function of the feasible set of the original problem. Equivalently, we consider the problems min_u ιΦ(u; y) − Σ_{j=1}^m log(−h_j(u)).

Theorem 2.4. Let u_0 ∈ Ω and u(t) denote the solution of the smoothed initial value problem of (2.11) with u(0) = u_0. Further assume that Aᵀ Γ^{-1} A is positive definite and that there exists a (unique global) minimizer u*_ι of (2.13). Then for each ι > 0 it holds true that lim_{t→∞} u(t) = u*_ι, i.e. the solution u(t) converges to the (global) minimizer of (2.13).
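The barrier smoothing can be illustrated in one dimension. The following sketch is our own toy problem, not from the paper: minimizing Φ(u) = (1/2)(u − 2)² over [0, 1] via the smoothed gradient flow keeps the iterate strictly feasible, and the barrier minimizer u*_ι approaches the constrained minimizer u* = 1 as ι grows:

```python
def barrier_grad(u, grad_phi, lo, hi, iota):
    """Gradient of the smoothed objective Phi(u) - (1/iota) * sum_j log(-h_j(u))
    for the box constraints h(u) = (lo - u, u - hi) <= 0 in one dimension."""
    return grad_phi(u) + (1.0 / iota) * (1.0 / (hi - u) - 1.0 / (u - lo))

# Phi(u) = 0.5 * (u - 2)^2 with feasible set [0, 1]: the constrained
# minimizer u* = 1 sits on the boundary of the box
grad_phi = lambda u: u - 2.0
lo, hi = 0.0, 1.0

def minimize_barrier(iota, u=0.5, dt=1e-4, steps=300_000):
    """Explicit Euler integration of the smoothed gradient flow."""
    for _ in range(steps):
        u = u - dt * barrier_grad(u, grad_phi, lo, hi, iota)
    return u

u_coarse = minimize_barrier(iota=10.0)
u_fine = minimize_barrier(iota=1000.0)
# the barrier keeps the iterate strictly inside (0, 1); as iota grows,
# the minimizer u*_iota moves toward the constrained minimizer u* = 1
```

The small step size compensates for the stiffness of the barrier term near the boundary; an implicit or adaptive integrator would be the more robust choice in practice.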
Proof. We define V (u) = (1/2)∥u − u*_ι∥² and prove that V is a strict Lyapunov function by the strict convexity of the optimization problem; the flow of V then satisfies dV(u(t))/dt < 0 for u(t) ≠ u*_ι.

Remark 2.5. By duality arguments (see [10, Section 11.2] for more details), the accuracy of the approximation can be bounded by Φ(u*_ι; y) − Φ(u*; y) ≤ m/ι, where u* denotes the minimizer of the original problem (2.9). In particular, Φ(u*_ι) → Φ(u*) for ι → ∞ and thus u*_ι → u*.

Remark 2.6. Formally, we obtain in the limit the same result as stated in Theorem 2.4 for u(t) being a solution of the initial value problem (2.11): using the strict convexity of Φ and the optimality of u*, we conclude formally that lim_{t→∞} u(t) = u*.
Corollary 2.7. Let u_0 ∈ Ω and u(t) denote the solution of the smoothed initial value problem of (2.11) with u(0) = u_0. Further assume that there exists a (global) minimizer of (2.9). Then it holds true that lim_{t→∞} Au(t) = Au*_ι, where u*_ι is a KKT-point of (2.13). Proof. Let u*_ι be an arbitrary KKT-point of (2.13). The flow in the observation space, i.e. the image of the flow under A, corresponds to the gradient flow of a strictly convex optimization problem in the observation space. Thus, by the same arguments as in Theorem 2.4, the claim follows.
Remark 2.8. Formally, we again obtain in the limit the same results as stated in Corollary 2.7 for u(t) being a solution of (2.11): let u* be an arbitrary KKT-point of (2.9). Using the Frobenius norm ∥·∥_F, one obtains an integral inequality for the flow in the observation space, and an application of Gronwall's integral inequality yields the assertion.

2.3.2. The preconditioned projected gradient method. We consider the preconditioned version of the iteration (2.10) in discrete time, given by
u_{k+1} = P(u_k − α_k D_k ∇Φ(u_k; y)),
where D_k is a symmetric, positive definite matrix. It is well known that an arbitrary choice of D_k does not guarantee descent for any choice of α > 0. We will briefly discuss an example demonstrating that the preconditioning of the gradient flow does not lead to a descent direction in general (cp. [6]). This is highlighted in Figure 1. Example 2.9. We consider a 2-dimensional quadratic example: we define a quadratic function F and consider the minimization problem of F with the constraints x_1 ∈ R, x_2 ≤ 0.
The gradient of F is evaluated at the current iterate x_k = (1, 0)ᵀ. As a preconditioner, we consider a symmetric, positive definite matrix D. For α > 0, the next iterate is given by
x_{k+1} = P(x_k − αD∇F(x_k)),
where P is the projection onto R × R_{≤0}. Then F(x_{k+1}) > F(x_k), i.e. for all α > 0 the function value of the objective function increases.
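This phenomenon is easy to reproduce numerically. The following instance uses our own choice of convex quadratic F, iterate and preconditioner D, not necessarily the exact data of Example 2.9; the projected preconditioned step increases F for every step size α:

```python
import numpy as np

# convex quadratic F(x) = 0.5*x1^2 - x1*x2 + x2^2 (Hessian [[1,-1],[-1,2]] is PD)
F = lambda x: 0.5 * x[0] ** 2 - x[0] * x[1] + x[1] ** 2
gradF = lambda x: np.array([x[0] - x[1], -x[0] + 2.0 * x[1]])

D = np.array([[1.0, 4.0], [4.0, 17.0]])         # symmetric positive definite (det = 1)
P = lambda x: np.array([x[0], min(x[1], 0.0)])  # projection onto R x R_{<=0}

x = np.array([1.0, 0.0])                        # feasible current iterate
increase = []
for alpha in [1e-3, 1e-2, 1e-1, 1.0]:
    x_new = P(x - alpha * D @ gradF(x))         # projected preconditioned step
    increase.append(F(x_new) > F(x))            # objective goes UP for every alpha
```

Here D∇F(x) = (−3, −13)ᵀ is a genuine descent direction, but the projection discards the x₂-component, and the remaining motion in x₁ points uphill, so F(x_{k+1}) = 0.5(1 + 3α)² > 0.5 = F(x_k) for all α > 0.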
Example 2.9 shows that a simple projection strategy for the EKI, which is a preconditioned gradient flow in the linear case, does not lead to a convergent descent method in general.
To ensure a descent direction for the preconditioned projected gradient method, we follow the approach of [6, Proposition 1], which suggests using matrices D_k that are diagonal with respect to the subset of indices containing the (nearly) active constraints. We will give more details on the modification of the preconditioner in the context of the EKI in the following. In particular, we will make use of data assimilation methods such as variance inflation to ensure a descent. This will be discussed in greater detail in Section 3.

Convergence analysis
In this section we introduce a variant of the projected EKI and derive the continuous limit of the algorithm. We will provide a complete convergence analysis of the proposed modification in the linear setting. This will include an accuracy result of the proposed algorithm, the analysis of the ensemble collapse and the convergence to the truth.
Recall that the componentwise projection onto the box B is denoted by P. To incorporate the projection into our scheme, we modify the procedure described in Section 2. We define the projected sample means as ū_{n,P} = (1/J) Σ_j u_{n,P}^(j) and Ḡ_{n,P} = (1/J) Σ_j G(u_{n,P}^(j)), and similarly for the covariances C^{up}_{n,P} and C^{pp}_{n,P}. Using these estimates, we can construct our update formula in its closed form,
ũ_{n+1,P}^(j) = u_{n,P}^(j) + C^{up}_{n,P}(C^{pp}_{n,P} + h^{-1}Γ)^{-1}(y − G(u_{n,P}^(j))),
u_{n+1,P}^(j) = P(ũ_{n+1,P}^(j)),
which is an almost identical copy of the update formula for the non-projected EnKF, followed by a projection step. In order to analyze the projection, we next derive a continuous time limit.
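A discrete sketch of the projected update may look as follows. This is a minimal implementation under our own naming and on our own toy test problem; the projection is the componentwise clip onto the box:

```python
import numpy as np

def projected_eki_step(U, G, y, Gamma, h, lo, hi):
    """One step of a projected EKI sketch: the usual EnKF analysis update with
    step size h, followed by the componentwise projection P onto the box
    [lo, hi] (shapes and names are our own choices)."""
    J = U.shape[1]
    GU = np.stack([G(U[:, j]) for j in range(J)], axis=1)   # forward evaluations
    ubar = U.mean(axis=1, keepdims=True)
    gbar = GU.mean(axis=1, keepdims=True)
    Cup = (U - ubar) @ (GU - gbar).T / J                    # C^{up}
    Cpp = (GU - gbar) @ (GU - gbar).T / J                   # C^{pp}
    K = Cup @ np.linalg.inv(Cpp + Gamma / h)                # Kalman gain
    U_tilde = U + K @ (y[:, None] - GU)                     # analysis step
    return np.clip(U_tilde, lo[:, None], hi[:, None])       # projection P

rng = np.random.default_rng(1)
n, m, J = 3, 3, 8
A = rng.standard_normal((m, n))
G = lambda u: A @ u
y = A @ np.array([2.0, -2.0, 0.5])       # truth partly outside the box below
lo, hi = -np.ones(n), np.ones(n)
Gamma = 0.01 * np.eye(m)

U = rng.uniform(-1.0, 1.0, size=(n, J))  # feasible initial ensemble
for _ in range(50):
    U = projected_eki_step(U, G, y, Gamma, h=0.5, lo=lo, hi=hi)
```

By construction, every iterate stays inside the box, even though the truth generating the data lies partly outside it.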
3.1. Continuous time limit. We now consider deriving a continuous time limit for the projected EnKF for inverse problems. The form of the limit will differ from that described in Section 2, as we wish to quantify the limit in terms of directional derivatives. We begin by recalling the equations in closed form, given by the componentwise increments of the projected update. By using the Neumann expansion for part of the Kalman gain, we observe for positive semi-definite C(u) that the increments are of first order in the step size, while the components i = m + 1, . . . , n remain unconstrained.

Finally, we obtain the continuous time limit for the projected EnKF as
du_t^(j)/dt = lim_{α→0} (P(u_t^(j) + α v(u_t^(j))) − u_t^(j))/α, (3.4)
where the drift v(u^(j)) is the increment of the unprojected EnKF. In the linear case, we can write v(u^(j)) = −C(u)∇Φ(u^(j); y) with Φ(u) = (1/2)∥Au − y∥²_Γ from the minimization problem (2.9). Remark 3.1. Similarly to Remark 2.3, the RHS of (3.4) is discontinuous, and we will consider the corresponding smoothed system (3.5) for j = 1, . . . , J.

3.2. Variance inflation.
The EnKF and the ESRF are known to have certain difficulties in high dimensional settings, in particular when the dimension of the parameter space is considerably larger than the ensemble size. In this case the empirical covariance matrices are rank deficient and the uncertainty is underestimated. One common way to alleviate this issue is variance inflation [2,3,22]. The idea is to inflate the empirical covariance by another covariance matrix, typically that of the prior. This usually takes one of two forms, additive or multiplicative inflation,
C(u) → C(u) + ϑ C_0 (additive), (3.6)
C(u) → (1 + ϑ) C(u) (multiplicative), (3.7)
where C_0 is our prior covariance and ϑ > 0 is a constant. In the context of EKI, variance inflation was first applied in [31]. To keep our work consistent with previous work, we stick to the case of additive inflation (3.6). By incorporating this form of inflation, the gradient flow structure is modified to
du_t^(j)/dt = −(C(u_t) + ϑ C_0) ∇_u Φ(u_t^(j); y).
We note that, if one applies the inflation in the projected setting, the limit differs in that the updates u are projected under P.
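As a small sketch of the two inflation forms (the helper name and the multiplicative variant written as (1 + ϑ)C are our reading of the definitions above), additive inflation with a full-rank prior covariance in particular repairs the rank deficiency of a small ensemble:

```python
import numpy as np

def inflate(C, C0, theta, mode="additive"):
    """Variance inflation of an empirical covariance C:
    additive:       C + theta * C0   (C0 the prior covariance)
    multiplicative: (1 + theta) * C"""
    if mode == "additive":
        return C + theta * C0
    return (1.0 + theta) * C

# a rank-deficient ensemble covariance (J - 1 < n) becomes full rank after
# additive inflation with a full-rank prior covariance C0
rng = np.random.default_rng(2)
n, J = 6, 3
U = rng.standard_normal((n, J))
ubar = U.mean(axis=1, keepdims=True)
C = (U - ubar) @ (U - ubar).T / J            # rank <= J - 1 = 2
C_infl = inflate(C, C0=np.eye(n), theta=0.5)
rank_before = np.linalg.matrix_rank(C)
rank_after = np.linalg.matrix_rank(C_infl)
```

Multiplicative inflation, by contrast, rescales the ensemble spread but cannot leave the subspace spanned by the particles, which is why the additive form is the one that breaks the subspace property.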

3.3. Convergence analysis in the linear case.
In the linear case, we view the continuous time limit of the projected EnKF as a preconditioned gradient flow.

3.3.1. Transformed method for the EnKF. In Example 2.9 we have seen that, for preconditioned projected gradient methods, it is not possible to ensure a descent direction for an arbitrary choice of a positive definite matrix D_k. Based on the results of Bertsekas [6], we will transform (3.4) in such a way that the preconditioner is diagonal with respect to an index set I⁺(u) which is built similarly to the set from (2.15). Since we consider a system of particles, we use a preconditioner which is diagonal with respect to an index set depending on the whole ensemble of particles u = (u^(j))_{j=1,...,J}; in particular, we set I⁺(u) to consist of the indices for which at least one particle is (nearly) active at the corresponding bound. Similarly, we could also choose the index set Î⁺(u), requiring the condition for all particles. In this work we will focus on the choice of I⁺(u). Therefore, we consider the projected preconditioned gradient flow
du_t^(j)/dt = lim_{α→0} (P(u_t^(j) − α D(u_t) ∇_u Φ(u_t^(j); y)) − u_t^(j))/α, (3.9)
where the preconditioner D(u) is diagonal with respect to I⁺(u). Let {x^(j)}_{j=1}^J be a system of particles in R^n. Without loss of generality we assume I⁺(x) = {1, . . . , r} for r ≤ m. Hence, the preconditioner D(x) can be written in the block form
D(x) = diag(Λ(x), Ĉ(x)), (3.10)
with Λ(x) ∈ R^{r×r} diagonal and (Ĉ(x))_{i,j∈{1,...,n−r}} = (C(x))_{i,j∈{r+1,...,n}}.
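A sketch of such a transformed preconditioner follows; this is our reading of the block structure: decouple the active indices from the rest and keep the covariance block on the free indices. Keeping the diagonal entries of C on the active block is one possible choice for Λ(x), not necessarily the paper's:

```python
import numpy as np

def transformed_preconditioner(C, active):
    """Make a preconditioner diagonal with respect to the active index set
    I+(u): zero the coupling between active and free indices, keep a diagonal
    entry on each active index and the full covariance block on the rest."""
    D = C.copy()
    for i in active:
        D[i, :] = 0.0
        D[:, i] = 0.0
        D[i, i] = C[i, i]   # one possible diagonal choice for Lambda(x)
    return D

C = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.5]])
D = transformed_preconditioner(C, active=[0])
# active coordinate 0 is decoupled: row/column 0 vanish except the diagonal
```

With this structure, the preconditioned step cannot transfer a projected-away active component into the free coordinates, which is exactly the mechanism that restored descent in Bertsekas' analysis.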
Remark 3.2. Consider the continuous time limit of the projected EnKF in equation (3.4). When we introduce variance inflation for (3.5) with the help of a diagonal matrix, e.g. by replacing C(u) with C(u) + εI as in (3.11), then the preconditioner D(u) = C(u) + εI is diagonal with respect to the index set I⁺(u) and can be written in the form of (3.10). This can be seen as follows: since u^(j) evolves by (3.9), the particles remain feasible, i.e. u_i^(j) ∈ [a_i, b_i] for all i and j. For simplicity, we assume m = n. Let i ∈ I⁺(u); since the particles are feasible for all t ≥ 0 and the i-th constraint is active, we obtain u_i^(j) − ū_i = 0 for all j. This leads to C(u)_{ik} = 0 for i ∈ I⁺(u), k ∈ {1, . . . , n} and i ≠ k. Hence, we can view the projected EnKF with variance inflation as a special case of the presented method.

3.3.2. Transformed method for the ESRF. The continuous limit (3.4) and the gradient flow (3.5) can be adapted to the case of the ESRF, and the same holds for the projected preconditioned limit above. In order to do so, the closed form needs to be changed such that the limit follows a form similar to (2.4).
The resulting transformed method for the ESRF is given by (3.12), with the preconditioner again given by (3.10).
For simplicity we will focus in the following on the transformed method for EKI (3.9). All results can be easily adapted to the transformed method (3.12) for the ESRF.

Convergence Results.
Let u* be a KKT-point for (2.9) and (u_t^(j)) be the solution of the system (3.9). Our aim is to prove the convergence of the residual r_t^(j) = u_t^(j) − u*. To show convergence, we have to control the empirical covariance C(u_t) over time; in particular, in the following proposition we prove that we can bound the ensemble spread e_t^(j) = u_t^(j) − ū_t. Let u_0^(j) ∈ Ω, j = 1, . . . , J, be the initial ensemble and let u(t) denote the solution of the approximated flow of (3.9). The positive definiteness of D(u_t) yields dV(u_t)/dt ≤ 0, with D(u_t) = VΛVᵀ, V an orthogonal matrix of eigenvectors and Λ the diagonal matrix of eigenvalues.
We are now ready to prove the following theorem regarding the residual in the parameter space. Since Φ is a strictly convex function, we obtain the following: let u_0^(j) ∈ Ω, j = 1, . . . , J, be the initial ensemble, let Aᵀ Γ^{-1} A be positive semi-definite, ε > 0, and assume that there exists a (global) minimizer for (2.13). Then it holds true that lim_{t→∞} Au_t^(j) = Au*, where u* is a KKT-point of (2.13).
Proof. Follows similarly to the proof of Corollary 2.7.
As mentioned in Remark 3.2 the projected EnKF with variance inflation can be viewed as a special case of the developed transformed method in (3.9). Thus, we can adopt the derived results also for this case.
Let (u_t^(j)) be the solution of the projected EnKF with variance inflation introduced by a diagonal matrix, i.e. the solution of the system (3.4) with v_i(u^(j)) given by (3.11) and initial ensemble u_0^(j) ∈ Ω. Since the block matrix diag(Ĉ(x), 0) is symmetric positive semi-definite, a similar argument as before leads to the assertion.
As shown before, we can also prove the convergence of the residuals in the parameter space when we reduce the variance inflation over time. For the numerics, we emphasize that we consider a system with noise-free measurements in order to verify the presented theory, and that we consider a smoothed version of the algorithms in the continuous time limits.

Numerical experiments
4.1. Linear inverse problem. We now seek to numerically check whether the box-constrained optimization introduced in Section 3 works in practice. For the continuum limit, we solve the ODE through the MATLAB solver ode45. In both cases we use an ensemble size of J = 5. Our forward model will be a linear 1D elliptic PDE of the form (4.1), where we seek a solution p ∈ U := H¹₀(D). The inverse problem associated with (4.1) is the recovery of u ∈ X from 16 pointwise measurements of p. Our forward solver for (4.1) is a piecewise linear finite element method with mesh size h = 2⁻⁸. We will assume our computational domain is D = (0, π). Our mapping G(·) = A· is assumed to be linear, where we set A = O ∘ G, with O : U → R^K the observation operator and G : X → U the forward operator. We specify the covariance of the noise as Γ = γ²I with γ = 0.01, and we choose T = 10⁶. The initial ensemble u_0 is chosen through a Fourier basis representation, and the inflation parameters are taken as α = 0.75 and R = 1.
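As an illustration of a linear forward map of this type, the following sketch assumes the model −p″(x) = u(x) on D = (0, π) with homogeneous Dirichlet boundary conditions (one common choice; the exact PDE (4.1) may differ) and discretizes by centred finite differences rather than finite elements; the grid size, observation indices and names are our own:

```python
import numpy as np

# sketch of a linear forward map on D = (0, pi): we assume -p''(x) = u(x),
# p(0) = p(pi) = 0, discretized with centred finite differences
N = 2 ** 8                                  # interior grid points
x = np.linspace(0.0, np.pi, N + 2)[1:-1]
h = x[1] - x[0]
L = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h ** 2  # discrete -d^2/dx^2

def forward(u):
    """Grid values of u -> solution p of -p'' = u with zero boundary values."""
    return np.linalg.solve(L, u)

# observation operator O: 16 pointwise measurements of p
obs_idx = np.linspace(0, N - 1, 16).astype(int)
def observe(p):
    return p[obs_idx]

u = np.sin(x)        # for this model -p'' = sin has exact solution p = sin
p = forward(u)
y = observe(p)
```

Since the composition u ↦ observe(forward(u)) is linear in u, assembling its matrix column by column gives the operator A = O ∘ G used throughout the linear analysis.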
As stated in Remark 3.2, the projected EnKF with variance inflation can be viewed as a special case of the transformed version provided in (3.9). This suggests that both methods should perform similarly and outperform the original EnKF without constraints. For the numerics, we refer to the projected EnKF without variance inflation as the projected EnKF, and to the projected EnKF with variance inflation as the transformed EnKF. For the linear case, we compare the projected EnKF, the transformed version and the original method without projection onto the box. Note that the numerical solution of both the projected EnKF and the transformed EnKF is based on a smoothed version of the indicator function 1_y(x), approximated by a piecewise linear function 1̃_y(x) with 1̃_y(y) = 1 and 1̃_y(y ± ι) = 0 for the upper and lower bounds, respectively.
Our numerics will consist of two different cases:
(1) The truth u† lies outside the box and O gives full observations.
(2) The truth u† lies outside the box and O gives low dimensional observations.
To understand the effect of the different forms of observations: if we have full observations, then there exists a unique KKT-point of (2.9), implying the true parameter is a KKT point. If we have low dimensional observations, then Aᵀ Γ^{-1} A is only positive semi-definite; thus, we can only expect convergence of the cost functions. For the first case we choose full observations, and for the latter we choose 15 observations.
Our first set of experiments for the linear PDE is shown in Figures 2-4, where we assume that we have full observations. The left-hand image of Figure 2 compares the performance of the different methods at reconstructing the truth. As is known for the EnKF, it can exhibit a smooth reconstruction, which is exactly what we observe. For the projected EnKF we notice a similar performance; however, it takes the constraints into consideration, as part of its reconstruction lies on the boundary. When analyzing the transformed EnKF, we see an improvement over the previous methods, which is also evident from the comparison of the observations in the right-hand image. Figure 3 shows the ensemble collapse of each method, which we see is achieved. We observe that the spread behaves at an almost identical rate for the EnKF and the projected EnKF. To highlight further the benefit of using the transformed version, Figure 4 shows a sharper decrease in both the KKT residuals and the difference between the misfit functional and the global minimum. While the projected method eventually flattens at 10⁻¹ in both quantities, the transformed version continues to achieve smaller differences.

4.2. Non-linear inverse problem. To demonstrate the effectiveness of the transformed projection, we consider a 2D nonlinear PDE. We will use an analogous nonlinear version of our linear elliptic PDE, which arises in subsurface flow. The forward problem associated with the Darcy flow model is to use the permeability κ ∈ L∞(D) to solve
−∇ · (κ∇p) = f, x ∈ D, (4.3)
where ∇· denotes the divergence and we impose zero boundary conditions (4.4). Our forward solver is based on a second-order centred finite difference method with mesh size h = 2⁻⁴. We take the source term of (4.3) as f = 1. The inverse problem associated with (4.3), where we specify κ = exp(u), is to reconstruct κ from noisy linear functionals
y_k = l_k(κ) + η_k, k = 1, . . . , K. (4.5)
By defining G_k(u) = l_k(κ), we can rewrite (4.5) as an inverse problem of the form (2.1). The inverse Darcy flow problem has been widely studied and is commonly used as a representative example of PDE-constrained Bayesian inversion. We stick to the same setting as in subsection 4.1, but with the modifications of only having 16 low dimensional observations and testing a system with l = 64. Our prior u ∼ N (0, C) is simulated through a Karhunen-Loève expansion of the form
u = Σ_j √λ_j ξ_j φ_j, ξ_j ∼ N (0, 1) i.i.d.,
where (λ_j, φ_j) is the eigenbasis of the covariance operator C, expressed as C = σ²(−Δ)^{−ν}, where σ² is a scaling constant, ν > d/2 determines the smoothness of the prior and Δ is the Laplace operator in 2D. The hyperparameters are specified as σ² = 1 and ν = 2. As suggested in Remark 3.2, the projected EnKF with variance inflation can be viewed as a special case of the transformed version.
Since our analysis was in the linear setting we need to discuss how to incorporate variance inflation in the nonlinear setting.

4.2.1. Variance inflation in the nonlinear setting. In the nonlinear case, there is no obvious representation of the continuous time limit of the EnKF as a preconditioned gradient flow. We consider the following approximation based on the Taylor expansion G(u^(j)) − G(ū) ≈ DG(ū)(u^(j) − ū), where DG(ū) is the Jacobian matrix of G at ū. Hence, we will approximate the mixed sample covariance C^{up}(u) by
C^{up}(u) ≈ C(u) DG(ū)ᵀ.
Note that we have introduced a second approximation Ḡ ≈ G(ū). We will use variance inflation in the sense of (3.6) and write the EnKF with variance inflation through the following approximation:
du_t^(j)/dt = (C(u_t) + ϑ C_0) DG(ū_t)ᵀ Γ^{-1} (y − G(u_t^(j))). (4.7)
A disadvantage of this method of variance inflation is that it requires computing the derivative of the nonlinear map G.
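A sketch of this approximation with a finite-difference Jacobian in place of a hand-coded DG follows; the helper names and the toy forward map are our own choices:

```python
import numpy as np

def jacobian_fd(G, u, eps=1e-6):
    """Forward-difference Jacobian DG(u); avoids hand-coded derivatives at the
    cost of n extra forward evaluations (one practical workaround)."""
    g0 = np.asarray(G(u), dtype=float)
    J = np.zeros((g0.size, u.size))
    for i in range(u.size):
        du = np.zeros(u.size)
        du[i] = eps
        J[:, i] = (np.asarray(G(u + du)) - g0) / eps
    return J

def inflated_drift(U, G, y, Gamma_inv, C0, theta):
    """Drift of the inflated flow using the Taylor-type approximation
    C^{up}(u) ~ C(u) DG(ubar)^T:
    (C(U) + theta * C0) DG(ubar)^T Gamma^{-1} (y - G(u^(j)))."""
    ubar = U.mean(axis=1, keepdims=True)
    E = U - ubar
    C = E @ E.T / U.shape[1]
    DG = jacobian_fd(G, ubar.ravel())
    GU = np.stack([G(U[:, j]) for j in range(U.shape[1])], axis=1)
    return (C + theta * C0) @ DG.T @ Gamma_inv @ (y[:, None] - GU)

# tiny nonlinear example G(u) = (u1^2 + u2, u2); DG(u) = [[2 u1, 1], [0, 1]]
G = lambda u: np.array([u[0] ** 2 + u[1], u[1]])
y = G(np.array([1.0, 0.5]))
U = np.array([[0.8, 1.2, 0.9],
              [0.4, 0.6, 0.55]])
drift = inflated_drift(U, G, y, np.eye(2), C0=np.eye(2), theta=0.1)
```

The finite-difference variant trades the analytic derivative for n additional forward solves per step, which mirrors the cost concern raised above for high-dimensional problems.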
For the nonlinear experiments, Figure 7 shows the performance of each method with respect to the truth. It can be seen that the EnKF without constraints does not remain within the feasible set, unlike the other two methods, despite Figure 8 looking identical across all methods. For a more in-depth view of the performance, we analyze the ensemble spread in Figure 9, where all methods converge to zero at similar rates. However, looking at the difference between the misfit functional and the minimum in Figure 10, we see that this difference continues to decrease for the transformed method, while for the projected EnKF, as for the original EnKF, it starts to flatten. This further highlights the benefit of using the transformed EnKF.

Conclusion
Our motivation behind this work was to gain insight into how constrained optimization can be implemented within ensemble Kalman inversion (EKI). To accommodate our algorithm of choice, we made use of the projected Newton method discussed by Bertsekas [6]. This was achievable as the projected Newton method could be related to EKI through the least-squares formulation. As a result, a box-constrained optimization method was adapted for EKI. This included deriving a continuum limit and a gradient flow structure. The key insight from this work is that the projected EKI with box constraints does not always produce a correct descent direction towards the minimizer. By modifying the preconditioner through additive variance inflation, it was shown that the correct descent direction can be acquired. This was shown analytically through a convergence analysis and through various numerical experiments. The results presented in this work also generalize to the ESRF.
As EKI can be interpreted as an optimizer, this encourages further research into adopting tools from optimization theory. The current article provides a basis for why one would be interested in using these approaches; however, there remain numerous other avenues. These include further quantifying the ensemble Kalman filter for inversion as an optimizer, as well as implementing some form of gradient descent techniques [12]. Another direction to consider is hierarchical inverse problems. A hierarchical version of EKI was proposed in [11], where the hyperparameters were governed by a uniform distribution. As there is no guarantee that the parameters stay within the range of the distribution, one could apply the techniques of this paper in a hierarchical setting. Finally, in the nonlinear setting, gradients are required to approximate the continuum limits with variance inflation. As this procedure can be costly in high dimensions, we hope to reduce this burden with various computational techniques.
NKC acknowledges a Singapore Ministry of Education Academic Research Funds Tier 2 grant [MOE2016-T2-2-135]. SW is grateful to the DFG RTG1953 "Statistical Modeling of Complex Systems and Processes" for funding of this research.