NUMERICAL EFFICACY STUDY OF DATA ASSIMILATION FOR THE 2D MAGNETOHYDRODYNAMIC EQUATIONS

. We study the computational efficiency of several nudging data as- similation algorithms for the 2D magnetohydrodynamic equations, using vary-ing amounts and types of data. We find that the algorithms work with much less resolution in the data than required by the rigorous estimates in [7]. We also test other abridged nudging algorithms to which the analytic techniques in [7] do not seem to apply. These latter tests indicate, in particular, that velocity data alone is sufficient for synchronization with a chaotic reference so- lution, while magnetic data alone is not. We demonstrate that a new nonlinear nudging algorithm, which is adaptive in both time and space, synchronizes at a super exponential rate.

1. Introduction. Various approaches to data assimilation, whereby observations are incorporated into mathematical models, date back at least fifty years. Satelliteborne weather observations became available in the 1960s, inspiring Charney, Halem and Jastrow [10] to inject data directly into the nonlinear term of a model for the atmosphere (see also [21,30]). A remarkable treasure of geomagnetic data from over four centuries of ships' logs has been gathered [26], which prompts the question of how such information could be incorporated into models of the magnetic field at the core-mantle boundary [23]. We present in this paper numerical test results on data assimilation by nudging applied to the magnetohydrodynamic (MHD) equations.
The nudging approach to data assimilation dates back to the 1970s [28,32]. It is easily described in terms of an evolution equation for which initial data is unknown. Measurements are collected at sparse locations in space at all times t ∈ [0, t 0 ], represented by I h (y(t)), where I h is some interpolating linear operator approximating the identity with resolution h. This data is then fed into the initial value problem It was shown for a reaction diffusion equation [4] and the 2D Navier-Stokes equations (NSE) [3] that if µ is sufficiently large, and h sufficiently small, then Y (t) converges to the reference solution y(t) at an exponential rate as t increases. Thus, after only a small amount of time t 0 , one can effectively use Y (t 0 ) as initial data in (1), and solve for t > t 0 . The initial data in (2) can be arbitrarily chosen; it need not be zero. Convergence of this nudging type algorithm has also been established for the 3D Brinkman-Forchheimer Extended Darcy model [29], the 3D Navier-Stokesα model [1], the subcritical surface quasi-geostrophic (SQG) equation [24] and the damped driven Korteweg-de Vries equation [25]. Unlike probabilistic approaches to data assimilation such as extended Kalman filters, and 3D/4D Var (see the books [13,31]), nudging does not include uncertainty quantification. The analysis in [3] has, however, been extended in [5] to the case where the observational data is corrupted by stochastic noise. More recently, in [6], a downscaling nudging scheme has been developed in which coarse mesh statistics are obtained from discrete spatial measurements.
Abridged nudging algorithms in which data are collected on only a subset of the variables have been analyzed for several systems. It was shown in [16] that data from just one component of velocity is sufficient to recover a reference solution of the 2D NSE. Similarly, for 2D Rayleigh-Bénard (RB) convection with no-slip boundary conditions, it was shown in [15] that velocity data alone suffices; temperature observations are not needed. Then for the RB problem with stress-free boundary conditions it was proved in [19] that one can synchronize using data only in the horizontal component of velocity. Charney's conjecture, that temperature data alone would suffice for RB convection, remains unproved, though it is shown to be so for both a 3D porous media convection model [17], and 3D planetary geostrophic model [18].
All of the above works on abridged algorithms provide rigorous estimates for lower bounds on µ and upper bounds on the spatial resolution h. It is likely that the algorithms perform better, i.e., convergence is achieved with data that is much more coarse than the estimates require. This is in fact demonstrated in numerical tests carried out for the 2D NSE [20,22], and 2D RB convection model [2,14].
The numerical results for the MHD equations presented in this paper are compared with rigorous estimates in [7]. Since there are two fields, there are several natural abridged algorithms to consider: velocity data alone, magnetic field data alone, and one component of both. In addition, we test an algorithm that uses just one of the two Elsässer variables, which are the sum and difference of the two fields. As in numerical tests on other systems, we generally demonstrate that the algorithms work much better than the rigorous estimate guarantee. Considering the vast historical geomagnetic data, however, it should be noted that the abridged algorithm using only magnetic field data does not appear to converge.
In Section 2 we present three nudging algorithms and state the rigorous estimates for their convergence proved in [7]. Then in Section 3 we determine a near optimal value of µ through numerical experiments over a short time interval. That is followed by longer time simulations which demonstrate convergence. Other algorithms which have not (yet) been analyzed are tested numerically in Section 4. Some are abridged algorithms in which data is collected in a certain subset of the state variables. Others are inspired by [27] in which the nudging term is amplified when small. Finally in Section 5 we demonstrate the robustness of the main algorithm to noise in the observed data.
2. Rigorous results. The MHD equations are a set of equations that model fluid dynamics when the fluid is conductive and coupled with an evolving magnetic field. This situation arises when an electrically conductive fluid is in the presence of a magnetic field; as the fluid moves, the magnetic field responds, and in turn induces motion in the fluid. These nonlinear interactions are quadratic, and couple the evolution equations of the velocity and magnetic fields.
Let u and b represent the velocity and magnetic fields respectively, and p the fluid pressure. We consider the MHD system as follows: on the domain Ω = [0, 2π] 2 , equipped with periodic boundary conditions. We denote the fluid viscosity by ν and the fluid density by ρ (which we assume to be constant and homogeneous). The role of viscosity for the magnetic field is played by λ := (µ 0 σ) −1 , where µ 0 is the permeability of free space and σ the electrical conductivity of the fluid. The extra pressure term q is included for computational purposes; the divergence free condition on b will force q to be exactly 0, but we choose to include q so that the divergence free condition can be explicitly enforced in our simulations. We assume for simplicity that the forcing functions, f and g, are time independent.
In this context, we define the nondimensional Grashof number to be: The Grashof number is a comparison between the amount of energy being put into the system and the amount of energy being dissipated, and serves as a complexity parameter; at higher values for G we expect more information is needed to capture the dynamics. We apply the general idea of (2) to the MHD equations. We assume that for a given reference solution, (u, b), of (2.1), we have data being collected on some subset of the fields {u 1 , u 2 , b 1 , b 2 }. A nudging term could be introduced into the evolution equation for any variable on which we are collecting data, and so we can consider a different algorithm for each combination of the variables we assume to be measuring.

Remark 1.
As the pressure field, p, does not have an evolution equation, we cannot directly make use of any data collected on p with an equation like (2).
In this paper, we consider I h to be the projection onto the low modes, P N , defined as follows: if u has the Fourier expansion So in this case, h = 1 N , and it is easy to show that for u ∈ H 1 , Although we only consider the case I h = P N , the theoretical results are the same for any I h satisfying (5), such as volume interpolants. Rigorous results have also been obtained for nodal interpolants, which satisfy a less restrictive version of (5), at the cost of requiring higher resolution in the measurement data [7] (see [16] for examples of different interpolants satisfying (5), as well as interpolants which satisfy other regularity conditions). Indeed, an advantage of nudging is the flexibility of the interpolant. Practically speaking, Fourier modes are not expected to be directly measured in observed data, but could be obtained via a fast Fourier transform from data at nodes on a grid, albeit with an aliasing error (see [9] for more on spectral filtering of observables). The first algorithm we consider utilizes data collected on all the variables (except the pressure). As it uses more information about the reference solution than the other algorithms we consider, we expect it to have the best performance.

Algorithm 2.2. Solve
Note that we fold the affect of |B| into the definition ofp, for notational simplicity. In the next algorithm, we require data to be collected on only the horizontal components of each field.

Algorithm 2.3. Solve
In addition to measuring the components of u and b directly, it is possible to directly measure one of the Elsässer variables 1 2 (u± 1 ρµ0 b) [12]. With this in mind, we consider the following algorithm which requires collecting data on only one Elsässer variable, and hence, as with Algorithm 2.3, only half of the observables.

Algorithm 2.4. Solve
Sufficient conditions for the above algorithms to produce solutions which converge in L 2 (and H 1 ) to (u, b) were obtained in [7], which we summarize in the following theorem: on the global attractor. For each of the above algorithms, there exists constants c, C < ∞ (which depend on ν, λ and G) such that if µ > c and then the system of equations defined in the algorithm will have a unique strong solution, Specifically, for the case of Algorithm 2.2, and

For Algorithm 2.3 and Algorithm 2.4,
and We now test these algorithms to see how well they perform in practice. The rigorous results above ensure that Algorithms 2.2 -2.4 will work provided that the appropriate conditions on µ and h are satisfied, and that the data and solutions to the systems in the algorithms are exact. Those estimates, if sharp, would require a prohibitive amount of data: N ∼ G ((6) and (7)) or worse N ∼ G 3 ( (6) and (9)). We want to demonstrate that the algorithms are effective using data that is much more coarse than the estimates require.
3. Computational results. All of our computations were performed on the supercomputers at Indiana University, using dedalus [8], an open source pseudo-spectral package (see http://dedalus-project.org/). An implicit/explicit Runga Kutta 222 time stepping scheme was used (except in Section 5); the linear terms were solved implicitly and the nonlinear terms explicitly.

Reference solution.
To test the data assimilation algorithms, we first compute a reference solution which is to be recovered using only coarse projections. The desired reference solution should exhibit some nontrivial time-dependent behavior to adequately test the performance of the algorithm. In addition, for practical matters, we would like to have a relatively simple force, which can be accurately represented in low resolution simulations.
We choose a two-mode force for each equation, which generates some interesting dynamics. Specifically, we define the forces, f, g : Ω → R 2 , by g(x, y) = 2ℜ where M f and M g are constants chosen so that ∥f ∥ L 2 = ∥g∥ L 2 = 10. The modes were chosen essentially arbitrarily, as were the first components of the coefficients (the second components were then chosen to make the forces divergence free). We next adjust ν and λ to produce an interesting reference solution. We expect the flow to become more turbulent the smaller we take ν and λ. However, decreasing min{ν, λ} may necessitate increasing the resolution of the computational grid, and in turn taking a smaller time step. As a compromise we take With the forces in (11) and (12), this yields the Grashof number G = 10 5 .
We define ∥(u, b)∥ for any semi-norm ∥ · ∥ as We consider sufficient resolution to be achieved using 256 2 dealiased Fourier modes, because when computed with 512 2 modes, the qualitative features of the chaotic solution are essentially unchanged. All computations were performed using the time step dt = 0.0001. Figure 1 shows some properties of the computed reference solution, (u, b). The time series plotted in (a) and (b) shows the solution settling quickly upon a chaotic solution. In (c),(d) the projection in the L 2 , H 1 -plane of the solutions after the transient period indicates a qualitatively similar attractor for the two resolutions 256 2 and 512 2 . This projection is chosen to get a visual sense of the Taylor wave number κ 2 τ = ⟨∥∇ · ∥ 2 L 2 ⟩/⟨∥ · ∥ 2 L 2 ⟩, and because there are explicit curves in this plane that bound the global attractor of the Navier-Stokes and related fluid systems (see [11]). Figure 2 shows the curl of the velocity and magnetic fields of the computed reference solution at time t = 729.92, when the data assimilation starts. We see that there are several small eddies and complicated structures for the data assimilation algorithms to attempt to capture. 3.2. Convergence. All of the data assimilation simulations were performed starting from t 0 = 729.92, with initial data for the algorithm equal to 0, computing (u(t), b(t)) simultaneously with (u(t), b(t)) for t > t 0 . We compare the resulting evolutions to see how the algorithm being tested performed in terms of the relative error, where we define the relative error as For each simulation, we choose an algorithm to generate (u, b), and set N (so the number of modes of the reference solution for which we have data will be (2N +1) 2 ), as well as µ, which amplifies the feedback. We would like to take µ large, because the error in (8) and (10) decreases like e −cµt , hence a larger µ increases the convergence rate and therefore reduces the amount of simulation time required. However, if µ becomes too large, it destabilizes the solution over small scales (small scales meaning finer resolutions than the larger, coarser scales captured by P N ) by mixing in the feedback. This feedback is compensated by the dissipation, provided condition (6) holds. Hence, given a value for N , we have an upper bound for µ in the sense that, with ν = λ = .01, (6) gives µ N 2 /100. However, it is unclear what the optimal value for µ is because of the constants involved and because (6) is a sufficient condition for convergence rather than a necessary condition. Figure 3 shows the resulting error after simulating Algorithms 2.2 -2.4 for 5 simulation seconds, using different values of µ and N .
We see that the benefit of increasing µ (which is considerable initially) quickly diminishes in all cases. Furthermore, it appears that each algorithm performs as well with N = 32 as with N = 128 (note that N = 128 constitutes using data on all the computational modes, as our 256 2 resolution is supported on a [−128, 128] × [−128, 128] grid in Fourier space). Based on these results, we decide that µ = 20 is a reasonable value to use in longer simulations. Figure 4 shows the convergence results we obtain with µ = 20 for Algorithms 2.2, 2.3, and 2.4, when N = 32, as well as the convergence of Algorithm 2.2 when N = 3. As expected, Algorithm 2.2 performs much better than Algorithm 2.3 and Algorithm 2.4. Perhaps surprisingly, Algorithm 2.4 performs noticeably better than Algorithm 2.3. Algorithm 2.3, though it does not perform as well, still shows near monotonic convergence, and it is reasonable to speculate that if the simulation time were extended, the error would reach 10 −9 , as do the other algorithms.
4. Other algorithms. We now test additional algorithms, which, unlike those in Section 3.2, have no rigorous estimates to support them. In each case we solve the nudged system for (u, b) on [t 0 , t 0 + T ], with the initial condition u(t 0 ), b(t 0 ) ≡ 0.

Different abridged algorithms.
In the following algorithm, we collect data on only the velocity field.  The next algorithm requires collecting data on only the magnetic field. Since it differs from Algorithm only in which variables are nudged, we specify simply by Note that because the evolution equation for b depends on u only through products with b, we might suspect that this algorithm will fail to recover the reference velocity u, as it would be impossible for the case that g ≡ b 0 ≡ 0 (so b(t) ≡ 0 ∀t), as was pointed out in [7]. However, we can still consider the possible success of this algorithm when the reference magnetic field is nonzero, which is our present situation.
We also consider how much of an improvement using data from 3 out of the 4 fields might yield compared to only using data from 2 fields. With this in mind, we define the following two algorithms.  Figure 5 shows the convergence results we obtain for the above four algorithms, using the same reference solution as before and µ = 20. We see that Algorithm 4.2 shows no sign of converging, even with data on all the computational modes, i.e.    . This is somewhat surprising, as Algorithm 2.4 performs better while requiring fewer measurements. We also note that Algorithm 4.1 seems to be converging. Although it was not proved, it was conjectured in [7] that such an algorithm would converge.

4.2.
Nonlinear nudging. We demonstrated in Figure 3 how to empirically find the optimal value for µ for a given amount of data. Thanks to the exponential rate of convergence, this can be done in relatively short computational time. Yet we can avoid this by using an adaptive method which automatically adjusts the amplification of the nudging. In [27], Larios and Pei introduced the following nonlinear variation on (2): where N took several different forms, one being for some γ ∈ (0, 1) and some norm ∥ · ∥. In addition, they demonstrated for the Kuramoto-Sivashinky equation that their approach achieved synchronization with the reference solution at a super-exponential rate.
We will now adapt their approach for the MHD using a modification of Algorithm 2.2 as an illustration. Each of the next four algorithms amount to solving for (u, b) on [t 0 , t 0 +T ], with the initial condition u(t 0 ), b(t 0 ) ≡ 0. The first nonlinear function we test closely resembles N 2 . Algorithm 4.5. Solve (16) with N defined by We note that the use of the L 2 -norm in the nonlinear factor interferes with running the code in parallel. To address this we consider one which uses only Fourier modes and has the advantage then of being adaptive in both time and space. Again we specify in terms of velocity, since the expression for the magnetic field should be clear. Algorithm 4.6. Solve (16) with N defined by for |k| ≤ N, and F (N (u, u)) (k) = 0 for |k| > N , where F denotes the Fourier transform and ψ is defined as in Algorithm 4.5 .

Remark 2.
We also considered variations on Algorithm 4.5 and Algorithm 4.6 which use a continuously adapted exponent determined by velocity, magnetic field, and even pressure data. Preliminary efforts however, have produced only schemes that seem unstable. This approach will be further studied in a future work, along with other schemes which make use of pressure data. The results of using Algorithms 4.5 and 4.6 are shown in Figure 6, and compared to the rates of convergence we obtained previously with a static nudging constant µ; specifically, we compare to the cases where µ = 1 and µ = 20. We see that the nonlinear schemes converge at a super-exponential rate, and are able to successfully synchronize with the reference solution without prior knowledge of a near optimal value for µ. 5. Measurement noise. In most practical situations, measurement noise is inevitable. In this section we simulate the impact of such noise in the case of Algorithm 2.3. We do this by replacing each occurrence of P N (u i ) and P N (b j ) with the interpolation of the measured values including noise, which, in our case, are Fourier coefficients F(P N (u i ))(k)+N (0, σ ui ) and F(P N (b j ))(k)+N (0, σ bj ) respectively (for i, j ∈ {1, 2}), where the standard deviations σ ui and σ bj gauge the precision of the instrumentation taking the measurements.
To simulate the noise, at each time step and for each measurement, we generate a random number from the normal distribution with mean zero and variance σ 2 , divide it by the square root of the time step ( √ 10 −4 = 0.01), and add the quotient to the exact value of the computed solution before using it in the data assimilation equation. For each evaluation we sample an independent random number ϵ ∼ N (0, σ) for each k ∈ Z 2 , |k| ≤ N = 32, setũ 1 (k) = u 1 (k) + ϵ √ dt , and then use x → ∑ |k|≤Nũ 1 (k)e ik·x in the data assimilation equations instead of P N (u 1 ). The same is also done for each of the other fields: u 2 , b 1 and b 2 . To ensure the integrity of the integration, we use a first order implicit/explicit scheme for the ODE solver at each time step. The results of including measurement noise are shown in Figure 7. The approximations successfully synchronize with the reference solution, down to a saturation level of noise that appears to be linear in the standard deviation of the measurement noise.
6. Summary. We have seen that data assimilation by nudging is highly effective for the MHD equations. In fact, it works with extremely coarse data when measurements are taken on all of the fields, even in the presence of measurement noise. Our results indicate that the Elsässer variable formulation is the most efficient way to use measurement data (without measuring every field), which leads us to suspect it may have deeper meaning. All of the abridged algorithms seem to work in our study, with the exception of the one which did not incorporate any measurements of the velocity field, as was conjectured in [7]. Thus, the vast collection of geomagnetic data alone seems insufficient for assimilation by nudging. On the other hand, an algorithm which incorporates only velocity field measurements is quite effective.
We also tested new algorithms which adaptively adjust the relaxation factor on the nudging term to amplify it as the error gets small. The straightforward adaptation of nonlinear nudging in [27] to the MHD system results in super exponential convergence, but due to a norm calculation, does not run well in parallel. We then demonstrated that a nonlinear nudging approach in terms of each Fourier mode converges at nearly the same super exponential rate. This final algorithm is not only adaptive in both time and space, but also well-suited to run in parallel.