Modelling uncertainty using stochastic transport noise in a 2-layer quasi-geostrophic model

The stochastic variational approach for geophysical fluid dynamics was introduced by Holm (Proc Roy Soc A, 2015) as a framework for deriving stochastic parameterisations for unresolved scales. This paper applies the variational stochastic parameterisation in a two-layer quasi-geostrophic model for a β-plane channel flow configuration. We present a new method for estimating the stochastic forcing (used in the parameterisation) to approximate unresolved components using data from a high-resolution deterministic simulation, and describe a procedure for computing physically-consistent initial conditions for the stochastic model. We also quantify the uncertainty of coarse-grid simulations relative to fine-grid ones in homogeneous (teeming with small-scale vortices) and heterogeneous (featuring horizontally elongated large-scale jets) flows, and analyse how the spread of stochastic solutions depends on different parameters of the model. The parameterisation is tested by comparing it with the true eddy-resolving solution that has reached statistical equilibrium and with the deterministic solution modelled on a low-resolution grid. The results show that the performance of the proposed parameterisation depends significantly on the resolution of the stochastic model, that it gives good ensemble performance for both homogeneous and heterogeneous flows, and that it lays solid foundations for data assimilation.


1. Introduction.
1.1. Motivation. The process of "upscaling", or "coarse graining", of fine-scale data is in common usage in computational simulations at coarser scales. The goal of the present paper is to quantify the uncertainty introduced by this process, in the example of a two-level quasigeostrophic channel flow.
The question for coarse graining that we address in this paper is the following: How can we use computationally simulated surrogate data at well resolved scales, in combination with the mathematics of stochastic processes in nonlinear dynamical systems, to estimate and model the effects on the simulated variability at much coarser scales of the computationally unresolvable, small, rapid, scales of motion at the finer scales? We will address this question in the context of two-level quasigeostrophic channel flow.
Stochastic parameterisation has been a very active research topic over the last 20 years, and a number of different approaches have been proposed. [31,15] developed a general approach based on applying singular perturbation theory to the Fokker-Planck equation for the dynamical system. [46] proposed a rather general approach based on linear response theory, which assumes that the system being modelled is formed from a weak coupling between two self-contained dynamical systems, and which leads to the introduction of a stochastic term plus a memory term. A parameterisation was proposed in [1] based on averaging a system of slow and fast variables over the invariant measure of the fast variables given fixed values of the slow ones; this idea was investigated within a low-order coupled ocean-atmosphere model in [43].
In ocean modelling there also exists a tradition of "stochastic backscatter", which goes back to [29], who first developed the stochastic backscatter prescription as a supplement to the Smagorinsky viscosity and applied it in simulations of a planar turbulent shear mixing layer. Since then, the stochastic backscatter prescription has become a standard feature for modelling eddy effects in ocean modelling; see, e.g., [32] for a good treatment of the history of this development. The stochastic backscatter prescription differs from the current approach in that it represents stochastic forcing in the PV equation, rather than stochastic transport.
Our approach, known as Stochastic Advection by Lie Transport (SALT), differs from these methods in that its slow/fast decomposition is derived at the level of the Lagrangian flow map for the fluid, with a stochastic Ansatz being derived for the flow map in the diffusive limit [12]. This Ansatz simply states that fluid particles move with the mean flow plus a stochastic perturbation to the velocity that comes from averaging the small-scale dynamics. The Ansatz can then be passed to a variational principle for the geophysical fluid dynamics equations, leading to Eulerian SPDEs in which the stochastic terms appear as modifications to the transport terms.
The Ansatz for a stochastic Lagrangian flow map goes back to [28]. It was also the basis for the approach of Location Uncertainty (LU) advocated in [34]. However, the application of Lagrangian transport noise in the LU model differs from that in the SALT model, and both the LU model and the SALT model differ from Kraichnan's advection model. Kraichnan's model applies transport noise to the advection of a scalar fluid quantity. The LU model applies stochastic scalar transport to the momentum in the fluid motion equation. The SALT model applies stochastic scalar advection as a constraint on the back-to-labels map in Hamilton's variational principle for the fluid motion equation in [23]. As a result of these different interpretations of Kraichnan's stochastic scalar model, the stochastic Euler fluid equations in the LU model and in the SALT model have mutually exclusive conservation laws. Namely, the LU fluid motion equation preserves kinetic energy but not Kelvin circulation, while the SALT fluid motion equation preserves Kelvin circulation but not kinetic energy.
Our approach is guided by recent results in [12], which showed that a multi-scale decomposition of the deterministic Lagrange-to-Euler fluid flow map g_t into a slow large-scale mean and a rapidly fluctuating small-scale map leads to Lagrangian fluid paths x_t = g_t X, with g_0 = Id on a manifold D, governed by a stochastic process g_t ∈ Diff(D) on the Lie group of diffeomorphic flows. This process appears in the same form as had been proposed and studied for fluids in [23]; namely,

dx_t = u_t(x_t) dt + ∑_{k=1}^{K} ξ_k(x_t) ○ dW_k(t),   x_t = g_t X,   (1)

where d represents stochastic differentiation, the divergence-free vector fields are ξ_k(x) = ẑ × ∇ζ_k(x) for k = 1, 2, . . . , K, where the ζ_k(x) are the stream functions for the prescribed vector functions ξ_k(x) (to be described below), ẑ denotes the unit vector along the z-axis, x ∈ D lies in the domain of flow D, and ○ dW_k(t) denotes the Stratonovich differential with independent Brownian motions W_k(t). The stochastic process for the evolution of the Lagrangian process g_t in equation (1) involves the pullback of the Eulerian total velocity vector field, which comprises the sum of a drift displacement vector field u_t(x) dt plus a sum over terms in ξ_k(x) representing the (assumed stationary) spatial correlations of the temporal noise in the Stratonovich representation, each with its own independent Brownian motion in time. The ξ's represent spatial correlations of the rapid fluctuations which, when replaced by noise in the homogenisation limit, provide stochastic transport at the spatial coarse scale of the slow variables. This extra transport introduces uncertainty and widens the ensemble spread of the flow rather than simply improving the mean.
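As a concrete illustration of the construction ξ_k = ẑ × ∇ζ_k, the sketch below (our own minimal example, not the paper's code, using a hypothetical doubly periodic test stream function) builds such a field with centred differences and checks that it is discretely divergence-free, since the difference operators commute.

```python
import numpy as np

def ddx(f, d):  # centred x-derivative on a periodic grid (x along axis=1)
    return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2 * d)

def ddy(f, d):  # centred y-derivative on a periodic grid (y along axis=0)
    return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2 * d)

def xi_from_streamfunction(zeta, d):
    """xi = z_hat x grad(zeta) = (-dzeta/dy, dzeta/dx): divergence-free by
    construction, because the centred difference operators commute."""
    return -ddy(zeta, d), ddx(zeta, d)

# Hypothetical doubly periodic test stream function on [0, 2*pi)^2.
n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
d = x[1] - x[0]
X, Y = np.meshgrid(x, x)
zeta = np.sin(X) * np.cos(2 * Y)

u, v = xi_from_streamfunction(zeta, d)
div = ddx(u, d) + ddy(v, d)
print(np.max(np.abs(div)))  # vanishes up to round-off
```

The same construction applies to any stream function ζ_k, e.g. one obtained from an EOF analysis as described later in the paper.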
The choice of the Stratonovich form (as opposed to the Itô form) in the differential equation ensures that the variational derivation of the SPDE makes sense, because the Stratonovich differential satisfies the standard chain rule, making the formal computation straightforward. Whilst the Stratonovich form is preferable from a physical/modelling perspective, the rigorous analysis of the solution of the SPDE requires the transformation from the Stratonovich form to the Itô form, through the addition of a correction term given by the cross-variation between the integrand and the driving Brownian motion. Further details on the Stratonovich and Itô integrals can be found, for example, in [27].
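To make the chain-rule property concrete, the following sketch (our own illustration, not from the paper) integrates the scalar Stratonovich SDE dX = σX ○ dW with a Heun predictor-corrector scheme, which converges to the Stratonovich solution, and compares it pathwise with the closed-form solution X_t = X_0 exp(σW_t) that the ordinary chain rule gives.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma, T, n = 0.5, 1.0, 20000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)  # Brownian increments

# Heun (predictor-corrector) scheme for the Stratonovich SDE dX = sigma*X o dW.
x = 1.0
for dw in dW:
    pred = x + sigma * x * dw              # Euler predictor
    x = x + 0.5 * sigma * (x + pred) * dw  # trapezoidal corrector

# Exact Stratonovich solution: the ordinary chain rule applies, so
# X_t = X_0 * exp(sigma * W_t) (no Ito correction term -sigma^2 t/2).
exact = np.exp(sigma * dW.sum())
print(abs(x - exact))  # small pathwise error, shrinking with dt
```

Repeating the experiment with an Euler-Maruyama step instead would converge to the Itô solution X_0 exp(σW_t − σ²t/2), which is exactly the cross-variation correction mentioned above.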
In [23] the velocity decomposition formula (1) was applied in the Hamilton-Clebsch variational principle to derive coadjoint motion equations as stochastic partial differential equations (SPDEs) whose ensemble of realisations can be used to quantify the uncertainty in the slow dynamics of the resolved mean velocity u t (x). Under the conditions imposed in the derivation of formula (1) in [12] using homogenization theory, the sum of vector fields in (1) that had been treated in [23] from the viewpoint of stochastic coadjoint motion was found to represent a bona fide decomposition of the fluid transport velocity into a mean plus fluctuating flow.
1.2. Data-driven modelling of uncertainty. The most familiar example of data-driven modelling occurs in numerical weather prediction (NWP). In NWP, various numerically unresolvable, but observable, local subgrid-scale processes, such as the formation of fronts and the generation of tropical cyclones, are expected to have profound effects on the variability of the weather. These subgrid-scale processes must be parameterised at the resolved scales of the numerical simulations. Of course, the accuracy of a given parameterisation model often remains uncertain. In fact, even the possibility of modelling subgrid-scale properties in terms of resolved-scale quantities available to simulations may sometimes be questionable. However, if some information about the statistics of the small-scale excitations is known, such as the spatial correlations of their observed transport properties at the resolved scales, one may arguably consider modelling the effects of the small-scale dynamics on the resolved scales by a stochastic transport process whose spatial correlations match the observations at the computationally unresolvable scales. In our approach, the ξ functions are the leading EOFs of the differences between the fine-grained and coarse-grained Lagrangian trajectories, evaluated on the fixed coarse grid.
1.3. The main content of the paper. The rest of the paper is structured as follows. Section 2 presents the deterministic and stochastic QG equations. Section 3 is divided into two parts. In the first part we present deterministic simulations of heterogeneous and homogeneous flows at different resolutions. In the second part, we develop a method for calibrating the ξ's by using the Lagrangian description of the flow. We also present a procedure for computing physically-consistent initial conditions for the stochastic model. Then, we quantify uncertainty in the homogeneous and heterogeneous solutions, and analyse how the spread of stochastic solutions depends on different parameters of the model. In Section 4 we compare the true solution, the deterministic solution modelled on the low-resolution grid, and the parameterised solution. In Section 5 we provide conclusions and outlook for future research.

2.1. Deterministic case. The two-layer deterministic QG equations for the potential vorticity (PV) anomaly q in a domain Ω are given by the PV material conservation law augmented with forcing and dissipation [35,42]:

∂_t q_i + J(ψ_i, q_i) = F_i,   F_i = −β ∂_x ψ_i − µ δ_{i2} ∇²ψ_i + ν∇⁴ψ_i,   i = 1, 2,   (2)

where ψ is the stream function, J(f, g) = f_x g_y − f_y g_x is the Jacobian, the planetary vorticity gradient is given by the parameter β, µ is the bottom friction parameter (acting only in the bottom layer, i = 2), and ν is the eddy viscosity. The computational domain Ω is a horizontally periodic flat-bottom channel of depth H = H_1 + H_2, given by two stacked isopycnal fluid layers of depth H_1 and H_2, respectively (Figure 1).
Forcing in system (2) is introduced through a vertically sheared, baroclinically unstable background flow (e.g., [4]),

ψ_i → −U_i y + ψ_i,   i = 1, 2,   (3)

where the parameters U_i are the background-flow zonal velocities. The PV anomaly and stream function are related through two elliptic equations:

q_1 = ∇²ψ_1 + s_1 (ψ_2 − ψ_1),   q_2 = ∇²ψ_2 + s_2 (ψ_1 − ψ_2),   (4)

with stratification parameters s_1 and s_2. System (2)-(4) is augmented by the integral mass conservation constraint [33]

∂_t ∬_Ω (ψ_1 − ψ_2) dy dx = 0,   (5)

as well as by periodic horizontal boundary conditions (6) and no-slip boundary conditions

u = 0   (7)

at the lateral (northern and southern) boundaries of the channel, where u = (u, v) is the velocity vector, and Γ_1, Γ_2, Γ_3, Γ_4 are the boundaries of the computational domain (Figure 1).
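The elliptic coupling between PV anomalies and stream functions can be illustrated in a few lines. The sketch below (a simplified doubly periodic setting, unlike the paper's channel with no-slip lateral walls, and with hypothetical parameter values) evaluates q_i from given ψ_i using the standard two-layer form q_i = ∇²ψ_i + s_i(ψ_other − ψ_i).

```python
import numpy as np

def laplacian(psi, d):
    """Five-point Laplacian with uniform grid spacing d (periodic here for
    simplicity; the paper's channel has no-slip lateral walls)."""
    return (np.roll(psi, 1, 0) + np.roll(psi, -1, 0) +
            np.roll(psi, 1, 1) + np.roll(psi, -1, 1) - 4 * psi) / d**2

def pv_anomaly(psi1, psi2, s1, s2, d):
    """PV anomalies from the stream functions via the standard two-layer
    coupling q_i = lap(psi_i) + s_i * (psi_other - psi_i)."""
    q1 = laplacian(psi1, d) + s1 * (psi2 - psi1)
    q2 = laplacian(psi2, d) + s2 * (psi1 - psi2)
    return q1, q2

# Hypothetical smooth test field: lap(psi1) = -2*psi1 in the continuum,
# so with s1 = 1 and psi2 = 0 we expect q1 ≈ -3*psi1, and q2 = psi1.
n = 128
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x)
psi1 = np.sin(X) * np.sin(Y)
psi2 = np.zeros_like(psi1)
q1, q2 = pv_anomaly(psi1, psi2, s1=1.0, s2=1.0, d=x[1] - x[0])
```

Inverting this relation (recovering ψ from q) requires solving the coupled elliptic system subject to the channel boundary conditions and the mass constraint, which is what the model actually does at each time step.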
The QG model (2)-(7) is solved using the high-resolution CABARET method, which is based on a second-order, non-dissipative and low-dispersive, conservative advection scheme [26]. The distinctive feature of this scheme is its ability to simulate large-Reynolds-number flow regimes at much lower computational costs compared to conventional methods (see, e.g., [2,45,40,24]): the scheme is low-dispersive and non-oscillatory, has a compact computational stencil in both space and time, and has no dissipation error. We refer the reader to [26] for a more detailed discussion of the CABARET method. The salient feature here is that the CABARET method runs stably without diffusion, since the flux-limiting scheme is sufficient to absorb enstrophy at small scales. We include diffusion in the model purely to maintain the statistical steady state required for our stochastic parameterisation.
2.2. Stochastic case. The stochastic version of the two-layer QG equations is given by the following system of equations [23]:

dq_i + (u_i dt + ∑_{k=1}^{K} ξ_k ○ dW_k(t)) · ∇q_i = F_i dt,   i = 1, 2,   (8)

where F_i is the right-hand side of (2). The stochastic transport terms involving the ξ_k are the only difference from the deterministic QG model (2); all other equations remain the same as in the deterministic case. As explained in the Introduction, the noise is Gaussian and white in time, with zero mean, and its velocity-velocity correlation function may be written in terms of its eigenvectors ξ_k = ẑ × ∇ζ_k. This provides an interpretation of the ξ_k.
3. Numerical results. For the purposes of this paper, we consider heterogeneous (Figures 2 and 3) and homogeneous (Figure 5) flow regimes (which correspond to flows with low (µ = 4 × 10⁻⁸ s⁻¹) and high (µ = 4 × 10⁻⁷ s⁻¹) drag, respectively) at different resolutions, and study how the parameterisation performs in each case. In brief, for the smaller bottom friction, we find that jet-like structures (also referred to as striations) emerge in high-resolution simulations, resulting from the interplay of forcing, damping, and baroclinic instability. For the larger bottom friction, the flow pattern is essentially homogeneous and no coherent jet-like structures are seen.
Before presenting the results, it is helpful to explain the following four solutions used throughout the paper:
• High-resolution deterministic solution q^f, computed on the fine grid G_f = {N_x × N_y}, N_x = 2049, N_y = 1025 (dx ≈ dy ≈ 1.9 km). This solution is only used to compute the true solution, q^a, described below;
• Low-resolution deterministic solution q^a, computed on the coarse grids G_c = 257 × 129 (dx ≈ dy ≈ 15 km; Figure 2, middle row) and G_c = 129 × 65 (dx ≈ dy ≈ 30 km; Figure 3, upper row) as the solution of the elliptic equation (4) with the stream function ψ^a, where ψ^a is computed by spatially averaging the high-resolution stream function ψ^f over the coarse grid cells of G_c. We refer to q^a as the truth or the true solution, and use it for comparison with the parameterised solution. Moreover, solving the QG equations on the low-resolution grid (say, G_c = 257 × 129) gives a solution (Figure 2, bottom row) which has nothing in common with the truth (Figure 2, middle row), not to mention the high-resolution solution (Figure 2, top row). The same is true for the coarser grid G_c = 129 × 65 (compare the upper and second rows in Figure 3). Note that we do not introduce extra notation to differentiate between q^a computed on different grids, since we discuss each case separately;
• Low-resolution modelled solution q^m (also referred to as the coarse-grid modelled solution), computed on the coarse grids G_c = {257 × 129, 129 × 65} by simulating the QG model. This solution is used for parameterisation.
• Low-resolution parameterised solution q p (also referred to as the stochastic solution) computed on the coarse grids G c = {257×129, 129×65} by simulating the stochastic QG model.
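The cell averaging used to produce ψ^a from ψ^f amounts to block-averaging the fine field over coarse grid cells. A minimal sketch (assuming grid sizes that divide evenly, unlike the paper's 2049 × 1025 grids, which include boundary points):

```python
import numpy as np

def coarse_grain(field, factor):
    """Spatially average a fine-grid field over non-overlapping blocks of
    size factor x factor, mimicking the cell averaging that defines the
    coarse-grained stream function."""
    ny, nx = field.shape
    assert ny % factor == 0 and nx % factor == 0
    return field.reshape(ny // factor, factor,
                         nx // factor, factor).mean(axis=(1, 3))

fine = np.arange(16, dtype=float).reshape(4, 4)
coarse = coarse_grain(fine, 2)
print(coarse)  # 2x2 array of block means
```

The same operator, applied to the fine-grid stream function at each output time, produces the fields from which the true solution q^a is diagnosed.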
For the low drag, which corresponds to the bottom friction coefficient µ = 4 × 10⁻⁸, the flow dynamics is highly heterogeneous, and small-scale features are prevalent in the first layer (Figures 2 and 3). The high-resolution flow q^f (computed on the fine grid G_f = 2049 × 1025 with the resolution dx ≈ dy ≈ 1.9 km) in the first layer consists of two flow regions: the fast flow within the jets and the slow flow between the jets. The flow dynamics in the first layer teems with small-scale eddies which, in turn, maintain the striated structure of the flow (see, e.g. [25]). The flow dynamics in the second layer is much less energetic than that in the first, and exhibits neither small-scale features nor jets. The true solution q^a, computed on the coarse grids G_c = {257 × 129, 129 × 65}, captures the striated flow structure, the small-scale features, and the flow energetics of the high-resolution solution q^f in both layers. However, the low-resolution modelled solution q^m (the solution which has to be parameterised and then used in the uncertainty quantification tests presented in Section 3.2.5), computed on the coarse grid G_c by simulating the QG model, cannot capture the correct structure (the number of jets and their positions) of the true flow dynamics. For example, q^m_1 on the grid 257 × 129 has only two jets (Figure 2, bottom row), while the true solution q^a_1 (Figure 2, middle row) has four jets; we only count jets within the domain and exclude boundary layers from consideration. The situation on the coarse grid 129 × 65 is even worse: the low-resolution solution q^m_1 (Figure 3, bottom row) has only one very faint jet, if any. Thus, the coarse-grid QG equations fail to model the proper jet-like structure of the flow. Apparently, the coarse resolution suppresses the small-scale eddies, which are thought to be one of the mechanisms responsible for maintaining the jets (see, e.g. [25]).
In fact, we have found that the solution bifurcates between the grid 2049 × 1025 (four jets) and the grid 1025 × 513 (only two jets); see Figure 4. In other words, there is no continuous degradation of coherent structures as the resolution decreases, as there is, for example, in the double-gyre problem (e.g. [38]). Instead, one observes a fast switch from a solution with four jets to a solution with two jets, and eventually to a solution with only one weak, blurred jet on the grid 129 × 65 (Figure 4). This bifurcation makes the solution very difficult to parameterise, since the low-resolution solution has a completely different structure compared to the true solution.
Looking ahead, we note that our analysis of the time-averaged total energy, Ē, of the solutions computed on different grids (here, the energy is non-dimensional) will be important later on for explaining why the parameterisation works worse for the coarse grid 129 × 65 than for the finer grid 257 × 129.

Figure 3. The same as in Figure 2, but for G_c = 129 × 65 (dx ≈ dy ≈ 30 km).

Returning to the analysis of the QG solutions, we have found that for the high drag, with bottom friction coefficient µ = 4 × 10⁻⁷, the flow dynamics becomes more homogeneous (Figure 5). As in the heterogeneous case, the high-resolution flow q^f
is more energetic in the first layer than in the second, although this difference in the structure of the flow is less pronounced than that in the heterogeneous case.
Comparing the high-resolution solution q^f with its coarse-grid analogue q^a, we conclude that the latter captures the small-scale features as well as the energetics of q^f (Ē^f_homog = 26) in both layers; the time-averaged total energy of the space-averaged solution is Ē^a_homog = 21. Unlike the heterogeneous flow, the coarse-grid modelled solution q^m (the solution we parameterise and use in the uncertainty quantification tests given in Section 3.2.5) is also homogeneous in structure and adequately restores the energetics of the true solution q^a; the time-averaged total energy of the modelled solution is Ē^m_homog = 17. In other words, the coarse-grid QG model properly represents the large-scale flow dynamics for flows with higher drag. In the case of high-drag flows, the zonally uniform eigenmodes responsible for maintaining the jet-like structure of the flow become more damped, thus making the jets much more latent compared with low-drag flows (see, e.g. [5]). The results presented in this section show that the more interesting and energetic flow dynamics is confined to the first layer. Therefore, from now on we will focus our attention on the first layer, unless stated otherwise.

3.2. Stochastic case.
3.2.1. Calibration of eigenvectors ξ. We approach the choice of the ξ_i's by starting from the Ansatz that models the effect of small scales on Lagrangian particles by noise, and which underpins the derivation of the stochastic QG model. We present a new methodology for modelling the difference between passive Lagrangian particles advected by the high-resolution deterministic velocity field u, computed on the fine grid G_f = 2049 × 1025, and their coarsened counterparts advected by ū, computed on the coarse grid G_c = 129 × 65 by differentiating the coarse-grained stream function ψ^a. The stream function ψ^a is computed by spatially averaging the high-resolution stream function ψ^f over the coarse grid cells of G_c. (Note that, in order to visualize the solutions in the figures on the same color scale, we have multiplied those in the second layer by a factor of 5.) Based on this difference, we compute Empirical Orthogonal Functions (EOFs) [36,22], which are then used as the ξ's in the parameterised QG model. We also perform uncertainty quantification tests for the stochastic differential equation for Lagrangian particles (9) and the stochastic QG model (8), and study how the number of EOFs and the size of the ensemble of stochastic solutions (referred to as ensemble members) affect the width of the stochastic spread. Note that we present results only for the heterogeneous flow on the grid G_c = 129 × 65, since the goal of this section is to demonstrate how the methodology works.

3.2.2. Measuring the Lagrangian evolution. In the stochastic GFD framework, stochastic PDEs are derived from the starting assumption that (coarse-grained) fluid particles satisfy the equation

dx(a, t) = ū(x(a, t), t) dt + ∑_{k=1}^{K} ξ_k(x(a, t)) ○ dW_k(t),   (9)

where a is the Lagrangian label; some transport models based on a hierarchy of Markov models have been studied in a series of works [6,7]. The assumption (9) leads to, for example, the stochastic QG equation

dq_i + (u_i dt + ∑_{k=1}^{K} ξ_k ○ dW_k(t)) · ∇q_i = F_i dt,

where i = 1, 2, and F_i is the right-hand side of (2). This is the system of stochastic PDEs that we actually solve. That is, equation (9) is not explicitly solved, but describes the motion of fluid particles under the stochastic PDE solution.
The goal of the stochastic PDE is to model the coarse-grained components of a deterministic PDE that exhibits rapidly fluctuating components. The derivation of deterministic fluid dynamics starts from the equation

dx(a, t)/dt = u(x(a, t), t).

After defining an averaged trajectory x̄(a, t), we write

x(a, t) = x̄(a, t) + ε η(x̄(a, t), t),

on the assumption that the fluctuations in η are faster than those in x̄; this scale separation is parameterised by the small parameter ε. Substituting this decomposition into the particle equation yields the deterministic equation for x̄. If η (denoted as ζ in [12]) has a fast dependency on t and has a stationary invariant measure, then according to homogenisation theory [12] we may average this equation over the invariant measure (subject to a centring condition) to get an effective equation

dx̄(a, t) = ū(x̄(a, t), t) dt + ∑_{k=1}^{∞} ξ_k(x̄(a, t)) ○ dW_k(t).

After truncation of this sum, we recover equation (9).
We assume that u(x, t) can be modelled well with a fine grid simulation, whilst ū(x, t) can be modelled with a coarse grid simulation. We then wish to estimate the ξ's using data from u(x, t) in order to simulate ū(x, t).
Our methodology is as follows. We spin up a fine grid solution from t = −T_spin to t = 0 (until some statistical equilibrium is reached), then record velocity time series from t = 0 to t = M∆t, where ∆t = kδt and δt is the fine grid timestep. For each m = 0, 1, . . . , M − 1 and each coarse gridpoint (i, j):
1. Compute x_ij(t) over the interval [m∆t, (m + 1)∆t] by solving the equation dx_ij/dt = u(x_ij(t), t), with initial condition x_ij(m∆t) given by the coarse gridpoint (i, j), where u is the solution from the fine grid simulation.
2. Compute ū(x_ij(t), t) by spatially averaging u(x, t) over the coarse grid cell around gridpoint (i, j).
3. Compute x̄_ij by solving the equation dx̄_ij/dt = ū(x̄_ij(t), t), with the same initial condition.
4. Compute the difference ∆x^m_ij = x̄_ij((m + 1)∆t) − x_ij((m + 1)∆t), which measures the error between the fine and coarse trajectories.
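These steps can be sketched as follows; the velocity fields here are hypothetical analytic stand-ins for the simulated fine and coarsened fields (a solid-body rotation and a slightly slower version of it), and a classical RK4 integrator plays the role of the ODE solver.

```python
import numpy as np

def rk4_step(f, x, t, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x, t)."""
    k1 = f(x, t)
    k2 = f(x + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(x + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(x + dt * k3, t + dt)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def trajectory_difference(u_fine, u_coarse, x0, t0, dt_big, n_sub):
    """Advect the same initial position x0 with the fine velocity u_fine
    and the coarsened velocity u_coarse over one sampling interval dt_big
    (split into n_sub RK4 substeps); return Delta x = xbar - x."""
    x, xbar = np.array(x0, float), np.array(x0, float)
    h = dt_big / n_sub
    for s in range(n_sub):
        t = t0 + s * h
        x = rk4_step(u_fine, x, t, h)
        xbar = rk4_step(u_coarse, xbar, t, h)
    return xbar - x

# Hypothetical velocity fields standing in for the simulated ones.
u_f = lambda x, t: np.array([-x[1], x[0]])        # solid-body rotation
u_c = lambda x, t: 0.9 * np.array([-x[1], x[0]])  # slower "coarse" flow
dx = trajectory_difference(u_f, u_c, [1.0, 0.0], 0.0, 0.1, 10)
print(dx)  # small vector measuring the fine/coarse mismatch
```

In the paper's setting, the loop over m and over all coarse gridpoints produces the full set of increments ∆x^m_ij from which the noise basis is extracted.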
Having obtained ∆x^m_ij, we extract the basis for the noise. This amounts to a Gaussian model of the form

∆x^m_ij = ∑_{k=1}^{K} ξ_k(x_ij) √∆t ∆W^k_m,

where the ∆W^k_m are independent and identically distributed normal random variables with mean zero and variance one.
We estimate the ξ's by minimising the mean-square difference between the measured increments ∆x^m_ij and those generated by this Gaussian model, where the choice of K can be informed by using EOFs.
In order to compute the EOFs, we first compute the matrix of observables F, whose mth row contains the measured increments ∆x^m_ij at all coarse gridpoints, and the covariance matrix R ∶= F^T F/(M − 1). Then, we compute the eigenvectors of R (the EOFs) and use them as the ξ's in the stochastic QG model (8).
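A minimal sketch of this EOF computation, with synthetic data standing in for the measured increments; the SVD route used here is equivalent to diagonalising R = FᵀF/(M − 1) and is numerically more robust.

```python
import numpy as np

# Synthetic data matrix F: one flattened snapshot per row, built from three
# hypothetical spatial patterns with decreasing amplitudes.
rng = np.random.default_rng(0)
M, n_points = 200, 50
modes = rng.normal(size=(3, n_points))
amps = rng.normal(size=(M, 3)) * np.array([5.0, 2.0, 0.5])
F = amps @ modes
F = F - F.mean(axis=0)  # centre each column before the covariance analysis

# SVD of F: rows of Vt are the EOFs; s**2/(M-1) are the eigenvalues of R.
U, s, Vt = np.linalg.svd(F, full_matrices=False)
eofs = Vt
variance = s**2 / (M - 1)

# Cumulative variance fraction captured by the leading modes guides the
# choice of the truncation K.
explained = np.cumsum(variance) / variance.sum()
print(explained[:4])
```

For this rank-3 synthetic data, the first three modes capture essentially all the variance; in the paper, the analogous curve is what motivates comparing, e.g., 64 versus 128 EOFs.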
Our choice of Empirical Orthogonal Function analysis is based on the capability of this method to extract spatially coherent, temporally uncorrelated and statistically significant modes of transient variability from multivariable time series. In particular, this method is efficient for dimensionality reduction, compression and spatio-temporal variability analysis of atmospheric and oceanic data. Generally speaking, one can use different flow decomposition methods instead of EOF analysis (e.g. Dynamic Mode Decomposition (DMD) [37], Optimized DMD [9], Singular Spectrum Analysis [14], etc.), and analyse how they affect the parameterisation, but such a study would be beyond the scope of this paper.

3.2.3. Approximation of the Lagrangian evolution. In this section, we apply EOF analysis to the fluctuating component of ∆x and perform uncertainty quantification tests for the stochastic differential equation (SDE) (9) by comparing the true deterministic solution with an ensemble of stochastic solutions. As the true deterministic solution, we take the solution x̄_c ∶= x̄(t) of the deterministic equation (15). The stochastic ensemble is given by the solutions x(t) of the SDE (9), computed for independent realizations of the Brownian noise W; the Lagrangian particles are remapped to their original positions (the nodes of the Eulerian grid G_c = 129 × 65) every time step ∆t.
In order to solve the deterministic equation (15), we use the classical 4-stage Runge-Kutta method [21]. The SDE (9) is solved with the stochastic version of the Runge-Kutta method presented by Algorithm 1.
is the vector of coordinates of Lagrangian particles, with N L = N x N y , being the number of Lagrangian particles; in this case N x = 129, N y = 65. The Algorithm 1 Stochastic Runge-Kutta method for n = 0, 1, 2, . . . do k 1 = u(x n , t n ), l 1 = Ξ(x n , t n ), end for stochastic term Ξ(x, t) is given by Before passing to numerical results, we note that the size of the time step is a critical component for uncertainty quantification and data assimilation. The time step should be short enough that the stochastic ensemble encompasses the true solution. Our experiments show that ∆t = 24 hours properly fulfils this condition. Note that the stochastic terms in the SPDE are introduced to model the difference between the truth/resolved solution and the coarse-grained model. This coarsegraining takes place in space and time, and so the difference will depend on both the lengthscale and the timescale of the coarse graining process. Hence, the choice of time interval to compute the coarse graining will have an influence on the choice of the stochastic term (i.e., the ξ's).
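A stochastic time step of this general kind can be sketched as follows; this is a generic Heun-type (predictor-corrector) Stratonovich scheme standing in for Algorithm 1, with a hypothetical mean flow and a single hypothetical noise mode chosen only to exercise the scheme.

```python
import numpy as np

def heun_salt_step(u, xis, x, t, dt, dW):
    """One Heun-type step for dx = u(x,t) dt + sum_k xi_k(x) o dW_k.
    xis is a list of noise-mode functions xi_k; dW holds the K Brownian
    increments for this step."""
    def increment(x_, t_):
        return u(x_, t_) * dt + sum(xi(x_) * dw for xi, dw in zip(xis, dW))
    incr1 = increment(x, t)          # predictor
    pred = x + incr1
    incr2 = increment(pred, t + dt)  # corrector
    return x + 0.5 * (incr1 + incr2)

# Ensemble of particles driven by independent noise realisations around a
# hypothetical mean flow (solid-body rotation) with one noise mode.
rng = np.random.default_rng(1)
u = lambda x, t: np.array([-x[1], x[0]])
xis = [lambda x: np.array([0.1, 0.0])]
dt, nsteps, N = 0.01, 100, 50
cloud = np.tile([1.0, 0.0], (N, 1))
for n in range(nsteps):
    for i in range(N):
        dW = rng.normal(0.0, np.sqrt(dt), size=1)
        cloud[i] = heun_salt_step(u, xis, cloud[i], n * dt, dt, dW)
print(cloud.std(axis=0))  # ensemble spread induced by the transport noise
```

Running the ensemble with independent increments per member is exactly what produces the stochastic cloud analysed below.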
First, we demonstrate that the true deterministic solution x̄_c is enclosed within a cloud of stochastic solutions S (also referred to as the stochastic spread). To this end, we study how the area of the stochastic cloud (defined as the area of the convex hull of the particle positions and denoted by A_c) depends on both the size of the stochastic ensemble, N, and the number of EOFs, denoted by K. The results are presented in Figure 6.
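The cloud area can be computed from the ensemble positions as the area of their convex hull; a self-contained sketch (our own illustration, using Andrew's monotone chain plus the shoelace formula):

```python
import numpy as np

def convex_hull_area(points):
    """Area of the convex hull of a 2-D point cloud: build the hull with
    Andrew's monotone chain, then apply the shoelace formula."""
    pts = sorted(map(tuple, points))
    if len(pts) < 3:
        return 0.0
    def half(seq):  # one monotone chain (lower or upper hull)
        h = []
        for p in seq:
            while len(h) >= 2 and ((h[-1][0] - h[-2][0]) * (p[1] - h[-2][1]) -
                                   (h[-1][1] - h[-2][1]) * (p[0] - h[-2][0])) <= 0:
                h.pop()
            h.append(p)
        return h[:-1]
    hull = half(pts) + half(pts[::-1])
    x, y = np.array(hull).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Unit square with an interior point: the interior point does not contribute.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]])
print(convex_hull_area(square))
```

Applied to the N ensemble positions associated with one true Lagrangian particle, this yields the quantity A_c whose dependence on N and K is studied next.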
We identify three key parameters which influence the area of the stochastic cloud: the number of EOFs, the size of the stochastic ensemble, and the flow velocity. As Figure 6 shows, the more EOFs are used in the stochastic model, the wider the stochastic spread becomes. This difference is more pronounced in the fast flow region, although it is relatively small, since the variance captured by 64 and 128 EOFs differs only by 3%. The same is true for the size of the stochastic ensemble and the velocity of the flow. Namely, the stochastic cloud widens as the ensemble size or the flow velocity increases. The velocity of the flow contributes much more to the size of the spread than the number of ensemble members or EOFs (compare the area of the stochastic cloud in the fast and slow regions for different numbers of ensemble members). The results show that, regardless of the flow dynamics, the true solution x̄_c lies within the stochastic cloud almost everywhere.
To see a more global picture, we divide the flow dynamics into fast (the northern and southern boundary layers, and the jets) and slow (the flow between the jets) regions, classified by the Reynolds number Re = ⟨u⟩ Rd_1/ν, where ⟨u⟩ is the maximum time-mean velocity and Rd_1 is the first baroclinic Rossby deformation radius. We remark that Re can be defined using different velocity and length scales (e.g. [41]); our definition is focused on the mesoscale eddies, characterized by length scales up to O(100) km, and the jets. In terms of the Reynolds number, the flow decomposition is given by Re_s < 432 and 432 ≤ Re_f ≤ 1440, where Re_s and Re_f are the Reynolds numbers for the slow and fast flow dynamics, respectively.
As seen in Figure 7, the fast velocity regions appear intermittently between slow velocity regions. In particular, the fast flow region includes the jet-like structure and the boundary flows along the northern and southern boundary, while the slow flow regions mainly comprise the flows between the jets.
where A c s,i and A c f,i are the area of the stochastic cloud corresponding to the ith true Lagrangian particle belonging to the slow and fast flow, respectively. The dependence of the averaged area of the stochastic cloud for the slow and fast flow regions on the number of EOFs and the size of the stochastic ensemble is presented in Figure 8.  Upon analysing the results presented in Figure 8, we have found that they are in good agreement with the analysis of instantaneous snapshots ( Figure 6). In particular, the averaged area of the stochastic cloud A c is influenced by the three parameters we identified for the case of instantaneous flows: the number of EOFs, the size of the stochastic ensemble, and the flow velocity. More importantly, the qualitative behavior of A c is similar to those of A c (the area of the stochastic ensemble associated with a given Lagrangian particle). Namely, as the size of the stochastic ensemble increases so does the area of the cloud (for example, compare the red and brown lines for which the ensemble size is N = 100 and N = 400, respectively). The increase in the number of EOFs also leads to a larger area of the cloud (for instance, compare the red and magenta lines for which K = 64 and K = 128, respectively). However, it is not a significant increase, since the variance captured by 96 and 128 EOFs differs only by 3%. These results stay the same for the stochastic QG model studied in Section 3.2.5. The same is true for the flow velocity: the faster the flow, the larger the stochastic cloud (compare the red and blue lines which corresponds to the fast and slow velocity region, respectively). The most important observation here is that we can estimate the contribution of each parameter to the parameterisation. The size of the stochastic ensemble and the number of EOFs have a rather small effect on the area of the stochastic cloud. The most significant contribution comes from the velocity of the flow. 
Namely, the size of the stochastic cloud for fast flows is always larger than that for slow flows (compare the upper four curves with the lower four curves in Figure 8).
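The paper does not detail how the cloud area is measured. As one plausible reading (an assumption on our part, not the authors' stated method), the sketch below takes the area of the stochastic cloud to be the area of the convex hull of the ensemble particle positions, using a self-contained monotone-chain hull and the shoelace formula:

```python
import numpy as np

def convex_hull(points):
    """Monotone-chain convex hull; points is an (N, 2) array of positions."""
    pts = sorted(map(tuple, points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def cloud_area(ensemble_xy):
    """Shoelace area of the convex hull of ensemble particle positions
    (a hypothetical proxy for the cloud area A^c)."""
    hull = convex_hull(ensemble_xy)
    x = np.array([p[0] for p in hull])
    y = np.array([p[1] for p in hull])
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
```

Averaging `cloud_area` over particles in the slow and fast regions would then give the quantities Ā^c_{s,i} and Ā^c_{f,i} above.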
3.2.4. Initial conditions. The choice of the initial condition for the stochastic QG model is important, especially in the context of uncertainty quantification and data assimilation, since it significantly influences the evolution of the flow as well as its further predictability. A straightforward approach based on a random perturbation of the true solution at time t = 0 can inject unphysical perturbations into the flow, which, in turn, can result in an unphysical solution. Therefore, in order to perform uncertainty quantification tests, we need a number of independent realizations of the initial condition that are physically consistent with the flow dynamics. Namely, each independent realization of the initial condition is advected by the stochastic QG model until t_0 = 0, and is thus a solution of the SPDE. To this end, we start at time t = −t* with the true solution q^a of the deterministic model and run it until t_0 = 0 with independent realizations of the Brownian noise W to produce independent samples of the initial condition. As a result, the ensemble of stochastic solutions (also referred to as ensemble members or particles), S(x, y), covers the true deterministic solution at time t_0.
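As a minimal sketch of this spin-up procedure, the toy code below advects copies of the true state under independent noise realizations; the scalar drift and the `stochastic_step` surrogate are our own stand-ins for one step of the stochastic QG model, not the authors' discretisation.

```python
import numpy as np

def stochastic_step(q, dt, sigma, rng):
    # Toy surrogate for one time step of the stochastic model:
    # a weak deterministic drift plus a Brownian increment sigma * dW.
    return q - 0.1 * q * dt + sigma * rng.normal(scale=np.sqrt(dt), size=q.shape)

def spawn_initial_ensemble(q_true, n_members, n_steps, dt=0.01, sigma=0.3, seed=1):
    """Advect the true state from t = -t* to t0 = 0 under independent
    Brownian realizations; each trajectory yields one physically
    consistent sample of the stochastic initial condition."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        q = q_true.copy()
        for _ in range(n_steps):
            q = stochastic_step(q, dt, sigma, rng)
        members.append(q)
    return np.stack(members)

# 20 ensemble members spun up over 50 steps from the same "true" state.
ens = spawn_initial_ensemble(np.ones((8, 8)), n_members=20, n_steps=50)
```

Because every member solves the (surrogate) SPDE, the resulting spread is dynamically consistent rather than an ad hoc perturbation of the truth.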
We initialise the ensemble in a manner consistent with performing ensemble data assimilation (using a particle filter, for example). The ensemble of initial data represents our belief about the coarse-grained solution state according to a particular scenario, which would be updated by the filter using observational data. It is important to note that in all the numerical simulations which follow, the initial stochastic ensemble is biased (unless stated otherwise), i.e. the ensemble mean is not equal to the true solution at time t = 0, since the ensemble mean can drift away from the true solution over the initial spin-up interval T (see below) needed for computing the initial condition for the stochastic model. This is, of course, typical of a data assimilation scenario: assimilating data could reduce this bias at future times. In this work we do not assimilate data, but simply investigate whether the ensemble spread is likely to be sufficient for stable data assimilation.
The goal of the next experiment is to study for how long this property holds. To this end, we introduce a function, T_S, which represents the time period spent by the true deterministic solution q^a within the spread of stochastic solutions; the spread is defined for each coordinate (x, y) as S(x, y) ∶= [min(q^p(x, y, W_t, t)), max(q^p(x, y, W_t, t))], where the minimum and maximum are computed over the stochastic ensemble, and q^p(x, y, W_t, t) is the solution of the parameterised QG model.
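The envelope S(x, y) and the residence time of the truth inside it can be computed pointwise from stored ensemble snapshots. A sketch, assuming fields stored as NumPy arrays (the function names are ours):

```python
import numpy as np

def spread_bounds(ensemble):
    """S(x, y): pointwise [min, max] envelope over the ensemble axis.
    ensemble: (N, ny, nx)."""
    return ensemble.min(axis=0), ensemble.max(axis=0)

def time_in_spread(truth_t, ensemble_t, dt):
    """T_S per grid point: time until the true solution first leaves the
    ensemble envelope. truth_t: (T, ny, nx); ensemble_t: (T, N, ny, nx)."""
    lo = ensemble_t.min(axis=1)
    hi = ensemble_t.max(axis=1)
    inside = (truth_t >= lo) & (truth_t <= hi)          # (T, ny, nx) booleans
    first_exit = np.argmax(~inside, axis=0)             # 0 if the truth never exits...
    first_exit[inside.all(axis=0)] = truth_t.shape[0]   # ...so fix that case explicitly
    return first_exit * dt
```

Averaging `time_in_spread` over (x, y) then gives a single coverage-time diagnostic per experiment.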
We carried out two numerical experiments to compute T_S. In the first, we start at time t* = −1 hour with the true solution q^a(t*) and run the stochastic QG model (8) until t_0 = 0 with 100 independent realizations of the Brownian noise W, and with the first 64 leading EOFs, which capture 96% of the total variance. In the second, we do the same but start at t* = −16 hours. Our results (not presented here) show that the spread of stochastic solutions S captures the true deterministic velocity u^a, stream function ψ^a, and PV anomaly q^a for the heterogeneous and homogeneous flows in both experiments equally well almost everywhere in the computational domain, except in the neighbourhood of the northern and southern boundaries. The boundary layer dynamics is difficult to capture on the coarse grid because of the low resolution. However, the boundary layer is very small relative to the whole domain, and so its contribution to the uncertainty quantification results is minuscule. The length of the time interval T has only a minor influence on the behaviour of the spread, with the longer interval giving slightly better coverage of the true solution over time. Overall, we have shown that the stochastically advected deterministic initial condition provides a solid basis for uncertainty quantification tests and data assimilation, which will be the object of future research.

3.2.5. Uncertainty quantification for the stochastic QG model. The Lagrangian evolution studied above demonstrates encouraging results. However, it cannot guarantee that the application of EOFs to the stochastic QG equations (8) is equally beneficial. Therefore, this section focuses on uncertainty quantification for the stochastic QG model. Here, we analyse the behaviour of the stochastic ensemble for two different cases. In the first case (Figure 9), we study how many points in the computational domain (each point representing the true deterministic solution at that point) remain within the spread, i.e. we analyse how the function R̃_S depends on the number of EOFs and the ensemble size. In the second case (Figure 10), we impose more stringent restrictions upon the parameterisation and analyse how many points in the domain remain within one standard deviation of the ensemble mean, rather than within the whole ensemble as before. Namely, we study how the function R̃_Sσ depends on the number of EOFs and the ensemble size. Here, S_σ(x, y) ∶= [q̄^p − σ(q^p), q̄^p + σ(q^p)], where q̄^p is the ensemble mean and σ(q^p) is the ensemble standard deviation. As seen in Figures 9 and 10, the smoother the field, the more points in the domain remain within the spread. Namely, the stream function ψ^a is enclosed within the spread at more points in the domain than the velocity u^a (as well as v^a, not shown) and the PV anomaly q^a. Using more EOFs leads to better coverage of the truth. For example, using K = {1, 2, 4} EOFs gives a spread which gradually captures more individual points in the domain. However, this effect does not last: for K > 4 one does not observe significantly better coverage than for K = 4. The same conclusion holds for the size of the stochastic ensemble: the larger the ensemble, the more points in the domain remain covered by the spread.
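Both coverage diagnostics reduce to simple pointwise counts over the grid. A sketch, assuming ensemble snapshots stacked along the first axis (the function names are ours):

```python
import numpy as np

def coverage_ratio(q_true, ensemble):
    """R~_S: fraction of grid points at which the truth lies inside the
    full ensemble envelope. ensemble: (N, ny, nx), q_true: (ny, nx)."""
    lo, hi = ensemble.min(axis=0), ensemble.max(axis=0)
    return float(np.mean((q_true >= lo) & (q_true <= hi)))

def coverage_ratio_sigma(q_true, ensemble):
    """R~_Ssigma: fraction of grid points within one standard deviation
    of the ensemble mean, i.e. inside S_sigma(x, y)."""
    mean, std = ensemble.mean(axis=0), ensemble.std(axis=0)
    return float(np.mean(np.abs(q_true - mean) <= std))
```

The second ratio is necessarily no larger than the first, since S_σ is a subset of the full envelope for any reasonably sized ensemble.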
However, the variation in the ensemble size has a much smaller effect on the spread than the variation in the number of EOFs. As expected, the ratio of individual points enclosed within one standard deviation of the mean of the stochastic spread is much smaller than that contained in the full spread (compare Figures 9 and 10).

Figure 10. The same as in Figure 9, but for R̃_Sσ.

Figure 11. The same as in Figure 9, but for G_c = 257 × 129. We show the v^a component of the velocity field, since it behaves differently from u^a. For the higher resolution, using K = {1, 2, 4, 8, 16, 32, 64} leading EOFs captures 39%, 61%, 77%, 90%, 97%, 99%, and 99.8% of the flow variability, respectively.
Next we study how the resolution influences the effect of the parameterisation, computing R̃_S and R̃_Sσ on a grid G_c = 257 × 129 (Figures 11 and 12). As can be seen in Figures 11 and 12, many more points in the domain remain covered by the spread compared with the lower resolution. Surprisingly, the number of covered points even tends to increase in time. This unexpected result is analysed in Section 4. The results for the homogeneous flow (not shown) are qualitatively similar to those of the heterogeneous one.

Figure 12. The same as in Figure 11, but for R̃_Sσ.
To analyse a more global picture, we introduce the function T̃_Sσ, which shows how long the true deterministic solution remains within one standard deviation of the ensemble mean. We examine how T̃_Sσ evolves in time for different realizations of initial conditions and different ensemble sizes. We also introduce the function ⟨T̃_Sσ⟩, computed with the stochastic QG model started from initial conditions at times t_i, i = 1, 2, . . . , M; it is T̃_Sσ averaged over the computational domain Σ = [0, L_x] × [0, L_y] and over the M different initial conditions (we take M = 10, being limited in computational resources), which are randomly chosen within an interval of one year (see Figure 13).
The results presented in Figure 13 show that the more EOFs are used in the parameterisation, the longer the stochastic ensemble covers the true solution for both the heterogeneous (Figure 13a) and homogeneous (not shown, due to similar behaviour) flows. The same conclusion holds for the ensemble size: the larger the ensemble, the longer the spread captures the true solution (Figure 13b). However, adding yet more ensemble members does not yield much better coverage of the true solution by the stochastic spread, especially for the homogeneous flow. Another observation is that the stochastic spread gradually loses track of the true solution in time.
For a deeper analysis of the stochastic spread, we study its reliability, which is one of the most important characteristics of an ensemble used in ensemble forecasting (see, e.g. [44]). One criterion for ensemble reliability is that the root mean square error of the ensemble mean, ⟨RMSE(u)⟩, is relatively close to the standard deviation of the ensemble, ⟨σ(u)⟩. In numerical weather prediction, this is called the skill-spread relationship. Here, we first average these quantities over the computational domain and over the different initial conditions t_i (i = 1, 2, . . . , M), and then compute them relative to their spatial averages; the resulting quantities are denoted ⟨RMSE(u)⟩ and ⟨σ(u)⟩, respectively. The dependence of ⟨RMSE(u)⟩ and ⟨σ(u)⟩ on the number of leading EOFs and the size of the ensemble over a time period of 24 hours is presented in Figure 14. We can see that there is a mismatch in the spread-skill relationship, which we have explained and corrected in the sequel paper [11]. We do not show the results for the homogeneous flow, since they are very similar to those of the heterogeneous one.
We interpret the above results as follows. The difference between the solution of the SPDE approximated on the coarse scale and the projection onto the same scale of the PDE solution is estimated by drawing independent samples from the solution of the SPDE and comparing their average with the coarsened solution of the PDE. The error between the two is measured by the corresponding mean square error MSE(u). The graphs in Figure 14 show the evolution of the square root of this quantity (more precisely, its estimate obtained using 100/200/400 independent samples), denoted ⟨RMSE(u)⟩. The mean square error MSE(u) is the sum of the variance of the solution of the SPDE and the square of the difference between the expected value of the solution of the SPDE and the coarsened solution of the PDE (the latter quantity being the square of the bias of the approximation). Therefore, the root mean square error ⟨RMSE(u)⟩ is always larger than the standard deviation of the solution of the SPDE, denoted ⟨σ(u)⟩; they coincide only when the bias is zero. As seen in Figure 14, the difference between the root mean square error ⟨RMSE(u)⟩ and the standard deviation ⟨σ(u)⟩ (approximated by the square root of the ensemble variance) for the heterogeneous and homogeneous flows appears at the very beginning of the simulations, showing that the initial stochastic ensemble is biased, as expected. This difference is small (1-2%) and grows slowly in time, showing that the bias remains small over the chosen timescale. The number of leading EOFs and the size of the ensemble have only a minor effect on both the RMSE and the standard deviation.
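This bias-variance decomposition of the ensemble error is easy to verify numerically. The snippet below uses synthetic biased samples in place of SPDE output and checks that MSE = variance + bias², and hence that the RMSE always dominates the ensemble standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)
truth = 1.0                                              # stands in for the coarsened PDE value
samples = truth + 0.2 + rng.normal(0.0, 0.5, size=400)   # biased "SPDE" samples

mse = np.mean((samples - truth) ** 2)
var = np.var(samples)                                    # ensemble variance
bias = np.mean(samples) - truth                          # ensemble-mean bias

# MSE = variance + bias^2 (exact, up to floating point), hence
# RMSE >= ensemble standard deviation, with equality only at zero bias.
assert np.isclose(mse, var + bias ** 2)
```

The identity holds for any sample set, which is why the gap between ⟨RMSE(u)⟩ and ⟨σ(u)⟩ directly measures the (squared) bias.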
The bias is reduced when the initial ensemble is unbiased; in this case, ⟨RMSE(u)⟩ = ⟨σ(u)⟩ at t = 0. The evolution of the ensemble unbiased at t = 0 is shown in Figure 15. As in the biased case, increasing the number of EOFs or ensemble members leads to only minor changes in ⟨RMSE(u)⟩ and ⟨σ(u)⟩. Despite the initial ensemble being unbiased, the stochastic model still acquires a bias over time, and the discrepancy between ⟨RMSE(u)⟩ and ⟨σ(u)⟩ grows in time. The bias gets smaller as the grid cell size decreases; in the limit in which there is no difference between the grid scale on which the PDE is computed and the grid scale on which the SPDE is computed, the bias vanishes. The goal of the parameterisation is not to achieve a zero bias, but to provide a sufficiently large spread that the bias can be reduced by data assimilation, as studied in the sequel paper [11].

Figure 15. The same as in Figure 14, but for an unbiased ensemble. The results for the homogeneous flow look very similar.
In the same vein, one can increase the amplitude of the noise, which decreases the difference between ⟨RMSE(u)⟩ and ⟨σ(u)⟩, but the ensemble still acquires a bias in time (Figure 16).
All these simulations with different parameters of the stochastic model show that the bias increases and eventually becomes larger than the standard deviation. For example, for T_spin = [−1, 0] hour and K = 1, the bias remains smaller than the standard deviation at 97% of the spatial grid points 1 hour after the spin-up time, but the proportion decreases to 34% of the grid points after 20 time steps (10 hours).
A substantial improvement can be obtained by increasing the number of driving Brownian motions. For example, if one uses K = {32, 64, 128}, the bias remains smaller than the standard deviation at 99% of the spatial grid points 1 hour after the spin-up, and there is a substantial benefit after 20 time steps, where the proportion of spatial grid points at which the bias remains less than the standard deviation increases to 59%, 60%, and 60.5%, respectively. The proportion increases as more Brownian motions are added, up to a certain level; in this particular case, K = 32 is already close to the asymptotic value of approximately 60%. Another effect we have investigated is the length of the spin-up period. If instead of T_spin = [−1, 0] hour we use, say, T_spin = [−12, 0] hours or T_spin = [−24, 0] hours (with K = 32), the above proportions become 75% and 81%, respectively. All these experiments show that the quality of the ensemble can be improved (when the "measure of goodness" is the bias/standard deviation comparison) over a given period of time by increasing both the number of Brownian motions driving the SPDE and the spin-up time. Another important conclusion is that our choice of the functions ξ used in the stochastic parameterisation cannot deliver an unbiased solution.
3.3. Uncertainty quantification for the deterministic QG model. Another important question is how the uncertainty in the stochastic initial condition is propagated by the deterministic QG model, and how these uncertainty quantification results compare with the stochastic case on the grids G_c = {129 × 65, 257 × 129}. To do so, we start the deterministic QG model from the stochastic initial condition at time t = 0 (see Section 3.2.4 for details), run it for each independent realization of the Brownian noise W, and compare the behaviour of the deterministic (denoted R_S) and stochastic (denoted R̃_S) spreads computed with the deterministic and stochastic QG models, respectively. The results of this simulation are given in Figure 17.
The results in Figure 17 clearly indicate that the ratio of points covered by the spread in the deterministic case is much smaller than in the stochastic case (compare Figures 17 and 9). For the higher resolution (G_c = 257 × 129), the situation is similar (Figure 18), although more points in the domain remain within the spread compared with the lower-resolution simulation. These simulations show that the spread of stochastic solutions can track the true solution for a longer time than the deterministic model, thus providing solid foundations for data assimilation methods, which require the "truth" to be contained within the spread of stochastic solutions unless nudging methods are used. To examine the length of time for which the true deterministic solution lies within one standard deviation of the stochastic (measured by ⟨T̃_Sσ⟩) and deterministic (measured by ⟨T_Sσ⟩) ensembles, Figure 19 shows the evolution of these quantities in time. As shown in Figure 19, the spread computed with the stochastic model keeps track of the true solution over a longer period of time than the one computed with the deterministic model (compare the solid lines, corresponding to T̃_Sσ, the stochastic case, with the dashed lines, corresponding to T_Sσ, the deterministic case). This conclusion holds for both the homogeneous and heterogeneous flows. It is important to note that although the stochastic model has been calibrated over 24 hours (Section 3.2.3), it gives very good results well beyond this time period for both heterogeneous and homogeneous flows.
In addition, we have studied how the spatial resolution affects the ensemble reliability by analysing forecast reliability rank histograms (Talagrand diagrams) for the stochastic QG model at different resolutions (Figure 20). In particular, we have applied the approach used in [8]: the rank histograms are computed by randomly selecting 10 stochastic solutions out of the total of 100 every 4 hours and comparing the new observations with the stochastic solution perturbed by noise with the same amplitude as the measurement noise.
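A rank (Talagrand) histogram tallies, for each observation, how many ensemble members fall below it; a flat histogram indicates a reliable (unbiased, well-dispersed) ensemble. The sketch below illustrates the basic construction on synthetic data; it is not the exact selection-and-perturbation procedure of [8]:

```python
import numpy as np

def rank_histogram(observations, ensembles):
    """Rank of each observation within its sorted ensemble.
    observations: (n,), ensembles: (n, N); returns counts over ranks 0..N."""
    n, N = ensembles.shape
    ranks = np.sum(ensembles < observations[:, None], axis=1)  # values in 0..N
    return np.bincount(ranks, minlength=N + 1)

rng = np.random.default_rng(3)
obs = rng.normal(size=1000)
ens = rng.normal(size=(1000, 10))   # drawn from the same distribution as obs
hist = rank_histogram(obs, ens)     # roughly flat, since the ensemble is reliable
```

A biased ensemble would instead pile observations into the extreme ranks (U- or J-shaped histograms), which is the behaviour reported for the 129 × 65 grid.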
As can be seen in Figure 20, the stochastic ensemble computed on the grid 129 × 65 is biased at all observation locations (the rank histograms are not flat; see Figure 20a), making the ensemble prediction unreliable. However, the situation changes for the better at the slightly less coarse grid 257 × 129 (Figure 20b). In particular, the rank histograms become flatter and are thus less biased. In order to correct the bias, which persists even at the higher resolution, and to produce a reliable forecast, we have proposed a data assimilation methodology, which we investigated in the sequel paper [11].

Figure 20. Note that a pair of histograms at different resolutions shares the same location in the computational domain. Each histogram is based on 1000 forecast-observation pairs generated by solving the stochastic QG model. For simulating the stochastic QG model we use 100 stochastic solutions and 32 leading EOFs. Each stochastic solution for the rank histogram is selected randomly from the ensemble every 4 hours. The rank histograms at the higher resolution are flatter and thus the forecasts are more reliable, although this resolution is still much lower than that of the reference simulation.

We now turn to analysing how the stochastic model works. From now on, we only focus on the heterogeneous flow. It is instructive to first analyse whether the parameterisation can restore the number of jets when the model starts from a randomly perturbed zero initial condition (Figure 21).
From Figure 21 we conclude that at the low resolution (G_c = 129 × 65) the parameterisation cannot restore the correct number of jets, and can therefore lead to a poor representation of the effects of the small-scale dynamics on the resolved scales. We hypothesize that the very low energy of the stochastic solution is not enough to properly shape the jets; in particular, the time-averaged ensemble mean total energy is ⟨E⟩_{129×65} = 0.01. In contrast, at the higher resolution (G_c = 257 × 129) the parameterisation demonstrates much better performance and restores the original number of jets. In this case the energy is over two orders of magnitude higher than in the lower resolution case, ⟨E⟩_{257×129} = 3.9.
In order to estimate the efficacy of the parameterisation, we study how it performs at different resolutions and how accurately it can reproduce the truth. The results are presented in Figures 22 and 23.
The results in Figure 22 are shown from day 32 onwards, since all three solutions q^a_1, q^m_1, and q^p_1 look very similar before that day. As seen in Figure 22, both the deterministic QG model (middle column) and its stochastic parameterisation (right column) gradually diffuse the jets and destroy the structure of the flow. However, at the higher resolution (G_c = 257 × 129) the situation is different (Figure 23). In particular, the coarse-resolution deterministic model cannot maintain the jet-like structure of the flow and tries to switch to the two-jet regime it originally had (middle column). In contrast, the stochastic QG model maintains the original structure of the flow over a much longer period of time (right column) than the lower resolution model. Analysis of the relative l_2-norm error between the true solution and its approximations (the modelled solution u^m and the parameterised solution u^p) shows that the deterministic model on the grid G_c = 129 × 65 outperforms the stochastic parameterisation (Figure 24, left). In particular, after approximately 7 days the parameterised solution (blue line) starts losing track of the truth, while the deterministic QG model started from the same stochastic initial conditions gives better results (red line). At the higher resolution (G_c = 257 × 129), the parameterised model performs much better than the deterministic one (Figure 24, right). We hypothesise that at higher resolutions the small scales are better energized, thus intensifying the backscatter mechanism (e.g. [39]) which, in turn, channels energy into the zonal jets.
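The relative l_2-norm error used in this comparison is a standard diagnostic; a one-line sketch over gridded fields (the function name is ours):

```python
import numpy as np

def relative_l2_error(u_true, u_approx):
    """Relative l2-norm error ||u_true - u_approx||_2 / ||u_true||_2,
    computed over all grid values of the field."""
    return float(np.linalg.norm(u_true - u_approx) / np.linalg.norm(u_true))
```

For example, `relative_l2_error(np.array([3.0, 4.0]), np.array([3.0, 0.0]))` returns 0.8.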
In summary, by comparing the uncertainty quantification results for the heterogeneous and homogeneous flows, we have found that the stochastic spread captures the true deterministic solution q^a (computed on the coarse grid G_c) at many more individual grid points in the computational domain (Figure 17) and for much longer times (Figure 19) than the deterministic spread (compare the stream function ψ^a_1, velocity u^a_1, or PV anomaly q^a_1 for the stochastic spread (top row) and the deterministic spread (bottom row) in Figures 17 and 19). Therefore, for data assimilation, the proposed parameterisation would be considerably preferable to the deterministic QG model; not only for heterogeneous flows, but also for homogeneous ones. This study has also shown that the higher the resolution, the better the stochastic parameterisation performs. Overall, we conclude that the parameterisation of the stochastic QG model is robust to large variations in the flow dynamics and governing parameters, and can be applied equally well to both homogeneous and heterogeneous flows.

Figure 23. The same as in Figure 22, but for G_c = 257 × 129.

5. Conclusion and future work. In this paper we have introduced a stochastic parameterisation for unresolved eddy motions in a two-layer quasi-geostrophic channel model with forcing and dissipation. The parameterisation is based upon the idea of "transport noise", which models the modifications to the velocity field due to unresolved dynamics. This model assumes that the transport of large-scale components is accurate, but that the velocity field used to transport these components is missing contributions from unresolved scales. We have proposed a method for extracting the stochastic forcing by post-processing high resolution simulations, and demonstrated it using uncertainty quantification experiments for both the SDE and the stochastic QG model, for homogeneous and heterogeneous flow dynamics. We have also shown that the quality of the stochastic ensemble can be improved by increasing both the number of Brownian motions driving the SPDE and the spin-up time. However, our choice of the functions ξ used in the stochastic parameterisation cannot deliver an unbiased solution, and therefore requires a bias correction, which can be done using data assimilation methods. Another important finding is that higher resolutions significantly improve the parameterised solution. Overall, our results show that the proposed parameterisation significantly depends on the resolution of the stochastic model and gives good ensemble performance for both homogeneous and heterogeneous flows, laying a solid foundation for data assimilation.
In future work, we intend to use this approach as the basis for data assimilation algorithms, investigating the assimilation of data from a high-resolution deterministic model into a low-resolution stochastic model. We also intend to examine the "prognostic" parameterisations derived in [17], in which the stochastic forcing patterns are determined from the coarse model itself using physical principles, rather than the "diagnostic" parameterisations of this paper, where they are determined from high resolution simulations. The diagnostic approach proposed in this paper will provide important insight through comparison of the diagnosed forcing with the state of the stochastic model.