BALANCED MODEL ORDER REDUCTION FOR LINEAR RANDOM DYNAMICAL SYSTEMS DRIVEN BY LÉVY NOISE

Abstract. When solving linear stochastic differential equations numerically, usually a high order spatial discretisation is used. Balanced truncation (BT) and singular perturbation approximation (SPA) are well-known projection techniques in the deterministic framework which reduce the order of a control system and hence reduce computational complexity. This work considers both methods when the control is replaced by a noise term. We provide theoretical tools such as stochastic concepts for reachability and observability, which are necessary for balancing related model order reduction of linear stochastic differential equations with additive Lévy noise. Moreover, we derive error bounds for both BT and SPA and provide numerical results for a specific example which support the theory.


1. Introduction. Many mathematical models of real-life processes pose challenges during numerical computations due to their large size and complexity. Model order reduction (MOR) techniques reduce the computational complexity of numerical simulations; an overview of MOR methods is provided in [1,30]. MOR techniques such as balanced truncation (BT) and singular perturbation approximation (SPA) were introduced in [23] and [21], respectively, for linear deterministic systems: a balancing transformation is found, which is used to project the state space of size n to a much smaller dimensional state space (see, e.g., [1]).
Recently, the theory for BT and SPA has been extended to stochastic linear systems of the form

dx(t) = [Ax(t) + Bu(t)] dt + Σ_{k=1}^q N_k x(t−) dM_k(t), y(t) = Cx(t), (1)

where A ∈ R^{n×n}, B ∈ R^{n×m}, C ∈ R^{p×n}, and N_k ∈ R^{n×n} and M_k (k = 1, …, q) are uncorrelated scalar square integrable Lévy processes with mean zero (often q = 1 and the special case of a Wiener process is considered, see, for example, [6,9,12]). In this case BT and SPA require the solution of more general Lyapunov equations of the form

A P + P A^T + Σ_{k=1}^q c_k N_k P N_k^T + B B^T = 0,

where c_k = E[M_k(1)^2] for general Lévy processes. Note that c_k = 1, k = 1, …, q, for the case of a Wiener process [6]. We refer to [7,9,26,28] for a detailed theoretical and numerical treatment of balancing related MOR for (1).
In this paper we are going to study balancing related MOR for systems of the form

dx(t) = Ax(t) dt + B dM(t), y(t) = Cx(t), (2)

where B dM(t) = Σ_{i=1}^m b_i dM_i(t) and b_i is the ith column of B ∈ R^{n×m}. The processes M_i are the components of a square integrable mean zero Lévy process M = (M_1, …, M_m)^T that takes values in R^m. Consequently, these components are not necessarily uncorrelated. For a general theoretical treatment of SDEs with Lévy noise we refer to [2].
The setting in (2) is of particular interest in many applications. If one is interested in a large number of different realisations of the output y(t) (e.g. to compute moments of the form E [f (y(t))]), then one needs to solve the SDE in (2) a large number of times. For a state space of high dimension this is computationally expensive. Reduction of the state space dimension decreases the computational complexity when sampling the solution to (2), as the SDE can then be solved in much smaller dimensions. Hence the computational costs are reduced dramatically.
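As a small illustration of this sampling argument, the sketch below simulates a hypothetical low-dimensional system of the form (2) with an Euler-Maruyama scheme, driving it with a Wiener process as the simplest square integrable mean zero Lévy process. All matrices are stand-ins chosen for the example, not taken from the paper.

```python
import numpy as np

# Hypothetical reduced coefficients (state dimension 2, one noise component).
A_r = np.array([[-1.0, 0.5], [0.0, -2.0]])
B_r = np.array([[1.0], [0.5]])
C_r = np.array([[1.0, 1.0]])

def sample_output(T=1.0, n_steps=200, n_paths=2000, seed=0):
    """Euler-Maruyama for dx = A_r x dt + B_r dM, y = C_r x, x(0) = 0,
    with M a standard Wiener process; returns realisations of y(T)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.zeros((n_paths, A_r.shape[0]))
    for _ in range(n_steps):
        dM = rng.normal(0.0, np.sqrt(dt), size=(n_paths, B_r.shape[1]))
        x = x + (x @ A_r.T) * dt + dM @ B_r.T
    return x @ C_r.T

y_T = sample_output()
print(y_T.mean(), y_T.var())  # sample moments of y(T)
```

Since the reduced state dimension is small, each Euler step costs only a few flops per path, which is what makes a large number of realisations affordable after the reduction.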
The linear system (2) is a problem where the control is noise. In this case the standard theory for balancing related MOR applied to a deterministic system no longer applies.
Balanced truncation has been applied to linear systems with white noise before. The discrete time setting was discussed in [3]. For the continuous time setting, dissipative Hamiltonian systems with Wiener noise were treated in [11,13], but no error bounds were provided. In this paper we consider both BT and SPA model order reduction. As far as we are aware, no theory and in particular error bounds for balancing related MOR have been developed for continuous time SDEs with Lévy noise.
Using theory for linear stochastic differential equations with additive Lévy noise we provide a stochastic concept of reachability. This concept motivates a new formulation of the reachability Gramian. We prove bounds for the error between the full and reduced system which provide criteria for truncating, e.g. criteria for a suitable size of the reduced system. We analyse both BT and SPA and apply the theory directly to an application arising from a second order damped wave equation.
We now consider a particular example which explains why the above setting is of practical interest.
Motivational example. In [27] the lateral time-dependent displacement Z of an electricity cable impacted by wind was modelled by the one-dimensional symbolic second order SPDE with Lévy noise (3), for t ∈ [0, T], ζ ∈ [0, π] and α > 0. This system represents a damped wave equation with a deterministic force term u(t) (scaled here with a term e^{−(ζ−π/2)²}) and another input term, representing, say, wind moving the electric cable, which here is modelled by the multiplicative noise ∂/∂t M_1(t). Boundary and initial conditions are given by Z(t, 0) = 0 = Z(t, π) and Z(0, ζ) = 0 = ∂/∂t Z(0, ζ). For small ε > 0, the output equation (5) is approximately the position of the middle of the cable. This system (with multiplicative noise) was treated in [27], where it was shown that transforming this SPDE into a first order SPDE and then discretising it in space leads to a system of the form (1) with q = m = p = 1. One drawback of this approach is that, if u ≡ 0, we have Z(t, ζ) = 0 for all t > 0 and ζ ∈ [0, π] due to the boundary and initial conditions. Consequently, wind has no impact on the cable if no forcing is applied. However, we would like to model the influence of the wind, even if there is no force term u. Therefore, a more realistic scenario, which models the wind as some form of stochastic input, is the symbolic equation (6) for t ∈ [0, T], ζ ∈ [0, π] and α > 0, with boundary and initial conditions as in (4), and with M_k the components of a square integrable mean zero Lévy process M = (M_1, …, M_m)^T that takes values in R^m. Hence, in (6), the wind is modelled by additive noise terms (in contrast to (3), where it was modelled by multiplicative noise). In this paper, we consider a framework which covers this model. Moreover, we modify the output in (5), see (7), so that both the position and the velocity of the middle of the cable are observed. Transformation and discretisation of this SPDE leads to a system of the form (2) where A is an asymptotically stable matrix, i.e. σ(A) ⊂ C−.

This paper is set up as follows. Section 2 provides the theoretical tools for balancing linear SDEs with additive Lévy noise. We explain the theoretical concepts of reachability and observability in this setting and show how this motivates MOR using BT and SPA. Moreover, we provide theoretical error bounds for both methods.
In Section 3 we show how a wave equation driven by Lévy noise can be transformed into a first order equation and then reduced to a system of the form (2) by using a spectral Galerkin method. Numerical results which support our theory are provided in Section 4.

2. Balancing for linear stochastic differential equations with additive Lévy noise. In [1,21,23] balancing related MOR was considered for deterministic systems of the form

ẋ(t) = Ax(t) + Bu(t), y(t) = Cx(t), (8)

where A ∈ R^{n×n} was assumed to be asymptotically stable, i.e. σ(A) ⊂ C−, B ∈ R^{n×m}, C ∈ R^{p×n} and u ∈ L²([0, T]) for all T > 0 was a deterministic control. We now turn our attention to the stochastic system

dx(t) = Ax(t) dt + B dM(t), y(t) = Cx(t), (9)

which, in Section 3.2, represents a spatially discretised version of an SPDE. The matrices A, B and C are as above and the R^m-valued process M is a square integrable Lévy process with mean zero. One might interpret system (9) as system (8) with u(t) = Ṁ(t), but the noise Ṁ(t) is no control in the classical sense. First of all, stochastic controls were not admissible in the deterministic setting and, secondly, the classical derivative of M does not exist. So, if we want to study balancing related MOR for the "particular control" Ṁ(t), we need to make sense of this setting, which we do via the Ito-type SDE in (9).
In the deterministic case, reachability and observability concepts are introduced to characterise the importance of states. Difficult to reach states (states which require large energy to reach them) and difficult to observe states (states which only produce little observation energy) are seen to be unimportant in the system dynamics. In balancing related MOR the idea is to create a system where the dominant reachable and observable states are the same. Those are then truncated to obtain a reduced order model (ROM).
Applying balancing related MOR to (9) requires a few modifications compared to the classical deterministic framework. We introduce a stochastic reachability concept in Section 2.1 which also leads to a different reachability Gramian compared to the deterministic case. For the observation concept we follow the deterministic approach. We then describe the procedure of balancing for systems with additive noise in Section 2.2 which is similar to the deterministic case. Afterwards, we will discuss two particular techniques which are BT and SPA. Since it is not a priori clear whether these approaches for system (9) perform as well as for deterministic systems, we contribute an error bound for both BT and SPA in Section 2.3. These error bounds enable us to point out the cases, where BT and SPA work well, and they can be used to find a suitable ROM dimension.
2.1. Reachability and observability. With suitable reachability and observability concepts we want to analyze which states in system (9) are unimportant and hence can be neglected.
Reachability. We begin with a stochastic reachability concept, where the particular choice of M is taken into account. Starting from zero (x_0 = 0) in the state equation

dx(t) = Ax(t) dt + B dM(t), x(0) = x_0, (10)

we investigate how much the noise can control the state away from zero. We define what is meant by reachability in the stochastic case, where x(t, x_0, M), t ≥ 0, denotes the solution to (10) with initial condition x_0 ∈ R^n and noise process M.
Definition. System (10) is called completely reachable if

P(x(T, 0, M) ∈ O) > 0 (11)

for every T > 0 and every non-empty open set O ⊆ R^n.
We refer to [31], where weak controllability was analyzed for equations with Wiener noise. Weak controllability turns out to be similar to condition (11).
To characterise the degree of reachability of a state, we introduce the finite time reachability Gramians P(t) := E[x(t, 0, M) x^T(t, 0, M)], which are the covariance matrices of x(t, 0, M) at fixed times t ≥ 0. Before we study the meaning of these Gramians, we show that P(t) is the solution of a matrix differential equation.

Proposition 1. The Gramian P(t) satisfies

Ṗ(t) = A P(t) + P(t) A^T + B Q_M B^T, P(0) = 0, (12)

where Q_M := (E[M_i(1) M_j(1)])_{i,j=1,…,m} is the covariance matrix of M(1).

Proof. We replace x(t, 0, M) by x(t) to shorten the notation in the proof. Using Ito's formula in Corollary 1, we obtain a representation of x(t)x^T(t), t ≥ 0, componentwise with respect to the unit vectors e_i, where we used x_0 = 0. Inserting the stochastic differential of x(t) and using that the Ito integrals have mean zero, we obtain an equation for (E[e_i^T x(t) x^T(t) e_j])_{i,j=1,…,n}, where we replaced x(s−) by x(s). This does not impact the integrals, since a càdlàg process has at most countably many jumps on a finite time interval (see [2, Theorem 2.7.1]). Applying Corollary 1 again, the stochastic differential of B M(t) M^T(t) B^T is obtained, and the corresponding equation for (E[e_i^T B M(t) M^T(t) B^T e_j])_{i,j=1,…,n} follows after taking the expected value. In [24, Theorem 4.44] it was shown that the covariance function of M is E[M(t) M^T(t)] = Q_M t. Since e_i^T x has the same jumps and the same martingale part as e_i^T B M, we know by (58) that the compensators of both processes coincide, which concludes the proof.
To find a representation for P(t) we need the following straightforward result.

Proposition 2. Let A_1, A_2 ∈ R^{n×n} and K_1, K_2 ∈ R^{n×m}. Then F(t) := ∫_0^t e^{A_1 s} K_1 K_2^T e^{A_2^T s} ds satisfies

Ḟ(t) = A_1 F(t) + F(t) A_2^T + K_1 K_2^T, F(0) = 0.

Proof. The product rule yields d/ds (e^{A_1 s} K_1 K_2^T e^{A_2^T s}) = A_1 e^{A_1 s} K_1 K_2^T e^{A_2^T s} + e^{A_1 s} K_1 K_2^T e^{A_2^T s} A_2^T, and integrating gives the result.
Setting A_1 = A_2 = A and K_1 = K_2 = B Q_M^{1/2} in Proposition 2, we see that ∫_0^t e^{As} B Q_M B^T e^{A^T s} ds solves the differential equation (12). Since the solution to (12) is unique, we have

P(t) = ∫_0^t e^{As} B Q_M B^T e^{A^T s} ds. (13)

Consequently, x^T P(t) x is an increasing function of t for every fixed x ∈ R^n. If Q_M = I, then we obtain the reachability Gramian of the deterministic setting (8), see [1]. This is also the case if M is a standard Wiener process. The finite time reachability Gramian P(t) provides information about the reachability of a state, which we see from the following identity:

E[⟨x(t, 0, M), x⟩²_{R^n}] = x^T P(t) x, t ≥ 0. (14)

Consequently, we know that ⟨x(t, 0, M), x⟩_{R^n} = 0, t ∈ [0, T], P-a.s. if and only if x ∈ ker P(T), meaning that x(t, 0, M) is orthogonal to ker P(T). Since P(T) is symmetric positive semidefinite, we have (ker P(T))^⊥ = im P(T) and hence

x(t, 0, M) ∈ im P(T) for all t ∈ [0, T], P-a.s. (15)

We observe from (15) that all the states that are not in im P(T) are not reachable and thus do not contribute to the system dynamics. As a first step to reduce the system dimension it is necessary to remove all the states that are not in im P(T).
We will see in the next proposition that the finite time reachability Gramians can be replaced by the infinite Gramian

P := ∫_0^∞ e^{As} B Q_M B^T e^{A^T s} ds, (16)

since their images coincide. This (infinite) Gramian exists due to the asymptotic stability of A. It is easier to work with P, since it can be computed as the unique solution to

A P + P A^T + B Q_M B^T = 0. (17)

P satisfies (17) since P(t) satisfies (12), due to the asymptotic stability of A. For the case Q_M = I this Gramian was discussed in [1, Section 4.3] in the context of balancing for deterministic systems (8).
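Numerically, P can be obtained with a standard Lyapunov solver. The sketch below uses SciPy with hypothetical stand-ins for A, B and Q_M; it only illustrates the computation of the Gramian from (17).

```python
import numpy as np
from scipy import linalg

# Hypothetical data: A asymptotically stable, Q_M the covariance of M(1).
A = np.array([[-2.0, 1.0], [0.0, -3.0]])
B = np.array([[1.0], [1.0]])
Q_M = np.array([[2.0]])

# Solve A P + P A^T + B Q_M B^T = 0, i.e. equation (17).
P = linalg.solve_continuous_lyapunov(A, -B @ Q_M @ B.T)

# P is symmetric positive semidefinite and the residual of (17) vanishes.
residual = A @ P + P @ A.T + B @ Q_M @ B.T
print(np.max(np.abs(residual)), np.linalg.eigvalsh(P))
```

Note the sign convention of the solver: it returns the solution X of A X + X A^T = −B Q_M B^T, so the right-hand side is passed with a minus sign.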
Proposition 3. The images of the finite time reachability Gramians P(t), t > 0, and the infinite reachability Gramian P are the same, that is,

im P(t) = im P, t > 0. (18)

Proof. Since P and P(t) are symmetric positive semidefinite, it is enough to show that their kernels are equal.
Let us now assume that we have already removed all the unreachable states from (10), so that (11) holds, which implies im P = R^n. We choose an orthonormal basis of R^n consisting of eigenvectors {p_k}_{k=1,…,n} of P, and the following representation holds:

x(t, 0, M) = Σ_{k=1}^n ⟨x(t, 0, M), p_k⟩_{R^n} p_k.

We investigate how much the noise influences x(t, 0, M) in the direction of p_k. If a state remains close to zero, it barely contributes to the system dynamics. Those states can be identified with the help of the positive eigenvalues {λ_k}_{k=1,…,n} of P. Using (14) and the fact that P(T) is increasing, we obtain

E[⟨x(t, 0, M), p_k⟩²_{R^n}] = p_k^T P(t) p_k ≤ p_k^T P p_k = λ_k, t ∈ [0, T]. (19)

Hence, if λ_k is small, then the corresponding coefficient ⟨x(t, 0, M), p_k⟩_{R^n} is small in the mean square sense. This means that the noise hardly steers the state in the direction of p_k. Consequently, the states that are difficult to reach are contained in the space spanned by the eigenvectors corresponding to the small eigenvalues of P. We continue by reasoning why using the modified reachability Gramian P associated with the stochastic system (9) is better than using the reachability Gramian P_D = ∫_0^∞ e^{As} B B^T e^{A^T s} ds of the deterministic system (8) instead.

Proposition 4. For the reachability Gramian P of the stochastic system (9) and the reachability Gramian P_D of the deterministic system (8) the following properties hold: (a) In general, we have im P ⊆ im P_D.
Proof. Let v ∈ ker P_D. Then B^T e^{A^T s} v = 0 for almost all s ≥ 0 and, since v ∈ ker P if and only if 0 = v^T P v = ∫_0^∞ ‖Q_M^{1/2} B^T e^{A^T s} v‖²_{R^m} ds, we have ker P_D ⊆ ker P. Consequently, we obtain im P ⊆ im P_D due to (ker P)^⊥ = im P and (ker P_D)^⊥ = im P_D. In this case, all the above statements are equivalent. Therefore ker P_D = ker P and hence im P = im P_D.
To prove (c), assume v ∈ ker P. Pre- and postmultiplying (17) with v^T and v, respectively, yields 0 = v^T B Q_M B^T v = ‖Q_M^{1/2} B^T v‖²_{R^m}. In the situation of (c), the function s ↦ ‖B^T e^{A^T s} v‖²_{R^m} is nevertheless purely positive up to Lebesgue zero sets. Hence v^T P_D v > 0, such that v ∉ ker P_D. Having ker P_D ⊊ ker P implies (c).
By (15) and im P ⊆ im P_D (see Proposition 4 (a)), we obtain x(t, 0, M) ∈ im P ⊆ im P_D, t ∈ [0, T], P-a.s. One could now think of using P_D instead of P. However, if case (c) in Proposition 4 holds, we have im P ⊂ im P_D and hence not all unreachable states can be identified. Consequently, using P_D would underestimate the set of unreachable states. Even if we assume that the system is already completely reachable (e.g. (11) holds), inequality (19) cannot be obtained with P_D. This means that we cannot identify the difficult to reach states with the help of the eigenvalues of P_D. In Section 2.3 we will see that P, rather than P_D, enters the error bound for the ROM.
Finally we note that the reachability Gramian P D of system (8) does not depend on the input u. If a "noisy control" is used, this does not apply, since P depends on Q M and hence on the Lévy process M .
Observability. We conclude this section by introducing a deterministic observability concept for the output equation corresponding to (10) with M ≡ 0. We recall known facts from [1, Subsection 4.2.2] to characterise the importance of certain initial states in the system dynamics, since we are in a situation without noise. We assume an unknown initial state x_0 ∈ R^n in the observation problem

ẋ(t) = Ax(t), y(t) = Cx(t), x(0) = x_0,

and aim to reconstruct x_0 from the observation y on the entire time interval [0, ∞).
An initial state x_0 is called unobservable if the corresponding output satisfies y ≡ 0 on [0, ∞); it cannot be reconstructed from the observation. Otherwise, x_0 is called observable. A system is called completely observable if every initial state is observable.
In order to determine the observability of a state, we consider the energy that is caused by the observations of x_0:

∫_0^∞ ‖y(t)‖²_{R^p} dt = x_0^T Q x_0, (21)

where we used that y(t) = C e^{At} x_0 and Q := ∫_0^∞ e^{A^T s} C^T C e^{As} ds. The observability Gramian Q exists due to the asymptotic stability of A and is the unique solution to

A^T Q + Q A + C^T C = 0. (22)

The above relation is obtained by replacing A and B Q_M B^T in (16) and (17) by A^T and C^T C, respectively. From (21) we see that x_0 is unobservable if and only if x_0 ∈ ker Q. Hence, the system is completely observable if and only if ker Q = {0}. Besides the unobservable states, we aim to remove the difficult to observe states from the system in order to obtain an accurate ROM. The difficult to observe states are those producing only little observation energy, i.e. the corresponding observations y are close to zero in the L² sense. Using (21) again, the difficult to observe states are contained in the spaces spanned by the eigenvectors of Q corresponding to the small eigenvalues.
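The observability Gramian can be computed analogously from (22). The sketch below uses a hypothetical stable pair (A, C) and additionally checks the energy identity (21) for one initial state by quadrature.

```python
import numpy as np
from scipy import linalg

# Hypothetical stable pair (A, C).
A = np.array([[-2.0, 1.0], [0.0, -3.0]])
C = np.array([[1.0, 0.0]])

# Solve A^T Q + Q A + C^T C = 0, i.e. equation (22).
Q = linalg.solve_continuous_lyapunov(A.T, -C.T @ C)

# Energy identity (21): int_0^inf ||C e^{At} x0||^2 dt = x0^T Q x0.
x0 = np.array([1.0, -1.0])
ts = np.linspace(0.0, 20.0, 2001)
ys = np.array([float(C @ linalg.expm(A * t) @ x0) for t in ts])
energy_quad = float(np.sum(0.5 * (ys[:-1] ** 2 + ys[1:] ** 2) * np.diff(ts)))
print(float(x0 @ Q @ x0), energy_quad)
```

The quadrature truncates the infinite time horizon at t = 20, which is adequate here because the output decays exponentially.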

2.2. Balancing related MOR. Before considering balanced truncation (BT) and singular perturbation approximation (SPA), we summarise the general theory for balancing and how to find a balancing transformation.
States that are difficult to reach have large components in the span of the eigenvectors corresponding to small eigenvalues of the reachability Gramian P , cf. (19). Similarly, states that are difficult to observe are the ones that have large components in the span of eigenvectors corresponding to small eigenvalues of the observability Gramian Q, see (21). Hence in order to produce accurate ROMs one eliminates states that are both difficult to reach and difficult to observe. To this end we need to find a basis in which the dominant reachable and observable states are the same, which is done by a simultaneous transformation of the Gramians.
Let T ∈ R^{n×n} be a nonsingular matrix. Transforming the state using x̂(t) = T x(t), the system (2) becomes

dx̂(t) = T A T^{-1} x̂(t) dt + T B dM(t), ŷ(t) = C T^{-1} x̂(t).

The input-output map remains the same; only the state, input and output matrices are transformed. The reachability and observability Gramians P and Q of the original system, which satisfy (17) and (22), can be transformed into the Gramians of the transformed system, P̂ = T P T^T and Q̂ = T^{-T} Q T^{-1} (by multiplying (17) with T from the left and T^T from the right, and (22) with T^{-T} from the left and T^{-1} from the right). The Hankel singular values (HSVs) σ_1 ≥ … ≥ σ_n, where σ_i = √(λ_i(P Q)), i = 1, …, n, are the same for the original and the transformed system. The above transformation is a balancing transformation if the transformed Gramians are equal to each other and diagonal. Such a transformation always exists if P, Q > 0 and can be obtained by choosing T = Σ^{-1/2} U^T L^T and T^{-1} = K V Σ^{-1/2}, where Σ = diag(σ_1, …, σ_n) contains the HSVs. The matrices K, L, U and V are computed as follows. Let P = K K^T and Q = L L^T be square root factorisations of P and Q; then an SVD K^T L = V Σ U^T gives the required matrices. With this transformation, P̂ = Q̂ = Σ.
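The square-root procedure just described can be sketched as follows, on hypothetical matrices; it assumes P, Q > 0 so that Cholesky factors exist.

```python
import numpy as np
from scipy import linalg

# Hypothetical asymptotically stable system.
A = np.array([[-1.0, 0.2, 0.0], [0.0, -2.0, 0.3], [0.1, 0.0, -3.0]])
B = np.array([[1.0], [0.5], [0.2]])
C = np.array([[1.0, 0.0, 1.0]])
Q_M = np.array([[1.0]])

P = linalg.solve_continuous_lyapunov(A, -B @ Q_M @ B.T)   # equation (17)
Q = linalg.solve_continuous_lyapunov(A.T, -C.T @ C)       # equation (22)

# Square root factorisations P = K K^T, Q = L L^T and SVD K^T L = V Sigma U^T.
K = np.linalg.cholesky(P)
L = np.linalg.cholesky(Q)
V, sigma, UT = np.linalg.svd(K.T @ L)        # sigma holds the HSVs
T = np.diag(sigma ** -0.5) @ UT @ L.T        # balancing transformation
T_inv = K @ V @ np.diag(sigma ** -0.5)

P_bal = T @ P @ T.T
Q_bal = T_inv.T @ Q @ T_inv
print(np.diag(P_bal), np.diag(Q_bal), sigma)
```

Both transformed Gramians come out as diag(σ_1, …, σ_n), which is exactly the balancing property.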
Below, let T be the balancing transformation as stated above; then we partition the coefficients of the balanced realisation as follows:

T A T^{-1} = [A_11, A_12; A_21, A_22], T B = [B_1; B_2], C T^{-1} = [C_1, C_2], (24)

where A_11 ∈ R^{r×r} etc. Furthermore, setting x̂ = [x_1; x_2], we obtain the transformed partitioned system

dx_1(t) = (A_11 x_1(t) + A_12 x_2(t)) dt + B_1 dM(t),
dx_2(t) = (A_21 x_1(t) + A_22 x_2(t)) dt + B_2 dM(t), (25)
y(t) = C_1 x_1(t) + C_2 x_2(t). (26)

In this system, the difficult to reach and observe states are represented by x_2; they correspond to the smallest HSVs σ_{r+1}, …, σ_n, but of course r has to be chosen such that the neglected HSVs are small (σ_{r+1} ≪ σ_r). We discuss two methods (BT and SPA) to neglect x_2, leading to a reduced system of the form

dx_r(t) = A_r x_r(t) dt + B_r dM(t), y_r(t) = C_r x_r(t), (27)

where A_r ∈ R^{r×r}, B_r ∈ R^{r×m} and C_r ∈ R^{p×r} (r ≪ n).

Balanced truncation. For BT the second row in (25) is truncated and the remaining x_2 components in the first row and in (26) are set to zero. This leads to the reduced coefficients

A_r = A_11, B_r = B_1, C_r = C_1,

which is similar to the deterministic case. The next lemma states that BT preserves asymptotic stability, which is known from the deterministic case, see [1, Theorem 7.9].

Lemma 2.3. Let the Gramians P and Q be positive definite and σ_r ≠ σ_{r+1}; then σ(A_ii) ⊂ C− for i = 1, 2, i.e. A_11 and A_22 are asymptotically stable.
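In code, BT amounts to truncating the balanced matrices. The sketch below repeats the square-root balancing steps on hypothetical data and checks that the reduced reachability Gramian equals Σ_1, as follows from the left upper block of the balanced Lyapunov equation.

```python
import numpy as np
from scipy import linalg

def balanced_realisation(A, B, C, Q_M):
    """Square-root balancing (sketch; assumes positive definite Gramians)."""
    P = linalg.solve_continuous_lyapunov(A, -B @ Q_M @ B.T)
    Q = linalg.solve_continuous_lyapunov(A.T, -C.T @ C)
    K, L = np.linalg.cholesky(P), np.linalg.cholesky(Q)
    V, sigma, UT = np.linalg.svd(K.T @ L)
    T = np.diag(sigma ** -0.5) @ UT @ L.T
    T_inv = K @ V @ np.diag(sigma ** -0.5)
    return T @ A @ T_inv, T @ B, C @ T_inv, sigma

# Hypothetical full model.
A = np.array([[-1.0, 0.2, 0.0], [0.0, -2.0, 0.3], [0.1, 0.0, -3.0]])
B = np.array([[1.0], [0.5], [0.2]])
C = np.array([[1.0, 0.0, 1.0]])
Q_M = np.array([[1.0]])

Ab, Bb, Cb, sigma = balanced_realisation(A, B, C, Q_M)
r = 2                                              # keep the r dominant HSVs
A_r, B_r, C_r = Ab[:r, :r], Bb[:r, :], Cb[:, :r]   # BT: (A_11, B_1, C_1)

P_r = linalg.solve_continuous_lyapunov(A_r, -B_r @ Q_M @ B_r.T)
print(sigma, np.diag(P_r))
```

The test confirms both the preserved stability of A_11 and the identity P_r = Σ_1 for BT.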
The above lemma is vital for the error bound analysis in Section 2.3.

Singular perturbation approximation. Instead of setting x_2 ≡ 0, a different approach is followed here. For deterministic systems (8) it can be observed that x_2 represents the fast variables of the system, i.e. x_2 is in a quasi steady state after a short time. Hence, setting ẋ_2 = 0 can be a good ansatz to obtain a ROM. This is done in [21] for the deterministic case. For the stochastic equation (25) the steady state of x_2 is not characterised by dx_2 = 0 but by a stationary probability distribution conditional on x_1. However, following the 'naive' approach of simply setting dx_2 = 0 will lead to a ROM for which an error bound depending on the truncated HSVs can be derived. This result is given in Theorem 2.6. Now, inserting dx_2 = 0 in (25) yields the algebraic constraint

0 = (A_21 x_1(t) + A_22 x_2(t)) dt + B_2 dM(t), (28)

where a zero initial condition is assumed. Applying Ito's product formula (57) to every summand, inserting the differential of the drift term and exploiting that the expectation of the Ito integrals is zero, one compares the two sides of (28). Since the components of the drift part and e_i^T B_2 M have the same martingale parts and the same jumps, their compensator processes coincide by (58). Applying Ito's product formula to (B_2 M)^T B_2 M and taking the expectation, we have E‖B_2 M(t)‖² = 0, which implies B_2 M = 0 P-a.s. Using this simplification in (28) yields

x_2(t) = −A_22^{-1} A_21 x_1(t),

which is well-defined by Lemma 2.3 and which we use in the first row of (25) and in (26). This leads to the reduced order coefficients

A_r = A_11 − A_12 A_22^{-1} A_21, B_r = B_1, C_r = C_1 − C_2 A_22^{-1} A_21. (30)

This reduced model differs from the deterministic case, which requires B_r = B_1 − A_12 A_22^{-1} B_2 together with an additional term in the output equation which does not depend on the state, see [21, Section 2]. In the deterministic case, the ROM is balanced [21], which is not true here due to the modification. As in the deterministic case, the observability Gramian is given by Q_r = Σ_1 = diag(σ_1, …, σ_r).
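A sketch of the SPA reduction in (30), again on hypothetical balanced data; the test below confirms numerically the stated property Q_r = Σ_1 for the reduced observability Gramian.

```python
import numpy as np
from scipy import linalg

def balanced_realisation(A, B, C, Q_M):
    """Square-root balancing (sketch; assumes positive definite Gramians)."""
    P = linalg.solve_continuous_lyapunov(A, -B @ Q_M @ B.T)
    Q = linalg.solve_continuous_lyapunov(A.T, -C.T @ C)
    K, L = np.linalg.cholesky(P), np.linalg.cholesky(Q)
    V, sigma, UT = np.linalg.svd(K.T @ L)
    T = np.diag(sigma ** -0.5) @ UT @ L.T
    T_inv = K @ V @ np.diag(sigma ** -0.5)
    return T @ A @ T_inv, T @ B, C @ T_inv, sigma

# Hypothetical full model, then SPA with reduced dimension r.
A = np.array([[-1.0, 0.2, 0.0], [0.0, -2.0, 0.3], [0.1, 0.0, -3.0]])
B = np.array([[1.0], [0.5], [0.2]])
C = np.array([[1.0, 0.0, 1.0]])
Q_M = np.array([[1.0]])
Ab, Bb, Cb, sigma = balanced_realisation(A, B, C, Q_M)

r = 2
A11, A12 = Ab[:r, :r], Ab[:r, r:]
A21, A22 = Ab[r:, :r], Ab[r:, r:]
B1, C1, C2 = Bb[:r, :], Cb[:, :r], Cb[:, r:]

# SPA coefficients (30): note B_r = B_1 in this stochastic setting.
W = np.linalg.solve(A22, A21)   # A_22^{-1} A_21
A_r = A11 - A12 @ W
B_r = B1
C_r = C1 - C2 @ W

Q_r = linalg.solve_continuous_lyapunov(A_r.T, -C_r.T @ C_r)
print(np.diag(Q_r), sigma[:r])
```

Using `np.linalg.solve` instead of forming A_22^{-1} explicitly is the numerically preferable way to evaluate the Schur complement.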
This property is obtained by multiplying (35) with Â^{-T} (Â is the matrix of the balanced system) from the left and with Â^{-1} from the right and then evaluating the left upper block of the resulting equation, see also (44). The following example shows that the reduced order reachability Gramian P_r is not equal to Σ_1 in general, which is different from the deterministic case. Here, A is asymptotically stable and the system is balanced, since P = Q = diag(4, 2, 1). We fix the reduced order dimension to r = 2 and compute the reduced order coefficients by SPA in (30). We know that Q_r = diag(4, 2), but the reachability Gramian, which we computed numerically, is (up to the digits shown)

P_r = [10.1604, 2.5668; 2.5668, 11.8396].

This implies that the HSVs of the ROM are not a subset of the original ones anymore. Here, they are 6.5822 and 4.5822.
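The reduced HSVs reported in this example follow directly from P_r and Q_r = diag(4, 2); a quick check using only the numbers given above:

```python
import numpy as np

Q_r = np.diag([4.0, 2.0])
P_r = np.array([[10.1604, 2.5668], [2.5668, 11.8396]])

# HSVs of the ROM are the square roots of the eigenvalues of P_r Q_r.
hsv = np.sort(np.sqrt(np.linalg.eigvals(P_r @ Q_r).real))[::-1]
print(hsv)  # approximately [6.5822, 4.5822]
```

Since P_r is not diag(4, 2), the reduced HSVs differ from the leading original HSVs 4 and 2, exactly as stated.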
Since P_r ≠ Σ_1 in general, the asymptotic covariance of the reduced dynamics (t → ∞) is not consistent with the asymptotic covariance of x_1 under the full dynamics in (25). This might be unsatisfactory but could be resolved by replacing B_r = B_1 with B_r = B_1 − A_12 A_22^{-1} B_2 (the choice from the deterministic case, for which the ROM is balanced). This replacement, however, will lead to a ROM for which an error bound as in Theorem 2.6 cannot be obtained. This lack of an error bound will be pointed out later. We conclude this section with a stability result from [21].
Lemma 2.4. Let the Gramians P and Q be positive definite and σ_r ≠ σ_{r+1}; then σ(A_11 − A_12 A_22^{-1} A_21) ⊂ C−.

The inverses in Lemma 2.4 exist because of the asymptotic stability of the matrices A_11 and A_22, see Lemma 2.3.

2.3. Error bounds for BT and SPA.
Before we specify the error bounds for BT and SPA, we provide a general error bound comparing the outputs of (9) and (27) with asymptotically stable matrices A, A_r and initial conditions x_0 = 0, x_{r,0} = 0. These outputs are then given by the Ornstein-Uhlenbeck processes

y(t) = C x(t) = C ∫_0^t e^{A(t−s)} B dM(s), y_r(t) = C_r x_r(t) = C_r ∫_0^t e^{A_r(t−s)} B_r dM(s), (31)

as mentioned in [2], see also [4,29]. Using these representations and Cauchy's inequality as well as Ito's isometry (see [24]), where Q_M is the covariance matrix of M, substitution and taking limits yields, using the definition of the Frobenius norm and the linearity of the trace,

E‖y(t) − y_r(t)‖²_{R^p} ≤ tr(C P C^T) + tr(C_r P_r C_r^T) − 2 tr(C P_g C_r^T), t ≥ 0, (32)

where P = ∫_0^∞ e^{As} B Q_M B^T e^{A^T s} ds is the reachability Gramian of the original system satisfying (17), P_r = ∫_0^∞ e^{A_r s} B_r Q_M B_r^T e^{A_r^T s} ds is the one of the reduced system satisfying A_r P_r + P_r A_r^T + B_r Q_M B_r^T = 0, and P_g = ∫_0^∞ e^{As} B Q_M B_r^T e^{A_r^T s} ds is the solution to

A P_g + P_g A_r^T + B Q_M B_r^T = 0. (33)

Equation (33) is a consequence of Proposition 2 with A_1 = A and A_2 = A_r and the fact that e^{As} B Q_M B_r^T e^{A_r^T s} → 0 as s → ∞, due to the asymptotic stability of A and A_r. The matrices P, P_r and P_g in the bound (32) are all well-defined because A and A_r are asymptotically stable. The error bound in (32) holds for both BT and SPA, since both approaches preserve asymptotic stability, see Lemmas 2.3 and 2.4.
For both BT and SPA the representation in (32) can be used for practical computations of the error bound. The Gramian P is already available since it is required in the balancing procedure. The reduced model Gramian P r is computationally cheap because it is low dimensional assuming that we fix a small ROM dimension. The same is true for P g since it has only a few columns which makes the solution to (33) easily accessible. Since the error bound (32) is computationally cheap, it can be computed for several ROM dimensions and hence be used to find a suitable r.
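The computation just described can be sketched as follows. The matrices are hypothetical, the ROM is obtained by BT, and the mixed Gramian P_g is found with a Sylvester solver; for r = n the "reduction" reproduces the full model, so the bound must vanish.

```python
import numpy as np
from scipy import linalg

def error_bound(A, B, C, Q_M, A_r, B_r, C_r):
    """Evaluate the bound (32) via P, P_r and the mixed Gramian P_g,
    where P_g solves the Sylvester equation (33)."""
    P = linalg.solve_continuous_lyapunov(A, -B @ Q_M @ B.T)
    P_r = linalg.solve_continuous_lyapunov(A_r, -B_r @ Q_M @ B_r.T)
    P_g = linalg.solve_sylvester(A, A_r.T, -B @ Q_M @ B_r.T)
    bound_sq = (np.trace(C @ P @ C.T) + np.trace(C_r @ P_r @ C_r.T)
                - 2.0 * np.trace(C @ P_g @ C_r.T))
    return float(np.sqrt(max(bound_sq, 0.0)))

# Hypothetical full model, balanced, then truncated for several r.
A = np.array([[-1.0, 0.2, 0.0], [0.0, -2.0, 0.3], [0.1, 0.0, -3.0]])
B = np.array([[1.0], [0.5], [0.2]])
C = np.array([[1.0, 0.0, 1.0]])
Q_M = np.array([[1.0]])
P = linalg.solve_continuous_lyapunov(A, -B @ Q_M @ B.T)
Q = linalg.solve_continuous_lyapunov(A.T, -C.T @ C)
K, L = np.linalg.cholesky(P), np.linalg.cholesky(Q)
V, sigma, UT = np.linalg.svd(K.T @ L)
T = np.diag(sigma ** -0.5) @ UT @ L.T
T_inv = K @ V @ np.diag(sigma ** -0.5)
Ab, Bb, Cb = T @ A @ T_inv, T @ B, C @ T_inv

bounds = [error_bound(A, B, C, Q_M, Ab[:r, :r], Bb[:r, :], Cb[:, :r])
          for r in (1, 2, 3)]
print(bounds)
```

As noted above, P_r and P_g are cheap to compute because A_r is small and P_g has only r columns, so the bound can be evaluated for several candidate values of r.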
In the next two theorems we specify the general error bound in (32) for both BT and SPA and represent it in terms of the truncated HSVs σ_{r+1}, …, σ_n of the system. Using the balanced realisation (24) of the original system with P̂ = Q̂ = Σ and its corresponding partition, we have

Â Σ + Σ Â^T + B̂ Q_M B̂^T = 0 (34) and Â^T Σ + Σ Â + Ĉ^T Ĉ = 0, (35)

where Σ = diag(Σ_1, Σ_2) with Σ_1 = diag(σ_1, …, σ_r) and Σ_2 = diag(σ_{r+1}, …, σ_n).
Theorem 2.5. Let y_BT be the output of the reduced order system obtained by BT. Then, under the assumptions of Lemma 2.3, we have for t ≥ 0

E‖y(t) − y_BT(t)‖²_{R^p} ≤ tr(C_2 Σ_2 C_2^T) + 2 tr(P_{g,2} A_21^T Σ_2), (36)

where P_{g,2} are the last n − r rows of T P_g, with T being the balancing transformation.
Proof. Evaluating the left and right upper blocks of (35) yields

C_1^T C_1 = −(A_11^T Σ_1 + Σ_1 A_11) and C_1^T C_2 = −(A_21^T Σ_2 + Σ_1 A_12). (37)

From (32) the error bound has the form

E‖y(t) − y_BT(t)‖²_{R^p} ≤ tr(C P C^T) + tr(C_1 P_r C_1^T) − 2 tr(C P_g C_1^T), (38)

since C_r = C_1. Using the balancing transformation T and the partition of C T^{-1} in (24), we obtain tr(C P C^T) = tr(C_1 Σ_1 C_1^T) + tr(C_2 Σ_2 C_2^T). Now, the left upper block of (34) is A_11 Σ_1 + Σ_1 A_11^T + B_1 Q_M B_1^T = 0, such that P_r = Σ_1. Using the partitions of C T^{-1} and T P_g = [P_{g,1}; P_{g,2}], we obtain tr(C P_g C_1^T) = tr(C T^{-1} T P_g C_1^T) = tr(C_1 P_{g,1} C_1^T) + tr(C_2 P_{g,2} C_1^T). Moreover, comparing the upper block of T times (33) with the left upper block of (34) gives A_11(P_{g,1} − Σ_1) + (P_{g,1} − Σ_1) A_11^T + A_12 P_{g,2} = 0 and hence, with the first identity of (37), tr(C_1 Σ_1 C_1^T) − tr(C_1 P_{g,1} C_1^T) = −tr(Σ_1 A_12 P_{g,2}). Inserting these results into (38) gives

E‖y(t) − y_BT(t)‖²_{R^p} ≤ tr(C_2 Σ_2 C_2^T) − 2 tr(Σ_1 A_12 P_{g,2}) − 2 tr(C_2 P_{g,2} C_1^T).

Using tr(C_2 P_{g,2} C_1^T) = tr(P_{g,2} C_1^T C_2) and substituting the second identity of (37) yields the claim, since tr(P_{g,2} Σ_1 A_12) = tr(Σ_1 A_12 P_{g,2}).
Theorem 2.6. Let y_SPA be the output of the reduced order system obtained by SPA. Then, under the assumptions of Lemma 2.4, an error bound in terms of the truncated HSVs σ_{r+1}, …, σ_n holds, where V = (A_22^{-1} A_21)^T, and P_{g,1} are the first r and P_{g,2} the last n − r rows of T P_g, with T being the balancing transformation.
Proof. If we multiply (35) with Â^{-T} from the left hand side and select the left and right upper blocks of this equation, we obtain the identities (42) and (43). Furthermore, multiplying (35) with Â^{-T} from the left and with Â^{-1} from the right, the resulting left upper block of the equation is (44), and thus

Ā^T Σ_1 + Σ_1 Ā + C̄^T C̄ = 0,

with Ā = A_11 − A_12 A_22^{-1} A_21 and C̄ = C_1 − C_2 A_22^{-1} A_21. We define

ε := (tr(C P C^T) + tr(C̄ P_r C̄^T) − 2 tr(C P_g C̄^T))^{1/2},

which is the error bound for SPA. From the proof of Theorem 2.5 we know that tr(C P C^T) = tr(C_1 Σ_1 C_1^T) + tr(C_2 Σ_2 C_2^T) holds. By (44) and the defining reachability equation of the ROM, we have

tr(C̄ P_r C̄^T) = tr(P_r C̄^T C̄) = −tr(P_r (Ā^T Σ_1 + Σ_1 Ā)) = −tr(Σ_1 (P_r Ā^T + Ā P_r)).

This leads to tr(C̄ P_r C̄^T) = tr(Σ_1 B_1 Q_M B_1^T). We multiply (33) with the balancing transformation T from the left (here A_r = Ā) and use the partitions of T A T^{-1} and T B from (24) and the partition T P_g = [P_{g,1}; P_{g,2}]; this yields the block equations (45) for P_{g,1} and P_{g,2}. We obtain tr(C P_g C̄^T) = tr(C T^{-1} T P_g C̄^T) = tr(C_1 P_{g,1} C̄^T) + tr(C_2 P_{g,2} C̄^T) using the partition of C T^{-1} in (24). With (43) we can rewrite tr(C_2 P_{g,2} C̄^T); inserting the upper block of (45) leads to

tr(C_2 P_{g,2} C̄^T) = tr(Σ_2 A_22 P_{g,2} A_21^T A_22^{-T}) + tr(Σ_1 (B_1 Q_M B_1^T + P_{g,1} Ā^T + A_11 P_{g,1})).

Using (42) and the properties of the trace function, we have tr(Σ_1 (P_{g,1} Ā^T + A_11 P_{g,1})) = −tr(P_{g,1} C̄^T C̄) + tr(P_{g,1} Σ_1 A_12 A_22^{-1} A_21), which provides the required result.
The error bound representations in Theorems 2.5 and 2.6 depend on the n − r smallest HSVs σ r+1 , . . . , σ n . If the corresponding truncated components are unimportant, i.e. they are difficult to reach and observe, then the values σ r+1 , . . . , σ n are small and consequently the error bound is small. Hence, the ROM is of good quality.
The error bounds in Theorems 2.5 and 2.6 can be used to find a suitable reduced order dimension r. Small HSVs σ r+1 , . . . , σ n for fixed r would guarantee a small error.
If B_r = B_1 is replaced by B_r = B_1 − A_12 A_22^{-1} B_2, the resulting bound contains an additional term depending on Σ_1 besides f_1(Σ_2), where f_1(Σ_2) is the bound in Theorem 2.6 when setting B_r = B_1. Due to this further term depending on Σ_1, the quality of the ROM with B_r = B_1 − A_12 A_22^{-1} B_2 cannot be estimated in terms of the truncated HSVs. Finally, we note that the error bounds in Theorems 2.5 and 2.6 only hold for zero initial conditions, as is common with standard BT and SPA for deterministic systems. The rationale behind this is as follows: for a non-zero initial condition, the part of the solution to the SDE involving the initial condition (e.g. C e^{At} x_0) needs to be computed only once and can be done offline (like the expensive part of reducing the model). Then, to generate several realisations of the output, the second, stochastic part (right side of equation (31)) can be solved online (and multiple times) in a reduced state space dimension, and hence with reduced computational complexity. Inhomogeneous initial conditions have been considered for deterministic systems [5,15], but the authors are unaware of any extensions to stochastic systems.
The next section provides a particular SDE to which we will apply the theory developed in this section.
3. Wave equations controlled by Lévy noise. In this section, we deal with a setting that covers the SPDE with its output in (6)–(7), a damped wave equation with additive noise, which can formally be interpreted as equation (46) with D̃_i = 0 (i = 1, 2), α > 0 and the kth component of the control u_k ≡ Ṁ_k, k ∈ {1, …, m}. Here, M_1, …, M_m are the components of an R^m-valued Lévy process M that is square integrable and has mean zero. This is in contrast to the setting in [27], where (46) with multiplicative Lévy noise was considered, i.e. D̃_i ≠ 0 linear bounded operators and u an m-dimensional stochastic control, with M̃_1 and M̃_2 uncorrelated scalar Lévy processes. For the stability analysis of the uncontrolled equation (46) with Wiener noise (u ≡ 0) we refer to [8].
Since Lévy noise is not a feasible control in the framework of [27], this setting requires further analysis. We transform the damped wave equation with additive noise into a first order SPDE and define the corresponding solution in Section 3.1, following the approach in [8,27]. In Section 3.2, we explain how the resulting first order SPDE can be approximated by a spectral Galerkin scheme. We refer to [10,14,19,27], where similar techniques were applied.
3.1. Setting and transformation into a first order stochastic PDE. Let M = (M_1, …, M_m)^T be a square integrable Lévy process with zero mean that takes values in R^m. Moreover, M is defined on a complete probability space (Ω, F, (F_t)_{t≥0}, P); it is adapted to the filtration (F_t)_{t≥0} and its increments M(t + h) − M(t) are independent of F_t for t, h ≥ 0.
Let Ã : D(Ã) → H̃ be a self adjoint and positive definite operator on a separable Hilbert space H̃ and let {h̃_k}_{k∈N} be an orthonormal basis of H̃ consisting of eigenvectors of Ã, i.e. Ã h̃_k = λ̃_k h̃_k, (47) where 0 < λ̃_1 ≤ λ̃_2 ≤ … are the corresponding eigenvalues. We denote the well-defined square root of Ã by Ã^{1/2}. The (symbolic) second order SPDE we consider is the damped wave equation (48) with initial conditions Z(0) = z_0, Ż(0) = z_1, α > 0 and output (49). We assume B̃_k ∈ H̃ and C ∈ L(D(Ã^{1/2}) × H̃, R^p), and we transform this second order system into a first order system following the approach in [8,27]. The system (48)–(49) can be expressed as a first order equation (50) for the pair X = (Z, Ż). So far, we have only worked with symbolic equations, since the classical derivative of a Lévy process does not exist in general. The next lemma from [25] provides a stability result and is vital to define a càdlàg mild solution of (50): the operator A with domain D(A) = D(Ã) × D(Ã^{1/2}) generates an exponentially stable contraction semigroup (S(t))_{t≥0}. We use this result to define the solution to (50).
Definition 3.2. An (F_t)_{t≥0}-adapted càdlàg process (X(t))_{t≥0} is called a mild solution to (50) if for all t ≥ 0

    X(t) = S(t) X(0) + ∫_0^t S(t − s) B dM(s).   (52)

We refer to [24] for the definition of the Itô integral in (52). For the finite dimensional case, the definition of an Itô integral with respect to Lévy processes can be found in [2], and Itô integrals with respect to martingales are defined in [20].

3.2. Numerical approximation.
We study a spectral Galerkin scheme to approximate the mild solution of (50) with output (51), similar to the approach in [10,14,19,27] (mainly for SPDEs with Wiener noise). This approximation is based on a particular choice of an orthonormal basis {h_k}_{k∈N} of H, given by

    h_{2ℓ−1} = (λ̃_ℓ^{−1/2} h̃_ℓ, 0)^T,  h_{2ℓ} = (0, h̃_ℓ)^T,  ℓ ∈ N,   (53)

where {h̃_k}_{k∈N} and {λ̃_k}_{k∈N} are defined in (47), see [27]. In (6)-(7), we have Ã = −∂²/∂ζ² on [0, π]. In this case h̃_k = √(2/π) sin(k·) and λ̃_k = k² for k ∈ N. To approximate the H-valued process X in (50), we construct a sequence (X_n)_{n∈N} of finite dimensional adapted càdlàg processes with values in H_n = span{h_1, . . . , h_n}, defined by

    dX_n(t) = A_n X_n(t) dt + P_n B dM(t),  X_n(0) = P_n X(0),   (54)

where P_n denotes the orthogonal projection of H onto H_n and A_n = P_n A P_n. For the mild solution to (54), let (S_n(t))_{t≥0} be the C_0-semigroup on H_n generated by A_n, such that the mild solution of equation (54) is

    X_n(t) = S_n(t) X_n(0) + ∫_0^t S_n(t − s) P_n B dM(s).

Since A_n is bounded, the C_0-semigroup on H_n is represented by S_n(t) = e^{A_n t}, t ≥ 0. We formulate the main result of this section, which uses ideas from [10,14,19,27] and is proved in Appendix B. In the following, we make use of the property that the mild and the strong solution of (54) coincide, since we are in finite dimensions.
We write the output y_n of the Galerkin system as an expression depending on the Fourier coefficients of the Galerkin solution X_n. The coefficients of y_n are

    y_{n,ℓ}(t) = ⟨y_n(t), e_ℓ⟩_{R^p} = ⟨C̃ X_n(t), e_ℓ⟩_{R^p} = Σ_{k=1}^n ⟨C̃ h_k, e_ℓ⟩_{R^p} ⟨X_n(t), h_k⟩_H

for ℓ = 1, . . . , p, where e_ℓ is the ℓ-th unit vector in R^p. We set

    C = [C̃ h_1, . . . , C̃ h_n] ∈ R^{p×n}

and obtain y_n(t) = C x(t). The components of x are the Fourier coefficients x_i(t) = ⟨X_n(t), h_i⟩_H, i = 1, . . . , n. Using the Fourier series representation of X_n, we obtain that the vector x of Fourier coefficients satisfies the SDE

    dx(t) = A x(t) dt + B dM(t),   (55)

where A = (⟨A h_j, h_i⟩_H)_{i,j=1,...,n} = diag(E_1, . . . , E_{n/2}) with

    E_ℓ = [ 0  √λ̃_ℓ ; −√λ̃_ℓ  −α ]

and λ̃_ℓ the eigenvalues of Ã, and B = [b_1, . . . , b_m] with b_k = (⟨(0, B̃_k)^T, h_i⟩_H)_{i=1,...,n} for k = 1, . . . , m. We will often make use of this compact form of the SDE in (55). Applying the spectral Galerkin method to the system (6)-(7), the matrices of the semi-discretised system (55) are given by the above A with λ̃_ℓ = ℓ², input columns

    (b_k)_{2ℓ} = ⟨f_k, h̃_ℓ⟩_{L²(0,π)},  (b_k)_{2ℓ−1} = 0,

and C = [C̃ h_1, . . . , C̃ h_n] with

    C̃ h_{2ℓ−1} = (ℓ^{−1} √(2/π) sin(ℓπ/2), 0)^T,  C̃ h_{2ℓ} = (0, √(2/π) sin(ℓπ/2))^T,

where we assume n to be even and ℓ = 1, . . . , n/2.

Figure 1. Galerkin solution to the stochastic damped wave equation in (6).
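The assembly of the semi-discretised matrices for the example (6)-(7) can be sketched in a few lines. The following code is our own illustration (not the authors' implementation): the inner products ⟨f_k, h̃_ℓ⟩ are approximated by a simple Riemann sum, and all function names are our choices.

```python
import numpy as np

def assemble_galerkin_matrices(n, alpha, fs, zeta_out=np.pi / 2, nq=4000):
    """Assemble A = diag(E_1, ..., E_{n/2}), B = [b_1, ..., b_m] and C for the
    spectral Galerkin discretisation of the damped wave equation on [0, pi],
    with eigenfunctions h_l(zeta) = sqrt(2/pi) sin(l*zeta), eigenvalues l**2."""
    assert n % 2 == 0, "n is assumed to be even"
    zeta = np.linspace(0.0, np.pi, nq)
    dz = zeta[1] - zeta[0]
    A = np.zeros((n, n))
    B = np.zeros((n, len(fs)))
    C = np.zeros((2, n))
    for l in range(1, n // 2 + 1):
        s = float(l)                                # sqrt(lambda_l) = l
        A[2*l-2:2*l, 2*l-2:2*l] = [[0.0, s], [-s, -alpha]]
        h_l = np.sqrt(2.0 / np.pi) * np.sin(l * zeta)
        for k, f in enumerate(fs):
            # only the velocity components of the basis see the noise
            B[2*l - 1, k] = np.sum(f(zeta) * h_l) * dz
        val = np.sqrt(2.0 / np.pi) * np.sin(l * zeta_out)
        C[0, 2*l - 2] = val / s                     # position at zeta_out
        C[1, 2*l - 1] = val                         # velocity at zeta_out
    return A, B, C

f1 = lambda z: 2.0 * np.exp(-(z - np.pi / 2) ** 2)
f2 = lambda z: np.sin(z) * np.exp(-(z - np.pi / 2) ** 2)
A, B, C = assemble_galerkin_matrices(20, alpha=2.0, fs=[f1, f2])
assert np.all(np.linalg.eigvals(A).real < 0)  # drift is Hurwitz for alpha > 0
```

The final assertion reflects the exponential stability of the semigroup: each 2x2 block E_ℓ has trace −α < 0 and determinant λ̃_ℓ > 0, so all eigenvalues of A lie in the open left half-plane.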
In Figure 1 we plot the numerical solution to the stochastic damped wave equation for ζ ∈ [0, π] and in the time interval [0, π], where we set α = 2, n = 1000 and m = 2 (i.e. two stochastic inputs). The weighting functions for the two inputs are f_1 = 2 exp(−(ζ − π/2)²) and f_2 = sin(ζ) exp(−(ζ − π/2)²). The noise processes are M_1(t) = √2 W(t), where W is a standard Wiener process, and the compound Poisson process M_2(t) = Σ_{i=1}^{N(t)} K_i, where (N(t))_{t≥0} is a Poisson process with parameter equal to 1 and K_i ∼ U(−√6, √6) are independent uniformly distributed jumps. M_1 and M_2 are independent. The plot in Figure 1 shows a particular realisation of the solution to (6) at six specific times t ∈ {0.9, 1.2, 1.7, 2.1, 2.46, 2.93}. We see that the string moves up and down as expected due to the nonzero (stochastic) input. We observe that the third snapshot is taken after a jump occurred in the stochastic process. The corresponding output, namely both the position and the velocity in the middle of the string, is shown in Figure 2. In the plot for the velocity, the noise generated by the Lévy process can be seen. The trajectory of the velocity is impacted by Lévy noise with jumps, where the velocity (e.g. due to the impact of wind) is randomly increased or reduced. The trajectory for the position of the cable in Figure 2 is smoother, as it is the integral of the velocity. Finally, Figure 3 shows the velocity versus the position of the string, for the same sample path, in a phase portrait. The four jumps are clearly visible.

4. Numerical examples for MOR. We consider the spectral Galerkin discretisation of the second order damped wave equation which we discussed in detail in Section 3, and in particular, the example in (6)-(7) with two stochastic inputs and two outputs, namely position and velocity of the middle of the string. We set α = 2 and choose the weighting functions f_i and the noise processes M_i (i = 1, 2) as in Figure 1.
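The driving processes can be sampled on a time grid as follows. This is an illustrative sketch (function names and the grid-based compound Poisson sampling are our choices, not taken from the paper): both processes are normalised so that E[M_k(1)²] = 2, since Var(√2 W(1)) = 2 and a rate-1 compound Poisson process with U(−√6, √6) jumps has Var(M_2(1)) = E[K²] = 2.

```python
import numpy as np

def levy_increments(T, num_steps, rng):
    """Grid increments of M_1(t) = sqrt(2) W(t) and of the compound Poisson
    process M_2(t) = sum_{i <= N(t)} K_i with rate 1 and K_i ~ U(-sqrt(6), sqrt(6)).
    Both driving processes have mean zero and E[M_k(1)^2] = 2."""
    dt = T / num_steps
    # Wiener increments, scaled by sqrt(2)
    dM1 = np.sqrt(2.0) * rng.normal(0.0, np.sqrt(dt), size=num_steps)
    # number of jumps in each subinterval, then the summed jump sizes
    njumps = rng.poisson(dt, size=num_steps)
    dM2 = np.array([rng.uniform(-np.sqrt(6.0), np.sqrt(6.0), nj).sum()
                    for nj in njumps])
    return dM1, dM2

rng = np.random.default_rng(0)
dM1, dM2 = levy_increments(T=np.pi, num_steps=1000, rng=rng)
M2 = np.concatenate(([0.0], np.cumsum(dM2)))  # piecewise-constant cadlag path
```

Summing the increments gives sample paths on the grid; the same increments can then be fed into a time-stepping scheme for the semi-discretised SDE.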
We fix the state dimension to n = 1000 and reduce the Galerkin solution by BT and SPA. For computing the trajectories of the SDE we use the Euler-Maruyama method (see, e.g. [16,17]). Figures 4 and 5 show the logarithmic errors for the position y_1 and the velocity y_2 of the middle of the string, if MOR by BT is applied to the wave equation with stochastic inputs when reduced models of dimension 6 and 24, respectively, are computed. The first two plots in each of the figures show the logarithmic mean error for both the position and the velocity. One observation is that the position is generally more accurate than the velocity (by about one order of magnitude here), since the trajectories are smoother. Moreover, comparing the expected values of the errors of the reduced model of dimension r = 6 (first two plots in Figure 4) with the ones of dimension r = 24 (first two plots in Figure 5), it can be seen that the latter are more accurate (an improvement of about one order of magnitude), as one would expect. The last two plots in Figures 4 and 5 show the logarithmic errors for position and velocity for one particular trajectory, which is the same as the one for the sample we considered in Section 3. Figures 6 and 7 show the corresponding logarithmic errors for the position y_1 and the velocity y_2 when MOR by SPA is applied. Again, the error in the position is smaller than the error in the velocity, and the error is smaller if a larger dimension of the reduced order model is used. Finally, we compare the error bounds for BT (see Theorem 2.5) and SPA (see Theorem 2.6) with the worst case mean errors, that is sup_{t∈[0,π]} E‖y(t) − y_r(t)‖_{R^p}, for both methods in Table 1, where y = (y_1, y_2)^T is the full output of the original model and y_r the ROM output. First, as expected, both mean errors and error bounds get smaller the larger the size of the ROM. Moreover, both error bounds are rather tight and close to the actual error of the ROM; i.e., the bounds, which are worst case bounds, also provide a good prediction of the true time domain error. We also note that BT performs better than SPA, both in terms of the actually computed mean errors and in terms of the error bounds.
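A minimal sketch of the square-root implementation of BT in this setting, applied to a small test system: the reachability Gramian solves the Lyapunov equation with right-hand side scaled by c = E[M_k(1)²] (here c = 2 for both noise processes), the Gramians are balanced via an SVD of their Cholesky factors, and the state is truncated. The code is our own SciPy-based illustration, not the authors' implementation; the random input and output maps are for testing only.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

def balanced_truncation(A, B, C, r, c=2.0):
    """Reduce dx = Ax dt + B dM, y = Cx to order r by balanced truncation.
    Reachability Gramian: A P + P A^T + c B B^T = 0, with c = E[M_k(1)^2].
    Observability Gramian: A^T Q + Q A + C^T C = 0."""
    P = solve_continuous_lyapunov(A, -c * B @ B.T)
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    P, Q = (P + P.T) / 2, (Q + Q.T) / 2            # symmetrise numerically
    L_P = cholesky(P, lower=True)
    L_Q = cholesky(Q, lower=True)
    U, sigma, Vt = svd(L_Q.T @ L_P)                # Hankel singular values
    S = np.diag(sigma[:r] ** -0.5)
    W = L_Q @ U[:, :r] @ S                         # left projection matrix
    V = L_P @ Vt[:r, :].T @ S                      # right projection, W^T V = I
    return W.T @ A @ V, W.T @ B, C @ V, sigma

# small damped-wave-like test system with random input/output maps
n, alpha = 10, 2.0
A = np.zeros((n, n))
for l in range(1, n // 2 + 1):
    A[2*l-2:2*l, 2*l-2:2*l] = [[0.0, float(l)], [-float(l), -alpha]]
rng = np.random.default_rng(1)
B = rng.standard_normal((n, 2))
C = rng.standard_normal((2, n))
A_r, B_r, C_r, sigma = balanced_truncation(A, B, C, r=6)
```

The truncated tail of sigma is what enters the error bounds of Theorems 2.5 and 2.6; the SPA reduced model arises from the same balanced realisation by eliminating, rather than truncating, the weakly reachable and observable states.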

5. Conclusions. We have presented theory for balancing related model order reduction (MOR) applied to linear stochastic differential equations (SDEs) with additive Lévy noise. In particular, we extended the concepts of reachability and observability to stochastic systems and formulated a new reachability Gramian. We then showed how balancing related MOR, which is well known for deterministic systems, can be extended to SDEs with additive Lévy noise; in particular, it leads to the solution of a Lyapunov equation with a slightly different right hand side. We proved a general error bound for reduced (asymptotically stable) systems in this setting and then gave specific bounds for balanced truncation (BT) and singular perturbation approximation (SPA), which depend on the neglected (small) Hankel singular values of the linear system. We finally applied our theory to a second order damped wave equation, discretised using a spectral Galerkin method and controlled by Lévy noise. The numerical results showed that MOR can be applied successfully, that the errors for both BT and SPA are small, and that the error bounds are tight.