MODEL REDUCTION OF CONTROLLED FOKKER–PLANCK AND LIOUVILLE–VON NEUMANN EQUATIONS

. Model reduction methods for bilinear control systems are compared by means of practical examples of Liouville–von Neumann and Fokker– Planck type. Methods based on balancing generalized system Gramians and on minimizing an H 2 -type cost functional are considered. The focus is on the numerical implementation and a thorough comparison of the methods. Structure and stability preservation are investigated, and the competitiveness of the approaches is shown for practically relevant, large-scale examples.


Introduction
Due to the growing ability to accurately manipulate single molecules by spectroscopic techniques, numerical methods for the control of molecular systems have recently attracted a lot of attention [4,30,41,61]. Key applications involve probing of mechanical properties of biomolecules by force microscopy and optical tweezers [23,32], or the control of chemical reaction dynamics by temporally shaped femtosecond laser pulses in femtochemistry [53,59]. A key feature of these small systems is that they are open systems, in that they are subject to noise and dissipation induced by the interaction with their environment, as a consequence of which the dynamics are inherently random and the description is on the level of probability distributions or measures rather than trajectories [47].
Depending on whether or not quantum effects play a role, the evolution of the corresponding probability distributions is governed by parabolic partial differential equations of either Liouville-von Neumann or Fokker-Planck type. The fact that the dynamics are controlled implies that the equations are bilinear as the control acts as an advection term that is coupled linearly to the probability distribution, but the main computational bottleneck clearly is that the equations, in spatially semi-discretized form, are high-dimensional which explains why model reduction is an issue; for example, in catalysis, optimal shaping of laser pulses requires the iterated integration of the dissipative Liouville-von Neumann (LvN) equation for reduced quantum mechanical density matrices, the spatial dimension of which grows quadratically with the number of quantum states involved [18]; cf. [36].
Many nonlinear control systems can be represented as bilinear systems by a suitable change of coordinates (as well as linear parametric systems), and it therefore does not come as a surprise that model reduction of bilinear control systems has recently been a field of intense research; see [7,12] and the references therein. In recent years, various model reduction techniques that were only available for linear systems have been extended to the bilinear case, among which are Krylov subspace techniques [44,6,16,39,45], interpolation-based approaches [1,8,24,25], balanced model reduction [2,9,49,29], empirical POD [20,21,34], or H 2 -optimal model reduction [8,25,60]. The downside of many available methods is their lack of structure preservation, most importantly, regarding asymptotic stability. In our case, positivity is an issue too, as we are dealing with probability distributions.
In this paper we compare two different model reduction techniques that represent different philosophies of model order reduction, with the focus being on practical computations and numerical tests rather than a theoretical analysis. The first approach is based on the interpolation of the Volterra series representation of the system's transfer function and gives a local H 2 -optimal approximation, because the interpolation is chosen so that the system satisfies the necessary H 2 -optimality conditions upon convergence of the algorithm; see [8] for details. The second approach is based on balancing the controllable and observable subspaces, and exploits the properties of the underlying dynamical system in that it uses the properties of the controllability and observability Gramians to identify suitable small parameters that are sent to 0 to yield a reduced-order system; for details, we refer to [29]. Both methods require the solution of large-scale matrix Sylvester or Lyapunov equations. While the computational effort of balanced model reduction is essentially determined by the solution of two generalized Lyapunov equations for controllability and observability Gramians, the effort of the H 2 -optimal interpolation method is mainly due to the solution of two generalized Sylvester equations in each step of the bilinear iterative rational Krylov algorithm (B-IRKA). We stress that both generalized Lyapunov or Sylvester equations can be solved iteratively at comparable numerical cost (for a given accuracy), but they all require the dynamics of the uncontrolled system to be asymptotically stable [55]. However, as both the dissipative LvN and Fokker-Planck operators have a simple eigenvalue zero, stability has to be enforced before solving Lyapunov or Sylvester equations, and in this paper we systematically compared stabilization techniques for both approaches.
The outline of the article is as follows: In Section 2 we briefly discuss the basic properties of bilinear systems and set the notation for the remainder of the article. Model reduction by H 2 -norm minimization and balancing are reviewed in Sections 3 and 4, along with some details regarding the numerical implementation for the specific applications considered in this paper in Section 5. Finally, in Section 6 we study model reduction of the Fokker-Planck equation comparing balancing and H 2norm minimization, and in Section 7 we carry out a similar study for the dissipative Liouville-von Neumann equation. We discuss our observations in Section 8. The article contains an appendix, Appendix A, that records some technical lemmas related to the asymptotic stability of bilinear systems.

Bilinear control systems
We start by setting the notation that will be used throughout this article. Let x(t) ∈ C n be governed by the time-inhomogeneous differential equation with coefficients A, N k ∈ C n×n , b k ∈ C n and u = (u 1 , . . . , u m ) T being a vector of bounded measurable controls u i (t) ∈ U ⊂ C. We assume that not all state variables x are relevant or observable, so we augment (2.1) by a linear output equation with C ∈ C l×n , l ≤ n. The systems of equations (2.1)-(2.2) is called a bilinear control system with inputs u(t) ∈ U m ⊂ C m and outputs y(t) ∈ C l . As is well-known, see e.g. [48,60], an explicit output representation for (2.2) can be obtained by means of successive approximations. The resulting so-called Volterra series is given as Moreover, based on a multivariate Laplace transform of these integrands, the system can alternatively be analyzed in a generalized frequency domain by means of generalized transfer functions. Since this will not be essential for the results presented here, we refrain from a more detailed discussion and refer to, e.g., [48].
2.1. Reduced-order models. We seek coefficientsÂ,N k ∈ C d×d ,b k ∈ C d and C ∈ C l×d with d n such that has an input-output behavior that is similar to (2.1)-(2.2). In other words, we seek a reduced-order model with the property that for any admissible control input u (to be defined below), the error in the output signal, is small, relative to u (in some norm) and uniformly on bounded time intervals. As will be outlined below, both model reduction schemes considered in this paper are closely related to the solutions of the following adjoint pair of generalized Lyapunov equations: where, in the first equation, we have introduced the shorthand B = (b 1 , . . . , b m ) ∈ C n×m . The Hermitian and positive semi-definite matrices P, Q ∈ C n×n are called the controllability and observability Gramians associated with (2.1)-(2.2)-assuming well-posedness of the Lyapunov equations and hence existence and uniqueness of P and Q. The relevance of the Gramians for model reduction is related to the fact that the nullspace of the controllability Gramian contains only states that cannot be reached by any bounded measurable control and that the system will not produce any output signal, if the dynamics is initialized in the nullspace of the observability Gramian [33]; as a consequence one can eliminate states that belong to ker(P ) ∩ ker(Q) without affecting the input-output behavior of (2.1)-(2.2); cf. [2].

Standing assumptions.
The following assumptions will be used throughout to guarantee existence and uniqueness of the solutions to the generalized Lyapunov equations (Assumption 1) and existence and uniqueness of the solution of the bilinear system (2.1) for all t ≥ 0 (Assumptions 2 and 3): Assumption 1: There exists constants λ, µ > 0, such that exp(At) ≤ λ exp (−µt) and where · = · 2 is the matrix 2-norm that is induced by the Euclidean norm | · |.
Assumption 2: The bilinear system (2.1)-(2.2) is bounded-input-boundedoutput (BIBO) stable, i.e., there exists M < ∞, such that for any input u with Specifically, we require that the admissible controls are uniformly bounded by with λ, µ as in Assumption 1, which by BIBO stability (Assumptions 2) implies that the output y(t) is bounded for all t ≥ 0 (cf. [52]).

H 2 optimal model reduction of bilinear systems
In this section, we recall some existing results on H 2 -optimal model order reduction for bilinear systems. For a more detailed presentation, see [60,8,24].
For a better understanding of the subsequent concepts, let us briefly focus on the linear case, i.e., N k = 0 in (2.1). Here, the Volterra series representation (2.3) simplifies to y(t) = ∞ 0 Ce As Bu(t − s) ds. If the input signal is a Dirac mass at 0, we obtain the impulse response h(t) = Ce At B. The H 2 -norm for linear systems now is simply defined as the L 2 -norm of the impulse response, i.e., Based on the latter definition and the Volterra series, in [60], the H 2 -norm has been generalized for bilinear systems as follows.
Definition 3.1. Let Σ = (A, N 1 , . . . , N k , B, C) denote a bilinear system as in (2.1). We then define its H 2 -norm by Obviously, for a bilinear system having a finite H 2 -norm, it is required that the system is stable in the linear sense, i.e., A has only eigenvalues in C − . Moreover, the matrices N k have to be sufficiently bounded. From [60], let us recall that Assumption 1 ensures that the bilinear system under consideration has a finite H 2norm, which, moreover, can be computed by means of the solution P and Q of the generalized Lyapunov equations (2.6) and (2.7), respectively. In particular, we have that Σ 2 H2 = tr(CP C * ) = tr(B * QB). Given a fixed system dimension l the goal of H 2 -optimal model order reduction now is to construct a reduced-order bilinear systemΣ such that Unfortunately, already in the linear case this is a highly nonconvex minimization problem such that finding a global minimizer is out of reach. Instead, we aim at constructingΣ such that first-order necessary conditions for H 2 -optimality are fulfilled. In [60], the optimality conditions from [58] are extended to the bilinear case. More precisely, it is shown that an H 2 -optimal reduced-order model is defined by a Petrov-Galerkin projection of the original model. Given a reduced-order system Σ, let us consider the associated error system as well as the generalized Lyapunov equations associated with it (3.

2)
A e P e + P e A * e + m k=1 N k,e P e N * k,e + B e B * e = 0, N * k,e Q e N k,e + C * e C e = 0.
Assuming the partitioning the first-order necessary optimality conditions now are In [60] the authors have proposed a gradient flow technique to construct a reducedorder model satisfying (3.4). Since here we are interested in computations for largescale systems for which this technique is not feasible, we instead use the iterative method from [8]. The main idea is inspired by the iterative rational Krylov algorithm from [28] and relies on solving generalized Sylvester equations of the form Based on a given reduced-order model (Â i ,N k,i ,B i ,Ĉ i ), the subspaces spanned by columns of the solutions X i , Y i ∈ C n×l are used to generate an updated reducedorder model. More precisely, given unitary matrices V i , W i ∈ C n×l such that span(V i ) = span(X i ) and span(W i ) = span(Y i ), we set This type of fixed-point iteration is repeated until the reduced-order model is numerically converged up to a prescribed tolerance. For more details on the iteration, we also refer to [8].

Balanced model reduction for bilinear systems
We shall briefly explain model reduction based on balancing controllability and observability. To this end we assume that the generalized Gramian matrices P, Q are both Hermitian positive definite which is guaranteed by the assumption that the bilinear system (2.1)-(2.2) is completely controllable and observable:

4.1.
Singularly perturbed bilinear systems. We consider a balancing transformation x → T −1 x under which the Gramians transform according to [42] (4.1) where the diagonal matrix Σ = diag(σ 1 , σ 2 , . . . , σ n ) with σ 1 ≥ σ 2 ≥ . . . ≥ σ n > 0 contains the real-valued Hankel singular values (HSV) of the system. Under the linear map T , the coefficients of (2.1)-(2.2) transform according to As the Hankel singular values are the square roots of the eigenvalues of the product QP , they are independent of the choice of coordinates. It can be shown (e.g. [5]) that a balancing transformation that makes the two Gramians Q and P equal and diagonal is given by the matrix where the matrices U, V, S, R are defined by the Cholesky decompositions P = S T S and Q = R T R of the two Gramians solving (2.6) and (2.7), and their singular value decomposition SR T = U ΣV T . Now suppose that Σ = (Σ 1 , Σ 2 ) with Σ 1 ∈ R d×d and Σ 2 ∈ R (n−d)×(n−d) corresponding to the splitting of the system states into relevant and irrelevant states. Further assume that Σ 2 Σ 1 in the sense that the smallest entry of Σ 1 is much larger than the largest entry of Σ 2 . The rationale of balanced model reduction is based on a continuity argument: if the space of the uncontrollable and unobservable states is spanned by the singular vectors corresponding to Σ 2 = 0, then, by continuity of the solution of (2.1)-(2.2) on the system's coefficients, small singular values should indicate hardly controllable and observable states that do not contribute much to the input-output behavior of the system.
Using the notation Σ 2 = O( ) with 0 < 1 and partitioning the balanced coefficients according to the splitting into large and small HSV, then yields the following singularly perturbed system of equations (see [29,31]): Here z = T −1 x, with z = (z 1 , z 2 ) ∈ C d × C n−d , denotes the balanced state vector where the splitting into z 1 , z 2 is in accordance with the splitting of the HSV into Σ 1 and Σ 2 . The splitting of the balanced coefficients can be understood accordingly.

4.2.
An averaging principle for bilinear systems. In order to derive reducedorder models of (2.1)-(2.2), we consider the limit → 0 in (4.3). This amounts to the limit of vanishing small HSV Σ 2 in the original bilinear system. We suppose that Assumptions 1-5 hold for all > 0. As we will show in Appendix A, the results in [11] can be modified to show that the matricesÃ 11 andÃ 22 are Hurwitz, and that their eigenvalues are bounded away from the imaginary axis. In this case, BIBO stability of the system together with the assumptions on the admissible controls imply that z 2 → 0 pointwise for all t > 0 as → 0. However, the rate at which z 2 tends to zero and hence the limiting bilinear systems clearly depends on the controls u, especially when u depends on . We give only a formal justification of the different candidate equations that can be obtained in the limit of vanishing small HSV and refer to [29] for further details.
, we expect that the first two equations in (4.3) decouple as → 0, which implies that the limiting bilinear system will be of the form The assumption that z 2 goes to zero faster than √ is the basis of the traditional balanced truncation approach in which the weakly controllable and observable degrees of freedom are eliminated by projecting the equations to the linear subspace The validity of the approximation for all t ≥ 0 requires that z 2 (0) = 0; cf. Remark 4.2 below. 4.4. Singular perturbation. If z 2 = O( √ ) the z 1 , z 2 equations do not decouple as → 0, and the limiting equation turns out to be different from (4.5). To reveal it, it is convenient to introduce scaled variables by z 2 = √ ζ by which (4.3) becomes (4.6) Equation (4.6) is an instance of a slow-fast system with z 1 being the slow variable and ζ = z 2 / √ being fast, and for non-pathological controls u, the averaging principle applies [27]. The idea of the averaging principle is to average the fast variables in the equation for z 1 against their invariant measure, because whenever is sufficiently small, the fast variables relax to their invariant measure while the slow variables are effectively frozen, and therefore the slow dynamics move under the average influence of the fast variables. This clearly requires that the convergence of the fast dynamics is sufficiently fast and independent of the initial conditions. The auxiliary fast subsystem for frozen slow variable z 1 reads It is obtained from (4.6) by rescaling the equations according to τ = t/ andζ(τ ) = ζ ( τ ),ũ(τ ) = u( τ ) and sending → 0. Since the admissible controls decay on time scales that are of order one in t (i. In other words, for fixed z 1 the fast dynamics converge to the Dirac mass δ m at m = −Ã −1 22Ã 21 z 1 . This can be rephrased by saying that for all admissible controls and in the limit → 0 the dynamics (4.6) collapse to the invariant subspace Averaging the fast variables in (4.6) against their invariant measure δ m , then yields the averaged equation for the slow variables: with the coefficients (4.10)Â The situation here is special, in that the controls decay sufficiently fast so that the invariant measure of the fast variables is independent of u. For other choices of admissible controls, however, the invariant measure may depend on u, which then gives rise to averaged equations with measure-valued right hand side [26,27,54]. The following approximation result has been proved in [29]; cf. [56].
Theorem 4.1. Let u = u ,γ in (4.6) be admissible, satisfying u(t) = u(t/ γ ) for some 0 < γ < 1. Further let y (t) be the observed solution of (4.6) with consistent initial conditions (z 1 (0), ζ (0)) = (η, −Ã −1 22Ã 21 η), and letȳ(t) denote the output of the averaged equation (4.9) on the bounded time interval [0, T ], starting from the same z 1 (0) = η. Then there exists a constant C = C(T ), such that We should stress that it is possible to relax the condition on the initial conditions that guarantees that (z 1 (0), ζ (0)) ∈ S 2 . In this case there will be a transient initial layer of thickness O( √ ), in which there is a rapid adjustment of the initial conditions to the invariant subspace S 2 and during which the averaged dynamics deviates from the original dynamics, with an O(1) error. A uniform approximation on [0, T ] can then be obtained by a so called matched asymptotic expansion that matches an initial layer approximation with the averaged dynamics [43].
Remark 4.2. For single-input systems (m = 1), a sufficient condition for BIBO stability of (2.1) is that A is Hurwitz, in which case there exists a δ > 0, such that A + sN is Hurwitz for all s ∈ [−δ, δ]. The stability of A is inherited by the Schur complementÂ, in (4.9) and consequentlyÂ + sN inherits stability, with a possibly smaller stability region. (See Appendix A for details.) Hence reduced single-input systems are again BIBO stable.

Numerical details
Before testing H 2 and balanced model reduction for examples from stochastic control and quantum dynamics, see Secs. 6 and 7, respectively, we will first focus on the numerical issues related to the scaling of the controls and the preprocessing of the unstable A matrix. 5.1. Structured bilinear systems. The subsequent numerical examples share several special properties that result from a physical interpretation and that require a careful numerical treatment. In this section, we provide some insight in how the model reduction methods are applied to the particularly structured bilinear systems. In fact, in the FPE as well as in the LvNE context, the initial setup leads to a purely bilinear system of the form In either case, the system exhibits a nontrivial stationary solution x e corresponding to a simple eigenvalue 0 of the system matrix A, i.e., Ax e = 0. For the applications we are interested in the deviation of the state x from the stationary solution. Let us therefore introduce the reference statex = x − x e that is governed by the bilinear system where the term Cx e can be interpreted as a constant nonzero feedthrough D of the system. For the reduced-order model, we thus may simply setD = Cx e such that we can simply focus on the output operator C. In accordance with standard model reduction concepts that assume a homogeneous initial condition, here we assume that the initial state of the original system is the equilibrium, i.e.,x(0) = x e −x e = 0.
While the system now has been transformed from a purely bilinear into a standard bilinear system, we still have to deal with the problem of a system matrix that is not asymptotically stable. In what follows, we present two different techniques that bypass this problem.

Sparsity preserving projection.
In our examples, the system matrices are mass and positivity preserving. Numerically this is reflected in the fact that the system matrix A as well as the bilinear coupling matrices have zero row sum. In other words, the vector 1 n := 1, . . . , 1 * ∈ C n satisfies 1 * n A = 1 * n N k = 0. The intuitive idea now is splitting the state into the direct sum of the asymptotically stable subspace and the eigenspace associated with the eigenvalue 0. Since a straightforward implementation in general will destroy the sparsity pattern of the matrices, we suggest to use a particular decomposition that has been introduced in a similar setup in [17]. Define the matrix where e n denotes the n-th unit vector in C n . An easy calculation now shows that the inverse R −1 is given as where the vectorx e ∈ C n−1 consists of the first n − 1 components of x e ∈ C n . Assume that the matrices A, N k and B are partitioned as follows Finally, a state space transformation z := R −1x yields the equivalent bilinear system Making use of the relations Ax e = 0 = A * 1 n = N * k 1 n , we conclude that the last This implies that the last component of z(t) is constant, and, due to z(0) = 0 vanishes for all times t. As a consequence, we can focus on the first n − 1 componentsz(t) of z(t) which, after some calculations, can be shown to satisfỹ Typically, the matrices A and N k result from finite difference or finite element discretization, respectively, and thus are sparse. The previous projection in fact only slightly increases the number of nonzero entries. Moreover, the matrices are given as the sum of the original data and a low rank update which can be exploited in a numerical implementation as well.
5.3. Discounting the system state. An ad-hoc alternative to the decomposition of the state space into stable and unstable directions is the "shifting" of the A-matrix by a translation A → A − αI for some α > 0. If A has a simple eigenvalue zero, as in our case, there exists an α > 0, such that the matrix A − αI is Hurwitz. For linear systems the shifting can be interpreted as a discounting of the controllability and observability functionals that renders the associated Gramians finite [15]. As the controllability and observability Gramians in the bilinear case are lacking a similar interpretation, the shifting has no clear functional analogue (cf. [10]). It is still possible to stabilize the system by a joint state-observable transformation (x, y) → (e −αt x, e −αt y) =: (x,ỹ) under which the system (2.1)-(2.2) transforms according to Even though (5.4) and (2.1)-(2.2) are equivalent as state space systems, the shifting clearly affects the Hankel singular value spectrum and, as a consequence, the reduced system. (As a matter of fact, the Hankel singular values do not even exist in case of the untransformed system.) Hence the parameter α should be regarded as a regularization parameters that must chosen as small as possible.
Later on we compare stabilization of the A matrix by state space decomposition and shifting in terms of the achievable state space reduction (i.e., decay of Hankel singular values) and fidelity of the reduced models. In the examples of model order reduction shown below, this can be achieved by a suitable scaling u → ηu, N k → N k /η, B → B/η with real η > 1 which leaves the equations of motion invariant but, clearly, not the Gramians. Hence, by increasing η, we drive the system to its linear counterpart. For the limit η → ∞, the system matrices N and B vanish and we obtain a linear system. For this reason, η should not be chosen too large.

5.5.
Calculation of the H 2 error. To quantify the error introduced by dimension reduction, we use the H 2 -norm introduced in Sec. 3. We emphasize that the effort required for computing the H 2 -error is negligible when compared to solving the generalized Lyapunov equations arising for balanced truncation and singular perturbation, respectively, which is seen as follows. Given a reduced-order system Σ, the associated H 2 -error is given as where P e solves (3.2). Using the particular structure of the error system, this is obviously the same as Σ −Σ 2 H2 = tr(CP C * ) − 2tr(CXĈ * ) + tr(ĈPĈ * ).
However, the term tr(CP C * ) now can be precomputed since P is required for the balancing-based methods anyway. What remains is the computation of the solutions X andP of the following the generalized Sylvester and Lyapunov equations, respectively Based on the results from [22], we can compute X = lim i→∞ X i andP = lim i→∞Pi as the limits of solutions to standard Sylvester and Lyapunov equations 5.6. Software. All of the numerical tests of the dynamical systems presented in the following have been carried out using the WavePacket software project which encompasses all numerical methods for model order reduction as discussed above.
Being hosted at the open-source platform Sourceforge.net, this program package is publicly available, along with many instructions and demonstration examples, see http://sf.net/projects/wavepacket and Refs. [51,50]. In addition to a mature MATLAB R version, there is also a C++ version currently under development.

Fokker-Planck equation
We start off with an example from stochastic control in classical mechanics: a semi-discretized Fokker-Planck equation (FPE) with external forcing. To this end, we consider the stochastic differential equation that governs the motion of a classical particle with position X t ∈ R n at time t > 0. The motion is influenced by the gradient of a smooth potential V , a deterministic control force u and a random forcing coming from the increments of the Brownian motion (W t ) t≥0 in R n . For simplicity we assume that the potential V is C ∞ , with Note that X t = X t (ω) is a random variable for every t > 0, and an equivalent characterization of the diffusion process X t is in terms of its probability distribution where A ⊂ R n is any measurable (Borel) subset of R n , and ρ : R n × R + → R + is the associated probability density whose time evolution is governed by the Fokker-Planck equation with the shorthand β = 2/σ 2 for the inverse temperature. The limit in the last equation, that must be understood in the sense of weak convergence of probability measures (or, equivalently, weak- * convergence), reflects our choice of deterministic initial condition X 0 = x; the regularization property of the parabolic FPE guarantees that ρ(·, t) is C 2 for any t > 0; moreover the solution stays non-negative. Later on, we will consider the case that the initial conditions are drawn from a probability density ρ 0 and thus replace δ x by ρ 0 . Note that by the divergence theorem, hence the total probability is conserved along the solution of (6.2). When u = u 0 is constant, the properties of the potential V entail that the solution to the FPE converges exponentially fast to a stationary solution ρ ∞ as t → ∞ (see, e.g., [37]). The stationary solution is then given as the unique normalized solution to the elliptic partial differential equation (6.4) 0 = ∇ · β −1 ∇ρ + ρ∇V u and has the form where we have introduced the shorthand V u (x) = V (x)−u 0 ·x for the tilted potential.
Later on we will study the convergence towards the stationary distribution that is exponential with a rate essentially given by the first non-zero eigenvalue −λ 1 > 0, and compare the fully discretized model with its reduced-order approximant.
6.1. Metastable model system. We consider the situation of a diffusive particle in R 2 that is confined by the following periodically perturbed quadruple-well potential 1 shown in Figure 1 (6.6) The potential has a deep energy well in the south-east of the x 1 -x 2 -plane, one slightly shallower well in the south-west and two even shallower wells in the northwest and north-east. The system is metastable, in that the time scale to reach the deepest potential energy well from any of the other three wells is of the order of the Arrhenius timescale e β∆Vmin 1 where ∆V min denotes the minimum energy barrier that a particle going from one well to the south-east well would have to overcome [14]. The various local minima of the potential energy surface that originate from the periodic perturbation do not have any significant effect on the transition rates between the main wells. The corresponding stationary density µ is shown in the upper left panel of Fig. 2. For moderate temperature (β = 4.0) essentially only the two main wells are populated, with considerably more weight on the deepest minimum (SE).
For sufficiently small mesh size h = (h 1 , h 2 ), the finite difference discretization is known to preserve positivity, norm and stochastic stability. As a consequence, the stationary distribution of the discretized equation is the unique asymptotically stable fixed point and approximately equal to the stationary solution µ of the original equation, evaluated at the grid points; cf. [35]. In matrix-vector notation, the discretization of (6.7) can be compactly written as (6.9) where v ∈ R n with n = n 1 n 2 is the column-wise tensorization of (w i,j ) i,j , i.e. v i+(j−1)n1 = w i,j , A ∈ R n×n is the discretization of the Fokker-Planck operator ∇ · β −1 ∇ρ + ρ∇V = β −1 ∆ρ + ∇V · ∇ρ + (∆V )ρ of the uncontrolled dynamics, and the N i are the discretization of the partial derivatives ∂/∂x i on the tensorized grid, u 1 and u 2 are the components of u.
By construction, −A is an M -matrix with a simple eigenvalue 0 that corresponds to the discretized unique stationary distribution π ≈ µ| Ω h , all other eigenvalues have strictly negative real parts. This is in contrast to the spectral properties of the original operator that is symmetric (essentially self-adjoint) when considered on the appropriately weighted Hilbert space, i.e., all its eigenvalues are real. We observe, however, that the dominant eigenvalues are real when the discretization is sufficiently fine.
Tab. 1 gives the 12 smallest eigenvalues (by their magnitude) of the matrix A and for a discretization of the domain Ω = (−6.0, 6.0) × (−5.5, 6.5) with uniform mesh size h 1 = h 2 = 0.25; the size of the resulting matrix A is 2401 × 2401. The L 1deviation between the eigenvector π to the eigenvalue λ 0 = 0 and µ evaluated at the grid points is smaller than 0.007. As the theory predicts, the matrix has 4 dominant eigenvalues close to 0 (including λ 0 = 0) that are separated from the rest of the spectrum. Figure 2 shows the 4 dominant eigenvectors of A, the first one being the stationary distribution that is essentially supported by the two deepest minima, the second one describing the dominant transition process between the deepest and the second deepest minimum, the third one representing the transitions between the second and the third deepest minimum and so on. The absolute values of the corresponding eigenvalues λ 1 , λ 2 , λ 3 < 0 represent (up to an error of order √ ) the transition rates between the dominant potential energy wells. The fact that the subdominant eigenvalues appear in clusters of 4 has to do with the approximate four-fold symmetry of the potential. By tilting the potential towards one or several of the minima (thus flattening some of the other minima) the number of eigenvalues in the dominant cluster changes according to the number of resulting wells. 6.3. Stable input-output system in standard form. We first augment (6.9) by an output equation. To this end we introduce the observable y = (y 1 , . . . , y 4 ) ≥ 0 denoting the probability for each of the four energy wells. The y i are given by summation of the density v over all mesh points corresponding to the four quadrants of the x 1 -x 2 -plane, which, using the tensorized form of the equation, can be written as (6.10) y = Cx for a matrix C ∈ R 4×n . The discretized FPE is bilinear, but it is homogeneous, i.e., it does not contain a purely linear term "Bu", which implies that no state is reachable from the origin v(0) = 0. 2 To transform (6.9) into the standard form (2.1), we follow the procedure described in Sec. 5.2.

Numerical results.
Here and throughout the following we will use the following short-hand notation when comparing results for the three approaches to model order reduction: BT stands for balanced truncation, as given by equation (4.5) in Sec. 4.3 whereas SP symbolizes the averaging principle derived from singular perturbation theory, as given by equations (4.9)-(4.10) in Sec. 4.4. Finally, H2 is the H 2 -optimal model order reduction of Sec. 3. The details of the following comparisons depend sensitively on the value of the parameter η used for scaling of the control field u(t) and matrices N k and B, which is necessary to guarantee existence and uniqueness of controllability and observability Gramians, see Sec. 5.4. For the particular example of the FPE dynamics for inverse temperature β = 4 investigated here, we use a value of η = 10 consistently for all three approaches to model order reduction. Moreover, to stabilize the A matrix we use here the projection method from Sec. 5.2. However, our results are practically unchanged when using the discounting approach described in Sec. 5.3 instead, assuming that the regularization parameter α is within a reasonable range.
The behavior of the H 2 -error defined in (5.5) for the discretized FPE is shown in Fig. 3. Similarly for all of the three methods, this error displays a plateau value of approximately 10 −5 for a reduced dimensionality of about d 60. Upon further reduction of the dimensionality we observe a rapid increase over several orders of magnitude indicating a decreased quality when reducing overly. In most cases it is found that the H 2 -error for the H2 method is slightly lower than for BT, which in turn is slightly lower than for the SP method.
While the H 2 -error characterizes the error of model order reduction for the limiting case of an infinitely short pulse (Dirac-like) control field, it may be also of interest to compare full versus reduced order models for more realistically shaped control fields. As an example we consider here the Fokker-Planck dynamics, again for β = 4, induced by a Gaussian-shaped control pulse along the x 2 -direction centered at t 0 = 150. Here σ = τ / √ 8 log 2 is chosen to yield a full width at half maximum of τ = 100 which is on the same order of magnitude as the relaxation time to equally account for the aspects of controllability and observability. The time evolution of the four above-mentioned observables (populations of the quadrants of the x 1 -x 2 plane) is shown in Fig. 4. The amplitude a = 0.5 of the pulse has been determined to drive approximately one half of the density from the lower minima (south) to the higher minima (north) at t ≈ 200. At later times, the populations return exponentially to their original values defined by the canonical density of Eq. (6.5).
Our numerical experiments show that the population dynamics for d = 100 is still practically indistinguishable from calculations in full dimensionality. When further reducing the model order down to d = 50 and d = 30, we observe that the quality of the SP method is superior to the BT or H2 method. However, despite of some minor differences, the overall performance of all three model order reduction schemes is impressive when considering that the original dimension of the problem is n = 2401. We observe that the H 2 error occasionally drops below machine precision. These occurrences appear at random and are not reproducible (depending e.g. on the computer used for the numerical calculation) and therefore we attribute them to numerical artifacts and exclude the values in the corresponding plots.
We emphasize that replacing the reduced-order model by a coarse finite-difference discretization of the advection-dominated Fokker-Planck equation is not advisable. For example, using a mesh size h 1 = h 2 = 1.25, and thus 11 grid points per dimension, corresponding to a system of dimension d = 121, we find that the error in the stationary distribution (i.e. the eigenvector to the eigenvalue λ 0 = 0) is of order 1 and that none of the dominant eigenvalues is approximated. For even larger mesh size, the eigenvalues of the matrix A cross the imaginary axis, resulting in an unstable system. Hence the recommended reduction strategy consists in first generating a sufficiently fine discretization of the original system and then reducing the dimension.

Liouville-von Neumann equation
As a second example we choose the dynamics of open q-state quantum systems. Usually those are formulated in terms of a matrix representation of the reduced density operator, ρ ∈ C q×q , the diagonal and off-diagonal entries of which stand for populations and coherences, respectively. The time-evolution of ρ is governed by a quantum master equation which, due to a formal similarity with the Liouville equation in classical mechanics, is termed Liouville-von Neumann (LvNE) equation [57,19] where we have used atomic units ( = 1). The first Liouvillian on the right hand side represents the closed system quantum dynamics where [·, ·] − stands for a commutator and where the field-free system is expressed in terms of its Hamiltonian matrix H 0 . The system can be controlled through the interaction of its dipole moment matrices µ k with electric field components F k (t) which is the lowest-order semiclassical expression for the interaction of a quantum system with an electromagnetic field. The second Liouvillian on the right hand side of (7.1) represents the interaction of the system with its environment thus accounting for time-irreversibility, i.e., dissipation and/or dephasing. A commonly used model for these processes is the Lindblad form [40] where the index c runs over all dissipation channels [18] and where [·, ·] + stands for an anti-commutator. The Lindblad operators C c describe the coupling to the environment in Born-Markov approximation (weak coupling, no memory), typically chosen to be projectors with rate constants (inverse times) Γ i←j . In order to cast the evolution equation (7.1) into the standard form of bilinear input-output systems (5.2) for deviations from the stationary solution, the density matrix ρ has to be mapped onto a vector x with n = q 2 components. Choosing the vectorization such that populations go in front of coherences offers the advantage that A is blockdiagonal with block sizes q and (n − q) where the latter block is diagonal. Moreover, the upper left submatrix of N is a zero matrix of size q × q. We note that typically both A and N are sparse matrices whereas B and C are not. For more details of the vectorization procedure and the associated construction of matrices A, N , B, and C from the LvNE, see Appendix A of Ref. [49]. With the Lindblad model introduced above, the LvNE (7.1) is trace-preserving (i.e., the sum of populations remains constant) and completely positive (i.e., the individual populations remain positive) thus ensuring the probabilistic interpretation of densities in quantum mechanics. Despite of the different discretization schemes used, the model bears many similarities with the discretized Fokker-Planck equation considered in Sec. 6, including the simple zero eigenvalue of the matrix A.
7.1. Double well model system. We apply our model reduction approaches to dissipative quantum dynamics described by a (one-dimensional) asymmetric double well potential as presented in our previous work [49, Figure 1]. Our parameters are chosen such that there are six (five) stationary quantum states which are essentially localized in the left (right) well. We also include the first ten eigenstates above the barrier separating the wells which are delocalized while even higher states are not considered for simplicity. In total, the q = 21 considered states lead to a density matrix with dimension n = 441. Thus, model order reduction can be mandatory, e.g., during a refinement of fields in optimal control. In the present model simulations, the dependence of rate constants Γ i←j with j > i describing the decay of populations (and associated decoherence) are obtained from the model of Ref. [3] which employs only one adjustable parameter which we choose as Γ ≡ Γ 0←2 ; the rates for upward transitions (i > j) are calculated from those for downward ones using the principle of detailed balance where E are the eigenvalues of the unperturbed Hamiltonian H 0 . Hence, the temperature Θ is the second parameter needed to set up matrix A (assuming Boltzmann constant k B = 1). The external control of the quantum system is modeled within the semi-classical approximation of Eq. (7.2): The electric field F (t) interacts linearly with the dipole moment µ which is assumed to be proportionate to the system coordinate of the double well system which is used to set up matrices N and B describing the controllability. To observe the system dynamics, we monitor the sums of the populations of the quantum states localized in the left and right well, and of the delocalized states over the barrier. These three quantities are used to construct the matrix C describing the observability [49].
7.2. Numerical results. As was already noted for the FPE example, the performance of the model order reduction schemes depends sensitively on the value of the parameter η used for scaling of the control field u(t) and matrices N k and B, see Sec. 5.4. For all examples from LvNE dynamics discussed here, we use a value of η = 3. In addition, the A matrices are stabilized using the projection method introduced in Sec. 5.2. Again, all results are practically unchanged when using the discounting approach described in Sec. 5.3 instead. We begin our discussion by considering the spectrum of the A-matrix as displayed in Fig. 5. With increasing dimension reduction, more and more of the eigenvalues with lowest (most negative) real parts are eliminated first. As has been detailed in Appendix A of Ref. [49], those correspond to quantum states which decay fastest. Hence, the eigenvalues of A with lowest real part are associated with lowest observability. At the same time, the order reduction tends to eliminate states with large imaginary part first. Those correspond to coherences between quantum states with large energy gaps for which the Franck-Condon (FC) factors are typically very low. Hence, the eigenvalues of A with largest imaginary part are associated with lowest controllability. A noteworthy exception are the results for d = 30 (green dots in Fig. 5) with real parts near zero. There, the imaginary parts (energy differences) near even multiples of ≈ 0.1 can be assigned to ladder climbing within each of the wells of the double well potential, while odd multiples correspond to transitions between the minima. Because the FC factors for the former ones are larger, they are more likely to be preserved in dimension reduction due to their higher controllability. This is seen most clearly in the left panel of Fig. 5, i.e., for the BT method. In summary, the model order reduction confines the spectrum of A to the lower (most controllable) and to the right (most observable) part of the complex number plane. In general, the results of the three different approaches (BT, SP, and H2 method) are very similar to each other.
To quantify the error introduced by model order reduction of the LvNE system, we consider the behavior of the H 2 -error as defined in (5.5). Our results for various values of the relaxation rate Γ (but constant temperature, Θ = 0.1) are shown in the left half of Fig. 6. The higher the value of the relaxation rate Γ, the smaller is the H 2 error and the earlier the error reaches a plateau at about 10 −10 . . . 10 −9 . Hence, dimension reduction is more effective for open quantum systems with larger rate constants for relaxation (and associated decoherence). Furthermore, it is noted that the BT and the H2 method yield similar H 2 errors at comparable computational effort so that there is no clear preference for either one of them.
Our results for various values of the temperature Θ (but constant relaxation, Γ=0.1) are shown in the right half of Fig. 6. For low (Θ = 0.07) and for medium (Θ = 0.1) temperatures, the H 2 error decreases with increasing dimensionality r and again reaches a plateau. However, at higher temperature (Θ = 0.2) the error decreases rapidly and reaches machine precision at r ≈ 100.
Again, in most cases the results for the different methods are close to each other, with the only exception being the lower temperature (Θ = 0.07), where the H 2 -error for the BT method is often found below that for the H2 method. At low temperature the system becomes less controllable, and this suggests that H2 does not always correctly capture the controllable states-which BT does by construction. As before in the Fokker-Planck example, we observe that the H 2 error occasionally drops below machine precision. As these occurrences appear at random and are not reproducible (depending e.g. on the computer used for the numerical calculation), we attribute them to numerical artifacts and exclude the values in the corresponding plots.
Finally, an example for the time evolution of the three above-mentioned observables (populations) in the asymmetric double well system (relaxation rate Γ = 0.1 and temperature Θ = 0.1) is investigated for the control field given in Eq. (6.11), here with a = 3, t 0 = 15, and τ = 10. The pulse drives the population, which is initially mainly in the left well of the potential, to delocalized quantum states over the barrier from where transitions to the right well are induced. The subsequent relaxation to the thermal distribution proceeds on a much longer time scale not shown here. In Fig. 7 we compare the results for full dimensionality (n = 441) with reduced dimensionality d. While the results for d = 100 are still essentially exact, the results for d = 50 start to deviate notably. For d = 30 only the BT method (left panel of Fig 7) reproduces the full dimensional ones qualitatively while SP method (center panel) as well as H2 method (right panel) fail completely.

Conclusions
In this paper, model reduction methods for bilinear control systems are compared, with a special focus on Fokker-Planck and Liouville-von Neumann equation. The methods can be categorized into balancing based (balanced truncation, singular perturbation) and interpolation based (H 2 optimization) reduction methods. While these methods have already been discussed in [2,49,9,8,24], our focus is on a direct and thorough comparison between all of them. Particularly, we draw the following conclusions with regard to computational complexity, accuracy and applicability to realistic bilinear dynamics. 8.1. Computational complexity. The computational effort of BT and SP is essentially determined by the solution of the two generalized Lyapunov equations (2.6) and (2.7). From a theoretical point of view, the complexity for solving these equations explicitly is O(n 6 ). On the other hand, an iterative approximation ( [22]) as described in Sec. 5.5 with r iteration steps only requires O(rn 3 ) operations (due to solving the standard Lyapunov equations in each step by a direct solver such as the Bartels-Stewart algorithm by lyap in MATLAB). As an alternative, the generalized equations can be rewritten as a linear problem which can be solved, e. g., by the bi-conjugate gradient method where it is advantageous to use the solutions of the corresponding ordinary equations for pre-conditioning.
The effort of H2 is mainly due to the solution of two generalized Sylvester equations in each step of the bilinear iterative rational Krylov algorithm (BIRKA). In contrast to BT/SP, a direct solution of these equations requires "only" O(l 3 n 3 ) operations (l denoting the dimension of the reduced model). Similarly, the cost for an iterative procedure is less since the standard Sylvester equations can be handled efficiently for sparse system matrices. Hence, a single step of BIRKA is computationally less expensive than performing the balancing step in BT/SP. However, the overall cost for BIRKA obviously depends on the number of iteration steps that is needed until the fixed point iteration is (numerically) converged, see Sec. 3. Based on the numerical examples studied here, we can not report significant differences between all three methods.

8.2.
Accuracy of reduced models. The overall performance of all three methods is very satisfactory. Both transient responses as well as spectral properties of the original model are faithfully reproduced by all reduced models (see Figs. 3-4 and 6-7. Despite the nature of H2, a significant difference of the quality (w.r.t. the H 2norm) of the reduced models cannot be observed. Also, the (moderate) additional effort for SP instead of BT does not seem to lead to more accurate reduced models. 8.3. Unstable bilinear dynamics and scaling. Both BT/SP and H2 require the dynamics of the unperturbed system to be stable. The spectrum of the matrix A representing the field-free FPE / LvNE dynamics is in the left half of the complex number plane, however, with an additional single eigenvalue zero. The effects of two different stabilization techniques, i.e. a shift of the spectrum of A versus a splitting of stable and unstable parts leads to similarly accurate results (see Secs. 5.2 and 5.3). The latter approach however has the benefit that the bilinear dynamics are not changed by projecting onto the asymptotically stable part.
For the generalized Lyapunov and/or Sylvester equations to be solvable, the norms of the matrices B and N k have to be kept below certain thresholds which is achieved by down-scaling these matrices and corresponding up-scaling of the control fields, cf. Sec. 5.4. This leaves the equations of motion invariant (but not the Gramians). Here we observe significantly different results depending on the choice/size of the scaling factor. In some cases, good results are obtained only for large scaling factors. However, we emphasize that large scaling factors drive the Gramians to those appearing for the linear(ized) system. For this reason, an automatic (large) choice of these factors is not recommended but has to be investigated for the problem under consideration on a case by case basis. From the numerical example, we believe that the scaling is a very important point for obtaining "optimal" reduced models.
8.4. Further issues. Another aspect related to the computation of the balancing transformation that we mention only for the sake of completeness is that it is often advisable to exploit sparsity and to use low-rank techniques that do not require to compute the full Gramians and their Cholesky factorization, one such example being the low-rank Cholesky factor ADI method [38,13]. These methods require some fine tuning of the parameters to enforce convergence, but for example, in case of the Fokker-Planck equation for which the matrices A and N that are extremely sparse and the rank of the matrix −BB T is much smaller than the size of the matrices A, N , there can be a considerable gain from using low-rank techniques.

Appendix A. Stability of balanced and reduced systems
We now prove that the balancing transformation (4.1)-(4.2) preserves the stability of the submatricesÃ 11 andÃ 22 . The idea of the proof essentially follows [5,Thm. 7.9]; see also [11]. We confine our attention toÃ 22 , the stability of which is needed for the averaging principle to apply, and we stress that the proof readily carries over to the proof thatÃ 11 is stable (Hurwitz). Let denote the coefficients of the balanced bilinear system for = 1. Proof. We first prove that the spectrum ofÃ 22 lies in the closed left complex halfplane (including the imaginary axis). To this end note that (2.7) implies that Now let v ∈ C n−d be an eigenvector ofÃ * 22 to the eigenvalue λ ∈ C, i.e.Ã * 22 v = λv. Multiplication of the last equation with v * and v from the both sides yields Noting that both Σ 1 and Σ 2 are positive definite, it follows that (λ) ≤ 0, thus the eigenvalues ofÃ 22 are in the left complex half-plane or on the imaginary axis. As a second step we will demonstrate that indeed (λ) < 0. We proceed by contradiction and suppose the contrary. Following [5], there exists a linear change of variables x → V x, x ∈ C n , such that withÂ 22 having eigenvalues in C − while the eigenvalues ofÂ 33 are pure imaginary. Under the change of variables, the balanced coefficients transform as follows: HereÂ 11 =Ã 11 ,N k,11 =Ñ k,11 ,B 1 =B 1 , andĈ 1 =C 1 . Accordingly, we havê

Now consider the (3, 3) block of the generalized Lyapunov equation (2.7) for the controllability Gramian that readŝ
Now let w be an eigenvector ofÂ 33 to a pure imaginary eigenvalue λ = iσ. Then sandwiching the last equation with w * and w from the left and from the right and iterating the argument from above, it follows that m k=1 which, by complete controllability and thus positivity of the matrixQ implies that (N k,31Nk,32Nk,33 ) T w = 0 for all k = 1, . . . , m. ThereforeB * 3 w = 0, and as we can pick w to be any of the linearly independent eigenvectors ofÂ 33 we conclude that B 3 = 0 ,N k,31 = 0 ,N k,32 = 0 ,N k,33 = 0 , k = 1, . . . , m .
By the same argument, using the adjoint Lyapunov equation (2.6) for the positive definite observability Gramian, it follows that

This entails that the (2, 3) block of the Lyapunov equation forQ has the form
HenceQ 23 = 0 and the analogous argument for the observability Gramian yields thatP 23 = 0. Note that the Gramians are hermitian, i.e.,Q 23 =Q * 32 andP 23 =P * 32 , which implies that the Gramians are block diagonal: The Lyapunov equations for the (1, 3) blocks thus readŝ A 13Q33 + Σ 1Â * 31 = 0 ,Â * 31P33 + Σ 1Â13 = 0 Now multiplying the first of the two equations by Σ 1 from the left and substituting Σ 1Â13 by −Â * 31P33 yieldsÂ 31P33Q33 = Σ 2 1Â * 31 . Interchanging the two Lyapunov equations we can show that Σ 2 1Â * 13 =Â 13Q33P33 . Now recall that the diagonal matrix Σ 2 contains the eigenvalues ofPQ orQP , and since the Gramians are block diagonal, it follows that Σ 2 2 contains the eigenvalues ofP 33Q33 orQ 33P33 . By the assumption that Σ 1 and Σ 2 have no eigenvalues in common, we conclude that This shows that the matrixÂ the form  Remark A.4. Note thatN k =Ñ k,11 −Ñ k,12Ã −1 22Ã 21 is not the (1,1) block of the matrixÃ −1Ñ k , but rather the (1,1) coefficient of the matrixÑ kÃ −1 unlessÃ and N commute. This has been pointed out in [46], and as a consequence, the singular perturbation approximation (4.9)-(4.10) is not the truncation of the reciprocal system as is the case for linear systems. Yet this does not affect the above argument and hence the stability of the Schur complementÂ =Ã 11 −Ã 12Ã   Table 1. Lowest twelve eigenvalues (in magnitude) of the discretization matrix A of the FPE example for inverse temperature β = 4 showing three clusters of four members each. Comparison of full versus reduced dynamics using the BT and H2 method. Results for the SP method (not shown) are very close to those for the BT method. For all practical purposes, the reduced systems for d = 200 are virtually indistinguishable from the full-rank system.