Optimal control of multiscale systems using reduced-order models

We study optimal control of diffusions with slow and fast variables and address a question raised by practitioners: is it possible to first eliminate the fast variables before solving the optimal control problem and then use the optimal control computed from the reduced-order model to control the original, high-dimensional system? The strategy "first reduce, then optimize" (rather than "first optimize, then reduce") is motivated by the fact that solving optimal control problems for high-dimensional multiscale systems is numerically challenging and often computationally prohibitive. We state necessary and sufficient conditions under which the "first reduce, then optimize" strategy can be employed and discuss when it should be avoided. We further give numerical examples that illustrate the "first reduce, then optimize" approach and discuss possible pitfalls.


Introduction
Optimal control problems for diffusion processes have attracted a lot of attention in recent decades, both in terms of the development of the theory and in terms of concrete applications to problems in the sciences, engineering and finance [20,39]. Stochastic control problems appear in a variety of applications, such as statistics [17,16], financial mathematics [15,53], molecular dynamics [55,28] and materials science [57,6], to mention just a few. A common feature of the models used is that they are high-dimensional and possess several characteristic time scales. For instance, in single molecule alignment experiments, a laser field is used to stabilize the slowly-varying orientation of a molecule in solution that is coupled to the fast internal vibrations of the molecule, but ideally the controller would like to base the control protocol only on the relevant slow degree of freedom, i.e. the orientation of the molecule [56].
If the time scales in the system are well separated, it is possible to eliminate the fast degrees of freedom and to derive low-order reduced models, using averaging and homogenization techniques [51]. Homogenization of stochastic control systems has been extensively studied by applied analysts using a variety of different mathematical tools, including viscosity solutions of the Hamilton-Jacobi-Bellman equation [8,18,1,42], backward stochastic differential equations [11,12,31], Gamma convergence [41,46] and occupation measures [37,38,36]. The latter have also been employed to analyse deterministic control systems, together with differential inclusion techniques [21,58,24,5,59]. The convergence analysis of multiscale control systems, both deterministic and stochastic, is quite involved and non-constructive, in that the limiting equations of motion are not given in explicit or closed form; see [35,22,33] for notable exceptions, dealing mainly with the case when the dynamics is linear. We shall refer to all these approaches (without trying to be exhaustive) as "first optimize, then reduce".
On the other side of the spectrum are model order reduction (MOR) techniques for large-scale linear and bilinear control systems that are based on tools from linear algebra and rational approximation. MOR aims at approximating the response of a controlled system to any given control input from a certain class, e.g., piecewise constant or square integrable functions; see, e.g., [25,4] and the references given there. A very popular MOR method is balanced truncation, which gives easily computable error bounds in terms of the Hankel norm of the corresponding transfer functions [44,23], and which has recently been extended to deterministic and stochastic slow-fast systems, using averaging and homogenization techniques [29,26,27]. In applications MOR is often used to drastically reduce the system dimension before a possibly computationally expensive optimal control problem is solved. In most real-world applications, solving an optimal control problem on the basis of the unreduced large-scale model is prohibitive, which explains the popularity of MOR techniques. We will call this approach "first reduce, then optimize".

The MOR approach: first reduce, then optimize
In this paper we focus on optimal control of diffusions with two characteristic time scales. As a representative example, we consider the diffusion of a driven Brownian particle in a two-scale energy landscape in one dimension,

dx^ε_s = (σ u_s − ∇Φ(x^ε_s, x^ε_s/ε)) ds + σ β^{−1/2} dw_s ,   (1)

where u is any time-dependent driving force (or control variable) and w_s is standard one-dimensional Brownian motion. The potential consists of a large metastable part with small-scale superimposed periodic fluctuations, Φ(x, y) = Φ_0(x) + p(y) with p(·) a 1-periodic function. A typical potential is shown in Figure 1. Now, if u is given as a function of time, say bounded and continuous, it is known that x^ε_s converges in distribution to a limiting process x_s as ε → 0, where x_s solves the homogenized equation [52]

dx_s = A (σ u_s − ∇Φ_0(x_s)) ds + σ A^{1/2} β^{−1/2} dw_s .   (2)

Here 0 < A < 1 is an effective diffusivity that accounts for the slowing down of the dynamics due to the presence of local minima in the two-scale potential. The property that x^ε weakly converges to x in the sense of probability measures will be referred to as forward stability of the homogenized equation. Now imagine a situation in which u depends on x^ε_s via a feedback law u_s = c(x^ε_s; ε), where c(·; ε) is a measurable function of x. (For simplicity, we do not consider the case that c carries an explicit time-dependence.) Specifically, we choose u = u^ε from an admissible class of feedback controls so that the cost functional

J(u) = E[ ∫_0^τ L(x^ε_s, u_s) ds ]

is minimized for some given running cost L ≥ 0 associated with the sample paths of x^ε_s and u_s up to a random stopping time τ of the process. The aim of the paper is to study situations where the cost functional evaluated at u^ε converges to J(u), with u being the limit of u^ε (in some appropriate sense). Specifically, we are dealing with the situation that J(u^ε) → J(u) whenever u^ε → u, a property that we will refer to as backward stability. If the homogenized equation is backward stable, it does not matter whether one first solves the optimal control problem and then sends ε to 0 or vice versa, in which case the control u is simply treated as a parameter. One of the implications then is that we can compute optimal controls from the homogenized model, such as (2), and use them in the original equation when ε is sufficiently small.
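To make the two-scale model (1) concrete, the following is a minimal Euler-Maruyama sketch of the controlled dynamics under a feedback law u = c(x). The slow potential Φ_0(x) = (x² − 1)²/4, the periodic part p(y) = cos(2πy) and the function names are hypothetical choices made only for illustration; they are not taken from the paper. Note that the step size has to resolve the fast scale (dt much smaller than ε²), which is one reason why simulating and controlling the unreduced system is expensive.

```python
import math
import random


def grad_phi(x, eps):
    """Total x-derivative of Phi(x, x/eps) = Phi0(x) + p(x/eps).

    Hypothetical choices: Phi0(x) = (x^2 - 1)^2 / 4 and p(y) = cos(2*pi*y),
    so that d/dx p(x/eps) = -(2*pi/eps) * sin(2*pi*x/eps).
    """
    slow = x * (x * x - 1.0)
    fast = -(2.0 * math.pi / eps) * math.sin(2.0 * math.pi * x / eps)
    return slow + fast


def simulate(x0, eps, beta, sigma, control, dt, n_steps, seed=0):
    """Euler-Maruyama path of dx = (sigma*u - grad Phi) ds + sigma*beta^(-1/2) dw."""
    rng = random.Random(seed)
    noise = sigma * math.sqrt(dt / beta)  # standard deviation of one noise increment
    x = x0
    path = [x]
    for _ in range(n_steps):
        u = control(x)  # feedback law u = c(x), evaluated at the current state
        x += (sigma * u - grad_phi(x, eps)) * dt + noise * rng.gauss(0.0, 1.0)
        path.append(x)
    return path
```

For instance, `simulate(-1.0, 0.1, 2.0, math.sqrt(2.0), lambda x: 0.0, 1e-5, 2000)` produces an uncontrolled path started in the left well; replacing the lambda by any other function of x applies a feedback control.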
Unfortunately very few systems are backward stable in this sense, a notable exception being a system of the form (1) when the running cost L is quadratic in u, e.g. [38, Sec. 4.1]. The reader may wonder why one should first reduce the equations before solving the optimal control problem anyway, rather than the other way round. One answer is that solving optimal control problems for high-dimensional multiscale systems is usually computationally infeasible, which often leaves no other choice; another answer is that there may be situations in which a fully resolved model is not explicitly available, and one only has a sufficiently accurate low-order model that captures the relevant dynamics of the system. In both cases one wants to make sure that the controls obtained from the low-order reduced model can be used in order to control the original system.

Mathematical justification of the MOR approach
In this article we consider the exceptional cases of backward stability and give necessary and sufficient conditions under which the reduced systems (disregarding the control) are indeed backward stable. It turns out that a class of optimal control problems that are backward stable are systems that are linear-quadratic in the control variable; they may be nonlinear in the state variables, though, and therefore cover many relevant applications in the sciences and engineering. Moreover we find that an additional requirement is that the controls of the multiscale system converge in a strong sense; an example of weak convergence, in which the system fails to be backward stable due to lack of sequential continuity, is when the controls are oscillatory with rate 1/ε around their homogenization limit, in which case J(u^ε) does not converge to J(u) unless J is linear in u. For a related discussion of weak convergence issues in optimal control, we refer to [2,3]. Similar problems for parameter estimation and filtering are discussed in [22,52,50,32,49].
Strong convergence of the control is a necessary, but not sufficient condition for backward stability of the model reduction approach (first reduce, then optimize), in which the control variable is treated as a parameter during the homogenization procedure. The control problems that can be homogenized in the above way are systems of SDEs that can be transformed to systems in which the controls are absent. Such systems are linear-quadratic in the controls (but possibly nonlinear in the states), and can be transformed by a suitable logarithmic transformation of the value function of the optimal control problem: It can be shown (see [20]) that the log-transformed value function solves a linear boundary value problem that does not involve any control variables and can be homogenized using standard techniques. Once the linear equation has been homogenized, it can be transformed back to an equivalent optimal control problem that is precisely the limiting equation of the original multiscale control problem. A nice feature of the logarithmic transformation approach is that the optimal control can be expressed in terms of the solution of the linear boundary value problem, which can be solved efficiently using Monte-Carlo methods. This approach is helpful when the dynamics are high-dimensional, in which case any grid-based discretization of the above linear boundary value problem is prohibitive. (The case when the stopping time τ is deterministic and the log-transformed value function solves a linear transport PDE can be treated analogously.) Our approach is summarized in Table 1.
Table 1: Schematic approach of the homogenization procedure using logarithmic transformation.
The article is organized as follows: In Section 2 the model reduction approach for the indefinite time-horizon control problem with multiple time scales is outlined, with a brief introduction to dynamic programming and logarithmic transformations in Section 2.1. The model reduction problem is illustrated in Section 3 with three different numerical examples: underdamped motion of Langevin-type (Sec. 3.1), diffusion in a highly-oscillatory potential (Sec. 3.2), and the Gaussian linear quadratic regulator (Sec. 3.3). The article contains three appendices: Appendix A discusses weak convergence under logarithmic transformations, Appendix B introduces the infinite time-horizon problem associated with the linear quadratic regulator example, and Appendix C contains the proof of Theorem 3 and records various identities to bound the cost functional and the value function when using suboptimal controls.

Multiscale control problem
We start by setting the notation which we will use throughout this article. We denote by O ⊂ R^n a bounded open set with sufficiently smooth boundary ∂O. Further let (z^{ε,u}_s)_{s≥0} be a stochastic process assuming values in R^n that is the solution of the controlled SDE (4), where u_s ∈ U ⊆ R^n is the control applied at time s, w = (w_s)_{s≥0} is n-dimensional Brownian motion and β > 0 is the (dimensionless) inverse temperature of the system. We assume that, for each ε > 0, drift and noise coefficients, b(·; ε) and σ(·; ε), are continuous functions on Ō, satisfying the usual Lipschitz and growth conditions that guarantee existence and uniqueness of the process [47].

Cost functional
We want to control (4) in such a way that an appropriate cost criterion is minimized, where the control is active until the process leaves the set O. Assuming z^{ε,u}_0 = z ∈ O, we define τ to be the stopping time

τ = inf{s > 0 : z^{ε,u}_s ∉ O} ,   (5)

i.e., τ is the first exit time of the process z^{ε,u}_s from O. Our cost criterion reads

J(u; z) = E[ ∫_0^τ L(z^{ε,u}_s, u_s) ds | z^{ε,u}_0 = z ] ,   (6)

where L is the running cost that we assume to be of the form (7), with G being continuous on Ō. Note that the ε-dependence of the cost functional J comes only through the dependence of the control on z^{ε,u}_s. We will omit the dependence on z in J(u; z) and write it as J(u) whenever there is no ambiguity.
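The displayed forms of the controlled SDE (4) and the running cost (7) are not reproduced above. One normalization that is consistent with the rest of the text (the one-dimensional example (1), the linear boundary value problem with right-hand side βGψ, and feedback laws of the form û = −√2 ∇V_0 when σ = √2) is sketched below. It should be read as an assumption rather than as the authors' exact formulation; in particular, constant factors in front of σ may be distributed differently.

```latex
% Hypothetical normalization of (4) and (7); constants are an assumption.
\begin{align}
dz^{\epsilon,u}_s &= \bigl(b(z^{\epsilon,u}_s;\epsilon)
   + \sigma(z^{\epsilon,u}_s;\epsilon)\,u_s\bigr)\,ds
   + \beta^{-1/2}\sigma(z^{\epsilon,u}_s;\epsilon)\,dw_s , \tag{4}\\
L(z,u) &= G(z) + \tfrac12\,|u|^2 . \tag{7}
\end{align}
```

The essential structural features are that the control enters the drift through the same matrix σ that multiplies the noise, and that the running cost is quadratic in u; these are what make the logarithmic transformation of the next subsection work.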

Logarithmic transformation
In order to pass to the limit ε → 0 in (4)-(7), we resort to the technique of logarithmic transformations that has been developed by Fleming and coworkers (see [20] and the references therein). We start by recalling the dynamic programming principle for stochastic control problems of the form (4)-(7). To this end we make the following assumptions (see [20] for further details on the first two of the following assumptions):

Assumption 2
The running cost G(z) is continuous, nonnegative, and G(z) ≤ M_1 for all z ∈ Ō, with bounded first-order partial derivatives in z.

Assumption 3
There exist constants γ, C_1 > 0, which are independent of ε, such that E(exp(γτ)) ≤ C_1.

We define the generator L^ε(u) of the dynamics z^{ε,u}_s in the usual way. Notice that the generator depends on the control u. When the control is absent we will use the notation L^ε := L^ε(0). The next result is standard (e.g., see [20, Sec. IV.2]) and stated without proof.

Theorem 1. Let V^ε be the solution of the Hamilton-Jacobi-Bellman (HJB) equation (8), where the minimum goes over all admissible feedback controls of the form u_s = c(z^{ε,u}_s, s; ε). The minimizer is unique and is given by the feedback law (9). The function V^ε is called value function or optimal cost-to-go.

The homogenization problem for (4)-(7) can be studied using a multiscale expansion of the nonlinear PDE (8) in terms of the small parameter ε; see, e.g., [7,38]. In this article we remove the nonlinearity from the equation by means of a logarithmic transformation of the value function. Specifically, let ψ^ε = exp(−βV^ε).

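By the chain rule, the derivatives of V^ε can be expressed in terms of ψ^ε and its derivatives, which, together with the explicit form of the minimizer (9), implies that (8) is equivalent to the linear boundary value problem (10) for the function ψ^ε. By the Feynman-Kac formula, (10) has an interpretation as a control-free sampling problem (see [47, Thm. 8.2.1]):

ψ^ε(z) = E[ exp(−β ∫_0^τ G(z^ε_s) ds) | z^ε_0 = z ] ,   (11)

where z^ε_s solves the control-free SDE. Equations (8)-(11) express a Legendre-type duality between the value of an optimal control problem and cumulant generating functions [14,20]. In other words, the optimal cost of the control problem equals −β^{−1} times the logarithm of the exponential expectation (11), where z^{ε,u}_s satisfies the controlled SDE (4) and z^ε_s = z^{ε,0}_s.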
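The chain-rule computation behind the equivalence of (8) and (10) can be written out as follows. This is a sketch under an assumed normalization (generator with second-order part (2β)^{-1} σσ^T : ∇², control entering the drift as σu, running cost G + |u|²/2); the constants in the paper may be arranged differently, but the cancellation of the quadratic gradient terms is the same.

```latex
% Assumed normalization (hypothetical):
%   L^\epsilon = b\cdot\nabla + \tfrac{1}{2\beta}\,\sigma\sigma^T : \nabla^2 .
\begin{align*}
0 &= \min_{u}\Bigl\{ L^\epsilon V^\epsilon + (\sigma u)\cdot\nabla V^\epsilon
      + G + \tfrac12|u|^2 \Bigr\},
  \qquad V^\epsilon|_{\partial O} = 0, \\
\hat u &= -\sigma^T \nabla V^\epsilon
  \quad\Longrightarrow\quad
  L^\epsilon V^\epsilon - \tfrac12\,|\sigma^T\nabla V^\epsilon|^2 + G = 0, \\
V^\epsilon &= -\beta^{-1}\log\psi^\epsilon:\qquad
  \nabla V^\epsilon = -\frac{\nabla\psi^\epsilon}{\beta\,\psi^\epsilon},\qquad
  L^\epsilon V^\epsilon = -\frac{L^\epsilon\psi^\epsilon}{\beta\,\psi^\epsilon}
      + \frac{|\sigma^T\nabla\psi^\epsilon|^2}{2\beta^2(\psi^\epsilon)^2}, \\
\tfrac12\,|\sigma^T\nabla V^\epsilon|^2
  &= \frac{|\sigma^T\nabla\psi^\epsilon|^2}{2\beta^2(\psi^\epsilon)^2}
  \quad\Longrightarrow\quad
  L^\epsilon\psi^\epsilon = \beta\,G\,\psi^\epsilon,
  \qquad \psi^\epsilon|_{\partial O} = 1 .
\end{align*}
```

The nonlinear term produced by the second derivative of the logarithm is exactly the quadratic term generated by minimizing over u, which is why the transformation requires the linear-quadratic structure in the control.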
By the above assumptions and the strong maximum principle for elliptic PDEs it follows that (10) has a classical solution ψ^ε ∈ C^{1,2}(O) ∩ C(Ō). Moreover, combining Assumption 3, (11) and Hölder's inequality with exponents p = βM_1/γ + 1 and q = γ/(βM_1) + 1, we obtain a lower bound for ψ^ε in terms of C_1, and thus ψ^ε is bounded away from zero uniformly in ε. In the course of the paper we will drop the assumption that the operator L^ε is uniformly elliptic and instead require only that it is hypoelliptic [43]. In this case the matrix σσ^T can be semidefinite, if the vector field b satisfies an additional controllability assumption, known as Hörmander's condition [10], which guarantees that the transition probability has a strictly positive density with respect to Lebesgue measure, in which case (10) and (8) have classical solutions; cf. [20, Sec. IV].
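In one dimension the linear boundary value problem for ψ can be solved with a few lines of finite-difference code, after which the value function is recovered as V = −β^{-1} log ψ. The sketch below assumes a generator of the form b(x) d/dx + β^{-1} d²/dx² (that is, noise intensity √2 β^{-1/2}) and the boundary condition ψ = 1 on ∂O; the function names are hypothetical. For b = 0 and G = 1 on (−1, 1) the exact solution is cosh(βx)/cosh(β), which gives a simple correctness check.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal linear system (lower[0], upper[-1] unused)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_psi(drift, cost, beta, a, b, n):
    """Central differences for drift*psi' + psi''/beta = beta*cost*psi, psi(a)=psi(b)=1."""
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    lower, diag, upper, rhs = [], [], [], []
    for i in range(1, n):
        diff = 1.0 / (beta * h * h)
        adv = drift(xs[i]) / (2.0 * h)
        lower.append(diff - adv)
        diag.append(-2.0 * diff - beta * cost(xs[i]))
        upper.append(diff + adv)
        rhs.append(0.0)
    rhs[0] -= lower[0] * 1.0    # boundary value psi(a) = 1
    rhs[-1] -= upper[-1] * 1.0  # boundary value psi(b) = 1
    return xs, [1.0] + solve_tridiagonal(lower, diag, upper, rhs) + [1.0]


def value_function(psi, beta):
    """Inverse logarithmic transformation V = -log(psi)/beta."""
    return [-math.log(p) / beta for p in psi]
```

The optimal feedback law can then be approximated from finite differences of the returned value function. Grid-based solvers of this kind are only practical in low dimensions, which is the motivation for reducing the model first.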

Homogenization problem
We now specify the class of multiscale systems considered in this article. Specifically, we address slow-fast systems of the form (13), with slow variables x and fast variables y, together with an exponential expectation (14). Letting L^ε denote the infinitesimal generator of (13), it holds that ψ^ε solves the linear boundary value problem (15), where L^ε splits into contributions L_0, L_1 and L_2 of order ε^{−2}, ε^{−1} and 1, respectively. Let us assume that ψ^ε admits the following perturbation expansion in powers of ε:

ψ^ε = ψ_0 + ε ψ_1 + ε² ψ_2 + ...

By substituting the ansatz into (15) and comparing different powers of ε we obtain a hierarchy of equations, the first three of which are given in (16). We suppose that for each fixed x, the dynamics (13b) of the fast variables are ergodic, with the unique invariant density ρ_x(y). Then by construction ρ_x is the unique solution of the equation L_0^* ρ_x(y) = 0, which together with the first equation of (16) implies that ψ_0 is independent of y. In order to proceed, we further assume that f_0(x, y) satisfies the centering condition

∫ f_0(x, y) ρ_x(y) dy = 0 .   (17)

The centering condition, together with the strong maximum principle, implies that the solution of the cell problem is unique, with ψ_1(x, y) = Θ(x, y) · ∇_x ψ_0(x). Multiplying both sides of the third equation in (16) by ρ_x(y) and integrating with respect to y, we obtain the homogenized equation (18) for ψ_0, where the homogenized generator (19) has the coefficients (20).
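For orientation, the generic structure of the expansion can be spelled out as follows. The sketch assumes only that the generator splits as L^ε = ε^{-2} L_0 + ε^{-1} L_1 + L_2 and that ψ^ε solves (L^ε − βG) ψ^ε = 0; the concrete expressions for L_0, L_1, L_2 depend on the coefficients in (13) and are not reproduced here.

```latex
% Generic hierarchy; the operator splitting is an assumption stated in the lead-in.
\begin{align*}
\mathcal{O}(\epsilon^{-2}):&\quad L_0\psi_0 = 0,\\
\mathcal{O}(\epsilon^{-1}):&\quad L_0\psi_1 = -L_1\psi_0,\\
\mathcal{O}(1):&\quad L_0\psi_2 = -L_1\psi_1 - (L_2 - \beta G)\,\psi_0,\\[4pt]
\text{solvability:}&\quad
\int \rho_x(y)\,\bigl(L_1\psi_1 + (L_2 - \beta G)\,\psi_0\bigr)\,dy = 0 .
\end{align*}
```

The first equation forces ψ_0 = ψ_0(x) by ergodicity, the second is the cell problem that determines ψ_1 up to a function of x, and the solvability condition for the third (the Fredholm alternative with respect to ρ_x) yields the closed equation for ψ_0.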

Homogenized control system
It follows using standard homogenization theory for linear elliptic equations (e.g. [48,51]) that for ε → 0 the solution of (15) converges to the leading term ψ_0 of the asymptotic expansion, which admits a representation as an exponential expectation over the paths of x_s, where x_s is the solution of the homogenized SDE with coefficients as given in (20).
The corresponding asymptotic expansion of the value function V^ε for ε → 0 is obtained by the logarithmic transformation (12), V^ε = −β^{−1} log ψ^ε. Therefore, using the ansatz V^ε = V_0 + εV_1 + ..., the leading term is V_0 = −β^{−1} log ψ_0. Using the log-transformation property of the cumulant generating function (Section 2.1), we conclude that V_0 is the value function of a reduced optimal control problem, where the minimization is subject to the homogenized dynamics. The optimal feedback law for the homogenized problem is obtained from (9), with V^ε replaced by V_0 and the coefficients replaced by their homogenized counterparts.

Control of the full dynamics using reduced models
Our goal is to find the optimal control policy û^ε = (û^{1,ε}, û^{2,ε}) for the fast/slow system (13) for ε ≪ 1. Using Theorem 1 and the asymptotic expansion of V^ε, we obtain the expansion (25) of the optimal control. Notice that the leading terms in (25) are related to the value function of the optimal control problem for the reduced SDE. This indicates that we may design the control policy from the reduced problem and use it to control the original multiscale equation. This assertion is justified by the following result for the general optimal control problem (4)-(7).

Theorem 3. Let Assumptions 1, 2 and 3 hold and, furthermore, suppose that ε < (γ/β)^{1/2} and |u_t − û_t| ≤ ε uniformly in t. Then we have the estimate (26).

The proof of the theorem can be found in Appendix C.
Upon combining the above theorem with the formula for the optimal control policy in (25) we conclude that when the two time scales in the system are well separated, ε ≪ 1, the optimal control policy is well approximated by the leading order terms in (25) and results in a cost value that is nearly optimal.

Remark 4. All considerations in this paper readily generalize to the averaging problem, i.e. when f_0 = g_1 = 0 in (13). This is not surprising since for averaging problems strong convergence ψ^ε → ψ is expected to hold (when the diffusion coefficient α_1 in (13) is independent of the fast variable y). Related problems have been addressed in [49], in which the authors study parameter estimation and convergence of the maximum likelihood function under averaging and homogenization.

Three prototypical applications
In this section we apply the results presented in the previous section to three typical multiscale models.For each model we first state the optimal control problem along with its log-transformed counterpart, then we study the asymptotic limits of the value function and of the optimal control policy and give explicit formulae for the solution.The first two examples are taken from [49], while the third is adapted from [25].

Overdamped Langevin equation
We consider the second-order Langevin equation (27), where ε ≪ 1, x ∈ R^n, β > 0, and Φ is a smooth potential energy function. Introducing the auxiliary variable y, we can recast (27) as the first-order system (28). We consider the solution of the optimal control problem (29) under the controlled Langevin dynamics (30). We notice that (28) is somewhat different from the form specified in Section 2, since there is no noise and hence no control term in the equation for x^ε. The infinitesimal generator corresponding to (28) is hypoelliptic (rather than elliptic). Yet the standard homogenization arguments apply, for here the fast variable is y and the noise is acting uniformly in y. As a consequence the generator of the fast dynamics is uniformly elliptic, and hence the standard theory applies. Let ψ^ε denote the exponential expectation associated with (29)-(30). Assuming that the linear boundary value problem (10) associated with ψ^ε has a classical solution, the dual relation V^ε = −β^{−1} log ψ^ε holds and the results of the previous section carry over without alterations.

Homogenized control system
From the above and the considerations from the previous section we can conclude that the leading term of V^ε(x, y) is the value function of the optimal control problem (31) for the homogenized SDE, which is subject to the homogenized equation (32). Equation (32) is called the overdamped Langevin equation; it is obtained from (27) by letting the inertial second-order term tend to zero [45].
We now derive an explicit asymptotic expression for the optimal feedback law û^ε_t := û^{2,ε}_t, with û^ε_t = ĉ^ε(x^{ε,u}_t, y^{ε,u}_t) and ĉ^ε given by (33). Inserting the expansion ψ^ε = ψ_0 + εψ_1 + ... into (33) yields the leading-order terms of the feedback law in terms of ψ_0 and Θ. As before, Θ is the solution to the associated cell problem. To solve it we notice that the infinitesimal generator of (28) has the form (34), which implies that the cell problem (35) for Θ has the unique solution Θ(x, y) = y. Combining it with (33), we obtain the sought asymptotic expression (36) for the optimal feedback law, with V_0 as given in (31). We therefore conclude that the optimal control û^ε for the Langevin equation (27) converges to the optimal control of the overdamped equation (32) as ε → 0. Moreover, Theorem 3 guarantees that the control value is asymptotically exact if we replace û^ε with the control û = −√2 ∇_x V_0 in the multiscale dynamics (30). Hence the overdamped equation is backward stable.
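The claim Θ(x, y) = y can be checked directly. The sketch below assumes that the fast velocity dynamics is of Ornstein-Uhlenbeck type with some constant c > 0 in front of the Laplacian and that the slow drift entering the cell problem is f_0(x, y) = y; the exact constants in (34)-(35) are not reproduced in the text, but the verification does not depend on them.

```latex
% Assumption: L_0 = -y\cdot\nabla_y + c\,\Delta_y with c > 0, and f_0(x,y) = y.
\begin{align*}
L_0\,\Theta &= -y, \qquad \Theta(x,y) = y: \\
L_0\,y_i &= -y\cdot\nabla_y\,y_i + c\,\Delta_y\,y_i = -y_i + 0 = -y_i , \\
\rho(y) &\propto \exp\bigl(-|y|^2/(2c)\bigr),
  \qquad \int_{\mathbb{R}^n} y\,\rho(y)\,dy = 0 .
\end{align*}
```

The last line is the centering condition for f_0 = y, so the cell problem is solvable, and ψ_1 = Θ · ∇_x ψ_0 = y · ∇_x ψ_0 is linear in the fast variable. Because Θ does not oscillate in x/ε, no O(1) oscillatory correction to the control arises here, in contrast to the periodic example of the next subsection.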

Langevin dynamics in a double-well potential
As an example consider the case n = 1, with running cost G(x) = 1 in (29) and random stopping time τ = inf{s > 0 : x^{ε,u}_s > 2}. The dynamics are governed by the double-well potential depicted in Figure 2A. As the homogenized problem is one-dimensional, the leading term V_0 of the value function V^ε can be computed by solving a two-point boundary value problem. The resulting leading term (36) for the optimal control û^ε_t = ĉ^ε(x^{ε,u}_t) is shown in Figure 2B. We then computed the cost function J^ε = J(û^ε) starting from three different initial points x_0 = 1.0, 1.2, 1.5, using the approximation û^ε_t ≈ −√2 ∇_x V_0(x^{ε,u}_t). Figure 3 clearly shows that J^ε approaches its infimum V_0(x_0) as ε → 0. A clear advantage of controlling the full dynamics using the optimal control obtained from the reduced model here is that the infinitesimal generator L^ε of the original Langevin dynamics is not self-adjoint, whereas the infinitesimal generator of the reduced dynamics is essentially self-adjoint. That is, not only do we benefit from a lower dimensionality of the reduced-order model (by a factor of 2), but we also avoid solving a boundary value problem with a non-selfadjoint operator.

Diffusion in a periodic potential
We now consider the SDE (37) [16,51].

Figure 3: Different colors correspond to different initial values x_0 = 1.0, 1.2, 1.5. Lines marked with "×" are the value function V^ε computed from the exponential expectation using Monte-Carlo. The second set of marked lines (legend entries "approx") is the cost function J^ε = J(û^ε), computed from the homogenized control with the original dynamics. We observe that the two values approach V_0(x_0) as ε → 0 (horizontal line).
In order to relate this system to the homogenization problem studied in Section 2.2, we introduce the auxiliary variable y = x/ε and reformulate (37) as the slow-fast system (40), where x^ε_s, y^ε_s are driven by the same noise w_s. Notice that the same noise and the same control are applied to both equations. Clearly the associated value function satisfies V^ε(x) = V^ε(x, x/ε) and the dual relation V^ε(x, y) = −β^{−1} log ψ^ε(x, y) applies, where ψ^ε is defined as in Section 2.2. The generator of (40) is given in (41).

Homogenized control system
Applying the results of Section 2, we conclude that the leading term of V^ε(x) is the value function of the following reduced-order optimal control problem: minimize the cost functional (42) subject to the homogenized dynamics (43) with the effective diffusivity K. In the formula for K, ρ(y) = Z^{−1} exp(−βp(y)) denotes the invariant density of the fast variable y and Θ(y) is the solution of the associated Poisson equation; an explicit expression is available in this case (cf. [52] for details). The value function of the homogenized control problem (42)-(43) and the corresponding optimal control satisfy (44), where ψ_0 solves L̄ψ_0(x) = K L_2 ψ_0(x) = βG(x)ψ_0(x) with ψ_0 = 1 on ∂O, as given in (18).
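For a one-dimensional periodic potential the effective diffusivity has the classical closed form K = 1/(Z Ẑ), with Z = ∫_0^1 exp(−βp(y)) dy and Ẑ = ∫_0^1 exp(βp(y)) dy. The following sketch evaluates this formula numerically; it assumes that this classical expression is the one the text refers to, and the choice p(y) = cos(2πy) used in the check is hypothetical. For that choice both integrals equal the modified Bessel function I_0(β), so K = I_0(β)^{-2}.

```python
import math


def effective_diffusivity(p, beta, n=2000):
    """K = 1 / (Z * Z_hat) for a 1-periodic potential p, via the periodic rectangle rule."""
    h = 1.0 / n
    z = h * sum(math.exp(-beta * p(k * h)) for k in range(n))
    z_hat = h * sum(math.exp(beta * p(k * h)) for k in range(n))
    return 1.0 / (z * z_hat)


def bessel_i0(x, terms=60):
    """Power series of the modified Bessel function I_0, used only as a reference value."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total
```

By the Cauchy-Schwarz inequality Z Ẑ ≥ 1, so 0 < K ≤ 1, with equality only for constant p. With β = 2 and the cosine potential the formula gives K ≈ 0.19, i.e. the small-scale roughness slows the effective dynamics down by roughly a factor of five.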

Reduced model is not backward stable
In contrast to the previous example, however, the optimal control û obtained from the homogenized equation alone does not meet the requirements of backward stability. This can be understood by noting that the optimal control for the original dynamics is given by the feedback law (45), which can be formally derived from the expansion ψ^ε(x, x/ε) = ψ_0(x) + εψ_1(x, x/ε) + ... .
After some manipulations we find that the asymptotic expression for c^ε is given by (46), where we used the shorthand c(x) = −√(2K) ∇V_0(x) in the last row. Therefore we conclude that c^ε must be of the form c(x, x/ε), with an explicit dependence on the fast variable x/ε. Yet c(x, x/ε) does not converge to c(x) in any reasonable norm, for the x/ε part keeps oscillating as ε → 0. What does converge, however, is the average over the fast variable. This fact is illustrated in Figure 5, which shows the oscillations of order one that are a consequence of the ε-periodic oscillations of the value function; since the optimal control law involves the derivative of the value function, oscillations of size ε in the value function turn into O(1) contributions to the optimal control. Figure 6 shows the difference between the homogenized value function V_0(x) and its multiscale counterpart V^ε(x) in the L²-norm. The figure also shows the L²-difference between the multiscale optimal feedback law c^ε(x) and the corrected homogenized feedback law c(x, x/ε), including the oscillatory correction. This demonstrates strong O(ε) convergence in L² of both value function and optimal control.
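The difference between weak and strong convergence of oscillatory feedback laws, and its effect on a quadratic cost, can be illustrated numerically. In the sketch below the smooth limit c(x) = 1 + x and the oscillatory factor 1 + cos(2πx/ε) are hypothetical stand-ins for the homogenized control and its O(1) corrector; they are not the expressions in (46). The L² distance between the two laws stays of order one, the pairing with a fixed test function vanishes, and the quadratic control cost converges to 3/2 times the cost of the limit rather than to the cost of the limit.

```python
import math


def midpoint_integral(f, n=40000):
    """Midpoint rule on [0, 1]; n is chosen large enough to resolve the scale eps."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) for k in range(n))


def c_bar(x):
    """Hypothetical smooth limit feedback law."""
    return 1.0 + x


def c_eps(x, eps):
    """Hypothetical oscillatory feedback law with an O(1) periodic corrector."""
    return c_bar(x) * (1.0 + math.cos(2.0 * math.pi * x / eps))


def diagnostics(eps):
    """Return (L2 distance, pairing with test function x, ratio of quadratic costs)."""
    l2_gap = math.sqrt(midpoint_integral(lambda x: (c_eps(x, eps) - c_bar(x)) ** 2))
    weak_gap = midpoint_integral(lambda x: (c_eps(x, eps) - c_bar(x)) * x)
    cost_ratio = (midpoint_integral(lambda x: c_eps(x, eps) ** 2)
                  / midpoint_integral(lambda x: c_bar(x) ** 2))
    return l2_gap, weak_gap, cost_ratio
```

Calling `diagnostics(0.01)` returns an L² gap close to √(7/6), a weak gap close to zero and a cost ratio close to 1.5. A functional that is quadratic in u is therefore not continuous along such weakly convergent sequences, which is the mechanism behind the loss of backward stability.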
Remark 5. The above case is an example in which using a reduced-order model for optimal control is not recommended, for J(û^ε) does not converge to J(û) as ε → 0. Nonetheless, Theorem 3 suggests that we can use the leading term of c^ε in (46) as an approximation of the feedback law for the multiscale dynamics (39). The effect of the corrector estimate (46) is to enforce convergence of the derivative of the value function, which entails (weak) convergence of the optimal control and convergence of the optimal cost value (cf. [16] for an application in importance sampling).

Mean first passage time and value function.
As a specific example, we have solved the optimal control problem (38)-(39) for the mean first passage time, with G(x) = 1 and τ being the first passage time of the set {x ≤ 1.5}, and compared it with the solution of the homogenized system (42)-(43). The potential Φ_0 is chosen to be a tilted double-well potential,

Φ_0(x) = −5 (exp(−0.2(x + 2.5)²) + exp(−0.2(x − 2.5)²)) + 0.01x⁴ + 0.8x .

We have solved the associated boundary value problems using the finite-volume method presented in [40], using a mesh sufficiently fine for the error to be smaller than a certain threshold. The resulting value functions are presented in Figure 7. For comparison, we have also simulated the multiscale system driven by the optimal control for the homogenized system (44), with û_t = ĉ(x^{ε,u}_t) and ĉ = −√(2K) ∇V_0. This situation amounts to using the (wrong) homogenized control in the original multiscale dynamics. To illustrate the shortcoming of such an approach, we have calculated the control value by Markov-jump Monte Carlo (MJMC) simulations (see [40]). As shown in Figure 7, equation (47) does not capture the control value J(û^ε) as ε → 0; in order to reproduce the control value correctly, one must instead use the corrected feedback law as given in (46).

Figure 7 (legend): numerical solution of eq. (10); dashed line: numerical solution of eq. (18); symbols: MJMC sampling of (47) and MJMC sampling using (48). Throughout the simulations we have set β = 2

Linear-quadratic regulator
The third example is a multiscale linear quadratic regulator (LQR) problem that slightly falls out of the previous category. Specifically, we seek to minimize a time-averaged quadratic cost for the linear slow-fast system (49)-(50), where I_{n×n} denotes the n × n identity matrix. Plugging the ansatz for the value function into (51), it readily follows that S^ε solves the Riccati equation (52). Hence the optimal control for the linear quadratic regulator (49)-(50) is given by a linear feedback law. Under the above assumptions, the Riccati equation has a unique symmetric positive definite solution S^ε for all values of ε > 0. Moreover, it follows that η^ε = BB^T : S^ε, which is the principal eigenvalue of the linear eigenvalue equation (53) for the log-transformed eigenfunction ψ^ε = exp(−βV^ε). Notice that the eigenfunction ψ^ε corresponding to the principal eigenvalue −βη^ε ≤ 0 is strictly positive as a consequence of the Perron-Frobenius theorem, hence its log transformation is well defined.

Reduced Riccati equation
Given the above assumptions on the matrices A and B, the homogenized version of the linear eigenvalue equation (53) can be easily computed, since the cell problem has an explicit solution. We find the homogenized eigenvalue equation (54), with the homogenized coefficients Ā and B̄ given in (55) and an additive constant denoting the sum of the eigenvalues of the asymptotic covariance matrix of the fast degrees of freedom. The limiting eigenpair (η, ψ) is given in terms of S̄, where S̄ is the solution of the homogenized Riccati equation, in accordance with the solution of the algebraic Riccati equation of singularly perturbed LQR problems that has been discussed in the literature; see [22] and the references therein. It can be shown by perturbation analysis of the Riccati equation (52) using the Chow transformation (see, e.g., [34] and the references therein) that S̄ corresponds to the top left k × k block of the matrix S^ε up to O(ε²). Moreover, for any open and bounded subset Ω ⊂ R^n with smooth boundary, we have a uniform error bound on Ω for V^ε = −β^{−1} log ψ^ε with some constant 0 < C_1 < ∞. The latter implies an analogous estimate for the optimal controls with some constant 0 < C_2 < ∞, uniformly on [0, τ_Ω], where τ_Ω is the first exit time from Ω ⊂ R^n. For large values of β the probability that the process exits from Ω is exponentially small in β, i.e., the exit from the domain is a rare event (see, e.g., [60]), and hence we can employ the approximation τ_Ω ≈ ∞ for all practical purposes.

270-dimensional ISS model
We consider the 270-dimensional model of a component of the International Space Station (ISS) that is taken from the SLICOT benchmark library [13]. In this case, n = 270 and l = 3 in equation (49); the dimension of the slow subspace is set to k = 4, because the spectrum of dimensionless Hankel singular values of the full system shows a significant spectral gap at k = 4 when the slow variables are chosen as the observed variables; see [26] for details. The original system is Hamiltonian, but we pay no attention to the specific geometric structure of the equations here; cf. [29] for related work. The corresponding control task for the 4-dimensional reduced system thus is to minimize the reduced cost subject to the reduced dynamics with Ā and B̄ as in (55). Without loss of generality, we have ignored the additive constant Q in the cost term that appears in the homogenized eigenvalue equation (54). As before, the optimal control is given by the linear feedback law û_s = −B̄^T S̄ x_s.
Here S^ε denotes the solution of (52). To verify the convergence of the value function numerically, we have computed the eigenvalues of S̄ and S^ε, the matrix norms of S̄ − S^ε_11 and the norm of the matrix S^ε with the S^ε_11 block set to zero, called S^ε_r. Here S^ε_11 refers to the upper left k × k block of the matrix S^ε, in accordance with the notation in (50). Figure 8 shows this comparison for β = 0.01, which, given the parameters of the ISS model, amounts to the small noise regime; the plots clearly show that the convergence is of O(ε²). We refrain from testing the convergence η^ε → η of the corresponding nonlinear eigenvalue, since the 1/ε² singularity makes the evaluation of the trace term BB^T : S^ε numerically unstable for all interesting values of ε.

A Weak convergence under logarithmic transformations
As we have seen in Section 3.2, loss of backward stability of the model reduction approach is related to weak convergence of the multiscale controls. Weak convergence is mainly an issue for homogenization problems with periodic coefficients that do not involve any explicit time-dependence. For control problems on a finite time-horizon, a well-known result (e.g., see [48, Sec. 3] or [51, Sec. 20]) that is based on the maximum principle states that the convergence of the log-transformed parabolic equation is uniform on bounded time intervals under fairly weak assumptions.
In the indefinite time-horizon case considered in this paper, however, the lowest order approximation gives only weak convergence. In general, weak convergence is not preserved under nonlinear transformations. That is, given a weakly convergent sequence $\psi^\epsilon$ on $\mathbb{R}$ and a nonlinear continuous function $F\colon \mathbb{R} \to \mathbb{R}$, we have $\psi^\epsilon \rightharpoonup \psi \;\not\Rightarrow\; F(\psi^\epsilon) \rightharpoonup F(\psi)$.
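A standard textbook illustration of this failure (not taken from the paper) is the oscillating sequence $\sin(x/\epsilon)$ on $[0,1]$, which converges weakly to $0$, while its square converges weakly to $1/2$ rather than to $0^2 = 0$. The sketch below checks this numerically by pairing both sequences with a test function; the test function $1 + x$, the value of $\epsilon$, and the midpoint rule are arbitrary choices made for illustration.

```python
import math

def integrate(f, a, b, n):
    """Composite midpoint rule for the integral of f over [a, b] with n cells."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def pairings(eps, phi, n=200_000):
    """Pair sin(x/eps) and sin(x/eps)^2 with the test function phi on [0, 1]."""
    lin = integrate(lambda x: math.sin(x / eps) * phi(x), 0.0, 1.0, n)
    sq = integrate(lambda x: math.sin(x / eps) ** 2 * phi(x), 0.0, 1.0, n)
    return lin, sq

def phi(x):
    # Arbitrary smooth test function chosen for the illustration.
    return 1.0 + x

# For small eps the first pairing is close to 0 (the weak limit of sin(x/eps)),
# whereas the second is close to one half of the integral of phi.
LIN, SQ = pairings(1e-3, phi)
HALF_MASS = 0.5 * integrate(phi, 0.0, 1.0, 1000)
```

Here `LIN` approximates the pairing with the weak limit $0$, whereas `SQ` stays near `HALF_MASS`, so applying $F(x) = x^2$ does not commute with taking the weak limit.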
In our case, however, weak convergence follows from the properties of the logarithm and the fact that $\psi^\epsilon$ is bounded away from 0. Let $\psi^\epsilon$ be the solution of the elliptic boundary value problem (10) for $T \to \infty$ and recall that $\psi^\epsilon \to \psi$ strongly in $L^2(\bar{O})$ and $\psi^\epsilon \rightharpoonup \psi$ weakly in $H^1(\bar{O})$.
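Since $\log C > -\infty$ and $O \subset \mathbb{R}^n$ is bounded, it follows that $\log\psi^\epsilon \in L^2(\bar{O})$ and, by the same argument, $\log\psi \in L^2(\bar{O})$. Convergence now follows from the fact that $\log(x)$ is Lipschitz continuous on $[C, \infty)$ with a Lipschitz constant $L$, so that $\|\log\psi^\epsilon - \log\psi\|_{L^2(\bar{O})} \le L\,\|\psi^\epsilon - \psi\|_{L^2(\bar{O})} \to 0$. This implies strong convergence of the value function. For the optimal control, the above conditions give only weak convergence, which is implied by:

Lemma 7. We have $\log\psi^\epsilon \rightharpoonup \log\psi$ weakly in $H^1(\bar{O})$.

Proof. It suffices to show that $\nabla\log\psi^\epsilon \rightharpoonup \nabla\log\psi$ in $L^2(\bar{O})$. To this end recall that $\nabla\psi^\epsilon \rightharpoonup \nabla\psi$ in $L^2(\bar{O})$ since $\psi^\epsilon$ converges weakly in $H^1(\bar{O})$. Then, for all test functions $\varphi \in L^2(\bar{O})$, using again that $\psi^\epsilon \ge C > 0$ pointwise and uniformly in $\epsilon$, the pairing of $\nabla\log\psi^\epsilon - \nabla\log\psi$ with $\varphi$ can be bounded by a sum of two integrals. We look at the two integrals separately. Using that $0 < \psi \le 1$, it follows that the weighted test function in the first integral belongs to $L^2(\bar{O})$, so that the first integral vanishes in the limit because $\nabla\psi^\epsilon \rightharpoonup \nabla\psi$ weakly in $L^2(\bar{O})$. Now for the second integral: since the weakly convergent sequence $\psi^\epsilon$ and its limit $\psi$ are bounded in $H^1(\bar{O})$, we conclude that $\nabla\psi \in L^2(\bar{O})$, which together with the boundedness of $|\psi^\epsilon - \psi|$ implies that $(\psi^\epsilon - \psi)\nabla\psi \in L^2(\bar{O})$. So, by the Cauchy-Schwarz inequality, the second integral is controlled by the $L^2$ norm of $\psi^\epsilon - \psi$, which, together with the last Lemma, yields the assertion.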
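The splitting into two integrals can be made explicit in several ways. The following is one possible sketch and not necessarily the authors' original display; it assumes that the limit also satisfies $\psi \ge C$ and, for simplicity, tests against bounded vector fields $\varphi$, which are dense in $L^2(\bar{O})$.

```latex
\begin{align*}
\nabla\log\psi^\epsilon - \nabla\log\psi
  &= \frac{\nabla\psi^\epsilon}{\psi^\epsilon} - \frac{\nabla\psi}{\psi}
   = \frac{1}{\psi}\bigl(\nabla\psi^\epsilon - \nabla\psi\bigr)
   + \frac{\psi - \psi^\epsilon}{\psi^\epsilon\,\psi}\,\nabla\psi^\epsilon , \\
\Bigl|\int_{O} \frac{\psi - \psi^\epsilon}{\psi^\epsilon\,\psi}\,
      \nabla\psi^\epsilon\cdot\varphi \,dx\Bigr|
  &\le \frac{\|\varphi\|_{L^\infty}}{C^{2}}\,
       \|\psi^\epsilon - \psi\|_{L^2}\,\|\nabla\psi^\epsilon\|_{L^2} .
\end{align*}
```

Under these assumptions the first term, tested against $\varphi$, vanishes in the limit because $\varphi/\psi \in L^2(\bar{O})$ and $\nabla\psi^\epsilon \rightharpoonup \nabla\psi$; the second term vanishes because $\psi^\epsilon \to \psi$ strongly in $L^2(\bar{O})$ while $\|\nabla\psi^\epsilon\|_{L^2}$ stays bounded. Since $|\nabla\log\psi^\epsilon| \le |\nabla\psi^\epsilon|/C$ is bounded in $L^2(\bar{O})$, testing on a dense set of $\varphi$ is enough for weak convergence.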

B Ergodic control problem
We briefly discuss the ergodic control problem of Section 3.3 that is known to be related to an elliptic eigenvalue problem [30,9,19]. In principle, the equivalence of (53) and (51) directly follows from the logarithmic transformation. Here, we give an alternative derivation of the associated HJB equation, starting from the underlying Kolmogorov backward equation. To this end, let $G\colon \mathbb{R}^n \to [0, \infty)$ be a continuous bounded function and let $\varphi^\epsilon(z, t)$ be the associated exponential expectation. By the Feynman-Kac formula, $\varphi^\epsilon(z, t)$ is the solution of a linear parabolic equation. Here $L^\epsilon$ denotes the infinitesimal generator of our generic uncontrolled diffusion process. Setting $V^\epsilon = -\beta^{-1}\log\varphi^\epsilon$, we can rewrite Equation (58) as a nonlinear equation for $V^\epsilon$, where $\eta^\epsilon = \lim_{t\to\infty} V^\epsilon(z, t)/t$.
Plugging the separation ansatz into (60) with $L$, $\bar{G}$ defined in (20). Now suppose This indicates that the leading nonlinear eigenpair $(\eta_0, V_0)$ satisfies $\eta_0 = \limsup$ By ergodicity of the controlled process, the above expectation is independent of the distribution of the initial values; see [55] and the references therein.
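The effect of the logarithmic transformation can be checked by direct computation. The sketch below assumes a generator of the form $L = b\cdot\nabla + \beta^{-1}\Delta$ and a Feynman-Kac equation $\partial_t\varphi = L\varphi - \beta G\varphi$; both the normalisation of the noise and the sign conventions are assumptions made for illustration and may differ from those in equations (58) to (60).

```latex
\begin{align*}
\varphi &= e^{-\beta V}, \qquad
\partial_t\varphi = -\beta\varphi\,\partial_t V, \qquad
\nabla\varphi = -\beta\varphi\,\nabla V, \qquad
\Delta\varphi = -\beta\varphi\,\Delta V + \beta^{2}\varphi\,|\nabla V|^{2}, \\
L\varphi &= -\beta\varphi\,\bigl(LV - |\nabla V|^{2}\bigr), \\
\partial_t\varphi = L\varphi - \beta G\varphi
\quad&\Longleftrightarrow\quad
\partial_t V = LV - |\nabla V|^{2} + G .
\end{align*}
```

With a separation ansatz of the form $V(z,t) \approx \eta t + V_0(z)$ this gives $\eta = LV_0 - |\nabla V_0|^2 + G$. Under the same assumed conventions, $-|\nabla V_0|^2 = \min_u \{\sqrt{2}\,u\cdot\nabla V_0 + \tfrac12 |u|^2\}$, attained at $u = -\sqrt{2}\,\nabla V_0$, which exhibits the equation as an HJB equation for an ergodic control problem.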

C Entropy bounds for the cost function
In this section we study the cost function of the optimal control problem from the point of view of change of measure. Consider the uncontrolled SDE (61) and its controlled counterpart (62), where $u_s$ is any bounded measurable control that is adapted to $z_s$. Let $\mu$ and $\mu_u$ denote the path measures generated by (61) and (62), respectively. Then Girsanov's theorem [47] gives the density of $\mu_u$ with respect to $\mu$. Let a cost functional $J(u)$ be given as an expectation with respect to $\mu_u$, where $G$ satisfies Assumption 2 from Section 2.1. Here we use the notation $\mathbb{E}_{\mu_u}$ to indicate that the expectation is understood with respect to the probability measure $\mu_u$. Moreover, the dependence of $J$ on the initial value $z$ is omitted. Let $\hat{u} = \operatorname{argmin} J(u)$; then from Theorem 1 we know that $\hat{u}_s$ only depends on $z_s$. Let $\hat{\mu}$ denote the measure $\mu_{\hat{u}}$ for simplicity. Our purpose here is to estimate $|J(u) - J(\hat{u})|$ when $\|u - \hat{u}\|_{L^\infty}$ is small. We will make use of the following definition.
Definition 8. For two probability measures $\mu_u$, $\mu$ with $\mu_u \ll \mu$, the Kullback-Leibler divergence of $\mu_u$ relative to $\mu$ is defined as We also assume that Assumption 3 from Section 2.1 holds: there exists $\gamma > 0$, such that $\mathbb{E}_{\mu}(e^{\gamma\tau}) = C_1 < +\infty$. As in Section 2.1, we have that Here and in the following, the conditioning on the initial value is omitted. We also need two technical estimates in order to study the convergence of the cost functional. We start with the following estimate.
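Returning to Definition 8 and the change of measure via Girsanov's theorem, the relative entropy between a controlled and an uncontrolled path measure can be estimated by Monte Carlo. The sketch below is a toy example under loudly stated assumptions: a scalar SDE $dz = u\,dt + dW$ with unit noise, a fixed time horizon $T$ instead of a random stopping time $\tau$, and an Euler-Maruyama discretization. The function name and parameters are hypothetical. For a constant control $u$ the exact value in this toy setting is $u^2 T/2$.

```python
import math
import random

def relative_entropy_mc(control, z0, T, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of KL(mu_u || mu) for dz = u(z) dt + dW on [0, T].

    Paths are sampled from the controlled Euler-Maruyama chain, and the
    log-likelihood ratio against the uncontrolled chain is accumulated
    step by step (the discrete analogue of the Girsanov exponent).
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        z = z0
        log_ratio = 0.0
        for _ in range(n_steps):
            u = control(z)
            dz = u * dt + sqrt_dt * rng.gauss(0.0, 1.0)
            # log of N(dz; u*dt, dt) / N(dz; 0, dt) = u*dz - u^2*dt/2
            log_ratio += u * dz - 0.5 * u * u * dt
            z += dz
        total += log_ratio
    return total / n_paths

# Constant control u = 0.5 on [0, 1]: the exact relative entropy is 0.5 * u^2 * T.
KL_ESTIMATE = relative_entropy_mc(lambda z: 0.5, 0.0, 1.0, 20, 20000)
KL_EXACT = 0.5 * 0.5 ** 2 * 1.0
```

The estimate is an average of the log-density ratio under the controlled measure, which is the defining expectation of the Kullback-Leibler divergence; a state-dependent feedback law can be passed in place of the constant control.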

Figure 1: Bistable potential (shown in red) with superimposed small-scale oscillations of period $\epsilon$ (in blue).

Figure 3: Overdamped Langevin dynamics. Cost function for different values of $\epsilon$. Different colors correspond to different initial values $x_0$. Lines marked with "×" are the value function $V^\epsilon$ computed from the exponential expectation using Monte-Carlo. Lines marked with " " are the cost function $J = J(\hat{u})$, computed from the homogenized control with the original dynamics. We observe that the two values approach $V_0(x_0)$ as $\epsilon \to 0$ (horizontal line).

Figure 4: Controlled diffusion in a multiscale potential: minimize the transition time from the red to the blue region.

Figure 5: Value function and resulting optimal control (lower panel).

Figure 6: Strong $L^2$ convergence of value function and optimal control.

Figure 8: Hankel singular values and quadratic convergence of the matrix $S^\epsilon$ in terms of the $k$ dominant eigenvalues (upper left panel), the 1-1 matrix block (upper right panel) and the residual matrix $S^\epsilon_r$ (lower left panel); for smaller values of $\epsilon$ the numerical solution of the Riccati equation is dominated by roundoff errors, hence the results are not shown. The lower right panel shows the first 40 Hankel singular values (out of 270) when the slow variables are observed; the Hankel singular values are independent of $\epsilon$.