A SPARSE MARKOV CHAIN APPROXIMATION OF LQ-TYPE STOCHASTIC CONTROL PROBLEMS

Abstract. We propose a novel Galerkin discretization scheme for stochastic optimal control problems on an indefinite time horizon. The control problems are linear-quadratic in the controls, but possibly nonlinear in the state variables, and the discretization is based on the fact that problems of this kind admit a dual formulation in terms of linear boundary value problems. We show that the discretized linear problem is dual to a Markov decision problem, prove an L² error bound for the general scheme and discuss the sparse discretization using a basis of so-called committor functions as a special case; the latter is particularly suited when the dynamics are metastable, e.g., when controlling biomolecular systems. We illustrate the method with several numerical examples, one being the optimal control of Alanine dipeptide to its helical conformation.

1. Introduction. A large body of literature, going back to the seminal work by Kramers [25] in the late 1930s, is concerned with the question: How well can a continuous diffusion in a multi-well energy landscape be approximated by a Markov jump process (MJP) in the regime of low temperatures? Qualitatively, the approximation should be good if the system under consideration is metastable, in which case the process stays in the neighbourhood of the potential energy minima for a long time and occasionally makes rapid transitions (jumps) between the wells. These metastable regions then become the states of the MJP, and the jump rates are determined by the frequency of the transitions (see, e.g., [33,36]).
In this article we consider the approximation of optimal control problems with nonlinear diffusive dynamics by discrete Markov decision problems. This situation is more complicated than in Kramers' case, for one has to approximate the dynamics as well as the corresponding cost functional and the resulting control forces. The difficulty comes from the fact that the dynamics enter as a constraint in the optimization of the cost functional, which makes the problem nonlinear, because the controls become a priori unknown functions of the state variables. Specifically, we consider reversible diffusions on an unbounded domain, with a cost functional that is linear-quadratic in the controls, but possibly nonlinear in the state variables, and that is defined up until some random stopping time. The control problem admits a dual formulation in the form of a linear boundary value problem that is amenable to a discretization by standard means (cf. [39,50]). Problems of this kind appear in various applications, including molecular dynamics [43], materials science [44] or quantum computing [34], to mention just a few. We propose a Markov chain approximation (MCA) that is based on a Galerkin discretization of the underlying dynamic programming equations, with basis functions that cover the metastable sets of the dynamics. The discretized problem can be interpreted as a Markov decision problem where the dynamics is given by a continuous-time Markov process on the metastable sets (so-called core sets). As these core sets are not assumed to cover the state space, the method is sparse and meshless and hence can be applied to large-scale problems.
MCAs are a versatile tool to approximate stochastic control problems. Going back to Kushner [26], the idea of MCA is to approximate the spatially discretized dynamics by a Markov chain and to reformulate the underlying continuous control problem as a control problem for the approximating chain. For a spatial discretization with a uniform grid, the MCA amounts to a finite-difference discretization of the dynamic programming equation [28,30]. Recent advances in MCA include collocation-based schemes with radial basis functions [24], finite volume approximations [48], switched systems and jump diffusions [42], or multi-player differential games [41].
The MCA that we propose here for the indefinite time-horizon case is based on a meshless Galerkin discretization of the dual problem and works away from the large deviations regime of vanishing temperature. This discretization is standard in that it shares features with related Galerkin methods for linear elliptic equations (see, e.g., [1,49]); however, it is novel in that it preserves the particular structure of the reversible dynamics and the duality of the underlying Markov decision problem when the basis functions form a partition of unity. Related problems with indefinite time horizon using grid-based techniques have been studied in [22,45,27]; cf. also [4] for a (non-Markovian) finite element method. For meshless discretizations of the parabolic dynamic programming equation of finite time-horizon problems we refer to, e.g., [9,24]; related ideas for the first-order dynamic programming equations of deterministic control have been used in [3,8].
A simple paradigm. As an introductory example consider the one-dimensional diffusion process (X_t)_{t≥0} satisfying the Itô stochastic differential equation

dX_t = (u_t − V′(X_t)) dt + √(2ε) dB_t ,    (1)

where B_t is standard Brownian motion, ε > 0 is the noise intensity, called temperature in the following, V is the bistable potential energy shown in Figure 1, and u is a bounded measurable function, the control. Let us suppose that the control task is to force the particle in the left well to reach the right well in minimum time τ. When u = 0 and the noise is low, the average transition time E[τ] is exponentially large in the energy barrier ∆V between the wells, and we can decrease the barrier by tilting the potential according to V(x) → V(x) − ux. A controller will then seek to minimize the average transition time by tilting the potential without applying too much force, which leads to a quadratic cost functional of the form [39]

J(u) = E[ γτ + (1/4) ∫_0^τ |u_t|² dt ]

Figure 1. Two typical realizations of the bistable system (1), with and without tilting (left panel). The corresponding potential energies are shown in the right panel.
that must be minimized over a set of admissible (e.g. adapted) control strategies, subject to the stochastic dynamics (1). (Here γ > 0 is an adjustable parameter.) In this paper we deal with the question of how to solve optimal control problems of the above form beyond simple one-dimensional examples. The typical application that we have in mind is molecular dynamics, which is both high-dimensional and displays vastly different time scales. This defines the basic requirements of the numerical method: it must handle problems with large state space dimension and it must be able to capture the relevant processes of the dynamics, typically the slowest degrees of freedom in the system. For moderate controls, and if the temperature is small compared to the energy barrier, the dynamics in the above example basically consists of rare jumps between the potential wells, with the jump rate being controlled by u. An efficient discretization would be one that resolves only the jumps between the wells by a 2-state Markov jump process with adjustable jump rates, according to the value of the control (cf. [12,36]). It is known (see [16]) that control problems of the above form can be transformed into a dual linear boundary value problem that can be approximated by an MJP on the metastable sets. The discretized linear problem turns out to be dual to a Markov decision problem and thus represents the natural Markovian discretization of the original stochastic control problem. The discretization is meshless, in that the number of states of the Markov model does not scale exponentially with the dimension of the continuous state space, hence the method avoids the curse of dimensionality of most grid-based schemes.
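The effect of tilting in the bistable paradigm can be checked with a few lines of code. The following sketch estimates the mean transition time with and without a constant tilt by Euler-Maruyama simulation of (1); the double-well V(x) = (x² − 1)², the tilt strength and all numerical parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def V_prime(x):
    # Gradient of the assumed bistable potential V(x) = (x^2 - 1)^2
    return 4.0 * x * (x**2 - 1.0)

def mean_hitting_time(u, eps=0.5, n_traj=200, dt=0.01, t_max=200.0, seed=0):
    """Euler-Maruyama estimate of E[tau], tau = first time X >= 0.9,
    for dX = (u - V'(X)) dt + sqrt(2*eps) dB, started in the left well."""
    rng = np.random.default_rng(seed)
    x = np.full(n_traj, -1.0)
    tau = np.full(n_traj, t_max)          # cap unfinished trajectories
    alive = np.ones(n_traj, dtype=bool)
    for k in range(int(t_max / dt)):
        if not alive.any():
            break
        dB = rng.standard_normal(alive.sum())
        x[alive] += (u - V_prime(x[alive])) * dt + np.sqrt(2*eps*dt) * dB
        hit = alive.copy()
        hit[alive] = x[alive] >= 0.9      # reached the right well
        tau[hit] = (k + 1) * dt
        alive &= ~hit
    return tau.mean()

t0 = mean_hitting_time(u=0.0)   # uncontrolled dynamics
t1 = mean_hitting_time(u=1.0)   # constant tilt V(x) -> V(x) - x
print(t0, t1)
```

As expected, the tilted dynamics reaches the right well substantially faster than the uncontrolled one, at the price of the quadratic control cost.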
Organization of the article. The rest of the paper is organised as follows: In section 2 we introduce the class of optimal control problems studied and state the duality between optimal control and sampling for both continuous SDEs and MJPs. In section 3, the Galerkin projection method is introduced, and some results about the approximation error are discussed. We also give a stochastic interpretation of the discretized linear equation in terms of Elber's milestoning process [14]. Finally, we construct sampling estimators. Section 3 is the core of the paper and contains new results, including the meshless Galerkin algorithm and an estimate of the discretization error in L 2 . In section 4, we discuss numerical examples.
1.1. Elementary notation and assumptions. We implement the following notation and standing assumptions that will be used throughout the paper.
Dynamics. Let S = R^d and consider the potential energy function V : S → R, which we assume to be twice continuously differentiable and bounded from below. Further assume that V(x) grows polynomially at infinity like |x|^{2k} for some positive integer k. We consider the process X_t ∈ S solving

dX_t^u = (u_t − ∇V(X_t^u)) dt + √(2ε) dB_t ,    (2)

where B_t ∈ R^d is d-dimensional Brownian motion under a probability measure P, and u : [0, ∞) → U ⊂ R^d is a time-dependent, measurable and bounded function.
Reversibility and invariant measure. For test functions ϕ : S → R that are twice continuously differentiable, the infinitesimal generator of the uncontrolled process X_t = X_t^0 is defined as the second-order differential operator

Lϕ = ε∆ϕ − ∇V · ∇ϕ .
Define dµ(x) = exp(−ε^{−1}V(x)) dx to be the Boltzmann measure at temperature ε > 0. Without loss of generality, we assume that µ is normalized, so that µ(S) = 1. For the subsequent analysis it will be convenient to think of L as an operator acting on a suitable subspace of L²(S, µ), the weighted Hilbert space equipped with the scalar product

⟨v, w⟩_µ = ∫_S v(x) w(x) dµ(x) .

It can be readily seen that L is symmetric with respect to the weighted scalar product, ⟨Lv, w⟩_µ = ⟨v, Lw⟩_µ, which implies that X_t is reversible with respect to the Boltzmann measure µ. Moreover, by the above assumptions on the potential energy function, µ is the unique invariant measure of the process X_t and satisfies the ergodic theorem

lim_{T→∞} (1/T) ∫_0^T ψ(X_t) dt = ∫_S ψ dµ   almost surely,

for all test functions ψ ∈ L²(S, µ); see [29] for details.
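The symmetry ⟨Lv, w⟩_µ = ⟨v, Lw⟩_µ can be checked numerically in one dimension by quadrature. The sketch below assumes the double-well potential V(x) = (x² − 1)² and the test functions v(x) = x, w(x) = x³, chosen so that both sides are nonzero; these choices are for illustration only.

```python
import numpy as np

# 1D quadrature check of <Lv,w>_mu = <v,Lw>_mu for L = eps*d2/dx2 - V'*d/dx,
# with the illustrative double-well V(x) = (x^2 - 1)^2.
eps = 0.5
xs = np.linspace(-4.0, 4.0, 40001)
Vp = 4.0 * xs * (xs**2 - 1.0)
mu = np.exp(-(xs**2 - 1.0)**2 / eps)       # unnormalized Boltzmann weight

def quad(y):
    # trapezoidal rule on the grid xs
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(xs)))

Lv = eps * 0.0 - Vp * 1.0                  # v = x:   v'' = 0, v' = 1
Lw = eps * 6.0 * xs - Vp * 3.0 * xs**2     # w = x^3: w'' = 6x, w' = 3x^2
lhs = quad(Lv * xs**3 * mu)                # <Lv, w>_mu
rhs = quad(xs * Lw * mu)                   # <v, Lw>_mu
print(lhs, rhs)
```

Both sides agree up to quadrature error, reflecting the reversibility of the uncontrolled dynamics with respect to µ.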
Quadratic cost criterion. We now introduce the cost criterion that the controller choosing u in (2) seeks to minimize. To this end let A ⊂ S be a closed bounded subset of positive measure µ(A) > 0 with smooth (at least C³) boundary ∂A, and call τ_A < ∞ the random stopping time

τ_A = inf{ t ≥ 0 : X_t^u ∈ A } .

We define the cost functional

J(u) = E[ ∫_0^{τ_A} ( f(X_t^u) + (1/4) |u_t|² ) dt ] ,    (3)

where f : S → R, called the running cost, is any bounded nonnegative function with bounded first derivative; the factor 1/4 in the penalization term is merely conventional. Cost functionals of this form are called indefinite time horizon costs, because the terminal time τ_A is random. We will sometimes need the conditioned variant of the above cost functional:

J(u; x) = E_x[ ∫_0^{τ_A} ( f(X_t^u) + (1/4) |u_t|² ) dt ] .

Here E_x[·] = E[· | X_0^u = x] is the conditional expectation over all realizations of the process X^u starting at X_0^u = x. We use J to denote both unconditional and conditional cost functionals; it should always be clear from the context which one is meant.
Admissible control strategies. We call a control strategy u = (u t ) t≥0 admissible if it is adapted to the filtration generated by B t , i.e., if u t depends only on the history of the Brownian motion up to time t, and if the equation for X u t has a unique strong solution. The set of admissible strategies is denoted by A.
Even though u t may depend on the entire past history of the process up to time t, it turns out that optimal strategies are Markovian, i.e., they depend only on the current state of the system at time t. In our case, in which the costs are accumulated up to a random stopping time τ A , the optimal strategies are of the form u t = α(X u t ) for some function α : S → R d . Hence the optimal controls are time-homogeneous feedback policies, depending only on the current state X u t , but not on t.

2. Optimal control and logarithmic transformation. In this section we establish a connection between controlled diffusions and certain path sampling problems; the latter are associated with a linear boundary value partial differential equation (PDE) that can be discretized by standard numerical techniques for PDEs or Monte Carlo. The duality between optimal control and path sampling goes back to Wendell Fleming and co-workers (see, e.g., [15]) and is based on a logarithmic transformation of the value function (see also [16, Sec. VI] and the references therein).

2.1. Duality between control and path sampling for diffusions. Our derivation of the duality between path sampling and optimal control will be based on the Hamilton-Jacobi-Bellman equations of optimal control. To this end, we recall the dynamic programming principle for optimal control problems of the form (2)-(3) that we adapt from [16, Secs. VI.3-VI.5] and that we state without proof: the value function W(x) = min_{u∈A} J(u; x) solves

min_{c∈R^d} { (L + c·∇) W(x) + f(x) + (1/4)|c|² } = 0 ,  x ∈ S \ A ,   W(x) = 0 ,  x ∈ ∂A ,    (6)

where the minimizer u* = argmin J(u) is unique and given by the feedback law

u*_t = −2∇W(X*_t) .    (7)

Before we proceed with the derivation of the dual sampling problem, we shall briefly discuss some of the consequences of the dynamic programming approach. Equation (6) is the Hamilton-Jacobi-Bellman (HJB) equation, also called the dynamic programming equation, associated with the above optimal control task. The function W(x) is called the value function or optimal cost-to-go. Using the fact that the optimal control equals minus two times the gradient of the value function, the optimally controlled process X*_t solves the SDE

dX*_t = −∇U(X*_t) dt + √(2ε) dB_t    (8)

with the new potential U(x) = V(x) + 2W(x). Note that X*_t is reversible with respect to a tilted Boltzmann distribution having the density ρ* = exp(−U/ε). The reversibility follows from the fact that the value function does not depend on t, which would not be the case if the terminal time τ_A were a deterministic stopping time rather than a first exit time.

Logarithmic transformation and Feynman-Kac formula (part I).
The approach that is pursued in this article is to discretize the HJB equation by first removing the nonlinearity by a logarithmic transformation of the value function.
It follows by the chain rule that, for φ = exp(−W/ε),

Lφ = −ε^{−1} φ ( LW − |∇W|² ) ,    (9)

which, together with the relation

W = −ε log φ ,    (10)

implies that (6) is equivalent to the linear boundary value problem

(L − ε^{−1} f) φ = 0 ,  x ∈ S \ A ,   φ = 1 ,  x ∈ ∂A .    (11)

By the above assumptions and the strong maximum principle for elliptic PDEs it follows that (11) has a classical solution φ ∈ C²(S \ A) ∩ C(∂A) that is uniformly bounded away from zero. The latter, together with (9)-(10), implies existence and uniqueness of classical solutions of the dynamic programming equation (6) and hence smoothness of the value function. Now, by the Feynman-Kac theorem [32, Thm. 8.2.1], the linear boundary value problem has an interpretation in terms of a sampling problem. The solution of (11) can be expressed as the conditional expectation

φ(x) = E_x[ exp( −ε^{−1} ∫_0^{τ_A} f(X_t) dt ) ]    (12)

over all realizations of the following SDE on S:

dX_t = −∇V(X_t) dt + √(2ε) dB_t .    (13)

2.2. Duality between control and path sampling for jump processes. In the last section, we have established a connection between an optimal control problem and the sampling of a continuous path observable φ(x). In this section, we will repeat the same construction for an MJP, however in reverse order: starting from a path observable for an MJP, we derive the dual optimal control problem using a logarithmic transformation.
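The representation (12)-(13) suggests a direct Monte Carlo estimator for φ, and hence for the value function via (10). A minimal sketch for a one-dimensional bistable example follows; the double-well potential, the target set A = {x ≤ −0.9} and all parameters are illustrative assumptions.

```python
import numpy as np

# Monte Carlo estimator of (12) for an illustrative 1D bistable example.
eps, sigma, dt, t_max = 0.5, 0.1, 0.01, 400.0
V_prime = lambda x: 4.0 * x * (x**2 - 1.0)   # V(x) = (x^2 - 1)^2 (assumed)

def phi_mc(x0, n_traj=300, seed=1):
    """phi(x0) ~ mean of exp(-sigma*tau_A/eps) over realizations of (13),
    with constant running cost f = sigma and A = {x <= -0.9}."""
    rng = np.random.default_rng(seed)
    x = np.full(n_traj, float(x0))
    tau = np.full(n_traj, t_max)           # unfinished paths capped at t_max
    alive = np.ones(n_traj, dtype=bool)
    for k in range(int(t_max / dt)):
        if not alive.any():
            break
        dB = rng.standard_normal(alive.sum())
        x[alive] += -V_prime(x[alive]) * dt + np.sqrt(2*eps*dt) * dB
        hit = alive.copy()
        hit[alive] = x[alive] <= -0.9      # reached the target set A
        tau[hit] = (k + 1) * dt
        alive &= ~hit
    return np.exp(-sigma * tau / eps).mean()

phi = phi_mc(1.0)              # estimate of phi at x = 1 (the right well)
W = -eps * np.log(phi)         # value function via the log transform (10)
print(phi, W)
```

Since f ≥ 0, the estimator always lies in (0, 1), so the log transform is well defined; the slow convergence of such plain Monte Carlo estimators in the metastable regime is precisely what motivates the sparse discretization of Section 3.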
Let (X̂_t)_{t≥0} be an MJP on the discrete state space Ŝ = {1, . . . , n} with infinitesimal generator G ∈ R^{n×n}. (For simplicity, we assume that Ŝ is finite.) The entries of the generator matrix G satisfy

G_ij ≥ 0 for i ≠ j ,   G_ii = −∑_{j≠i} G_ij ,

where the off-diagonal entries of G are the jump rates between the states i and j.
Logarithmic transformation and Feynman-Kac formula (part II). In accordance with the previous subsection let f̂ : Ŝ → R be nonnegative and define the stopping time τ̂_A = inf{ t > 0 : X̂_t ∈ Â } to be the first hitting time of a subset Â ⊂ Ŝ. As before we introduce the function

φ̂(i) = E[ exp( −∫_0^{τ̂_A} f̂(X̂_t) dt ) | X̂_0 = i ] ,

being the conditional expectation over the realizations of X̂_t starting at X̂_0 = i. We have the following lemma that is the exact analogue of the Feynman-Kac formula for diffusions for the case of an MJP (see [17]).

Lemma 2.2. The function φ̂(i) solves the linear boundary value problem

∑_j G_ij φ̂(j) = f̂(i) φ̂(i) ,  i ∈ Ŝ \ Â ,   φ̂(i) = 1 ,  i ∈ Â .
Now, in one-to-one correspondence with the log transformation procedure in the diffusion case, the function Ŵ = −log φ̂ can be interpreted as the value function of an optimal control problem for the MJP (X̂_t)_{t≥0}. The derivation of the dual optimal control problem goes back to [40], and we repeat it here in condensed form for the reader's convenience (see also [16, Sec. VI.9]): First of all note that Ŵ satisfies the equation

∑_j G_ij exp( Ŵ(i) − Ŵ(j) ) = f̂(i) ,  i ∈ Ŝ \ Â ,    (14)

and define a new generator matrix by

G^v_ij = G_ij v(j)/v(i) ,  i ≠ j ,   G^v_ii = −∑_{j≠i} G^v_ij ,    (15)

with v(i) > 0 for all i ∈ Ŝ. The exponential term in the above equation for Ŵ can be recast by means of the identity min_{y∈R} { e^{−y} + ay } = a − a log a for a > 0, which gives rise to the running cost

k_v(i) = ∑_{j≠i} ( G^v_ij log( G^v_ij / G_ij ) − G^v_ij + G_ij ) .    (16)

As a consequence, (14) is equivalent (i.e. dual) to

min_{v>0} { ∑_j G^v_ij Ŵ(j) + f̂(i) + k_v(i) } = 0 ,    (17)

which is the dynamic programming equation of a Markov decision problem, i.e. an optimal control problem for an MJP (e.g. see [16, Sec. VI.9]): Minimize

Ĵ(v) = E[ ∫_0^{τ̂_A} ( f̂(X̂^v_t) + k_v(X̂^v_t) ) dt ]    (18)

over all controls v > 0 and subject to the constraint that the process (X̂^v_t)_{t≥0} is generated by G^v. It readily follows from the derivation of (17) that the minimizer exists and is given by v*(i) = φ̂(i). The next lemma records some important properties of the controlled Markov jump process with generator G^v and the corresponding cost functional (18).
Lemma 2.3. Assume that G is reversible with respect to its unique stationary distribution π, and let G^v be defined by (15) with v(i) > 0 for all i ∈ Ŝ. Then the following holds: (i) The distribution

π_v(i) = Z_v^{−1} v(i)² π(i) ,

with Z_v an appropriate normalization constant, is the unique probability distribution such that G^v is reversible with stationary distribution π_v. (ii) Let P̂ denote the probability measure on the space of trajectories generated by X̂_t with initial condition X̂_0 = i, and let Q̂ be the corresponding probability measure generated by X̂^v_t with the same initial condition X̂^v_0 = i. Then Q̂ is absolutely continuous with respect to P̂ and the expected value of the running cost k_v is the Kullback-Leibler (KL) divergence between Q̂ and P̂, i.e.,

E^Q̂[ ∫_0^{τ̂_A} k_v(X̂^v_t) dt ] = KL(Q̂, P̂) .

Proof. We first show that π_v has the proposed form: detailed balance π_v(i) G^v_ij = π_v(j) G^v_ji can only be true if the quantity

π_v(i) G^v_ij = Z_v^{−1} π(i) G_ij v(i) v(j)

is symmetric in i and j, which follows from the reversibility of G with respect to π. The constant Z_v is uniquely determined by the requirement that π_v be normalized. Finally, from reversibility it follows directly that π_v is also a stationary distribution of G^v.
To show (ii), let P̂, Q̂ denote the path measures of the uncontrolled and the controlled Markov processes generated by G and G^v, respectively. By the assumption that v(i) > 0 for all i ∈ Ŝ, it follows that P̂(·) = 0 if and only if Q̂(·) = 0, which implies that P̂ and Q̂ are mutually absolutely continuous. Now, let F_t denote the filtration generated by X̂^v_s up to time s = t. Following [10, Sec. 3.1.4], the Radon-Nikodym derivative between Q̂ and P̂ is given by the Girsanov formula for jump processes (cf. [31, Prop. 3]). Moreover, we denote by X̂^v_{s−} the left-sided limit of the càdlàg process X̂^v_s and by N^j_s the process that counts how often X̂^v jumps to state j in the interval [0, s], with the convention that N^j_0 = 1 if X̂^v_0 = j. By the martingale representation theorem, using the fact that τ̂_A is almost surely finite, the logarithm of the Radon-Nikodym derivative decomposes into stopped local martingales M^j_{τ̂_A} with respect to Q̂ plus a drift term. (Note that we have taken advantage of the fact that log δ_j(j) = 0, so that the integrand on the right hand side vanishes when X̂^v_s = j.) Hence, taking expectations with respect to Q̂ and using the definition of the running cost, being equivalent to (16), yields the assertion. This completes the proof.
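The whole discrete construction can be sketched numerically: solve the boundary value problem of Lemma 2.2 for φ̂, set Ŵ = −log φ̂, and build the optimally controlled generator via the transform (15) with v = φ̂. The 3-state generator, the running costs and the target set below are illustrative assumptions.

```python
import numpy as np

# Illustrative 3-state MJP: target set Ahat = {2} (the last state).
G = np.array([[-1.0, 0.8, 0.2],
              [0.5, -1.0, 0.5],
              [0.1, 0.9, -1.0]])
fhat = np.array([0.3, 0.2, 0.0])       # arbitrary nonnegative running cost
free, bdry = [0, 1], [2]

# Lemma 2.2: solve (G - diag(fhat)) phihat = 0 on S\A with phihat = 1 on A
Gf = G - np.diag(fhat)
phihat = np.ones(3)
phihat[free] = np.linalg.solve(Gf[np.ix_(free, free)],
                               -Gf[np.ix_(free, bdry)] @ np.ones(1))
What = -np.log(phihat)                 # discrete value function

# Doob-type transform (15) with the optimal control v* = phihat
v = phihat
Gv = G * v[None, :] / v[:, None]       # off-diagonal rates G_ij * v(j)/v(i)
np.fill_diagonal(Gv, 0.0)
np.fill_diagonal(Gv, -Gv.sum(axis=1))  # generator rows must sum to zero
print(phihat, What)
```

The solve yields a strictly positive φ̂, so the log transform is well defined, and the reweighted matrix G^v is again a generator, as asserted in the text.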

Remark 1.
To reveal further similarities between (2)-(3) and the corresponding Markov decision problem, we emphasize that the quadratic penalization term in (3) equals the KL divergence between the reference measure P of the uncontrolled diffusion (13) and the probability measure Q of the controlled process (2), as can be shown using Girsanov's theorem [32, Thm. 8.6.8]. It holds that (cf. [20,18]):

KL(Q, P) = (1/(4ε)) E^Q[ ∫_0^{τ_A} |u_t|² dt ] .

3. Discretization: Galerkin projection point of view. In this section we will develop a discretization for optimal control problems of the type discussed in Section 2 using the method of logarithmic transformations. The discretization will approximate the continuous control problem with a control problem for a Markov jump process on a finite state space. Our strategy is outlined in Figure 2.
In the first part of this section, we will develop the Galerkin projection for general subspaces and obtain some control of the discretization error. To refine this control, we specify the subspace D we project onto.

Figure 2. Discretization of continuous control problems via a logarithmic transform.

As the state space is unbounded and
possibly high-dimensional, a grid-based discretization is prohibitive. Here we suggest a meshless discretization based on an incomplete partition of state space into so-called core sets, which are the metastable regions of the uncontrolled dynamics. We will prove an error bound which gives us detailed control over the discretization error, even if very few basis functions are used. We should mention that clearly other choices are possible, such as radial basis functions [49] or moving least-squares [1], but for metastable systems as in molecular dynamics or chemical reaction kinetics, using core sets and the associated basis of committor functions is beneficial.
In the second part of this section, we will develop the stochastic interpretation of the resulting matrix equation as the backward Kolmogorov equation of a MJP, which enables us to identify the discrete control problem for the MJP, as it was developed in Section 2. We will study the resulting discrete control problem and make a connection to Transition Path Theory [47] and core set Markov state models (MSM) [38].
3.1. Galerkin projection of the Dirichlet problem. As discussed above, we consider the boundary value problem

(L − ε^{−1} f) φ = 0 ,  x ∈ S \ A ,   φ = 1 ,  x ∈ ∂A ,    (20)

with L and f as given above. We declare that φ|_A = 1, so that the domain of φ is S. Following standard references (e.g. [6]) we construct a Galerkin projection of (20). For this purpose, we introduce the L²-based Sobolev space H¹ with norm ∥φ∥²_{H¹} = ∥∇φ∥²_µ + ∥φ∥²_µ and the Hilbert spaces V = {ψ ∈ L²(S, µ) : ∥ψ∥_{H¹} < ∞} and V_0 = {ψ ∈ V : ψ|_{∂A} = 0}. We further define the symmetric and positive bilinear form

B(φ, ψ) = ε ⟨∇φ, ∇ψ⟩_µ + ε^{−1} ⟨fφ, ψ⟩_µ .

Now if φ is a solution of (20), then it also solves the weak problem

B(φ, ψ) = 0  for all ψ ∈ V_0 .    (21)

A Galerkin solution φ̂ is now any function satisfying

B(φ̂, ψ̂) = 0  for all ψ̂ ∈ D_0 ,    (22)

with D_0 being a suitable finite-dimensional subspace of V_0. Specifically, we choose basis functions χ_1, . . . , χ_{n+1} with the following properties: (S1) The functions χ_i : S → R are in V. (S2) The χ_i form a partition of unity, that is, ∑_{i=1}^{n+1} χ_i(x) = 1 for all x ∈ S. (S3) χ_{n+1}|_A = 1 and χ_i|_A = 0 for i = 1, . . . , n. All elements of D_0 := lin{χ_1, . . . , χ_n} will satisfy homogeneous Dirichlet boundary conditions in (20), and we will sometimes write D := χ_{n+1} ⊕ D_0 and think of the Galerkin solution φ̂ as an element of D. Now define the matrices

K_ij = π_i^{−1} ⟨χ_i, Lχ_j⟩_µ ,   F_ij = π_i^{−1} ⟨χ_i, f χ_j⟩_µ ,   π_i = ⟨χ_i, 1⟩_µ ;

then (22) becomes a matrix equation for the unknown coefficients φ̂_i:

∑_{j=1}^{n+1} ( K_ij − ε^{−1} F_ij ) φ̂_j = 0 ,  i = 1, . . . , n ,   φ̂_{n+1} = 1 ,    (23)

which is the discretization of (20).
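For intuition, the assembly of the weak problem (22) and the resulting solve can be sketched in one dimension with finite-element hat functions, the special case used for validation in Section 4.1. The potential, the running cost, the target set and the grid below are illustrative assumptions; the bilinear form is approximated by midpoint quadrature on each element.

```python
import numpy as np

# 1D Galerkin sketch: B(phi,psi) = eps*<phi',psi'>_mu + (sigma/eps)*<phi,psi>_mu
# with phi = 1 on A = {x <= -0.9}, hat-function basis on a uniform grid.
eps, sigma = 0.5, 0.1
V = lambda x: (x**2 - 1.0)**2              # assumed double-well potential
xs = np.linspace(-2.0, 2.0, 401)
h = xs[1] - xs[0]
n = len(xs)

Bmat = np.zeros((n, n))
for k in range(n - 1):
    xm = 0.5 * (xs[k] + xs[k+1])
    w = np.exp(-V(xm) / eps) * h           # Boltzmann weight of element k
    # hat-function gradients on element k are -1/h and +1/h
    Kloc = eps * (w / h**2) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    # hat-function values at the midpoint are 1/2 each
    Floc = (sigma / eps) * w * 0.25 * np.ones((2, 2))
    Bmat[k:k+2, k:k+2] += Kloc + Floc

A_nodes = xs <= -0.9                       # nodes carrying the boundary data
free = ~A_nodes
phi = np.ones(n)
phi[free] = np.linalg.solve(Bmat[np.ix_(free, free)],
                            -Bmat[np.ix_(free, A_nodes)] @ np.ones(A_nodes.sum()))
W = -eps * np.log(phi)                     # discrete value function
print(phi.min(), phi.max())
```

By the discrete maximum principle the computed φ̂ lies in (0, 1], so the logarithmic transformation is well defined on the Galerkin solution.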
In order to control the discretization error of the Galerkin method, we choose a norm ∥·∥ on V and introduce two error measures: 1. The Galerkin error ε = ∥φ − φ̂∥, i.e. the difference between the original and the Galerkin solution measured in ∥·∥. 2. The best-approximation error ε_0 = inf_{ψ̂∈D} ∥φ − ψ̂∥, i.e. the minimal difference between the solution φ and any element ψ̂ ∈ D. In order to obtain full control over the discretization error, we need bounds on ε, and we will get them by first obtaining a bound on the performance p := ε/ε_0 and then a bound on ε_0. The latter will depend on the choice of subspace D. For the former, standard estimates assume the following ∥·∥-dependent properties of B: (i) Boundedness: B(φ, ψ) ≤ α_1 ∥φ∥ ∥ψ∥ for some α_1 > 0. (ii) Ellipticity: B(φ, φ) ≥ α_2 ∥φ∥² for all φ ∈ V, for some α_2 > 0. If both (i) and (ii) hold, Céa's lemma states that p ≤ α_1/α_2, see e.g. [6]. For the energy norm ∥φ∥²_B := B(φ, φ) we have α_1 = α_2 = 1 and therefore p = 1; thus the Galerkin solution φ̂ is the best approximation to φ in the energy norm.
Performance bound. The next two statements give a bound on p if errors are measured in the L²-norm. In this case, B(·, ·) is still elliptic but possibly unbounded. Later in this section, we will specify the bound on ε_0 for a specific Galerkin basis.

RALF BANISCH AND CARSTEN HARTMANN
Proof. In Appendix A.
Remark 2. Note that ∥QB(1 − Q)v∥_µ ≤ ∥QBv∥_µ is always finite even though B is possibly unbounded, since v ∈ V ⊂ L²(S, µ) and Q is the projection onto a finite-dimensional subspace of L²(S, µ).
The bottom line of Theorem 3.1 is that if B leaves the subspace D almost invariant, then φ̂ is almost the best approximation of φ in ∥·∥_µ. The following lemma gives a more detailed description. In the following, we will write ∥·∥ = ∥·∥_µ for convenience.
where m is the smallest eigenvalue of the matrix M̂ := ( ⟨χ_i, χ_j⟩_µ )_ij .
Proof. The first statement is true since B is essentially self-adjoint. For the second statement, first of all,

∥φ − φ̂∥ ≤ ∥φ − Qφ∥ + ∥Qφ − φ̂∥

holds by the triangle inequality. We now bound the term involving L for φ̂ = ∑_i φ̂_i χ_i ∈ D, with the shorthands φ_∥ = Qφ, φ_⊥ = φ − φ_∥ = Q^⊥φ, and M̂ = ( ⟨χ_i, χ_j⟩_µ )_ij. A similar result holds for the term involving f. The statement now follows from the standard equivalence between finite-dimensional norms, ∥φ̂∥_1 ≤ √n ∥φ̂∥_2, and the fact that M̂ is symmetric, which implies that ⟨φ̂, φ̂⟩_M̂ = φ̂^T M̂ φ̂ ≥ m φ̂^T φ̂ = m ∥φ̂∥²_2.
To summarize, Theorem 3.1 and Lemma 3.2 give us a bound on the projection performance p in terms of the two quantities δ_L and δ_f. How large or small δ_f is will depend on the behaviour of f; e.g., if f = const then δ_f = 0. Both δ_f and δ_L are always finite even though L is unbounded.
Best-approximation error bound. We now generalize results of [12] on the approximation quality of MSMs for reversible equilibrium diffusions and estimate the best-approximation error ε_0 for the case that the subspace D is spanned by committor functions associated with the metastable sets of the dynamics. To this end suppose that the potential V(x) has n + 1 strict local minima x_1, . . . , x_{n+1}. Let C_1, . . . , C_{n+1} be convex sets around x_1, . . . , x_{n+1} such that A = C_{n+1}. For example, the C_i can be chosen as the basins of attraction of the deterministic limit dynamics for ε = 0, which are metastable sets of the stochastic dynamics.
Following the standard terminology in molecular dynamics (e.g. [37]), we call the C_i the core sets of the dynamics. (In practice it is enough to pick only those minima that are sufficiently metastable, i.e. for which ∆V ≫ ε holds, where ∆V denotes the energy difference between the minimum and the nearest saddle point or local maximum.) We write C = ∪_{i=1}^{n+1} C_i and T = S \ C and introduce τ_C = inf{t ≥ 0 : X_t ∈ C}. We take χ_i to be the committor function associated with the set C_i, that is,

χ_i(x) = P_x( X_{τ_C} ∈ C_i ) ,    (24)

the probability that the process started at x hits C_i before any of the other core sets. These χ_i satisfy the assumptions (S2)-(S3), and (S1) except on the core set boundaries, which form a set of measure zero. Since we do not have a grid parameter by which the approximation error can be controlled, standard PDE techniques for bounding ε_0 fail. Indeed, typically we will have very few basis functions compared to a grid-like discretization. The following theorem gives a bound on ε_0.

Theorem 3.3. Let Q be the orthogonal projection onto the subspace D spanned by the committor functions (24), and let φ be the solution of (20). Then ε_0 can be bounded in terms of the quantities κ, µ(T) and ∥P^⊥φ∥, where P is the orthogonal projection onto V_c = {v ∈ L²(S, µ) : v|_{C_i} = const on every C_i} ⊂ L²(S, µ), with P^⊥ = 1 − P.
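In one dimension the committor (24) between two core sets can be computed in closed form: for C_1 = {x ≤ a} and C_2 = {x ≥ b}, the committor of C_2 solves Lχ = 0 on (a, b) with χ(a) = 0, χ(b) = 1, which integrates to χ(x) = ∫_a^x e^{V/ε} dy / ∫_a^b e^{V/ε} dy. A short sketch, in which the double-well potential and the core boundaries are illustrative assumptions:

```python
import numpy as np

# Closed-form 1D committor: chi_2(x) = int_a^x e^{V/eps} / int_a^b e^{V/eps}.
eps = 0.5
V = lambda x: (x**2 - 1.0)**2          # assumed double-well potential
a, b = -0.8, 0.8                       # boundaries of C_1 = {x<=a}, C_2 = {x>=b}
xs = np.linspace(a, b, 2001)
w = np.exp(V(xs) / eps)
# cumulative trapezoidal integral of e^{V/eps} from a to x
I = np.concatenate(([0.0], np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(xs))))
chi2 = I / I[-1]                       # committor of C_2
chi1 = 1.0 - chi2                      # committor of C_1: partition of unity
print(chi2[0], chi2[-1])
```

The two committors are monotone, match the boundary data and sum to one, which is exactly the partition-of-unity property (S2) used by the Galerkin basis.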

Proof. In Appendix B.
In Theorem 3.3, κ is the maximum expected time of hitting the set C from outside (which is short). Note further that P^⊥φ = 0 on T. The errors ∥P^⊥φ∥_µ and ∥P^⊥φ∥_∞ measure how constant the solution φ is on the core sets. Theorem 3.3 suggests the following strategy to minimize ε_0: (i) Place a core set C_i in every metastable region where φ is expected to be almost constant; (ii) place core sets in regions with high invariant density µ in order to minimize µ(T). This strategy requires knowledge of the invariant density µ. Identifying the metastable regions requires additional dynamical information. If this is not available, then a good guess is usually to use the deepest wells of µ.

Remark 3. Theorem 3.3 together with Theorem 3.1 gives us full control over the discretization error ε. These error bounds generalize recent results [36,12] on the approximation quality of the dominant eigenvalues of a reversible diffusion by an MSM to general (bounded, nonnegative) observables. It would of course be nice to have an error estimate also for the value function. In general such an estimate is difficult to get because of the nonlinear logarithmic transformation W = −ε log φ involved. However, we know that φ and its discrete approximation are both uniformly bounded and bounded away from zero. Hence the logarithmic transformation is uniformly Lipschitz continuous on its domain, which implies that the L² error bound holds for the value function with an additional prefactor given by the Lipschitz constant squared; for a related argument see [19].

3.2. Interpretation in terms of a Markov decision problem. We derive an interpretation of the discretized equation (23) in terms of an MJP. We introduce the diagonal matrix Λ with entries Λ_ii = ∑_j F_ij (zero otherwise) and the full matrix G = K − ε^{−1}(F − Λ), and rearrange (23) as follows:

∑_j G_ij φ̂_j = ε^{−1} Λ_ii φ̂_i ,  i = 1, . . . , n ,   φ̂_{n+1} = 1 .    (25)

This equation can be given a stochastic interpretation.
To this end let us introduce the vector π ∈ R^{n+1} with nonnegative entries π_i = ⟨χ_i, 1⟩_µ and notice that ∑_i π_i = 1 follows immediately from the fact that the basis functions χ_i form a partition of unity, i.e. ∑_i χ_i = 1. This implies that π is a probability distribution on the discrete state space Ŝ = {1, . . . , n + 1}. We summarize properties of the matrices K, F and G:

Lemma 3.4. Let K, G, F and π be as above.
(i) K is a generator matrix (i.e. K is a real-valued square matrix with row sum zero and positive off-diagonal entries) with stationary distribution π that satisfies detailed balance, π_i K_ij = π_j K_ji for all i, j ∈ Ŝ. (ii) F has nonnegative entries and satisfies π_i F_ij = π_j F_ji for all i, j ∈ Ŝ. (iii) G has row sum zero and satisfies π^T G = 0 and π_i G_ij = π_j G_ji for all i, j ∈ Ŝ.
(iv) There exists a (possibly ε-dependent) constant 0 < C < ∞ such that G_ij ≥ 0 for all i ≠ j if ∥f∥_∞ ≤ C. In this case equation (25) admits a unique and strictly positive solution φ̂ > 0.
Proof. (i) follows from ∑_i χ_i(x) = 1 and the reversibility of L: We have ∑_i π_i K_ij = ∑_i ⟨χ_i, Lχ_j⟩_µ = ⟨1, Lχ_j⟩_µ = ⟨L1, χ_j⟩_µ = 0 and π_i K_ij = ⟨χ_i, Lχ_j⟩_µ = ⟨Lχ_i, χ_j⟩_µ = π_j K_ji. (ii) follows from f(x) being real and nonnegative for all x. As for (iii), G has row sum zero by (i) and the definition of Λ; π_i G_ij = π_j G_ji follows from (i), (ii) and the fact that Λ is diagonal, and π^T G = 0 follows directly. For (iv), rewrite (25) as the n×n system Ḡ_λ φ̄ = g, where Ḡ_λ consists of the first n rows and columns of G_λ := −G + ε^{−1}Λ, φ̂ = (φ̄, 1)^T and −g is the vector of the first n entries of the (n+1)st column of G_λ. Choose C such that ε^{−1}⟨χ_i, f χ_j⟩_µ ≤ ⟨χ_i, Lχ_j⟩_µ for all i ≠ j. Then g > 0 and Ḡ_λ is a non-singular M-matrix and thus inverse monotone [2], which together with the equality Ḡ_λ φ̄ = g entails φ̄ > 0.
It follows that if the running costs f are such that (iv) in Lemma 3.4 holds, then G is a generator matrix of an MJP that we shall denote by (X̂_t)_{t≥0}, and by Lemma 2.2, (25) has a unique and positive solution of the form

φ̂(i) = E[ exp( −ε^{−1} ∫_0^{τ̂_A} f̂(X̂_s) ds ) | X̂_0 = i ]

with f̂(i) = Λ_ii and τ̂_A = inf{t ≥ 0 : X̂_t = n + 1}. In fact (25) can be interpreted as the backward Kolmogorov equation for φ̂. Moreover, the logarithmic transformation Ŵ = −log φ̂ is well-defined and can be interpreted as the value function of the Markov decision problem (17)-(18) with running cost ε^{−1}f̂, that is, we seek to minimize

Ĵ(v) = E[ ∫_0^{τ̂_A} ( ε^{−1} f̂(X̂^v_t) + k_v(X̂^v_t) ) dt ] .

This completes the construction of the discrete control problem. Note that in general G ≠ K, but both K and G are reversible with stationary distribution π.
Elber's milestoning process. The discretized equation admits a useful stochastic representation, by which its coefficients can be computed without knowing the committor functions. Define the (non-adapted) forward milestoning process X̂^+_t to be in state X̂^+_t = i if X_t visits core set C_i next, and the backward milestoning process X̂^−_t to be in state X̂^−_t = i if X_t came from C_i last. Then the discrete costs can be written as

f̂(i) = ∫_S f(x) ρ_i(x) dx ,   ρ_i(x) = π_i^{−1} χ_i(x) e^{−V(x)/ε} ,

where ρ_i is the probability density of finding the system in state x given that it came last from i. Hence f̂(i) is the average cost conditioned on the information X̂^−_t = i, i.e. X_t came last from C_i, which is the natural extension of the full partition case where f̂(i) was the average cost conditioned on the information that X_t ∈ A_i.
The matrix $K_{ij} = \pi_i^{-1}\langle \chi_i, L\chi_j\rangle_\mu$ is reversible with stationary distribution $\pi_i = \langle \chi_i, 1\rangle_\mu = \mathbf{P}_\mu(\hat X^-_t = i)$ and is related to so-called core MSMs. To see this, define the core MSM transition matrix $P^\tau$ with components $P^\tau_{ij} = \mathbf{P}(\hat X^+_{t+\tau} = j \,|\, \hat X^-_t = i)$, and the mass matrix $M$ with components $M_{ij} = \mathbf{P}(\hat X^+_t = j \,|\, \hat X^-_t = i)$. Then it is not hard to show that for reversible processes we have $P^\tau_{ij} = \pi_i^{-1}\langle \chi_i, T^\tau \chi_j\rangle_\mu$ and $M_{ij} = \pi_i^{-1}\langle \chi_i, \chi_j\rangle_\mu$, so that $K = \lim_{\tau\to 0}\tau^{-1}(P^\tau - M)$. Thus $K$ is formally the generator of $P^\tau$. If the core sets are chosen as the metastable states of the system, $K$ can be sampled directly from $\hat X^\pm_t$. See [36,38] for more details on the construction and sampling of core MSMs. The matrix $F$ can also be sampled from the milestoning processes. Therefore, as in the construction of core MSMs, we do not need to compute committor functions explicitly.
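These conditional probabilities can be estimated from trajectory data alone. The sketch below uses an illustrative double-well diffusion with two hand-picked core sets (not the paper's system, and an assumed lag time): it builds the forward and backward milestoning labels $\hat X^\pm_t$ from a time-discretized trajectory and estimates $\pi$, $M$ and $P^\tau$ by counting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1D double-well diffusion dX = -V'(X) dt + sqrt(2 eps) dW with
# V(x) = (x^2 - 1)^2, simulated by Euler-Maruyama; core sets around the minima.
def core(x):
    if x < -0.8:
        return 0          # core set C_0
    if x > 0.8:
        return 1          # core set C_1
    return -1             # outside all core sets

dt, eps, x = 1e-3, 0.5, -1.0
nsteps = 200_000
noise = rng.normal(size=nsteps) * np.sqrt(2.0 * eps * dt)
labels = np.empty(nsteps, dtype=int)
for t in range(nsteps):
    x += -4.0 * x * (x * x - 1.0) * dt + noise[t]
    labels[t] = core(x)

def last_visited(lab):
    """Carry the most recently visited core set forward in time."""
    out = lab.copy()
    for t in range(1, len(out)):
        if out[t] < 0:
            out[t] = out[t - 1]
    return out

Xm = last_visited(labels)               # backward process X^-_t: came last from
Xp = last_visited(labels[::-1])[::-1]   # forward process X^+_t: visits next

ok = (Xm >= 0) & (Xp >= 0)              # drop times where a label is undefined
tau = 10                                # lag time in steps (assumed)

# pi_i = P(X^-_t = i), M_ij = P(X^+_t = j | X^-_t = i),
# P^tau_ij = P(X^+_{t+tau} = j | X^-_t = i), all estimated by counting.
pi = np.array([np.mean(Xm[ok] == i) for i in (0, 1)])
M = np.array([[np.mean(Xp[ok & (Xm == i)] == j) for j in (0, 1)] for i in (0, 1)])
sel = ok[:-tau] & ok[tau:]
P = np.array([[np.mean(Xp[tau:][sel & (Xm[:-tau] == i)] == j) for j in (0, 1)]
              for i in (0, 1)])
```

By construction $\pi$ sums to one and $M$ and $P^\tau$ are row-stochastic; for metastable core sets $M$ is strongly diagonally dominant, since the process usually returns to the core set it last visited.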
4. Numerical results. We present two examples to illustrate the approximation of LQ-type stochastic control problems based on a sparse Galerkin approximation using MSMs.

4.1. 1D triple-well potential. To begin with, we study diffusion in the triple-well potential presented in Figure 4a. This potential has three minima, at $x_{0/1} = \mp 3.4$ and $x_2 = 0$. We choose $A = [x_0 - \delta, x_0 + \delta]$ with $\delta = 0.2$ as the target set and the running cost $f = \sigma = \mathrm{const}$, such that the control goal is to steer the particle into $C_0$ in minimum time. In Figure 4a the potential $V$ and the effective potential $U$ are shown for $\varepsilon = 0.5$ and $\sigma = 0.08$ (solid lines), cf. equation (8). One can observe that the optimal control lifts the second and third wells up, such that the system is driven into $C_0$ quickly.

First we validate our method with a convergence test using linear finite elements as basis functions $\chi_i$. To do so, we compute a reference solution $\hat\varphi$ of (23) using linear finite elements on a uniform grid with spacing $h_r = 10^{-4}$. The resulting interpolation $\varphi_I = \sum_i \chi_i\hat\varphi(i)$ is very close to the true solution $\varphi$ of (20). We also compute a reference solution for the value function, $W_I = \sum_i \chi_i\hat W(i)$ with $\hat W = -\log\hat\varphi$, and for the optimal control, $u_I = -2\nabla W_I$. Then we compute coarser solutions $\varphi_{I,h}$ using various grid spacings $1 \ge h \ge 10^{-3}$ and compute the $L^2$ error $\|\varphi_{I,h} - \varphi_I\|_\mu$, and the $L^2$ errors for $W_I$ and $u_I$ similarly. The result is shown in Figure 3. The $L^2$ error of $\varphi_{I,h}$ is quadratic in $h$, as expected from the theory. Additionally, the $L^2$ error of $W_{I,h}$ is also quadratic in $h$, which, given that the transformation between $\varphi$ and $W$ is nonlinear, is surprising. The error of $u_{I,h}$ is only linear in $h$; as expected, one order of convergence is lost due to the fact that $u_I$ is the gradient of $W_I$. Next, we use a committor basis.
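The reference computation just described can be sketched with a simple finite-difference discretization in place of finite elements. The potential below is an illustrative triple well with minima near $\mp 3$ and $0$ (the paper's exact $V$, with minima at $\mp 3.4$, differs), while $\varepsilon$, $\sigma$ and $\delta$ are as in the text; the dual linear problem is solved for $\varphi$ and the value function recovered as $W = -\log\varphi$.

```python
import numpy as np

# Illustrative triple-well potential with minima near x = -3, 0, 3 (assumed).
def V(x):
    return x**2 * (x**2 - 9.0)**2 / 200.0

eps, sigma, h = 0.5, 0.08, 0.01
x = np.arange(-4.0, 4.0 + h, h)
n = len(x)
dV = np.gradient(V(x), h)

# Central-difference generator L = eps d^2/dx^2 - V'(x) d/dx of the
# reversible diffusion, with reflecting boundaries at the grid ends.
L = np.zeros((n, n))
for i in range(1, n - 1):
    L[i, i - 1] = eps / h**2 + dV[i] / (2 * h)
    L[i, i + 1] = eps / h**2 - dV[i] / (2 * h)
    L[i, i] = -L[i, i - 1] - L[i, i + 1]
L[0, 0], L[0, 1] = -eps / h**2, eps / h**2
L[-1, -1], L[-1, -2] = -eps / h**2, eps / h**2

# Dual linear BVP: (sigma - L) phi = 0 outside the target set A,
# phi = 1 on A, with constant running cost f = sigma.
A = np.abs(x - (-3.0)) < 0.2
B = sigma * np.eye(n) - L
idx = np.where(A)[0]
B[idx, :] = 0.0
B[idx, idx] = 1.0
phi = np.linalg.solve(B, A.astype(float))
W = -np.log(phi)     # value function via the logarithmic transformation
```

The matrix `B` is an $M$-matrix for this grid spacing, so `phi` stays in $(0, 1]$ and `W` is non-negative; note that any scaling of the control (e.g. the factor 2 in $u_I = -2\nabla W_I$) is not reproduced here.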
In accordance with the strategy of minimizing $\varepsilon_0$ in Theorem 3.3, we placed core sets $C_i = [x_i - \delta, x_i + \delta]$ in each of the three wells of the potential shown in Figure 4a, resulting in the set of three basis functions shown in Figure 4b. The $L^2$ errors achieved by solving (23) in this basis are shown as circles in Figure 3. We observe that the three committor functions achieve the same performance as linear finite elements with grid spacing $h \approx 0.2$, which corresponds to roughly 40 basis functions. Theorem 3.3 gives $\varepsilon_0 \le 0.08$, while the actual error is one order of magnitude smaller. The dashed line in Figure 4a gives the approximation to $U$ calculated in the committor basis, which is in good agreement with the reference solution. In Figure 4c the optimal control $u$ (solid line) and its approximation $u_I$ (dashed line) are shown. The core sets are shown in blue. The jumps in $u_I$ at the left boundaries of the core sets are due to the fact that the committor functions are only piecewise differentiable.
The computations so far require explicit knowledge of the basis functions $\chi_i$ to compute the matrices $K$ and $F$. For high-dimensional systems the committor basis is usually not explicitly known. To mimic this situation, we construct a core MSM to sample the matrices $K$ and $F$. 100 trajectories of length $T = 20.000$ were used to build the MSM; each of the 100 trajectories has seen about four transitions. In Figure 4d, the optimal cost starting from the rightmost well, $W(x_1)$, and its estimate using the core MSM are shown for $\varepsilon = 0.5$ and different values of $\sigma$. For comparison, a direct sampling estimate of $W(x_1)$ using the same data is shown (green). The direct sampling estimate suffers from a large bias and variance. In contrast, the MSM estimator for $W(x_1)$ performs well for all considered values of $\sigma$. The constant $C$ which ensures $\hat\varphi > 0$ when $\sigma \le C$ is approximately 0.2 in this case. This seems restrictive, but it still allows us to capture all interesting information about $\varphi$ and $W(x_1)$.

4.2. Alanine dipeptide. As a second, non-trivial example we study conformational transitions in Alanine dipeptide (ADP), a well-studied test system in molecular dynamics. We performed an all-atom simulation of ADP in explicit water (TIP3P) with the Amber FF99SB force field [23] using the GROMACS 4.5.5 simulation package [46]. The simulations were performed in the NVT ensemble, where the temperature was restrained to 300 K using the V-Rescale thermostat [7]. 20 trajectories of 200 ns with 100 ps equilibration runs were simulated. Covalent bonds to hydrogen atoms were constrained using the LINCS algorithm [21] (lincs-iter = 1, lincs-order = 4), allowing for an integration timestep of 2 fs. The leap-frog integrator was used. Lennard-Jones interactions were cut off at 1 nm. Electrostatic interactions were treated by the Particle-Mesh Ewald (PME) algorithm [11] with a real-space cut-off of 1 nm, a grid spacing of 0.15 nm, and an interpolation order of 4. Periodic boundary conditions were applied in the x-, y-, and z-directions.
In Figure 5a, a cartoon of the molecule is shown. The full system including the water molecules has about 4000 degrees of freedom. However, it is well known that the conformational dynamics, which comprise the slowest dynamical processes in the system, can be monitored via the backbone dihedral angles $\varphi$ and $\psi$. The dynamics along the other degrees of freedom happen on much faster timescales. For this reason, the value function will essentially be a function of $\varphi$ and $\psi$; see [19] for precise statements. We will use this to build a Markov state model which partitions the $\varphi$-$\psi$ plane.

Validation of the MSM approximation. We construct a full partition MSM using a uniform clustering into $36 \times 36$ boxes $A_i$ of size $10^\circ \times 10^\circ$ in the $\varphi$-$\psi$ plane, and we use characteristic basis functions $\chi_i(x) = 1_{A_i}$ for the discretization. Figure 5b shows the free energy $g_i = -\log \pi_i = -\log \mathbf{P}[X_t \in A_i]$ together with the three largest molecular conformations $\alpha$, $\beta$ and $L_\alpha$. The missing boxes have not seen any data. The slowest dynamical process is the switching between the left-handed $L_\alpha$ structure and the right-handed $\alpha$ and $\beta$ sheet structures. As is customary in MSM theory [36], we estimate the slowest implied timescale (ITS) as follows: for different lag times $\tau$, we construct the MSM transfer operator $P^\tau_{ij} = \pi_i^{-1}\langle \chi_i, T^\tau \chi_j\rangle_\mu$ and compute $t_1(\tau) = -\tau/\log \lambda_1(\tau)$, where $\lambda_1(\tau)$ is the second largest eigenvalue of $P^\tau$. The result is shown in Figure 6a. We observe a plateau for $6\,\mathrm{ps} \le \tau \le 30\,\mathrm{ps}$, which indicates that the time-discrete snapshots $\hat X^\tau_n := \hat X_{n\tau}$ are well described by a Markov chain with transition matrix $P^\tau$ for $\tau \ge 6\,\mathrm{ps}$. The plateau is used to compute $t_1 = 1560\,\mathrm{ps} \pm 6\,\mathrm{ps}$. For smaller values of $\tau$, $\hat X^\tau_n$ is not Markovian due to recrossing effects. For this reason we also cannot sample $K$ directly and have to work with the finite-time transfer operator $P^\tau$ instead.
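The ITS computation can be sketched in a few lines. Here an assumed 3-state reversible generator stands in for the estimated $36 \times 36$ transfer operators; for a model that is exactly Markovian at every lag time, $t_1(\tau)$ is flat and equals $-1/\kappa_1$, the slowest relaxation timescale of the generator.

```python
import numpy as np

# Assumed reversible 3-state generator: pi_i * K_ij = S_ij with S symmetric.
pi = np.array([0.5, 0.3, 0.2])
S = np.array([[0.0, 0.010, 0.004],
              [0.010, 0.0, 0.006],
              [0.004, 0.006, 0.0]])
K = S / pi[:, None]
np.fill_diagonal(K, -K.sum(axis=1))        # rows sum to zero

def transfer(tau):
    """P^tau = exp(tau * K) via eigendecomposition (real spectrum)."""
    w, V = np.linalg.eig(K)
    return (V @ np.diag(np.exp(tau * w.real)) @ np.linalg.inv(V)).real

def its(tau):
    """Slowest implied timescale t_1(tau) = -tau / log lambda_1(tau)."""
    lam = np.sort(np.linalg.eigvals(transfer(tau)).real)[::-1]
    return -tau / np.log(lam[1])           # lam[1]: second largest eigenvalue

timescales = [its(tau) for tau in (1.0, 5.0, 10.0, 20.0)]
t1_ref = -1.0 / np.sort(np.linalg.eigvals(K).real)[::-1][1]
```

For estimates from finite data, `its(tau)` only flattens out beyond the Markovian lag time, which is exactly the plateau diagnostic used in the text.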
Before proceeding to the optimal control problem, we study the effect of this time discretization on the mean first passage time (MFPT) $t(x) = \mathbf{E}_x[\tau_{\alpha\cup\beta}]$, where $\tau_{\alpha\cup\beta}$ is the first hitting time of $\alpha \cup \beta$. Since the $\alpha$, $\beta$ and $L_\alpha$ conformations are very metastable, $t(x)$ is almost constant on $L_\alpha$, and $t_{L_\alpha} = \mathbf{E}[t(x)\,|\,x \in L_\alpha]$ can be computed from $t_1$ via $t_{L_\alpha} = t_1/\pi_{\alpha\cup\beta}$, where $\pi_{\alpha\cup\beta} = 0.96$ is the invariant measure of the $\alpha$ and $\beta$ conformations combined [38]. This gives $t_{L_\alpha} = 1626\,\mathrm{ps} \pm 6\,\mathrm{ps}$.
In Figure 6b, the estimate $\hat t^\tau_{L_\alpha}$ obtained by solving (28) is shown as a function of $\tau$, together with $t_{L_\alpha}$. A linear interpolation using the values for $6\,\mathrm{ps} \le \tau \le 30\,\mathrm{ps}$, where the Markov assumption holds, gives $\hat t^\tau = \hat t^0 + 0.8\tau$ with $\hat t^0 = 1628\,\mathrm{ps}$, which is consistent with (29). This shows that the time discretization introduces only small, controllable errors for $\tau \ge 6\,\mathrm{ps}$. In the following, we will work with $\tau = 10\,\mathrm{ps}$. Notice that the time discretization amounts to setting $K = \tau^{-1}(P^\tau - I)$ in (23).
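The effect of replacing $K$ by $\tau^{-1}(P^\tau - I)$ on MFPTs can be reproduced on a toy model (assumed 3-state generator, state 0 as target): the exact MFPT solves $-Kt = \mathbf{1}$ on the non-target states, while the time-discretized version solves $(I - P^\tau)\hat t = \tau\mathbf{1}$, and the discretization bias grows roughly linearly in $\tau$, as in Figure 6b.

```python
import numpy as np

# Assumed reversible 3-state generator; state 0 is the target.
pi = np.array([0.5, 0.3, 0.2])
S = np.array([[0.0, 0.010, 0.004],
              [0.010, 0.0, 0.006],
              [0.004, 0.006, 0.0]])
K = S / pi[:, None]
np.fill_diagonal(K, -K.sum(axis=1))

def transfer(tau):
    """P^tau = exp(tau * K) via eigendecomposition."""
    w, V = np.linalg.eig(K)
    return (V @ np.diag(np.exp(tau * w.real)) @ np.linalg.inv(V)).real

# Exact MFPT to state 0: -K t = 1 on the non-target states, t(0) = 0.
t_exact = np.linalg.solve(-K[1:, 1:], np.ones(2))

def t_disc(tau):
    """MFPT of the time-discretized chain: (I - P^tau) t = tau * 1."""
    P = transfer(tau)
    return np.linalg.solve(np.eye(2) - P[1:, 1:], tau * np.ones(2))

t_small = t_disc(0.01)   # nearly the exact MFPT for a small lag time
t_large = t_disc(5.0)    # overestimates: excursions to the target are missed
```

The overestimation at large $\tau$ arises because the sampled chain $\hat X_{n\tau}$ cannot register visits to the target between snapshots, which is consistent with the positive slope $0.8\tau$ observed in the text.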
Controlled transition to the $\alpha$-helical structure. Next we consider an optimal control problem for steering the molecule into the $\alpha$ structure. We choose the target region $A = \alpha$ and define running costs in the $(\varphi, \psi)$ variables as $f(\varphi, \psi) = f_0 + f_1\|\psi - \psi_\alpha\|^2$, where $\|\cdot\|$ is a metric on the torus, and we choose $f_0 = 0.01$ and $f_1 = 0.001$, representing a mild penalty for being away from the target region in the $\psi$-direction. We discretize this control problem using the same partition and time discretization as for the MSM construction in Section 4.2 and sample $P^\tau$ and $F$ from the MSM data. The resulting value function $\hat W = -\log\hat\varphi$ is shown in Figure 7a. Since the basis functions $\chi_i$ are not differentiable and some data is missing in $\hat W$, we have to construct an interpolation $W_I(\varphi, \psi)$ from the point data $\hat W$ to obtain an estimate of the optimal control force $u(\varphi, \psi) = -\sigma\nabla W_I(\varphi, \psi)$. An interpolation based on a Delaunay triangulation, which is $C^1$ everywhere except at the data points, is shown in Figure 7b.
To demonstrate that adding the control force $u(\varphi, \psi)$ has the effect of speeding up the transition from $L_\alpha$ to $\alpha$, we would have to implement it in the MD simulation software. We leave that for future work. We can, however, make a prediction of the anticipated effect within the MSM framework: in accordance with (15), we compute the transition matrix $P^\tau_{v^*}$ of the optimally controlled process by the corresponding transformation with $v^* = \hat\varphi$, and the discretized MFPT vector $\hat t^*$ of the optimally controlled process can then be computed from the associated matrix equation. The result is shown in Figure 7c and gives a speed-up compared to $\hat t^0$ of one order of magnitude. A larger speed-up could easily be achieved by increasing $f$. In Figure 7d we show the free energy $g_{v^*} = -\log \pi_{v^*}$ of the controlled process in log scale, where $\pi_{v^*}$ is the invariant measure of the controlled process given by Lemma 2.3.

Observe that the $L_\alpha$ and $\beta$ conformations are now much less populated compared to the equilibrium distribution in Figure 5b: as in the 1D example, the control mainly has the effect of lifting up the wells which are not in the target region, such that they become less metastable.
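The prediction step can be illustrated with a generic Doob-type reweighting $P^*_{ij} \propto P_{ij}\,\varphi(j)$. This is a sketch only; the paper's transformation (15) with $v^* = \hat\varphi$ may contain additional factors. Tilting a nearest-neighbour walk by a $\varphi = e^{-W}$ that increases toward the target provably increases every step probability toward the target, which is the discrete analogue of the well-lifting effect described above.

```python
import numpy as np

def controlled(P, phi):
    """Doob-type reweighting: P*_ij = P_ij * phi_j / sum_k P_ik * phi_k."""
    Pstar = P * phi[None, :]
    return Pstar / Pstar.sum(axis=1, keepdims=True)

# Nearest-neighbour walk on {0,...,4}; state 4 is the (absorbing) target.
P = np.zeros((5, 5))
for i in range(1, 4):
    P[i, i - 1] = P[i, i + 1] = 0.25
    P[i, i] = 0.5
P[0, 0], P[0, 1] = 0.75, 0.25
P[4, 4] = 1.0

# phi = exp(-W) with an assumed value function W decreasing toward the target.
W = np.array([2.0, 1.5, 1.0, 0.5, 0.0])
phi = np.exp(-W)
Pstar = controlled(P, phi)
```

Since $\varphi$ is monotone increasing toward state 4, the row normalizer is at most $\varphi(i+1)$, so every rightward transition probability can only grow under the reweighting.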

5. Conclusions. We have developed a Galerkin projection method that leads to an approximation of certain optimal control problems for reversible diffusions by Markov decision problems. The approach is based on the dual formulation of the optimal control problem in terms of a linear boundary value problem that can be discretized in a straightforward way. In this article we propose a discretization that preserves reversibility and the generator form of the linear equations, i.e., the discretization of the infinitesimal generator of the original diffusion process can be interpreted as the infinitesimal generator of a reversible Markov jump process (MJP).

(Figure 7 caption: Interpolation $W_I(\varphi, \psi)$ obtained from $\hat W$ via a Delaunay triangulation, and a steepest-descent path from $L_\alpha$ to $\alpha$. (c) MFPT to the $\alpha$ conformation for the optimally controlled process. (d) Free energy $g_{v^*} = -\log \pi_{v^*}$ of the optimally controlled process.)
The discretized linear boundary value problem again admits a dual formulation in terms of a Markov decision problem. A sparse approximation that uses the basis of committor functions of metastable sets of the dynamics was discussed in detail: the discretization using committor functions does not require that the metastable sets partition the state space, hence the method can be applied to high-dimensional problems as they appear, e.g., in molecular dynamics. The committor functions in this case need not be known explicitly, as it is possible to sample the generator matrices and the discrete cost functions by a Monte Carlo method, similarly to what is done in the Markov state modelling approach to protein folding. We proved an $L^2$ error estimate for the Galerkin scheme; moreover, the discretization was shown to preserve basic structural elements of the continuous problem, such as duality, reversibility and properties of the invariant measure. Our numerical results showed very good performance of the incomplete partition discretization on a simple toy example and on a high-dimensional molecular dynamics problem, even with only a few basis functions, which is in line with the theoretical error bounds presented in this paper. While we addressed the discretization error in great detail, we did not address the sampling error. In particular, for large systems our construction requires the coefficients of the MJP, and therefore the transition rates between all metastable states, as an input. This is not fully satisfactory. We believe that the optimal control framework presented here should be linked with Monte Carlo methods for rare events, e.g., [20,13], which exploit the same duality between optimal control and sampling that we did, in order to devise efficient importance sampling strategies and thereby reduce the sampling error. We leave the analysis of the sampling error to future work.
Appendix B. Best-approximation error bound. In this appendix, we prove Lemma 3.3. Recall that $\kappa = \sup_{x\in T} \mathbf{E}_x[\tau_{S\setminus T}]$ and that $P$ is the orthogonal projection onto the subspace $V_c = \{v \in L^2(S,\mu) : v = \mathrm{const}$ on every $C_i\} \subset L^2(S,\mu)$. Note that $P^\perp\varphi = 0$ on $C$. The errors $\|P^\perp\varphi\|$ and $\|P^\perp\varphi\|_\infty$ measure how constant the solution $\varphi$ is on the core sets. We write $\|\cdot\| = \|\cdot\|_\mu$ throughout the proof for convenience.
Proof. The proof closely follows the proof of Theorem 12 in [35]. The first step of the proof is to realize that the committor subspace $D$ onto which $Q$ projects can be written as $D = \{v \in L^2(S,\mu) : v = \mathrm{const}$ on every $C_i$, $Lv = 0$ on $T\}$. To see this, note that the values $v$ takes on the $C_i$ can be used as boundary values for the Dirichlet problem $Lv = 0$ on $T$. A linear combination of committor functions is obviously a solution of this problem. But the solution of the Dirichlet problem must be unique, since otherwise one could construct a contradiction to the uniqueness of the invariant distribution; see [35].
By definition, $\|Q^\perp\varphi\| \le \|\varphi - I\varphi\|$ for every interpolant $I\varphi \in D$ of $\varphi$. Here we choose an interpolant $q = I\varphi$ with the property $Lq = 0$ on $T$, $q = P\varphi$ on $S \setminus T$, (34) which is possible by the definition of $P$. Now $D \subset V_c$, therefore $q \in V_c$ and thus $Pq = q$. As a consequence, (34) is equivalent to $PLPq = 0$ on $T$, $q = P\varphi$ on $S \setminus T$. (35)
Now define $e := P\varphi - q$. Then we have $PLPe = PLP(P\varphi - q) = PLP\varphi - PLPq = PL\varphi - PLP^\perp\varphi - PLPq$, and by (35), and since $L\varphi = f\varphi$ on $S \setminus A \supset T$, we have $PLPe = Pf\varphi - PLP^\perp\varphi$ on $T$, $e = 0$ on $S \setminus T$. (36)
Since $\Theta P = P\Theta = \Theta$, this can be written as $Re = \Theta f\varphi - \Theta L P^\perp\varphi$, i.e. $e = R^{-1}\Theta f\varphi - R^{-1}\Theta L P^\perp\varphi$, (37) where $R = \Theta L \Theta$. The operator $R$ is invertible on $E_\Theta$: if this were not the case, there would be a nontrivial solution $v$ of $Lv = 0$ on $T$, $v = 0$ on $S \setminus T$.
But the solution of this boundary value problem is again unique, and hence there is only the trivial solution. This gives $\|R^{-1}\| = 1/|\lambda_0|$, where $\lambda_0$ is the principal eigenvalue of $R$. Due to an estimate by Varadhan we have $1/|\lambda_0| \le \sup_{x\in T}\mathbf{E}_x[\tau_{S\setminus T}] =: \kappa$; see, e.g., [5]. To complete the derivation, we first focus on the second term in (37). Since $R^{-1}$ is an operator on $E_\Theta$, we can write $R^{-1}\Theta L P^\perp\varphi =: \Theta g$, where the function $\Theta g$ solves $L\Theta g = LP^\perp\varphi$ on $T$, $\Theta g = 0$ on $S \setminus T$, by the definition of $R$ and $\Theta g$. Therefore $w := \Theta g - P^\perp\varphi$ solves the boundary value problem $Lw = 0$ on $T$, $w = -P^\perp\varphi$ on $S \setminus T$, (38) which implies that $\|w\|_\infty \le \|P^\perp\varphi\|_\infty$; this follows from Dynkin's formula or Lemma 3 in [35]. Finally, $\|\Theta g\| \le \mu(T)^{1/2}\|\Theta g\|_\infty \le \mu(T)^{1/2}(\|P^\perp\varphi\|_\infty + \|w\|_\infty) \le 2\mu(T)^{1/2}\|P^\perp\varphi\|_\infty$ holds by the triangle inequality and the above considerations. Now focus on the first term in (37). Note that by the maximum principle, $\varphi$ achieves its maximum of 1 on the boundary of $S \setminus A \supset T$, therefore $\max_{x\in T}|\varphi(x)| \le 1$. Then we have $\|R^{-1}\Theta f\varphi\| \le \kappa\|\Theta f\varphi\| \le \kappa\,\mu(T)^{1/2}\|f\|_\infty$. Putting everything together, we arrive at $\|e\| \le \kappa\,\mu(T)^{1/2}\|f\|_\infty + 2\mu(T)^{1/2}\|P^\perp\varphi\|_\infty$. Finally, note that by the triangle inequality $\|Q^\perp\varphi\| \le \|\varphi - q\| \le \|P^\perp\varphi\| + \|e\|$, which completes the proof.