On the convergence of the Sakawa-Shindo algorithm in stochastic control

. We analyze an algorithm for solving stochastic control problems, based on Pontryagin’s maximum principle, due to Sakawa and Shindo in the deterministic case and extended to the stochastic setting by Mazliak. We assume that either the volatility is an aﬃne function of the state, or the dynamics are linear. We obtain a monotone decrease of the cost functions as well as, in the convex case, the fact that the sequence of controls is minimizing


Introduction
In this work we consider an extension of an algorithm for solving deterministic optimal control problems introduced by Sakawa and Shindo in [19], and analyzed by Bonnans [7]. This algorithm has been adapted to a class of stochastic optimal control problems in Mazliak [15]. We extend here the analysis of the latter to more general situations appearing naturally in applications, and obtain stronger results regarding the convergence of iterates.
We introduce now the problem and the algorithm. Let (Ω, F, F, P) be a filtered probability space on which an m-dimensional standard Brownian motion W (·) is defined. We suppose that F = {F t } t∈[0,T ] (T > 0) is the natural filtration, augmented by all the P-null sets in F, associated to W (·) and we recall that F is right-continuous. Let us consider the following controlled Stochastic Differential Equation (SDE): (1.1) dy(t) = f (y(t), u(t), t, ω)dt + σ(y(t), u(t), t, ω)dW (t) t ∈ (0, T ), where f : R n × R r × [0, T ] × Ω → R n and σ : R n × R r × [0, T ] × Ω → R n×m are given maps. In the notation above y ∈ R n denotes the state function and u ∈ R r the control. We define the cost functional (1.2) J(u) = E T 0 (y(t), u(t), t)dt + g(y(T )) .
where : R n × R r × [0, T ] × Ω → R and g : R n × Ω → R are given. Precise definition of the state and control spaces, and assumptions over the data will be provided in the next sections. Let U ad be a non-empty closed, convex subset of R r and (1.3) U := u ∈ (H 2 ) r ; u(t, ω) ∈ U ad , for almost all (t, ω) ∈ (0, T ) × Ω , where H 2 := v ∈ L 2 ([0, T ] × Ω); the process (t, ω) ∈ [0, T ] × Ω → v(t, ω) is F-adapted . The control problem that we will consider is (1.4) Min J(u) subject to u ∈ U.
The first and third authors thank the support from project iCODE :"Large-scale systems and Smart grids: distributed decision making" and from the Gaspar Monge Program for Optimization and Operation Research (PGMO). The second author was supported by CIFASIS -Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas and INRIA.
(3) Set k = k + 1. Compute u k and y k such that y k is the state corresponding to u k and for almost all (t, ω) ∈ [0, T ] × Ω. We will see in Section 3 that u k is well defined if ε k is small enough. (4) Stop if some convergence test is satisfied. Otherwise, go to (2). The main idea of the algorithm is to compute at each step a new control that minimizes the augmented Hamiltonian K ε which depends on H and on a quadratic term that penalizes the distance to the current control. We can prove that this is a descent method and that the distance to a gradient and projection step tends to zero. Consequently, in the convex framework the algorithm is shown to be globally convergent in the weak topology of (H 2 ) r . Step 3 in the algorithm reveals its connection with the extension of the Pontryagin maximum principle [18] to the stochastic setting. We refer the reader to Kushner and Schweppe [14], Kushner [12,13], Bismut [4,6], Haussmann [11] and Bensoussan [2,3] for the the initial works in this area. Afterwards, general extensions were established by Peng [17] and by Cadenillas and Karatzas [9]. The stochastic maximum principle usually involves two pairs of adjoint processes (see [17]). Nevertheless, the gradient of the cost function depends only on one pair of adjoint processes and since we suppose that U is convex, the first order necessary condition at a local optimum u * depends only on ∇J(u * ) (see [20, p. 119-120] for a more detailed discussion).
In this paper we work with two types of assumptions. The first one supposes that σ in (1.1) does not depend on u and that the cost functions and g are Lipschitz. In the second assumption we suppose that the functions f and σ involved in (1.1) are affine with respect to (y, u). Thus, our results are a significant extension of those of [15]. Let us explain now our main improvements, referring to Remark 2.1(i) for other technical differences. In [15] the author studies a restricted form of our first assumption, he shows that if in addition σ is independent of the state y and the problem is convex, then, except for some subsequence, the iterates u k converges weakly to a solution of (1.4). If σ depends on the state y, it is proved in [15,Theorem 5] that given ε > 0, the algorithm can be suitably modified in such a way that every weak limit pointû of u k is an ε-approximate optimal solution, i.e. J(û) ≤ J(u) + ε for all u ∈ U. We show in Theorem 4.6 that such a modification is unnecessary as we prove that the sequence of iterates u k , generated by the algorithm described above, satisfies that each weak limit point of u k solves (1.4). Moreover, as we said before, in our second assumption we suppose that σ can depend in an affine manner on the control u and the state y and the Lipschtiz assumption on the cost terms and g are removed. This implies that the sequences of adjoint states p k are not bounded almost surely, which is the basic ingredient in the proof of the main results in [15]. Finally, let us underline that in Corollary 4.4 we prove that some weak forms of optimality conditions are satisfied for both assumptions. In the convex case, this allows us to prove that the iterates u k form a minimizing sequence, a result that is absent in [15] and also in the deterministic case studied in [7]. Of course, this implies that if in addition J is strongly convex, we have strong convergence of the sequence u k .
The article is organized as follows: in Section 2 we state the main assumptions that we make in the entire paper. In Section 3, we prove that the algorithm is well-defined. In Section 4 we analyse the convergence of the method. We show that the sequence of costs generated by the algorithm is decreasing and convergent. Under some convexity assumptions, we can prove that every weak limit point of the sequence of iterates solves (1.4). Finally, in the last section we consider a quadratic cost with respect to the control, we study the relation with the gradient plus projection method and we prove that the rate of convergence is linear in this case.

Main assumptions
Let us first fix some standard notations. Endowed with the natural scalar product in L 2 ([0, T ]× Ω) denoted by (·, ·), H 2 is a Hilbert space. We denote by · the L 2 ([0, T ] × Ω) norm on (H 2 ) l , for any l ∈ N. As usual and if the context is clear, we omit the dependence on ω of the stochastic processes. We set S 2 for the subspace of H 2 of continuous processes x satisfying that E sup t∈[0,T ] |x(t)| 2 < ∞. Finally, given an Euclidean space R l , we denote by | · | the Euclidean norm and by B(R l ) the Borel sigma-algebra.
Let us now fix the standing assumptions that will be imposed from now on. (H1) Assumptions for the dynamics: is C 2 and there exist a constant L > 0 and a process ρ ϕ ∈ H 2 such that for almost all (t, ω) ∈ [0, T ] × Ω and for all y,ȳ ∈ R n and u,ū ∈ U ad we have (H2) Assumptions for the cost: (a) The maps and g are respectively × Ω the map (y, u) → (y, u, t, ω) is C 2 , and there exist L > 0 and a process ρ (·) ∈ H 2 such that for almost all (t, ω) ∈ [0, T ] × Ω and for all y,ȳ ∈ R n and u ∈ U ad we have (d) For almost all ω ∈ Ω the map y → g(y, ω) is C 2 and there exists L > 0 such that for all y,ȳ ∈ R n and almost all ω ∈ Ω, (H3) At least one of the following assumptions holds true: (a) For all (y, u) ∈ R n × R r and almost all (t, ω) ∈ [0, T ] × Ω we have (2.4) σ u (y, u, t, ω) ≡ 0 and σ yy (y, t, ω) ≡ 0.
Remark 2.1. (i) Our assumptions (H1)-(H2)-(H3)(a) are weaker than those in [15], where it is supposed that and g are bounded and, in the statement of the main results, σ y ≡ 0. In addition, the data , g, f , σ do not depend explicitly on ω and the set U is assumed to be bounded.

Well-posedness of the algorithm
The aim of this section is to prove that the iterates of the Sakawa-Shindo algorithm are well defined. We need first the following lemma. , there exists C > 0 such that the solution (p, q) of (2.6) satisfies Proof. By Itô's formula and (2.6), we have that Using that p(T ) = ∇ y g(y(T )), our assumptions and the Cauchy-Schwarz and Young inequalities imply that for any ε > 0 we have for some constants C 1 , C 2 > 0. Now fixt ∈ [0, T ] and define r(t) := E |p(t)| 2 |Ft ≥ 0 for all t ≥t. Combining (3.4) and [1, Lemma 3.1] we have Thus, by Gronwall's Lemma there exists C > 0 independent of (t, ω) andt such that r(t) ≤ C for allt ≤ t ≤ T and so in particular |p(t)| 2 = r(t) ≤ C for a.a. ω. Sincet is arbitrary and p admits a continuous version, the result follows.
Proof. We follow [7,15]. Setting w := (y, p, q, v), element of E := R n × P × R n×m × U ad , we can rewrite K ε as K ε (u, w). We claim that for ε small enough: This holds if (H3)(a) holds since p, f uu and uu are bounded, and also if (H3)(b) since f and σ are affine and uu is bounded. By (3.9), K ε is a strongly convex function of u with modulus 1/(2ε), and hence, for all u 1 , u 2 ∈ U ad : On the other hand, for i = 1, 2, take Summing these inequalities for i = 1, 2 with (3.10) in which we set w := w 1 , we obtain that Inequality (3.2) easily follows from (3.13) and our assumptions. . Therefore u 1 := u ε 1 (y, p 0 , q 0 , u 0 ) is uniquely defined; so is u k by induction.

Convergence
In this section we prove our main results. If sup k ε k ≤ ε 0 where ε 0 is small enough, then the cost function decreases with the iterates (see Theorem 4.2 and Theorem 4.3). Moreover, if the problem is convex, then any weak limit point of the sequence u k solves the problem (see Theorem 4.6). We will need the following elementary Lemma.
Lemma 4.1. Under assumption (H1), there exists C > 0 such that for any u, u ∈ U, where y and y are respectively the associated states to u and u .
Now we consider the projection map P U : (H 2 ) r → U ⊂ (H 2 ) r , i.e. for any u ∈ (H 2 ) r , By [8, Lemma 6.2], (4.23) P U (u)(t, ω) = P U ad (u(t, ω)), a.e. t ∈ [0, T ], P − a.s., where P U ad : R r → U ad ⊂ R r is the projection map in R r . We have the following result: Assume that J is bounded from below and that assumptions (H1)-(H3) hold true. Then there exists ε 0 > 0 such that, if ε k < ε 0 , any sequence generated by the algorithm satisfies: Proof. The first two items are a consequence of Theorem 4.2 and the fact that J is bounded from below. Since u k minimizes K ε k we have then, and so

By [8, Proposition 3.8] we know that
then, by (4.26), (4.28) As P U is non-expansive in H 2 , we obtain Now, let us estimate the last term in the previous inequality. By (H3), considering any of the two cases, we have σ uy ≡ σ uu ≡ 0. Therefore, for a.e. t ∈ [0, T ] there exist (ŷ,û) ∈ R n × U ad such that (4.30) where in the last inequality, if (H3)-(a) holds we use Lemma 3.1 and assumptions (H1)-(H2), and if (H3)-(b) holds we use the fact that f uy ≡ f uu ≡ 0 and (H2). We conclude that Using (4.29)-(4.31), Lemma 4.1 yields the existence of C > 0 such that which proves the last assertion.
Corollary 4.4. Assume that the sequence generated by the algorithm u k is bounded, ε k < ε 0 and lim inf ε k ≥ ε > 0. Then, for every bounded sequence v k ∈ U we have that In particular and in the unconstrained case U = (H 2 ) r (4.35) lim k→∞ ∇J(u k ) = 0.
Proof. We define for each k, then we have By Theorem 4.3(3), we know u k − w k → 0. Using the fact that ∇J(u k ) = ∇ u H(y k , u k , p k , q k ), and the boundedness of u k , which in particular yields the boundedness of (p k , q k ) in (S 2 ) n × (H 2 ) n×m (see e.g. [16,Proposition 3.1] or [20,Chapter 7]), we can deduce that ∇J(u k ) is a bounded sequence in (H 2 ) r . Finally, since the sequence v k is bounded by assumption, we can conclude which proves (4.33). Inequality (4.34) follows from (4.33) by taking v k ≡ v, for any fixed v ∈ U, and identity (4.35) follows by letting v k = u k − ∇J(u k ), which is bounded in (H 2 ) r . Under convexity assumptions we obtain a convergence result.
Theorem 4.6. Let J(u), defined as above, be convex and bounded from below. Moreover, suppose that ε k < ε 0 , where ε 0 is given by Theorem 4.3, and lim inf ε k > 0. Then any weak limit pointū of u k is an optimal control and J(u k ) → J(ū).
Proof. Consider a subsequence u k i i∈N that converges weakly toū, then u k i is bounded. Using the convexity of J and (4.34), for all v ∈ U we obtain By Theorem 4.3 (1), J(u k ) is a convergent sequence, then we obtain where the last inequality holds because of the weak lower semi-continuity of J, which is implied by the convexity and continuity of J. Since (4.40) holds for all v ∈ U, we can conclude thatū is an optimal control and J(u k ) → J(ū).
Remark 4.7. The fact that any weak limit point is an optimal control can also be obtained with Theorem 4.3(3) and the arguments in [7].
We also want to note that the previous result is one of the main differences with respect to [15], where it is only shown that the weak limit points are ε-approximate optimal solutions.
If J is strongly convex in (H 2 ) r we obtain strong convergence of the iterates. Proof. Since J is strongly convex, there exists a unique u * such that J(u * ) = min u∈U J(u). Moreover, since Theorem 4.3(1) implies that J(u k ) ≤ J(u 0 ), for all k ∈ N, the strong convexity of J implies that the whole sequence u k is bounded and by the previous Theorem, J(u k ) → J(u * ). By the strong convexity of J and the optimality of u * , there exists β > 0 such that Since J(u k ) → J(u * ) this implies u k − u * → 0.

Rate of Convergence
In this section, as in [7], we present a particular case where, endowing with a new metric the space of controls, the algorithm is equivalent to the gradient plus projection method [10]. The iterates of the method, for the minimization of J over the set U, can be written as where ε k is the step size. Now, we consider the interesting case where the cost function is quadratic with respect to the control, so we make the additional assumption: (H4) The function has the form where 1 and the matrix-valued N process are such that (H2) is satisfied. In particular, N is essentially bounded. Moreover, we assume that N (t, ω) is symmetric for a. a. (t, ω). In what follows we assume that (H1) and (H2) are satisfied, and, since the cost is quadratic in u, we will assume (H3)-(b). Let us point out that instead of (H3)-(b), we can assume that and g satisfy (H3)-(a) with the additional assumption that U ad is bounded, but, for the sake of simplicity, we will only assume (H3)-(b), i.e. that the state equation can be written as (5.3) dy(t) = [A(t)y(t) + B(t)u(t) + C(t)]dt + m j=1 [A j (t)y(t) + B j (t)u(t) + C j (t)]dW j (t), t ∈ (0, T ), y(0) = y 0 where A, B, C, A j , B j and C j , for j = 1, ..., m are matrix-valued processes such that (H1) holds.
We also assume that the sequence {ε k } is constant and equal to ε > 0. For this ε let us set Note that if ε is small enough, then, since N is essentially bounded, there exist δ, δ > 0 such that δ |v| 2 ≥ v M ε (t, ω)v ≥ δ|v| 2 for a.a. (t, ω) and all v ∈ R r . Thus, defining the norm · ε is equivalent to the L 2 -norm in (H 2 ) r and so ((H 2 ) r , · ε ) is a Hilbert space.
Remark 5.1. If we note ∇ ε J the gradient of the functional J in ((H 2 ) r , · ε ), we obtain the relation where ∇J is the gradient of J in (H 2 ) r endowed with the L 2 -norm.
Proof. Let u k be the sequence generated by the algorithm, then by (4.25) and all the assumptions, we obtain skipping time arguments ∀v ∈ U ad , a.e. t ∈ [0, T ], P − a.s. Thus, and this is equivalent to By the previous Remark and (5.5), the inequality (5.9) implies that u k is the point obtained by the gradient plus projection method when we consider (H 2 ) r endowed with the metric defined by (5.5).
Under additional assumptions, the gradient plus projection method has a linear rate of convergence. In our framework, we will need the following result. in H 2 , where u ∈ U, y is the state associated to u, given by (1.1), and p, q are the corresponding adjoint states, given by (1.6). Now, let u and u in U, and y, p, q and y , p , q the associated states and adjoint states, respectively. Definep := p − p andq := q − q , then (p,q) solves the following BSDE, (5.11) dp(t) = − ∇ y 1 (y, t) − ∇ y 1 (y , t) + A(t) p(t) + m j=1 A j (t) q j (t) dt +q(t)dW (t) t ∈ (0, T ), p(T ) = ∇ y g(y(T )) − ∇ y g(y (T )). Theorem 5.4 (Linear convergence). Assume that there exists γ > 0 such that for a.a. (t, ω) we have v N (t, ω)v ≥ γ|v| 2 for all v ∈ R m . Moreover, suppose that the assumptions of Lemma 5.3 hold true and that 1 (·, t, ω) and g(·, ω) are convex for a.a. (t, ω). Then, J is strongly convex (in both norms), (1.4) has a unique solution u * and for ε small enough, there exists α ∈ (0, 1) and C > 0 such that where u k is the sequence generated by the algorithm.
Proof. By our assumptions J is strongly convex in ((H 2 ) r , · ), and it is easy to check that J is also strongly convex in ((H 2 ) r , · ε ) with a strong convexity constant β = γ/δ > 0, independent of ε small enough. Then, there exists a unique u * solution of (1.4).
Since the norms · ε and · are equivalent, by (5.6) and Lemma 5.3 we can deduce that ∇ ε J is Lipschitz in (H 2 ) r endowed with the norm · ε and the Lipschitz constant L is independent of ε small enough. It is known that the rate of convergence of the gradient plus projection method is linear when the gradient is Lipschitz, but we write the proof in a few lines for the reader's convenience. By Theorem 5.2 we know that (5.15) u k+1 = P ε U (u k − ε∇ ε J(u k )) where P ε U is the projection map in ((H 2 ) r , · ε ). By the optimality of u * we have u * = P ε U (u * − ε∇ ε J(u * )), then for any k we obtain (5.16) u If ε < 2β/L 2 we have 1 − 2εβ + ε 2 L 2 < 1. Using again the equivalence between · ε and · we deduce that (5.14) holds true.