Line search globalization of a semismooth Newton method for operator equations in Hilbert spaces with applications in optimal control



(Communicated by Liqun Qi)
Abstract. We consider the numerical solution of nonlinear and nonsmooth operator equations in Hilbert spaces. A semismooth Newton method is used for search direction generation. The operator equation is solved by a globalized semismooth Newton method that is equipped with an Armijo linesearch using a semismooth merit function. We prove that an accumulation point of the globalized algorithm is a solution and transition to fast local convergence under a directional Hadamard-like continuity assumption on the Newton matrix. In particular, no auxiliary descent directions or smoothing steps are required. Finally, we apply this method to a control-constrained and also to a regularized state-constrained optimal control problem subject to partial differential equations.
1. Introduction. In this article we consider the problem of finding a z̄ ∈ Z with

f(z̄) = 0, (1)

where Z is a Hilbert space, Z* denotes its topological dual, and f : Z → Z* is a locally Lipschitz continuous operator. In particular, we allow f to be nonlinear and nonsmooth. Problems of this type arise, e.g., from the reformulation of the necessary optimality conditions in control-constrained optimal control subject to partial differential equations (PDE), see Section 4. Semismooth Newton methods for operator equations of the type of (1) have been examined in [4,10,15,16,18,19,21,22,23,24]. A trust-region globalization of this method was presented in [22]; global convergence is proved for a primal-dual active set method in finite dimensions in [15,16]. A primal-dual active set strategy and semismooth Newton methods may lead to identical algorithms [8]. The application of semismooth Newton methods to optimal control of PDE was considered, e.g., in [9,10,12,13,20,22,23,24]. For variationally discretized control-constrained optimal control problems subject to elliptic differential equations a globalization strategy was developed by Hinze and Vierling [12,13]. Discretized nonlinear optimal control problems subject to control and state constraints can be solved by globally convergent nonsmooth Newton methods, see for instance [7]. It has been demonstrated that a norm gap between range and domain of a projection operator is required for semismoothness, see e.g. [21, Th. 3.3], [22, Remark 3.34] or [24, Ch. 4]. In [21] the semismooth Newton method is revisited from a different point of view, involving semismooth superposition operators and replacing Lipschitz continuity by weaker growth conditions.
We extend the local semismooth Newton method of M. Ulbrich [22] by a globalization strategy in Hilbert spaces from [14]. We prove that if our global algorithm produces an accumulation point, then this point is a zero of f; see our main results, Theorem 3.4 and Theorem 3.5, which, to the best of our knowledge, have not been proved before. As in [22] we do not require a strict complementarity assumption for these results.
Our article is organized as follows. The concept of a semismooth Newton method is briefly presented in Section 2. In Section 3 we prove our main results. The globalization result is applied to optimal control problems with control constraints in Sect. 4, using an all-at-once approach. We close with numerical examples for control constraints and for state constraints in Sect. 5.
2. Semismooth Newton method. For applying a Newton method to (1), an appropriate substitute for the Fréchet derivative of f has to be found. Therefore we work with the following abstract semismoothness concept, cf. Def. 3.1 in [22]. The multifunction ∂*f is called the generalized differential of f. Please note that the uniform semismoothness of f in part (c) of Definition 2.1 is only required later on, in the particular case that a (sub)sequence of step sizes in the globalized Newton method tends to zero, compare part (b) of Theorem 3.4. However, there exist projectors such that uniform semismoothness implies Fréchet differentiability; for this and a discussion of non-uniform semismoothness of projectors see [21, Sect. 3]. Note that a different definition of uniform semismoothness is used by Correa and Jofré [5, Remark 6.3], in which the uniformity refers to a family of functions. For uniform semismoothness within the context of mesh independence see [24, Ch. 6].
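For orientation, the semismoothness condition from [22, Def. 3.1] on which this concept is modeled can be sketched as follows (our paraphrase; the precise statement, including the α-order variant, is in [22]):

```latex
% f : V \subset Z \to Z^* is called \partial^* f-semismooth at z if f is
% continuous near z and
\sup_{M \in \partial^* f(z+s)} \| f(z+s) - f(z) - Ms \|_{Z^*} = o(\|s\|_Z)
  \quad \text{as } s \to 0.
% For \alpha-order semismoothness the right-hand side is strengthened to
% O(\|s\|_Z^{1+\alpha}).
```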
It is easy to prove that the direct product of (α-order) semismooth operators is itself (α-order) semismooth with respect to the direct product of the generalized differentials of its components. An analogous result holds for the composition of semismooth operators; for details see [22]. For Z = R^n, thanks to Rademacher's theorem, such a substitute for locally Lipschitz continuous functions is given by Clarke's generalized Jacobian. In infinite-dimensional spaces such a generalized differential is not given naturally and has to be defined separately, see e.g. [6,22].
A semismooth Newton method for equation (1) is given by the following algorithm.

Algorithm 2.2.
(i) Choose a starting point z_0 ∈ Z and set k = 0.
(ii) If a stopping criterion is satisfied: STOP.
(iii) Choose an arbitrary M(z_k) ∈ ∂*f(z_k) and compute the search direction s_k ∈ Z by solving

M(z_k) s_k = −f(z_k).

(iv) Set z_{k+1} = z_k + s_k, k := k + 1, and go to (ii).
Here we need a regularity condition for the operators M(z_k) ∈ ∂*f(z_k) similar to the one in [22, Assumption 3.11 (i)]: every M(z) ∈ ∂*f(z) is invertible and there exists a constant C_{M⁻¹} > 0 such that ||M(z)⁻¹||_{L(Z*,Z)} ≤ C_{M⁻¹}.
This assumption is standard for Newton methods and can be verified for various examples, see e.g. [9,10,17]. In a finite-dimensional setting this assumption is satisfied close to a zero z̄ of f if all elements M of ∂*f(z̄) are non-singular, where ∂*f denotes the compact-valued and upper semicontinuous subdifferential of Bouligand, Clarke or Qi, compare [23, Remark 2.8].
For our globalization result we need Assumption 3.1, see below, for the subdifferentials. Furthermore, we need a certain structure of the superposition operator arising within the context of the application to optimal control, see Assumption 4.2 below, which avoids the smoothing step in Algorithm 2.2 required in [6] in this context.
We recall the local convergence of Algorithm 2.2 that has been shown in [6] and [22].

Theorem 2.4. Assume z̄ ∈ V is a zero of (1). Then the following assertions hold: (a) If f is ∂*f-semismooth in z̄, then there exists a constant δ > 0 such that for all z_0 ∈ z̄ + δB_Z Algorithm 2.2 generates a sequence {z_k} ⊂ V that converges superlinearly to z̄. (b) If f in (a) is α-order ∂*f-semismooth in z̄, 0 < α ≤ 1, the rate of convergence is 1 + α.
(c) If f(z_k) ≠ 0 for all k, the sequence of residuals converges superlinearly, i.e.

||f(z_{k+1})||_{Z*} / ||f(z_k)||_{Z*} → 0 as k → ∞.
3. Globalization. To construct a globally convergent Newton method, Algorithm 2.2 is extended by an Armijo line search. An appropriate merit function Θ : Z → R is given by

Θ(z) := 1/2 ||f(z)||²_{Z*}. (3)

Since Z is a Hilbert space, one can prove the semismoothness of (3).
To this end, let ⟨·, ·⟩_{Z*,Z} denote the dual pairing between Z* and Z, while (·, ·)_Z is the inner product on Z. We need the following Assumption 3.1. In particular, this assumption is fulfilled in the case ε̄ = 0, corresponding to a directional Hadamard-like continuity of M.
The conditions in Assumption 3.1 bear some similarity to the definition of Hadamard differentiability along curves [1, Def. 2.45]. In our context, however, these conditions ensure the continuity, or at least the smallness of jumps, of certain elements of the subdifferentials in the directions s_k. Assumptions (a) and (b) are satisfied generically at all points of differentiability of f. At points of non-differentiability of f the assumptions do impose restrictions on the choice of the elements from the subdifferentials and enforce a continuous selection. There might be cases where such a selection is not possible, because the search direction s_k obtained with some M(z_k) may point in a direction along which M_ρ is bounded away from M(z_k). In the latter case we need at least bounds on the jump of M along a search direction.
However, we are able to monitor this situation algorithmically, since Assumption 3.1 (a) will be exploited in the subsequent lemma to prove that the search direction is a direction of descent. Hence, if the search direction turns out to be a direction of ascent, then Assumption 3.1 (a) is violated. In our numerical experiments, which rely on a discretized version of Algorithm 3.3, we have not encountered this situation.
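The jump phenomenon described above can be made concrete with a one-dimensional projection, anticipating the application in Section 4; the bounds a = −1, b = 2 and the step direction are hypothetical illustrative choices:

```python
def dP(xi, a=-1.0, b=2.0):
    # one admissible element of the generalized derivative of the pointwise
    # projection onto [a, b]: slope 1 strictly inside, slope 0 outside; at the
    # kinks xi = a and xi = b we deliberately fix the value 0
    return 1.0 if a < xi < b else 0.0

# sitting exactly on the kink xi = b = 2 and looking in the direction s = -1,
# the chosen derivative element jumps by 1 no matter how small rho > 0 becomes:
w, s = 2.0, -1.0
jumps = [abs(dP(w + rho * s) - dP(w)) for rho in (1e-1, 1e-3, 1e-6)]
```

Along this direction the jump stays equal to 1 as rho → 0+; this is precisely the situation in which one only has a bound on the jump of M along the search direction rather than continuity.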
where (·, ·)_{Z*} denotes the inner product in Z*. Then the following assertions hold: (c) Let {z_k} ⊂ Z be a sequence generated by Algorithm 2.2 and let Assumption 2.3 hold. For any s_k defined by the Newton system the assertion below holds.

Proof. (a) As is common, we identify the bidual Z** with Z* by means of the Riesz representation. We find the claimed representation; taking the limit ||h||_{Z*} → 0 yields the assertion. (b) Due to the (α-order) ∂*f-semismoothness and local Lipschitz continuity of f, the validity of assertion (a) and the boundedness of H(f), all requirements of Proposition 3.7 in [22] are satisfied and Θ is (α-order) ∂*Θ-semismooth. Thus assertion (b) holds. (c) With (a) we find the representation of ⟨V_k, s_k⟩_{Z*,Z}. The ∂*Θ-semismoothness of Θ according to (b) implies

This relation holds for arbitrary choices of the elements of the generalized differential.
Then, together with the continuity of H, we find, since ε̄ < 1, that s_k is a direction of descent of Θ at z_k unless Θ(z_k) = 0.
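The descent property can be traced to the following identity (a sketch in our notation, with V_k the element of ∂*Θ(z_k) induced by the choice M(z_k); it is consistent with the relations used in the proof of Theorem 3.4):

```latex
\langle V_k, s_k \rangle_{Z^*\!,Z}
  = \bigl( f(z_k),\, M(z_k)\, s_k \bigr)_{Z^*}
  = -\bigl( f(z_k),\, f(z_k) \bigr)_{Z^*}
  = -\,\| f(z_k) \|_{Z^*}^2
  = -\,2\,\Theta(z_k) \;\le\; 0,
% with equality if and only if f(z_k) = 0, i.e. \Theta(z_k) = 0.
```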
Hence, with f(z_k) ≠ 0, the Armijo line search in the following algorithm is well defined. The descent property of s_k with respect to Θ follows in particular from the construction of the merit function Θ.

Algorithm 3.3.
(i) Choose z_0 ∈ Z, β ∈ (0, 1), σ ∈ (0, 1/2) and set k = 0.
(ii) If a stopping criterion is satisfied: STOP.
(iii) Choose an arbitrary M(z_k) ∈ ∂*f(z_k) and compute the search direction s_k ∈ Z by solving

M(z_k) s_k = −f(z_k).

(iv) Determine the smallest i_k ∈ N_0 such that the Armijo condition

Θ(z_k + β^{i_k} s_k) ≤ Θ(z_k) + σ β^{i_k} ⟨V_k, s_k⟩_{Z*,Z} (5)

holds, and set α_k = β^{i_k}.
(v) Set z_{k+1} = z_k + α_k s_k, k := k + 1, and go to (ii).
Using (4), the inequality in (5) can equivalently be written as

Θ(z_k + β^{i_k} s_k) ≤ (1 − 2σβ^{i_k}) Θ(z_k),

hence ⟨V_k, s_k⟩_{Z*,Z} does not have to be computed explicitly. Global convergence of Algorithm 3.3 can be shown by adapting Theorem 4.2 in [6], which was derived within the framework of optimal control of ordinary differential equations.

Theorem 3.4. Let Assumptions 2.3 and 3.1 hold and let z̄ be an accumulation point of a sequence {z_k} generated by Algorithm 3.3. Then z̄ is a zero of f. Here, part (a) treats the case that the corresponding step sizes are bounded away from zero; part (b) treats the case that a subsequence of the step sizes tends to zero and additionally requires the locally uniform semismoothness of Definition 2.1 (c).

Proof. We adapt the argumentation of Theorem 4.2 in [6]. Let {z_{k_j}}_{j∈N} be a subsequence with z_{k_j} → z̄ and f(z_{k_j}) ≠ 0. Then

⟨V(z_{k_j}), s_{k_j}⟩_{Z*,Z} = −2Θ(z_{k_j}) = −||f(z_{k_j})||²_{Z*} < 0.

According to Lemma 3.2 (c), s_{k_j} is a descent direction of Θ at z_{k_j} and the line search is well defined. There are two cases: (a) Assume ᾱ := lim inf_{j→∞} α_{k_j} > 0. With σ ∈ (0, 1/2) and ᾱ ≤ α_{k_j} ≤ 1 it follows that 0 < 1 − 2σα_{k_j} ≤ 1 − 2σᾱ < 1. Repeated application yields Θ(z_{k_j}) → 0. By the continuity of f, z̄ is a zero of f.
Z is a Hilbert space and thus reflexive. According to the Eberlein–Šmulian theorem, there exists a weakly convergent subsequence {s_k}, k ∈ I ⊆ J. Hence there exists some s̄ ∈ Z such that for every V(z) ∈ Z* the corresponding pairings converge. We split the resulting difference into three terms. The first term vanishes owing to the locally uniform semismoothness of Θ, and the last term vanishes owing to (6) as k → ∞. For the remaining term we obtain, with M̄ as defined in Assumption 3.1 (b), an estimate whose first term vanishes owing to the continuity of f and the boundedness of {M(z_k)} and {s_k}, and whose second term vanishes because of Assumption 3.1 (b).
In summary we have shown that ⟨V(z_k), s_k⟩_{Z*,Z} converges to ⟨V(z̄), s̄⟩_{Z*,Z} as k → ∞, k ∈ I. The line search in step (iv) of Algorithm 3.3 yields the Armijo inequality along the accepted steps. Passing to the limit and exploiting the previous considerations yields σ⟨V(z̄), s̄⟩_{Z*,Z} = ⟨V(z̄), s̄⟩_{Z*,Z}.
Since σ ∈ (0, 1/2), this only holds for ⟨V(z̄), s̄⟩_{Z*,Z} = 0. Thus we have shown ⟨V(z̄), s̄⟩_{Z*,Z} = −2Θ(z̄) = 0. By the continuity of f, z̄ is a zero of f.

Now we can prove the transition of Algorithm 3.3 to fast local convergence, provided it converges at all. We present the proof here in detail in order to show that we do not need Assumption 3.1 (b) and the locally uniform semismoothness in Definition 2.1 (c).

Theorem 3.5. Let z̄ be an accumulation point of the sequence {z_k} generated by Algorithm 3.3 and a zero of f. If f is ∂*f-semismooth, the sequence {z_k} converges superlinearly to z̄ and α_k finally becomes 1. If f is α-order ∂*f-semismooth, the rate of convergence is 1 + α.
The idea of the proof is as follows. According to the local convergence result (Theorem 2.4), the local semismooth Newton method (with α_k = 1) converges superlinearly in some neighbourhood of a zero. Now, since z̄ is an accumulation point of the sequence {z_k}, there is a subsequence converging to z̄, which is assumed to be a zero. For k sufficiently large, z_k lies in a neighbourhood of z̄ in which we have superlinear convergence of the local method. For one step of the local method we show that α_k = 1 satisfies the Armijo condition. Hence the globalized method at z_k will accept α_k = 1, and from that point on the sequences of the local method and the globalized method coincide.
Proof. Let ε > 0 be arbitrary. The superlinear convergence result for the local algorithm (Theorem 2.4) with f semismooth implies the existence of a δ > 0 such that for all z_k with ||z_k − z̄||_Z ≤ δ we have (7). Since f(z̄) = 0, the Lipschitz continuity of f yields a corresponding estimate with a constant L > 0. Combining this estimate with (7), we obtain (8). On the other hand, with Assumption 2.3,

This yields
Together with (8) we obtain the estimate showing that α_k = 1 is accepted. With (7) and ε < 1 we can repeat the argument and obtain convergence of the whole sequence. If f is ∂*f-semismooth, the convergence is superlinear according to Theorem 2.4 (a). If f is α-order ∂*f-semismooth, the rate of convergence is 1 + α according to Th. 2.4 (b).
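The globalized iteration, including the transition to full steps α_k = 1, can be illustrated with a small scalar experiment. Here f(z) = arctan(z) is a stand-in example (smooth, hence trivially semismooth) for which the undamped Newton method diverges from z_0 = 3, while the Armijo-damped iteration of Algorithm 3.3, implemented with the equivalent acceptance test Θ(z + αs) ≤ (1 − 2σα)Θ(z), converges:

```python
import math

def f(z):
    return math.atan(z)        # smooth, hence trivially semismooth

def M(z):
    return 1.0 / (1.0 + z * z) # the classical derivative of f

def theta(z):
    return 0.5 * f(z) ** 2     # merit function (3): Theta(z) = 1/2 |f(z)|^2

def globalized_newton(z0, beta=0.5, sigma=0.1, tol=1e-12, maxit=100):
    z = z0
    for _ in range(maxit):
        if abs(f(z)) <= tol:
            break
        s = -f(z) / M(z)                   # Newton direction
        alpha = 1.0
        # Armijo backtracking in the equivalent form of the text
        while theta(z + alpha * s) > (1.0 - 2.0 * sigma * alpha) * theta(z):
            alpha *= beta
        z = z + alpha * s
    return z

root = globalized_newton(3.0)
```

After a few damped steps the full step is accepted and the iteration converges with the fast local rate, mirroring the transition behaviour proved above.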
Thereby we have proved the transition to fast local convergence of Algorithm 3.3, requiring only Assumption 3.1 (a) on the directional Hadamard-like continuity of the Newton matrix and a standard assumption on its uniform non-singularity.

4. Application to optimal control. In control-constrained optimal control subject to PDE constraints, the reformulation of the necessary optimality conditions leads to problems of the type of (1). For Hilbert spaces Y, W, U and a feasible set U_ad ⊂ U, such an optimal control problem is, in general, given as

min J(y, u) subject to E(y, u) = 0, u ∈ U_ad. (P1)

Here y denotes the state, while u is the control. Semismooth Newton methods for optimal control problems in the general setting of (P1) are investigated in [22]; for more special cases see also [9,12,13,20,24]. Globalization strategies for semismooth Newton methods applied to a reduced formulation of (P1) are investigated, e.g., in [12,13,22]. Although Algorithm 3.3 would be applicable to this reduced problem as well, we want to solve the full Karush-Kuhn-Tucker (KKT) system related to (P1). In [22] this is referred to as the all-at-once approach.
As usual, the Lagrange function L : Y × U × W → R is defined as

L(y, u, w) := J(y, u) + ⟨E(y, u), w⟩_{W*,W}.

The KKT system is then given by the following lemma.

Lemma 4.1 (cf. Corollary 1.3 in [11]). Let (ȳ, ū) be a solution of (P1). Furthermore, let J : Y × U → R and E : Y × U → W* be continuously Fréchet-differentiable. Then there exists a Lagrange multiplier w̄ ∈ W such that the following optimality conditions hold:

L_y(ȳ, ū, w̄) = J_y(ȳ, ū) + E_y(ȳ, ū)* w̄ = 0,

together with the state equation and the variational inequality for the control. We consider the Euclidean projection P_{U_ad} onto the set of admissible controls and introduce the continuous function π reformulating the variational inequality in (13). For the further analysis let Ω ⊂ R^n be an open, bounded, measurable set with measure µ(Ω) > 0. We define the set of control functions as U := L²(Ω). For simplicity we consider pointwise control bounds, using

U_ad := {u ∈ L²(Ω) : a ≤ u(x) ≤ b for almost all x ∈ Ω}

as the feasible set. The pointwise projection P_{U_ad} : R → R, ξ ↦ max{a, min{ξ, b}}, is (1-order) semismooth [22, Ex. 5.23]. Note that P_{U_ad} is not locally uniformly semismooth in any neighbourhood of a resp. b. In order to establish semismoothness of the superposition operator Π we follow [22, Assumption 5.20] and assume the following:

Assumption 4.2. The following conditions hold: (a) E : Y × L²(Ω) → W* and J : Y × L²(Ω) → R are twice continuously Fréchet-differentiable. (b) L_u has the form L_u(y, u, w) = λu + G(y, u, w), and there exist λ > 0 and p > 2 such that G(y, u, w) ∈ L^p(Ω) and the operator (y, u, w) ∈ Y × L²(Ω) × W ↦ G(y, u, w) ∈ L^p(Ω) is locally Lipschitz continuous.

Table 1. Step size α_k, norm ||f(z_k)||_{Z*} and norm of the search direction ||s_k||_Z for the k-th iterate. These numerical results exhibit the superlinear convergence.
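The pointwise projection P_{U_ad} and one admissible element of its generalized derivative can be sketched as follows; the bounds a = −1, b = 2 are arbitrary illustrative values:

```python
import numpy as np

a, b = -1.0, 2.0   # hypothetical control bounds

def proj(xi):
    # pointwise projection P_{U_ad}(xi) = max{a, min{xi, b}}
    return np.clip(xi, a, b)

def d_proj(xi):
    # an element of the generalized derivative of the projection: slope 1 on the
    # inactive set a < xi < b, slope 0 on the active set; at the kinks any value
    # in [0, 1] would be admissible, here we pick 0
    return np.where((xi > a) & (xi < b), 1.0, 0.0)
```

Applied componentwise to a discretized control, proj realizes the superposition operator Π; in a discrete realization (an assumption on our part, not spelled out in the text) d_proj would supply the diagonal entries of the derivative element of the projection.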
As already mentioned in Section 2, a generalized differential of (15) is not given naturally. Motivated by the sum and chain rules in finite dimensions in [22], the set-valued mapping ∂_C f with elements

M(y, u, w) =
( L_yy(y, u, w)    L_yu(y, u, w)        E_y(y, u)*
  γD G_y(y, u, w)  I + γD G_u(y, u, w)  γD G_w(y, u, w)
  E_y(y, u)        E_u(y, u)            0 ),            (16)

where D denotes a (pointwise chosen) element of the generalized derivative of the projection, is used as a generalized differential for (15). Here the subscript "C" emphasizes the close relation to Qi's C-subdifferential in finite dimensions.

Theorem 4.3. Let Assumption 4.2 hold. Then f as in (15) is locally Lipschitz continuous and ∂_C f-semismooth.
For a proof of Th. 4.3 the reader is referred to [22, Th. 5.21]. Therefore f as in (15) meets the requirements needed for Algorithms 2.2 and 3.3; the merit function in (3) for problem (P1) is thus given as Θ(y, u, w) = 1/2 ||f(y, u, w)||²_{Z*}.

5. Numerical examples. In this section we start with a numerical example for the application of Algorithm 3.3 to problems of the type of (P1).

5.1. Semilinear elliptic PDE with control constraints.
For simplicity we restrict ourselves at first to the following control-constrained optimal control problem subject to a semilinear elliptic PDE, which is taken from [10, Ch. 6] and motivated by an application in superconductivity: minimize J(y, u) subject to the semilinear state equation and the pointwise control constraints (P2). The desired state is y_d := (1/6) sin(2πx_1) sin(2πx_2) exp(2x_1), and the Tikhonov parameter is chosen as λ := 10^{-3}. We convince ourselves that Assumption 4.2 is fulfilled for this example, where Y = W = H¹_0(Ω), Y* = W* = H^{-1}(Ω) and L_u(y, u, w) = λu + w, so that G(y, u, w) = w has the required regularity. Hence Assumption 4.2 is satisfied. It is not clear how to check a priori whether Assumption 2.3 and Assumption 3.1 (a) hold in general in an infinite-dimensional function space.
However, for both our examples the well-posedness of the linear operator M and the optimal control problem in a function space setting can be shown by means of the Lax-Milgram theorem (see [10,Sect. 4 & 5] and [9,Sect. 3], respectively), exploiting a sufficient second-order condition [3] for the semilinear state equation. Otherwise, we can see within our numerics whether Assumption 2.3 holds.
We verify Assumption 3.1 (a) for our undiscretized example, assuming that the Tikhonov parameter λ is sufficiently large. In the matrix M only the component γD G_w(y, u, w) is non-differentiable (see (16)), where in our example G(y, u, w) = w. Thus, exploiting that E_y depends continuously on z in our example and that α_k ≤ 1, we estimate the relevant difference, where we use (2) in the last estimate. In the case ||∂_C P_{U_ad}(γ(w_k + ρ s_{k,3})) − ∂_C P_{U_ad}(γ w_k)||_{L(Z,Z*)} = 0, Assumption 3.1 (a) holds immediately with ε̄ = 0. However, once the element M(z_k) of the subdifferential has been fixed at a point of non-differentiability, with ∂_C P_{U_ad}(γ w_k) ∈ (0, 1] (or ∈ [0, 1), alternatively), it may happen that M_ρ can only be chosen such that ∂_C P_{U_ad}(γ(w_k + ρ s_{k,3})) = 0 (or = 1, respectively) at a point of differentiability. In the latter situation ||∂_C P_{U_ad}(γ(w_k + ρ s_{k,3})) − ∂_C P_{U_ad}(γ w_k)||_{L(Z,Z*)} ∈ (0, 1], and this does not vanish as ρ tends to zero from above. However, the constant C_{M⁻¹} is uniform, and for sufficiently large λ = 1/γ we may still guarantee Assumption 3.1 (a). This estimate is also supported by numerical evidence: we always observe descent directions, unless we reduce λ significantly.

For the approximation of the state as well as the control, piecewise linear finite elements are used. In this context, let T_h be a triangulation of Ω for which the usual regularity assumptions hold. Furthermore, let h := max_{T∈T_h} diam(T) denote the mesh size. For details we refer to [2]. Analogously, for a finite element discretization the well-posedness of the Newton step and of the discretized problem can be derived for both examples [10,9], together with a mesh-independence result for the local algorithm under the further assumption that the set on which strict complementarity is violated has measure zero. As starting point we use z_0 = 0. Our stopping condition is ||f(z_k)||_{Z*} ≤ 10^{-10}. Table 1 shows the iteration history for h = 1/256.
In iterations 1 to 17 the step sizes satisfy α_k < 1 due to the Armijo line search. As predicted by Theorem 3.5, the acceptance of α_k = 1 by the Armijo line search leads to the transition of Algorithm 3.3 into Algorithm 2.2. The expected superlinear convergence is confirmed by the evolution of the values ||f(z_k)||_{Z*} and ||s_k||_Z. We observe that α_k is uniformly bounded below and, corresponding to Th. 3.4 (a) and Th. 3.5, the numerical accumulation point is a solution. Figure 1 shows the discrete solution of the optimal state y_h and the optimal control u_h for h = 1/64.

5.2. Semilinear elliptic PDE with state constraints. Problem (P2) has illustrated the performance of our algorithm. However, problems of the type of (P2) could already be solved before, see e.g. [10,12,13]. Hence we close with an example for a semilinear elliptic PDE with state constraints [9, Ch. 6], where, to our knowledge, previous globalization strategies cannot be applied. We would like to track the state y_d := (1/2) cos(πx_1) cos(πx_2) exp(x_1). As Tikhonov parameter we consider λ := 10^{-4}. We use a Lavrentiev regularization of the state constraint in (P3),

−10^{-2} ≤ εu + y ≤ 0,

where ε > 0 is a sufficiently small parameter, yielding a mixed control-state constraint.
Note that we consider the same objective functional as in (P2), but in addition to the regularized state constraints, we allow for a term y³ exp(10y) instead of y³ in the PDE. However, the mixed control-state constraint yields a slightly different setting than in Sect. 4. Concerning the validation of Assumption 2.3 and Assumption 3.1 (a), see the discussion for the first example. However, with respect to Assumption 3.1 (a), we have to keep in mind that the Lavrentiev regularization parameter ε and the Tikhonov parameter λ should not be chosen independently.

Table 2. Iteration history for the solution of problem (P3) for h = 1/128. Step size α_k, norm ||f(z_k)||_{Z*} and norm of the search direction ||s_k||_Z for the k-th iterate. We observe transition to local superlinear convergence.
Again, we work with the stopping condition ||f(z_k)||_{Z*} ≤ 10^{-10}. For numerical results with ε = 10^{-3} as Lavrentiev regularization parameter, see Table 2 and Fig. 2.