Tikhonov regularization of optimal control problems governed by semi-linear partial differential equations

In this article, we consider the Tikhonov regularization of an optimal control problem for semilinear partial differential equations with box constraints on the control. We derive a-priori regularization error estimates for the control under suitable conditions. These conditions comprise second-order sufficient optimality conditions as well as regularity conditions on the control, which consist of a source condition and a condition on the active sets. In addition, we show that parts of these conditions are also necessary for convergence rates. We also consider sparse optimal control problems and derive regularization error estimates for them. Numerical experiments underline the theoretical findings.


Introduction
We consider the following optimal control problem (P): Minimize $J(u) = \frac12 \|y_u - y_d\|_{L^2(\Omega)}^2$ subject to the semilinear state equation (1.1) and the box constraints $u_a \le u \le u_b$ a.e. in $\Omega$. The standing assumptions on the data of the problem will be made precise below.
Since the cost functional $J$ depends on $u$ only implicitly through the solution $y_u$ of the state equation, the control problem is not coercive with respect to $u$ in suitable spaces. Optimal controls of (P) may exhibit a bang-bang structure, where the control constraints are active on the whole domain, i.e., $u(x) \in \{u_a(x), u_b(x)\}$ almost everywhere. In addition, due to the nonlinear constraint (1.1) the resulting optimal control problem is non-convex. This makes the analysis and numerical solution of this problem challenging. To address this issue, we investigate the Tikhonov regularization of the problem given by: Minimize $J_\alpha(u) := J(u) + \frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2$ subject to the semilinear equation and the control box constraints, where $\alpha > 0$ is the Tikhonov regularization parameter. We are interested in the convergence of solutions or stationary points $u_\alpha$ of the regularized problems for $\alpha \searrow 0$. Under suitable conditions, we prove in Section 4 convergence rates of the type $\|u_\alpha - \bar u\|_{L^2(\Omega)} = O(\alpha^{d/2})$ for $\alpha \searrow 0$, see Theorem 4.4. This is the main result of the paper, and it is the first convergence rate result for the regularization of optimal control problems subject to nonlinear partial differential equations. In addition, we also derive necessary conditions for convergence rates. As it turns out, a certain source condition is necessary to obtain convergence rates, see Section 5.
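The smoothing effect of the Tikhonov term can already be seen in a finite-dimensional caricature of the regularized problem with a linear, diagonal forward operator. The following sketch is purely illustrative: the operator $K$, the grid, and all names below are our own choices, not part of the paper's PDE setting.

```python
# Caricature of the regularized problem: minimize
#     0.5*|K u - y_d|^2 + 0.5*alpha*|u|^2
# over the box [-1, 1]^n.  With K diagonal the problem decouples, and each
# component has the closed-form solution
#     u_i = clip(k_i * y_i / (k_i**2 + alpha), -1, 1).

def solve_tikhonov(k, yd, alpha, ua=-1.0, ub=1.0):
    return [min(max(ki * yi / (ki * ki + alpha), ua), ub)
            for ki, yi in zip(k, yd)]

# "exact" bang-bang control and attainable data y_d = K u_bar
k = [1.0, 0.8, 0.5, 0.3, 0.1]          # decaying singular values: ill-conditioning
ubar = [1.0, -1.0, 1.0, 1.0, -1.0]
yd = [ki * ui for ki, ui in zip(k, ubar)]

errs = []
for alpha in (1e-1, 1e-2, 1e-3):
    u = solve_tikhonov(k, yd, alpha)
    errs.append(sum((a - b) ** 2 for a, b in zip(u, ubar)) ** 0.5)

print(errs)  # the regularization error decreases monotonically as alpha -> 0
```

Here each component of the error is $\alpha/(k_i^2 + \alpha)$, so the error vanishes as $\alpha \searrow 0$, but more slowly on components with small $k_i$; this mimics the loss of coercivity discussed above.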
In the subsequent analysis, we will make use of the second-order conditions developed by Casas [1]. They require positive definiteness of the second derivative $J''$ of the reduced cost functional with respect to solutions of linearized equations, see (2.8) below. A second ingredient is a condition on the optimal control and adjoint state of the original problem. This condition was used earlier for convex problems to prove convergence rates for Tikhonov regularization in [18]. The present paper continues these investigations and generalizes the convergence rate results to optimal control problems with nonlinear state equations.
We also investigate sparse control problems given by: Minimize $J(u) = \frac12\|y_u - y_d\|_{L^2(\Omega)}^2 + \beta \|u\|_{L^1(\Omega)}$ such that $u_a \le u \le u_b$ a.e. in $\Omega$, where $\beta > 0$ is a parameter. This is a non-smooth variant of the control problem above. Again we study the Tikhonov regularization and derive error estimates, see Section 6.
Optimal control of semilinear partial differential equations has been studied intensively in the literature; we refer to the monograph [14]. In recent years, there has also been growing interest in sparse optimal control problems, starting with [12], see also [1,3]. Tikhonov regularization and its convergence was studied in [15,16,19] in connection with linear-quadratic optimal control problems. As we show in this paper, the results obtained for linear equations can be carried over using similar techniques while relying heavily on the second-order condition of Casas [1].
The work on regularization of optimal control problems is closely connected to the regularization of nonlinear inverse problems: if no control constraints are present, i.e., $U_{ad} = L^2(\Omega)$, problem (P) is a heat-source identification problem, which amounts to a nonlinear, ill-posed operator equation. Tikhonov regularization of nonlinear equations is studied, e.g., in the monograph [6]. Necessary conditions for convergence rates for nonlinear problems can be found in [10]. Regularization of variational inequalities was studied in [9]. In some sense, our results generalize results from inverse problems theory: if no control constraints are present, our regularity conditions reduce to well-known source conditions.

The paper is structured as follows. In Section 2 we introduce the tools needed later for the convergence analysis, e.g., the second-order sufficient condition and our regularity assumption. A stability analysis of the Tikhonov regularization for $\alpha \to 0$ is carried out in Section 3. The associated convergence rates are established in Section 4. That the regularity assumption is also necessary for these convergence rates is shown in Section 5. In Section 6 we extend our analysis to a sparsity-promoting objective functional and establish convergence rates under a suitably modified regularity assumption. Numerical results are provided in Section 7.

Assumptions and preliminary results
In the sequel, we will make use of the following assumptions, see [1]. To shorten our notation, we will denote the partial derivatives $\frac{\partial}{\partial y} f$ and $\frac{\partial^2}{\partial y^2} f$ by $f'$ and $f''$, respectively.
(A1) The derivative $f'$ is nonnegative, and for all $M > 0$ there exists a constant $C_{f,M} > 0$ such that $|f'(x,y)| + |f''(x,y)| \le C_{f,M}$ for all $x \in \Omega$ and $|y| \le M$.
(A2) The coefficients of the operator $A$ satisfy $a_{ij} \in C(\Omega)$, and there exists some $\lambda_A > 0$ such that $\sum_{i,j} a_{ij}(x)\,\xi_i \xi_j \ge \lambda_A |\xi|^2$ for all $\xi \in \mathbb{R}^n$ and almost all $x \in \Omega$. Under these assumptions we can establish the following results. Existence and uniqueness of solutions of the state equation are well known, see, e.g., [2, Thm. 2.1].
For convenience, let us introduce the space $Y := H^1_0(\Omega) \cap C(\Omega)$ endowed with the norm $\|y\|_Y := \|y\|_{H^1_0(\Omega)} + \|y\|_{C(\Omega)}$. Then Theorem 2.1 implies the existence of $M > 0$ such that $\|y_u\|_{C(\Omega)} \le M$ for all $u \in U_{ad}$. (2.2) In addition, $S$ maps weakly converging sequences to strongly converging sequences: Lemma 2.2. Let $(u_k)$ be a sequence in $U_{ad}$ converging weakly in $L^2(\Omega)$ to $u$. Then the associated sequence of states $(y_k)$ converges strongly in $Y$ to $y_u$.

Existence of solutions
The existence of solutions of the optimal control problem can be proved by classical arguments. Theorem 2.3. Problem (P) has at least one solution $\bar u$ with an associated state $\bar y \in H^1_0(\Omega) \cap C(\Omega)$. The derivatives of the control-to-state map $S$ can be characterized by the following systems. Let $u \in L^p(\Omega)$ be given with $y_u := S(u)$. Then $z := S'(u)v$ is the unique weak solution of the linearized equation $Az + f'(y_u)\,z = v$ in $\Omega$, $z = 0$ on $\partial\Omega$. In addition, let us introduce the adjoint state $p_u$ associated to $u$ as the unique weak solution of the adjoint equation $A^* p + f'(y_u)\,p = y_u - y_d$ in $\Omega$, $p = 0$ on $\partial\Omega$. Using these expressions, the derivatives of the cost functional $J$ are given by the following lemma.
Lemma 2.4. The functional $J : L^2(\Omega) \to \mathbb{R}$ is of class $C^2$, and its first and second derivatives are given by $J'(u)v = \int_\Omega p\,v \,dx$ and $J''(u)(v_1,v_2) = \int_\Omega \big(1 - p\,f''(y)\big)\, z_{v_1} z_{v_2}\,dx$, where we used the notation $y := y_u$, $p := p_u$, $z_{v_i} := S'(u)v_i$.
Let us recall the first-order necessary optimality conditions. Theorem 2.5. Let $\bar u$ be a local solution of problem (P). Then, with $\bar y := S(\bar u)$ and $\bar p := p_{\bar u}$, the following system is satisfied:

$A\bar y + f(\bar y) = \bar u$ in $\Omega$, $\quad \bar y = 0$ on $\partial\Omega$, (2.5)

$A^* \bar p + f'(\bar y)\,\bar p = \bar y - y_d$ in $\Omega$, $\quad \bar p = 0$ on $\partial\Omega$, (2.6)

$\int_\Omega \bar p\,(u - \bar u)\,dx \ge 0 \quad \forall u \in U_{ad}$. (2.7)
Let us close this section with the following stability result regarding the solutions of the adjoint equations. Lemma 2.6. Let $\bar u \in U_{ad}$ be given with associated state $\bar y$ and adjoint state $\bar p$. Then there is a constant $c > 0$ such that for all $u \in U_{ad}$ it holds $\|p_u - \bar p\|_Y \le c\,\|y_u - \bar y\|_{L^2(\Omega)}$. Proof. Let us denote $y := y_u$ and $p := p_u$. Then the difference $p - \bar p$ of the adjoint states satisfies $A^*(p - \bar p) + f'(y)(p - \bar p) = y - \bar y + \big(f'(\bar y) - f'(y)\big)\bar p$ in $\Omega$, $p - \bar p = 0$ on $\partial\Omega$. Due to the Lax-Milgram theorem and Stampacchia's estimates [13, Théorème 4.2], there is $c > 0$ such that $\|p - \bar p\|_Y \le c\,\big(\|y - \bar y\|_{L^2(\Omega)} + \|(f'(\bar y) - f'(y))\bar p\|_{L^2(\Omega)}\big)$. Since $\bar p$ is the solution of a linear elliptic equation with right-hand side in $L^2(\Omega)$, we know $\bar p \in L^\infty(\Omega)$. Hence, using the assumptions on $f$ with $M$ given by (2.2), we can estimate $\|(f'(\bar y) - f'(y))\bar p\|_{L^2(\Omega)} \le C_{f,M}\,\|\bar p\|_{L^\infty(\Omega)}\,\|y - \bar y\|_{L^2(\Omega)}$, and the claim is proven.

Sufficient second-order optimality conditions
To formulate the sufficient second-order conditions we will need the following notation. Following Casas [1], we define for $\tau > 0$ the extended critical cone at $u \in U_{ad}$ by $C^\tau_u := \{v \in L^2(\Omega) : v \ge 0$ where $u = u_a$, $v \le 0$ where $u = u_b$, $v = 0$ where $|p_u| > \tau\}$. The second-order condition now reads as follows.
Assumption SOSC (Second-order sufficient condition). Let $\bar u \in U_{ad}$ be given. Assume that there exist $\delta > 0$ and $\tau > 0$ such that $J''(\bar u)v^2 \ge \delta\,\|z_v\|^2_{L^2(\Omega)}$ for all $v \in C^\tau_{\bar u}$, (2.8) where we used the notation $z_v := S'(\bar u)v$. This condition together with the first-order necessary conditions implies local optimality, see [1].
Theorem 2.7. Let us assume that $\bar u$ is a feasible control for problem (P) satisfying the first-order optimality conditions (2.5)-(2.7) and the second-order condition SOSC. Then there exists $\varepsilon > 0$ such that $J(\bar u) + \frac{\delta}{2}\,\|z_{u - \bar u}\|^2_{L^2(\Omega)} \le J(u)$ for all $u \in U_{ad}$ with $\|u - \bar u\|_{L^2(\Omega)} \le \varepsilon$.

Regularity conditions
In order to derive regularization error estimates for the control, we assume some regularity of $\bar u$. We say that $\bar u$ satisfies the assumption ASC if the following holds.
Assumption ASC (Active Set Condition). Let $\bar u$ be a local solution of (P).
Assume that there exist a set $I \subseteq \Omega$, a function $w \in Y$, and positive constants $\kappa, c$ such that the following holds: 1. (source condition) $I \supset \{x \in \Omega : \bar p(x) = 0\}$ and $\chi_I\,\bar u = \chi_I\,P_{U_{ad}}(w)$; 2. (structure of active set) $A := \Omega \setminus I$ and for all $\varepsilon > 0$ it holds $\big|\{x \in A : |\bar p(x)| \le \varepsilon\}\big| \le c\,\varepsilon^\kappa$. This assumption is a combination of a source condition and a regularity assumption on the active sets. Similar regularity assumptions were used, e.g., in [11,17,18,21] for problems with affine-linear control-to-state mapping $S$. Note that in the special case $A = \Omega$ the solution $\bar u$ is of bang-bang structure. Under this regularity assumption we can establish an improved first-order necessary condition, see [11].
Theorem 2.8. Let $\bar u$ satisfy assumption ASC. Then there is $c > 0$ such that it holds

Convergence of the Tikhonov regularization
Let us introduce the Tikhonov-regularized optimal control problem associated to (P). Let $\alpha > 0$ be given. Then the regularized problem reads: Minimize $J_\alpha(u) := J(u) + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)}$ over $u \in U_{ad}$, where $y_u$ denotes again the solution of the semilinear partial differential equation (1.1). Clearly, the regularized problem admits solutions. At first, we want to show that weak limit points of global solutions $(u_\alpha)_\alpha$ for $\alpha \searrow 0$ are again global solutions of (P). In addition, we show that every strict local solution of (P) can be obtained as a limit of local solutions of $(P_\alpha)$. The results and the proofs are very similar to [4, Section 4], but since the proofs are short we present them here.

Lemma 3.1. Let $(u_\alpha)_{\alpha > 0}$ be a family of global solutions of $(P_\alpha)$ such that $u_\alpha \rightharpoonup u_0$ in $L^2(\Omega)$. Then $u_0$ is a global solution of (P). In addition, $u_\alpha \to u_0$ strongly in $L^2(\Omega)$.

Proof. For every $u \in U_{ad}$ we have $J(u_0) \le \liminf_{\alpha \searrow 0} J_\alpha(u_\alpha) \le \limsup_{\alpha \searrow 0} J_\alpha(u) = J(u)$. Since $u \in U_{ad}$ was arbitrary, it follows that $u_0$ is a global solution of (P). Let us now prove the strong convergence $u_\alpha \to u_0$ in $L^2(\Omega)$. On the one hand, the weak lower semicontinuity of the norm yields $\|u_0\|_{L^2(\Omega)} \le \liminf_{\alpha \searrow 0} \|u_\alpha\|_{L^2(\Omega)}$. On the other hand, let $u$ be a global solution of (P). Then $J(u) \le J(u_\alpha)$ and $J_\alpha(u_\alpha) \le J_\alpha(u)$ imply $\|u_\alpha\|_{L^2(\Omega)} \le \|u\|_{L^2(\Omega)}$ for all $\alpha > 0$. Since $u_0$ is itself a global solution, this shows $\limsup_{\alpha \searrow 0} \|u_\alpha\|_{L^2(\Omega)} \le \|u_0\|_{L^2(\Omega)}$, hence the norms converge, which together with the weak convergence finishes the proof.

This result shows that weak limit points of global solutions of $(P_\alpha)$ are global solutions of minimal norm of (P). Since this problem is non-convex in general, such minimal-norm solutions may not be uniquely determined.
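The role of the norm estimate in the proof above can be illustrated with the classical example of a sequence that converges weakly but not strongly; without convergence of the norms, weak convergence cannot be upgraded to strong convergence. The grid, the test function, and all names in the following sketch are our own illustrative choices.

```python
import math

# Classical example: u_k(x) = sin(k*pi*x) on (0,1) converges weakly to 0
# (integrals against a fixed test function vanish as k grows) while
# ||u_k||_{L^2} stays at 1/sqrt(2), so the convergence is not strong.

def l2_inner(f, g, n=20000):
    # composite midpoint rule on (0,1)
    h = 1.0 / n
    return sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n)) * h

def u_k(k):
    return lambda x: math.sin(k * math.pi * x)

def g(x):
    return x * (1 - x)       # a fixed smooth test function

pairings = [abs(l2_inner(u_k(k), g)) for k in (1, 10, 100)]
norms = [l2_inner(u_k(k), u_k(k)) ** 0.5 for k in (1, 10, 100)]
print(pairings)   # decays toward 0
print(norms)      # stays near 1/sqrt(2) ~ 0.707
```

This is why Lemma 3.1 derives the extra inequality $\|u_\alpha\|_{L^2(\Omega)} \le \|u\|_{L^2(\Omega)}$ before concluding strong convergence.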
Using the second-order optimality condition and the growth estimate of Theorem 2.7, we can establish the following a-priori error estimate for the states and adjoints. Analogous results were obtained in [18] for the case of a linear state equation.
Proof. Using Theorem 2.7 and the fact that $J_\alpha(u_\alpha) \le J_\alpha(\bar u)$, we obtain the desired estimate for the states. Together with the strong convergence $u_\alpha \to \bar u$, this proves the first part of the claim. The second part follows directly from Lemma 2.6.

Convergence rates
The results of Theorems 3.2 and 3.3 provide convergence results and a-priori rates. However, numerical computations reveal that the a-priori rates are suboptimal, see, e.g., the numerical examples in Section 7. In addition, it is hard to guarantee that optimization algorithms deliver globally or locally optimal controls. Hence, we will assume in the subsequent analysis that only stationary points $u_\alpha$ of $(P_\alpha)$ are available. Recall that $u_\alpha$ is a stationary point if it satisfies the variational inequality $\int_\Omega (\alpha u_\alpha + p_{u_\alpha})(u - u_\alpha)\,dx \ge 0$ for all $u \in U_{ad}$. Furthermore, one observes that in many applications the optimal control $\bar u$ exhibits a bang-bang structure, as $y_d$ is not reachable, i.e., there exists no feasible control $u \in U_{ad}$ such that $y_d = S(u)$. In this section we want to prove convergence rates under our regularity assumption ASC, which is suitable for bang-bang solutions. The regularity assumption ASC was used in [11,17,18,21] to establish convergence rates for an affine-linear control-to-state mapping. First we need some technical results, which will be helpful later on.
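For the $L^2$ Tikhonov term, the stationarity condition is pointwise equivalent to the projection formula $u_\alpha(x) = P_{[u_a(x),u_b(x)]}\big({-}p_{u_\alpha}(x)/\alpha\big)$. The following sketch illustrates how this projection produces almost bang-bang controls for small $\alpha$; the adjoint $p$ used here is made up for illustration and is not the paper's adjoint state.

```python
def proj(v, ua=-1.0, ub=1.0):
    # pointwise projection onto the box [ua, ub]
    return min(max(v, ua), ub)

def u_alpha(p, alpha):
    # projection formula: u_alpha(x) = Proj_[ua,ub](-p(x)/alpha)
    return [proj(-pi / alpha) for pi in p]

n = 101
xs = [i / (n - 1) for i in range(n)]
p = [x - 0.5 for x in xs]            # hypothetical adjoint with a sign change

fracs = []
for alpha in (1e-1, 1e-3):
    u = u_alpha(p, alpha)
    # fraction of grid points where the control sits (numerically) at a bound
    fracs.append(sum(1 for ui in u if abs(ui) > 0.999) / n)

print(fracs)  # fraction at the bounds grows toward 1 as alpha -> 0
```

Where $p \ne 0$, the control tends to $-\operatorname{sign}(p)$, i.e., to the bang-bang values; only near the zero set of $p$ does it remain intermediate, which is exactly where the structural condition in ASC controls the measure.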
Proof. This can be proven following the lines of [1, Corollary 2.8].
The following lemma is an extension of [1, Lemma 2.7].
Proof. Let us denote the states and adjoints corresponding to $u_\alpha$ and $\bar u$ by $y_\alpha, p_\alpha$ and $\bar y, \bar p$, respectively. Due to Lemma 2.2 we obtain $y_\alpha \to \bar y$ and $p_\alpha \to \bar p$ in $L^\infty(\Omega)$. Let us define $z_{\alpha,v} := S'(u_\alpha)v$ and $z_v := S'(\bar u)v$. According to Lemma 2.4 we can write the difference $J''(u_\alpha)v^2 - J''(\bar u)v^2$ as a sum of two integrals. The absolute value of the first integral can be made smaller than $\frac{\varepsilon}{2}\|z_v\|^2_{L^2(\Omega)}$ for $\alpha$ small enough due to $y_\alpha \to \bar y$ and $p_\alpha \to \bar p$ in $L^\infty(\Omega)$. Let us observe that $(z_{\alpha,v})$ is uniformly bounded in $Y$. It remains to study the difference $z_{\alpha,v} - z_v$, which satisfies the differential equation $A(z_{\alpha,v} - z_v) + f'(y_\alpha)(z_{\alpha,v} - z_v) = \big(f'(\bar y) - f'(y_\alpha)\big)z_v$ in $\Omega$, $z_{\alpha,v} - z_v = 0$ on $\partial\Omega$. Arguing as in Lemma 2.6, we find $\|z_{\alpha,v} - z_v\|_Y \le c\,\|(f'(\bar y) - f'(y_\alpha))z_v\|_{L^2(\Omega)}$. Note that the constant $c$ is independent of $y_\alpha$, which is a consequence of the nonnegativity of $f'$. This estimate also implies the existence of $c > 0$ independent of $\alpha$ such that $\|z_{\alpha,v} - z_v\|_{L^2(\Omega)} \le c\,\|y_\alpha - \bar y\|_{L^\infty(\Omega)}\,\|z_v\|_{L^2(\Omega)}$. This shows that the second integral can be made smaller than $\frac{\varepsilon}{2}\|z_v\|^2_{L^2(\Omega)}$ for $\alpha$ small enough. The following result uses the regularity assumption on the optimal control.
which yields the result.
We now have everything at hand to establish convergence rates for the control. We want to point out that we only need weak convergence of the sequence $(u_\alpha)_\alpha$.
Theorem 4.4. Let $\bar u$ satisfy Assumption ASC, and let the assumptions of Theorem 2.7 hold for $\bar u$. Let $(u_\alpha)_\alpha$ be a family of stationary points converging weakly in $L^2(\Omega)$ to $\bar u$ for $\alpha \searrow 0$. Then it holds with $d := \min(\kappa, 1)$ for $\alpha \searrow 0$ that $\|u_\alpha - \bar u\|_{L^2(\Omega)} = O(\alpha^{d/2})$. In the case $w = 0$ or $A = \Omega$, these convergence rates are obtained with $d := \kappa$.
Proof. By the first-order optimality conditions of $u_\alpha$ we know $\int_\Omega (\alpha u_\alpha + p_\alpha)(u - u_\alpha)\,dx \ge 0$ for all $u \in U_{ad}$. (4.9) Due to Assumption ASC, Theorem 2.8 gives an improved variational inequality for $\bar u$. Using $\bar u$ and $u_\alpha$ as test functions in these inequalities and adding them yields a first estimate. Using Lemma 4.3, we obtain by Young's inequality a bound with $C > 0$ independent of $\alpha$. By Taylor expansion, we obtain a second-order remainder term with $\tilde u_\alpha$ between $u_\alpha$ and $\bar u$. Let us argue that $u_\alpha - \bar u$ lies in the extended critical cone $C^\tau_{\bar u}$. Since $u_\alpha \rightharpoonup \bar u$ in $L^2(\Omega)$, it follows from Theorem 2.1, Lemma 2.2, and Lemma 2.6 that $p_\alpha \to \bar p$ in $L^\infty(\Omega)$. Hence, for all $\alpha$ sufficiently small we obtain $|\alpha u_\alpha + p_\alpha| > \tau/2$ and $\operatorname{sign}(\alpha u_\alpha + p_\alpha) = \operatorname{sign}(\bar p)$ on the set where $|\bar p| > \tau$ is satisfied. If we choose $\alpha$ small enough, then also $\tau/2 > \alpha \max(\|u_a\|_{L^\infty}, \|u_b\|_{L^\infty})$ holds. The variational inequality (4.9) then implies $u_\alpha = \bar u$ on $\{|\bar p| > \tau\}$. Consequently, $u_\alpha - \bar u \in C^\tau_{\bar u}$ holds for all $\alpha$ sufficiently small. Hence, we can apply the second-order condition SOSC at $\bar u$ to obtain $J''(\bar u)(u_\alpha - \bar u)^2 \ge \delta\,\|z_{u_\alpha - \bar u}\|^2_{L^2(\Omega)}$. In addition (see Lemma 4.2), we find that $J''(\tilde u_\alpha)(u_\alpha - \bar u)^2 \ge \frac{\delta}{2}\,\|z_{u_\alpha - \bar u}\|^2_{L^2(\Omega)}$ for all $\alpha$ sufficiently small. Collecting the estimates above proves the claim.
Convergence rates for the state and adjoint state can now easily be obtained. Remark 4.6. The convergence rates obtained in Theorem 4.4 and Corollary 4.5 resemble the rates obtained for the control of a linear partial differential equation, see [15,16], which improved on the results of [18].

Necessity of the regularity condition
In this section we will show that the regularity assumption ASC is necessary to obtain the convergence rates provided by Theorem 4.4. In the case of a linear state equation, such results were obtained in [15,16,20]. As it turns out, these results can be transferred to the nonlinear case with suitable modifications.
Theorem 5.1. Let us assume that $\{x \in \Omega : \bar p(x) = 0\} \subset A^c$ holds for some given set $A \subset \Omega$. Furthermore assume that there exists a constant $\sigma > 0$ such that the corresponding lower bound holds. Let $(u_\alpha)_\alpha$ be a family of stationary points of $(P_\alpha)$. Suppose that a convergence rate with exponent $\kappa$ holds for some $\kappa > 1$ and all $\alpha > 0$ sufficiently small. Then there is $c > 0$ such that the relation $\big|\{x \in A : |\bar p(x)| \le \varepsilon\}\big| \le c\,\varepsilon^\kappa$ is fulfilled for all $\varepsilon > 0$ sufficiently small.
Proof. The proof is analogous to that of the corresponding result [16, Thm. 13].
As this proof only uses the variational inequality (2.7), it can be transferred to our situation without modifications.
Second, we will show that the source condition is satisfied on the inactive set $\{x \in \Omega : \bar p(x) = 0\}$ if the convergence rate is sufficiently fast. For a related result concerning the regularization of an ill-posed nonlinear operator equation we refer to [10].
Since $\hat u \in U_{ad}$ was arbitrary with the restriction $\hat u = \bar u$ on $\Omega \setminus K$, this inequality implies (5.10). If in addition we have $\|y_\alpha - \bar y\|_{L^2(\Omega)} = o(\alpha)$, then we obtain $\|p_\alpha - \bar p\|_Y = o(\alpha)$. This implies that $\alpha^{-1}(p_\alpha - \bar p)$ converges to zero in $L^\infty(\Omega)$. Passing to the limit in (5.10) gives $\chi_K \bar u = \chi_K P_{U_{ad}}(0)$, hence $\bar u = 0$ holds almost everywhere on $K$. The result of the previous theorem can then be written as: there exists $\dot y \in L^2(\Omega)$ such that the source representation holds. Here, $L_{yy}$ denotes the second-order partial derivative of $L$ with respect to $y$, interpreted as a linear and continuous mapping from $L^2(\Omega)$ to $L^2(\Omega)$.
In the case of a linear state equation, we obtain $L_{yy} = \mathrm{id}$, and the theorem above reduces to the results obtained in [20].
In addition, the above results resemble results for nonlinear inverse problems from [10]. Under the assumptions $U_{ad} = L^2(\Omega)$ and $\bar y = y_d$ (exact and attainable data), the source condition reduces to a condition of the form $\bar u \in \operatorname{range}\big(S'(\bar u)^*\big)$. Here, we used that $\bar y = y_d$ implies $\bar p = 0$ and $L_{yy}(\bar y, \bar u, \bar p) = \mathrm{id}$.

Extension to sparse control problems
In this section we consider the problem (S): Minimize $F(u) := \frac12\|y_u - y_d\|^2_{L^2(\Omega)} + \beta\|u\|_{L^1(\Omega)}$ such that $u_a \le u \le u_b$ a.e. in $\Omega$, where $\beta > 0$. The motivation for the additional $L^1$-term in the cost functional $F$ is the following. A solution $\bar u$ of (S) is sparse, i.e., large parts of $\bar u$ are identically zero. The larger $\beta$, the smaller the support of $\bar u$. One possible application of such a model is the optimal placement of controllers, since in many cases it is not desirable to control the system from the whole domain $\Omega$. Starting with the pioneering work [12], such sparsity-related control problems have been studied, e.g., in [20,19,21] for the optimal control of linear partial differential equations and in [1,3] for the optimal control of semilinear equations.
In order to simplify the exposition, we assume $u_a(x) \le 0 \le u_b(x)$ almost everywhere in $\Omega$. Our aim is to investigate so-called bang-bang-off solutions, i.e., $\bar u(x) \in \{u_a(x), 0, u_b(x)\}$ almost everywhere in $\Omega$. The necessary optimality conditions for problem (S) are given by:

$A\bar y + f(\bar y) = \bar u$ in $\Omega$, $\quad \bar y = 0$ on $\partial\Omega$, (6.12)

$A^*\bar p + f'(\bar y)\,\bar p = \bar y - y_d$ in $\Omega$, $\quad \bar p = 0$ on $\partial\Omega$, (6.13)

$\int_\Omega (\bar p + \beta\bar\lambda)(u - \bar u)\,dx \ge 0 \quad \forall u \in U_{ad}$, (6.14)

with $\bar\lambda \in \partial\|\bar u\|_{L^1(\Omega)}$. We refer to [1] for proofs. Again we consider the Tikhonov regularization of problem (S) given by $(S_\alpha)$: Minimize $F_\alpha(u) := F(u) + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)}$ such that $u_a \le u \le u_b$ a.e. in $\Omega$. The following convergence result can be proven similarly to the related result of Theorem 3.2.
Theorem 6.1. Let $\bar u$ be a strict local solution of (S). Then there exist $\rho > 0$ and a family $(u_\alpha)_{\alpha \in (0,\bar\alpha)}$ of local solutions of $(S_\alpha)$ such that $u_\alpha \to \bar u$ in $L^2(\Omega)$ and every $u_\alpha$ is a global minimum of $F_\alpha$ in $U_{ad,\rho} := U_{ad} \cap \{v \in L^2(\Omega) : \|v - \bar u\|_{L^2(\Omega)} \le \rho\}$. Since $j$ is not twice differentiable, we follow [1] and consider the modified extended critical cone $\tilde C^\tau_{\bar u}$. The second-order condition for the sparse control problem reads as follows: Assumption SSC 2 (Sufficient second-order condition). Let $\bar u \in U_{ad}$ be given. Assume that there exist $\delta > 0$ and $\tau > 0$ such that $J''(\bar u)v^2 \ge \delta\,\|z_v\|^2_{L^2(\Omega)}$ for all $v \in \tilde C^\tau_{\bar u}$. This second-order condition induces local quadratic growth of the cost functional. The next theorem is due to [1, Theorem 3.6].
Theorem 6.2. Let us assume that $\bar u$ is a feasible control for problem (S) with state $\bar y$ and adjoint state $\bar p$ satisfying the first-order optimality conditions (6.12)-(6.14) and the second-order condition SSC 2. Then there exists $\varepsilon > 0$ such that a quadratic growth condition holds near $\bar u$. The variational inequality (6.14) implies the following relations between $\bar u$ and $\bar p$: $\bar u(x) = u_b(x)$ if $\bar p(x) < -\beta$, $\bar u(x) = 0$ if $|\bar p(x)| < \beta$, and $\bar u(x) = u_a(x)$ if $\bar p(x) > \beta$, see [1,12]. Hence, we have to modify the regularity assumption ASC to take the influence of the non-smooth term $j$ into account, see also [20,19].
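The pointwise relations between $\bar u$ and $\bar p$ induced by (6.14) can be sketched as a simple bang-bang-off map from adjoint values to control values; the concrete numbers below are illustrative only.

```python
# Pointwise consequence of the sparse optimality system (with bounds
# u_a <= 0 <= u_b): the control is determined by the adjoint through a
# bang-bang-off map.  Values of beta, ua, ub here are made up.

def bang_bang_off(p, beta, ua=-1.0, ub=1.0):
    if p > beta:
        return ua          # adjoint strongly positive: lower bound
    if p < -beta:
        return ub          # adjoint strongly negative: upper bound
    return 0.0             # |p| <= beta: sparsity, control switched off

beta = 0.25
ps = [-1.0, -0.3, -0.2, 0.0, 0.2, 0.3, 1.0]
us = [bang_bang_off(p, beta) for p in ps]
print(us)  # [1.0, 1.0, 0.0, 0.0, 0.0, -1.0, -1.0]
```

The "off" region $\{|\bar p| \le \beta\}$ is exactly where the $L^1$ term enforces $\bar u = 0$; the switching happens on the level set $\{|\bar p| = \beta\}$, which motivates the modified active-set condition below.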
Assumption ASC 2 (Active Set Condition). Let $\bar u$ be a local solution of (S) and assume that there exist a set $I \subseteq \Omega$, a function $w \in Y$, and positive constants $\kappa, c$ such that the following holds: 1. (source condition) $I \supset \{x \in \Omega : |\bar p(x)| = \beta\}$ and $\chi_I\,\bar u = \chi_I\,P_{U_{ad}}(w)$; 2. (structure of active set) $A := \Omega \setminus I$ and for all $\varepsilon > 0$ it holds $\big|\{x \in A : \big||\bar p(x)| - \beta\big| \le \varepsilon\}\big| \le c\,\varepsilon^\kappa$. Note that if $\bar u$ satisfies this condition with $A = \Omega$, it exhibits a bang-bang-off structure, and the set $\{x \in \Omega : |\bar p(x)| = \beta\}$ is a set of measure zero. Again we can establish an improved first-order necessary condition. Lemma 6.3. Let $\bar u$ satisfy assumption ASC 2; then the improved variational inequality holds. Proof. We start by using the directional derivative of the $L^1$-norm and compute the corresponding terms. Let $\varepsilon > 0$ be given. We now split the set $\{|\bar p| > \beta\}$ and derive a first estimate. Similarly, we can estimate the remaining terms. Let us note that assumption ASC 2 implies $|A \setminus A_\varepsilon| \le c\,\varepsilon^\kappa$. Now, putting everything together and using the regularity assumption on the active sets, we obtain the claimed estimate. Here $c > 1$ is a constant independent of $u$.
proves the claim.
We are now in the position to prove convergence rates. The proof mainly follows the proof of Theorem 4.4.
Theorem 6.4. Let $\bar u$ satisfy Assumption ASC 2 and let the assumptions of Theorem 6.2 hold for $\bar u$. Let $(u_\alpha)_\alpha$ be a family of stationary points converging weakly in $L^2(\Omega)$ to $\bar u$. Then it holds with $d := \min(\kappa, 1)$ for $\alpha \searrow 0$ sufficiently small that $\|u_\alpha - \bar u\|_{L^2(\Omega)} = O(\alpha^{d/2})$. In the case $w = 0$ or $A = \Omega$, these convergence rates are obtained with $d := \kappa$.
Proof. We split the proof into two parts and consider the two cases $u_\alpha - \bar u \in \tilde C^\tau_{\bar u}$ and $u_\alpha - \bar u \notin \tilde C^\tau_{\bar u}$. (1) The case $u_\alpha - \bar u \in \tilde C^\tau_{\bar u}$. The optimality conditions for $u_\alpha$ and $\bar u$ are given by the variational inequalities (6.15) and (6.16). Note that $j$ is a convex function, hence the corresponding subgradient inequality is available. Testing (6.15) and (6.16) with $\bar u$ and $u_\alpha$, respectively, we obtain a first estimate. As the regularity assumptions ASC and ASC 2 only differ in item (ii), Lemma 4.3 is applicable here as well, which together with Young's inequality gives a bound with $C > 0$ independent of $\alpha$. By Taylor expansion, we obtain a second-order remainder term with $\tilde u_\alpha$ between $u_\alpha$ and $\bar u$. Since $u_\alpha - \bar u \in \tilde C^\tau_{\bar u}$, we can apply the second-order condition at $\bar u$ to obtain $J''(\bar u)(u_\alpha - \bar u)^2 \ge \delta\,\|z_{u_\alpha - \bar u}\|^2_{L^2(\Omega)}$. By Lemma 4.2, we find that $J''(\tilde u_\alpha)(u_\alpha - \bar u)^2 \ge \frac{\delta}{2}\,\|z_{u_\alpha - \bar u}\|^2_{L^2(\Omega)}$ for all $\alpha$ sufficiently small. Altogether, this implies the existence of $C > 0$ such that the claimed rate holds for all $\alpha$ sufficiently small.
(2) The case $u_\alpha - \bar u \notin \tilde C^\tau_{\bar u}$. Since $z_{u_\alpha - \bar u} \to 0$ in $L^2(\Omega)$, the required inequality is satisfied for all $\alpha$ small enough, which implies the claim in the second case.

Numerical examples
In this section we present numerical examples to support our theoretical results. We construct a bang-bang solution for the following optimal control problem: Minimize $\frac12\|y_u - y_d\|^2_{L^2(\Omega)}$ (7.17) subject to $-\Delta y + f(y) = u + e_\Omega$ in $\Omega$, $y = 0$ on $\partial\Omega$, and $-1 \le u \le 1$ a.e. in $\Omega$,
with $\Omega = (0,1)$. To solve the regularized optimal control problem numerically, we use dolfin-adjoint [7,8] with linear finite elements on an equidistant mesh with $10^6$ cells. We make use of the adjoint equation. It is easy to check that $(\bar u, \bar y, \bar p)$ is a solution of (7.17). Moreover, Assumption ASC is satisfied with $A = \Omega$ and $\kappa = 1$, see [5]. We expect to obtain the convergence rate $\|u_\alpha - \bar u\|_{L^2(\Omega)} = O(\alpha^{1/2})$ with respect to the $L^2$ norm. We test with three different nonlinearities. The results can be seen in Figures 1, 2, and 3, where we plot the error $\|u_\alpha - \bar u\|_{L^2(\Omega)}$ for solutions $u_\alpha$ of the discretized and regularized problem. As expected, the theoretical convergence order is well reproduced.
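The observed order can be read off as the slope of $\log\|u_\alpha - \bar u\|_{L^2(\Omega)}$ versus $\log\alpha$. A minimal least-squares fit of this slope might look as follows; the error values below are synthetic (a pure power law $C\alpha^{1/2}$) standing in for the measured errors.

```python
import math

# Estimate the empirical convergence order from (alpha, error) data by
# fitting the slope of log(err) vs log(alpha) with ordinary least squares.

def fit_order(alphas, errs):
    xs = [math.log(a) for a in alphas]
    ys = [math.log(e) for e in errs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx   # slope = estimated order in alpha

alphas = [10.0 ** (-j) for j in range(1, 6)]
errs = [2.0 * a ** 0.5 for a in alphas]   # synthetic data with rate alpha^{1/2}
order = fit_order(alphas, errs)
print(round(order, 3))  # -> 0.5
```

Applied to the measured errors from the discretized problems, a slope close to $1/2$ confirms the rate predicted by Theorem 4.4 with $\kappa = 1$.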