Optimal control of a non-smooth semilinear elliptic equation

This paper is concerned with an optimal control problem governed by a non-smooth semilinear elliptic equation. We show that the control-to-state mapping is directionally differentiable and precisely characterize its Bouligand subdifferential. By means of a suitable regularization, first-order optimality conditions including an adjoint equation are derived and afterwards interpreted in light of the previously obtained characterization. In addition, the directional derivative of the control-to-state mapping is used to establish strong stationarity conditions. While the latter conditions are shown to be stronger, we demonstrate by numerical examples that the former conditions are amenable to numerical solution using a semi-smooth Newton method.

In this paper, we consider the following non-smooth semilinear elliptic optimal control problem
\[ \min_{u \in L^2(\Omega),\ y \in H_0^1(\Omega)} J(y,u) \quad \text{s.t.} \quad -\Delta y + \max(0,y) = u \ \text{in } \Omega, \tag{P} \]
where $\Omega \subset \mathbb{R}^d$, $d \in \mathbb{N}$, is a bounded domain and $J$ is a smooth objective; for precise assumptions on the data, we refer to Assumption . below. The semilinear PDE in (P) models the deflection of a stretched thin membrane partially covered by water (see [ ]); a similar equation arises in free boundary problems for a confined plasma; see, e.g., [ , , ].
The salient feature of (P) is of course the occurrence of the non-smooth max-function in the equality constraint. This causes the associated control-to-state mapping $u \mapsto y$ to be non-smooth as well, and hence standard techniques for obtaining first-order necessary optimality conditions that are based on the adjoint of the Gâteaux derivative of the control-to-state mapping cannot be applied. One remedy to cope with this challenge is to apply generalized differentiability concepts.

Let us briefly address the notation used throughout the paper. In what follows, $\Omega$ always denotes a bounded domain. By $\mathbb{1}_M : \mathbb{R}^d \to \{0,1\}$ we denote the characteristic function of a set $M \subset \mathbb{R}^d$. By $\lambda^d$ we denote the $d$-dimensional Lebesgue measure. Given a (Lebesgue-)measurable function $v : \Omega \to \mathbb{R}$, we abbreviate the set $\{x \in \Omega : v(x) = 0\}$ by $\{v = 0\}$; the sets $\{v > 0\}$ and $\{v < 0\}$ are defined analogously. Note that in what follows, we always work with the notion of Lebesgue measurability (e.g., when talking about $L^p$-spaces or representatives), although we could equivalently work with Borel measurability here. As usual, the Sobolev space $H_0^1(\Omega)$ is defined as the closure of $C_c^\infty(\Omega)$ with respect to the $H^1$-norm. Moreover, we define the space
\[ Y := \{ y \in H_0^1(\Omega) : \Delta y \in L^2(\Omega) \}. \]
Equipped with the scalar product , it is separable as well. Note that the solution operator $S : u \mapsto y$ associated with the PDE $-\Delta y + \max(0,y) = u$ in (P) is bijective as a function from $L^2(\Omega)$ to $Y$ (cf. Proposition . below). The space $Y$ is thus the natural choice for the image space of the control-to-state mapping appearing in problem (P). If the boundary $\partial\Omega$ possesses enough regularity (a $C^{1,1}$-boundary would be sufficient here), then $Y$ is isomorphic to $H^2(\Omega) \cap H_0^1(\Omega)$ by the classical regularity theory for the Laplace operator, cf. [ , Lem. . ]. With a slight abuse of notation, in what follows we will denote the Nemytskii operator induced by the max-function (with different domains and ranges) by the same symbol. In the same way, we will denote by $\max'(y;h)$ the directional derivative of $y \mapsto \max(0,y)$ at the point $y$ in direction $h$, both considered as a scalar function and as the corresponding Nemytskii operator.
Throughout the paper, we will make the following standing assumptions.
Assumption . . The set $\Omega \subset \mathbb{R}^d$, $d \in \mathbb{N}$, is a bounded domain. The objective functional $J : Y \times L^2(\Omega) \to \mathbb{R}$ in (P) is weakly lower semi-continuous and continuously Fréchet-differentiable.
Note that we do not impose any regularity assumptions on the boundary of Ω.
We start the discussion of the optimal control problem (P) by investigating its PDE constraint, showing that it is uniquely solvable and that the associated solution operator is directionally differentiable.
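Proposition . . For all $u \in H^{-1}(\Omega)$, there exists a unique solution $y \in H_0^1(\Omega)$ to
\[ -\Delta y + \max(0,y) = u. \tag{PDE} \]
Moreover, the solution operator $S : u \mapsto y$ associated with (PDE) is well-defined and globally Lipschitz continuous as a function from $L^2(\Omega)$ to $Y$.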
Proof. The arguments are standard. First of all, Browder and Minty's theorem on monotone operators yields the existence of a unique solution in $H_0^1(\Omega)$. If $u \in L^2(\Omega)$, then a simple bootstrapping argument implies $y \in Y$. To prove the Lipschitz continuity of the solution mapping $S$, we consider two arbitrary but fixed $u_1, u_2 \in L^2(\Omega)$ with associated solutions $y_1 := S(u_1)$ and $y_2 := S(u_2)$. Using again the monotonicity, we obtain straightforwardly that $\|y_1 - y_2\|_{H^1} \le C \|u_1 - u_2\|_{L^2}$ holds with some absolute constant $C > 0$. From (PDE) and the global Lipschitz continuity of the max-function, we now infer
\[ \|\Delta(y_1 - y_2)\|_{L^2} = \|u_1 - u_2 - \max(0,y_1) + \max(0,y_2)\|_{L^2} \le \|u_1 - u_2\|_{L^2} + \|y_1 - y_2\|_{L^2} \le (1 + C)\,\|u_1 - u_2\|_{L^2}. \]
The above shows that $S$ is even globally Lipschitz as a function from $L^2(\Omega)$ to $Y$ and completes the proof.
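To make the state equation concrete, the following is a minimal sketch (not the authors' implementation) of a semi-smooth Newton iteration for a one-dimensional finite-difference analogue of (PDE) on $(0,1)$ with homogeneous Dirichlet conditions. The grid size, the control `u`, and all function names are illustrative assumptions; the generalized derivative of $\max(0,\cdot)$ at the kink is chosen as 0.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i]/upper[i] multiply x[i-1]/x[i+1] in row i."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def residual(y, u, h):
    """Discrete residual of -y'' + max(0, y) - u with zero boundary values."""
    n = len(y)
    res = []
    for i in range(n):
        left = y[i - 1] if i > 0 else 0.0
        right = y[i + 1] if i < n - 1 else 0.0
        res.append((2.0 * y[i] - left - right) / h**2 + max(0.0, y[i]) - u[i])
    return res

def solve_state(u, h, tol=1e-10, max_iter=100):
    """Semi-smooth Newton iteration for the discrete state equation."""
    n = len(u)
    y = [0.0] * n
    for _ in range(max_iter):
        res = residual(y, u, h)
        if max(abs(v) for v in res) < tol:
            break
        # Element of the generalized derivative of max(0, .): indicator of {y > 0}.
        chi = [1.0 if v > 0.0 else 0.0 for v in y]
        diag = [2.0 / h**2 + c for c in chi]
        off = [-1.0 / h**2] * n
        step = solve_tridiagonal(off, diag, off, [-v for v in res])
        y = [a + b for a, b in zip(y, step)]
    return y

# Hypothetical test data: a sign-changing control on a uniform grid.
n = 49
h = 1.0 / (n + 1)
u = [10.0 * math.sin(2.0 * math.pi * (i + 1) * h) for i in range(n)]
y = solve_state(u, h)
```

Each Newton step solves a linear system of the form $(-\Delta_h + \chi)\,\delta = -r$ with an indicator function $\chi$ as coefficient, which is the discrete counterpart of the linear equations appearing throughout the paper.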
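Theorem . (directional derivative of $S$). Let $u, h \in L^2(\Omega)$ be arbitrary but fixed, set $y := S(u) \in Y$, and let $\delta_h \in Y$ be the unique solution to
\[ -\Delta \delta_h + \mathbb{1}_{\{y = 0\}} \max(0, \delta_h) + \mathbb{1}_{\{y > 0\}} \delta_h = h. \]
Then it holds
\[ h_n \rightharpoonup h \text{ in } L^2(\Omega),\ t_n \to 0^+ \quad\Longrightarrow\quad \frac{S(u + t_n h_n) - S(u)}{t_n} \rightharpoonup \delta_h \text{ in } Y \]
and
\[ h_n \to h \text{ in } L^2(\Omega),\ t_n \to 0^+ \quad\Longrightarrow\quad \frac{S(u + t_n h_n) - S(u)}{t_n} \to \delta_h \text{ in } Y. \]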
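Now let $u, h \in L^2(\Omega)$ be arbitrary but fixed and let $(t_n) \subset (0,\infty)$ and $(h_n) \subset L^2(\Omega)$ be sequences with $t_n \to 0$ and $h_n \rightharpoonup h$ in $L^2(\Omega)$. We abbreviate $y_n := S(u + t_n h_n) \in Y$. Subtracting the equations for $y$ and $\delta_h$ from the one for $y_n$ yields
\[ -\Delta\Big(\frac{y_n - y}{t_n} - \delta_h\Big) = h_n - h - \frac{\max(0,y_n) - \max(0,y)}{t_n} + \max{}'(y;\delta_h). \]
Testing this equation with $(y_n - y)/t_n - \delta_h$ and using the monotonicity of the max-operator, we obtain that there exists a constant $C > 0$ independent of $n$ with
\[ \Big\|\frac{y_n - y}{t_n} - \delta_h\Big\|_{H^1} \le C \Big( \|h_n - h\|_{H^{-1}} + \Big\|\frac{\max(0, y + t_n \delta_h) - \max(0,y)}{t_n} - \max{}'(y;\delta_h)\Big\|_{L^2} \Big). \]
Now the compactness of $L^2(\Omega) \hookrightarrow H^{-1}(\Omega)$ and the directional differentiability of $\max : L^2(\Omega) \to L^2(\Omega)$ (which directly follows from the directional differentiability of $\max : \mathbb{R} \to \mathbb{R}$ and Lebesgue's dominated convergence theorem) give
\[ \frac{y_n - y}{t_n} - \delta_h \to 0 \quad \text{in } H_0^1(\Omega). \]
As $\max : L^2(\Omega) \to L^2(\Omega)$ is also Lipschitz continuous and thus Hadamard-differentiable,
\[ \frac{\max(0,y_n) - \max(0,y)}{t_n} - \max{}'(y;\delta_h) \to 0 \quad \text{in } L^2(\Omega). \]
Hence, ( . ) yields that the sequence $(y_n - y)/t_n - \delta_h$ is bounded in $Y$ and thus (possibly after transition to a subsequence) converges weakly in $Y$. Because of ( . ), the weak limit is zero and therefore unique, so that the whole sequence converges weakly to zero. This implies the first assertion. If now $h_n$ converges strongly to $h$ in $L^2(\Omega)$, then ( . ), ( . ) and ( . ) yield $\Delta((y_n - y)/t_n - \delta_h) \to 0$ in $L^2(\Omega)$. From the definition of the norm $\|\cdot\|_Y$, it now readily follows that $(y_n - y)/t_n - \delta_h \to 0$ in $Y$. This establishes the second claim.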
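The pointwise ingredient of the theorem above is the directional derivative of the scalar function $x \mapsto \max(0,x)$. The following small sketch (function names are illustrative assumptions) compares the closed-form expression $\max'(y;h) = \mathbb{1}_{\{y>0\}} h + \mathbb{1}_{\{y=0\}} \max(0,h)$ with one-sided difference quotients and shows that the derivative fails to be linear in $h$ exactly at the kink $y = 0$.

```python
def max_dir_deriv(y, h):
    """Directional derivative of x -> max(0, x) at y in direction h."""
    if y > 0.0:
        return h
    if y < 0.0:
        return 0.0
    return max(0.0, h)  # at the kink the derivative is only positively homogeneous

def diff_quotient(y, h, t):
    """One-sided difference quotient (max(0, y + t h) - max(0, y)) / t with t > 0."""
    return (max(0.0, y + t * h) - max(0.0, y)) / t

# Compare both quantities for a few sample points and directions.
samples = [(y, h) for y in (-1.0, 0.0, 2.0) for h in (-3.0, 0.5)]
errors = [abs(diff_quotient(y, h, 1e-6) - max_dir_deriv(y, h)) for y, h in samples]
```

At $y = 0$ one has $\max'(0;1) = 1$ but $\max'(0;-1) = 0 \neq -1$, which is the scalar reason why Gâteaux differentiability of $S$ hinges on the size of the set $\{y = 0\}$.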
Theorem . allows a precise characterization of points where $S$ is Gâteaux-differentiable. This will be of major importance for the study of the Bouligand subdifferentials in the next section.
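Corollary . (characterization of Gâteaux-differentiable points). Let $u \in L^2(\Omega)$ be arbitrary but fixed. Then the following are equivalent:
(i) The solution $y = S(u)$ satisfies $\lambda^d(\{y = 0\}) = 0$.
(ii) $S$ is Gâteaux-differentiable in $u$.
(iii) $S'(u;h) = -S'(u;-h)$ for all $h \in L^2(\Omega)$.
If one of the above holds true, then the directional derivative $\delta_h = S'(u;h) \in Y$ in a direction $h \in L^2(\Omega)$ is uniquely characterized as the solution to
\[ -\Delta \delta_h + \mathbb{1}_{\{y > 0\}} \delta_h = h. \]
Proof. In view of ( . ), it is clear that if $\lambda^d(\{y = 0\}) = 0$, then $S$ is Gâteaux-differentiable in $u$ and the Gâteaux derivative is the solution operator for ( . ). Further, (ii) trivially implies (iii). It remains to prove that (iii) implies (i). To this end, we note that if $S'(u;h) = -S'(u;-h)$ for all $h \in L^2(\Omega)$, then adding the equations ( . ) for the directions $h$ and $-h$ implies
\[ \mathbb{1}_{\{y = 0\}} \, |S'(u;h)| = 0 \quad \text{a.e. in } \Omega \]
for all $h \in L^2(\Omega)$. Consider now a function $\psi \in C^\infty(\mathbb{R}^d)$ with $\psi > 0$ in $\Omega$ and $\psi \equiv 0$ in $\mathbb{R}^d \setminus \Omega$, whose existence is ensured by Lemma . . Since $h \in L^2(\Omega)$ was arbitrary, we are allowed to choose $h$ such that $S'(u;h) = \psi$ by virtue of ( . ). Consequently, we obtain from ( . ) that $\mathbb{1}_{\{y = 0\}} \psi = 0$ a.e. in $\Omega$. Since $\psi > 0$ in $\Omega$, this gives $\lambda^d(\{y = 0\}) = 0$.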
This section is devoted to the main result of our work, namely the precise characterization of the Bouligand subdifferentials of the PDE solution operator $S$ from Proposition . .

We start with the rigorous definition of the Bouligand subdifferential. In the spirit of [ , Def. . ], it is defined as the set of limits of Jacobians of differentiable points. However, in infinite dimensions, we of course have to distinguish between different topologies underlying this limit process, as already mentioned in the introduction. This gives rise to the following definition.

Definition . (Bouligand subdifferentials of $S$). Let $u \in L^2(\Omega)$ be given. Denote the set of smooth points of $S$ by
\[ D := \{ v \in L^2(\Omega) : S : L^2(\Omega) \to Y \text{ is Gâteaux-differentiable in } v \}. \]
In what follows, we will frequently call points in $D$ Gâteaux points.
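(i) The weak-weak Bouligand subdifferential of $S$ in $u$ is defined by
\[ \partial_B^{ww} S(u) := \{ G \in L(L^2(\Omega), Y) : \text{there exists } (u_n) \subset D \text{ such that } u_n \rightharpoonup u \text{ in } L^2(\Omega) \text{ and } S'(u_n)h \rightharpoonup Gh \text{ in } Y \text{ for all } h \in L^2(\Omega) \}. \]
(ii) The weak-strong Bouligand subdifferential of $S$ in $u$ is defined by
\[ \partial_B^{ws} S(u) := \{ G \in L(L^2(\Omega), Y) : \text{there exists } (u_n) \subset D \text{ such that } u_n \rightharpoonup u \text{ in } L^2(\Omega) \text{ and } S'(u_n)h \to Gh \text{ in } Y \text{ for all } h \in L^2(\Omega) \}. \]
(iii) The strong-weak Bouligand subdifferential of $S$ in $u$ is defined by
\[ \partial_B^{sw} S(u) := \{ G \in L(L^2(\Omega), Y) : \text{there exists } (u_n) \subset D \text{ such that } u_n \to u \text{ in } L^2(\Omega) \text{ and } S'(u_n)h \rightharpoonup Gh \text{ in } Y \text{ for all } h \in L^2(\Omega) \}. \]
(iv) The strong-strong Bouligand subdifferential of $S$ in $u$ is defined by
\[ \partial_B^{ss} S(u) := \{ G \in L(L^2(\Omega), Y) : \text{there exists } (u_n) \subset D \text{ such that } u_n \to u \text{ in } L^2(\Omega) \text{ and } S'(u_n)h \to Gh \text{ in } Y \text{ for all } h \in L^2(\Omega) \}. \]

Remark . . Based on the generalization of Rademacher's theorem to Hilbert spaces (see [ , Thm. . ]) and the generalization of Alaoglu's theorem to the weak operator topology, one can show that $\partial_B^{ww} S(u)$ and $\partial_B^{sw} S(u)$ are non-empty for every $u \in L^2(\Omega)$; see also [ ]. In contrast to this, it is not clear a priori whether $\partial_B^{ws} S(u)$ and $\partial_B^{ss} S(u)$ are non-empty, too. However, Theorem . at the end of this section will imply this as a byproduct.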
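From the definitions, we obtain the following useful properties.

Lemma . . Let $u \in L^2(\Omega)$ be arbitrary but fixed.
(i) It holds $\partial_B^{ss} S(u) \subseteq \partial_B^{sw} S(u) \subseteq \partial_B^{ww} S(u)$ and $\partial_B^{ss} S(u) \subseteq \partial_B^{ws} S(u) \subseteq \partial_B^{ww} S(u)$.
(ii) If $S$ is Gâteaux-differentiable in $u$, then $S'(u) \in \partial_B^{ss} S(u)$.
(iii) It holds $\|G\|_{L(L^2(\Omega),Y)} \le L$ for all $G \in \partial_B^{ww} S(u)$, where $L$ denotes the Lipschitz constant of $S$.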
Proof. Parts (i) and (ii) immediately follow from the definition of the Bouligand subdifferentials (to see (ii), just choose $u_n := u$ for all $n$). In order to prove part (iii), observe that the definition of $\partial_B^{ww} S(u)$ implies the existence of a sequence of Gâteaux points $u_n \in L^2(\Omega)$ such that $u_n \rightharpoonup u$ in $L^2(\Omega)$ and $S'(u_n)h \rightharpoonup Gh$ in $Y$ for all $h \in L^2(\Omega)$. For each $n \in \mathbb{N}$, the global Lipschitz continuity of $S$ according to Proposition . immediately gives $\|S'(u_n)\|_{L(L^2(\Omega),Y)} \le L$. Consequently, the weak lower semi-continuity of the norm implies
\[ \|Gh\|_Y \le \liminf_{n \to \infty} \|S'(u_n)h\|_Y \le L \|h\|_{L^2} \quad \text{for all } h \in L^2(\Omega). \]
This yields the claim.
Remark . . The Bouligand subdifferentials $\partial_B^{ww} S(u)$ and $\partial_B^{sw} S(u)$ do not change if the condition "$S'(u_n)h \rightharpoonup Gh$ in $Y$ for all $h \in L^2(\Omega)$" in Definition . (i) and (iii) is replaced with either "$S'(u_n)h \to Gh$ in $Z$ for all $h \in L^2(\Omega)$" or "$S'(u_n)h \rightharpoonup Gh$ in $Z$ for all $h \in L^2(\Omega)$", where $Z$ is a normed linear space such that $Y$ is compactly embedded into $Z$, e.g., $Z = H^1(\Omega)$ or $Z = L^2(\Omega)$. This can be seen as follows: Suppose that a sequence $(u_n) \subset D$ is given such that $S'(u_n)h \rightharpoonup Gh$ holds in $Z$ for all $h \in L^2(\Omega)$. Then, by Lemma . (iii), we can find for every $h \in L^2(\Omega)$ a subsequence $(u_{n_k})$ such that $(S'(u_{n_k})h)$ converges weakly in $Y$. From the weak convergence in $Z$, we obtain that the weak limit has to be equal to $Gh$ independently of the chosen subsequence. Consequently, $S'(u_n)h \rightharpoonup Gh$ in $Y$ for all $h \in L^2(\Omega)$, and we arrive at our original condition. If, conversely, we know that $S'(u_n)h \rightharpoonup Gh$ holds in $Y$ for all $h \in L^2(\Omega)$, then, by the compactness of the embedding $Y \hookrightarrow Z$, it trivially holds $S'(u_n)h \to Gh$ in $Z$ for all $h \in L^2(\Omega)$. This yields the claim. The case of $S'(u_n)h \to Gh$ in $Z$ proceeds analogously.
Next, we show closedness properties of the two strong subdifferentials.
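Proposition . (strong-strong-closedness of $\partial_B^{ss} S$). Let $u \in L^2(\Omega)$ be arbitrary but fixed. Suppose that
(i) $u_n \in L^2(\Omega)$ and $G_n \in \partial_B^{ss} S(u_n)$ for all $n \in \mathbb{N}$,
(ii) $u_n \to u$ in $L^2(\Omega)$,
(iii) $G \in L(L^2(\Omega), Y)$ and $G_n h \to Gh$ in $Y$ for all $h \in L^2(\Omega)$.
Then $G$ is an element of $\partial_B^{ss} S(u)$.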
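Proof. The definition of $\partial_B^{ss} S(u_n)$ implies that for all $n \in \mathbb{N}$, one can find a sequence $(u_{m,n}) \subset L^2(\Omega)$ of Gâteaux points with associated derivatives $G_{m,n} := S'(u_{m,n})$ such that $u_{m,n} \to u_n$ in $L^2(\Omega)$ as $m \to \infty$ and
\[ G_{m,n} h \to G_n h \text{ in } Y \text{ for all } h \in L^2(\Omega) \text{ as } m \to \infty. \]
Since $L^2(\Omega)$ is separable, there exists a countable set $\{w_k\}_{k=1}^\infty$ that is dense in $L^2(\Omega)$. Because of the convergences derived above, it moreover follows that for all $n \in \mathbb{N}$, there exists an $m_n \in \mathbb{N}$ with
\[ \|G_{m_n,n} w_k - G_n w_k\|_Y \le \frac{1}{n} \ \text{ for all } k = 1, \dots, n \quad \text{and} \quad \|u_{m_n,n} - u_n\|_{L^2} \le \frac{1}{n}. \]
Consider now a fixed but arbitrary $h \in L^2(\Omega)$, and define
\[ h_n^* := \operatorname{argmin}\{ \|w_k - h\|_{L^2} : 1 \le k \le n \}. \]
Then the density property of $\{w_k\}_{k=1}^\infty$ implies $h_n^* \to h$ in $L^2(\Omega)$ as $n \to \infty$, and we may estimate
\[ \|G_{m_n,n} h - Gh\|_Y \le \|G_{m_n,n} h_n^* - G_n h_n^*\|_Y + \|G_{m_n,n} - G_n\|_{L(L^2,Y)} \|h - h_n^*\|_{L^2} + \|G_n h - Gh\|_Y \to 0, \]
where the boundedness of $\|G_{m_n,n} - G_n\|_{L(L^2,Y)}$ follows from Lemma . (iii). The above proves that for all $h \in L^2(\Omega)$, we have $G_{m_n,n} h \to Gh$ in $Y$. Since $h \in L^2(\Omega)$ was arbitrary and the Gâteaux points $u_{m_n,n}$ satisfy $u_{m_n,n} \to u$ in $L^2(\Omega)$ as $n \to \infty$ by ( . ) and our assumptions, the claim follows from the definition of $\partial_B^{ss} S(u)$.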
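Proposition . (strong-weak-closedness of $\partial_B^{sw} S$). Let $u \in L^2(\Omega)$ be arbitrary but fixed. Assume that:
(i) $u_n \in L^2(\Omega)$ and $G_n \in \partial_B^{sw} S(u_n)$ for all $n \in \mathbb{N}$,
(ii) $u_n \to u$ in $L^2(\Omega)$,
(iii) $G \in L(L^2(\Omega), Y)$ and $G_n h \rightharpoonup Gh$ in $Y$ for all $h \in L^2(\Omega)$.
Then $G$ is an element of $\partial_B^{sw} S(u)$.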
Proof. As in the proof before, for all $n \in \mathbb{N}$ the definition of $\partial_B^{sw} S(u_n)$ implies the existence of a sequence of Gâteaux points $u_{m,n} \in L^2(\Omega)$ with associated derivatives $G_{m,n} := S'(u_{m,n})$ such that $u_{m,n} \to u_n$ in $L^2(\Omega)$ as $m \to \infty$ and
\[ G_{m,n} h \rightharpoonup G_n h \text{ in } Y \text{ for all } h \in L^2(\Omega) \text{ as } m \to \infty. \]
Now the compact embedding of $Y$ in $H_0^1(\Omega)$ gives that $G_{m,n} h \to G_n h$ in $H_0^1(\Omega)$ as $m \to \infty$, and we can argue exactly as in the proof of Proposition . to show that there is a diagonal sequence of Gâteaux points $u_{m_n,n}$ such that $u_{m_n,n} \to u$ in $L^2(\Omega)$ and
\[ G_{m_n,n} h \to Gh \text{ in } H_0^1(\Omega) \text{ for every } h \in L^2(\Omega). \qquad ( . ) \]
On the other hand, by Lemma . (iii), the operators $G_{m_n,n}$ are uniformly bounded in $L(L^2(\Omega); Y)$. Therefore, for an arbitrary but fixed $h \in L^2(\Omega)$, the sequence $(G_{m_n,n} h)$ is bounded in $Y$, so that a subsequence converges weakly to some $\eta \in Y$. Because of ( . ), $\eta = Gh$, and the uniqueness of the weak limit implies the weak convergence of the whole sequence in $Y$. As $h$ was arbitrary, this implies the assertion.
This section is devoted to an explicit characterization of the different subdifferentials in Definition . without the representation as (weak) limits of Jacobians of sequences of Gâteaux points. We start with the following lemma, which will be useful in the sequel.
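Lemma . . Assume that
(i) $j : \mathbb{R} \to \mathbb{R}$ is monotonically increasing and globally Lipschitz continuous,
(ii) $u_n, u \in L^2(\Omega)$ satisfy $u_n \rightharpoonup u$ in $L^2(\Omega)$,
(iii) $\chi_n \in L^\infty(\Omega)$ satisfy $\chi_n \ge 0$ a.e. in $\Omega$ for all $n \in \mathbb{N}$ and $\chi_n \rightharpoonup^* \chi$ in $L^\infty(\Omega)$ for some $\chi \in L^\infty(\Omega)$,
(iv) $w_n, w \in Y$ are the solutions to
\[ -\Delta w_n + \chi_n w_n + j(w_n) = u_n \qquad ( . ) \]
and
\[ -\Delta w + \chi w + j(w) = u, \qquad ( . ) \]
respectively.
Then it holds that $w_n \rightharpoonup w$ in $Y$, and if we additionally assume that $\chi_n \to \chi$ pointwise a.e. and $u_n \to u$ strongly in $L^2(\Omega)$, then we even have $w_n \to w$ strongly in $Y$.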
Proof. First note that due to the monotonicity and the global Lipschitz continuity of $j$, the equations ( . ) and ( . ), respectively, admit unique solutions in $Y$ by the same arguments as in the proof of Proposition . . Moreover, due to the weak and weak-$*$ convergence, the sequences $(u_n)$ and $(\chi_n)$ are bounded in $L^2(\Omega)$ and $L^\infty(\Omega)$, respectively, so that $(w_n)$ is bounded in $Y$. Hence there exists a weakly converging subsequence (w.l.o.g. denoted by the same symbol) such that $w_n \rightharpoonup \eta$ in $Y$ and, by the compact embedding $Y \hookrightarrow H_0^1(\Omega)$, $w_n \to \eta$ strongly in $H_0^1(\Omega)$. Together with the weak convergence of $u_n$, this allows passing to the limit in ( . ) to deduce that $\eta$ satisfies
\[ -\Delta \eta + \chi \eta + j(\eta) = u. \]
As the solution to this equation is unique, we obtain $\eta = w$. The uniqueness of the weak limit now gives convergence of the whole sequence, i.e., $w_n \rightharpoonup w$ in $Y$. To prove the strong convergence under the additional assumptions, note that the difference $w_n - w$ satisfies
\[ -\Delta(w_n - w) = (u_n - u) + (\chi w - \chi_n w_n) + (j(w) - j(w_n)). \]
For the first term on the right-hand side of ( . ), we have $u_n \to u$ in $L^2(\Omega)$ by assumption. The second term in ( . ) is estimated by
\[ \|\chi_n w_n - \chi w\|_{L^2} \le \|\chi_n\|_{L^\infty} \|w_n - w\|_{L^2} + \|(\chi_n - \chi) w\|_{L^2}. \]
The first term in ( . ) converges to zero due to $w_n \rightharpoonup w$ in $Y$ and the compact embedding, while the convergence of the second term follows from the pointwise convergence of $\chi_n$ in combination with Lebesgue's dominated convergence theorem. The global Lipschitz continuity of $j$ and the strong convergence $w_n \to w$ in $L^2(\Omega)$ finally also give $j(w_n) \to j(w)$ in $L^2(\Omega)$. Therefore, the right-hand side in ( . ) converges to zero in $L^2(\Omega)$. As $-\Delta$ induces the norm on $Y$, we thus obtain the desired strong convergence.
By setting $j(x) = \max(0,x)$ and $\chi_n \equiv \chi \equiv 0$, we obtain as a direct consequence of the preceding lemma the following weak continuity of $S$.
Corollary . . The solution operator $S : L^2(\Omega) \to Y$ is weakly continuous, i.e.,
\[ u_n \rightharpoonup u \text{ in } L^2(\Omega) \quad\Longrightarrow\quad S(u_n) \rightharpoonup S(u) \text{ in } Y. \]

We will see in the following that all elements of the subdifferentials in Definition . have a similar structure. To be precise, they are solution operators of linear elliptic PDEs of a particular form.
Definition . (linear solution operator $G_\chi$). Given a function $\chi \in L^\infty(\Omega)$ with $\chi \ge 0$, we define the operator $G_\chi \in L(L^2(\Omega), Y)$ to be the solution operator of the linear equation
\[ -\Delta \eta + \chi \eta = h. \]

We first address necessary conditions for an operator in $L(L^2(\Omega), Y)$ to be an element of the Bouligand subdifferentials. Afterwards we will show that these conditions are also sufficient, which is more involved than proving their necessity.
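Proposition . (necessary condition for $\partial_B^{ww} S(u)$). Let $u \in L^2(\Omega)$ be arbitrary but fixed and set $y := S(u)$. Then for every $G \in \partial_B^{ww} S(u)$ there exists a unique $\chi \in L^\infty(\Omega)$ satisfying
\[ 0 \le \chi \le 1 \text{ a.e. in } \Omega, \quad \chi = 1 \text{ a.e. in } \{y > 0\}, \quad \text{and} \quad \chi = 0 \text{ a.e. in } \{y < 0\} \qquad ( . ) \]
such that $G = G_\chi$.

Proof. If $G \in \partial_B^{ww} S(u)$ is arbitrary but fixed, then there exists a sequence of Gâteaux points $u_n \in L^2(\Omega)$ such that $u_n \rightharpoonup u$ in $L^2(\Omega)$ and $S'(u_n)h \rightharpoonup Gh$ in $Y$ for all $h \in L^2(\Omega)$. Now, let $h \in L^2(\Omega)$ be arbitrary but fixed and abbreviate $y_n := S(u_n)$, $\delta_{h,n} := S'(u_n)h$, and $\chi_n := \mathbb{1}_{\{y_n > 0\}}$. Then we know from Corollary . that $y_n \rightharpoonup y$ in $Y$ and from Corollary . that $\delta_{h,n} = G_{\chi_n} h$. Moreover, from the Banach-Alaoglu theorem it follows that, after transition to a subsequence (which may be done independently of $h$), it holds that $\chi_n \rightharpoonup^* \chi$ in $L^\infty(\Omega)$. From the weak-$*$ closedness of the set $\{\xi \in L^\infty(\Omega) : 0 \le \xi \le 1 \text{ a.e. in } \Omega\}$ we obtain that $0 \le \chi \le 1$ holds a.e. in $\Omega$. Further, the definition of $\chi_n$ and the convergences $\chi_n \rightharpoonup^* \chi$ in $L^\infty(\Omega)$ and $y_n \to y$ in $L^2(\Omega)$ imply
\[ 0 = \int_\Omega \chi_n \min(0,y_n) - (1 - \chi_n) \max(0,y_n) \, dx \to \int_\Omega \chi \min(0,y) - (1 - \chi) \max(0,y) \, dx. \]
Due to the sign of the integrand, the above yields $\chi \min(0,y) - (1 - \chi) \max(0,y) = 0$ a.e. in $\Omega$, and this entails $\chi = 1$ a.e. in $\{y > 0\}$ and $\chi = 0$ a.e. in $\{y < 0\}$. This shows that $\chi$ satisfies ( . ). From Lemma . , we may deduce that $\delta_{h,n} \rightharpoonup G_\chi h$ in $Y$. We already know, however, that $\delta_{h,n} \rightharpoonup Gh$ in $Y$. Consequently, since $h$ was arbitrary, $G = G_\chi$, and the existence claim is proven. It remains to show that $\chi$ is unique. To this end, assume that there are two different functions $\chi, \tilde\chi \in L^\infty(\Omega)$ with $G = G_\chi = G_{\tilde\chi}$. If we then consider a function $\psi \in C^\infty(\mathbb{R}^d)$ with $\psi > 0$ in $\Omega$ and $\psi \equiv 0$ in $\mathbb{R}^d \setminus \Omega$ (whose existence is ensured by Lemma . ) and define $h_\psi := -\Delta\psi + \chi\psi \in L^2(\Omega)$, then we obtain $\psi = G_\chi h_\psi = G_{\tilde\chi} h_\psi$, which gives rise to
\[ -\Delta\psi + \chi\psi = h_\psi = -\Delta\psi + \tilde\chi\psi. \]
Subtraction now yields $(\chi - \tilde\chi)\psi = 0$ a.e. in $\Omega$ and, since $\psi > 0$ in $\Omega$, this yields $\chi \equiv \tilde\chi$.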
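The round trip $\psi \mapsto h_\psi := -\Delta\psi + \chi\psi \mapsto G_\chi h_\psi = \psi$ used in the uniqueness argument can be mimicked on a grid. The sketch below is an illustration under assumptions (one space dimension, finite differences, hypothetical names `apply_A_chi` and `G_chi`, and an arbitrarily chosen $\chi$ and $\psi$); it is not taken from the paper.

```python
import math

def apply_A_chi(eta, chi, h):
    """Apply the finite-difference analogue of -Laplace + chi (zero boundary values)."""
    n = len(eta)
    out = []
    for i in range(n):
        left = eta[i - 1] if i > 0 else 0.0
        right = eta[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * eta[i] - left - right) / h**2 + chi[i] * eta[i])
    return out

def G_chi(rhs, chi, h):
    """Discrete G_chi: solve (-Laplace_h + chi) eta = rhs with the Thomas algorithm."""
    n = len(rhs)
    off = -1.0 / h**2
    diag = [2.0 / h**2 + c for c in chi]
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = off / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * cp[i - 1]
        cp[i] = off / denom
        dp[i] = (rhs[i] - off * dp[i - 1]) / denom
    eta = [0.0] * n
    eta[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        eta[i] = dp[i] - cp[i] * eta[i + 1]
    return eta

# Hypothetical data: psi > 0 in the interior, chi a 0/1-valued coefficient.
n = 49
h = 1.0 / (n + 1)
x = [(i + 1) * h for i in range(n)]
psi = [math.sin(math.pi * xi) for xi in x]
chi = [1.0 if xi > 0.5 else 0.0 for xi in x]

h_psi = apply_A_chi(psi, chi, h)           # h_psi := -Laplace psi + chi psi
recovered = G_chi(h_psi, chi, h)           # G_chi h_psi should reproduce psi
other = G_chi(h_psi, [0.0] * n, h)         # a different coefficient gives a different result
```

Since `psi` is strictly positive on the grid, applying the solution operator with a different coefficient to the same right-hand side `h_psi` cannot return `psi`, which is the discrete counterpart of $(\chi - \tilde\chi)\psi = 0 \Rightarrow \chi = \tilde\chi$.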
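Proposition . (necessary condition for $\partial_B^{ws} S(u)$). Let $u \in L^2(\Omega)$ be arbitrary but fixed with $y = S(u)$. Then for every $G \in \partial_B^{ws} S(u)$ there exists a unique function $\chi \in L^\infty(\Omega)$ satisfying
\[ \chi \in \{0,1\} \text{ a.e. in } \Omega, \quad \chi = 1 \text{ a.e. in } \{y > 0\}, \quad \text{and} \quad \chi = 0 \text{ a.e. in } \{y < 0\} \qquad ( . ) \]
such that $G = G_\chi$.

Proof. Since $\partial_B^{ws} S(u) \subseteq \partial_B^{ww} S(u)$, the preceding proposition yields that there is a unique function $\chi$ satisfying ( . ) such that $G = G_\chi$. It remains to prove that $\chi$ only takes values in $\{0,1\}$. To this end, first observe that the definition of $\partial_B^{ws} S(u)$ implies the existence of a sequence of Gâteaux points $(u_n) \subset L^2(\Omega)$ such that $u_n \rightharpoonup u$ in $L^2(\Omega)$ and $S'(u_n)h \to Gh$ in $Y$ for every $h \in L^2(\Omega)$, where, according to Corollary . , $S'(u_n) = G_{\chi_n}$ with $\chi_n := \mathbb{1}_{\{y_n > 0\}}$. As in the proof of Proposition . , we choose the special direction $h_\psi := -\Delta\psi + \chi\psi \in L^2(\Omega)$, where $\psi \in C^\infty(\mathbb{R}^d)$ is again a function with $\psi > 0$ in $\Omega$ and $\psi \equiv 0$ in $\mathbb{R}^d \setminus \Omega$. Then $Gh_\psi = \psi$, and the strong convergence of $G_{\chi_n} h_\psi$ to $Gh_\psi$ in $Y$ allows passing to a subsequence to obtain $\Delta G_{\chi_n} h_\psi \to \Delta\psi$ and $G_{\chi_n} h_\psi \to \psi$ pointwise a.e. in $\Omega$. From the latter, it follows that for almost all $x \in \Omega$ there exists an $N \in \mathbb{N}$ (depending on $x$) with $G_{\chi_n} h_\psi(x) > 0$ for all $n \ge N$, and consequently
\[ \chi_n(x) = \frac{h_\psi(x) + \Delta G_{\chi_n} h_\psi(x)}{G_{\chi_n} h_\psi(x)} \to \frac{h_\psi(x) + \Delta\psi(x)}{\psi(x)} = \chi(x) \quad \text{for a.a. } x \in \Omega. \]
But, as $\chi_n$ takes only the values 0 and 1 for all $n \in \mathbb{N}$, pointwise convergence almost everywhere is only possible if $\chi \in \{0,1\}$ a.e. in $\Omega$. This proves the claim.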
As an immediate consequence of the last two results, we obtain:

Corollary . . If $S$ is Gâteaux-differentiable in a point $u \in L^2(\Omega)$, then it holds
\[ \partial_B^{ss} S(u) = \partial_B^{sw} S(u) = \partial_B^{ws} S(u) = \partial_B^{ww} S(u) = \{S'(u)\}. \]

Proof. The inclusion $\supseteq$ was already proved in Lemma . . The reverse inclusion follows immediately from Propositions . and . , and the fact that in a Gâteaux point there necessarily holds $\lambda^d(\{y = 0\}) = 0$.

Remark . . Note that even in finite dimensions, the Bouligand and the Clarke subdifferential can contain operators other than the Gâteaux derivative in a Gâteaux point; see, e.g., [ , Ex. . . ]. Thus, Corollary . shows that, in spite of its non-differentiability, the solution operator $S$ is comparatively well-behaved.
Remark . . Similarly to Theorem . , where the directional derivative of the max-function appears, Propositions . and . show that elements of $\partial_B^{ww} S(u)$ and $\partial_B^{ws} S(u)$ are characterized by PDEs which involve a pointwise measurable selection $\chi$ of the set-valued a.e.-defined functions $\partial_c \max(0, y(\cdot))$ and $\partial_B \max(0, y(\cdot))$, respectively, where $\partial_B \max(0,\cdot)$ and $\partial_c \max(0,\cdot)$ denote the Bouligand and the convex subdifferential of the function $\mathbb{R} \ni x \mapsto \max(0,x) \in \mathbb{R}$, respectively.

Now that we have found necessary conditions that elements of the subdifferentials $\partial_B^{ws} S(u)$ and $\partial_B^{ww} S(u)$ have to fulfill, we turn to sufficient conditions which guarantee that a certain linear operator is an element of these subdifferentials. Here we focus on the subdifferentials $\partial_B^{ss} S(u)$ and $\partial_B^{sw} S(u)$. It will turn out that a linear operator is an element of these subdifferentials if it is of the form $G_\chi$ with $\chi$ as in ( . ) and ( . ), respectively. Thanks to Lemma . (i) and the necessary conditions in Propositions . and . , this will finally give a sharp characterization of all Bouligand subdifferentials in Definition . ; see Theorem . below. We start with the following preliminary result.
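Lemma . . Let $u \in L^2(\Omega)$ be arbitrary but fixed and write $y := S(u)$. Assume that $\varphi \in Y$ is a function satisfying
\[ \lambda^d(\{y = 0\} \cap \{\varphi = 0\}) = 0 \qquad ( . ) \]
and define $\chi \in L^\infty(\Omega)$ via
\[ \chi := \mathbb{1}_{\{y > 0\}} + \mathbb{1}_{\{y = 0\}} \mathbb{1}_{\{\varphi > 0\}}. \qquad ( . ) \]
Then $G_\chi$ as in Definition . is an element of the strong-strong Bouligand subdifferential $\partial_B^{ss} S(u)$.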
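Proof. We have to construct sequences of Gâteaux points converging strongly to $u$ such that also the corresponding Gâteaux derivatives in an arbitrary direction $h \in L^2(\Omega)$ converge strongly in $Y$ to $G_\chi h$. For this purpose, set $y_\varepsilon := y + \varepsilon\varphi$, $\varepsilon \in (0,1)$, and $u_\varepsilon := -\Delta y_\varepsilon + \max(0, y_\varepsilon) \in L^2(\Omega)$.
Then we obtain $S(u_\varepsilon) = y_\varepsilon$, $y_\varepsilon \to y$ in $Y$ and $u_\varepsilon \to u$ in $L^2(\Omega)$ as $\varepsilon \to 0$. Choose now arbitrary but fixed representatives of $y$ and $\varphi$ and define $Z := \{y = 0\} \cap \{\varphi = 0\}$. Then for all $\varepsilon \ne \varepsilon'$ and all $x \in \Omega$, we have
\[ y(x) + \varepsilon\varphi(x) = 0 \ \text{ and } \ y(x) + \varepsilon'\varphi(x) = 0 \quad\Longrightarrow\quad x \in Z, \]
i.e., the sets in the collection $(\{y + \varepsilon\varphi = 0\} \setminus Z)_{\varepsilon \in (0,1)}$ are disjoint (and obviously Lebesgue measurable). Furthermore, the underlying measure space $(\Omega, \mathcal{L}(\Omega), \lambda^d)$ is finite. Thus, we may apply Lemma . to obtain a $\lambda^1$-zero set $N \subset (0,1)$ such that
\[ \lambda^d(\{y + \varepsilon\varphi = 0\}) = 0 \quad \text{for all } \varepsilon \in E := (0,1) \setminus N. \]
Consider now an arbitrary but fixed sequence $(\varepsilon_n) \subset E$ with $\varepsilon_n \to 0$ as $n \to \infty$ and fix for the time being a direction $h \in L^2(\Omega)$. By Corollary . , $S$ is Gâteaux-differentiable in $u_{\varepsilon_n}$, and $\delta_{\varepsilon_n} := S'(u_{\varepsilon_n})h$ solves $-\Delta\delta_{\varepsilon_n} + \mathbb{1}_{\{y + \varepsilon_n\varphi > 0\}} \delta_{\varepsilon_n} = h$. Moreover, it holds
\[ \mathbb{1}_{\{y + \varepsilon_n\varphi > 0\}} \to \chi \quad \text{pointwise a.e. in } \Omega, \]
where $\chi$ is as defined in ( . ). Note that according to assumption ( . ), the case $y(x) = \varphi(x) = 0$ is negligible here. Using Lemma . , we now obtain that $\delta_{\varepsilon_n} = S'(u_{\varepsilon_n})h$ converges strongly in $Y$ to $G_\chi h$. Since $h \in L^2(\Omega)$ was arbitrary, this proves the claim.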
In the following, we successively sharpen the assertion of Lemma . by means of Lemma . and the approximation results for characteristic functions proven in Appendix .
( . ) Moreover, due to Lemma . , we know that $G_{\chi_n} \in \partial_B^{ss} S(u)$ for all $n \in \mathbb{N}$ and, from Lemma . and ( . ), we obtain $G_{\chi_n} h \to G_\chi h$ strongly in $Y$ for all $h \in L^2(\Omega)$. Proposition . now gives $G_\chi \in \partial_B^{ss} S(u)$ as claimed.
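Proof. We again proceed in two steps: (i) If $\chi$ is a simple function of the form $\chi := \sum_{k=1}^K c_k \mathbb{1}_{B_k}$ with $c_k \in (0,1]$ for all $k$, $K \in \mathbb{N}$, and $B_k \subseteq \Omega$ Lebesgue measurable and mutually disjoint, then we know from Lemma . (iv) that there exists a sequence of Lebesgue measurable sets $A_n \subseteq \Omega$ such that $\mathbb{1}_{A_n} \rightharpoonup^* \chi$ in $L^\infty(\Omega)$. In view of ( . ), this yields
\[ \chi_n := \mathbb{1}_{\{y > 0\}} + \mathbb{1}_{\{y = 0\}} \mathbb{1}_{A_n} \rightharpoonup^* \mathbb{1}_{\{y > 0\}} + \mathbb{1}_{\{y = 0\}} \chi = \chi \quad \text{in } L^\infty(\Omega), \]
so that, by Lemma . , we obtain $G_{\chi_n} h \rightharpoonup G_\chi h$ in $Y$ for all $h \in L^2(\Omega)$. Moreover, from Proposition . , we already know that $G_{\chi_n} \in \partial_B^{ss} S(u) \subseteq \partial_B^{sw} S(u)$ for all $n \in \mathbb{N}$. Therefore Proposition . gives $G_\chi \in \partial_B^{sw} S(u)$ as claimed.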
(ii) For an arbitrary but fixed $\chi \in L^\infty(\Omega)$ satisfying ( . ), measurability implies the existence of a sequence of simple functions satisfying ( . ) and converging pointwise a.e. to $\chi$. (Note that the pointwise projection of a simple function onto the set of functions satisfying ( . ) remains simple as $y$ is fixed and measurable.) Since pointwise a.e. convergence and a uniform bound in $L^\infty(\Omega)$ imply weak-$*$ convergence in $L^\infty(\Omega)$, we can now apply (i) and again Lemma . and Proposition . to obtain the claim.
Thanks to Lemma . (i), the necessary conditions in Propositions . and . , respectively, in combination with the sufficient conditions in Propositions . and . , respectively, immediately imply the following sharp characterization of the Bouligand subdifferentials of $S$.
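Theorem . . Let $u \in L^2(\Omega)$ be arbitrary but fixed and set $y := S(u)$.
(i) It holds $\partial_B^{ws} S(u) = \partial_B^{ss} S(u)$. Moreover, $G \in \partial_B^{ss} S(u)$ if and only if there exists a function $\chi \in L^\infty(\Omega)$ satisfying
\[ \chi \in \{0,1\} \text{ a.e. in } \Omega, \quad \chi = 1 \text{ a.e. in } \{y > 0\}, \quad \text{and} \quad \chi = 0 \text{ a.e. in } \{y < 0\} \]
such that $G = G_\chi$. Furthermore, for each $G \in \partial_B^{ss} S(u)$ the associated $\chi$ is unique.
(ii) It holds $\partial_B^{ww} S(u) = \partial_B^{sw} S(u)$. Moreover, $G \in \partial_B^{sw} S(u)$ if and only if there exists a function $\chi \in L^\infty(\Omega)$ satisfying
\[ 0 \le \chi \le 1 \text{ a.e. in } \Omega, \quad \chi = 1 \text{ a.e. in } \{y > 0\}, \quad \text{and} \quad \chi = 0 \text{ a.e. in } \{y < 0\} \]
such that $G = G_\chi$. Furthermore, for each $G \in \partial_B^{sw} S(u)$ the associated $\chi$ is unique.

Remark . . Theorem . shows that it does not matter whether we use the weak or the strong topology for the approximating sequence $u_n \in L^2(\Omega)$ in the definition of the subdifferential; only the choice of the operator topology makes a difference. We further see that the elements of the strong resp. weak Bouligand subdifferential are precisely those operators $G_\chi$ generated by a function $\chi \in L^\infty(\Omega)$ that is obtained from a pointwise measurable selection of the Bouligand resp. convex subdifferential of the max-function, cf. Remark . . Note that for all $\chi_1, \chi_2 \in L^\infty(\Omega)$ with $\chi_1 \ge 0$, $\chi_2 \ge 0$ a.e. in $\Omega$, all $\alpha \in [0,1]$, and all $\eta \in Y$, it holds
\[ \big(\alpha G_{\chi_1}^{-1} + (1-\alpha) G_{\chi_2}^{-1}\big)\eta = -\Delta\eta + \big(\alpha\chi_1 + (1-\alpha)\chi_2\big)\eta = G_{\alpha\chi_1 + (1-\alpha)\chi_2}^{-1}\eta. \]
This implies that the set $\{G^{-1} : G \in \partial_B^{sw} S(u)\}$ is convex and contains the convex hull of the set $\{G^{-1} : G \in \partial_B^{ss} S(u)\}$. We point out that the convex combination of two elements of, e.g., $\partial_B^{sw} S(u)$ is typically not an element of $\partial_B^{sw} S(u)$ (due to the bilinear term $\chi\eta$ in the definition of $G_\chi$). The above "convexification effect" appears only when we consider the inverse operators.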
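The "convexification effect" for the inverse operators can be seen in a toy setting. The sketch below replaces $-\Delta$ by a hypothetical symmetric positive definite $2 \times 2$ matrix `A` (a two-point discrete Laplacian, chosen purely for illustration) and $\chi$ by a diagonal matrix, so that $G_\chi^{-1} = A + \operatorname{diag}(\chi)$; all names are assumptions.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def comb(alpha, m1, m2):
    """Convex combination alpha * m1 + (1 - alpha) * m2 of two 2x2 matrices."""
    return [[alpha * m1[i][j] + (1.0 - alpha) * m2[i][j] for j in range(2)]
            for i in range(2)]

def add_diag(a, chi):
    """Return a + diag(chi), the toy analogue of -Laplace + chi."""
    return [[a[0][0] + chi[0], a[0][1]], [a[1][0], a[1][1] + chi[1]]]

A = [[2.0, -1.0], [-1.0, 2.0]]      # hypothetical stand-in for -Laplace
chi1, chi2, alpha = [1.0, 0.0], [0.0, 1.0], 0.5

M1, M2 = add_diag(A, chi1), add_diag(A, chi2)   # inverses G_chi1^{-1}, G_chi2^{-1}
G1, G2 = inv2(M1), inv2(M2)                     # toy solution operators

# Convex combination of the inverses: again of the form A + diag(chi).
inv_comb = comb(alpha, M1, M2)

# Convex combination of the operators: its inverse minus A is not diagonal.
op_comb_inverse = inv2(comb(alpha, G1, G2))
offdiag_defect = op_comb_inverse[0][1] - A[0][1]
```

In this toy example the averaged inverses equal `A + diag(0.5, 0.5)`, whereas the inverse of the averaged operators differs from `A` by a matrix with a nonzero off-diagonal entry and hence cannot be written as `A + diag(chi)` for any `chi`.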
In this section we turn our attention back to the optimal control problem (P), where we are mainly interested in the derivation of first-order necessary optimality conditions involving dual variables. Due to the non-smoothness of the control-to-state mapping $S$ caused by the max-function in (PDE), the standard procedure based on the adjoint of the Gâteaux derivative of $S$ cannot be applied. Instead, regularization and relaxation methods are frequently used to derive optimality conditions as in, e.g., [ ]. We will follow the same approach and derive an optimality system in this way in the next subsection. Since the arguments are rather standard, we keep the discussion concise; the main issue here is to carry out the passage to the limit in the topology of $Y$. We again emphasize that the optimality conditions themselves are not remarkable at all. However, in Section . , we will give a new interpretation of the optimality system arising through regularization by means of the Bouligand subdifferentials from Section (cf. Theorem . ), which is the main result of this section.
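For the rest of this section, let $\bar u \in L^2(\Omega)$ be an arbitrary local minimizer for (P). We follow a widely used approach (see, e.g., [ ]) and define our regularized optimal control problem as
\[ \min_{u \in L^2(\Omega),\ y \in H_0^1(\Omega)} J(y,u) + \frac{1}{2}\|u - \bar u\|_{L^2}^2 \quad \text{s.t.} \quad -\Delta y + \max{}_\varepsilon(y) = u \ \text{in } \Omega, \tag{P$_\varepsilon$} \]
with a regularized version of the max-function satisfying the following assumptions.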
There are numerous possibilities to construct families of functions satisfying Assumption . ; we only refer to the regularized max-functions used in [ , ]. As for the max-function, we will denote the Nemytskii operator associated with $\max_\varepsilon$ by the same symbol.
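As one possible illustration (not necessarily among the regularizations used in the cited references), consider the piecewise quadratic smoothing defined below. The properties checked here, namely continuous differentiability, a derivative with values in $[0,1]$, an approximation error of order $\varepsilon$, and a derivative equal to 0 resp. 1 away from the kink, are inferred from how the assumption is used in the proofs of this section; the function names are hypothetical.

```python
def max_eps(x, eps):
    """Piecewise quadratic C^1 smoothing of x -> max(0, x) (illustrative choice)."""
    if x <= 0.0:
        return 0.0
    if x < eps:
        return x * x / (2.0 * eps)
    return x - eps / 2.0

def dmax_eps(x, eps):
    """Derivative of max_eps; continuous, monotone, and bounded between 0 and 1."""
    if x <= 0.0:
        return 0.0
    if x < eps:
        return x / eps
    return 1.0

# Sample the smoothing error and the derivative on a uniform grid in [-1, 1].
eps = 0.1
grid = [k / 100.0 - 1.0 for k in range(201)]
max_error = max(abs(max_eps(x, eps) - max(0.0, x)) for x in grid)
deriv_values = [dmax_eps(x, eps) for x in grid]
```

For this particular choice the error satisfies $|\max_\varepsilon(x) - \max(0,x)| \le \varepsilon/2$ for all $x$, and the derivative coincides with that of $\max(0,\cdot)$ outside $(0,\varepsilon)$.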
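Lemma . . For every $u \in L^2(\Omega)$, there exists a unique solution $y_\varepsilon \in Y$ of the PDE in (P$_\varepsilon$). The associated solution operator $S_\varepsilon : L^2(\Omega) \to Y$ is weakly continuous and Fréchet-differentiable. Its derivative at $u \in L^2(\Omega)$ in direction $h \in L^2(\Omega)$ is given by the unique solution $\delta \in Y$ to
\[ -\Delta\delta + \max{}_\varepsilon'(y_\varepsilon)\,\delta = h, \qquad ( . ) \]
where $y_\varepsilon = S_\varepsilon(u)$.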
Proof. The arguments are standard. The monotonicity of $\max_\varepsilon$ by Assumption . (iii) yields the existence of a unique solution, and bootstrapping implies that this is an element of $Y$. The weak continuity of $S_\varepsilon$ follows from Lemma . in exactly the same way as Corollary . . Due to Assumption . (i) and (iii), the Nemytskii operator associated with $\max_\varepsilon$ is continuously Fréchet-differentiable from $H_0^1(\Omega)$ to $L^2(\Omega)$ and, owing to the non-negativity of $\max_\varepsilon'$, the linearized equation in ( . ) admits a unique solution $\delta \in Y$ for every $h \in L^2(\Omega)$. The implicit function theorem then yields the claimed differentiability.
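Lemma . . There exists a constant $c > 0$ such that for all $u \in L^2(\Omega)$, there holds
\[ \|S(u) - S_\varepsilon(u)\|_Y \le c\,\varepsilon \quad \text{for all } \varepsilon > 0. \qquad ( . ) \]
Moreover, for every sequence $u_n \in L^2(\Omega)$ with $u_n \to u$ in $L^2(\Omega)$ and every sequence $\varepsilon_n \to 0^+$, there exists a subsequence $(n_k)_{k \in \mathbb{N}}$ and an operator $G \in \partial_B^{sw} S(u)$ such that
\[ S_{\varepsilon_{n_k}}'(u_{n_k}) h \rightharpoonup Gh \text{ in } Y \text{ for all } h \in L^2(\Omega). \]
Proof. Given $u \in L^2(\Omega)$, let us set $y := S(u)$ and $y_\varepsilon := S_\varepsilon(u)$. Then it holds that
\[ -\Delta(y - y_\varepsilon) + \max(0,y) - \max(0,y_\varepsilon) = \max{}_\varepsilon(y_\varepsilon) - \max(0,y_\varepsilon). \]
Testing this equation with $y - y_\varepsilon$ and employing the monotonicity of the max-operator and Assumption . (ii) gives $\|y - y_\varepsilon\|_{H^1(\Omega)} \le c\,\varepsilon$. Then, thanks to the Lipschitz continuity of the max-function and again Assumption . (ii), a bootstrapping argument completely analogous to that in the proof of Proposition . yields ( . ), cf. ( . ).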
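To obtain the second part of the lemma, let $(u_n) \subset L^2(\Omega)$ and $(\varepsilon_n) \subset (0,\infty)$ be sequences with $u_n \to u$ in $L^2(\Omega)$ and $\varepsilon_n \to 0^+$. Then ( . ) and Proposition . imply
\[ \|S_{\varepsilon_n}(u_n) - S(u)\|_Y \le \|S_{\varepsilon_n}(u_n) - S(u_n)\|_Y + \|S(u_n) - S(u)\|_Y \le c\,\varepsilon_n + L\,\|u_n - u\|_{L^2} \to 0 \]
as $n \to \infty$, i.e., $y_n := S_{\varepsilon_n}(u_n) \to y := S(u)$ in $Y$. Now, given an arbitrary but fixed direction $h \in L^2(\Omega)$, we know that the derivative $\delta_n := S_{\varepsilon_n}'(u_n)h$ is characterized by
\[ -\Delta\delta_n + \max{}_{\varepsilon_n}'(y_n)\,\delta_n = h. \]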
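Then, due to $y_n \to y$ pointwise a.e. in $\Omega$ (at least for a subsequence) and Assumption . (iii) and (iv), there is a subsequence (not relabeled for simplicity) such that
\[ \max{}_{\varepsilon_n}'(y_n) \rightharpoonup^* \chi \text{ in } L^\infty(\Omega) \quad \text{with} \quad 0 \le \chi \le 1 \text{ a.e. in } \Omega, \ \chi = 1 \text{ a.e. in } \{y > 0\}, \text{ and } \chi = 0 \text{ a.e. in } \{y < 0\}. \]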
Note that the transition to a subsequence above is independent of h. Using Lemma . and Theorem . then yields the second claim.
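Proof. Based on the previous results, the proof follows standard arguments, which we briefly sketch for the convenience of the reader. We introduce the reduced objective functional associated with (P$_\varepsilon$) as
\[ F_\varepsilon : L^2(\Omega) \to \mathbb{R}, \quad F_\varepsilon(u) := J(S_\varepsilon(u), u) + \frac{1}{2}\|u - \bar u\|_{L^2}^2, \]
and consider the following auxiliary optimal control problem
\[ \min_{u \in L^2(\Omega)} F_\varepsilon(u) \quad \text{s.t.} \quad \|u - \bar u\|_{L^2} \le r, \tag{P$_{\varepsilon,r}$} \]
where $r > 0$ is the radius of local optimality of $\bar u$. Thanks to the weak continuity of $S_\varepsilon$ by Lemma . and the weak lower semi-continuity of $J$ by Assumption . , the direct method of the calculus of variations immediately implies the existence of a global minimizer $\bar u_\varepsilon \in L^2(\Omega)$ of (P$_{\varepsilon,r}$). Note that due to the continuous Fréchet-differentiability of $J$, the global Lipschitz continuity of $S$, and ( . ), there exists (after possibly reducing the radius $r$) a constant $C > 0$ such that for all sufficiently small $\varepsilon > 0$, it holds
\[ |J(S(u),u) - J(S_\varepsilon(u),u)| \le C\varepsilon \quad \text{for all } u \in L^2(\Omega) \text{ with } \|u - \bar u\|_{L^2} \le r. \]
As a consequence, we obtain (with the same constant and with $F(u) := J(S(u),u)$)
\[ F_\varepsilon(\bar u) \le F(\bar u) + C\varepsilon \le F(u) + C\varepsilon \le F_\varepsilon(u) - \frac{1}{2}\|u - \bar u\|_{L^2}^2 + 2C\varepsilon \quad \text{for all } u \text{ with } \|u - \bar u\|_{L^2} \le r. \]
Thus, for every $\varepsilon > 0$ sufficiently small, any global solution $\bar u_\varepsilon$ of (P$_{\varepsilon,r}$) must necessarily satisfy
\[ \|\bar u_\varepsilon - \bar u\|_{L^2} \le 2\sqrt{C\varepsilon}. \]
In particular, for $\varepsilon$ small enough, $\bar u_\varepsilon$ is in the interior of the $r$-ball around $\bar u$ and, as a global solution of (P$_{\varepsilon,r}$), is also a local one of (P$_\varepsilon$). It therefore satisfies the first-order necessary optimality conditions of the latter, which, thanks to the chain rule and Lemma . , read
\[ S_\varepsilon'(\bar u_\varepsilon)^* \partial_y J(S_\varepsilon(\bar u_\varepsilon), \bar u_\varepsilon) + \partial_u J(S_\varepsilon(\bar u_\varepsilon), \bar u_\varepsilon) + \bar u_\varepsilon - \bar u = 0. \]
From Lemma . we obtain that there exists a sequence $\varepsilon_n \to 0^+$ and an operator $G \in \partial_B^{sw} S(\bar u)$ such that
\[ S_{\varepsilon_n}'(\bar u_{\varepsilon_n}) h \rightharpoonup Gh \text{ in } Y \text{ for all } h \in L^2(\Omega). \]
Further, we deduce from ( . ), the global Lipschitz continuity of $S$, and ( . ), that $S_{\varepsilon_n}(\bar u_{\varepsilon_n}) \to S(\bar u)$ in $Y$. Combining all of the above and using our assumptions on $J$, we can pass to the limit $\varepsilon_n \to 0$ in ( . ) to obtain
\[ G^* \partial_y J(S(\bar u), \bar u) + \partial_u J(S(\bar u), \bar u) = 0. \]
By setting $p := G^* \partial_y J(S(\bar u), \bar u)$, this together with Theorem . and Remark . finally proves the claim.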
Proof. According to Definition . , $G_\chi$ is the solution operator of ( . ), which is formally self-adjoint. Thus we can argue as in [ , Sec. . ] to deduce ( . a) and the $H_0^1$-regularity of $p$. The $Y$-regularity is again obtained by bootstrapping.
In classical non-smooth optimization, optimality conditions of the form $0 \in \partial_* f(x)$, where $\partial_* f$ denotes one of the various subdifferentials of $f$, frequently appear when a function $f : X \to \mathbb{R}$ is minimized over a normed linear space $X$; we only refer to [ , Secs. and ] and the references therein. With the help of the results of Section (in particular Theorem . ), we are now in the position to interpret the optimality system in ( . ) in this spirit. To this end, we first consider the reduced objective and establish the following result for its Bouligand subdifferential.
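Proposition . (chain rule). Let $u \in L^2(\Omega)$ be arbitrary but fixed and let $F : L^2(\Omega) \to \mathbb{R}$ be the reduced objective for (P) defined by $F(u) := J(S(u), u)$. Moreover, set $y := S(u)$. Then it holds
\[ \{ G^* \partial_y J(y,u) + \partial_u J(y,u) : G \in \partial_B^{sw} S(u) \} \subseteq \partial_B F(u), \]
where
\[ \partial_B F(u) := \{ w \in L^2(\Omega) : \text{there exists } (u_n) \subset L^2(\Omega) \text{ with } u_n \to u \text{ in } L^2(\Omega) \text{ such that } F \text{ is Gâteaux-differentiable in } u_n \text{ for all } n \in \mathbb{N} \text{ and } F'(u_n) \rightharpoonup w \text{ in } L^2(\Omega) \text{ as } n \to \infty \}. \]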
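Proof. Let $u \in L^2(\Omega)$ and $G \in \partial_B^{sw} S(u)$ be arbitrary but fixed. Then the definition of $\partial_B^{sw} S(u)$ guarantees the existence of a sequence $u_n \in L^2(\Omega)$ of Gâteaux points with $u_n \to u$ in $L^2(\Omega)$ and $S'(u_n)h \rightharpoonup Gh$ in $Y$ for all $h \in L^2(\Omega)$. Since $J$ is Fréchet- and thus Hadamard-differentiable and so is $S$ by Theorem . , we may employ the chain rule to deduce that $F$ is Gâteaux-differentiable in the points $u_n \in L^2(\Omega)$ with derivative $F'(u_n) = S'(u_n)^* \partial_y J(y_n, u_n) + \partial_u J(y_n, u_n) \in L^2(\Omega)$ for $y_n := S(u_n)$. As $y_n \to y$ in $Y$ by Proposition . and $J : Y \times L^2(\Omega) \to \mathbb{R}$ is continuously Fréchet-differentiable by Assumption . , we obtain for every $h \in L^2(\Omega)$ that
\[ (F'(u_n), h)_{L^2} = \langle \partial_y J(y_n,u_n), S'(u_n)h \rangle_Y + (\partial_u J(y_n,u_n), h)_{L^2} \to \langle \partial_y J(y,u), Gh \rangle_Y + (\partial_u J(y,u), h)_{L^2} = (G^* \partial_y J(y,u) + \partial_u J(y,u), h)_{L^2}. \]
Since $h \in L^2(\Omega)$ was arbitrary, this proves $G^* \partial_y J(y,u) + \partial_u J(y,u) \in \partial_B F(u)$.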
With the above result, we can now relate the optimality conditions obtained via regularization to the Bouligand subdifferential of the reduced objective, and in this way rate the strength of the optimality system in ( . ).

Theorem . (limit optimality system implies Bouligand-stationarity). It holds:
u is locally optimal for (P)
⇓
there exist χ ∈ L^∞(Ω) and p ∈ L^2(Ω) such that ( . ) holds
⇓
u is Bouligand-stationary for (P) in the sense that 0 ∈ ∂_B F(u)
⇓
u is Clarke-stationary for (P) in the sense that 0 ∈ ∂_C F(u)

Here, ∂_C F(u) denotes the Clarke subdifferential as defined in [ , Sec. . ].
Proof. The first two implications immediately follow from Theorem . and Proposition . . The third implication follows from the weak closedness of ∂_C F(u) (see [ , Prop. . . b]) and Remark . .

The above theorem is remarkable for several reasons:

(i) Theorem . shows that 0 ∈ ∂_B F(u) is a necessary optimality condition for the optimal control problem (P). This is in general not true even in finite dimensions, as the minimization of the absolute value function shows.
(ii) The above shows that the necessary optimality condition in Theorem . , which is obtained by regularization, is comparatively strong. It is stronger than Clarke-stationarity and even stronger than Bouligand-stationarity (which is so strong that it does not even make sense in the majority of problems).
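The finite-dimensional counterexample mentioned in (i) can be made explicit. The following worked computation is a standard illustration added here, not taken from the paper's text; it uses f(x) = |x| on R, whose unique minimizer is x = 0.

```latex
\begin{aligned}
 f(x) &= |x|, \qquad f'(x) = \operatorname{sign}(x) \quad \text{for } x \neq 0, \\
 \partial_B f(0) &= \Bigl\{ \lim_{n\to\infty} f'(x_n) \;:\; x_n \to 0,\ x_n \neq 0 \Bigr\}
   = \{-1,\, 1\} \not\ni 0, \\
 \partial_C f(0) &= \operatorname{conv} \partial_B f(0) = [-1,\, 1] \ni 0 .
\end{aligned}
```

Thus the minimizer of |x| is Clarke-stationary but not Bouligand-stationary, which is why the necessity of the Bouligand condition for (P) is noteworthy.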
Remark . . It is easily seen that the limit analysis in Section . readily carries over to control constrained problems involving an additional constraint of the form u ∈ U_ad for a closed and convex U_ad ⊂ L^2(Ω). The optimality system arising in this way is identical to ( . ) except for ( . c), which is replaced by the variational inequality The interpretation of the optimality system arising in this way in the spirit of Theorem . is, however, anything but straightforward, as it is not even clear how to define the Bouligand subdifferential of the reduced objective in the presence of control constraints. Intuitively, one would choose the approximating sequences in the definition of ∂_B F from the feasible set U_ad, but then the arising subdifferential could well be empty. This gives rise to future research.
Although comparatively strong, the optimality conditions in Theorem . are not the most rigorous ones, as we will see in the sequel. To this end, we apply a method of proof which was developed in [ ] for optimal control problems governed by non-smooth semilinear parabolic PDEs and inspired by the analysis in [ , ]. We begin with an optimality condition without dual variables.
Proposition . . The strong stationarity conditions are equivalent to the purely primal optimality conditions, i.e., ū ∈ L^2(Ω) together with its state ȳ, a multiplier χ, and an adjoint state p satisfy ( . ) if and only if they also fulfill the variational inequality ( . ).
Remark . . As in the case of the optimality system ( . ), the regularity of the adjoint state in Theorem . is again only limited by the mapping and differentiability properties of the objective functional. Thus, arguing as in Corollary . , one shows that if J is differentiable from H^1_0(Ω) × L^2(Ω) or L^2(Ω) × L^2(Ω) to R, the adjoint state p satisfying ( . a) is an element of H^1_0(Ω) or Y, respectively.
Remark . . Although the optimality system ( . ) is comparatively strong by Theorem . , it provides less information than the strong stationarity conditions in ( . ), since it lacks the sign condition ( . c) for the adjoint state. The conditions ( . ) can be seen as the most rigorous dual-multiplier based optimality conditions, as by Proposition . they are equivalent to the purely primal condition. We point out, however, that the method of proof of Theorem . can in general not be transferred to the case with additional control constraints (e.g., u ∈ U_ad for a closed and convex set U_ad), since it requires the set {S'(ū; h) : h ∈ cone(U_ad − ū)} to be dense in L^2(Ω). In contrast to this, the adaptation of the limit analysis in Section . to the case with additional control constraints is straightforward, as mentioned in Remark . .
One particular advantage of the optimality system in ( . ) is that it seems amenable to numerical solution, as we will demonstrate in the following. We point out, however, that we do not present a comprehensive convergence analysis for our algorithm to compute stationary points satisfying ( . ) but only a feasibility study. For the sake of presentation, we consider here an L^2 tracking objective of the form

J(y, u) := (1/2) ‖y − y_d‖^2_{L^2(Ω)} + (α/2) ‖u‖^2_{L^2(Ω)}

with a given desired state y_d ∈ L^2(Ω) and a Tikhonov parameter α > 0.
Let us start with a short description of our discretization scheme, where we restrict ourselves from now on to the case Ω ⊂ R^2. For the discretization of the state and the control variable, we use standard continuous piecewise linear finite elements (FE), cf., e.g., [ ]. Let us denote by V_h ⊂ H^1_0(Ω) the associated FE space spanned by the standard nodal basis functions φ_1, ..., φ_n. The nodes of the underlying triangulation T_h belonging to the interior of the domain Ω are denoted by x_1, ..., x_n. We then discretize the state equation in (P) by employing a mass lumping scheme for the non-smooth nonlinearity. Specifically, we consider the discrete state equation where y_h, u_h ∈ V_h denote the FE-approximations of y and u. With a slight abuse of notation, we from now on denote the coefficient vectors (y_h(x_i))_{i=1}^n and (u_h(x_i))_{i=1}^n by y, u ∈ R^n. The discrete state equation can then be written as the nonlinear algebraic equation

A y + D max(0, y) = M u,

where A := (∫_Ω ∇φ_i · ∇φ_j dx)_{i,j=1}^n ∈ R^{n×n} and M := (∫_Ω φ_i φ_j dx)_{i,j=1}^n ∈ R^{n×n} denote the stiffness and mass matrix, max(0, ·) : R^n → R^n is the componentwise max-function, and D := (1/3) diag(ω_1, ..., ω_n) with ω_i = |supp(φ_i)| denotes the lumped mass matrix. Due to the monotonicity of the max-operator, one easily shows that ( . ) and ( . ) admit a unique solution for every control vector u. The objective functional is discretized by means of a suitable interpolation operator I_h (e.g., the Clément interpolator or, if y_d ∈ C(Ω), the Lagrange interpolator). If, again by abuse of notation, we denote the coefficient vector of I_h y_d with respect to the nodal basis by y_d, we end up with the discretized objective .

We now present two different examples with a constructed exact solution to ( . ) in order to demonstrate convergence of the proposed algorithm and to illustrate the dependence on the parameters α and γ as well as on the mesh size h.
In both examples, the state vanishes in parts of the domain so that the non-smoothness of the max-function becomes apparent. For this purpose, we introduce an additional inhomogeneity in the state equation, i.e., we replace the PDE in (P) by y ∈ H^1_0(Ω), −∆y + max(0, y) = u + f in Ω with a given function f ∈ L^2(Ω). It is easy to see that this modification does not influence the analysis in the preceding sections. The domain is chosen as the unit square Ω = [0, 1]^2 ⊂ R^2, which is discretized by means of Friedrichs-Keller triangulations. In all cases, we take as a starting guess for the Newton iteration y = p = χ = , and terminate the iteration if either the combined norm of the residuals in ( . a), ( . b), and ( . c ) becomes less than − or if the maximum number of Newton iterations is reached. The Newton system in each iteration is solved by a sparse direct solver.
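To illustrate how a semi-smooth Newton iteration handles the max-term, the following minimal sketch solves a system of the form A y + D max(0, y) = b, which is the structure of the discrete state equation described above. It is not the authors' solver for the full optimality system: the 1D model matrices, the load vector b, the function name `solve_state`, the tolerance, and the dense linear solve are all assumptions made for illustration.

```python
import numpy as np

def solve_state(A, D, b, tol=1e-10, max_iter=50):
    """Semi-smooth Newton sketch for A y + D max(0, y) = b (hypothetical helper)."""
    y = np.zeros_like(b)
    for k in range(max_iter):
        res = A @ y + D @ np.maximum(0.0, y) - b
        if np.linalg.norm(res) < tol:
            return y, k
        # Newton derivative of max(0, .): indicator of the set {y > 0}
        active = (y > 0).astype(float)
        jac = A + D @ np.diag(active)
        y = y - np.linalg.solve(jac, res)
    raise RuntimeError("semi-smooth Newton did not converge")

# Assumed 1D model problem on (0, 1) with n interior nodes (illustration only).
n = 99
h = 1.0 / (n + 1)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h  # stiffness matrix
D = h * np.eye(n)                                             # lumped mass matrix
x = np.linspace(h, 1.0 - h, n)
b = h * 50.0 * np.sin(2.0 * np.pi * x)  # sign-changing load, so y changes sign

y, iterations = solve_state(A, D, b)
print("Newton iterations:", iterations)
```

Because the residual is piecewise linear in y, each Newton step amounts to a linear solve on the current guess of the set {y > 0}; once that set stops changing, the iterate solves the system up to rounding.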
In the first example, the optimal state and adjoint state are set to y(x_1, x_2) = sin(π x_1) sin( π x_2) and p ≡ 0, and the data f and y_d are constructed such that ( . ) is fulfilled, i.e., the optimal control is u ≡ 0. We note that there is a subset where y and p vanish at the same time, but it is only of measure zero. Table presents the numerical results for different values of the mesh size h, the Tikhonov parameter α in the objective in ( . ), and the parameter γ in the proximal point mapping. For the state y, we report the relative error of the computed approximation y_h with respect to the (L^2 projection of the) constructed optimal state y in the continuous L^2 norm. Since p ≡ 0, a relative error is not meaningful for the adjoint state, and we thus report the absolute error in the continuous L^2 norm. The error for χ is given in the discrete maximum norm max_i |χ_h(x_i) − χ(x_i)|, where the x_i are the interior nodes of the triangulation. (Note that χ is everywhere constant except in {y = 0}, and the triangulation is chosen such that y(x_i) ≠ 0.)

First, we remark that for almost all combinations of parameter values, only a few Newton iterations are needed to reach a residual norm below − . Regarding the dependence on the mesh size, we can observe quadratic convergence of the state y and the adjoint state p. Since in this case χ(x_i) ∈ {0, 1} for all x_i, the approximation of χ only depends on the sign of y and is hence independent of the mesh size (for the considered values of h).

Turning to the behavior with respect to γ, we first note that the Newton iteration failed to converge for larger values. This can be explained by the fact that for larger values of γ, the critical set where y_i ∈ [−γ, 0] and hence y_i + γ χ_i ∈ [0, γ] becomes larger, so that we expect the local convergence of Newton's method to become an issue. For smaller values of γ, the Newton iteration converges quickly, and the approximation of y and p is independent of γ. This is not the case for χ, where the approximation becomes worse as γ decreases. Here we point out that while ( . c ) is equivalent to ( . c) for any value of γ, this only holds for exact solutions. A simple pointwise inspection shows that if ( . c ) does not hold exactly but only up to a residual of ε, then χ_ε = χ + O(ε/γ).

Finally, we see that the Newton method is robust with respect to α, and the approximation of y and p even improves for smaller α. However, this seems to be a particularity of this example, since the Tikhonov parameter enters the data through the construction of this exact solution.
For the second example, we choose Note that y and p are twice continuously differentiable and vanish on the right half of the unit square. Therefore, the non-smoothness of the max-function occurs on a set of positive measure in this example. Moreover, as y and p vanish at the same time, χ is not unique in this set. This example can thus be seen as a worst-case scenario. Nevertheless, our algorithm is able to produce reasonable results, as Table demonstrates. (Note that it does not make sense to list the discrete L^∞ error for χ_h, since χ is not unique as mentioned above. In contrast, we can now report relative errors for p_h.) The algorithm shows a similar behavior as in the first example. Again, we observe quadratic convergence with respect to mesh refinement and that the Newton method does not converge if γ is chosen too large. (Note that in this example, the optimal state y is scaled differently, which influences the effect of γ in ( . c ).) In contrast to the first example, smaller values of α lead to worse numerical approximation and slower convergence of the Newton method (and even non-convergence for α = − ). This is a typical observation which is also made in the case of smooth optimal control problems.
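Statements such as "quadratic convergence with respect to mesh refinement" are usually checked by computing the experimental order of convergence from errors on successive meshes. The sketch below shows that computation; the helper name `eoc` and the error values are hypothetical, and the data are synthetic rather than results from the tables of this paper.

```python
import math

def eoc(mesh_sizes, errors):
    """Experimental order of convergence between consecutive refinement levels."""
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(mesh_sizes[i] / mesh_sizes[i + 1])
        for i in range(len(mesh_sizes) - 1)
    ]

# Synthetic data that behave exactly like C * h^2 (NOT taken from the paper).
mesh_sizes = [2.0 ** -k for k in range(3, 7)]
errors = [0.5 * h ** 2 for h in mesh_sizes]

rates = eoc(mesh_sizes, errors)
print(rates)
```

Rates close to 2 across all levels indicate quadratic convergence; in practice the measured errors would replace the synthetic list.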
In summary, one can conclude that our semi-smooth Newton-type method seems to be able to solve the discrete optimality system ( . ) for a certain range of parameters, even in genuinely non-smooth cases. However, a comprehensive convergence analysis is still lacking, and the choice of the parameter γ appears to be a delicate issue. Moreover, as already mentioned in Remark . , it is completely unclear how to incorporate the sign condition in ( . c) into the algorithmic framework. This is the subject of future research.

For every ball B_n, there is a smooth rotationally symmetric bump function ψ_n ∈ C^∞(R^d) with ψ_n > 0 in B_n and ψ_n ≡ 0 in R^d \ B_n. Defining

ψ := Σ_{n=1}^∞ ψ_n / (2^n ‖ψ_n‖_{H^n(R^d)}),

it holds that ψ > 0 in D, ψ ≡ 0 in R^d \ D, and ψ ∈ H^n(R^d) for all n ∈ N. Sobolev's embedding theorem then yields the claim.

[ ] R. Temam, A non-linear eigenvalue problem: the shape at equilibrium of a confined plasma, Arch. Rational Mech. Anal.
[ ] G. Wachsmuth, Strong stationarity for optimal control of the obstacle problem with control constraints, SIAM Journal on Optimization.
[ ] G. Wachsmuth, Towards M-stationarity for optimal control of the obstacle problem with control constraints, SIAM Journal on Control and Optimization.