FINITE ELEMENT ERROR ESTIMATES FOR ONE-DIMENSIONAL ELLIPTIC OPTIMAL CONTROL BY BV-FUNCTIONS



(Communicated by Eduardo Casas)
Abstract. We consider an optimal control problem governed by a one-dimensional elliptic equation that involves univariate functions of bounded variation as controls. For the discretization of the state equation we use linear finite elements, and for the control discretization we analyze two strategies. First, we use variational discretization of the control and show that the L²- and L^∞-errors for the state and the adjoint state are of order O(h²) and that the L¹-error of the control behaves like O(h²), too. These results rely on a structural assumption that implies that the optimal control of the original problem is piecewise constant and that the adjoint state has nonvanishing first derivative at the jump points of the control. If, second, piecewise constant control discretization is used, we obtain L²-error estimates of order O(h) for the state and W^{1,∞}-error estimates of order O(h) for the adjoint state. Under the same structural assumption as before we derive an L¹-error estimate of order O(h) for the control. We discuss optimization algorithms and provide numerical results for both discretization schemes indicating that the error estimates are optimal.
1. Introduction. In this paper we derive a priori error estimates for two finite element discretizations of an optimal control problem governed by a one-dimensional elliptic equation.

2. The continuous problem. We consider the following model problem in the one-dimensional spatial domain Ω := (0, 1). Given the parameter α > 0, a desired state u_d ∈ L^∞(Ω), and functions a ∈ C^{0,1}(Ω̄) and d_0 ∈ L^∞(Ω) satisfying a(x) ≥ ν > 0 with a constant ν > 0 for all x ∈ Ω̄ and d_0(x) ≥ 0 for a.e. x ∈ Ω, we look for a control q ∈ Q := BV(Ω) and an associated state u ∈ V := H¹_0(Ω) solving the optimal control problem

  min_{(u,q)∈V×Q} J(u, q) := ½ ‖u − u_d‖²_{L²(Ω)} + α ‖q′‖_{M(Ω)},

subject to the state equation −(a u′)′ + d_0 u = q in Ω with u = 0 on ∂Ω.

2.1. The state equation. Recall from, e.g., [3, 28, 41] that the space BV(Ω) consists of those functions v ∈ L¹(Ω) whose distributional derivative v′ is a Radon measure, i.e., for which ‖v′‖_{M(Ω)} is finite. Since BV(Ω) embeds into L²(Ω), for every q ∈ BV(Ω) the Lax–Milgram theorem readily guarantees the existence of a unique associated state u = u(q) ∈ V. Thus the use of the solution or control-to-state operator S is justified. We note in passing that S : V* → V is a self-adjoint isomorphism. In fact, because we are working in dimension one, the following strong regularity result can be proven by standard arguments.
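To make the setting concrete, the following minimal sketch (not the paper's code) assembles and solves a P1 finite element discretization of a state equation in the standard divergence form −(a u′)′ + d_0 u = q on Ω = (0, 1) with homogeneous Dirichlet conditions; this particular form of the equation, the uniform mesh, the midpoint quadrature and all function names are our assumptions.

```python
import numpy as np

def solve_state(a, d0, q, n):
    """P1 finite element solution of  -(a u')' + d0 u = q  on (0, 1) with
    u(0) = u(1) = 0, assembled on a uniform mesh with n cells and midpoint
    quadrature for a, d0 and q. Sketch only; names are ours, not the paper's.
    """
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    for i in range(n):                              # cells T_i = (x_i, x_{i+1})
        m = 0.5 * (x[i] + x[i + 1])                 # cell midpoint
        Ke = a(m) / h * np.array([[1.0, -1.0], [-1.0, 1.0]])       # stiffness
        Me = d0(m) * h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])  # mass
        idx = [i, i + 1]
        A[np.ix_(idx, idx)] += Ke + Me
        b[idx] += q(m) * h / 2.0                    # load by midpoint rule
    u = np.zeros(n + 1)                 # Dirichlet: boundary rows eliminated
    u[1:-1] = np.linalg.solve(A[1:-1, 1:-1], b[1:-1])
    return x, u

# With a = 1, d0 = 0, q = 1 the exact state is u(x) = x(1 - x)/2, and the
# P1 approximation is nodally exact in this one-dimensional setting.
xs, u = solve_state(lambda s: 1.0, lambda s: 0.0, lambda s: 1.0, 64)
```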
Introducing the reduced objective j : Q → ℝ, j(q) := J(S(q), q), we can now analyze the reduced version of the original problem, given by

  min_{q∈Q} j(q). (P)

We will demonstrate that (P) admits a unique solution, characterize this solution by means of optimality conditions, and draw some conclusions from the optimality conditions regarding the structure of the optimal solution. Due to convexity we need not distinguish between local and global solutions, and first-order necessary conditions are also sufficient.
Proof. The injectivity of S implies that j is strictly convex, so (P) has at most one solution. To establish existence of q̄, let us consider a minimizing sequence (q_n)_{n∈ℕ} of j with j(q_n) ≤ j(0) for all n ∈ ℕ. Our goal is to bound the BV-norm of this sequence. Since α‖q_n′‖_{M(Ω)} ≤ j(q_n) ≤ j(0), it only remains to establish that (‖q_n‖_{L¹(Ω)})_{n∈ℕ} is bounded. From [3, Thm. 3.44] it follows that

  ‖q_n − q̄_n‖_{L¹(Ω)} ≤ C_iso ‖q_n′‖_{M(Ω)} ≤ C_iso j(0)/α, (2)

where q̄_n := (1/|Ω|) ∫_Ω q_n dx and C_iso depends only on Ω. Estimate (2) implies via the inverse triangle inequality that for all n ∈ ℕ there holds ‖q_n‖_{L¹(Ω)} ≤ C_iso j(0)/α + |q̄_n|, where we have used that |Ω| = 1. Moreover, we have

  |q̄_n| ‖S1‖_{L²(Ω)} = ‖S q̄_n‖_{L²(Ω)} ≤ ‖S(q_n − q̄_n)‖_{L²(Ω)} + ‖S q_n‖_{L²(Ω)} ≤ ‖S‖_{L(V*,L²(Ω))} ‖q_n − q̄_n‖_{V*} + ‖S q_n‖_{L²(Ω)}.
Making use of the embedding L 1 (Ω) → V * with constant C emb we infer that the first term on the right-hand side can be bounded using (2), and the second term can be bounded by (1). Together, this yields

This and (3) imply
where we have used that S1 ≠ 0 and that ‖u_d‖²_{L²(Ω)} ≤ 2j(0). In view of (1) we have thus found that (q_n)_{n∈ℕ} is bounded in BV(Ω). Since BV(Ω) is compactly embedded in L¹(Ω), there is a subsequence (q_{n_k})_{k∈ℕ} of (q_n) and a q̄ ∈ L¹(Ω) such that q_{n_k} → q̄ in L¹(Ω) as k → ∞. By continuity of the mapping L¹(Ω) ∋ q ↦ ½‖Sq − u_d‖²_{L²(Ω)} and lower semicontinuity of q ↦ ‖q′‖_{M(Ω)} with respect to the L¹(Ω) topology, cf. [41, Thm. 5.2.1], we deduce that j(q̄) = inf_{q∈Q} j(q).

2.3. Optimality conditions. Next, we provide necessary and sufficient optimality conditions for the optimal solution.

Theorem 2.3. The control q̄ ∈ Q with associated state ū ∈ V is optimal for Problem (P) if and only if there exists a unique adjoint state z̄ ∈ W^{2,∞}(Ω) ∩ V such that (ū, q̄, z̄) satisfies z̄ = S*(ū − u_d) and the W^{3,∞}(Ω) function Φ̄ : Ω̄ → ℝ, Φ̄(x) := ∫_0^x z̄(s) ds, satisfies ‖Φ̄‖_{L^∞(Ω)} ≤ α and ∫_Ω Φ̄ dq̄′ = α‖q̄′‖_{M(Ω)}.

Proof. Using convex analysis, e.g. [32], the optimality of q̄ is equivalent to 0 ∈ ∂j(q̄), where ∂j(q̄) denotes the subdifferential of j at the point q̄. By the chain rule and the sum rule, e.g. [32, Proposition 3.28] and [32, Thm. 3.30], this can be rewritten equivalently; note that the sum rule is applicable since both summands of j are continuous on Q. Defining z̄ := S*(Sq̄ − u_d) and recalling ū = Sq̄ we obtain the asserted system. In particular, the asserted regularity of z̄ follows from Lemma 2.1, which in turn implies Φ̄ ∈ W^{3,∞}(Ω). Furthermore, the definition of the subdifferential implies that (5) can be equivalently expressed as a variational inequality. Testing with q = 2q̄, q = 0 and q = q̄ + q̃ for any q̃ ∈ Q yields the equivalent system (6). Inserting q = 1 into (6) supplies Φ̄(1) = ∫_Ω z̄ ds = 0. By the definition of the distributional derivative of BV-functions, (6) is equivalent to (7). For x ∈ Ω let q := 1_{(x,1)} ∈ Q be the characteristic function of the interval (x, 1). We have q′ = δ_x and hence (7) yields |Φ̄(x)| ≤ α.
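The role of the multiplier Φ̄ can be illustrated numerically: given nodal values of an adjoint state with vanishing mean, its primitive Φ̄(x) = ∫_0^x z̄(s) ds satisfies Φ̄(1) = 0, and optimality requires ‖Φ̄‖_{L^∞} ≤ α. A small sketch under these assumptions; the toy adjoint below and all names are ours, not from the paper.

```python
import numpy as np

def primitive_and_check(x, z, alpha):
    """Form the primitive Phi(t) = int_0^t z(s) ds of adjoint nodal values z
    on the grid x by the trapezoidal rule, then test the two conditions from
    the optimality system: Phi(1) = 0 and max |Phi| <= alpha.

    Illustrative sketch only; the function and variable names are ours.
    """
    dx = np.diff(x)
    phi = np.concatenate(([0.0], np.cumsum(0.5 * (z[:-1] + z[1:]) * dx)))
    mean_zero = abs(phi[-1]) < 1e-9              # Phi(1) = integral of z = 0
    bounded = np.max(np.abs(phi)) <= alpha + 1e-12
    return phi, mean_zero, bounded

# Toy adjoint z(s) = cos(2*pi*s): it has vanishing mean, and
# Phi(t) = sin(2*pi*t)/(2*pi), so max |Phi| = 1/(2*pi) < alpha = 0.2.
x = np.linspace(0.0, 1.0, 2001)
phi, mean_zero, bounded = primitive_and_check(x, np.cos(2 * np.pi * x), 0.2)
```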
2.4. Structural conclusions. With the optimality conditions of Theorem 2.3 at hand, we can now derive helpful structural properties that hold without additional assumptions.
Corollary 1. If q̄ is optimal for (P), then there hold

  supp(q̄′₊) ⊂ {x ∈ Ω : Φ̄(x) = α} and supp(q̄′₋) ⊂ {x ∈ Ω : Φ̄(x) = −α},

where q̄′₊ and q̄′₋ denote the positive and the negative part of the Jordan decomposition of the measure q̄′. Moreover, we have

  supp(q̄′) ⊂ {x ∈ Ω : |Φ̄(x)| = α} ⊂ {x ∈ Ω : z̄(x) = 0}. (8)

Proof. Let x̄ ∈ Ω with Φ̄(x̄) < α. By the continuity of Φ̄ there is an open neighborhood U ⊂ Ω of x̄ and δ > 0 such that Φ̄ ≤ α − δ on U. Then Theorem 2.3 implies

  α‖q̄′‖_{M(Ω)} = ∫_Ω Φ̄ dq̄′ ≤ ∫_Ω Φ̄ dq̄′₊ + α q̄′₋(Ω) ≤ α‖q̄′‖_{M(Ω)} − δ q̄′₊(U).

Thus q̄′₊(U) = 0 and x̄ ∉ supp(q̄′₊). The claim for q̄′₋ follows analogously. The first inclusion in (8) is a direct consequence. Moreover, Theorem 2.3 implies that every x with |Φ̄(x)| = α is either a global maximum or minimum of the C¹ function Φ̄ and hence satisfies 0 = Φ̄′(x) = z̄(x), establishing the second inclusion in (8).
3. Finite element discretization. For the discretization of (P) we divide Ω̄ = [0, 1] into l > 1 subintervals T_i = (x_{i−1}, x_i) of size h_i := x_i − x_{i−1}, defined by the spatial nodes 0 = x_0 < x_1 < · · · < x_l = 1. We obtain Ω̄ = ∪_{1≤i≤l} T̄_i, set h := max_{1≤i≤l} h_i, and define the space of piecewise linear finite elements

  V_h := {v_h ∈ C_0(Ω̄) : v_h is affine on each T_i, 1 ≤ i ≤ l},

where C_0(Ω̄) denotes the continuous functions on Ω̄ that vanish on ∂Ω.
For further reference we recall that the Ritz projection associated to the bilinear form a, denoted R_h : V → V_h, is defined by a(R_h v, v_h) = a(v, v_h) for all v_h ∈ V_h. It is well known that for each v ∈ V this variational equality has a unique solution. Moreover, the discrete solution operator is denoted by S_h : V* → V_h and satisfies a(S_h f, v_h) = ⟨f, v_h⟩ for all v_h ∈ V_h. Since these identities, in fact, uniquely determine R_h and S_h, it follows that S_h = R_h S. Concerning the approximation quality of S_h we cite the following well-known results.
Lemma 3.2. There exist C > 0 and h_0 > 0 such that for every h ∈ (0, h_0] and all v ∈ L^∞(Ω) there holds

  ‖(S − S_h)v‖_{L^∞(Ω)} ≤ C h² ‖v‖_{L^∞(Ω)}.

Proof. This is the main theorem of [40], keeping the regularity from Lemma 2.1 in mind.
The next lemma shows that S h is stable from L 2 (Ω) to W 1,∞ (Ω).
Lemma 3.4. Let w ∈ H²(Ω) ∩ V and R_h w its Ritz projection. Then there are C, h_0 > 0 such that for each h ∈ (0, h_0] the following two estimates hold; in both, the constant C > 0 is independent of w and h.

Proof. Lemma 3.3 implies that the Ritz projection is stable in W^{1,∞}(Ω), and thus it suffices to estimate w − I_h w, where I_h w is the usual nodal interpolant of w. The two estimates now follow from [9, Thm. 4.4.20].

3.2. Variational control discretization. In this section we discuss the variational discretization of problem (P), in which the controls are not discretized explicitly. We show that the resulting semi-discrete problem admits a solution, characterize solutions by means of optimality conditions, and draw conclusions from the optimality conditions regarding the structure of solutions. The variationally discretized version of (P) is obtained by replacing the solution operator S with S_h. Defining j_h : Q → ℝ by j_h(q) := J(S_h(q), q), its reduced formulation reads

  min_{q∈Q} j_h(q). (P_vd)

Theorem 2.2 has the following discrete counterpart.
Theorem 3.5. Problem (P_vd) admits an optimal control q̄_h ∈ Q with associated optimal state ū_h ∈ V_h. There exist C > 0 and h_0 > 0 such that for all h ∈ (0, h_0] we have ‖q̄_h‖_{BV(Ω)} ≤ C for any optimal control q̄_h.
We point out that the control space Q is not discretized, hence the optimal controls q̄_h belong to BV(Ω). We prefer the notation q̄_h nonetheless, because the variationally discretized problem depends on h.
We collect, without proof, optimality conditions and structural properties analogous to the continuous setting.

Corollary 2. If q̄_h is optimal for (P_vd), then there hold

  supp((q̄_h′)₊) ⊂ {x ∈ Ω : Φ̄_h(x) = α} and supp((q̄_h′)₋) ⊂ {x ∈ Ω : Φ̄_h(x) = −α},

where (q̄_h′)₊ and (q̄_h′)₋ denote the positive and the negative part of the Jordan decomposition of the measure q̄_h′. Moreover, we have

  supp(q̄_h′) ⊂ {x ∈ Ω : |Φ̄_h(x)| = α} ⊂ {x ∈ Ω : z̄_h(x) = 0}.

3.3. Piecewise constant control discretization. In this section we present a discretization of (P) in which the controls q_h are piecewise constant. We denote the space of piecewise constant functions on T_h by

  Q_h := {q_h ∈ BV(Ω) : q_h is constant on each T_i, 1 ≤ i ≤ l}.

Now the discretization of (P) is obtained by minimizing over V_h × Q_h. Note that in contrast to (P_vd) the control q_h is now discretized and has the form

  q_h = a_h + Σ_{j=1}^{l−1} c_h^j 1_{(x_j,1)}

for some a_h, c_h^j ∈ ℝ, 1 ≤ j ≤ l − 1. We now address existence of optimal solutions and optimality conditions for Problem (P_cd).
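The offset-plus-jumps representation of a piecewise constant control can be sketched as follows; the helper below converts cell values into the offset and the jump coefficients and is purely illustrative (all names are ours, not the paper's).

```python
import numpy as np

def to_jump_form(cell_values):
    """Rewrite a piecewise constant control, given by its values on the cells
    T_1, ..., T_l, in the offset-plus-jumps form  a_h + sum_j c_j 1_{(x_j,1)}
    used in the text: c_j is the jump across the interior gridpoint x_j.

    Sketch; variable names are ours, not the paper's.
    """
    a_h = float(cell_values[0])          # offset: value on the first cell
    c = np.diff(cell_values)             # jump heights at interior nodes
    tv = float(np.sum(np.abs(c)))        # BV seminorm: sum of |jumps|
    return a_h, c, tv

# Four cells with values (1, 1, 3, 0.5): offset 1, jumps (0, 2, -2.5).
a_h, c, tv = to_jump_form(np.array([1.0, 1.0, 3.0, 0.5]))
```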
Proof. The proof is the same as for Theorem 3.5.
Proof. As in the proof of Theorem 2.3 the optimality of q̂_h ∈ Q_h is equivalent to a subdifferential inclusion, and, in particular as in (6), this is equivalent to a variational inequality (13). It remains to establish the statements for Φ̂_h. Testing with q_h := 1 ∈ Q_h in (13) shows ∫_Ω ẑ_h(s) ds = 0 and thus Φ̂_h(1) = 0. Moreover, (13) can be expressed as (14). Because 1_{(x_j,1)} ∈ Q_h and (1_{(x_j,1)})′ = δ_{x_j} for j = 1, . . . , l − 1, we infer from the inequality in (14) that |Φ̂_h(x_j)| ≤ α at all gridpoints x_j.

Remark 1. The information on Φ̂_h in Theorem 3.8 concerns only the gridpoints. It is therefore not ensured (and in general not true) that ‖Φ̂_h‖_{L^∞(Ω)} ≤ α.
Corollary 3. If q̂_h ∈ Q_h is optimal for (P_cd), then there hold

  supp((q̂_h′)₊) ⊂ {x_j : Φ̂_h(x_j) = α} and supp((q̂_h′)₋) ⊂ {x_j : Φ̂_h(x_j) = −α},

where (q̂_h′)₊ and (q̂_h′)₋ denote the positive and the negative part of the Jordan decomposition of the measure q̂_h′.
Proof. Recall that, by Theorem 3.8, |Φ̂_h(x_j)| ≤ α at every gridpoint x_j. Assuming that q̂_h′ has positive mass at a gridpoint x_{j*} with Φ̂_h(x_{j*}) < α, we thus find, arguing as in the proof of Corollary 1, a contradiction that implies Φ̂_h(x_{j*}) = α and hence the statement for supp((q̂_h′)₊). Analogously, we obtain the assertion for supp((q̂_h′)₋).

Remark 2. Note that at non-gridpoints, |Φ̂_h| from Theorem 3.8 may assume values larger than α. In particular, points with |Φ̂_h(x)| = α need not be roots of ẑ_h. This stands in stark contrast to both the continuous and the variationally discretized problems, where every point at which |Φ̄|, respectively, |Φ̄_h|, attains the value α is necessarily an extreme point and thus a root of z̄, respectively, z̄_h. However, if |Φ̂_h| attains the value α at a gridpoint x_{j*}, we will see that there is a root of ẑ_h whose distance to x_{j*} is no more than h. This will suffice to prove error estimates of order O(h).
For later use let us define the L²-projection operator Π_h onto the space of piecewise constant functions and collect useful properties of this operator.
It is easy to check that for any v_h ∈ Q_h and q ∈ BV(Ω) we have

  ∫_Ω (q − Π_h q) v_h dx = 0,

i.e., on each cell Π_h q is the mean value of q. We have the following estimates.
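Since Q_h consists of cell-wise constants, the L²-projection Π_h reduces to taking cell averages; a minimal sketch under this assumption (the quadrature rule and all names are ours):

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def l2_projection(q, x):
    """L2-projection of q onto piecewise constants on the mesh with nodes x:
    on each cell the projection equals the cell average of q, which makes
    q - Pi_h q orthogonal to every piecewise constant v_h.

    Sketch with a 5-point Gauss rule per cell; names are ours.
    """
    pts, wts = leggauss(5)                             # rule on (-1, 1)
    means = []
    for xl, xr in zip(x[:-1], x[1:]):
        s = 0.5 * (xr - xl) * pts + 0.5 * (xl + xr)    # map nodes to the cell
        means.append(0.5 * np.dot(wts, q(s)))          # cell average of q
    return np.array(means)

# Projecting q(s) = s onto two equal cells yields the cell-midpoint values.
p = l2_projection(lambda s: s, np.array([0.0, 0.5, 1.0]))
```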

4.1. Error estimates for variational control discretization.
4.1.1. Basic error estimates for state and adjoint state. Throughout Section 4.1, q̄_h refers to an arbitrary but fixed solution to (P_vd), and (ū_h, z̄_h, Φ̄_h) denote the associated state, adjoint state and multiplier, respectively, that are uniquely determined by Theorem 3.6. The constants that appear in the following error estimates are independent of (q̄_h, ū_h, z̄_h, Φ̄_h). We begin this section by proving a priori estimates for the errors in the optimal state and the adjoint state.
Proof. The optimality conditions for q̄ and q̄_h from Theorems 2.3 and 3.6 provide two variational inequalities. Adding these two inequalities and inserting R_h z̄ yields a first estimate. We can rearrange the first term by first using the state equations, cf. Theorems 2.3 and 3.6, and then using the definition of the Ritz projection. Invoking the definition of the adjoint equations, cf. Theorems 2.3 and 3.6, together with Hölder's inequality and Young's inequality, then supplies the assertion.
Proof. By Lemma 4.1 we have a bound on the squared error; by Lemma 3.1 and Theorem 3.7 the second term in this bound is of order Ch². Taking the square root yields the desired estimate.
We readily deduce an error estimate for the adjoint state.
where we used that u_d ∈ L^∞(Ω) and that ‖ū_h‖_{L^∞(Ω)} ≤ C, the latter being a consequence of Lemma 3.2. Moreover, by means of the embedding H²(Ω) ↪ W^{1,∞}(Ω) and Lemma 2.1 we obtain the asserted bound, where the last inequality is due to Lemma 4.2.
4.1.2. Improved error estimates under structural assumptions. We now improve the L²(Ω) convergence order for the state to O(h²) and deduce from this that the controls converge with order O(h²) in L¹(Ω) and that the adjoint state converges with order O(h²) in L^∞(Ω). To achieve this, we work with a structural assumption: we consider situations in which the continuous optimal control admits finitely many jumps. More precisely, we assume that the number of minima and maxima of the function Φ̄ is finite. This number bounds the number of jumps of the optimal control. Since these maxima and minima are in fact roots of the continuous adjoint state, regularity and convergence results for the discrete adjoint state allow us to prove that the discrete problem admits a similar structure. In the following we will frequently use the regularity z̄ ∈ W^{2,∞}(Ω) from Theorem 2.3.
The essential structural assumption reads as follows.
Assumption 4.4. There exist m ∈ ℕ_0 and pairwise distinct points x̄^1, . . . , x̄^m ∈ Ω such that {x ∈ Ω : Φ̄(x) = α} ∪ {x ∈ Ω : Φ̄(x) = −α} = {x̄^1, . . . , x̄^m}, with m = 0 indicating that these sets are empty.
To interpret this assumption recall from Corollary 1 that

  q̄ = ā + Σ_{i=1}^m c̄^i 1_{(x̄^i,1)}, (17)

where some of the coefficients may be zero. In addition, (8) yields z̄(x̄^i) = 0, 1 ≤ i ≤ m, i.e., the x̄^i are roots of the continuous adjoint state. Under a mild additional assumption it is possible to prove that the discrete adjoint state z̄_h admits roots x̄^i_h close to the x̄^i; specifically, the distance |x̄^i − x̄^i_h| is of order O(h). The additional assumption reads as follows.

Assumption 4.5. There holds z̄′(x̄^i) ≠ 0 for i = 1, 2, . . . , m.

We point out that Assumption 4.5 is equivalent to the existence of numbers κ > 0 and R > 0 such that

  α − |Φ̄(x)| ≥ κ |x − x̄^i|² for all x ∈ B_R(x̄^i), i = 1, 2, . . . , m.

That is, Assumption 4.5 imposes a quadratic growth condition on Φ̄ near its extreme points x̄^i. Also note that the discrete counterparts Φ̄_h and Φ̂_h of Φ̄ are piecewise quadratic functions.
Let us now prove the existence of unique roots of the discrete adjoint state in small neighborhoods of the points x̄^i.

Lemma 4.6. Suppose that Assumptions 4.4 and 4.5 hold. Then there exist R, h_0 > 0 such that for all h ∈ (0, h_0] and each i ∈ {1, 2, . . . , m} the discrete adjoint state z̄_h has a unique root x̄^i_h in B_R(x̄^i).

Proof. We first note that x̄^i ∈ Ω is satisfied for i = 1, 2, . . . , m since Φ̄(x) = 0 for x ∈ ∂Ω, whereas |Φ̄(x̄^i)| = α > 0 for i = 1, 2, . . . , m. Hence, we can assume without loss of generality that R > 0 is chosen so small that B_R(x̄^i) ⊂ Ω for i = 1, 2, . . . , m. Moreover, we can choose R > 0 so small that all B_R(x̄^i) are pairwise disjoint. Thus, it is sufficient to argue for one i ∈ {1, 2, . . . , m}. We write x̄ := x̄^i for this i.
Since z̄ ∈ H²(Ω), we have z̄′ ∈ C(Ω̄). Thus, Assumption 4.5 implies the existence of R > 0 and δ > 0 such that x̄ is the only solution of z̄(x) = 0 in B_R(x̄) and such that |z̄′(x)| ≥ δ > 0 for all x ∈ B_R(x̄). Since z̄′ is continuous, this inequality implies that z̄′ does not change sign in B_R(x̄), hence z̄ is strictly monotone in B_R(x̄). In view of Lemma 4.3 we can also achieve that z̄_h′ has, for all sufficiently small h, the same sign as z̄′ a.e. in B_R(x̄). Hence, z̄_h′ is either positive or negative almost everywhere in B_R(x̄).
Evidently, the strict monotonicity of z̄ implies that z̄ assumes both negative and positive values in B_R(x̄); by Lemma 4.3 the same holds for z̄_h for all sufficiently small h, so z̄_h possesses a root x̄_h in B_R(x̄). Suppose that there were an additional root x̃_h of z̄_h in B_R(x̄). Then, by the fundamental theorem of calculus for Sobolev functions, we obtain

  0 = z̄_h(x̃_h) − z̄_h(x̄_h) = ∫_{x̄_h}^{x̃_h} z̄_h′(s) ds.

However, since z̄_h′ is either positive or negative almost everywhere in B_R(x̄), this cannot be true. Hence, x̄_h is indeed the only root of z̄_h in B_R(x̄).
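The root-localization argument above can be mimicked numerically for a continuous piecewise linear function such as the discrete adjoint state; a sketch (all names are ours):

```python
import numpy as np

def roots_of_p1_function(x, z):
    """Locate the roots of a continuous piecewise linear function with nodal
    values z on the grid x: on every cell with a sign change the unique root
    is obtained by linear interpolation, mimicking the localization of the
    roots of the discrete adjoint state. Sketch; names are ours.
    """
    roots = []
    for i in range(len(x) - 1):
        zl, zr = z[i], z[i + 1]
        if zl == 0.0:
            roots.append(x[i])
        elif zl * zr < 0.0:                      # sign change inside the cell
            roots.append(x[i] - zl * (x[i + 1] - x[i]) / (zr - zl))
    if z[-1] == 0.0:
        roots.append(x[-1])
    return np.array(roots)

# A strictly monotone function has exactly one root, here at 0.3.
grid = np.linspace(0.0, 1.0, 11)
r = roots_of_p1_function(grid, grid - 0.3)
```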
In the next lemma we conclude that in the neighborhoods B_R(x̄^i) only the x̄^i_h can satisfy |Φ̄_h(x)| = α, and that there cannot be any points outside these neighborhoods where |Φ̄_h(x)| = α holds.
In this case the claim follows from Lemma 4.6.
It is sufficient to show that in this case, |Φ̄_h(x)| = α cannot be satisfied. To this end, we will demonstrate that there is ε > 0 such that |Φ̄(x)| ≤ α − ε for all x ∈ Ω̄ \ ∪_{i=1}^m B_R(x̄^i). Granted this claim, we infer from the definitions of Φ̄ and Φ̄_h together with Lemma 4.3 and |Ω| = 1 that |Φ̄_h − Φ̄| becomes uniformly small. Thus we obtain, for h sufficiently small, that |Φ̄_h(x)| < α for all such x. To establish the existence of said ε, note that |Φ̄| is continuous on the compact set Ω̄ \ ∪_{i=1}^m B_R(x̄^i). Hence, it attains a maximum on this set, and from Assumption 4.4 and ‖Φ̄‖_{L^∞(Ω)} ≤ α, cf. Theorem 2.3, it is evident that this maximum is smaller than α, which shows that the desired ε exists, thereby concluding the proof.

Lemmas 4.6 and 4.7 guarantee the existence of m well-defined pairs (x̄^i, x̄^i_h) that are roots of the continuous and discrete adjoint state, respectively. By Corollary 1 and Corollary 2 we have supp(q̄′) ⊂ {|Φ̄| = α} and supp(q̄_h′) ⊂ {|Φ̄_h| = α}. Therefore, Lemmas 4.6 and 4.7 together with Assumption 4.4 imply that the number of points in the supports of q̄′ and q̄_h′ is bounded by m. Using Lemma 4.6 we can moreover compare the cardinalities of the involved sets.

Yet, by virtue of Lemma 4.7 this implies # supp(q̄_h′) ≤ m, but it can happen, at least for large h, that # supp(q̄′) < # supp(q̄_h′).
Since we know from Corollary 2 and Lemma 4.7 that the support of q̄_h′ is contained in {x̄^1_h, . . . , x̄^m_h}, we find the following discrete analogue of the continuous representation (17): there exist real numbers ā_h and c̄^i_h, 1 ≤ i ≤ m, such that

  q̄_h = ā_h + Σ_{i=1}^m c̄^i_h 1_{(x̄^i_h,1)}.

Note that some of the coefficients may be zero. In addition, we recall that z̄_h(x̄^i_h) = 0 for i = 1, . . . , m by definition, cf. Lemma 4.6.
Next we estimate the difference between the jump heights of the optimal controls q̄ and q̄_h.

Lemma 4.8. Suppose that Assumption 4.5 is valid. Then there exist C, h_0 > 0 such that for all h ∈ (0, h_0] the optimal controls q̄ = ā + Σ_{i=1}^m c̄^i 1_{(x̄^i,1)} of (P) and q̄_h = ā_h + Σ_{i=1}^m c̄^i_h 1_{(x̄^i_h,1)} of (P_vd) satisfy Σ_{i=1}^m |c̄^i − c̄^i_h| ≤ Ch².

Proof. Let R, h_0 > 0 be the quantities from Lemma 4.6. Using the structure of the optimal controls, the definition of the distributional derivative, and the definition of the state equation, we infer for all h ∈ (0, h_0] an identity (19) whose right-hand side consists of two terms. For the second term on the right-hand side we observe a bound of order Ch² due to Lemma 3.1 and the boundedness of q̄_h independent of h (after decreasing h_0 if necessary), cf. Theorem 3.5. Using the state equation for the first term we obtain a bound of the same order, where the second inequality is obtained by virtue of Lemma 3.1 and integration by parts. Inserting the two obtained estimates into (19) yields the assertion after summation.
From the previous lemma we derive an estimate for the difference between the offsets and the jump positions of q̄ and q̄_h.

Lemma 4.9. Suppose that Assumption 4.5 is valid. Then there exist C, h_0 > 0 such that for all h ∈ (0, h_0] the optimal controls q̄ = ā + Σ_{i=1}^m c̄^i 1_{(x̄^i,1)} of (P) and q̄_h = ā_h + Σ_{i=1}^m c̄^i_h 1_{(x̄^i_h,1)} of (P_vd) satisfy |ā − ā_h| + Σ_{i=1}^m |x̄^i − x̄^i_h| ≤ Ch².

Proof. Lemma 4.6 and Corollary 1 imply z̄(x̄^i) = z̄_h(x̄^i_h) = 0 for i = 1, 2, . . . , m. By Lemma 4.6 we also have |z̄′| ≥ δ > 0 in a neighborhood of x̄^i containing x̄^i_h for i = 1, 2, . . . , m for h sufficiently small. Thus, by Lemmas 3.2 and 3.3 we find

  |x̄^i − x̄^i_h| ≤ Ch². (20)

It remains to estimate the difference in the offsets. To this end, we denote 𝒮 := S*S and 𝒮_h := S*_h S_h. By Theorems 2.3 and 3.6 the means of z̄ and z̄_h vanish; integration hence relates the offsets ā and ā_h. As S is an isomorphism, we have ∫_Ω 𝒮1 dx = ∫_Ω S*S1 dx = ‖S1‖²_{L²(Ω)} ≠ 0. We have that ‖𝒮_h‖_{L(L¹(Ω),L¹(Ω))} = ‖S*_h S_h‖_{L(L¹(Ω),L¹(Ω))} ≤ C, since S*_h = S_h and since, by standard energy norm estimates, S_h is bounded from L¹(Ω) to L^∞(Ω) in one space dimension. We can therefore continue the estimate accordingly. From the definition of q̄_h we obtain ‖Σ_{i=1}^m c̄^i_h δ_{x̄^i_h}‖_{M(Ω)} ≤ j(0)/α, and because of ∫_Ω 𝒮1 dx ≠ 0 this also yields |ā_h| ≤ C with C independent of h. Together with Lemma 4.8, (20) and (22) show the asserted estimate for |ā − ā_h|.

The previous two results have the following consequence.
Corollary 4. Suppose that Assumption 4.5 is valid. Then there exist C, h_0 > 0 such that for all h ∈ (0, h_0] we have ‖q̄ − q̄_h‖_{L¹(Ω)} ≤ Ch²; the result follows from Lemma 4.8 together with Lemma 4.9.
In view of Corollary 4 it remains to estimate ‖ū − ū_h‖_{L²(Ω)}. We are now able to establish convergence order h² for the optimal state.

Proof. Combining Lemma 4.1 with Hölder's inequality and Corollary 4 leads to an improved bound on the squared error, and Young's inequality absorbs the remaining error term. Since z̄ ∈ W^{2,∞}(Ω), the error estimate of the Ritz projection from Lemma 3.2 thus implies the assertion.
Finally, we obtain convergence of order h² also for the optimal control and the optimal adjoint state, but with respect to the L¹(Ω)-norm and the L^∞(Ω)-norm, respectively.
Corollary 5. Suppose that Assumption 4.5 is valid. Then there exist C, h_0 > 0 such that for all h ∈ (0, h_0] we have the estimate |ā − ā_h| + Σ_{i=1}^m (|c̄^i − c̄^i_h| + |x̄^i − x̄^i_h|) ≤ Ch² for the structural differences of q̄ and q̄_h. We also have the error estimates

  ‖q̄ − q̄_h‖_{L¹(Ω)} ≤ Ch² and ‖z̄ − z̄_h‖_{L^∞(Ω)} ≤ Ch²,

where we have also used Theorem 4.10 to deduce the last inequality. The claim follows by taking into account the Ritz projection error from Lemma 3.2.

4.2. Error estimates for piecewise constant control discretization. Throughout Section 4.2, q̂_h refers to an arbitrary but fixed solution to (P_cd), and (û_h, ẑ_h, Φ̂_h) denote the associated state, adjoint state and multiplier, respectively, that are uniquely determined by Theorem 3.8. The constants that appear in the following error estimates are independent of (q̂_h, û_h, ẑ_h, Φ̂_h).
In this section we prove convergence rates for (P_cd). Let us stress that we cannot, in general, expect better than first-order convergence: the jumps of q̂_h are restricted to the gridpoints, while a jump point of q̄ need not coincide with any node x_j. We will establish precisely this order of convergence and emphasize that the numerical experiments in Sections 5.3 and 5.4 indicate that this order is indeed optimal.
As in the variationally discrete case we begin by establishing an error estimate for the state and the adjoint state that holds without any structural assumption on the optimal controls. In fact, we are not able to improve this further. Still, in a second step we can derive an error estimate for the control relying on the same structural assumptions as in the variationally discretized setting.

4.2.1. Basic error estimates for state and adjoint equation.
Lemma 4.11. Let h_0 > 0 be as in Theorem 3.7. For any h ∈ (0, h_0] the optimal state û_h associated with the optimal control q̂_h of (P_cd) satisfies ‖ū − û_h‖_{L²(Ω)} ≤ Ch.

Proof. We test the variational inequality from Theorem 3.8 with q_h = Π_h q̄ ∈ Q_h and the variational inequality from Theorem 2.3 with q = q̂_h. Adding those two inequalities and using Lemma 3.10 we find a first bound. Rearranging terms and using (15), and applying Lemma 3.10 once more, we obtain a refined estimate. Using Theorem 3.5, Lemma 3.3 and the boundedness of ‖û_h − u_d‖_{L²(Ω)}, which is due to Theorem 3.7, we find ‖q̂_h′‖_{M(Ω)}, ‖ẑ_h‖_{L^∞(Ω)} ≤ C for all h sufficiently small, and thus a bound of order h, cf. (23). We introduce the auxiliary state ũ_h := S_h q̄ and observe with the boundedness results from Theorem 2.2 and Theorem 3.7 together with Lemma 3.1 that the adjoint error is controlled as well, pointing out that due to S* = S and S*_h = S_h the same finite element discretization error estimates as for the state equation apply to the adjoint states. Combining this with (23) leads to ‖ũ_h − û_h‖_{L²(Ω)} ≤ Ch. Therefore, the assertion follows from the triangle inequality ‖ū − û_h‖_{L²(Ω)} ≤ ‖ū − ũ_h‖_{L²(Ω)} + ‖ũ_h − û_h‖_{L²(Ω)}, where the first summand is of order h² by Lemma 3.1.
The preceding lemma has the following consequence.

Corollary 6. There exist C, h_0 > 0 such that for all h ∈ (0, h_0] we have ‖z̄ − ẑ_h‖_{W^{1,∞}(Ω)} ≤ Ch.

Proof. The proof is essentially the same as for Lemma 4.3, with Lemma 4.11 replacing Lemma 4.2.

4.2.2. Improved error estimates under structural assumptions.
Similarly as in the variationally discrete setting we will now use the structural Assumptions 4.4 and 4.5 to derive an L¹(Ω)-error estimate for the control. We recall that Assumption 4.4 ensures that Φ̄ has only finitely many minima and maxima, which in turn implies that the optimal control exhibits only finitely many jumps. The main idea underlying the proof of the error estimate is to examine the distance between jump points and jump heights of the continuous and the discrete optimal control. Note that the discrete optimal control q̂_h is piecewise constant and can only admit jumps at the gridpoints x_j with |Φ̂_h(x_j)| = α. These jumps can only occur close to points where |Φ̄| = α, i.e., in the vicinity of the x̄^i, i = 1, 2, . . . , m, as the following result shows.
Lemma 4.12. Let Assumption 4.5 hold and let R > 0 be as in Lemma 4.6. Then there exists h_0 > 0 such that the following holds for all h ∈ (0, h_0]: if |Φ̂_h(x)| ≥ α for some x ∈ Ω, then x ∈ B_{R/2}(x̄^i) for some i ∈ {1, 2, . . . , m}.

Proof. The proof follows along the lines of Case 2 in Lemma 4.7.
Next we investigate the behavior of Φ̂_h inside the balls B_R(x̄^i). Note that if |Φ̂_h| < α in B_R(x̄^i), then q̂_h will not admit a jump in B_R(x̄^i), hence ĉ^j_h = 0 in (12) for all j with x_j ∈ B_R(x̄^i). We therefore consider points where |Φ̂_h| ≥ α and remark that points with |Φ̂_h| > α can actually exist because Φ̂_h is piecewise quadratic.
Lemma 4.13. Let Assumption 4.5 hold and let R > 0 be as in Lemma 4.6. There exists an h_0 > 0 such that the following holds for all h ∈ (0, h_0]: if |Φ̂_h(x̂)| ≥ α for some x̂ ∈ B_R(x̄^i) and some i ∈ {1, 2, . . . , m}, then ẑ_h has a unique root x̂^i_h in B_R(x̄^i).

Proof. Without loss of generality let us assume that h_0 ≤ R/2. We argue for the case Φ̂_h(x̂) ≥ α for some x̂ ∈ B_R(x̄^i) and an i ∈ {1, 2, . . . , m}; the case Φ̂_h(x̂) ≤ −α can be handled analogously. Due to Φ̂_h(x̂) ≥ α we infer from Lemma 4.12 that x̂ ∈ B_{R/2}(x̄^i). Since h_0 ≤ R/2, we find gridpoints x̂^i_{l,h} and x̂^i_{r,h} in B_R(x̄^i) with x̂^i_{l,h} ≤ x̂ ≤ x̂^i_{r,h}. At the gridpoints we have |Φ̂_h| ≤ α. Hence, the continuous function Φ̂_h attains a local maximum at some point of [x̂^i_{l,h}, x̂^i_{r,h}], which is therefore a root of Φ̂_h′ = ẑ_h; uniqueness follows as in Lemma 4.6.

Next we show that |Φ̂_h(x_j)| = α for a gridpoint x_j can only hold if x_j = x̂^i_h for some i ∈ {1, . . . , m}, or if |Φ̂_h(x̂^i_h)| > α and x_j is close to x̂^i_h.

Corollary 7. Let Assumption 4.5 hold and let R be as in Lemma 4.13. There exists h_0 > 0 such that the following holds for all h ∈ (0, h_0]: if |Φ̂_h(x̂)| ≥ α for some x̂ ∈ B_R(x̄^i) and some i ∈ {1, 2, . . . , m}, then the point x̂^i_h from Lemma 4.13 satisfies

  supp(q̂_h′) ⊂ {x_j : |Φ̂_h(x_j)| = α} ⊂ ∪_{i=1}^m {x̂^i_{l,h}, x̂^i_{r,h}}, (24)

where x̂^i_{l,h} and x̂^i_{r,h} denote the gridpoints adjacent to x̂^i_h.
Proof. The first part of (24) is just a restatement of Corollary 3 and the second part of (24) follows from the main statement in combination with Lemma 4.12.
Summarizing, we now know that q̂_h cannot jump outside of any B_R(x̄^i), 1 ≤ i ≤ m, and that inside every B_R(x̄^i) jumps can only occur at x̂^i_h (Case 1) or at either of the two points y^i_l and y^i_r (Case 2), 1 ≤ i ≤ m. In addition, such a jump can only occur if the respective point is a gridpoint. In contrast, in the variationally discrete setting the jumps of q̄_h are not restricted to gridpoints. For clarification we point out that there might well be situations, for large h, where the continuous optimal control q̄ jumps at x̄^i, but the discrete optimal control q̂_h does not admit a jump in B_R(x̄^i). Vice versa, for large h it may happen that q̂_h exhibits one or two jumps in B_R(x̄^i), but q̄ does not jump in B_R(x̄^i).
To obtain a convergence result, we need to estimate the difference in the jump points and the corresponding coefficients. In the remainder of this section we write x̂^i_{l,h} and x̂^i_{r,h} for the gridpoints at which q̂_h may jump inside B_R(x̄^i). By virtue of the inclusion in (24) the preceding discussion furthermore shows that q̂_h can be represented as follows: there exist real numbers â_h, ĉ^i_{l,h}, ĉ^i_{r,h}, 1 ≤ i ≤ m, such that

  q̂_h = â_h + Σ_{i=1}^m (ĉ^i_{l,h} 1_{(x̂^i_{l,h},1)} + ĉ^i_{r,h} 1_{(x̂^i_{r,h},1)}), (25)

where some of the coefficients may be zero. We estimate the difference between the jump heights of the optimal controls q̄ and q̂_h.
Proof. The proof of Lemma 4.8 remains valid for q̂_h, û_h, ẑ_h and yields the analogous estimate for the jump heights. Applying Lemma 4.11 establishes the desired estimate.
The difference between the offsets and the jump positions of q̄ and q̂_h can be estimated as follows.
Proof. In view of Lemma 4.14 it only remains to estimate the difference |ā −â h |. This can be accomplished almost verbatim as in Lemma 4.9.
We obtain the following error estimate for the control in L¹(Ω).
Corollary 8. Suppose that Assumption 4.5 is valid. Then there exist C, h_0 > 0 such that for all h ∈ (0, h_0] we have ‖q̄ − q̂_h‖_{L¹(Ω)} ≤ Ch.

Proof. The desired estimate follows by combining Lemma 4.15 and Lemma 4.16.

5. Numerical experiments. In this section we introduce an algorithm to solve the optimization problems (P_vd) and (P_cd) based on the PDAP method described for example in [33, 39]. Moreover, we discuss the error estimates for both discretization schemes on two numerical examples.
Since the set of jump points of the optimal control is not known beforehand, the algorithmic idea is to work with approximations of this set. We solve the arising subproblem (26) by a semismooth Newton method, cf. [36]. Note that (26) is a finite-dimensional problem of dimension m^(0) + 1, independently of h. From its solution a new estimate of the jump set is obtained, and this process is iterated. We call the step of the algorithm in which the new estimate is obtained the outer iteration; the inner iteration consists of solving (26). The outer iteration, and thereby the overall algorithm, is terminated if two consecutive approximations t^(k) are sufficiently close, where ε_out > 0 is some small tolerance, e.g., ε_out = 10^{−10}. All in all, these considerations give rise to the following algorithm: obtain q_h^(k) by solving (26) to tolerance ε_in (inner iteration), then compute the roots t^(k+1) ∈ ℝ^{m^(k+1)} of the discrete adjoint state z_h^(k). While it is theoretically possible that the inner iteration does not converge, we did not observe divergence in the numerical experiments that we carried out. However, we did sometimes observe cycling of the outer iteration, e.g., t^(2k+2) = t^(2k) and t^(2k+3) = t^(2k+1) for all k sufficiently large. Since this only occurred for iterates with an equal number of roots of the adjoint state, the following modification of line 4 was possible and turned out to be sufficient: compute the roots t^(k+1) in line 4, and if ‖t^(k+1) − t^(k)‖₂ ≥ ‖t^(k) − t^(k−1)‖₂, then use 0.5 t^(k) + 0.5 t^(k+1) instead of t^(k+1).

For the piecewise constant discretization (P_cd), the jump points have to be gridpoints and, in view of our theoretical findings from Corollary 7, we may add two gridpoints for every root of z_h^(k). To meet these demands we first compute the roots of z_h^(k) in the same way as in Algorithm 1. Subsequently, every root is replaced by the two gridpoints adjacent to that root, except if a root happens to be on a gridpoint, in which case only that gridpoint is used. This is in agreement with Corollary 7. Indeed, if a gridpoint is added at which no jump occurs, then the inner iteration accounts for this by yielding zero for the corresponding coefficient (recall the representation (25)).
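The damping safeguard described above can be sketched as a generic fixed-point outer loop; everything specific to (P_vd) and (P_cd), i.e., the inner semismooth Newton solve and the root computation of the adjoint state, is abstracted into a hypothetical update map of our own naming.

```python
import numpy as np

def damped_outer_iteration(update, t0, tol=1e-10, max_iter=200):
    """Generic outer loop in the spirit of the described modification: iterate
    t^(k+1) = update(t^(k)) and, whenever the step length does not decrease
    (indicating cycling between two jump-point estimates), replace the new
    iterate by the average 0.5*t^(k) + 0.5*t^(k+1).

    Sketch of the damping safeguard only; 'update' stands in for the inner
    semismooth Newton solve plus the root computation, which is not shown.
    """
    t_prev, t = None, np.asarray(t0, dtype=float)
    for _ in range(max_iter):
        t_new = np.asarray(update(t), dtype=float)
        if t_prev is not None and t_new.shape == t.shape == t_prev.shape:
            if np.linalg.norm(t_new - t) >= np.linalg.norm(t - t_prev):
                t_new = 0.5 * t + 0.5 * t_new      # damp to break the cycle
        if t_new.shape == t.shape and np.linalg.norm(t_new - t) < tol:
            return t_new
        t_prev, t = t, t_new
    return t

# The map t -> 0.6 - t cycles between 0 and 0.6 without damping; the damped
# iteration converges to its fixed point 0.3.
t_star = damped_outer_iteration(lambda t: 0.6 - t, np.array([0.0]))
```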
Since these are the only changes compared with Algorithm 1, we do not state the resulting algorithm. In the numerical experiments we use the same set of parameters as for Algorithm 1, cf. (27).

5.3. Example 1: Known solution. Throughout Section 5.3 and Section 5.4, q̄_h, respectively, q̂_h, refer to the solutions obtained by the optimization algorithms.
It is straightforward to check that these quantities satisfy the conditions from Theorem 2.3. In particular, given this α and this u_d the exact solution to (P) is q̄. The approximated solutions to this problem are depicted in Figure 1. Figure 2 displays the errors between solutions to the original problem (P) and solutions to the variationally discretized problem (P_vd). We observe that the error estimates of Theorem 4.10 and Corollary 5 are indeed sharp. In addition, the L²(Ω)-error of the controls is not of order h², showing that the derived error estimates for the control do not carry over to the L²(Ω)-norm. We remark that an error estimate of order O(h) for the controls with respect to the L²(Ω)-norm follows easily from Corollary 5.
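Sharpness observations of this kind rest on experimental orders of convergence; a minimal sketch of their computation (the data below are synthetic, not the paper's measurements):

```python
import numpy as np

def eoc(h, err):
    """Experimental orders of convergence log(e_k/e_{k+1}) / log(h_k/h_{k+1})
    from mesh sizes h and errors err, the standard way to check whether
    proven rates such as O(h) or O(h^2) are sharp in experiments.

    Sketch; the data below are synthetic, not the paper's measurements.
    """
    h, err = np.asarray(h, float), np.asarray(err, float)
    return np.log(err[:-1] / err[1:]) / np.log(h[:-1] / h[1:])

# Errors that behave exactly like h^2 give EOC = 2 on every level.
rates = eoc([1/8, 1/16, 1/32, 1/64], [1/64, 1/256, 1/1024, 1/4096])
```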
In Figure 3 we compare the solutions of the fully discretized problem (P_cd) to the solutions of the original problem. Again we find the error estimates from Lemma 4.11, Corollary 6 and Corollary 8 to be sharp and the L²(Ω)-error of the controls to be of lower order than the L¹(Ω)-error. Correspondingly, it is straightforward to deduce an error estimate in L²(Ω) of order O(h^{1/2}) for the controls. The slightly erratic behavior of the errors can be explained by the fact that on some grids the locations of the jumps of the continuous optimal control q̄ are better resolved by the gridpoints than on others; we stress that the grids are not nested.

5.4. Example 2: Unknown solution. In this example the data involve the function cos(2πx). An approximate solution to (P) is shown in Figure 4.
First we turn to the variationally discrete problem. As we do not have a known solution, we compute a reference solution on a fine grid; Figure 5 displays the approximated errors. As in Example 1 we observe that the rates from Theorem 4.10 and Corollary 5 are sharp and that the L²(Ω)-error of the control is of lower order than the L¹(Ω)-error. The same procedure is applied to the fully discrete problem, and the results are depicted in Figure 6. Once again the proven rates turn out to be sharp and the L²(Ω)-rate is of lower order than the L¹(Ω)-rate.