Optimal Control of the Linear Wave Equation by Time-Dependent BV-Controls: A Semi-Smooth Newton Approach

An optimal control problem for the linear wave equation with control cost chosen as the BV semi-norm in time is analyzed. This formulation promotes piecewise constant optimal controls and penalizes the number of jumps. Existence of optimal solutions and necessary optimality conditions are derived. With the numerical realization in mind, the regularization by H^1 functionals is investigated, and the asymptotic behavior as this regularization tends to zero is analyzed. For the H^1-regularized problems the semi-smooth Newton algorithm can be used to solve the first-order optimality conditions with a super-linear convergence rate. Examples are constructed which show that the distributional derivative of an optimal control can be a mix of a measure that is absolutely continuous with respect to the Lebesgue measure, a countable linear combination of Dirac measures, and a Cantor measure. Numerical results illustrate and support the analytical results.

In problem (P), we focus our attention on sparse optimal controls in the sense that they are piecewise constant. In particular, using the total variation of a BV function in the cost functional J promotes sparsity in the derivative of the optimal control. If the total variation norm of the derivative of the control is replaced with Hilbert-space control costs, a classical tracking-type optimal control problem is obtained, cf. [25, p. 295 et seq.], and also e.g. [14], [22].
For a piecewise constant optimal control of (P) the jumps are located at the positions of the Dirac measures appearing in the optimality system, see for example [9]. This type of sparsity property is reflected in the necessary and sufficient first-order optimality condition. As far as the authors know, the L^1-norm is one of the first sparsity-enhancing cost terms discussed in the context of partial differential equations. A detailed discussion of the history of sparsity in optimal control of partial differential equations can be found in e.g. [2]. Furthermore, sparsity results for optimal control problems with linear partial differential equations are considered in several works. References are given for example in [3], where the authors emphasize the papers [4], [5], [6], [7], [8], [11], [12], [19], and [20]. In image reconstruction, BV-functions are well investigated, but the modeling aspects differ from those in optimal control with partial differential equations. In mathematical image analysis the use of BV-functionals is motivated by their ability to preserve natural edges and corners in the image. An introduction to image reconstruction aspects can be found in [10].
For the purpose of numerical realization we rely on problems regularized by the H^1 semi-norm. This enables us to approximate the BV optimal control of (P) by H^1 controls in the strict-BV sense. The main purpose of this regularization is to make the semi-smooth Newton algorithm applicable, for which we present super-linear convergence results. In particular, one is able to show that the regularized problem admits a point-wise formula for the derivative of the H^1 controls. This property is used for the well-posedness result of the Newton algorithm.
The choice of control costs related to BV-norms or BV-semi-norms has not yet received much attention in the literature. In [11] the effect of L^2-, H^1-, measure-valued and BV-valued control costs on the qualitative behavior of the optimal control is compared, and a significant difference between the resulting optimal controls is pointed out. A systematic study of the use of controls which are BV-functions in time for optimal control of a semi-linear parabolic equation is given in [9]. In the numerical experiments of that paper it was noted that the use of BV-control costs indeed enhances the property that the optimal controls exhibit only a few jumps (switches). Here we aim at obtaining related results for the linear wave equation. Differently from [11], for the numerical realization by means of a semi-smooth Newton method, we use an H^1 regularization of the infinite-dimensional problem and verify super-linear convergence of the method. While in [11] the numerical realization of the non-smooth part originating from the BV-functional uses a duality formulation, in the present work it is based on a prox-operator approach, cf. [26].
Let us briefly outline the following sections. In section 2 we gather the necessary prerequisites on the wave equation and on one-dimensional BV-functions which will be needed later in this paper. Section 3 is dedicated to the analysis of the optimal control problem and sparsity properties of the optimal controls. Section 4 is devoted to the regularized problem (P^1_γ), the corresponding convergence results for the optimal controls of (P^1_γ) as γ → 0, and the first-order optimality conditions of (P^1_γ). Furthermore, the semi-smooth Newton algorithm and its super-linear convergence are presented. The algorithm is embedded into a path-following algorithm to approximate the original unregularized problem. In section 5, we construct test cases for problem (P) in such a manner that exact analytic solutions for (P) can be found. The construction steps can be used to build all types of distributional derivatives for the optimal controls D_t u_j. This means that D_t u_j can be a mix of a measure that is absolutely continuous with respect to the Lebesgue measure, a countable linear combination of Dirac measures, and a Cantor measure, see for example [1, p. 184]. The first numerical example refers to an optimal control that has finitely many jumps. In the second example, we construct an optimal control which can be characterized as a Cantor function. In the last section 6 we remark that our results can be extended to several other linear second-order hyperbolic equations.
2. The wave equation and BV functions in time.

Preliminaries on the wave equation.
Since in this work non-smooth data are used for the wave equation, we directly introduce the weak solution of the wave equation (see e.g. [28]). In particular, y_u is understood as the weak solution of the wave equation (W) in problem (P). Furthermore, we present in this section standard regularity results and an energy estimate for the weak solution of the wave equation. We call y ∈ C(Ī; V) ∩ C^1(Ī; H) a weak solution of (W) with forcing f ∈ L^1(0, T; H), displacement y_0 ∈ V, and velocity y_1 ∈ H if the weak formulation holds for every η ∈ L^1(I; V) such that ∂_t η ∈ L^1(I; H), η|_{t=T} = 0.

Preliminaries on BV functions in time.
Concerning BV-functions of one scalar variable we refer to [1]. In this section we only recall a few facts to which we frequently refer: A sequence (u_k) ⊂ BV(I) is said to converge weakly* in BV(I) to u if (u_k) converges to u in L^1(I), and the measures (Du_k) converge weakly* in M(I) to Du. For every bounded sequence (u_k)_k ⊂ BV(I) there exists a weakly* convergent sub-sequence with limit u ∈ BV(I).
A weakly* converging sequence (u_k)_k in BV(I) with limit u also converges strongly to u in L^p(I) for 1 ≤ p < ∞.
A sequence (u_k) ⊂ BV(I) is said to converge strictly in BV(I) to u if (u_k) converges strongly to u in L^1(I) and ‖D_t u_k‖_{M(I)} → ‖D_t u‖_{M(I)}. Strictly converging sequences in BV(I) are also weakly* converging in BV(I), see [1, p. 126].
The following BV-Poincaré inequality holds: ‖u‖_{L^∞(I)} ≤ |u(0)| + ‖D_t u‖_{M(I)}.

Lemma 2.6. For each m ∈ N_{>0} the map u ↦ (D_t u, u(0)) is an isomorphism from BV(I)^m to M(I)^m × R^m.

3. Analysis of the optimal control problem (P). In the following we show the existence of a unique solution of (P). Furthermore, we introduce a problem (P̃) which is equivalent to (P), for which the first-order optimality conditions are derived. These optimality conditions will be used to present sparsity results for the optimal control of (P).

Proof. Utilizing the fact that the forward mapping is continuous from L^2(I)^m to L^2(Ω_T), the proof can be carried out along the lines of [9, Theorem 3.1].
3.1. Equivalent problem (P̃). Consider the following linear and continuous operator B, defined in (5): Using the identification of BV(I) with M(I) × R, see Lemma 2.6, and the fact that BV(I) embeds into L^2(I), we can rewrite (P) as the equivalent problem (P̃), where we have to modify the control-to-state operator S to S̃.

3.2. First-order optimality condition for (P̃). In this section the necessary and sufficient first-order optimality conditions for (P̃) are presented. Furthermore, we show sparsity results for the optimal control of (P), respectively (P̃). Let us begin with the following theorem (Theorem 3.2), where p_1 ∈ C^2([0, T])^m. This first-order optimality condition is equivalent to: for all i = 1, ..., m and v ∈ M(I) it holds that

Proof. The proof can be found in the appendix.
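The identification of BV(I) with M(I) × R can be made concrete for piecewise constant controls, where D_t u is a finite sum of Dirac measures; a small illustrative sketch (the grid and helper name are ours, not the paper's):

```python
import numpy as np

def control_from_pair(jumps, u0, grid):
    """Rebuild a piecewise constant control u from the pair (D_t u, u(0))
    of Lemma 2.6, where D_t u = sum_k a_k * delta_{t_k} is encoded as a
    list of (t_k, a_k) pairs and u0 = u(0)."""
    u = np.full(grid.shape, float(u0))
    for t_k, a_k in jumps:
        u[grid >= t_k] += a_k  # each Dirac contributes a jump of height a_k
    return u

grid = np.linspace(0.0, 1.0, 11)
u = control_from_pair(jumps=[(0.3, 2.0), (0.7, -1.0)], u0=1.0, grid=grid)
print(u)  # piecewise constant: 1.0, then 3.0 after t=0.3, then 2.0 after t=0.7
```

The total variation ‖D_t u‖_{M(I)} is then simply Σ_k |a_k|, which is exactly what the BV semi-norm in the cost J penalizes.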
Lemma 3.3. Let u ∈ BV(I)^m be an optimal control for (P). Then we have for all i = 1, ..., m and p_1 = (p_{1,i})_{i=1}^m given in (6): The proof is analogous to the one of [9, Proposition 2.4]. The following corollary, which is similar to a result in [9], exhibits an important structural property of the solution u_{α,j} as a function of α_j.

Corollary 1. There exists M_j > 0 such that the j-th component u_{α,j} of the optimal control u_α of (P) is a constant function for all α_j > M_j.
Proof. Let y^0, y^α be the solutions of the state equation associated with the controls u = 0 and u_α, respectively. Furthermore, let us define p_α := L(y^α − y_d). From the optimality of u_α we get: This implies that ‖y^α − y_d‖_{L^2(Ω_T)} ≤ ‖y^0 − y_d‖_{L^2(Ω_T)}. From the adjoint state equation we obtain: The constant c_1 is defined with respect to the embedding L^∞(I; V) → L^∞(I; H), c_2 depends on the embedding constant in (2), and c_3 is the embedding constant of L^2(Ω_T) = L^2(I; H) → L^1(I; H). From the adjoint p_1 and the above estimate we get for all t ∈ [0, T]: where the first inequality follows from: The support relation in Lemma 3.3 now implies that D_t u_{α,j} ≡ 0 if α_j > M_j.
Corollary 2. Let u ∈ BV(I)^m be the optimal control of (P). Assume for some i ∈ {1, ..., m} that the measures D_t u_i^+ and D_t u_i^− are not trivial. Then we have dist(supp(D_t u_i^+), supp(D_t u_i^−)) > 0.

Proof. W.l.o.g. let us consider m = 1. Assume that dist(supp(D_t u^+), supp(D_t u^−)) = 0. Then there exists a sequence (t_n)_n ⊂ supp(D_t u^+) ⊂ Ī such that p_1(t_n) = −α and dist({t_n}, supp(D_t u^−)) → 0. Hence, there exists a sub-sequence (t_{n_k})_k which converges to some t̄ with dist({t̄}, supp(D_t u^−)) = 0. Furthermore, there exists a sequence (τ_n)_n ⊂ supp(D_t u^−) ⊂ Ī such that p_1(τ_n) = α and τ_n → t̄. By the continuity of p_1 we have −α = lim_{k→∞} p_1(t_{n_k}) = p_1(t̄) = lim_{n→∞} p_1(τ_n) = α, which is a contradiction to α > 0.

Remark 1.
If the set of points at which p_{1,i}(t) ∈ {±α_i} is finite, we have by Lemma 3.3 c) that D_t u_i is a combination of Dirac measures centered at those points (though not necessarily at all of them). In particular, we obtain that the optimal control u_i of (P) is piecewise constant on [0, T] with jumps in supp(D_t u_i). This remark can also be found in [9, Remark 3.5]. Later we will construct an analytically exactly solvable example for our problem (P), which allows us to show that the derivatives of the optimal controls can be of Cantor or Dirac kind, or alternatively absolutely continuous with respect to the Lebesgue measure.
In particular, the derivatives of the optimal controls need not be sparse. For further information about these characterizations of measures, see for example [1, p. 184].
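For orientation, the classification from [1, p. 184] invoked here is the decomposition of the derivative measure into three mutually singular parts; a standard statement, sketched in the notation of this paper:

```latex
D_t u \;=\; \underbrace{u'_{\mathrm{ac}}\,\mathcal{L}^1}_{\text{absolutely continuous part}}
\;+\; \underbrace{\sum_{k} a_k\,\delta_{t_k}}_{\text{jump (Dirac) part}}
\;+\; \underbrace{D_t^{c} u}_{\text{Cantor part}},
\qquad u'_{\mathrm{ac}} \in L^1(I),
```

where the jump set {t_k} is at most countable and the Cantor part is singular with respect to the Lebesgue measure yet assigns no mass to single points. The examples in section 5 realize each of the three parts.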

4.
Regularization. For numerical realization we aim at applying a semi-smooth Newton method. For this purpose we regularize problem (P ). We then analyze the asymptotic behavior of the optimal controls of the regularized problem as well as the first-order optimality condition of the regularized problem. Finally, we will present convergence results for the semi-smooth Newton algorithm.
In the following, let us consider the regularized optimal control problem (P^1_γ): Note that for each u ∈ H^1(I) the value u(0) is well defined, because H^1(I) embeds continuously into C(Ī). The total variation cost term in (P) can be identified with the cost term Σ_{j=1}^m α_j ‖∂_t ·‖_{L^1(I)} for H^1(I)^m functions in (P^1_γ), since now u ∈ H^1(I)^m. The symbol ∂_t represents the weak derivative.
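To see what the added H^1 term does relative to the BV term, one can evaluate both derivative penalties on a sampled control; the forward-difference quadrature below is an illustrative simplification (the paper's actual discretization uses finite elements and the trapezoidal rule):

```python
import numpy as np

def derivative_penalties(u, tau, alpha, gamma):
    """Evaluate the BV penalty alpha*||du/dt||_{L1} and the added H^1
    regularization (gamma/2)*||du/dt||_{L2}^2 for a control u sampled
    on a uniform time grid with step tau."""
    du = np.diff(u) / tau                  # approximate weak derivative
    l1 = alpha * np.sum(np.abs(du)) * tau  # alpha * ||du||_{L1}
    h1 = 0.5 * gamma * np.sum(du**2) * tau # (gamma/2) * ||du||_{L2}^2
    return l1, h1

# A step and a ramp with the same total variation: the L1 term agrees,
# but the H^1 term heavily penalizes the (nearly) jumping step.
tau = 1e-3
t = np.arange(0.0, 1.0, tau)
step = (t >= 0.5).astype(float)
print(derivative_penalties(step, tau, alpha=1.0, gamma=1e-2))
print(derivative_penalties(t, tau, alpha=1.0, gamma=1e-2))
```

This illustrates why the H^1 term smooths jumps for fixed γ, and why one must send γ → 0 to recover the sparse BV structure.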

4.1.
Asymptotic behavior as γ → 0^+. In this section we show that the unique solution of (P) can be approximated by the unique solutions of the problems (P^1_γ) as γ → 0.
In terms of the reduced cost J, problem (P) can be expressed as: Analogously, we have: The following result follows with standard techniques.
Theorem 4.1. For every γ > 0 problem (P^1_γ) has a unique solution u_γ ∈ H^1(I)^m.

Let us denote the unique optimal controls of (P) and (P^1_γ) by u and u_γ. To argue the BV-weak* and strict convergence of u_γ to u we use concepts from [26] and [9]. The value function V is (locally) Lipschitz continuous, monotonically increasing, concave, and a.e. differentiable in (0, ∞) with:

Proof. Utilizing the fact that κ ∈ C^1([0, ∞)) and κ(0) = 0, the proof can be carried out along the lines of [9]. Let (u_n) be a sequence converging with respect to d_BV to u, the solution of (P). Due to the continuity of S̃, the cost J is continuous with respect to the metric d_BV. The continuity of J then implies that there exists N ∈ N such that |J(u) − J(u_n)| ≤ ε for all n ≥ N. Thus we have for all γ > 0: Because ε is arbitrary, we have for all c_loc > 0 and γ ≤ c_loc: where we used that V is (locally) Lipschitz continuous, monotonically increasing (which implies that V' ≥ 0 a.e.) and concave (which implies an a.e. decreasing derivative), and thus V' ∈ L^∞(0, c_loc).
Theorem 4.5. The unique optimal controls u_γ of (P^1_γ) converge weakly* in BV(I)^m to the optimal control u of (P).
Proof. Let (γ_n) be an arbitrary null sequence in R_+. In the following we show that the solutions (u_{γ_n})_{n=1}^∞ of the problems (P^1_{γ_n}) are bounded in BV(I)^m, with a proof which is similar to the one in [9]. Consider the decomposition u_{γ_n} = a_{γ_n} + û_{γ_n}, where a_{γ_n} = (a_{γ_n,1}, ..., a_{γ_n,m}) and û_{γ_n} = (û_{γ_n,1}, ..., û_{γ_n,m}). At first we argue that (û_{γ_n})_n is bounded in BV(I)^m. Note that S̃(u_{γ_n}) − y_d is bounded, because (v(γ_n))_n is bounded. Thus we get that S̃(u_{γ_n}) is bounded: where we used the BV-Poincaré inequality in the last estimate. Now define z_n = y_n − ŷ_n = L(a_{γ_n} · →g) with y_n = S̃(u_{γ_n}) and ŷ_n = S̃(û_{γ_n}). The sequence z_n is bounded in L^2(Ω_T).
To argue that (a_{γ_n})_n is bounded we argue by contradiction and assume that (for a sub-sequence, denoted by the same index) p̄_n := max_{i=1,...,m} |a_{γ_n,i}| → ∞. Then we have that ξ_n does not converge to 0 as n → ∞, since p̄_n → ∞. This is a contradiction to (10) by the injectivity of the operator L. Thus we get that (a_{γ_n})_n is a bounded sequence. Considering that bounded sequences in BV(I)^m are weak* compact, we obtain by [1, Theorem 3.23] that there exists a sub-sequence (u_{γ_{n_k}})_k which converges weakly* to a function ũ ∈ BV(I)^m. The weak* convergence implies that u_{γ_{n_k}} converges to ũ in L^2(I)^m, and D_t u_{γ_{n_k}} converges in the weak* topology of M(I)^m to D_t ũ. Hence, by the weak* lower semi-continuity of ‖·‖_{M(I)^m} we get: Furthermore, the continuity of S̃ implies that: Because ‖∂_t u_{γ_{n_k},i}‖_{L^2(I)} and |u_{γ_{n_k}}(0)|_{R^m} are bounded sequences, we have: Estimates (12)–(14) and Theorem 4.4 imply that: By uniqueness of the optimal control of (P) we get that ũ is equal to the optimal control u of (P). Thus, the unique solutions u_{γ_{n_k}} of (P^1_{γ_{n_k}}) converge BV(I)^m-weakly* to the optimal control u of (P).
Corollary 3. The unique optimal controls u_γ of (P^1_γ) converge strictly in BV(I)^m to the optimal control u of (P).
Proof. Due to the weak* convergence established in Theorem 4.5, u_γ converges to the optimal control u in L^1(I)^m. Using that S̃(u_γ) → S̃(u) in L^2(Ω_T), Theorem 4.4 implies that the total variations of u_γ converge to the total variation of u.

4.2.
Equivalent regularized optimal control problem to (P^1_γ). In this section we introduce a problem (P_γ) which is equivalent to (P^1_γ). This equivalent problem will be solved by a semi-smooth Newton method. In the remaining part of the paper we restrict the operator B defined in (5) to L^2(I)^m × R^m; its adjoint has the form: Analogously we henceforth restrict S̃ to L^2(I)^m × R^m. The isomorphism in Lemma 2.6 translates (P^1_γ) into the following equivalent form:

4.3. Regularization - first-order optimality condition. In this section we present the first-order optimality conditions for (P_γ). We use a prox-operator approach to represent implicitly the distributional derivative of the BV optimal control of (P) in terms of the adjoint. This allows us to replace the sub-differential in the first-order optimality conditions of (P_γ). Finally we compare the sparsity results of (P_γ) and (P), and show the convergence of the adjoints of (P_γ) to the adjoint of (P) for γ → 0.
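For an L^1-type cost term, the pointwise prox operator underlying this representation is the classical soft-shrinkage; a minimal sketch (the exact dependence of the threshold on α_i and γ follows the paper's first-order conditions and is not reproduced here):

```python
import numpy as np

def soft_shrink(p, threshold):
    """Pointwise proximal operator of threshold*|.|:
    prox(p) = sign(p) * max(|p| - threshold, 0)."""
    return np.sign(p) * np.maximum(np.abs(p) - threshold, 0.0)

p = np.array([-2.0, -0.3, 0.0, 0.5, 1.5])
print(soft_shrink(p, 1.0))  # -> [-1.  0.  0.  0.  0.5]
```

Values of the adjoint-type argument p strictly inside the threshold band are mapped exactly to zero, which is the mechanism that produces vanishing derivatives (and hence plateaus of the control) on inactive sets.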
Lemma 4.6. Let (→v, →c) ∈ L^2(I)^m × R^m be the optimal control of (P_γ). We have the following necessary and sufficient optimality conditions for (P_γ):

Proof. Since this proof is standard, we have deferred it to the appendix.
In the appendix it is also shown that the prox-operator expression is equal to the right-hand side of equation (1) in Lemma 4.6.
Due to the regularity of the adjoint wave equation, the optimal control →v is at least Lipschitz continuous.
Let (→v, →c) be the optimal control of (P_γ). Then we have for a.a. s ∈ I and i = 1, ..., m: One can compare the sparsity structure of the optimal controls associated with (P_γ) to that of the optimal measures →v_i in (P̃). We next address the convergence of the adjoints ψ_γ of (P_γ) to the adjoint p_1 of (P), which is defined in Theorem 3.2.
and for κ > 0: Due to Theorem 4.5 we know that →v_γ, the derivative of the optimal control u_γ of (P^1_γ), converges weakly* in M(I)^m to →v, the distributional derivative of the optimal control u of (P). Furthermore, recall that (κ(γ)/γ) c_{γ,i} → 0, for i = 1, ..., m, holds. By regularity results for the wave equation, p_{1,i} and ψ_{γ,i} are elements of H^2(I). Furthermore, using Theorem 4.5 in the last inequality of the following computation we find that the corresponding term converges to 0 as well. For this purpose, utilizing the dominated convergence theorem and Theorem 4.5, we obtain: Consider now the case κ = 0: To verify (17) let us note that ψ_{γ,i} ∈ H^1_0(I) since κ = 0. Because H^2(I) continuously embeds into C^0(Ī), ψ_{γ,i}, p_{1,i} ∈ C^0(Ī), and ψ_{γ,i} → p_{1,i} for i = 1, ..., m, we achieve the desired result.

4.4.
Regularization - semi-smooth Newton method. In this section, we discuss the semi-smooth Newton method which is used to construct a sequence in L^2(I)^m × R^m that solves the first-order conditions (1), (2) in Lemma 4.6 in the limit. Later, in section 5, a BV-path following algorithm is presented which uses this method, see Algorithm 1. At first, let us introduce the operator F_γ and observe that F_γ(→v, →c) = 0 is equivalent to (1), (2) in Lemma 4.6.
Consider the following definition from [18, p. 120 et seq.]:

Definition 4.7. Let G : X → Y be a continuous operator between Banach spaces X and Y. Further, let us consider a set-valued mapping ∂G : X ⇒ L(X, Y) with non-empty images. We call ∂G a generalized differential, and we call the operator G ∂G-semi-smooth, or simply semi-smooth, in x if sup_{M ∈ ∂G(x+h)} ‖G(x + h) − G(x) − M h‖_Y = o(‖h‖_X) as h → 0.

We recall the following theorem from [17, Theorem 1.1]:

Theorem 4.8. Suppose that x* is a solution of the equation G(x*) = 0 and that G is ∂G-semi-smooth in a neighborhood U ⊂ X containing x*. If the set ∂G(x) contains only non-singular mappings and if {‖M^{-1}‖ | M ∈ ∂G(x)} is bounded for all x ∈ U, then the Newton iteration converges super-linearly to x*, provided that ‖x_0 − x*‖ is sufficiently small.
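The semi-smoothness remainder condition can be probed numerically for the pointwise operation G_max(f) = max(f, 0) with the generalized derivative given by multiplication with 1_{{f > 0}}; a discrete sketch (sampling stands in for the L^p norm gap, which this finite test cannot capture):

```python
import numpy as np

def Gmax(f):
    """Pointwise superposition operator f -> max(f, 0)."""
    return np.maximum(f, 0.0)

def DGmax(f, h):
    """Generalized (Newton) derivative of Gmax at f applied to h:
    multiplication by the indicator of {f > 0}."""
    return (f > 0.0).astype(float) * h

rng = np.random.default_rng(0)
f = rng.standard_normal(10_000)
for eps in [1e-1, 1e-2, 1e-3]:
    h = eps * rng.standard_normal(f.size)
    # semi-smoothness remainder, with the derivative evaluated at f + h
    r = Gmax(f + h) - Gmax(f) - DGmax(f + h, h)
    print(eps, np.linalg.norm(r, 1) / f.size)  # decays roughly like eps**2
```

The remainder is nonzero only at points where f and f + h straddle the kink at 0; both the measure of that set and the size of the error there scale with ε, which is the mechanism behind the super-linear decay.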

Remark 2.
Let us note that [17, Theorem 1.1] is actually more general. The authors use slant differentiability, which is a weaker concept than semi-smoothness, see [17, p. 868].
In the following, we prove that all conditions needed for Theorem 4.8 hold for F_γ. If the initial value x_0 is sufficiently close to x*, this guarantees that the sequence (x_k)_{k∈N} in (23) converges super-linearly in L^2(I)^m × R^m to a root x* of F_γ.
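As a toy illustration of the convergence statement (not the operator F_γ itself), consider the scalar non-smooth equation x^3 + max(x, 0) − 2 = 0 with root x* = 1; the indicator 1_{{x > 0}} plays the role of the generalized derivative of max:

```python
def semismooth_newton(F, dF, x0, tol=1e-12, max_iter=50):
    """Newton iteration x_{k+1} = x_k - dF(x_k)^{-1} F(x_k), where dF
    returns an element of the generalized differential (scalar case)."""
    x = x0
    history = [x]
    for _ in range(max_iter):
        fx = F(x)
        if abs(fx) < tol:
            break
        x = x - fx / dF(x)
        history.append(x)
    return x, history

# Non-smooth scalar equation F(x) = x**3 + max(x, 0) - 2 with root x* = 1.
F  = lambda x: x**3 + max(x, 0.0) - 2.0
dF = lambda x: 3.0 * x**2 + (1.0 if x > 0 else 0.0)  # Newton derivative of max
x_star, hist = semismooth_newton(F, dF, x0=2.0)
print(x_star, len(hist))  # converges to 1.0 in a handful of iterations
```

Away from the kink the generalized differential is a classical derivative, so the fast local convergence predicted by Theorem 4.8 is visible in the short iteration history.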
Definition 4.9. Define the following operators for (→v, →c): Furthermore, we write for i = 1, ..., m: For any function Υ : X → Y, with X and Y Banach spaces, we denote by DΥ(x)(z) the directional derivative of Υ at x in direction z.
Let us recall that the point-wise maximum and minimum operations from L^p to L^2 are semi-smooth if p > 2 (norm gap), and a Newton derivative in f ∈ L^p(I) in direction h ∈ L^p(I) is given by 1_{{f > 0}} h for max and 1_{{f < 0}} h for min. Hence, we get a corresponding Newton derivative for max/min compositions. In particular, we have for (→v, →c):

Proof. Lemma 4.10 is a consequence of the semi-smoothness of max/min from L^p to L^q with p > q ≥ 1. Furthermore, G := (⟨Lg_i, Lg_j⟩_{L^2(Ω_T)})_{i,j=1}^m is invertible, and we have for all h ∈ L^2(I)^m that the continuous affine linear operator (B*L*LB)(h, ·) is bijective with:

Proof. The non-negativity and injectivity can be seen from the strict inequality: The strictness is a consequence of the uniqueness of solutions of the wave equation defined by L.
The claim on the spectrum follows from self-adjointness and the fact that the spectral radius equals ‖B*L*LB‖, see [27, Theorem VI.6].
Let us now show that G is invertible. Given the linear independence of (g_i)_{i=1}^m in L^2(Ω_T), we get that (L(g_i))_{i=1}^m is linearly independent in L^2(Ω_T) by the uniqueness of solutions of the wave equation. Further, introduce ⟨λ, µ⟩_L := Σ_{i,j} λ_i µ_j ⟨Lg_i, Lg_j⟩_{L^2(Ω_T)}. This is an inner product on R^m.
Hence, the Gram matrix G = (⟨e_i, e_j⟩_L)_{i,j=1}^m ∈ R^{m×m} is invertible.
In the following we present the injectivity results for the Newton derivative DF_γ(v, c). The final surjectivity results and the uniform boundedness of ‖DF_γ(v, c)^{-1}‖ with respect to γ → 0 and κ > 0 can be found in section 4.5. Combined, these results allow us to conclude that the super-linear convergence of Theorem 4.8 holds for our control problem, at least in the case κ = 0.
We have h_i = φ_{1,i} for all i = ñ + 1, ..., m and: By (35) we have: This equation implies the following: where (**) follows from the non-negativity of B*L*LB, i.e.
where we used that h̃_j = h̃_j 1_{I_{1,j}} = h̃_{N,j} for j = 1, ..., ñ and h̃_j = 0 for j > ñ. Hence, we have: Finally, we have by (40) and the definition of ψ_i: where c̃ > 0 is some constant independent of (v, c) and (h, k). This finally establishes the bound ‖DF_γ(v, c)^{-1}‖ ≤ c̃.
As a consequence of Theorems 4.8–4.13 we have the following result.

5. Numerics and examples.
In the following sections we present numerical results which illustrate the effect of the BV cost on the optimal controls. For the discretization of (W) we used the 3-level finite element method presented in [28]. In particular, we used the Crank–Nicolson method with linear continuous finite elements in time (S_τ) and space (S_h). The resulting discrete solution of (W) is an element of the tensor space S_τ ⊗ S_h. We discretized the control (v, c) ∈ L^2(I)^m × R^m in (P_γ) by S_τ elements. Furthermore, we used the trapezoidal rule to evaluate all time-dependent integrals in problem (P_γ). The trapezoidal rule guarantees that the function inside the prox operator (see (15)) attains its maximum and minimum in the time nodes we considered for S_τ. We used the mass matrix for the space-dependent integral in (P_γ) with respect to the finite elements in S_h. Further details can be found in [13].
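For orientation, the time stepping can be sketched on the 1D model problem with finite differences in space (the paper's actual discretization is the 3-level finite element scheme of [28]; all names and parameters below are illustrative):

```python
import numpy as np

def crank_nicolson_wave(y0, v0, T, n_steps, dx):
    """Crank-Nicolson stepping for y_tt = y_xx on (0,1), homogeneous
    Dirichlet data, written as the first-order system z' = K z with
    z = (y, v) and K = [[0, I], [A, 0]]."""
    n = y0.size
    dt = T / n_steps
    # second-order finite-difference Laplacian on the interior nodes
    A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) / dx**2
    K = np.block([[np.zeros((n, n)), np.eye(n)], [A, np.zeros((n, n))]])
    # one CN step: (I - dt/2 K) z_{k+1} = (I + dt/2 K) z_k
    step = np.linalg.solve(np.eye(2 * n) - 0.5 * dt * K,
                           np.eye(2 * n) + 0.5 * dt * K)
    z = np.concatenate([y0, v0])
    for _ in range(n_steps):
        z = step @ z
    return z[:n], z[n:]

# Standing wave y(t, x) = cos(pi t) sin(pi x): after one period, T = 2,
# the solution returns to its initial state up to discretization error.
n = 199
x = np.linspace(0.0, 1.0, n + 2)[1:-1]  # interior nodes only
y0, v0 = np.sin(np.pi * x), np.zeros(n)
yT, vT = crank_nicolson_wave(y0, v0, T=2.0, n_steps=2000, dx=x[1] - x[0])
print(np.max(np.abs(yT - y0)))  # small (second order in dt and dx)
```

Crank–Nicolson is unconditionally stable and energy-conserving for this linear system, which is why a long-time integration over a full period stays accurate.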
In the following sections, we construct two test cases in such a manner that exact analytic solutions for (P ), respectively (P ), become available. We use Algorithm 1, which is a BV-path following algorithm, to approximate numerically the solutions of those examples. The solution of the linear system in Algorithm 1 is approximated by a Krylov iterative method.
A similar path-following algorithm is used in the semi-linear parabolic case in [9]. A special aspect of the semi-smooth Newton method inside the BV-path following algorithm (Algorithm 1), compared to the one in [9], is that we consider the derivative and an additional constant as control instead of a BV function. Besides, we have an additional term (κ(γ)/γ) c, which allows us to obtain super-linear convergence for the semi-smooth Newton algorithm for κ = 0, see sections 4.4 and 4.5.
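The outer structure of Algorithm 1 can be sketched independently of the PDE ingredients; `solve_newton` is a placeholder for the inner semi-smooth Newton solve, and the toy inner solver below (exact minimizer of a scalar quadratic) only demonstrates the warm-started continuation in γ:

```python
def bv_path_following(solve_newton, x0, gamma0=1.0, shrink=0.1, gamma_min=1e-8):
    """Path following in the regularization parameter: solve the
    gamma-regularized optimality system, shrink gamma (e.g. by 0.1),
    and warm-start the next solve from the previous solution."""
    x, gamma = x0, gamma0
    while gamma >= gamma_min:
        x = solve_newton(x, gamma)  # inner (semi-smooth) Newton solve
        gamma *= shrink
    return x

# toy inner solver: exact minimizer of 0.5*(x - 1)**2 + 0.5*gamma*x**2
toy_solver = lambda x, gamma: 1.0 / (1.0 + gamma)
print(bv_path_following(toy_solver, x0=0.0))  # tends to 1 as gamma -> 0
```

The warm start is what keeps each inner Newton solve inside its local super-linear convergence regime as γ decreases.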

5.1. Construction of test examples.
The examples are constructed in such a way that the active sets of the optimal adjoint state can take a wide range of different forms in Ω_T. Furthermore, we have ‖p_{1,i}‖_∞ ≤ α_i and p_{1,i} ∈ C^0(Ī) for i = 1, ..., m. Control: Let us now consider arbitrary positive measures µ_i^±. Due to the continuity of p_{1,i}, the support of µ_i^+ is disjoint from the support of µ_i^−. The measure D_t u_{1,i} is a positive measure on {p_{1,i} = −α_i} and a negative measure on {p_{1,i} = +α_i}. On the support of µ_i^± we have |α_i / p_{1,i}(s)| = 1. This gives us: Let us define u_{1,i} := ū_{1,i} + c_i with ū_{1,i}(t) = ∫_0^t dD_t u_{1,i}(s) and c_i ∈ R. State and desired state: Furthermore, let us fix the desired state according to y_d = S̃(u) − (f · ∂_tt h − h · Δf), and some displacement and velocity functions (y_0, y_1) ∈ V × H. For the resulting problem (P) the function u is the optimal control.

5.2. Finitely many jumps example. This example is constructed in such a way that the set {t ∈ I : |p_1(t)| = α} consists of finitely many active points. Similar construction steps can also be found in [9, Example 1]. Let β > 0, l ∈ N_{>0}, d ∈ {1, 2, 3}, Ω = (−1, 1)^d, I = (0, 2), and define: Then ϕ has the properties ϕ|_∂Ω ≡ 0, ϕ(2) = 0, and ∂_t ϕ|_{t=2}(t, x) = β(lπ cos(lπt) sin(lπt/2) + (lπ/2) sin(lπt) cos(lπt/2)) Π_{i=1}^d cos(πx_i/2)|_{t=2} = 0. Applying the wave operator ∂_tt − Δ to ϕ(t, x) gives us: By an elementary computation we find: It holds that p_1(0) = p_1(2) = 0, and p_1 ∈ C^0(Ī) with ‖p_1‖_{C^0(Ī)} = 4β, where the last equality follows from an elementary computation.

We now turn to discuss numerical results. We considered dimension d = 2 and the number of Diracs l = 3. For the desired state y_d := S̃(u) − (∂_tt − Δ)ϕ(t, x) we used (y_0, y_1) = (0, 0). The optimal constant is fixed by c = 0. The BV-path following algorithm starts with γ_0 = 1 and (v_0, c_0) = (0, 0), and we iterate according to γ_{k+1} = 0.1 γ_k. We stopped the BV-path following algorithm when γ_k = 10^{-8} was reached.
The function κ is defined as κ(γ) = γ^4.
In Figures 1 and 2 we depict the optimal control for two different choices of d.o.f. On the right-hand side of Figures 1 and 2, we see the function p_{1,approx} := ψ_1 which appears in the prox operator (15). As suggested by (16), we obtain ∂_t u_approx = 0 for the derivative of the approximated optimal control whenever |p_{1,approx}| < α.
In the upper left sub-figures of Figures 1 and 2 the red curve depicts the approximated derivative of the approximated optimal control u_approx. The blue pin line represents the exact Dirac measures approximated according to the mesh, i.e. for a ∈ R the Dirac measure a · δ_t is approximated by a pin at the position t with pin height a/τ, where τ is the uniform distance between two time nodes. In the lower sub-figures of Figures 1 and 2 we see the exact optimal control u in blue, L^2-projected on V_h, and the approximated optimal control u_approx in red.
We stopped the semi-smooth Newton algorithm as soon as ‖F_{γ_k}(u_k)‖_{L^2(I)^m × R^m} ≤ 10^{-6} =: TOL_N. In Figure 3 we show the ‖F_{γ_k}(u_k)‖_{L^2(I)^m × R^m}-error for the different γ values which were used in Algorithm 1. In Figure 3, we see the errors which correspond to 2049 d.o.f. in time. In the last figure on the right we see the error corresponding to Figure 2. As expected, all figures show the super-linearity of the ‖F_{γ_k}(u_k)‖_{L^2(I)^m × R^m}-error.

5.3.
Cantor function or Devil's staircase example. Here we construct functions p_{1,i} ∈ C^0(Ī) which enable us to use all three classes of measures for the distributional derivative of a BV function in time, that is, absolutely continuous measures with respect to the Lebesgue measure, countable linear combinations of Dirac measures, and Cantor measures. For further information about this characterization of measures see for example [1, p. 184]. Finally we will use p_{1,i} to create a Cantor-like optimal control. Let 0 < a_1 < b_1 < a_2 < b_2 < T. Then for all closed non-trivial intervals: In the following we denote by P_C the set {t ∈ Ī : p(t) = ±1} = I_1 ∪ I_2. Let us now fix T, a_i, b_i such that the assumptions above (43) hold. Under these circumstances, we define α_i := |z_i|. Now, we consider positive measures µ_i^±. Following the instructions in section 5.1 "Construction of test examples" yields an optimal control →u. The measures µ_i^± can be of the types described above.
Our next aim is to construct an optimal control which has a Cantor-like shape. Hence, denote by C(t) the Cantor function on [0, 1] (see [1, Example 3.34]). Using C we define ū on the domain (a_2, b_2), and accordingly we define the continuous function on the remaining parts of I.
Let us define u_i := c̃_i · ū_i(t) + c_i with c_i ∈ R and c̃_i > 0. The distributional derivative of ū_i has a positive part in I_1 and a negative part in I_2. The measure D_t u_i is a Cantor measure with support in P_C, where D_t u_i^+ is supported in I_1 and D_t u_i^− in I_2. Following the instructions in section 5.1 "Construction of test examples" gives us an optimal control →u. Similarly as above, one can construct functions p̃_i, for i = 1, ..., m, such that each of them has finitely many plateaus with different signs.
In our numerical experiment we considered the following parameters: and the optimal control we want to approximate is u(t) := 10 · C((t − 0.8)/(2(2.14 − 0.8))) · 1_{[0.8, 2.14]}(t) + 5 · 1_{(2.14, 2.85)}(t) + 10 · C((4.2 − t)/(2(4.2 − 2.85))) · 1_{[2.85, 4.2]}(t). In Figures 5 and 6 we depict the numerical optimal control for two different choices of d.o.f. In the upper left sub-figure the red curve is the approximated derivative of the approximated optimal control u_approx. The blue curve represents an approximation to the derivative of u by finite differences.
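The exact control above can be sampled directly; the ternary-digit evaluation of the Cantor function below is a standard algorithm (our implementation choice, not taken from the paper):

```python
def cantor(t, depth=40):
    """Cantor function C on [0, 1], evaluated via the ternary digits of t."""
    t = min(max(t, 0.0), 1.0)
    value, scale = 0.0, 0.5
    for _ in range(depth):
        d = min(int(t * 3.0), 2)   # next ternary digit
        if d == 1:
            return value + scale   # t lies in a removed middle-third interval
        value += scale * (d // 2)  # ternary digit 2 -> binary digit 1
        scale *= 0.5
        t = t * 3.0 - d
    return value

def u(t):
    """Exact optimal control of the Cantor example (parameters as in the text)."""
    if 0.8 <= t <= 2.14:
        return 10.0 * cantor((t - 0.8) / (2.0 * (2.14 - 0.8)))
    if 2.14 < t < 2.85:
        return 5.0
    if 2.85 <= t <= 4.2:
        return 10.0 * cantor((4.2 - t) / (2.0 * (4.2 - 2.85)))
    return 0.0

print(u(1.0), u(2.5), u(3.0))  # ramp-up, plateau at 5, ramp-down
```

Both Cantor pieces reach C(1/2) = 1/2 at the plateau boundaries, so u is continuous with value 5 there; its derivative is a Cantor measure, positive on the first piece and negative on the second.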
The BV-path following algorithm starts with γ_0 = 1 and (v_0, c_0) = (0, 0), and we iterate according to γ_{k+1} = 0.5 γ_k. We stopped the BV-path following algorithm when γ_k = 3.8 · 10^{-6} was reached. The function κ is defined by κ(γ) = 0. We used ‖F_{γ_k}(u_k)‖_{L^2(I)^m × R^m} ≤ 0.5 · 10^{-4} as the stopping criterion for the semi-smooth Newton algorithm. Our results extend to boundary conditions with a_D ∈ R and a_N ≠ 0; a more detailed discussion can be found in [13]. The work in [13] also includes more general cost functionals which in particular involve the velocity y_t: where y_u is again defined by (W). The scalar values →t := (t_1, ..., t_r) represent measurement time points with 0 < t_1 < ... < t_r = T and r finite. Furthermore, it is assumed that at least one β_i > 0, that O_i, i = 1, 2, 3, are separable Hilbert spaces, and that Π_1 : L^2(Ω_T) → O_1, Π_2 : L^2(Ω)^r → O_2, Π_3 : H^{-1}(Ω)^r → O_3 are linear continuous operators with adjoints Π_1*, Π_2*, and Π_3*. Let us denote by y_{g_j} the weak solution of (W) with forcing g_j, j = 1, ..., m, and (y_0, y_1) = (0, 0). In [13] it is assumed that (Π_1(y_{g_j}))_{j=1}^m, (Π_2(y_{g_j}(→t)))_{j=1}^m, and (Π_3(∂_t y_{g_j}(→t)))_{j=1}^m are linearly independent in O_1, O_2, respectively O_3, for at least one of these sets of vectors where β_i > 0. Then the existence of solutions for (P_Π) is shown in [13]. For additional technical assumptions we refer to [13, Section 2.4.1]. As in the case of distributed observations of y_u treated in the earlier sections, we again introduce a transformed problem in terms of measures in M(0, T) × R instead of BV-functions in time, and thus arrive at an analogue of (P̃) which we call (P̃_Π). Due to the more complex costs and observation operators in (P_Π), we present next the optimality conditions of (P̃_Π), cf. [13, Theorem 2.10]. In this regard, let us first introduce the following time-space dependent functions p_i for i = 1, ..., τ̃ with τ̃ ∈ N_{>0}, cf. [13, Lemma 2.8].
Proof. Let us define the linear continuous operators: with DF_{→u} the Gâteaux derivative of F in →u. It has the following form: Hence, (47) implies that: Using standard techniques for the sub-differential of a convex functional and (49), this implies that:

In the following we prove Lemma 4.6:

Proof. Firstly, let us present the optimality conditions of (P_γ): the control (→v, →c) ∈ L^2(I)^m × R^m is optimal for (P_γ) if: Consider now →u := (→v, →c) ∈ L^2(I)^m × R^m and the following function for →p ∈ L^2(I)^m, which we also call the prox problem. Our aim is to calculate the first-order optimality conditions for this problem and to find an explicit representation. For the sub-differential of the non-smooth term we use α_i P_i* ∂‖P_i(·)‖_{L^1(I)} ⊆ L^2(I)^m with the domain L^2(I) for the function α_i ‖·‖_{L^1(I)}. Thus, we have the following first-order optimality conditions for Prox^γ_{α_i ‖·‖_{L^1(I)}}: For (52) this means that →v ∈ L^2(I)^m is optimal if and only if (γ/α_i)(p_i − v_i) ∈ ∂‖v_i‖_{L^1(I)} for all i = 1, ..., m and →p ∈ L^2(I)^m. Next, let us show that our optimal control can be explicitly written as: We can proceed coordinate-wise. By the definition of the sub-differential we have the equivalent inequality condition ⟨(γ/α)(p − v), v̄ − v⟩_{L^2} ≤ ‖v̄‖_{L^1} − ‖v‖_{L^1} for all v̄ ∈ L^2(I). By a standard Lebesgue point argument, v is an optimal control if and only if