Finite element error analysis for measure-valued optimal control problems governed by a 1D wave equation with variable coefficients

This work is concerned with optimal control problems governed by a 1D wave equation with variable coefficients and with the control spaces $\mathcal M_T$ of either measure-valued functions $L_{w^*}^2(I,\mathcal M(\Omega))$ or vector measures $\mathcal M(\Omega,L^2(I))$. The cost functional involves standard quadratic tracking terms and the regularization term $\alpha\|u\|_{\mathcal M_T}$ with $\alpha>0$. We construct and study three-level in time bilinear finite element discretizations for this class of problems. The main focus lies on the derivation of error estimates for the optimal state variable and for the error measured in the cost functional. The analysis is mainly based on previous results of the authors. Numerical results are included.


1. Introduction. This work is concerned with the discretization and numerical analysis of optimal control problems involving a 1D linear wave equation with variable coefficients and controls taking values in certain measure spaces. The combination of variable coefficients and irregular data leads to significant technical difficulties.
Motivated by industrial applications as well as applications in the natural sciences, in which one is interested in placing actuators in the form of point sources in an optimal way, see, e.g., [4,9], or in reconstructing point sources from given measurements, see, e.g., [34,44], measure-valued optimal control problems involving PDEs have gained attention in recent years. These problems can be translated into optimization problems in terms of the coordinates and coefficients of the point sources. However, such optimization problems are non-convex since the solution of the state equation (PDE) depends non-linearly on the coordinates of the point sources. Thus one has to deal with multiple local minima. Several authors have suggested casting the control problem or inverse problem as an optimization problem over a suitable measure space $\mathcal M_T$ involving a convex regularization functional $R$ which favors point sources as solutions. In our case we introduce the following problem formulation involving the 1D wave equation with additional initial and boundary conditions. The functional $F$ is given by a quadratic tracking functional involving $y|_{I\times\Omega}$, $y(T,\cdot)|_{\Omega}$ and $\partial_t y(T,\cdot)|_{\Omega}$. The regularization functional $R$ and the control space $\mathcal M_T$ are chosen such that $\mathcal M_T$ contains point sources of the desired form and $R$ promotes controls of such a form, i.e., linear combinations of point sources with time-dependent intensities or, more generally, controls with a small spatial support. Since problem (1.1) is convex, one does not need to deal with several local minima. However, it is no longer guaranteed that the solution consists of a sum of point sources; we enforce such controls via the regularization functional $R$. Problems of the form (1.1) (also involving other PDEs) have been analysed from theoretical, numerical and algorithmic points of view, see [6, 11-16, 18, 19, 33, 34, 44, 45].
Optimal control problems governed by the linear wave equation have been discussed from several different aspects, see [24, 25, 29-32, 35, 36, 40, 41, 52]. In our particular case we consider the control spaces $\mathcal M_T$ of measure-valued functions $L^2_{w^*}(I,\mathcal M(\Omega))$ and of vector measures $\mathcal M(\Omega,L^2(I))$ with $R(u)=\alpha\|u\|_{\mathcal M_T}$. These two different choices imply different structural properties of the optimal controls. A typical non-regular element of the space $\mathcal M(\Omega,L^2(I))$ is given by
$$u=\sum_i u_i\,\delta_{x_i},\qquad u_i\in L^2(I),\quad x_i\in\Omega, \eqno(1.2)$$
where the $\delta_{x_i}$ are Dirac delta functions. Point sources of this type with fixed positions and time-dependent intensities are of interest in acoustics or geology, see [34,44]. If one is interested in controls involving moving point sources of the form
$$u_i(t)\,\delta_{x_i(t)},\qquad u_i\in L^2(I),\quad x_i\colon I\to\Omega\ \text{measurable}, \eqno(1.3)$$
then the control space $L^2_{w^*}(I,\mathcal M(\Omega))$ rather than $\mathcal M(\Omega,L^2(I))$ is appropriate. The space $\mathcal M(\Omega,L^2(I))$ and the functional $\|\cdot\|_{\mathcal M(\Omega,L^2(I))}$ are also related to the notions of directional sparsity and joint sparsity, see [22,26].
For the discretization of the optimal control problem (1.1), we discretize the state equation by a space-time finite element method as introduced in [51]. Related methods are also discussed and analyzed in [1,23], see also [2]. The measure-valued control is not directly discretized, cf. the variational control discretization from [27]. However, there exist optimal controls consisting of Dirac measures at the spatial grid points which can be computed, see also [12,33]. The numerical analysis of the control problem is based on FEM error estimates for second order hyperbolic equations from [51] and on techniques developed in [12,33]. It requires overcoming significant technical difficulties caused by the non-smoothness of controls and states. To the best of our knowledge, this is the first paper providing such a numerical analysis for the studied control problems.
Problems like (1.1) with a parabolic/heat state equation are analyzed for the case $\mathcal M_T=\mathcal M(\Omega,L^2(I))$ in [33] and for the case $\mathcal M_T=L^2_{w^*}(I,\mathcal M(\Omega))$ in [12]. In particular, in both papers the authors prove the existence of optimal controls and derive optimality conditions and FEM error estimates. Our analysis is partly based on the results of [33]. In [34] a problem similar to (1.1) involving the linear wave equation with constant coefficients as state equation is analyzed. In particular, existing regularity results for a Dirac right-hand side are extended to sources from $\mathcal M(\Omega,L^2(I))$. Based on these regularity results, the existence of optimal controls is proved and optimality conditions are derived in the 3D case. Now we briefly summarize the contents of this work. First of all we collect and partially prove the required existence and regularity results for the linear wave equation in the 1D setting. In particular, we check that the notion of a weaker solution defined in [51] and the more commonly used notion of a very weak solution, see, e.g., [38], are equivalent. Most importantly, we prove that the solution of the linear wave equation with variable coefficients from $H^1(\Omega)$ for any source term $u\in\mathcal M(\Omega,L^2(I))$ is an element of $C(\bar I,H^1_0(\Omega))\cap C^1(\bar I,L^2(\Omega))$ provided that the initial data have the relevant regularity. The proof is based on a non-standard energy-type bound in space, not only in time, cf. [21,37]. In [34] the same result is proved for the wave equation with constant coefficients using duality techniques. The proof in [34] also provides corresponding results in the multidimensional case but cannot be directly extended to treat variable coefficients. This is due to the fact that it uses estimates of the solution of the wave equation in the whole space with a Dirac measure on the right-hand side which are proven using the Fourier and Laplace transforms or explicit solution formulas.
The existence of optimal controls and the derivation of optimality conditions are discussed on the basis of results from [33,34]. In the case $\mathcal M_T=\mathcal M(\Omega,L^2(I))$ we prove that the optimal control $\bar u$ belongs to $C^1(\bar I,\mathcal M(\Omega))$.
Further, the FEM discretization of the state equation is introduced. The state variable $y_{h,\tau}$ belongs to the space of bilinear finite elements and is defined by a regularized Galerkin method. The resulting numerical scheme is a three-level method in time (i.e., its main equation relates the approximate solution values at three consecutive time levels). Moreover, we state and prove FEM error estimates in $C(\bar I,L^2(\Omega))$ for the discrete state equation which we need for the numerical analysis of the control problem. We base this study mainly on the results from [51] concerning the error analysis of FEMs for second order hyperbolic equations in classes of data having integer Sobolev or fractional Nikolskii order of smoothness. Note that their sharpness in a strong sense was established in [50].
Then we consider a semi-discrete optimal control problem in which the continuous state equation is replaced by its discretized version whereas the controls are not discretized. We prove convergence of the discrete optimal controls to the continuous one and derive optimality conditions based on Lagrange techniques. Most importantly, we derive the discrete adjoint state equation. We can conclude that the first-discretize-then-optimize and first-optimize-then-discretize approaches commute. Therefore an analysis of the discrete adjoint state equation, including error estimates in $C(\bar I\times\bar\Omega)$ and $L^2(I,C_0(\Omega))$, can also be based on techniques from [51]. Then we use results from [33] to represent the numerical error of the state variable and of the cost functional in terms of FEM errors of the state equation and the adjoint state equation. Let $\bar u$ and $\bar y$ be the optimal control and the corresponding optimal state, and let $\bar u_{\tau,h}$ and $\bar y_{\tau,h}$ be their discrete counterparts. As the main result of this paper we prove error estimates in terms of $\tau$, the step in time, and $h$, the maximal step in space, of order $\alpha=1/3$ for $\mathcal M_T=L^2_{w^*}(I,\mathcal M(\Omega))$ or $\alpha=2/3$ for $\mathcal M_T=\mathcal M(\Omega,L^2(I))$. The latter higher order is due to the above-mentioned improved regularity results for the state and the optimal control. Such estimates are proved for measure-valued controls in the hyperbolic case for the first time. Similar estimates are impossible in multidimensional settings due to the much lower fractional Sobolev regularity of optimal states and controls.
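The display carrying these main estimates did not survive in this text. Judging from the norms used elsewhere in the paper, a plausible (but not verbatim) form is, with $\alpha$ here denoting the convergence order just stated:

```latex
\|\bar y-\bar y_{\tau,h}\|_{C(\bar I,L^2(\Omega))}\le C\,(\tau+h)^{\alpha},
\qquad
|j(\bar u)-j_{\tau,h}(\bar u_{\tau,h})|\le C\,(\tau+h)^{2\alpha}.
```

The doubled order for the cost functional is our reading of the quadratic relation between the functional error and the state error used in Sections 9 and 10, not a quotation of the theorem.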
Finally we discuss the numerical computation of the discrete control $\bar u_{h,\tau}$. Based on a control discretization $u_{h,\tau}$ given by a sum as in (1.2) with $x_i$ at the spatial grid points and $u_i$ in the space of linear finite elements, a solution of the semi-discrete control problem can be calculated similarly to [33]. For the actual numerical computation of the optimal control we add the term $(\gamma/2)\|u\|^2_{L^2(I\times\Omega)}$, $\gamma>0$, to (1.1). This regularized problem is solved by a semi-smooth Newton method, see [43]. In a continuation strategy the regularization parameter $\gamma$ is made sufficiently small. We conclude this work with a numerical example for $\mathcal M_T=\mathcal M(\Omega,L^2(I))$.
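Assembling the terms named above, the regularized problem that is solved numerically can be sketched as follows (for controls that additionally belong to $L^2(I\times\Omega)$; this is our shorthand, not a verbatim quotation of (1.1)):

```latex
\min_{u}\; F(Su)\;+\;\alpha\,\|u\|_{\mathcal M_T}
\;+\;\frac{\gamma}{2}\,\|u\|_{L^2(I\times\Omega)}^2,
\qquad \gamma>0 .
```

For $\gamma\to 0$ one recovers (1.1), which motivates the continuation strategy in $\gamma$.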
The paper is organized in the following way. In Section 2 we introduce the problem setting, the control spaces and the regularization functionals. Section 3 is concerned with regularity properties of the linear wave equation with variable coefficients in the 1D setting. In Section 4 the control problem is analyzed from a theoretical point of view. Section 5 deals with the discretization of the state equation. Then we obtain stability bounds and error estimates for the discrete state equation in Section 6. Section 7 is concerned with the analysis of the semi-discrete optimal control problem. The next section discusses stability bounds and error estimates for the discrete adjoint state equation. In Sections 9 and 10 error estimates for the optimal state and the cost functional are derived, these being the main theoretical results of the study. Section 11 deals with the time-stepping formulation of the discrete state equation. In Section 12 we discuss the control discretization with Dirac measures at the grid points. Then we introduce the $L^2(I\times\Omega)$-regularized problem and describe its solution by a semi-smooth Newton method. Finally, Section 13 provides a numerical example.
3.1. Weak formulations and preliminary existence, uniqueness and regularity results. In this section we introduce our solution concepts for the state equation (2.1). We begin by defining a weak formulation of (2.1) involving (3.2) and the initial condition $y(0)=y_0$.
The right-hand side in (3.1) is well defined also for $X=\mathcal M(\Omega,L^2(I))$ due to the embeddings (2.2).

Remark 1.
It is possible (and more common) to suppose that $v(T)=0$ in (3.1), in which case the last term on the left-hand side disappears (for example, see [51]). This leads to an equivalent formulation. To check this, it is enough to replace $v$ there by $v\beta_\delta$, where $\beta_\delta(t)=\min\{1,(T-t)/\delta\}$, so that $-\delta\,\partial_t\beta_\delta$ is the characteristic function of $(T-\delta,T)$. Passing to the limit as $\delta\to 0$ with the help of the dominated convergence theorem and the properties of $y$ and $v$ leads to the result.
Another definition of the weak solution is possible.
Hereafter $c>0$, $c_1>0$, etc. are independent of $y$ and the data. In the case $X=H^1(I,V^*)$ there even holds $y\in C^2(\bar I,V^*)$, together with a corresponding bound. 2. Let $(u,y_0,y_1)\in X\times V_2\times V$ with $X=L^2(I,V)$ or $H^1(I,H)$. Then the weak solution $y$ satisfies $y\in C(\bar I,V_2)\cap C^1(\bar I,V)\cap H^2(I,H)$ together with a corresponding bound. In the case $X=H^1(I,H)$ there even holds $y\in C^2(\bar I,H)$, again with a corresponding bound; moreover, $y$ satisfies the equation pointwise. Item 2 ensures the regularity of the weak solution for more regular data. For less regular data $(u,y_0,y_1)\in L^2(I,V^*)\times H\times V^*$ one can use other weak formulations. To state the first of them, we define the integration operator $(I_tv)(t):=\int_0^t v(s)\,ds$ and its adjoint $(I_t^*v)(t):=\int_t^T v(s)\,ds$. As in the case of Definition 3.1, it is sufficient to take $v(T)=0$ in (3.6), cf. Remark 1. We infer that there are other weak formulations of (2.1) for solutions $y\in C(\bar I,H)\cap C^1(\bar I,V^*)$. One can also use the concept of very weak solutions.
is called a very weak solution of (2.1).
Actually, these last two solution concepts are equivalent for the considered data spaces. Proof. First of all, we consider the auxiliary problem obtained by integrating (2.1) in $t$, for $(u,y_0,y_1)\in L^2(I,V^*)\times H\times V^*$. Thus we have $I_tu\in H^1(I,V^*)$. According to Proposition 2, problem (3.8) has a unique weak solution $\tilde y\in C(\bar I,V)\cap C^1(\bar I,H)$. Moreover, we set $y=\partial_t\tilde y$; then the weak formulation of (3.8) involving $\tilde y$ coincides with the weaker formulation of (2.1) involving $y$. Furthermore, there holds $y=\partial_t\tilde y\in C(\bar I,H)$. Now we take any $v\in C^\infty(\bar I,V_2)$ and test (3.6) with $-\partial_t v$ in the role of $v$. Next we rearrange a term on the left by integrating by parts in $x$ and $t$. Since formula (3.9) holds, by the density of $C^\infty(\bar I,V_2)$ in $L^2(I,V_2)\cap H^2(I,H)$ we find that $y$ is a very weak solution of (2.1). Now let $y\in C(\bar I,H)\cap C^1(\bar I,V^*)$ be a very weak solution of (2.1). Then we take any $v\in C^\infty(\bar I,V_2)$ and test (3.7) with $I_t^*v$. The resulting equation yields that $LI_ty=-\rho\,\partial_ty+I_tu+\rho y_1\in C(\bar I,V^*)$. Thus $I_ty\in C(\bar I,V)$ and we can transform a term on the left in (3.11) by replacing $v$ by $I_t^*v$ in (3.10). Then the density of $C^\infty(\bar I,V_2)$ in $L^2(I,V)\cap H^1(I,H)$ shows that $y$ is a weaker solution of (2.1).
Moreover, there is the concept of solutions by transposition.

3.2. Existence and regularity of the state. In this section we study the existence, uniqueness and regularity of the solution of the state equation for measure-valued source terms. We carry out the analysis for both control spaces and use the distinct properties of each space in order to show improved regularity of the state.
3.2.1. The control space $\mathcal M(\Omega,L^2(I))$. The space $\mathcal M(\Omega,L^2(I))$ is not as broad as $L^2_{w^*}(I,\mathcal M(\Omega))$ and contains no moving point sources, but it does contain the standing $\delta$-sources (1.2). Therefore we expect the state to have better regularity properties in this case, and we prove that $y\in C(\bar I,V)\cap C^1(\bar I,H)$. The proof is based on an a priori bound and a density argument. First we state the following density result.
Proof. We denote by $X$ the locally convex space $\mathcal M(\Omega,L^2(I))$ endowed with its weak-star topology and define the absolutely convex set $E$. Assume that (3.14) is wrong. Then there exists $u_0\in\mathcal M(\Omega,L^2(I))$ with $\|u_0\|_{\mathcal M(\Omega,L^2(I))}=1$ such that $u_0\notin\bar E$, where $\bar E$ is the closure of $E$ in $X$. Owing to a corollary of a theorem on the separation of convex sets [28, Ch. III], there exists a continuous linear functional separating $u_0$ from $\bar E$, which contradicts (3.15). Thus $u_0\in\bar E$. Since $C_0(\Omega,L^2(I))$ is separable, the weak-star topology on $E$ is metrizable. Therefore the closure of $E$ is equal to its sequential closure, see [8, Theorem 3.28, Corollary 3.30].
Note that clearly $C_c^\infty(\Omega,L^2(I))\subset L^2(I,V)$. We first prove the following crucial a priori bound. Lemma 3.8. Let $(u,y_0,y_1)\in L^1(\Omega,L^2(I))\times V\times H$ and let $y$ be the corresponding strong solution of problem (2.1). Then $y$ satisfies the a priori bound (3.16). Proof. We first recall the energy equality for problem (2.1). We also multiply the equation in (2.1) by $-2\kappa\,\partial_xy$ and integrate over $I$; integration by parts in $t$ yields a further equality. We define a function $P:=\rho\kappa\,(\partial_ty)^2$; taking the modulus and integrating over any $(a,b)\subset\Omega$ we derive a pointwise bound. Let $x_0\in\bar\Omega$ be such that $\|P\|_{C(\bar\Omega)}=P(x_0)$ holds, and now let $[a,b]\ni x_0$. Then the mean value theorem for integrals applies. By the above definitions, inserting (3.19) into (3.20) and using (3.21), we obtain (3.22). Owing to (3.17) we can write the corresponding bound; using it in (3.22), choosing $(a,b)$ small enough and inserting the last bound in (3.23), we finally obtain bound (3.16). Remark 3. Lemma 3.8 remains valid for $\rho,\kappa\in W^{1,1}(\Omega)$: owing to the absolute continuity of the Lebesgue integral, the arguments in (3.22) and below in the proof carry over.
Proof. 1. Let first $u=0$. According to Proposition 2 there exists a unique weak solution $y$ of (2.1) for any $(y_0,y_1)\in V\times H$, and it satisfies the corresponding bound. 2. Now it suffices to consider the case $y_0=y_1=0$ and $u\in L^2(I,V)$. Then according to Proposition 2 there exists a unique weak solution $y\in C(\bar I,V)\cap C^1(\bar I,H)$ of (2.1) and it satisfies bound (3.4). Moreover, it is also a weaker solution.
So it remains to prove the bound (3.26). To this end, according to Lemma 3.7 we approximate $u$ by functions $\{u_n\}\subset L^2(I,V)$ satisfying (3.14). The strong solution $y_n$ of (2.1) corresponding to $u=u_n$ satisfies a bound like (3.16), and in particular a uniform bound. Therefore there exist a subsequence of $\{y_n\}$ (not relabeled) and $\tilde y\in L^\infty(I,V)\cap W^{1,\infty}(I,H)$ such that $y_n$ converges to $\tilde y$ in the weak-star sense of $L^\infty(I,V)\cap W^{1,\infty}(I,H)$. This is sufficient to pass to the limit in the last bound and in (3.6) for $y=y_n$, $u=u_n$ and $v(T)=0$, see Remark 1. Thus $\tilde y$ both satisfies the bound and is a weaker solution of (2.1). Due to its uniqueness there holds $\tilde y=y$, and bound (3.26) is proved. 3. Let now $u\in\mathcal M(\Omega,L^2(I))$ and let $y$ be the corresponding weaker solution of (2.1), see Proposition 3. The space $\mathcal M(\Omega,H^1(I))$ is dense in $\mathcal M(\Omega,L^2(I))$, cf.
[34, Proposition 2.1]; this also holds since $\mathcal M(\Omega,L^2(I))$ is the projective closure of the tensor product of $\mathcal M(\Omega)$ and $L^2(I)$, see [46], and $H^1(I)$ is dense in $L^2(I)$. Then we pass to the limit in (3.1) for $y=y_n$, $u=u_n$ and $v(T)=0$ and see that $\hat y$ is a weak solution of (2.1). Due to the uniqueness of the weaker solution we get $\hat y=y$, and the proof is complete.

3.3. Some function spaces and embeddings. We set the basic spaces and introduce the interpolation spaces for non-integer $\lambda\in(-1,3)$ using the real $K_{\lambda,q}$-interpolation method of Banach spaces for $q=\infty$, see [3]. Recall that the value $q=\infty$ leads to the broadest intermediate spaces. Their explicit description in terms of the Nikolskii spaces or their subspaces is known, see [42,48,49]. In particular, $ow$ denotes the odd extension of $w$ with respect to $x=0$ and $x=L$ from $\Omega$ to $\tilde\Omega:=(-L,2L)$. Hereafter equalities of Banach spaces are understood up to the equivalence of their norms. It is well known that the space $\tilde H^{1/2,2}(\Omega)$ contains discontinuous but piecewise $C^1$ functions.
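For the reader's convenience, we recall the standard real interpolation construction behind the $K_{\lambda,q}$-method (see [3]): for a compatible couple $(B_0,B_1)$ one sets

```latex
K(t,w;B_0,B_1)=\inf_{w=w_0+w_1}\bigl(\|w_0\|_{B_0}+t\,\|w_1\|_{B_1}\bigr),
\qquad
\|w\|_{(B_0,B_1)_{\theta,\infty}}=\sup_{t>0}t^{-\theta}K(t,w;B_0,B_1),
```

and, for fixed $\theta$, the choice $q=\infty$ indeed gives the largest of the spaces $(B_0,B_1)_{\theta,q}$.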
In addition, let $D_x$ be the distributional derivative and, for a Banach space $B(\Omega)\subset L^1(\Omega)$, let $B_\perp(\Omega)$ denote the subspace of $W\in B(\Omega)$ with zero mean value over $\Omega$, equipped with the norm $\|w\|_{H^{-1/2,2}(\Omega)}=\|W\|_{H^{1/2,2}(\Omega)}$ for $w=D_xW$. Then $H^{(-1/2)}=H^{-1/2,2}(\Omega)$, see Item 3 of the proof of Lemma 3.10 below. (Actually, a quite similar result is valid for $H^{(\lambda)}$ for any $-1<\lambda<0$.) Note that, in particular, the Dirac measures belong to this space. Here $H^{1/2,0;2}(Q)$ is a particular anisotropic Nikolskii space (of order 1/2 in $x$ only) and $SHW^{1/2,1;2}(Q)$ is a particular space of functions having dominating mixed smoothness (of order 1/2 in $x$ in the Nikolskii sense and 1 in $t$ in the Sobolev sense). Note that all the spaces defined above and below in this subsection are Banach spaces.
The next technical lemma plays an essential role below.
The following equalities hold. The elements in $L^2(I,V^*)$ and $L^2(I,H)=L^2(Q)$ can be uniquely identified accordingly. Taking into account that one and the same operator establishes the one-to-one correspondence between the respective three spaces involved in the equalities (3.32) and (3.27), the latter one is valid too.
3. The following equalities hold, which are simpler 1D versions of (3.31)-(3.32); for example, see [3,48] and [49]. Notice that the following inequalities hold for any $W\in NBV(\Omega)$; the definition of the Riemann integral implies the latter one. Then for any $W\in NBV_\perp(\Omega)$ we get the inequalities in which the equality is valid due to the classical Pettis theorem [20, Theorem 8.15.2] (since $V^*$ is separable), and $w=D_xW$ with $W\in L^2_\perp(Q)$ and $\|W\|_{L^2(Q)}\le c\,\|w\|_{L^2_{w^*}(I,\mathcal M(\Omega))}$. Moreover, we have $W(t)\in NBV(\Omega)$ for a.e. $t\in I$. By applying (3.37) to $W(\cdot,t)$, omitting $\sup_{0<h<L}$ on the left, integrating the squared result over $I$ and taking back $\sup_{0<h<L}$ on the left, we obtain a bound that completes the proof of embedding (3.29).
according to embedding (3.29). Moreover, we define the forward difference quotients in time $\Delta_\tau$. Then for the same $t$ and $\tau$, owing to the first inequality in (3.36), we get the corresponding bound. Consequently there exists the derivative $\partial_tW=Z\in L^2(Q)$, and inequality (3.38) implies embedding (3.30).
We also set $V^0(Q)=L^2(Q)$ and define the anisotropic Sobolev subspaces accordingly.

Proof. The control-to-state operator $S$ is weak-star-to-strong sequentially continuous, i.e., if $\{u_n\}\subset\mathcal M_T$ and $u_n\overset{*}{\rightharpoonup}u$ in $\mathcal M_T$, then $Su_n\to Su$ in $Y$. The proof of this continuity property is similar to [34, Lemma 6.1] in the case of solutions by transposition resp. very weak solutions. The strong continuity follows from the compact embeddings and the well-known Aubin-Lions lemma. Then the direct method of the calculus of variations combined with the sequential Banach-Alaoglu theorem ($C_T$ is separable) can be applied to show the existence of an optimal control. Additionally, the control is unique since the control-to-state operator $S$ is injective and the data tracking functional is strictly convex.
Owing to Proposition 3 the optimal control $\bar u\in\mathcal M_T$ satisfies the corresponding inequalities. Hereafter $C>0$ depends on the norms of the data.
Next we discuss first order optimality conditions. We introduce the adjoint control-to-solution operator $S^\star\colon Y\to C(\bar I,V)\hookrightarrow C_T$, $(\varphi,p_1,p_0)\mapsto p$, where $p$ is the weak solution of (3.13). This operator is well defined and bounded according to Proposition 2.
We also need the operator $A^{-1}$. The next result provides the necessary and sufficient optimality condition for the optimal pair $(\bar p,\bar u)$.
The subgradient condition in Proposition 6 implies the following conditions.

Proposition 7.
Let $\bar u\in\mathcal M_T$ be the optimal control of (P) and $\bar p\in C_T$ the corresponding adjoint state. Then there holds $\|\bar p\|_{C_T}\le\alpha$. Proof. A detailed discussion of the proof of these results can be found in [12,33].
The regularity of the adjoint state $\bar p$ is now applied to show improved regularity of the optimal control $\bar u$.
We approximate the state variable $y$ by $y_\vartheta\in V_\vartheta:=V_\tau\otimes V_h\subset H^1(I,V)$ and additionally $\partial_ty(T)$ by $y^1_{Th}\in V_h$. For $(u,y_0,y_1)\in\mathcal M_T\times H\times V^*$ the discrete state equation has the following form, involving the indefinite symmetric bilinear form with the grid-independent parameter $\sigma$, cf. (3.1). This definition follows [51], but notice carefully that normally $y_\vartheta$ is uniquely defined by (5.1) with $v(T)=0$ and (5.2); to treat general $v$, we need $y^1_{Th}$. Remark 4. The second term on the right-hand side of (5.3) regularizes the Galerkin (i.e., projection) method with respect to the bilinear form (3.2). It is included to ensure unconditional stability for suitable values of $\sigma$. Moreover, this term is the error term of the compound trapezoidal rule applied to the calculation of the temporal integral in $(\kappa\,\partial_xy,\partial_xv)_{L^2(I\times\Omega)}$. Thus, in particular, for $\sigma=0$ in (5.3) this temporal integral is calculated using this rule, whereas for $\sigma=1/6$ it is not approximated.
Next we recall the inverse inequality for $V_h$, whose least constant satisfies $c_1h^{-1}\le\alpha_h\le c_2h^{-1}$ for a quasi-uniform grid. For $\sigma\le 1/4$ we need to state conditions linking the temporal and spatial grids to ensure stability of the numerical method.
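The growth $\alpha_h\sim c\,h^{-1}$ can be checked numerically. The sketch below is our own illustration, not taken from the paper: it assumes the unweighted inverse inequality $\|\partial_x v\|_{L^2(\Omega)}\le\alpha_h\|v\|_{L^2(\Omega)}$ on $V_h$, so that $\alpha_h^2$ is the largest eigenvalue of the generalized problem $L_hv=\lambda B_hv$ with the P1 stiffness and mass matrices on a uniform grid of $(0,1)$:

```python
import numpy as np

def inverse_constant(n):
    # alpha_h^2 = largest eigenvalue of L_h v = lambda B_h v, assembled for
    # P1 elements on a uniform grid of (0,1) with homogeneous Dirichlet BCs
    h = 1.0 / n
    m = n - 1  # number of interior nodes
    L = (2*np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h      # stiffness
    B = h * (4*np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1)) / 6  # mass
    lam = np.linalg.eigvals(np.linalg.solve(B, L)).real
    return float(np.sqrt(lam.max()))

for n in (16, 32, 64):
    # alpha_h * h approaches 2*sqrt(3) ~ 3.46, confirming alpha_h ~ c / h
    print(n, inverse_constant(n) / n)
```

For the known P1 spectrum the limiting value is $2\sqrt3\,h^{-1}$; variable coefficients $\rho,\kappa$ would only change the constants $c_1,c_2$.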

Assumption 1.
In what follows, let conditions (5.5) and (5.6) hold.

Remark 5. The parameters $\varepsilon_0$ and $\varepsilon_1$ can be chosen arbitrarily small, but then the constants in the stability and error estimates for our FEM may tend to infinity.

Remark 6.
As we will see in Section 11, the method is related to well-known time-stepping methods, in particular to the explicit leap-frog method for $\sigma=0$. In that case conditions (5.5) and (5.6) reduce to the CFL-type condition $\tau\alpha_h\le 2\sqrt{1-\varepsilon_0^2}$. For $\sigma=1/4$ the method is related to the Crank-Nicolson scheme and is unconditionally stable, but in a weaker norm than we need to derive our error estimates, so that we impose the very weak CFL-type condition $\tau\alpha_h\le 2/\varepsilon_1$.
Below, in proofs, we utilize certain auxiliary squared norms, which we need to bound by standard norms. Now we discuss some properties of $y^1_{Th}$ and $\partial_ty(T)$ that are essential below. Proposition 8. Let $(y_\vartheta,y^1_{Th})\in V_\vartheta\times V_h$ be the solution of (5.1)-(5.2). Then (5.8) holds. Proof. This is proved by testing (5.1) with time-constant functions. The non-local in time identity (5.8) is convenient for our error analysis but not for the implementation; for the latter issue see Section 11. Identities similar to (5.8) also hold on the continuous level.
The density of $V_2$ in $V$ implies (5.10).
Next we define the operator $A^{-1}$: with $w=A^{-1}f$, the norm in $V^*_\kappa$ and its discrete counterpart can be written accordingly. Then by the standard FEM error analysis [7] and operator interpolation theory we have the corresponding estimates. 6. Stability and error estimates for the discrete state equation. In this section we present error estimates for the state equation. We begin with an auxiliary result.

Once again we apply [51, Theorem 4.1] and first get the estimate
Combining it with (6.11), we derive the next estimate. In this proof we apply this estimate in the case $u=0$ only (but in the general case below). In the remaining case $y=0$, from [51, Theorem 4.1] we also get the higher order error estimate (6.14). Moreover, owing to Proposition 3 and bound (6.5) (both for $y=0$) we have the corresponding stability bound. The last bound and estimate (6.14) imply, by the $K_{1/2,\infty}$-method and equality (3.28), the estimate for any $u\in SHW^{-1/2,1;2}(Q)$. Applying inequality (6.12) we complete the proof.
7. Discrete control problem. First we introduce the discrete mapping $\hat S_\vartheta\colon(u,y_0,y_1)\mapsto(y_\vartheta,y_\vartheta(T),\rho y^1_{Th})$ and the discrete affine linear control-to-state mapping $S_\vartheta$. The former mapping is bounded and the latter one is finite dimensional; thus $S_\vartheta$ is a bounded operator. Then we consider the following semi-discrete optimal control problem with the squared semi-norm corresponding to the inner product. Using a similar argument as in the continuous case, it can be shown that (P ϑ ) has a solution $\bar u_\vartheta$ which is not unique in general, and due to the optimality, the stability bound (6.5) and property (5.16) (for $\lambda=-1$) one gets (7.1), cf. (4.1). Theorem 7.1. Let $z\in Y$, $y\in V\times H^{(-1/2)}$ and let $\bar u,\bar u_\vartheta\in\mathcal M_T$ be the optimal controls of problems (P) and (P ϑ ), respectively. Then $\bar u_\vartheta\overset{*}{\rightharpoonup}\bar u$ in $\mathcal M_T$ and $j_\vartheta(\bar u_\vartheta)\to j(\bar u)$ as $\vartheta\to 0$. Proof. Owing to (7.1) there exist a sequence $\{\vartheta_n\}$, $\vartheta_n\to 0$, and $u\in\mathcal M_T$ such that $\bar u_{\vartheta_n}\overset{*}{\rightharpoonup}u$ in $\mathcal M_T$ as $n\to\infty$. This implies the limit relation (7.2). To prove it, we write a chain of inequalities. The first term on the right in the last inequality converges to zero according to the error estimate (6.6). The convergence of the second term follows from the weak-star-to-strong continuity of $S\colon\mathcal M_T\to Y$ and the stability of $\pi^1_h$ in $V$. Finally, property (5.13) applied to $w$ implies the convergence of the last term.

Then (7.2) and the weak-star lower semicontinuity of $\|\cdot\|_{\mathcal M_T}$ yield that $u$ is optimal for (P).
Thus the uniqueness of $\bar u$ means that $u=\bar u$, and in addition it implies the convergence of the whole sequence $\bar u_\vartheta\overset{*}{\rightharpoonup}\bar u$ in $\mathcal M_T$ as $\vartheta\to 0$. Moreover, we have $j_\vartheta(\bar u_\vartheta)\to j(\bar u)$; this follows from (7.2). In the following, the directional derivative of a functional $g\colon\mathcal M_T\to\mathbb R$ at $u\in\mathcal M_T$ in the direction $\delta u\in\mathcal M_T$ is denoted by $Dg(u)\delta u$. In the case $Dg(u)\in\mathcal M_T^*$, $g$ is Gateaux differentiable at $u$. Moreover, we make use of the convex subdifferential of $\|\cdot\|_{\mathcal M_T}$. Let $\hat u\in\mathcal M_T$ and $p\in C_T$. Then there holds $p\in\partial\|\hat u\|_{\mathcal M_T}$ if and only if the two conditions below are satisfied. An element $\bar u_\vartheta\in\mathcal M_T$ is an optimal solution of (P ϑ ) if and only if the corresponding subgradient condition holds. We apply the Lagrange technique and define the Lagrange functional. Therefore the discrete optimality system consists of the discrete state equation (7.3), the discrete adjoint state equation (7.4) and the discrete variational inequality (7.5).

8. Stability and error estimates for the discrete adjoint state equation.
We define the general discrete adjoint state equation (8.1)-(8.2). Here $y$ is the solution to the state equation (2.1). Clearly identity (8.2) simply means the following. 1. If $y\in C(\bar I,H)\cap C^1(\bar I,V^*)$ and $z\in Y$, then the stability bound (8.3) holds. If $u\in L^2(I,V^*)$, $z\in Y$ and $y\in H\times V^*$, then the error estimate (8.4) holds. By applying also the counterpart of inequalities (6.9) we derive bound (8.3).
2. The counterpart of the error estimate (6.13) for the adjoint state equation and bound (8.7) give the next estimate. Owing to inequality (6.12) and Proposition 3 we obtain estimate (8.4).
4. First notice that the multiplicative inequality (8.9), Proposition 2 (2) (applied to the adjoint state problem) and property (5.16) imply another error estimate for the time interpolation. Then Proposition 2 (1) leads to a further bound. Next we derive the error estimate (8.14). According to [51, Theorem 5.3 and estimate (5.18)] and equality (3.39) for $\lambda=1$, together with Propositions 3 and 2, the following three estimates hold, with $p_\vartheta(T)=\pi^0_hA^{-1}q_T$ (for the same reason as above). Then applying the $K_{1/2,\infty}$-method to the last two estimates and using equality (3.28) we get an intermediate estimate. By combining this estimate with (8.15) we obtain (8.14).

Proposition 12.
Let $z\in Y$ and $y\in V\times V^*$. Then the following estimate holds.

PHILIP TRAUTMANN, BORIS VEXLER AND ALEXANDER ZLOTNIK
Proof. We recall that $\bar p=S^\star W(S\bar u-z)$ and $\bar p_\vartheta=S_\vartheta^\star W_h(S_\vartheta\bar u_\vartheta-z)$, and test the continuous subgradient condition (4.5) with the discrete optimal control $\bar u_\vartheta$ and the discrete subgradient condition (7.5) with the continuous optimal control $\bar u$. Then we subtract the first inequality from the second one. We define $\hat p_\vartheta:=S_\vartheta^\star W_h(S\bar u-z)$, insert it between $\bar p$ and $\bar p_\vartheta$ and obtain (9.2). For convenience we introduce the variables $(\hat y_\vartheta,\hat y_\vartheta(T),\rho\hat y^1_{Th})=\hat S_\vartheta\bar u$ and remark that the state equations for $(\bar y_\vartheta,\bar y^1_{Th})$ and $(\hat y_\vartheta,\hat y^1_{Th})$ have the same initial data. With their help we rewrite the second term on the right in (9.2), taking first the difference of the discrete state equations (7.3) and (5.1) (the latter for $(\hat y_\vartheta,\hat y^1_{Th})$) with $v=\bar p_\vartheta-\hat p_\vartheta$, next the difference of the discrete adjoint state equations (7.4) and (8.1)-(8.2) (the latter for $\hat p_\vartheta$) with $v=\bar y_\vartheta-\hat y_\vartheta$ and $\varphi=\bar y^1_{Th}-\hat y^1_{Th}$, and finally using (5.15). Further we easily obtain the next bound. Finally, by applying bounds (4.2) and (7.1) we derive (9.1).
This proposition is important since it allows one to derive estimates for $\bar y-\bar y_\vartheta$ with the help of the above error estimates for the discrete state and adjoint state equations. Theorem 9.1.

Remark 10.
Note that our error bounds could be improved if one could increase the power 1/2 in the last term on the right in (9.1). But this appears to be a difficult problem.
10. Error estimate for the cost functional. In this section we derive an error estimate for the cost functional. We first observe certain inequalities, which can be equivalently rewritten in a more convenient form. Therefore, to bound $|j(\bar u)-j_\vartheta(\bar u_\vartheta)|$ below, we apply the following result.
11. Time-stepping formulation. In this section we discuss the time-stepping formulation of the discrete state equation (5.1)-(5.2) and the discrete adjoint state equation (7.4). We introduce the piecewise-linear "hat" basis functions and expand $y_\vartheta$ for $(t,x)\in\bar I\times\bar\Omega$ with coefficients $y_{m,j}\in\mathbb R$, $y^h_m\in V_h$ and $y^\tau_j\in V_\tau$. We also define the forward and backward difference quotients and the averaging in time operator. We define the self-adjoint positive-definite operators $B_h$ and $L_h$ acting in $V_h$ (in other words, the mass and stiffness matrices).

PHILIP TRAUTMANN, BORIS VEXLER AND ALEXANDER ZLOTNIK
We recall the form of the discrete state (11.1).
The forward time-stepping is implemented as follows. The integral identities (5.1)–(5.2) are equivalent to the operator equations (11.2)–(11.3) supplemented with the initial condition $B_h y_{\vartheta,0} = (\rho y^0)_h$ (11.4), followed by the counterpart of (11.3) at time $T$ for $y^1_{Th}$. Next, the adjoint (backward) time-stepping is implemented in a similar manner: the integral identities (7.4) are equivalent to the corresponding operator equations, followed by the counterpart of (11.5) for $p^1_{0h}$. Remark 12. For $\sigma = 1/4$ the three-level time-stepping scheme (11.2)–(11.5) is closely related to the well-known two-level Crank–Nicolson method applied to the first-order-in-time system (see [51, Section 8] for details) as well as to the Petrov–Galerkin method described in [32]. After mass lumping, for $\sigma = 0$ our method becomes explicit and is related to the leap-frog method; moreover, for any $\sigma$ it is close to three-level finite-difference schemes with such a weight in time, e.g. see [47].
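To make the three-level step concrete, here is a minimal numpy sketch. It assumes a uniform grid, constant coefficients, homogeneous Dirichlet boundary conditions, and the standard $\sigma$-weighted scheme $B_h\,\tau^{-2}(y_{m+1} - 2y_m + y_{m-1}) + L_h(\sigma y_{m+1} + (1-2\sigma)y_m + \sigma y_{m-1}) = f_m$; the function names are hypothetical and this is an illustration, not the paper's exact formulation.

```python
import numpy as np

def assemble_1d_fem(N, length=np.pi, rho=1.0, kappa=1.0):
    """Assemble linear-FE mass (B_h) and stiffness (L_h) matrices on a
    uniform grid with N interior nodes and homogeneous Dirichlet BCs."""
    h = length / (N + 1)
    # P1 mass matrix: rho*h/6 * tridiag(1, 4, 1)
    B = rho * h / 6.0 * (np.diag(4.0 * np.ones(N))
                         + np.diag(np.ones(N - 1), 1)
                         + np.diag(np.ones(N - 1), -1))
    # P1 stiffness matrix: kappa/h * tridiag(-1, 2, -1)
    L = kappa / h * (np.diag(2.0 * np.ones(N))
                     + np.diag(-np.ones(N - 1), 1)
                     + np.diag(-np.ones(N - 1), -1))
    return B, L

def three_level_step(B, L, y_prev, y_cur, f_cur, tau, sigma=0.25):
    """One step of the sigma-weighted three-level scheme: solve
    (B/tau^2 + sigma*L) y_next = f + B(2 y_cur - y_prev)/tau^2
                                 - L((1-2 sigma) y_cur + sigma y_prev)."""
    A = B / tau**2 + sigma * L
    rhs = (f_cur + B @ (2.0 * y_cur - y_prev) / tau**2
           - L @ ((1.0 - 2.0 * sigma) * y_cur + sigma * y_prev))
    return np.linalg.solve(A, rhs)
```

For $\sigma = 1/4$ each step solves a system with the matrix $B_h/\tau^2 + \sigma L_h$; only after mass lumping and with $\sigma = 0$ would the step become explicit, matching the remark above.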
12. Control discretization. Solution process and $L^2(I\times\Omega)$-regularization. Now we discuss in more detail the solution of the semi-discrete optimization problem (P$_\vartheta$) in the case $\mathcal M_T = \mathcal M(\Omega, L^2(I))$. An important point is that we can seek its solution in the form (12.1). To show this, let $\pi^0_\tau$ be the projector in $L^2(I)$ onto $V_\tau$, and note the property it satisfies for $\eta \in L^2(I)$. We set $\Pi_h w = \sum_j \langle w, e^h_j \rangle_\Omega\, \delta_{x_j}$ and $\Pi_\vartheta = \pi^0_\tau \Pi_h$. The following identity holds, and consequently (as in [33, Lemma 3.11]) we have $S_\vartheta = S_\vartheta \circ \Pi_\vartheta$ as well as $\|\Pi_\vartheta u\|_{\mathcal M_T} \le \|u\|_{\mathcal M_T}$. Thus, for each solution $\tilde u_\vartheta$ of problem (P$_\vartheta$), the discrete control $\Pi_\vartheta \tilde u_\vartheta$ satisfies $j_\vartheta(\tilde u_\vartheta) = j_\vartheta(\Pi_\vartheta \tilde u_\vartheta)$. Therefore $\Pi_\vartheta \tilde u_\vartheta$ is also a solution of (P$_\vartheta$). This justifies solving the fully discrete problem in order to obtain a solution of (P$_\vartheta$). The direct solution of (12.1) by means of a generalized Newton-type method is challenging since a proper globalization strategy is needed, see [39]. We therefore propose a solution strategy based on an additional $L^2(I\times\Omega)$-regularization of (12.1) with a parameter $\gamma > 0$ and a continuation method. For large values of $\gamma$ the corresponding Newton-type method converges independently of the initial guess in numerical practice, so the continuation strategy can be seen as a simple globalization strategy.
On the continuous level we consider the following regularized problem (12.2). It is possible to formulate a semi-smooth Newton method for this problem on the continuous level, based on the following necessary and sufficient optimality condition (12.3), with $\bar p = S^* W_h(S\bar u_\gamma - z)$. Moreover, this semi-smooth Newton method converges superlinearly. Let $\bar u_\gamma$ and $\bar u$ be the unique solutions of (12.2) and (P), respectively. Then $\bar u_\gamma \rightharpoonup^* \bar u$ in $\mathcal M(\Omega, L^2(I))$, see [26, 33, 43]. This justifies the use of a continuation strategy in $\gamma$. The control discretization described above cannot be used for (12.2). Instead we propose to use discrete controls from $V_\vartheta$, i.e., of the form (12.4).
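The continuation strategy in $\gamma$ amounts to solving a sequence of regularized problems with decreasing $\gamma$, warm-starting each Newton solve from the previous solution. A minimal sketch; the inner solver `ssn_solve` is a hypothetical placeholder for the semi-smooth Newton method, and the schedule parameters are illustrative:

```python
def continuation_in_gamma(ssn_solve, u0, gamma0=1.0, gamma_min=1e-6, factor=0.1):
    """Solve a sequence of gamma-regularized problems, warm-starting each
    semi-smooth Newton solve from the previous solution.  For large gamma
    the Newton iteration is robust w.r.t. the initial guess; decreasing
    gamma follows the path of solutions u_gamma toward gamma -> 0."""
    u, gamma = u0, gamma0
    while gamma >= gamma_min:
        u = ssn_solve(u, gamma)   # warm start with the previous solution
        gamma *= factor           # geometric decrease of the regularization
    return u
```

The geometric decrease of $\gamma$ is one simple choice; any monotone schedule with a robust solve at the initial $\gamma$ plays the same globalizing role.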

The use of D allows us to derive the following optimality condition for (12.4):
$u^\gamma_{m,j} = -\frac{1}{\gamma}\,\max\{0,\, 1 - \alpha/\|\bar p_{\vartheta,\cdot,j}\|_{L^2(I)}\}\, \bar p_{\vartheta,m,j}$, (12.5)
for all $m$ and $j$, with $\bar p_\vartheta = S_\vartheta^* W_h(S_\vartheta \bar u_\vartheta - z)$, cf. (12.3). Based on (12.5) we can set up a semi-smooth Newton method. Since problem (12.4) is a discretization of (12.2), we can expect this method to behave mesh-independently. Let $\bar u^\gamma_\vartheta = \sum_{j=1}^{N-1} u_j(t)\, e^h_j$ be the solution of (12.4), from which we define $\tilde u^\gamma_\vartheta$. As $\gamma \to 0$ the control $\tilde u^\gamma_\vartheta$ tends to a solution of (12.1), justifying the use of this control discretization and the continuation strategy. For more details see [43].
13. Numerical results. In this section, we present results of numerical experiments. We consider two examples, both involving zero initial data $y^0 = y^1 = 0$, the control space $\mathcal M_T = \mathcal M(\Omega, L^2(I))$ and the tracking functional with a time-independent desired state $z$ given by a Gaussian centered at $x = \lambda$. We choose $\rho = 0.1$ and $\lambda$ as an irrational parameter. For sufficiently large $\alpha$ (here $\alpha = 0.1$), we expect the optimal control $\bar u$ to consist of a single point source positioned close to $\lambda$. If the Gaussian moved through the domain, a point-source-shaped $\bar u$ could not follow the center of the Gaussian, since $\mathcal M(\Omega, L^2(I))$ contains no moving point sources; the optimal control would rather consist of some additional fixed point sources. These would not lower the regularity of the state, whereas a moving point source can.
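The group-shrinkage structure of (12.5) can be sketched as follows; a minimal numpy illustration, assuming the discrete adjoint values are stored as a (time level) x (interior node) array and the discrete $L^2(I)$-norm of a column is approximated by the simple quadrature $\tau \sum_m |p_{m,j}|^2$ (the function name `shrink_control` is hypothetical):

```python
import numpy as np

def shrink_control(p, alpha, gamma, tau):
    """Columnwise group soft-thresholding as in (12.5):
    u[:, j] = -(1/gamma) * max(0, 1 - alpha/||p[:, j]||) * p[:, j],
    where ||.|| is a discrete L^2(I)-norm with time step tau.
    p has shape (time levels, interior nodes)."""
    # discrete L^2(I)-norm of each spatial column of the adjoint state
    norms = np.sqrt(tau * np.sum(p**2, axis=0))
    # avoid division by zero: a zero column gets factor max(0, 1 - 0) * 0 = 0
    factor = np.maximum(0.0, 1.0 - alpha / np.where(norms > 0, norms, np.inf))
    return -(1.0 / gamma) * factor[np.newaxis, :] * p
```

Columns with $\|\bar p_{\vartheta,\cdot,j}\|_{L^2(I)} \le \alpha$ are set exactly to zero, which is the mechanism promoting spatial sparsity of the discrete control.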
The domain $\bar\Omega$ and the time segment $\bar I$ are discretized by uniform grids with $N = 2^{r_h}$ and $M = 2^{r_\tau}$, where $r_\tau, r_h = 2, 3, \ldots, r_{\max}$ with $r_{\max} = 10$. The stability parameter is fixed to its lowest value $\sigma = 1/4$ ensuring unconditional stability of the time-stepping method. The discrete control problem is solved for $r_h = 2, 3, \ldots, r_{\max}$ with fixed $r_\tau = r_{\max}$, and then vice versa. The solution process has been described above in Section 12. Numerically, the desired state $z$ is replaced by $i_h z$ for simplicity; the corresponding error $O(h^2)$ is negligible. Since the optimal pairs $(\bar u, \bar y)$ are not known in our examples, we replace them by reference solutions $(\hat u, \hat y)$, taken as the approximate solutions on the finest grid level. Example 1. We first take the constant coefficients $\rho \equiv 1$ and $\kappa \equiv 1$ and set $\lambda = \pi/20$. We depict the reference solution $(\hat u, \hat y)$ in Figure 1.
As expected, the optimal control $\hat u$ consists of only one point source positioned in the vicinity of $\lambda$. Thus, the state $\hat y$ has a kink at this position. Due to reflections at the boundary, $\hat y$ also has kinks at other positions.
Next, we discuss the convergence results. In Figure 2, we see the convergence of $\|\bar y_\sigma - \hat y\|_{L^2(I\times\Omega)}$ (left) and of the objective functional (right) as $h$ is refined. The state error behaves mostly linearly, and the rate for the functional is close to two; as usual, the latter is approximately double the former, and fortunately both are better than the theoretical rates proved above. In Figure 3, we see similar results as $\tau$ is refined. The error of the functional stagnates at the last $\tau$-refinement, which is caused by a too coarse space grid. Nevertheless, we observe reduced rates for $\hat y$, much less than two, caused by its reduced regularity (kinks). Example 2. Now we take a variable, discontinuous coefficient $\kappa$ and set $\lambda = \pi/6$. Our analysis does not cover discontinuous coefficients, but they are of great importance in applications, for example in seismic tomography. A jump discontinuity in $\kappa$ translates into a jump in the wave speed, which can model two different material characteristics meeting at the point of discontinuity. Note that the point of discontinuity is a grid point on all grid levels. The reference solution $(\hat u, \hat y)$ is displayed in Figure 4. Once again $\hat u$ consists of one Dirac measure with a time-dependent intensity located in the vicinity of $\lambda = \pi/6$, and thus $\hat y$ has a kink at this position. Moreover, we can clearly see that at $x = 0.25$ the wave speed changes and the wave propagation becomes slower. In Figure 5 we observe that the error of the state variable converges linearly, whereas the error measured in the objective functional behaves quadratically. Finally, in Figure 6 we study the error behaviour under $\tau$-refinement and find similar rates of convergence. Somewhat surprisingly, the convergence behavior of the error is thus comparable to Example 1 with $\kappa \equiv 1$, which stimulates possible further studies.
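The rates reported above are the usual observed orders of convergence computed from errors on nested grids with the mesh size halved at each level; a small sketch (the error values in the example are synthetic, not those from the figures):

```python
import numpy as np

def observed_rates(errors):
    """Estimated orders of convergence log2(e_r / e_{r+1}) for errors
    measured on a sequence of grids whose mesh size is halved each level."""
    e = np.asarray(errors, dtype=float)
    return np.log2(e[:-1] / e[1:])
```

For a first-order state error the entries approach 1, while for the functional they approach roughly 2, matching the doubled-rate observation above.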