Multiwave tomography with reflectors: Landweber's iteration

We use the Landweber method for numerical simulations for the multiwave tomography problem with a reflecting boundary and compare it with the averaged time reversal method. We also analyze the rate of convergence and the dependence on the step size for the Landweber iterations on a Hilbert space.


Introduction
In this paper, we apply the well known Landweber iteration method for the multiwave tomography problem with reflectors. We also compare it to the Averaged Time Reversal (ATR) method developed by the authors in [33] leading to an inversion of a Neumann series. We also present a convergence analysis for the Landweber method for linear problems in Hilbert spaces. Even though the Landweber iteration method is widely used and well studied, we prove some results that we have not found in the literature. In numerical inversions, we always work in finite dimensional spaces, but there are good reasons to understand the method better in the continuous model first, and then to understand how well the discretization approximates the continuous model. As we demonstrate in this work, for the inverse problem we study involving wave propagation, the discrete model based on a second order finite difference scheme is always unstable regardless of whether the continuous one is stable or not. On the other hand, stable problems in our case behave in a stable way numerically, even though they are not (when discretized). We analyze the convergence rates of the iterations, their relation to the a priori stability of the problem, of the spectrum of the operator and on the spectral measure of the object we want to recover, and present numerical evidence of those results.
While there are more advanced computational inversion methods for general linear or non-linear inverse problems, including such with regularization, our goal is not to develop "the best one". Without real life data and understanding the real measurements challenges, that would be not so useful anyway. We would like to have methods that would allow us to test the mathematics of the problem: how the geometry affects the stability, for example; and partial data on the boundary. We are also interested in how the discretization and the numerical solver to model the PDE affects the problem, regardless of how it is solved after that.
The multiwave tomography problem with reflectors (the term "cavity" is often used, as well) we study models the following. We send one type of waves to an object we want to image, like a part of the human body like microwave radiation (in thermoacoustic tomography), laser rays (in photoacoustic tomography) or elastic waves. Those waves have a low resolution (the laser ray would diffuse inside) but they are partly absorbed by the cells, which cause them to emit ultrasound waves of high enough frequency allowing for a high resolution. The emitted sound waves are measured on the boundary, or on a part of it, and we want to recover the source; after that, we want to recover the absorption rate. We do not study the latter problem here.
This problems is well studied under the assumption that the waves travel freely away from the body, see, e.g., [13,21,22,23,38,30,31,26,32,7,8,5] and the references there. In some experimental setup, see [10], reflectors are placed around the body and the measurements are made on a part of it. The mathematical model of this is presented below in (2.1), first studied in [24,17,7,8]. In [33] and [25], see also [1], two different stable ways to solve the problem were proposed, both based on exponentially convergent Neumann series, expanding some ideas introduced first in [30]. The inversion in [1,25] works for the case of partial boundary data as well; while in this case, in [33], we know that it gives a parametrix for partial boundary data but there is no proof yet that in converges to a solution. We tested it in that case anyway, and we also do it in this paper. Recently, Acosta and Montalto [2] studied a model taking account attenuation, see also [18].
Landweber iterations for the whole space problem were done recently in [7,8] and compared to the sharp time reversal method in [30] implemented numerically in [26]. Another recent work is [5], where several different optimizations are compared, based on different regularization terms.

The model and known theoretical results
Let Ω ⊂ R n be a bounded open subset. Consider where ∂ ν is the normal derivative. One could consider a Riemannian metric g and then ∆ would be the Laplace-Beltrami operator, see [33] and section 6. The measurement operator is (2.2) L : f → u| (0,T )×∂Ω .
The goal is to inverse L and recover f . In the partial data problem, we are given where Γ ⊂ ∂Ω is a relatively open subset. We recall some uniqueness and stability results about this problem, see also [33]. Uniqueness follows from unique continuation [34]. The sharpest time T for uniqueness if given by (2.4) T 0 := max where dist(x, Γ) is the distance between a point and a set in the metric c −2 dx 2 . If T < T 0 , Lf determines f uniquely only at the points staying at distance less than T from Γ. If T > T 0 , there is a global uniqueness. A sharp stability condition follows from the Bardos, Lebeau and Rauch [6]. In view of the intended applications, we would assume that ∂Ω is strictly convex and that supp f is compactly supported in Ω but [6] covers the general case. (a) We say that the stability condition is satisfied if every broken unit speed geodesic γ(t) with γ(0) ∈Ω 0 has at least one common point with Γ for |t| < T , i.e., if γ(t) ∈ Γ for some t ∈ (−T, T ).
(b) We call the point (x, ξ) ∈ T * Ω 0 \ 0 a visible singularity if the unit speed geodesic γ through (x, ξ/|ξ|) has a common point with Γ for |t| < T . We call the ones for which γ never reachesΓ for |t| ≤ T invisible ones.
We denote by T 1 = T 1 (Γ, Ω) the least upper bound of all T for which the stability condition is satisfied.
As mentioned in [33], the visible and invisible singularities do not cover the while T * Ω 0 \ 0 since some rays may hit ∂Γ or t = T . The complement of those two sets is of measure zero however.
Visible singularities can be recovered stably, and non-visible cannot. We will not give formal definitions (see, e.g., [29]). In numerical computations, this means that visible singularities, like edges, etc., can in principle be recovered well (the would look "sharp") but the actual reconstruction may require some non-trivial efforts. Non-visible singularities cannot be recovered well and the typical way to deal with this is to recover some regularized version of them, like blurred edges.
If the stability condition is satisfied, them one has an H 1 → H 1 stability estimate [33]. The proof of this follows from the fact that L is an FIO of order 0 associated with a local diffeomorphism. That diffeomorphism can be described in the following way. For every (x, ξ) ∈ T * Ω 0 \ 0, we take the geodesic γ x,ξ/|ξ| (t), where |ξ| is the norm of the covector ξ at x in the metric c −2 dx 2 . We also identify vectors and covectors by the metric. When γ x,ξ/|ξ| hits ∂Ω, we take that point and the projection of the tangent vector on ∂Ω. That correspondence is a local diffeomorphism, and here it is essential that we have guaranteed that γ x,ξ/|ξ| hits ∂Ω transversely. By the stability condition, such a contact with ∂Ω exists. After that, we reflect the geodesic by the usual law of reflection, and look for another contact, etc. We consider both positive and negative t. That would give us a multi-valued map but locally (near the starting singularity and locally near the image), each branch is a diffeomorphism. Then L is a restriction of a first order elliptic FIO (say, the same operator but defined on a larger open set containing the closure of (0, T ) × Γ), to (0, T ) × Γ. In particular, that means that it is bounded from L 2 (Ω 0 ) to L 2 ((0, T ) × Γ). We consider L 2 (Ω 0 ) as subspace of L 2 (Ω). The stability condition guarantees that at least one of the contacts with ∂Ω is interior for that set, and this is enough for building a microlocal parametrix. This, together with the uniqueness result, implies the following theorem. We refer to [29] for a more general case, and to [6] for this particular one.
Theorem 2.1. Let ∂Ω be strictly convex and fix Ω 0 Ω, an open Γ ⊂ Ω and T > 0. Then if the stability condition is satisfied, i.e., if T > T 1 , then there exist 0 < µ ≤ C so that In fact, in the reconstructions, we work in the topologically equivalent space L 2 (Ω 0 , c −2 dx). This is the measure appearing in the energy Ω (|∇f 1 | 2 + |f 2 |c −2 ) dx; and our numerical simulations show that the error is smaller if we use that measure in the definition of L * later.
If there are invisible singularities, then the estimate above cannot hold with µ > 0 and it fails even if we replace the L 2 norm by weaker Sobolev ones.
We review the Averaged Time Reversal method in section 6. One of the essential differences is that there, the energy space is (topologically equivalent to) H 1 (Ω) while in the Landweber iterations we choose to work in L 2 (Ω). We could do Landweber iterations considering L as an operator from H 1 0 (Ω) to H 1 ((0, T ) × Γ), as well or even as an operator from H 1 0 (Ω) to L 2 ((0, T ) × Γ). In fact, we tried the latter numerically. Then L is a smoothing operator of degree 1 (an elliptic FIO of order −1 associated with a local diffeomorphism), and this makes the problem always unstable in those spaces. The reconstructions we get are very blurred as one would expect.
Finally, if one wants to achieve some regularizing effect with a regularizing parameter, similar to Tikhonov regularizations, one could introduce the following H 1 norm where ε > 0 is a parameter. Then we think of L as the operator L : H 1 ε (Ω) → L 2 ((0, T ) × ∂Ω), restricted to functions supported in Ω 0 . This replaces L * in the inversions by (1 − εc 2 ∆ D ) −1 L * , where ∆ D is the Dirichlet Laplacian in Ω. Another modification would be to keep L 2 (Ω, c −2 dx) but to replace L 2 ((0, T ) × ∂Ω) by a certain H −1 space with a parameter. This would put a low pass frequency filter to the right of L * rather than to the left. One such reconstruction is shown in Figure 12.
The plan of this work is the following. We review the Landweber iterations in section 3 and study the convergence or the divergence, the dependence on the step γ, etc. In section 4, we compute the adjoint of the measurement operator L with full or partial data. We present numerical examples in section 5. A review of the averaged time reversal method is included in section 6.
Acknowledgments. The authors thank Francois Monard and Jianliang Qian for their advice during the preparation of this work. Reference [41] was pointed out to the authors by Francois Monard.

Landweber iterations
As explained in the Introduction, one of the goals of this paper is to analyze further the Landweber iterations method for linear inverse problems, its relation to the stability or the instability of the problem and to the power spectrum of the function we want to recover (defined as the differential dµ f (λ) of P λ f 2 , where P λ is the spectral projection of L * L). We also study the effect of the choice of the step γ in the iterations and how they are influenced by noise or data not in the range. We do not study regularized versions of the iterations for severely ill posed linear problems, or non-linear problems. We do not study stopping criteria and their effect on the error, either. For analysis of the linear Landweber method, see, e.g., [36,37,15,9,20] and the references there. For the non-linear one, see, e.g., [16]. Some of the analysis below can be found in the literature, for examaple in [15], but we did not find all of it in the literature.
3.1. Landweber Iterations. Let L : H 1 → H 2 be a bounded operator where H 1,2 are Hilbert spaces. We want to solve the linear inverse problem of inverting L. Assume no noise first. Then we write The operator K := I − γL * L is self-adjoint with spectrum in the interval where µ ≥ 0 is any of the stability constants in the stability estimate which might be zero if there is no stability or even injectivity. We choose µ to be the largest number with that property, i.e., µ 2 is the bottom of the (closed) spectrum of L * L. If L * L has a discrete spectrum, then µ would be the smallest singular value. In other words, L and µ are the sharp constants for which Then (3.3) is the smallest closed interval containing the spectrum spec(K) of K but the spectrum itself could have gaps. By (3.3), we see that K is a strict contraction ( K < 1) if and only if i.e., In other words, γ has to be small enough and the problem has to be stable. Then f can be reconstructed by the following Neumann expansion which converges uniformly and exponentially because This implies the following scheme.
Landweber Iterations. Set f 0 =0, The scheme is used even when the data m is "noisy", i.e., when m ∈ Ran L, or when there is no uniqueness. Sometimes, the following criterion is used. Let C > 1 be a prescribed constant. The iteration is terminated when the condition Lf k − m H 2 < Cδ is violated for the first time, where δ is an a priori bound of the noise level, i.e., a constant for which m − Lf ≤ δ, where f is the unknown function we want to reconstruct and we think of m as a noisy measurement. Then we take f k for such a k as an approximation of f . If there is stability, i.e., when µ > 0, this implies f k − f ≤ Cδ/µ.
It is well known that Landweber Iterations is just a gradient decent method for the functional This makes some of the properties we describe below more geometric; for example if m ∈ Ran L, a minimizer, if exists, would be the same as a minimizer for m replaced by m projected on Ran L.
Let us denote K by K γ now. To estimate the rate of convergence, we need to estimate how close K γ is to 1. By (3.3) and (3.6), In Figure 1, we plot log 10 K for µ 2 = 1, L 2 = 20 for a range of γ's. This gives us the asymptotic rate of convergence in the uniform norm, since the error if we use the N -th partial sum is (I − K ) −1 K N , which log 10 is N log 10 K − log 10 (I − K ). In Figure 3, we present a numerical evidence of that behavior. Note that in the analysis below, with exact data, the factor (I − K ) −1 is actually removed, see (3.18), for example. We have Since the expressions in the parentheses are respectively an increasing and a decreasing function of γ, the maximum is achieved when they are equal, i.e., for γ equal to which is well known, see e.g., [9, pp.66-67]. Then the series is dominated by the geometric series C (1 − ν * ) j which converges exponentially, and therefore, (3.8) converges also uniformly (in the operator norm). When 0 < ν * 1 however, the speed of convergence decreases and the sensitivity to computational errors increases.
For general γ ∈ (0, 2/ L 2 ), the Landweber iterations are dominated by C (1 − ν) j (and this is sharp), see (3.11). Note that ν as a function of γ is piecewise linear with a maximum at γ * , indeed but since the slope of the second factor in (3.11) is larger (and typically, much larger) by absolute value (which is L 2 ) than the slope of the first one, µ 2 , increasing γ form some very small value to γ * would improve the convergence gradually but after passing γ * , the convergence will deteriorate very fast. We observed this behavior in our tests. Note that the ratio of the two slopes is the condition number L 2 /µ 2 of L * L.
If µ = 0, then K = 1. We can use the fact however that K j is applied to L * m only. Then f can be reconstructed by the following formula, see (3.8), (with Lf = m given). A zero eigenvalue of L * L would actually create a series of zeros, that still converges. Below, we actually allow L to have a non-trivial kernel Ker L = Ker L * L and study f 's modulo that kernel. The proof is a standard application of functional analysis, see [29], for example.
Definition 3.1. We will call the problem stable, if either of the conditions above holds.
We always assume that µ is the maximal constant (which exists) for which Proposition 3.1(b) holds, when it does. Such a condition is satisfied, if L * L is an elliptic ΨDO, see for example [29].
When the problem is stable, is invertible with a norm of the inverse equal to 1/µ. A more detailed analysis is done by the spectral theorem below. Let be the N -the partial sum in (3.14).
Modifications. If f belongs to a subspace of H 1 , we can replace L by LΠ * , where Π is the orthogonal projection to that space (ΠΠ * = I) and then L * L is replaced by ΠL * LΠ * and still apply the scheme. Also, for every bounded non-negative operator P in H 2 , we can replace L * L by L * P L. If H 2 is an L 2 space with respect to some measure, a different choice of the measure would insert such a P there. In that case, P can be a multiplication by a non-negative function, which can be used to satisfy compatibility conditions in the case we study, for example.

Exact data.
Theorem 3.1 (Inversion with exact data). Assume (a) Then for every f ∈ H 1 , the series (3.14) converges to the orthogonal projection f 1 of f to (Ker L) ⊥ .
(b) The series (3.14) converges uniformly (i.e., in the operator norm) if and only if the problem is stable. In that case, it converges to f 1 exponentially fast in the operator norm; more precisely, with ν as in (3.11), Remark 3.1. As we showed above, ν is maximized for γ = γ * . Then ν = ν * , and λ dP λ be the spectral decomposition of |L| := (L * L) 1/2 , where dP λ is the spectral measure supported on spec(|L|) = spec(L * L), which is contained in [0, L ]. In that representation, the series (3.14) consists of the terms (1 − γλ 2 ) j γλ 2 .
Recall that if |L| is an N × N matrix, then P λ = λ j ≤λ P λ j , where λ j and P λ j are the eigenvalues of |L| and the projections to the corresponding eigenspaces, respectively. Then dP λ = λ j ≤λ P λ j δ(λ − λ j ) dλ, and for any f ∈ R N , the spectral measure dµ f of f we use below is given by where c j are the scalar projections (the Fourier coefficients) of f w.r.t. to an orthonormal basis of eigenfunctions corresponding to λ j .
Using the identity Clearly, S N f 0 = 0. For S N f 1 we have the same formula as above with dP λ replaced by dP 1 λ , where the latter is the restriction of dP λ to the orthogonal complement of Ker |L|. On that space, λ = 0 is not in the pure point spectrum spec pp (|L|) = spec pp (L * L) but may still be in the spectrum. We refer to [27] for the definition and the basic properties of the pure point and continuous spectra of bounded self-adjoint operators. Then is the spectral measure associated with f 1 . The integrand converges to g(x) = 0 for x = 0 and g(0) = 1, as N → ∞. It is bounded by the integrable function 1, see (3.6). By the Lebesgue dominated convergence theorem, the limit in (3.18), as N → ∞, is spec(|L|) g dµ f 1 and the latter is zero because µ f 1 ({0}) = 0. We will provide a more direct proof of the latter revealing better the nature of the problem. We claim that (3.19) lim Indeed, as δ 0, the integral above (which is just the measure of spec(|L|) ∩ [0, δ) with respect to µ f 1 ) is a monotonically decreasing non-negative function of δ and therefore, it has a limit which is its infimum. By the properties of Borel measures, if the limit is positive, that means that {0} has a positive measure. Therefore, 0 would be in the pure point spectrum which we eliminated by restricting L on the orthogonal complement of Ker(|L|). Therefore, the limit in (3.19) is zero, as claimed.
This argument shows that the rate of convergence depends heavily on the spectral density (measure) of f near λ = 0 for λ > 0 but not for λ = 0.
To prove (b), notice first that the norm of S N − I on (Ker L) ⊥ is given by the maximum of For uniform convergence, we need that maximum to be strictly less than 1, which proves the first part of (b). When this happens, 3.3. Noisy data. Let the measurement Lf be "noisy", i.e., we are given m ∈ H 2 not in the range of L and want to "solve" Lf = m. A practical solution of such problem is to find some kind of approximation of the actual f assuming that m is close in some sense to the range. We will study what happens if we solve (3.2) with a Neumann series without assuming (3.1).
The convergence is uniform with a rate (see (3.17) If the problem is not stable, then there is no uniform convergence. Moreover, there is no even strong convergence. The set of m for which (3.8) is unbounded (and hence, diverges) is residual and in particular, dense, i.e., its complement is of first Baire category.
Proof. Recall that every bounded operator L has a polar decomposition of the type L = U |L|, see [28,VI.4], true also for bounded operators between different Hilbert spaces, where U is a partial isometry. We have Ker U = Ker L, Ran U = Ran L, and U is an isometry on (Ker L) ⊥ . We can write (3.2) as For U * , we have Ker U * = (Ran L) ⊥ and Ran U * = (Ker L) ⊥ . Set m * = U * m. We have m * ⊥ Ker L. Then (3.21) reduces to The function g N is smooth even near λ = 0. Therefore, in the spectral representation,S N = g N (λ).
To prove (a), assume that the problem is stable. Then L * L is invertible on (Ker L) ⊥ by the spectral theorem. The function 1/λ is a bounded function on the spectrum, and it is the spectral representation of the inverse of |L| restricted to (Ker L) ⊥ that we will denote temporarily by |L 1 | −1 . Notice that |L 1 | −1 U * = L −1 1 , where L 1 : (Ker L) ⊥ → Ran L (the latter is closed) is the restriction of L as in (3.15). Therefore, denoting by m 1 the orthogonal projection of m to (Ker L) ⊥ , we have The convergence then follows as in Theorem 3.1 because then the integrand is uniformly bounded on the spectrum. Assume now that the problem is unstable. Assume thatS N m is bounded (which would be true if it converges) for every m * ∈ (Ker L) ⊥ with a bound that may depend on m * . By the uniform boundedness theorem, the norms S N must be uniformly bounded, as well. Therefore, by [27, On the other hand, (1)), as N → ∞. This contradicts (3.24) since in the non stable case, spec(|L|) contains a sequence of positive numbers converging to 0; and in particular proves lack of uniform convergence. We also proved that there cannot be strong convergence in this case.
The proof is then completed by noticing that the set of m for which theS N m is unbounded is known to be residual since S N is unbounded. Residual sets are called sometimes generic.
We give a more direct proof of the last statement, revealing the structure of those m for which the iterations diverge. We will show in particular, that the span of such m is infinitely dimensional. Notice first that spec(L * L) contains a sequence λ n 0. Let I k = (λ k , λ k+1 ]. By the standard proof that N × N is countable, where N = {1, 2, . . . } (or by its conclusion), we can label I k and λ k as I l,m and λ i,m so that the map k → (l, m) is a bijection. For each l, λ l,m → 0, as m → ∞ because the latter is a subsequence of λ k with rearranged terms. Then for every l, on the range H l of the spectral projection corresponding to ∪ k I l,k , the series (3.8) diverges for some m so that U * m ∈ H l . On the other hand, U is unitary from (Ker L) ⊥ to Ran L; therefore, U * : Ran L → (Ker L) ⊥ is unitary, as well. Since H l ⊂ (Ker L) ⊥ , U * m ∈ H l identifies m ∈ Ran L uniquely by applying the inverse of U * . This proves the existence of m = m l which makes (3.8) divergent, and those m l 's are mutually orthogonal. They are mapped to a linearly independent set under U * .
Note that we actually proved something more: for every l, "almost all" elements m in H l force (3.8) to diverge: only those with finitely many non-zero projections do not. On the other hand, there is an infinitely dimensional space of m for which we have strong convergence, for example all m with a spectral measure dµ m * such that 1/λ is integrable.
It is worth noting that in the stable case (a), the Landweber solution f = L −1 1 m 1 can also be described in the following way. The noisy data is first projected to Ran L (which is closed). Denote that projection by m 1 . Then the exact solution is found as in the case of exact data: the solution of Lf = m 1 orthogonal to Ker L. This is called sometimes the Moore-Penrose inverse. If there is no stability, m is projected to the closure of Ran L, because in m * = U * m the result would be the same if we replace m by m 1 . Then we take m * = U * m 1 . Then (3.21) is equivalent to |L| 2 f = |L|m * . This is an equation in (Ker L) ⊥ , equivalent there to |L|f = m * . We have m * ∈ (Ker |L|) ⊥ and the later is the closure of Ran |L| but it is guaranteed to be in range if and only if that range is closed, i.e., if there is stability.

Remark 3.4.
If H 1 is finitely dimensional, then the problem is always stable. If we discretize a problem in an infinitely dimensional space, we get a finitely dimensional one. One may think that there is no convergence problem then. There are two potential problems with this argument however. First, the gap could be very small causing stability problems in numerical inversions. Second, the discretization could be a very poor approximation of the continuous problem near the bottom of the spectrum causing a very small gap (optimal µ) even though the continuous model may have a not such a small one. In our simulations, for example, we get eigenvalues of order 10 −15 while the "significant ones" are of order 1.
Remark 3.5. The proof of Theorem 3.2 reveals that the rate of convergence or possible convergence with noisy data depends on the spectral measure of m * , which we defined as the orthogonal projection of U * m onto (Ker L) ⊥ .

Choosing a different initial guess.
We may think f 0 in (3.9) as an initial guess. We may also think of f 1 = γL * Lm as such. When K < 1, we have a sequence generated by a contraction map, and this sequence would converge regardless of the initial f 0 . We then choose f 0 not necessarily zero and set, as before, Then the limit f , if exits, must satisfy which is equivalent to L * Lf = L * m, which is the starting point in Section 3.1, see (3.2). If f 0 is already close enough to f , we would expect the iterations to converge faster. In fact, we could gain speed if f − f 0 has a better power spectrum w.r.t. L * L, i.e., has a smaller spectral measure there. It is easy to show that Therefore, the Landweber iterations in this case correspond to the ones we studied so far for the equation i.e., f is replaced with f − f 0 and m is replaced by m − Lf 0 . Even though the new equation is equivalent to the old one, if f − f 0 is small in a certain sense or it has a lower spectral measure near the origin, the series may converge faster.

Application to multiwave tomography with reflectors
We apply the abstract theory in the previous section to multiwave tomography with reflectors. To have L as a bounded L 2 → L 2 map, we will restrict supp f to a compact subset of Ω, and assume convexity. The reason for that is to exclude possible singularities issued from supp f hitting ∂Ω tangentially. We also refer to [35] for a discussion of the mapping properties of a similar operator (the solution of the wave equation without reflections, i.e., propagating in R n ) to R × ∂Ω in case of tangential rays.
First we derive the expression for We know that L * must exist as a bounded operator in the spaces indicated above, and C ∞ 0 ((0, T ) × ∂Ω) is dense in L 2 ((0, T ) × ∂Ω); therefore L * restricted to the latter space defines L * uniquely.  L : where v is the solution of the following initial boundary value problem . Note that (4.3) is uniquely solvable, and one way to construct a solution is to take a smooth v 0 satisfying the boundary condition in (4.3), supported in (0, T ) ×Ω and set v = v 0 + v 1 , where v 1 = − v 0 , and v 1 has zero Cauchy data on t = T and satisfies the homogeneous Neumann boundary condition. Here, is the wave operator and the source problem for v 1 can be solved by Duhamel's principle using the well posedness of the boundary value problem for the Neumann problem with Cauchy data in H 1 (Ω) × L 2 (Ω), see e.g., [14] or [33] in this context. In [33], in the energy space, H 1 is replaced by the same space modulo constants (the kernel of the Neumann realization of −c 2 ∆) but for Cauchy data (C, 0) with C constant, the solution is trivially u = C.
Let v be a solution of (4.3), and let u solve (2.1). Then For the first term in (4.4), integration by parts in t twice gives where we have used that u| t=0 = f , ∂ t u| t=0 = 0 and v| t=T = ∂ t v| t=T = 0. For the second term in (4.4), applying Green's formula yields Notice that the last term on the right-hand side actually vanishes as ∂ ν u = 0. Inserting these expressions into (4.4) and observing that ∂ 2 t u = c 2 (x)∆u we have which justifies (4.2). Since C ∞ 0 (Ω 0 ) and C ∞ 0 ((0, , T )×∂Ω) are dense in L 2 (Ω 0 , c −2 dx) and L 2 ((0, T )× ∂Ω), respectively, the closure of L is indeed the adjoint operator of L.
Remark 4.1. The proof of the theorem reveals the following. Note first that if g is smooth but not C ∞ 0 (more specifically, not vanishing at t = T ), the compatibility condition for the first derivatives of v is violated at t = T and x ∈ ∂Ω since v| t=T = 0 implies ∂ ν v| t=T = 0 as well, which requires g| t=T = 0. In that case, there is no C 1 solution in the closed cylinder; and for a C 2 solution, we need to satisfy even one more compatibility condition: ∂ ν g| t=T = 0. We are interested in L * L, and we cannot hope to have g| t=T = g t | t=T = 0 for g = Lf . Therefore, when computing L * Lf , we (almost) always must compute a weak solution of (4.3) and therefore deal with g violating the compatibility conditions.
The way to overcome that difficulty is to notice that we are constructing an L 2 → L 2 operator, see (4.1). Let χ ε be the characteristic function of [0, T − ε], with a fixed ε > 0. Then χ ε g → g in L 2 for any g ∈ L 2 but the rate of convergence is g-dependent (for the multiplication operator, we have χ ε → Id strongly only). Therefore, if we replace g by χ ε g and solve the resulting problem (the jump of χ ε g at t = T − ε is not a problem), we would get a good approximation for 0 < ε 1. On the other hand, in the iterations above, we apply L * to g = Lf for a sequence of g's. One could use the results in [6] to investigate whether χ ε g → g uniformly in g for g in the range of L for T > 0 satisfying the stability condition but we will not do this. Another way is just to replace L by χ ε L, and therefore, L * will also be replaced by L * χ ε . In fact, we could put any real valued L ∞ function there, as long as it has a positive lower bound in a set on (0, T ) × ∂Ω which set set satisfy the stability condition. We compare the Averaged Time Reversal method [33] with the Landweber method. We use a standard second order finite difference scheme on an N ×N grid, and we restrict the image to a smaller square by multiplying by zero in the frame staying at 3% from the border. This does not affect the SL phantom but it affects L * (and L) in the iterations; we have χL * Lχ instead of L * L in the iterations, where χ is the cutoff function. This does not eliminate completely geodesics through supp f tangent to the boundary if c is not constant but by [6], the stability is still preserved.
The errors are presented in Figure 3. The thinner curves in the top part are the errors on a log 10 scale for different choices of the parameter γ. Since we use the step dt = dx/( √ 2 max(c)), dx = dy = 2/(N − 1) in the numerical implementation, this effectively rescales the t-axis by the coefficient 1.5 √ 2, and therefore, changes the L 2 norm on [0, T ] × ∂Ω. The choice of γ depends on that factor, as well.
We see in Figure 3 that the optimal γ seems to be γ ≈ 0.055, based on 50 iterations. The convergence is highly sensitive to increasing γ beyond this level. On the other hand, decreasing γ reduces the accuracy but this happens much slower. Choosing the optimal γ is based on trials and errors in this case. In Figure 3, we show the errors vs. γ for 10, 30 and 50 iterations. The errors are consistent with the theoretical analysis in Section 3.2. The error curves for each fixed number of steps (10, 30 or 50) have shapes similar to that in Figure 1. Note that to compare Figure 3, and Figure 1, one has to divide the values in the former graphs by the number of the steps.
The theoretical analysis of the uniform convergence rate illustrated in Figure 1, allows us to estimate µ 2 roughly. The slopes of the curves in Figure 3 up to the optimal γ, normalized to one step, give a slope in the range [1,2]. Note that it is closer to 2 for N = 10, and close to 1 for N = 50. Converting to natural log, we get a rough estimate of µ 2 in [2.3, 4.6]. Estimating L is not practical from that graph.
We computed the matrix L of L in its discrete representation for N = 101, and Ω 0 being an interior 94 × 94 square. We halved N because computing L and its spectrum is very expensive for N = 201. The errors in that case are similar to what we presented for N = 101. We computed L on that grid. The eigenvalues of L * L are plotted in Figure 4, the smooth curve with the vertical axis on the left.
We computed the numerical representation L * w of L * using the wave solver to solve (4.2) as well. The actual numerical inversion uses L * w L, not L * L. The spectral representation looks a bit different then but the locations of the lower part of the spectrum and the power spectrum of the SL phantom remain almost the same. Some of the eigenvalues have imaginary parts small relative to their real ones.
The largest computed eigenvalue of L * L is λ 8836 ≈ 50.5, and the smallest 30 are of order 10 −15 , which can be thought of as 0 computed with a double numerical precision. We also get λ 1000 ≈ 0.036. So for all practical purposes, L * w L has a kernel of dimension at least 1, 000! This seems to contradict the fact that L is stable. The computed eigenfunctions of L * L (not L * w L) are shown in Figure 5. The eigenfunction problem is unstable but for our purposes, it is enough to know that there is a certain approximate orthonormal (enough) base so that on the first, say 1, 000 eigenfunctions, L has a norm of several orders of magnitude smaller than L(SL) / SL , where SL is the SL phantom. We verified numerically that L acting on those computed eigenfunctions have small norms, as the small eigenvalues suggest, and that the basis is orthonormal to a good precision. We also generated linear combinations of the first 1, 000 eigenfunctions with random coefficients and computed the norm of L on them. The relative norm of the SL phantom on that space is around 0.0016, which means that the total spectral measure there is the square of that, i.e., approximately 2.42 × 10 −6 . Also, using so generated phantoms, the iterations break. It turns out that the eigenfunctions corresponding to the extremely small eigenvalues have sharp jumps from minimal to maximal values from cell to cell like a chess board. The plotted 1, 000-th eigenfunction has mostly high frequency content as well, see Figure 5. The 7, 500-th one has a much lower high frequency content. This suggests that the extremely small eigenvalues are due to the poor accuracy of the wave equation solver for frequencies close to Nyquist, see also section 5.4. This can also be confirmed by using semiclassical methods. The operator L * L − z for 0 < z 1 is elliptic because z is separated from the range of the (elliptic) principal symbol. Therefore, possible eigenfunctions with eigenvalues z should be smooth and should also have low oscillations. Therefore, eigenfunction #1 eigenfunction #1,000 eigenfunction #7,500 Figure 5. The eigenfunctions of L * L corresponding to λ 1 , λ 1,000 and λ 7,500 the spectrum of the discrete L * w L contains small eigenvalues not approximating the spectrum of L * L.
Why do the iterations still work, and we are getting a lower bound of the spectrum µ 2 in [2.3, 4.6] (based on one f , indeed but we tried a few other ones)? It turns out that our f (the SL phantom) does not have much spectral content in the bottom of the spectrum. In Figure 4, we plot the computed squares of the Fourier coefficients (the power spectrum) of the SL phantom with respect to the eigenfunction basis of L * w L, see the rough curve there and the vertical axis on the right. It turns out that the SL phantom has very small Fourier coefficients until, say, λ 3,000 ≈ 4.26. We can take this as an effective value for µ 2 since the analysis above says that the bottom on the spectrum should be taken on the support of the spectral measure of f , see (3.18). This correlates well with our estimate µ 2 ∈ [2.3, 4.6] based on Figure 3. In the top of the spectrum, there is a similar but a smaller gap with the highest significant eigenvalue λ 8,718 ≈ 31.84. Therefore, L 2 ≈ 31.84 on the support of the spectral measure of the SL phantom is a good estimate. That gives us, roughly speaking, γ * = 0.055, which is close to our numerical results, see Figure 3. We recall that the computed errors are based on inversion with L * w L (even though we do not compute the matrices L * w and L in the inversion) while the spectral representation is done based on L * L.
It is well known that finite difference schemes for differential operators require a priori boundedness of the higher order derivatives, therefore the conclusions above should not be surprising. High discrete frequencies propagate with slower speeds and the numerical solution is a good approximation only at low frequencies, see e.g., [41,3]. Also, the discretization of the phantom matters. We render the SL phantom on a higher dimensional grid first, then resample it to the grid we work with. This way, we have a better sampling of the continuous function the SL phantom represents. This does not affect the rates of convergence much, and without that, we still have a very low spectral content for very small eigenvalues, even though it is not that low. The original and the reconstructions however do not show the typical aliasing artifacts like staircase edges, etc., and a still visibly "sharp" enough.

5.1.2.
Data not in the range. In the example above, we add the data on one of the sides to another one. The difference, and hence the new data is not in the range by unique continuation. The reconstruction shows obvious artifacts as one would expect. The convergence behavior as a function of γ however, does not change in a radical way, see Figure 6. In particular, the iterations start  Figure 7. In the continuous model, see Theorem 3.2, the iterations would converge at a rate depending on the spectral measure of the noise; which could be very slow. They converge to f +f as above. In the numerical case however, the problem is actually unstable, as explained above. The data now has a small but not negligible spectral part in the lower part of the spectrum due to the nature of the noise. This is responsible for a very slow divergence. Note that when the "noise" has a different spectral character, as in Figure 6, we are closer to convergence.  The ATR method with this example is more sensitive to noise and gives significantly worse errors and images.

5.1.4.
Conclusions in the stable case. We summarize the conclusions in this section in the following. We want to emphasize again that some of them, like the first one are well known.
• The finite difference solver for the wave equation is a poor approximation of the continuous model for (discrete) frequencies close to Nyquist. We refer to section 5.4 for a further discussion. Even the sampling of the phantom could be a poor approximation (aliased). • The discrete realization L of L is unstable even if L is stable. This is not a problem with "generic" phantoms because they have a very small spectral content at the bottom of the spectrum of L * L which corresponds to asmall hight frequency content (w.r.t. the discrete Fourier transform). • The discrete realization L * w of L * used in the inversion is based on back-projection at each step and it is not the same as L * . The differences do not affect the inversions visibly but create small but measurable imaginary parts of the eigenvalues of L * w L. Most of the differences are related to modes with frequencies close to Nyquist.
• Despite this, the inversion with "generic" phantoms works as predicted by the theory of the continuous model because they have small projections to the eigenfunctions of L * L with small eigenvalues (the unstable subspace). If we base the theory on the discrete model, we would have to conclude that the problem is very unstable. • In case of noise, the iterations converge (to something different than f ) if the noise does not have low spectral content (corresponding to higher frequencies in the discrete Fourier transform). When the noise has a significant low spectral content (high frequencies), the inherent instability of the discretization causes a slow divergence.

5.2.
A unstable example; partial boundary data. We take a constant speed c = 1 so that we can compute easily the invisible singularities. We use data on a part of the boundary as indicated in Figure 8: the bottom and the left-hand side plus 20% of the adjacent sides. We take T = 1.8 which is smaller than the side of the box Ω, equal to 2. We have uniqueness since T 0 = 1.8 in this case. For stability however, we need T > 2. Singularities from some neighborhood of the top of the SL phantom traveling vertically and close to it, do not reach Γ for time T = 1.8. Therefore, such edges cannot be reconstructed stably. Our goal is to test the convergence of both methods and the dependence on γ in the Landweber one. The reconstructions are shown in Figure 8. The second and the third one are done with the Landweber method but in the second one, we use the fact that we know that the SL phantom is supported in Ω 0 , being a slightly smaller square. Numerically, this means that we work with χL * Lχ instead of L * L, (i.e., L is replaced by Lχ, where χ is the characteristic function of Ω 0 . In the third reconstruction, there is no such restriction. The cutoff improves the error a bit. The "ripples" artifacts are due to the sharp cutoff of the data at T = 1.8. If we introduce a gradual cutoff χ 1 (t) with χ 1 (T ) = 0, i.e., if we use χL * χ 1 Lχ instead of L * L, this removes the visible "ripples" but blurs the edge of the SL phantom in a larger neighborhood of the invisible singularities on the top. Despite the slightly larger error, the ATR reconstruction has less artifacts. The errors for various values of γ are shown in Figure 9. The convergence is slower than in the stable case after 10 or so steps, and the improvement with γ increasing below reaching its optimal value is much slower. The errors are based on a cutoff to Ω 0 , which is the better case. With the cutoff, the Landweber and the ATR errors are closer. On the other hand, the Landweber method is more flexible w.r.t. introducing such weights. The Landweber method performs better in this case in terms of the L 2 error.
We analyzed the eigenvalues of L * L next, and the Fourier coefficients of the SL phantom w.r.t. its eigenvalues. The results are shown in Figure 10. This situation is reversed now, compared to the stable case. There are still a lot of "zero" (extremely small) eigenvalues with highly oscillatory eigenfunctions. The non-"zero" Fourier coefficients however start much earlier than the non-"zero" eigenvalues. Since our spectral analysis shows that we need to restrict our considerations on the support of the spectral measure only, we can take λ ≈ 800 as a rough lower bound for this support. Then we have practically zero eigenvalues up to, roughly speaking, λ 1,800 , where the SL phantom has a non-negligible spectral measure. The norm of the projection of the SL phantom to the space spanned by the first 1, 000 eigenvalues is around 40% of the total one. This means instability and it is in contrast to Figure 4. 5.2.1. Unstable example with data not in the range and with noise. We add data on one side to another one, as in the stable case. The errors did not look much different than the noise case below. Next, we add Gaussian noise with 0.1 standard deviation. The data Lf ranges in the interval [−1, 1.5]. The errors reach a minimum and start diverging slowly for γ ≤ 0.16, and diverge fast for γ larger than that, see Figure 11 on the left. This suggests that γ * ≈ 0.16, for that particular image, at least, and correlates well with Theorem 3.2 and its proof which proves divergence with generic perturbations of the data not in the range in the unstable case but indicates a slow divergence. Note that the optimal γ looks similar to that in the noise free case, see Figure 9. On the other hand, instead of a convergent series, we get a slowly divergent one. Finally, we add a filter χ between L and L * w which cuts high frequencies and also frequencies outside the characteristic cone on each side; in other words, we replace L * w by L * w χ. Since χ is a Fourier multiplier by a non-negative function, χ is a non-negative operator. The errors get smaller and the iterations do not start to diverge even after 200 iterations, see Figure 11. This is in line  Figure 11. A unstable case with Gaussian noise. Left: error curves with γ ranging from 0.03 to 0.17 (the diverging curve). The critical value of γ looks close to that in the zero noise case in Figure 9. The iterations for γ ≤ 0.16 diverge slowly in contrast with the noise free case in Figure 9. Right: Filtered data, γ ≤ 0.15. The errors look like in Figure 9.
with Theorem 3.2 and its proof since the filtering of the noise in the data reduced significantly the spectral content of the noisy data (w.r.t. L * L) in the low part of the spectrum, where the instability is manifested. After that, the problem behaves as that with data not in the range but with small high frequency content (w.r.t. the Fourier transform), see Figure 9, where the number of iterations is 50 vs. 200 in Figure 11 on the right. The reconstructions are shown in Figure 12. Even though the filtered reconstruction has a smaller error, it is only marginally better in recovering detail. Left: unfiltered data. Right: filtered data. 5.3. Discontinuous sound speed. We choose a sound speed with a jump across a smaller square. Such speeds model thermoacoustic tomography in brain imaging, see, e.g., [40,31,39]. Singularities in this case reflect from the internal boundary and may refract, as well. Since the speed outside that boundary is faster, this creates rays that do not refract. This makes the problem inside that rectangle potentially unstable. The reconstructions are shown in Figure 13. The ATR method works a bit better not only by providing a slightly better error but there are less visible artifacts. The Landweber reconstruction handles noise better however. 5.4. Forward data generated on a different grid. We present now examples where the data was created on a finer grid. As we mentioned above, the finite difference wave solver is inaccurate for large frequencies. This is known as numerical (grid) dispersion, see, e.g., [4,11,19,12,3]. High frequency waves propagate slower, which explains the terms dispersion. This effect gets worse with time. This creates "ripples" in the wave fronts of high frequency waves, with oscillations in the back of the front. It was experimentally found in [4,11,19] that in order to get a satisfactory performance for a second order finite difference scheme, say at distance 30-40 wavelengths ( [11]), one needs to sample in the spatial variables each wavelength at a rate 4-5 higher than the Nyquist one, which is two point per wavelength. We refer to [12,3] for further discussion and for ways to improve the performance with other solvers.
Such errors would be canceled in the backprojection, which is a well known phenomenon in numerical inverse problems. For this reason, it is often suggested that the forward data should be generated by a different solver. This does not solve the dispersion problem however. The backward solver (if based on finite differences) would still be inaccurate for high frequencies, if they are present in the data. The different forwards solver may have the same problem. A natural attempt seems to be to generate data by a known analytic solution or using a much finer grid. This would guarantee the good accuracy of the data but the problem with the backward solver at high frequencies remains. The practical solution (again, if we backproject by finite differences) is to make sure that the data has mostly low frequency content relative to the grid used in the back-projection or to use a higher order scheme which still suffers form dispersion but allows higher frequencies [11]. Another solution would be to use other solvers not so much limited at higher frequencies but this is behind the scope of this work.
A Landweber reconstruction of the SL phantom with data computed on a finer grid with c = 1 is shown in Figure 14. The ripple artifacts are due to high frequency waves in the backprojection propagating slower than the wave speed 1. Since they become worse when T increases, it is important to keep T low. The critical stability time here is √ 2 and we chose T = 2. In an idealized real life situation, where we have properly sampled boundary data, textitif we still want to use a finite difference scheme, we should do the numerical inversion on a grid much finer than the one determined by Nyquist rate of the signal. That would increase the cost, of course. To simulate such a situation, we should take a continuous phantom with a finite frequency band, oversample it by a factor of five, for example, keep T low, solve the forward problem and sample it on a coarser grid depending on the frequency content of the data. To do the backprojection, we should increase the size of the grid and do finite differences. But making the grid coarser after we have computed Λf and finer right after it basically means that we could stay either with the original finer one in our simulations or change it but keep the data severely oversampled. As mentioned above, a more efficient solution would be to use different solvers for back propagation.
Based on the discussion above, we chose a low frequency phantom in Figure 15 below with data generated by the finer grid used in Figure 14. The reconstruction is much more accurate. A reconstruction with the variable speed used above, see section 5.1.1 yields visibly similar results with a relative L 2 error 4%.  Figure 15. Same situation as in Figure 14 but the phantom contains Gaussians with low frequency content only. Left: original. Right: reconstruction with 50 iterations. The relative L 2 error is 1.8%, the L ∞ error is 3.5%.

Review of the Averaged Time Reversal Method
In this section we review briefly the averaged time reversal method proposed in [33]. The treatment there works for any fixed Riemannian metric g on Ω and the wave equation where ∆ g is the Laplace-Beltrami operator in the metric g and c(x) > 0 is a smooth function. In applications g is the Euclidean metric and ∆ g is the Euclidean Laplacian. Here we sketch the method as well as the resulting algorithm only in the Euclidean case for simplicity of notions. More general and detailed exposition can be found in [33].
6.1. Complete data. We start with the complete data case, i.e. when Γ = ∂Ω. Here and below L 2 (Ω) = L 2 (Ω, c −2 dx). We sometimes adopt the notation u(t) = u(t, ·). Define the Dirichlet space H D (Ω) as the completion of C ∞ 0 (Ω) under the Dirichlet norm The function φ is often referred to as the harmonic extension of h(T ).
When h = Lf we have v(0) = ALf , which can be viewed as an approximation of the initial data f in the multiwave tomography model in R n [30]. Indeed, introduce the error operator (6.2) Kf := f − ALf.
In the multiwave tomography model in R n , it is shown in [30] that if T is such that there is stability, K is a contraction on H D (Ω), that is, K H D (Ω)→H D (Ω) < 1. Moreover, K is compact. This makes the operator I − K invertible on H D (Ω) and one thus deduces from (6. 2) that f = (I − K) −1 ALf . Inserting the expansion (I − K) −1 = I + K + K 2 + . . . gives the Neumann series reconstruction algorithm in [30].
In the multiwave tomography model in Ω, the error operator K : H D (Ω) → H N (Ω) is no longer a contraction regardless of how large T is. In fact one could construct high frequency solutions propagating along a single broken geodesic to show that K H D (Ω)→H N (Ω) = 1. This breaks down the inversion of I − K. The idea suggested in [33] is to average the time reversal operator A with respect to T . The averaging process causes some partial cancellation of the microlocal singularities with opposite signs at t = 0 when T > T 1 , see Definition 2.1, making the averaged error operator a microlocal contraction (but not compact anymore).
We rewrite the time reversal operator A for the purpose of averaging later. Setṽ = v − Ph(T ) with v the solution of (6.1), thenṽ solves and Ah = v(0) =ṽ(0) + Ph(T ). We consider the terminal time as a parameter now, and call it τ with τ ∈ [0, T ]. We replace T by τ in (6.3). Denote the corresponding solution byṽ τ and the corresponding time reversal operator by A(τ ). Note that (6.4) A(τ )h =ṽ τ (0) + Ph(τ ) and note thatṽ τ solves a similar problem as (6.3) but the boundary data is replaced by H(τ − t)(h(t) − h(τ )) with H the Heaviside function. Choose a weight function χ ∈ C ∞ 0 (R) which is positive and has integral equal to one over [0, T ]. We define the averaged time reversal operator A by multiplying (6.4) by χ(τ ) and integrating it over [0, T ]: Ah := Finally, define A 0 := Π 0 A where Π 0 projects the result onto H D (Ω 0 ). The projection of the second term actually vanishes since it is harmonic. The following theorem gives an explicit reconstruction of f when h = Lf . Theorem 6.1. Let (Ω, c −2 dx 2 ) be non-trapping with a strictly convex boundary as a Riemannian manifold and let Ω 0 Ω. Suppose T > T 1 (∂Ω, Ω) and denote K 0 := Id − A 0 L on H D (Ω 0 ). Then K 0 H D (Ω 0 )→H D (Ω 0 ) < 1. In particular, Id − K 0 is invertible on H D (Ω 0 ), and the inverse problem has an explicit solution of the form In other words, the modification to the time reversal (6.1) is that we impose the homogeneous Neumann boundary condition (which is satisfied by the forward solution u) on ∂Ω\Γ where the measurement is unavailable. The averaged time reversal operator is then constructed in a similar manner. We were only able to prove however that the averaging method with partial data however provides a parametrix recovering the singularities of f which do not hit the edge of [0, T ] × Γ [33]. This does not cause a significant loss of singularities as those that hit the edge of Γ form a set of measure zero in the cotangent bundle. Then we have K ≤ 1 but we do not know if K < 1. This does not guarantee convergence of the Neumann series but we use such series numerically nevertheless.