Regularization of inverse problems via box constrained minimization

In the present paper we consider minimization based formulations of inverse problems, $(x,\Phi)\in\operatorname{argmin}\{\mathcal{J}(x,\Phi;y)\colon(x,\Phi)\in M_{ad}(y) \}$, for the specific but highly relevant case that the admissible set $M_{ad}^\delta(y^\delta)$ is defined by pointwise bounds. This is the case, e.g., if $L^\infty$ constraints on the parameter are imposed in the sense of Ivanov regularization and the $L^\infty$ noise level in the observations is prescribed in the sense of Morozov regularization. As application examples for this setting we consider three coefficient identification problems in elliptic boundary value problems. Discretization of $x$ and $\Phi$ with piecewise constant and piecewise linear finite elements, respectively, leads to finite dimensional nonlinear box constrained minimization problems that can be solved numerically via Gauss-Newton type SQP methods. In our computational experiments we revisit the suggested application examples. In order to speed up the computations and obtain exact numerical solutions, we use recently developed active set methods for solving strictly convex quadratic programs with box constraints as subroutines within our Gauss-Newton type SQP approach.


Introduction
Recently, as alternatives to the classical reduced formulation of inverse problems as operator equations $F(x)=y$ with a forward operator $F$, all-at-once methods based on the more original formulation as a system of model and observation equations and, beyond that, minimization based formulations
$$(x,\Phi)\in\operatorname{argmin}\{\mathcal{J}(x,\Phi;y)\colon(x,\Phi)\in M_{ad}(y)\}, \qquad (4)$$
have been put forward, see, e.g., [10,11]. In (1)-(4), $x$ is the searched-for quantity (e.g., a coefficient in a PDE), $\Phi$ the corresponding state (e.g., the solution of this PDE), and $y$ the observed data. The present paper builds on the formulation (4) and investigates its applicability to several concrete choices of the cost function and the admissible set, based on (2), (3) as well as on the mentioned variational formulation of EIT. Here $y^\delta$ are the actually given noisy data satisfying $S(y, y^\delta) \le \delta$, and $R$, $\alpha$ are regularization functional and parameter, respectively. The present paper is supposed to provide computational results in the specific but highly relevant case that $M^\delta_{ad}(y^\delta)$ is defined by pointwise bounds, which is the case, e.g., if $L^\infty$ constraints on the parameter are imposed in the sense of Ivanov regularization, and the $L^\infty$ noise level in the observations is prescribed in the sense of Morozov regularization. That is, starting from (4), (7), a regularizer is defined via the minimization problem
$$\min_{(x,\Phi)\in X\times U} T_\alpha(x,\Phi;y^\delta) = J_A(x,\Phi) + \alpha\cdot R(x,\Phi)$$
$$\text{s.t.}\quad \underline{x} \le x \le \overline{x}, \quad\text{and}\quad y^\delta - \tau\delta \le C(\Phi) \le y^\delta + \tau\delta, \qquad (11)$$
where $\tau > 1$ is a fixed safety factor for the error level in the observation residual, $\underline{x}, \overline{x}$ are given (pointwise) bounds on $x$, and $S(y_1, y_2) = \|y_1 - y_2\|_{L^\infty}$. Here we think of $x$ and $y^\delta$ as functions on some domains $\Omega$, $\tilde\Omega$, and of $L^\infty$ as the corresponding Lebesgue spaces. Accordingly, the inequality constraints in (11) are to be understood pointwise almost everywhere in $\Omega$ and $\tilde\Omega$, respectively. Moreover, $C$ is supposed to be a linear operator, more precisely, a restriction of the state, e.g., to some subdomain or to some part of the boundary of $\Omega$.
Regularization here mainly relies on the upper and lower bounds on $x$ as well as on the relaxation of the data misfit constraint from (7) to an interval $[-\tau\delta, \tau\delta]$. The term $\alpha\cdot R$ may as well be skipped, as we will see in some of the examples in Section 3, where we consider just
$$\underline{x} \le x \le \overline{x}, \quad\text{and}\quad y^\delta - \tau\delta \le C(\Phi) \le y^\delta + \tau\delta.$$
Therewith, simple discretizations of (11) with, e.g., piecewise linear and/or piecewise constant finite elements, lead to finite dimensional nonlinear box constrained minimization problems. Thus, in their numerical solution via Gauss-Newton type SQP methods, we can take advantage of recently developed methods for the efficient solution of large strictly convex quadratic programs with box constraints [7,8].
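The resulting constraint structure can be made concrete with a small sketch: after discretization, the admissible set of (11) is just a box in R^n, with Ivanov bounds on the parameter components and a Morozov band y^δ ∓ τδ on those state components seen by the restriction operator C. All sizes and values in the following numpy sketch are hypothetical:

```python
import numpy as np

# Hypothetical sizes: n_x parameter dofs, n_phi state dofs,
# of which the entries listed in `observed` are seen by C.
n_x, n_phi = 5, 8
observed = np.array([0, 3, 7])          # dofs where C restricts the state

x_lo, x_hi = 1.0, 10.0                  # Ivanov bounds on the parameter
y_delta = np.array([0.2, -0.1, 0.4])    # noisy data at the observed dofs
delta, tau = 0.01, 1.1                  # noise level and safety factor

# Stacked unknown (x, Phi): assemble lower/upper bound vectors.
lb = np.concatenate([np.full(n_x, x_lo), np.full(n_phi, -np.inf)])
ub = np.concatenate([np.full(n_x, x_hi), np.full(n_phi,  np.inf)])
# Morozov-type band y^delta -+ tau*delta on the observed state dofs.
lb[n_x + observed] = y_delta - tau * delta
ub[n_x + observed] = y_delta + tau * delta
```

Since C is a restriction of the state, the data misfit constraint acts component-wise, so no general linear inequality constraints are needed.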
The remainder of this paper is organized as follows. In the following section we provide a result on convergence of the regularizer to an exact solution of (2), (3) for the particular setting (11). In Section 3 we discuss three coefficient identification problems in elliptic boundary value problems as application examples of our setting (11). In Section 4 we show how discretization of x and Φ in our application examples leads to finite dimensional non-linear box constrained minimization problems that can be solved by Gauss-Newton type SQP approaches. Section 5 is concerned with the description of methods for the efficient solution of large strictly convex quadratic programs with box constraints that are used as subroutines of Gauss-Newton type SQP approaches. In Section 6 we conduct several numerical experiments for the suggested application examples. Section 7 concludes the paper.

Convergence
For the sake of self-containedness we provide a result on convergence of the regularized solutions to an exact solution $(x^\dagger, \Phi^\dagger)$ of (2), (3) for the particular setting (11) of (9), along with its (very short) proof. The main assumption is the existence of an appropriate topology $\mathcal{T}_U$ on the state space, which will be constructed in the application examples below, and which also determines the topology in which the regularized solutions converge. As parameter and data spaces, in view of the pointwise bounds on $x$ and $C(\Phi)$, we will always use $X = L^\infty(\Omega)$ and $Y = L^\infty(\tilde\Omega)$, respectively, and $\mathcal{T}_X$ will be defined by the weak* topology on $L^\infty(\Omega)$.
• The topology $\mathcal{T}_U$ on $U$ is chosen such that, for any $c > 0$, the level sets
$$L_c = \{(x,\Phi)\in X\times U : T_\alpha(x,\Phi;y^\delta) \le c,\ \underline{x} \le x \le \overline{x},\ y^\delta - \tau\delta \le C(\Phi) \le y^\delta + \tau\delta\}$$
are $\mathcal{T}$ compact. Here $\mathcal{T}$ is the topology defined by weak* $L^\infty$ convergence on $X = L^\infty(\Omega)$ and by $\mathcal{T}_U$ on $U$.
Proof. For fixed $\alpha > 0$, minimality and admissibility of $(x^\dagger, \Phi^\dagger)$ for (11), together with the fact that $A(x^\dagger, \Phi^\dagger) = 0$, yields the estimate
$$T_\alpha(x^\delta_\alpha, \Phi^\delta_\alpha; y^\delta) \le T_\alpha(x^\dagger, \Phi^\dagger; y^\delta) = \alpha\cdot R(x^\dagger, \Phi^\dagger) =: c, \qquad (13)$$
so that it suffices to restrict the search for a minimizer to the level set $L_c$. Indeed, the set $\{(x,\Phi) : T_\alpha(x,\Phi;y^\delta)\le c,\ \underline{x} \le x \le \overline{x}$ and $y^\delta - \tau\delta \le C(\Phi) \le y^\delta + \tau\delta\}$ is contained in $L_c$ and thus has a $\mathcal{T}$ convergent subsequence with limit $(x, \Phi) \in L_c$. Thus, $(x, \Phi)$ satisfies the box constraints, and $\mathcal{T}$ lower semicontinuity of $T_\alpha$ yields minimality of $(x, \Phi)$.
Remark 1. In view of estimate (13), which in case R = 0 implies that (x δ α , Φ δ α ) satisfies the model equation (2) exactly, the presence of a strictly positive regularization term can be viewed as a relaxation of the model equation. Also note that the regularization term is here not necessarily needed for stability of x, since this is already achieved by the pointwise bounds. Still, the presence of the term α · R may help to enable existence of minimizers of (11) and in this sense, well-posedness of the regularized problem.

Application examples
In this section we provide some examples of coefficient identification problems in elliptic boundary value problems.

An inverse source problem
Consider identification of a spatially varying source term $f$ (e.g., a heat source) in an elliptic boundary value problem (14) from observations $y_i = C_i(\phi_i)$ of $\phi_i$, $i = 1, \dots, I$ (e.g., temperatures), in the interior and/or on the boundary of the domain $\Omega$. This is a linear inverse problem, which can be formulated as a linear operator equation or as a quadratic minimization problem. We use the function spaces (15) (where in the latter case we assume that $\int_\Omega f\,dx = 0 = \int_{\partial\Omega} j_i\,ds$) and the negative Laplace operator $D$, which is an isomorphism between $V$ and its dual $V^*$ and thus induces a norm on $V$. With this function space setting and the functionals $b_f, g_i \in V^*$ defined accordingly, the weak form of (14) reads as an operator equation in $V^*$. Here $b_f$ plays the role of the parameter $x$ of the previous section, and the state consists of $\Phi := (\phi_1, \dots, \phi_I)$.
We will particularly concentrate on the practically relevant observations (16), where in the latter case $\omega_o \subseteq \Omega$ is a measurable subset with positive measure, and thus use the data space (17). From the point of view of elliptic PDEs, a natural norm for measuring the deviation from the model, i.e., the residual of the PDE, is the $H^{-1}$ norm, which can be implemented by using the inverse of the negative Dirichlet Laplacian. This results in the cost function $J_A$ and leads to a formulation of the inverse problem as a constrained minimization problem. This directly corresponds to the reformulation (7) of the all-at-once version (2), while a reduced formulation (1) can be defined via a linear forward operator. A priori information on pointwise lower and upper bounds $\underline{b}, \overline{b}$ of the source term, as is often available in practice, and a relaxation of the observation equation according to the discrepancy principle lead to the regularized problem, cf. (11). Note that we set $R$ to zero here, which is feasible in the setting of Proposition 1, as long as Assumption 1 can be verified.
To do so, we define the topology by (21). Therewith, $\mathcal{T}$ compactness of the level sets $L_c$ obviously holds. Indeed, boundedness of $J_A(b_n, \Phi_n)$ and $L^\infty(\Omega)$ (hence $V^*$) boundedness of $b_n$ implies boundedness of $\Phi_n$ in $H^1(\Omega)^I$, which implies existence of subsequences $(b_{n_k}, \Phi_{n_k})$ converging weakly* in $L^\infty(\Omega)$ as well as, according to the first two limits in (21), to some $(b_f, \Phi)$. By boundedness of $C\Phi_{n_k}$ in $L^\infty(\Omega)^I$, we can extract another subsequence (without relabelling) such that $C\Phi_{n_k} \rightharpoonup^* y$ in $L^\infty(\Omega)$. It remains to show that $y = C\Phi$. In both cases of (16), the operator $C$ is continuous as a mapping from $V$ to $L^p(\Omega)$ for some $p \in (1, \infty)$ (in the first case due to the trace theorem, in the second case due to continuity of the embedding $V \to L^p(\Omega)$ and of the restriction operator $L^p(\Omega) \to L^p(\omega_o)$); hence, as a linear operator, it is also weakly continuous. Thus $C\Phi_{n_k} \rightharpoonup C(\Phi)$ in $L^p(\Omega)$, which by uniqueness of weak limits implies $y = C\Phi$. By weak lower semicontinuity of the norms, the bound and discrepancy constraints carry over to the limit. Also, $\mathcal{T}$ lower semicontinuity of $J_A$ is a direct consequence of weak lower semicontinuity of the $V^*$ norm. Moreover, the last line in (21) immediately implies (12). We mention in passing that Assumption 1 remains valid if we add a regularization term $\alpha\cdot R$ with $R$ defined, e.g., by some positive power of a norm.
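The role of the inverse negative Dirichlet Laplacian in evaluating the $H^{-1}$ residual norm can be illustrated in one space dimension with finite differences; the grid size and data below are hypothetical, and D stands for the discrete negative Laplacian:

```python
import numpy as np

def neg_laplacian_1d(n, h):
    """Discrete negative Dirichlet Laplacian on n interior nodes."""
    return (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1)) / h**2

n, h = 50, 1.0 / 51
D = neg_laplacian_1d(n, h)

rng = np.random.default_rng(0)
b = rng.standard_normal(n)            # hypothetical source term
g = np.zeros(n)                       # homogeneous boundary load
phi = np.linalg.solve(D, b + g)       # state solving the model exactly
phi_pert = phi + 0.01 * rng.standard_normal(n)

def J_A(b, phi):
    """All-at-once cost: squared H^{-1} norm of the PDE residual."""
    r = D @ phi - b - g
    return 0.5 * r @ np.linalg.solve(D, r)
```

Since D is symmetric positive definite, $r^T D^{-1} r \ge 0$, and the cost vanishes exactly at states satisfying the model equation.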

Identification of a potential
We now consider a nonlinear inverse problem for an elliptic PDE, namely the recovery of the distributed coefficient $x = c$ in the boundary value problem (22) from interior or boundary observations. Here $\Omega$ is a Lipschitz domain and the excitation is done via the sources $f_i$ and the Neumann data $j_i$, which are assumed to be known. In case $c$ is nonnegative almost everywhere in $\Omega$, the above PDE is elliptic and thus the forward problem of computing $\phi_i$ in (22) is well-posed. The situation is to some extent similar (but technically more challenging, cf., e.g., [1]) when replacing the PDE in (22) by the Helmholtz equation (23), as long as one can guarantee that $\omega^2$ stays away from the eigenfrequencies of $-c_0^2\Delta$. This model, together with boundary observations of $\phi_i$, is the frequency domain version of the seismic inverse problem of recovering the spatially varying wave speed $c_0$ in the subsurface from surface measurements of the acoustic pressure $\phi_i$, often referred to as full waveform inversion (FWI), cf., e.g., [1] and the references therein.
The approach we follow here does not require well-definedness of the parameter-to-state map $S$ and therefore allows us to consider (22) with arbitrary $c \in L^\infty(\Omega)$, in particular also (23) without restriction on the frequency $\omega$. We will demonstrate this by means of some numerical experiments with negative and mixed-sign coefficients $c$ in (22) in Section 6.
The weak form of (22) can be written as an operator equation with $D_c = -\Delta + c\,\cdot$, where $g_i \in V^*$ is defined by (15), and in the pure Neumann case $\operatorname{meas}(\partial\Omega \setminus \Gamma) = 0$ we assume that $\int_\Omega f_i\,dx + \int_{\partial\Omega} j_i\,ds = 0$. Analogously to the inverse source problem above, we can formulate a cost functional by using the $V^*$ norm of the residual of the state equation.
Starting from the minimization based formulation (18) (with $b_f$ replaced by $c$ and $J_A$ according to (24)) of the inverse problem, we can again define the regularized problem (25) without using $R$, where this time we also guarantee nonnegativity of $c$ by means of the lower bound. The topology needed to verify Assumption 1 is the same as in the previous example, cf. (21), and the verification of $\mathcal{T}$ compactness of level sets and of (12) goes exactly as in Section 3.1. To see $\mathcal{T}$ lower semicontinuity of $J_A$, consider a sequence $(c_n, \Phi_n)$ converging to $(c, \Phi)$ with respect to $\mathcal{T}$. By weak lower semicontinuity of the $V^*$ norm, this implies $\mathcal{T}$ lower semicontinuity of $J_A$.
Remark 2. In the elliptic case $c \ge 0$ in (22), alternatively to $D$, the operator $D_c$ can be used as an isomorphism between $V$ and $V^*$, which induces parameter dependent norms on $V$ and $V^*$. Thus, for $c \ge 0$, we could define the cost function $\tilde J_A$ by (26). Due to the use of a parameter dependent norm, this really differs from the standard reformulation (7) of the all-at-once version (2), and from (19), as actually given by (24). However, the last term $\langle g_i, D_c^{-1} g_i\rangle_{V^*,V}$ in (26) would spoil $\mathcal{T}$ lower semicontinuity of $\tilde J_A$, even if we added some higher norm of $\Phi$ as a regularization term and strengthened $\mathcal{T}$ accordingly (see (30) below for a different application example). Thus, to establish existence of a minimizer of the regularized problem with $\tilde J_A$, we would have to additionally regularize $c$, e.g., by a total variation term, which would, however, require a more sophisticated discretization than piecewise constant finite elements. Moreover, due to this last term, evaluation of the cost function (26) for some iterate $c_n$ requires solving boundary value problems with the elliptic operator $D_{c_n} = -\Delta + c_n\,\cdot$, which changes in each iteration, while (24) only involves simple Laplace problems. Thus, from here on we stay with the formulation based on (24).
We mention in passing that a reduced formulation (1) can be defined by means of the nonlinear forward operator F :

Identification of a diffusion coefficient
Another nonlinear problem is the identification of the distributed diffusion coefficient $x = a$ in the elliptic boundary value problem (27), where $\Omega \subseteq \mathbb{R}^d$, $d \in \{2, 3\}$, is a Lipschitz domain, the excitations $f_i, j_i$ are assumed to be known, and we focus on the observation setting (16), (17). With the operator $D_a$ and the function space $V = H^1_\diamond(\Omega)$ according to (15) (where again in case $\operatorname{meas}(\partial\Omega \setminus \Gamma) = 0$ we assume that $\int_\Omega f_i\,dx + \int_{\partial\Omega} j_i\,ds = 0$), as well as $g_i \in V^*$ defined correspondingly, the weak form of (27) can be written as an operator equation. Note that with $f_i = 0$, $\Gamma = \partial\Omega$, $C_i = \operatorname{tr}_{\partial\Omega}$, this setting comprises the electrical impedance tomography (EIT) problem of recovering the conductivity $a$ from several current-voltage measurements $(j_i, \operatorname{tr}_{\partial\Omega}\phi_i)$, $i = 1, \dots, I$, cf., e.g., [4] and the references therein.
To derive a minimization based formulation of this inverse problem, we define the cost function (28), which corresponds to the reformulation (7) of the all-at-once version (2). As a regularized version of (28) we consider (29), subject to $0 < \underline{a} \le a \le \overline{a}$ a.e. in $\Omega$, and $\|C_i\phi_i - y_i^\delta\|_{Y_i} \le \tau\delta$, $i = 1, \dots, I$.
The regularization term $\|\phi_i\|^2_{H^{3/2-\epsilon}(\Omega)}$ with $\epsilon \in (0, \frac12)$ is required to guarantee existence of a minimizer of (29), while still admitting jumps in the gradient of $\phi_i$, hence jumps in the diffusivity/conductivity $a$, as they often occur in practice.
The topology $\mathcal{T}_U$ is here defined by (30). Therewith, Assumption 1 can be established as follows. Verification of $\mathcal{T}$ compactness of level sets and of (12) is the same as in Section 3.1, additionally taking into account weak lower semicontinuity of the $H^{3/2-\epsilon}(\Omega)$ norm. Moreover, $\mathcal{T}$ convergence of $(a_n, \Phi_n)$ to $(a, \Phi)$ implies weak $V^*$ convergence of $D_{a_n}\phi_{n,i} - g_i$ to $D_a\phi_i - g_i$, as can be seen by testing with any $\psi \in V$. This implies $\mathcal{T}$ lower semicontinuity of $J_A(a, u) = \frac12\|A(a, u)\|^2_{(V^*)^I}$.

Remark 3. Analogously to Remark 2, we could use a parameter dependent norm on $V$ for $a \in L^\infty(\Omega)$ positive and bounded away from zero to define a cost function $\tilde J_A$. However, again the last term would spoil $\mathcal{T}$ lower semicontinuity of $\tilde J_A$ and necessitate additional regularization of $a$, which we wish to avoid.
A reduced formulation (1) could here be defined via F :

Gauss-Newton SQP method
Discretization of $x$ and $\Phi$ with piecewise constant and piecewise linear finite elements, respectively, leads to finite dimensional nonlinear box constrained minimization problems. We solve these iteratively by Gauss-Newton type SQP methods; that is, we approximate the Hessian of the Lagrangian by linearizing the operator $A$ under the norm that defines the cost function, i.e., by skipping derivatives of $A$ of higher than first order, and leave the constraints unchanged due to their linearity.
For the potential identification problem (25) from Section 3.2, this means that we successively have to solve a discretized version of the quadratic minimization problem (30), where, with $d_{k,i} = D^{-1}(c_k\phi_{k,i} + g_i)$, the cost function can be rewritten accordingly. For the diffusion identification problem (28) from Section 3.3, the quadratic minimization problem in each Newton step reads analogously, subject to $\underline{a} \le a \le \overline{a}$ a.e. in $\Omega$, and $y_i^\delta - \tau\delta \le C_i\phi_i \le y_i^\delta + \tau\delta$, $i = 1, \dots, I$. Note that we always only invert the negative Laplacian $D$, and never the parameter dependent operator $D_c$ or $D_a$, as would be required in a reduced formulation (1). For the inverse source problem (18), the cost functional is already quadratic, so only one Newton step is required, and it coincides with the original regularized minimization problem (18). Thus, after discretization, each Gauss-Newton step consists of solving a strictly convex box constrained quadratic program
$$\min_{x\in\mathbb{R}^n} J(x) = x^T Q x + q^T x \quad\text{s.t.}\quad \ell \le x \le u, \qquad (31)$$
with $Q \succ 0$, $\ell < u$, and the inequalities to be understood component-wise.
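The overall structure of this Gauss-Newton SQP iteration can be sketched generically: linearize the residual at the current iterate, assemble the resulting strictly convex box QP, and hand it to a box-QP subroutine. In the following sketch, the two-dimensional residual A and its Jacobian are purely hypothetical, and a plain projected gradient loop (with the ½-normalized objective) serves only as a stand-in for the active set solvers Feas_AS and Infeas_AS of [7, 8]:

```python
import numpy as np

# Hypothetical nonlinear residual A(z) standing in for the discretized
# model-plus-observation system, together with its Jacobian.
def A(z):
    return np.array([z[0] * z[1] - 1.0, z[0] + z[1] - 3.0])

def A_jac(z):
    return np.array([[z[1], z[0]], [1.0, 1.0]])

def box_qp_pg(Q, q, lo, up, iters=500):
    """Tiny projected-gradient stand-in for an exact box-QP solver:
    min 0.5 s^T Q s + q^T s subject to lo <= s <= up."""
    L = np.linalg.eigvalsh(Q)[-1]        # Lipschitz constant of the gradient
    s = np.clip(np.zeros_like(q), lo, up)
    for _ in range(iters):
        s = np.clip(s - (Q @ s + q) / L, lo, up)
    return s

lb, ub = np.array([0.0, 0.0]), np.array([2.0, 4.0])
z = np.array([0.5, 2.0])                 # feasible starting point

for _ in range(20):
    r, J = A(z), A_jac(z)
    # Gauss-Newton model: drop second derivatives of A.
    s = box_qp_pg(J.T @ J, J.T @ r, lb - z, ub - z)
    z = z + s
    if np.linalg.norm(s) < 1e-10:
        break
```

Since the step constraints are the shifted box bounds, each iterate stays feasible by construction, which mirrors the structure of (30) and (31).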

Solving strictly convex box constrained quadratic programs
In this section we discuss available methods for efficiently solving (31) and provide some details on the approach performing best in our numerical tests in Section 6.

Available methods
The minimization of a strictly convex quadratic function under box constraints is a fundamental problem in its own right and, additionally, an important building block for solving more complicated optimization problems. Interior point, active set and gradient projection methods are the most prominent approaches for efficiently solving (31). A special type of active set method was introduced by Bergounioux et al. [2,3] in connection with constrained optimal control problems and tailored to deal with discretizations of specially structured elliptic partial differential equations. Their approach has turned out to be a powerful, fast and competitive method for (31). Hintermüller et al. [6] provide a theoretical explanation of its efficiency by interpreting it as a semismooth Newton method. One major drawback of this method lies in the fact that it is not globally convergent for all classes of strictly convex quadratic objectives. Practical computational evidence shows that if the method converges at all, it typically takes very few iterations to reach an optimal solution, see, e.g., [9,13].
Several modifications of this method were introduced recently. In [7] we propose a primal feasible active set method that extends the approach from [2,3] such that strict convexity of the quadratic objective function is sufficient for the algorithm to stop after a finite number of steps with an optimal solution. In [8] we introduce yet another modified, globally convergent version of this active set method, which allows primal infeasibility of the iterates and aims at maintaining the combinatorial flavour of the original approach.
Beside their simplicity (no tuning parameters), our globally convergent methods offer the favorable features of standard active set approaches like the ability to find the exact numerical solution and the possibility to warm start. Computational experience on a variety of difficult classes of test problems shows that our approaches mostly outperform other existing methods for (31). In the following subsection we discuss the workings of the approach from [7] in some detail, as it is the best performing one for the benchmark instances considered in this paper.

Details on a primal feasible active set method [7]
It is well known that $x \in \mathbb{R}^n$, together with vectors $\alpha, \gamma \in \mathbb{R}^n$ of Lagrange multipliers for the box constraints, furnishes the unique global minimum of (31) if and only if the triple $(x, \alpha, \gamma)$ satisfies the KKT system (32), where $\circ$ denotes component-wise multiplication. The crucial step in solving (31) is to identify the inequalities that are active, i.e., the active sets $A \subseteq N := \{1, \dots, n\}$ and $C \subseteq N$ on which the solution to (31) satisfies $x_A = u_A$ and $x_C = \ell_C$, respectively. Here, for $B \subseteq N$ and some vector $x \in \mathbb{R}^n$, $x_B$ denotes the restriction of the vector to the components with indices in $B$; analogous notation will be used for matrices. Next let us also introduce the inactive set $I := N \setminus (A \cup C)$. Then we additionally require $\alpha_I = \alpha_C = \gamma_I = \gamma_A = 0$ for (32b) to hold. To simplify notation, we denote by KKT(A, C) the system consisting of the stationarity condition (32a) together with $x_A = u_A$, $x_C = \ell_C$ and $\alpha_I = \alpha_C = \gamma_I = \gamma_A = 0$. The solution of KKT(A, C) satisfies the stationarity (32a) and complementarity (32b) conditions. We write $[x, \alpha, \gamma] = $ KKT(A, C) to indicate that $x$, $\alpha$ and $\gamma$ satisfy KKT(A, C). In some cases we also write $x = $ KKT(A, C) to emphasize that we only use $x$ and therefore do not need the backsolve to obtain $(\alpha, \gamma)$.
If we only carry out the backsolve to obtain $(\alpha, \gamma)$, we write this accordingly; the analogous notation is used for any candidate pair (B, D).
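In code, solving KKT(A, C) amounts to fixing x_A = u_A, x_C = ℓ_C, solving the reduced stationarity system for x_I, and backsolving for the multipliers. The following sketch assumes the ½-normalized objective ½ x^T Q x + q^T x and one common sign convention for the multipliers:

```python
import numpy as np

def kkt(Q, q, lo, up, A, C):
    """Solve KKT(A, C) for min 0.5 x^T Q x + q^T x, lo <= x <= up.

    Components in A are fixed at the upper bound, components in C at
    the lower bound; the free components solve the reduced stationarity
    system, and the multipliers are obtained by a backsolve.  The 1/2
    normalization and the multiplier signs are one common convention."""
    n = len(q)
    I = sorted(set(range(n)) - set(A) - set(C))
    x = np.zeros(n)
    x[A], x[C] = up[A], lo[C]
    if I:
        rhs = -q[I] - Q[np.ix_(I, A)] @ up[A] - Q[np.ix_(I, C)] @ lo[C]
        x[I] = np.linalg.solve(Q[np.ix_(I, I)], rhs)
    g = Q @ x + q                      # gradient; zero on the inactive set
    alpha, gamma = np.zeros(n), np.zeros(n)
    alpha[A] = -g[A]                   # multiplier of x <= up
    gamma[C] = g[C]                    # multiplier of lo <= x
    return x, alpha, gamma
```

For example, with Q = 2I, q = (−10, 2)^T and bounds [0, 1]², the pair (A, C) = ({0}, {1}) yields x = (1, 0) with nonnegative multipliers α_0 = 8, γ_1 = 2, i.e., a KKT point.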
In general we have no means to ensure convergence of the algorithmic setup described so far. The key idea for ensuring convergence of active set methods consists in establishing that some merit function strictly decreases over successive iterations of the algorithm. This guarantees that no active set is considered more than once and hence cycling cannot occur. In our case we use the objective function as merit function.
To avoid cycling, we therefore suggest additional measures in case J(y) ≥ J(x). For the following case distinction, let us assume that (A, C) is not optimal. Note that |A| + |C| ≥ 1 holds due to [7, Lemma 10]. We consider the following cases separately.
In this case we can set A ← B, C ← D, and continue with the next iteration.
The pair (A, C) with A ∪ C = {j} is primal feasible. Since x = KKT(A, C) is primal feasible but not optimal, we must have $\alpha_j < 0$ if j ∈ A and $\gamma_j < 0$ if j ∈ C. Thus the objective function improves by allowing $x_j$ to move away from the respective bound. Hence, in an optimal solution, it is essential that $x_j < u_j$ if j ∈ A and $x_j > \ell_j$ if j ∈ C. Thus we have a problem with only 2n − 1 constraints, which we solve by induction on the number of constraints.
In this case we again solve a problem with fewer than 2n constraints to optimality. Its optimal solution then yields a feasible pair (B, D) with y = KKT(B, D) and J(y) < J(x). Our strategy is to identify two sets $A_0 \subseteq A$ and $C_0 \subseteq C$ with $A_0 \cup C_0 \ne \emptyset$ such that x is feasible but not optimal for (33). Note that (33) is a subproblem of (31), in which some variables are fixed to their upper respectively lower bounds. For details on the choice of $(A_0, C_0)$ we refer to [7]. Now (recursively) solving (33) yields an optimal pair $(B_0, D_0)$. We set $B := A_0 \cup B_0$, $D := C_0 \cup D_0$ and compute $[y, \beta, \delta] = $ KKT(B, D). By construction, (B, D) is primal feasible, and J(y) < J(x) because y is optimal for (33). In summary, we can ensure that the objective value associated with the primal feasible pair of active sets decreases in each iteration, and therefore the feasible active set method from [7] is globally convergent. A corresponding algorithmic description of the method is given in Table 2.
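For orientation, the unsafeguarded ancestor of this method, the primal-dual active set (semismooth Newton) strategy of [2, 3, 6], can be sketched in a few lines; the safeguards of [7] that guarantee a strict decrease of the merit function, and hence global convergence, are deliberately omitted, so this plain loop need not converge for every strictly convex Q. The ½-normalized objective is again an assumption:

```python
import numpy as np

def pdas_box_qp(Q, q, lo, up, max_iter=50):
    """Primal-dual active set sketch for min 0.5 x^T Q x + q^T x,
    lo <= x <= up, in the spirit of [2, 3, 6] (no safeguards)."""
    n = len(q)
    A, C = set(), set(range(n))        # cold start: lower bounds active
    for it in range(1, max_iter + 1):
        Al, Cl = sorted(A), sorted(C)
        I = sorted(set(range(n)) - A - C)
        x = np.zeros(n)
        x[Al], x[Cl] = up[Al], lo[Cl]
        if I:
            rhs = (-q[I] - Q[np.ix_(I, Al)] @ up[Al]
                   - Q[np.ix_(I, Cl)] @ lo[Cl])
            x[I] = np.linalg.solve(Q[np.ix_(I, I)], rhs)
        g = Q @ x + q
        alpha = np.where(np.isin(np.arange(n), Al), -g, 0.0)
        gamma = np.where(np.isin(np.arange(n), Cl), g, 0.0)
        # Semismooth Newton active set update (weight c = 1).
        A_new = {j for j in range(n) if alpha[j] + (x[j] - up[j]) > 0}
        C_new = {j for j in range(n) if gamma[j] + (lo[j] - x[j]) > 0}
        if A_new == A and C_new == C:  # sets stable: KKT point found
            return x, it
        A, C = A_new, C_new
    return x, max_iter
```

The cold start A = ∅, C = N used here matches the one referred to in the numerical tests of Section 6; warm starting simply means initializing (A, C) with the sets from the previous Newton step.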

Numerical tests
Table 2: Algorithmic description of the feasible active set method from [7].

We performed test computations in a Matlab implementation for the three examples from Section 3 in $\tilde\Omega = \Omega = (-1, 1)^2 \subseteq \mathbb{R}^2$, using just one observation, I = 1 (thus skipping the subscripts i in the following), and a piecewise constant function as the exact coefficient. The finite element grid used in the computations was determined by subdividing the unit interval into N = 32 subintervals in both directions, leading to $(N + 1)^2 = 1089$ grid points for the piecewise linear nodal discretization of u and $b_f$, as well as $2N^2 = 2048$ triangles for the piecewise constant element discretization of c and a. Thus, in the example from Section 3.1 we end up with n = 2178 unknowns, and in the two other examples with n = 3137 unknowns. Part of the experiments was also carried out on a finer grid with N = 64 and correspondingly n = 8450 or n = 12417 unknowns. In order to avoid an inverse crime, we generated the synthetic data on a finer grid for the example from Section 3.1, while in the examples from Sections 3.2 and 3.3 we additionally prescribed the exact state as $\phi_{ex}(x, y) = \cos(\frac{\pi}{2} x)\sin(\frac{\pi}{2} y)$ and computed the corresponding right hand side f on the finer grid. After projection of $\phi_{ex}$ onto the computational grid, we added uniformly distributed random noise of levels $\delta \in \{0.001, 0.01, 0.1\}$, corresponding to 0.1, 1 and 10 per cent noise, to obtain synthetic data $y^\delta$. In all tests we started with the constant function with value 5.5 for $b_{f,0}$, $c_0$, $a_0$ (i.e., the mean value between upper and lower bound) and $\phi_0 \equiv y^\delta$. Moreover, we always set $\tau = 1.1$.
First of all, we provide a comparison of the methods described in Section 5, in their Matlab implementations Feas_AS [7] and Infeas_AS [8], with quadprog, which is the standard Matlab function for solving (31). The trust-region-reflective algorithm of quadprog is a subspace trust-region method based on the interior-reflective Newton method from [5]. To facilitate transparency of our numerical tests, we provide the Matlab code for generating and solving all instances with the various methods discussed in this paper at http://philipphungerlaender.com/qp-code/. Table 3 shows the CPU times measured with the Matlab function cputime, the $L^1$ errors, as well as the errors in certain spots within the two homogeneous regions and on their interface, cf. Figure 1; more precisely, on $\frac1N \times \frac1N$ squares located at these spots, corresponding to the piecewise constant $L^1$ functions with these supports, in order to exemplarily test weak* convergence. Moreover, we provide the number k of Gauss-Newton SQP iterations (which is always one for the linear inverse source problem, of course) and the relative residual, i.e., the cost function ratio $J(x^\delta_k, u^\delta_k)/J(x_0, u_0)$ at the final iterate. We do so for the three examples from Sections 3.1, 3.2, and 3.3, referred to for short as "source", "potential", and "diffusion", respectively, using the lowest noise level $\delta = 0.001$ and, besides the discretization with N = 32, also one with N = 64 subintervals in both directions. We observe that Infeas_AS and Feas_AS obviously reach the same solution, which is almost bang-bang, as the vanishing errors in almost all spots show, whereas quadprog, being an interior point method, exhibits small deviations from the bounds. In all cases, Feas_AS outperforms the other two methods as far as CPU times are concerned. Therefore, the following computations were done using Feas_AS.
To provide an illustration of convergence as the noise level tends to zero, we performed five runs per noise level for each example and list the average errors in Table 4. In the diffusion example, where we need to regularize, we set $\alpha = 10^{-4}\cdot\delta$. Correspondingly, we provide an illustration of the convergence history for the inverse potential problem from Section 3.2 for two different noise levels, $\delta = 0.1$ and $\delta = 0.01$, in Figures 2 and 3.
Next, we consider a fixed noise level of δ = 0.001 and illustrate the gain in computational effort obtained by using the active set from the previous Newton step as an initial guess (warm start) as compared to starting each Newton step with an empty active set A = ∅ for the upper bound and a full active set C = N for the lower bound (cold start), see Figure 4. We do so for the inverse potential problem from Section 3.2. The CPU times were 8.60 (351.20) seconds with warm start and 9.15 (525.99) seconds with cold start to achieve an L 1 error of 0.0959 (0.0827) for N=32 (N=64).
Finally, in order to demonstrate the ability of the method to also deal with coefficients that would not allow for a well-defined parameter-to-state map, we consider the inverse potential problem from Section 3.2 with the test cases

Conclusions and Remarks
Imposing bounds both on the searched-for parameter and on the data misfit is shown to provide an efficient tool for stabilizing inverse problems. The box constrained minimization problems resulting after discretization can be efficiently solved by a Gauss-Newton type SQP approach using recently developed active set methods for box constrained strictly convex quadratic programs. This is demonstrated by three examples of coefficient identification in elliptic PDEs. Future research in this direction might be concerned with strategies for dealing with the potential nonconvexity arising due to stronger nonlinearity. (Note that the examples considered here can be shown to satisfy the so-called tangential cone condition and are therefore only mildly nonlinear.) Also, investigations on convergence of iterative regularization methods based on the formulation (11) in an infinite dimensional function space setting would be of interest. However, this requires working in nonreflexive spaces both in parameter and in data space, which makes an analysis challenging.