Kullback-Leibler residual and regularization for inverse problems with noisy data and noisy operator

We study the properties of a regularization method for inverse problems with joint Kullback-Leibler data term and regularization when the data and the operator are corrupted by some noise. We show the convergence of the method and we obtain convergence rates for the approximate solution of the inverse problem and for the operator when it is characterized by some kernel, under the assumption that some source conditions are satisfied. Numerical results showing the effect of the noise levels on the reconstructed solution are provided for Spectral Computerized Tomography.

1. Introduction. In this paper, we are interested in the inversion of a nonlinear operator F . Our aim is thus to find an approximation of a solution x of the inverse problem formulated as: (1) F (x) = y.
We assume that the data are corrupted by Poisson noise. For this type of noise, the relevant distance is the Kullback-Leibler (KL) distance defined in the following. Therefore, we assume that the distance between the non noisy data y and the noisy data y δ can be estimated with the Kullback-Leibler (KL) distance: (2) KL(y δ , y) ≤ δ 2 where δ is a positive constant. This assumption replaces the classical one, y − y δ 2 ≤ δ 2 used for Tikhonov regularization. In order to use the Kullback-Leibler (KL) distance, we assume that F : D → D is defined on the domain D = {v ∈ L 1 (Ω), where Ω is a bounded open set of R n and where d 1 and d 2 are two strictly positive constants, 0 < d 1 < d 2 < ∞. For the sake of simplicity, we consider the case where the domain and the codomain of the operator are the same, but different strictly positive upper and lower bounds could be considered for these spaces. In the classical inverse problem theory, it is assumed that the operator F is known exactly. We assume here that only an approximation F δ : D → D of the exact operator is known. An example of such problems is the spectral computerized tomography (SPCT) inverse problem when the detector response is unknown. We are interested to estimate an approximate solution of the inverse problem from the noisy data and also to study some methods to recover at the same time this solution and the unknown operator. In the following, we investigate variational approaches with regularization functionals based on the Kullback-Leibler distance as discrepancy term and regularization term. There is no detailed study of this type of inverse problems with Poisson noise on the data and an inexact operator. The joint additive Kullback-Leibler (KL) residual minimization and regularization for linear inverse problems has been investigated in [21] but the operator is assumed to be well determined. Our study generalizes this work and several proofs will be similar to the ones detailed in this work. In order to investigate linear problems with inexact operators, Regularized Total Least Squares methods have been proposed with cost functionals based on the Frobenius norm and the L 2 norm [12]. In the framework of the Regularized Total Least Squares, some estimates of (x, F, y) are determined by solving the constrained minimization problem: where B is some unbounded densely defined self-adjoint strictly positive definite operator used to restrict the set of admissible solutions and R a positive constant. Dual Regularized Total Least Squares are studied in [17], [26]. In these cases, the inverse problem is formulated as the following constrained minimization problem: where h is some bound on the noise on the operator. A double regularization approach is considered in [6] for inverse problems with bilinear operators of the unknown function x and of a kernel k. The kernel function determines the behaviour of the operator and a double penality term is used to stabilize the reconstruction of the unknown solution and of the charateristic function governing the operator. A convergence rate analysis of Tikhonov regularization for nonlinear inverse problems with noisy operators is detailed in [18] for uniform and non uniform noise bounds. In both cases, the discrepancy term is the square of the L 2 norm. In this work, we extend these studies and we investigate the properties of the regularization methods minimizing functionals based on the KL distance. We estimate error bounds for the solution of the inverse problem and convergence rates depending on noise bounds on the data and the operator. These convergence rates are obtained if source conditions are satisfied, as usual for linear and nonlinear inverse problems [11,22,8,24]. In order to determine also the direct operator, we consider the case where it depends linearly on a kernel. In this case, the noise of the operator is due to this unknown characteristic function. A stabilizing term is included in the regularization functional. Assuming that source conditions holds, we estimate the convergence rate for both the solution of the inverse problem and the operator. This article is organized as follows. In the second section, we present some preliminaries about the Kullback-Leibler functional and the operator noise and in the third section we present the mathematical framework with two different regularization functionals. In the fourth section, we analyze the well-posedness and convergence properties of the model, and we detail the convergence rates for the different cases investigated. Finally, in the last section, we present some simulations results for Spectral Computerized Tomography inverse problem illustrating the effect of the noise on the data or on the operator.
2. Notation and preliminary results on Kullback-Leibler divergence. The Kullback-Leibler divergence is the Bregman divergence associated with the Poisson noise distribution [4]. The Kullback-Leibler divergence between two functions u and v in its domain is given by: To avoid divergencies, it is possible to define a regularized distance: where is a small parameter [29,30]. In the following, we will consider restricted Kullback-Leibler distances. We consider functions in D for which the KL divergence leads to well-defined regularization methods [20]. The assumption that the regularized solutions belongs to the set D is quite natural. It has been shown in [21] that under the assumptions that the solution x is bounded and bounded away from zero almost everywhere and that the forward operator is a linear integral operator with a non negative kernel, then the regularized solutions obtained from the minimization of functionals based on the KL distance are also bounded and bounded away from zero. In this case, the set D is well-defined. From the definition of the Kullback-Leibler divergence, we have the following equality for any function u, v and w in the domain of Kullback-Leibler divergence: The following properties will be useful in the following [22,10]: (ii) For any fixed v, the function KL(., v) is lower semicontinuous with respect to the weak topology of L 1 (Ω). For any fixed v, the function KL(v, .) is lower semicontinuous with respect to the weak topology of L 1 (Ω). KL is also lower semicontinous with respect to the product topology of L 1 (Ω) × L 1 (Ω). (iii) Let A : L 1 (Ω) → L 1 (Ω) a compact, linear operator, with a positive range. For any C > 0 and any stricly positive u ∈ L 1 (Ω), the following sets are weakly compact in L 1 (Ω): {x ∈ L 1 (Ω) : KL(Ax, u) ≤ C}. Proposition 2.3. Let u, v ∈ D ⊂ L 1 (Ω), then there exists a positive constant C such that: Proof. The functions u and v are uniformly bounded away from zero and With ln(1 + t) ≤ t for all t > −1, we obtain: for a positive constant C.
3. Problem formulation. We assume that the distance between the noisy data and the non noisy data can be evaluated with the Kullback-Leibler distance with: Moreover, we assume we have a uniform operator noise and that there exists a positive constant δ F such that: We consider two different regularization functionals depending on whether we want to estimate the solution of the inverse problem from noisy data and from a noisy operator or at the same time this solution and the foward operator.
The first regularization functional considered J 1 is: where α is a regularization parameter and x * ∈ D an initial guess. In order to estimate at the same time the solution of the nonlinear inverse problem and the operator, we will investigate the case where the nonlinear operators F and F δ can be characterized by functions k and k δ ∈ L 2 (Ω). We assume also that they depend linearly on them [6,7] . We will consider the operatorF whereF is linear with respect to k and nonlinear with respect to x. In this case, the error on the operators in Eq.12 can be replaced by a bound on the error on the kernel k.
We will use the following regularization functional: where α and β are two regularization parameters. The kernel k * and the function x * are used as initial guesses. The two terms stabilize the inversion with respect to x and k. We will rewrite the regularization functional as: where η is a positive constant. In the following, we will denote Φ the functional including the regularization terms for x and k 3.1. Well-posedness. In this section, we analyze the properties of the regularization functionals J 1 and J 2 . We first show the well-posedness of the method and that the minimizers of the functionals exist for every parameters α and β.
Theorem 3.1. Assume that α > 0, and y δ ∈ D. Assume that the operator F δ is weak-to-norm continuous with respect to the weak topology of L 1 (Ω). Then there exists a global minimizer of the functional J 1 over D.
Proof. The proof is along the line of Theorem 3.1 in [14] , Theorem 4.2 [6]. We restrict the KL distance to functions in D [20]. The regularization functional is weak lower semicontinuous, positive (Proposition 2.2). Since x ∈ D, we can not consider that x 1 → ∞ and show that the functional is coercive. Yet, since Ω is bounded and with Eq.6, we can see that J 1 and KL(x, x * ) are bounded from above and below for x ∈ D. It is possible to find a minimizing sequence of J 1 and with the weak compactness of the level sets of KL intersected with D, it is also possible to extract a subsequence converging weakly in the L 1 (Ω) topology. The weak lower semicontinuity of J 1 concludes the proof.
Then there exists a global minimizer of the functional J 2 over L 2 (Ω) × D.
Proof. The functional J 2 is positive, proper and coercive since it follows with Proposition 2.1 that: is the domain of the functional J 2 . There exists a sequence (k j , x j ) such that J 2 (k j , x j ) → m. Thus the sequence (k j , x j ) is bounded. The kernels k j belongs to L 2 (Ω) which is reflexive Hilbert space and from Proposition 2.2 (iv), there exist subsequences also denoted as (k j , x j ) such that k j k and x j x in the L 2 (Ω) and L 1 (Ω) topologies respectively. With the weak lower semicontinuity of the functional J 2 with respect to the product topology, we obtain: Thus (k, x) is a global minimizer.
3.2. KL-minimal and Φ-minimal solutions. We will consider KL-minimal solutions of Eq.1, in the following sense: an element x ∈ D, is called a KL-minimizing solution of (1) if F (x) = y and KL(x, Proposition 3.3. Assume that there exists a solution of (1), and that the operator F is weak-to-norm continuous. Then there exists a KL-minimal solution.
Proof. There exists a sequence (x k ) of solutions of (1) in D, such that With Proposition 2.2, it is possible to extract a weakly convergent subsequence which is also denoted by (x k ), with a weak limit denoted by x. From the weak lower semicontinuity of KL (Proposition 2.2 (ii)), it follows that KL(x, x * ) ≤ c. Moreover, for all k, F (x k ) = y and F is weak-to-norm continuous, thus it follows that F (x) = y.
Similarly, we can define a Φ-minimal solution ofF (h, x) = y with Φ given in Eq.14. The L 2 norm and the KL distance are weakly lower semicontinuous with respect to the weak topologies of L 2 (Ω) and L 1 (Ω) respectively and therefore it follows: Proposition 3.4. Assume that there exists a solution (k, x) ofF (k, x) = y, and that the operatorF is weak-to-norm continuous. Then there exists a Φ-minimal solution.

4.
Convergence properties and convergence rates. In this section, we study the convergence properties and the convergence rates for linear and nonlinear inverse problems solved with the approximate solutions x δ α or (k δ α , x δ α ) obtained with the regularization functionals J 1 or J 2 . We show that the solutions converge to some solution of the inverse problems F (x) = y orF (k, x) = y as the noise levels tend to zero, provided the regularization parameter α is chosen appropriately. We will use the following proposition: Proposition 4.1. Let us assume that F is Fréchet differentiable in a ball B around x and that there is a positive constant L such that for all x in B: where . is the norm of the linear operators F (x) : Then we have for all x in D: This result is obtained with the mean value theorem.

4.1.
Convergence properties for the regularization functional J 1 and nonlinear inverse problems. We first show that under an appropriate regularization parameter choice rule, the minimizer x δ α of the functional J 1 converges to a KLminimal exact solution as the noise levels δ and δ F converge towards zero.
Proposition 4.2. Let (y δ j ) and (F δ j ) two sequences of noisy data and weak-to-norm Let x δj αj be the minimizer of J 1 obtained with the noisy data y δ j , the noisy operator F δ j , and the regularization parameter α j . There exists a convergent subsequence of (x δj αj ). The limit of every convergent subsequence of (x δj αj ) is a KL-minimal solution of F (x) = y.
Proof. The minimizing property of x δj αj guarantees that: With Eq.7, we get: With Propositions 2.2 and 2.3, there exist positive constants C, C 1 such that: where we have used Proposition 2.1 and KL(y δ j , y) ≤ δ 2 j . With Eq.22, 23 and 24 and with KL(y δ j , y) ≤ δ 2 j , we obtain that, as δ j → 0, there exists a positive constant C 2 such that: The sequence KL(x δj αj , x * ) is bounded and therefore there exists a weakly convergent subsequence also denoted by x δj αj converging weakly towards x. We will prove that x is a KL-minimal solution. The nokse on the operator is bounded and thus we have: F δ j is a weak-to-norm continuous operator and thus F δ . With the weak lower semicontinuity of KL, it follows: Therefore, we obtain: With Eq.21 and with the weak lower semicontinuity of KL, we obtain: and thus x is a KL-minimal solution and KL(x δj αj , x * ) → KL(x, x * ). With Proposition 2.1, it follows that x δj αj → x in the strong topology of L 1 (Ω).
The former result can be applied to a sequence (F δ j ) of bounded linear operators. We now detail convergence rates based on some source condition. Theorem 4.3. Let F and F δ two nonlinear operators satisfying the uniform noise bound of Eq.12 . We assume that F is Fréchet differentiable and that there exists a Lipschitz constant L such that: for all x 1 , x 2 ∈ D, where F (x) denotes the norm of the linear operator F (x). Moreover, we assume that the following source condition holds: . Let y and y δ such that KL(y δ , y) ≤ δ 2 , and x δ α the minimizer of the regularization functional J 1 . Then with the parameter choice α ∼ (δ + δ F ) 1/2 , we obtain the convergence rate Proof. The proof is similar to the one of Proposition 4.2. With Eq.7, we get: With Proposition 2.2 and 2.3, there exist positive constants C and C 1 such that: With KL(y δ , y) ≤ δ 2 , we obtain that, as δ → 0, there exists a positive constant C 2 such that: The Lipschitz continuity of the Fréchet derivative F and Proposition 4.1 imply that: The minimizers x δ α satisfies: where x is a KL-minimal solution of F (x) = y. With Eq.7, we have: Thus with Eq.37 it follows that: where ., . denotes the dual pairing of two functions. Moreover, we have and there is positive constant C 3 such that: We obtain thus for positive constants C 4 = 2 sup x∈D x 1 and C 5 : and we can reformulate Eq.46 as: for a positive constant C 6 . We obtain thus: With the choice α ∼ (δ + δ F ) 1/2 , we obtain that KL(x δ α , x) = O((δ + δ F ) 1/2 ). The former proposition gives a convergence rate for an a priori choice of the regularization parameter. Methods of choice of regularization parameters based on the Morozov principle have been studied in detail for the minimization of regularization functional based on least squares [1,11]. A Generalized Discrepancy Principle was proposed in [27] to take into account the discretization errors and the incompatibility of the data with the equation to be solved. Some rules for the choice of the regularization parameter for Poisson noise have been proposed in [23,3,5,15]. Generalzing these rules in the case where the operator is inexact will be the subject of future work.

4.2.
Convergence properties for the regularization functional J 2 and nonlinear operators depending linearly on a kernel. In this section, we investigate the regularization functional J 2 and we want to reconstruct also the forward operator. The convergence rates are thus given for k δ α → k, and x δ α → x. For the case investigated here, we will use the following source condition: where ∂Φ is the subdifferential of Φ defined in Eq.16 and R denotes the range of an operator. The former condition implies that there exists, (φ x , φ k ) in the subdifferential of Φ at (k, x) such that: (55) (φ k , φ x ) = (η(k − k * ), ln( x x * )) =F (k, x) * w, w ∈ L 1 (Ω) We assume thatF is Fréchet differentiable, with a Lipschitz continuous derivative. For (u, v) ∈ L 2 (Ω) × L 1 (Ω), the Fréchet derivative ofF at a point (k, x) is given by: For (u, v) ∈ L 2 (Ω) × L 1 (Ω), the remainder of the Taylor expansion can be estimated by: for a positive constant C. We first show the convergence of the regularization method.
Proposition 4.4. Let y δ j a sequence of data such that KL(y δ j , y) ≤ δ 2 j with δ j → 0. We assume that the operatorF is weak-to-norm continuous with respect to the topology of L 2 (Ω)×L 1 (Ω). Assume also that the regularization parameter α j satisfies α j → 0 and Let (x δj αj , k δj αj ) be the minimizer of J 2 obtained from the noisy data y δ j and regularization parameters α j . There exists a convergent subsequence of (x δj αj , k δj αj ). The limit of every convergent subsequence of (x δj αj , k δj αj ) is a Φ-minimal norm solution of F (k, x) = y.
Proof. The minimizing property of (k δj αj , x δj αj ) guarantees that: The sequence KL(x δj αj , x * ) is bounded and therefore there exists a weakly convergent subsequence x δj αj converging weakly towards x. Similarly, k δj αj − k * 2 is bounded and thus there is a subsequence k δj αj converging weakly towards k. We will prove that (k, x) is a Φ-minimal solution andF (k, x) = y. We havẽ and thusF (k With the lower semicontinuity of KL with respect to the product topology, it follows that: Therefore, we obtain: With Eq.59 and with the weak lower semicontinuity of KL, we obtain: The next result gives some convergence rates for the regularization method based on the functional J 2 . Theorem 4.5. Let y and y δ such that KL(y δ , y) ≤ δ 2 , (k, x) a Φ-minimal solution ofF (k, x) = y, (k δ α , x δ α ) the minimizer of the regularization functional J 2 . We assume the source condition Eq.54 holds for w ∞ small enough. Then KL(x δ α , x)+ η k δ α − k 2 2 ∼ δ 1/2 . Proof. With Eq. 7 , we obtain: KL(y,F (k δ α , x δ α )) + KL(y δ , y) − KL(y δ ,F (k δ α , x δ α )) = Ω (ln(y) − ln(F (k δ α , x δ α )))(y − y δ )dx and thus with Proposition 2.1, there exist some positive constants C, C 1 such that: Let U δ α = (k δ α , x δ α ), and U = (k, x), the Lipschitz continuity of the Fréchet derivativeF implies that: The minimizers (x δ α , k δ α ) satisfies: and with the source condition of Eq.54: With Eq.70 , we get: There exist positive constants C , C 1 and C 2 such that: We obtain thus: Eq.76 can be rewritten: where A 1 is a positive constant. Assuming that w ∞ < min(1/C 1 , η/C ), we have: With this inequality, we obtain: For the parameter choice α ∼ δ 1/2 , we obtain:

5.
A numerical experiment: Spectral computerized tomography. In this section, we present some numerical experiments showing the effect of the noise levels on the forward operator or on the data on the quality of the approximate solution for a nonlinear inverse problem obtained with joint KL data term and regularization. Multi-energy Computerized Tomography, or spectral CT, is an imaging modality where the information received depends on the energy of each photon hitting the detector. This information is provided by a photon counting detector which gives an image for each energy bin. Assuming that the object's attenuation can be obtained by linear combination of the attenuations of only a few materials, this energy-resolved information allows to reconstruct several volumes, each one representing a different material concentration map (e.g soft tissues and bone) [16,25,9]. The measured projections are corrupted by Poisson noise. The spectral CT problem sets an ill-posed inverse problem that has to be regularized to obtain stable results for material decomposition. In the first subsection, we present the forward model relating the projected mass of the material to the number of photons detected and the regularization functional. In the second section, we detail the numerical results.

Forward model for spectral CT.
For the sake of simplicity, we consider a 2D object to reconstruct. The forward model presented here can be easily extended to 3D. We denote Ω the space that contains the 2-dimensional (2-D) object studied and x the voxel position. This object is imaged on a 1-D detector for several projection angles. The number of photons transmited at energy E for each pixel u of the detector and for each angle θ is denoted n (E, u, θ). The energy range is denoted as B and the set of the pixels values on the detector as Σ. With the Beer-Lambert law, it follows : where n 0 (E) is the source spectrum, L(u, θ) is the line defined by the X-ray beam and µ(E, x) is the linear attenuation coefficient at energy E for the voxel x. The number of photons detected by the detector for energy E, pixel u and angle θ can be written as : is the detector response function and thus: It describes the probability that a photon with the energy E is detected at the energy E. We assume that we can write the attenuation coefficient as a sum of M basis functions for each material that are separable in energy and space: where ρ m (x) is the concentration of the material m at the voxel x and τ m (E) is a well-defined function describing the attenuation effects in the material m at energy E. The total number of materials is denoted as M . It is possible to reformulate the direct problem with Radon projection operator which maps a function f ∈ L 1 (Ω) to its line integrals. Let L(θ, u) be the line defined by L(θ, u) = {τθ * + uθ : τ ∈ R}, withθ = (cos(θ), sin(θ)) andθ * = (−sin(θ), cos(θ)), the Radon transform for f ∈ L 1 (Ω) is given by: and it follows: The nonlinear forward problem can be formulated as s = F(ρ) with a nonlinear operator F : . It can also be formulated as s = F(d, ρ) with a nonlinear operator F : . The framework presented above can thus be applied. The following proposition shows that the operator F satisfies the continuity assumption of the former sections. Proof. Let us assume for the sake of simplicity, that M = 1. Let us consider a sequence (ρ n ) in L 1 (Ω) converging weakly towards ρ in the weak topology of L 1 (Ω), then, for all φ ∈ L ∞ (Ω): For u ∈ B and θ ∈ [0, π], taking φ as the indicator function of the line L(θ, u), it follows that: The convergence properties presented in the former sections are obtained with an operator mapping a space D ⊂ L 1 (Ω) with itself. They can be easily generalized to a mapping between different spaces of strictly positive functions included in L 1 (Ω). Moreover, they may be extended to a product of spaces included in (L 1 (Ω)) M by considering the regularization term 1≤m≤M KL(ρ m , ρ * m ). After discretization, the inverse problem considered is a finite dimensional one. An integrated circuit counts the number of photons that are detected in the i th energy bin The number of photons detected in the i th , for the angle θ is : The function d i (E) ∈ L 1 (B, [0, 1]) is the response function of the i th energy bin of the detector. For the numerical simulation, we consider that our detector has P pixels, I energy bins, and the projections are measured for S angles. The dectector response function d is characterized by a set (d i ) 1≤i≤I of response functions.The projections data are thus the vector s ∈ R IP S = (s ijl ) 1≤i≤I,1≤j≤P,1≤l≤S . The domain to reconstruct is also discretized with Q pixels. We want to recover the set of the masses of the materials ρ ∈ R M Q = (ρ ij ) 1≤i≤M,1≤j≤Q . The discretization levels are thus n = M P and m = IP S. We consider a cost function similar to the ones in Eq.13 and Eq.14 in order to recover the maps of the materials: and

Minimization algorithms.
In this subsection, we present the minimization algorithms used for the functionals J 1 and J 2 . Regularization functional J 1 For the minimization of the regularization functional J 1 , we have to optimize the concentrations map ρ. The objective functional is non convex with respect to the materials concentrations. During the minimization, the simulation may be trapped in local minima. Recently, several methods have been proposed which reonstruct material-specific volumes directly from the photon counts. They are commonly referred to as one-step inversion methods. All of these methods are iterative: there is currently no analytical inversion formula for the material decomposition problem. They asssume a well-known detector response function [16,2,28,19]. In this work, we have used the Mechlem approach to minimize the regularization functional. The principle of the method is to minimize the functional based on the KL divergence which corresponds to the Poisson negative log-likelihood of the data by Separable Quadratic Surrogates (SQS). The surrogates are derived sequentially. The minimization also integrates Ordered Subsets (OS) and Nesterov acceleration. The minimization method is described in detail in [19]. Regularization functional J 2 In [7], an efficient and convergent alternating mimization scheme for the minimization of doubly regularized Total Least Square was presented. In each step, the regularization functional is minimized over one variable while keeping the second one fixed. In the case investigated in [7], for each subproblem, the functional is convex and a global minimum is obtained. The sequence obtained is converging towards a critical point of the regularization functional. In this work, we have not the same convergence results since the functionals considered are not convex but we use the same optimization methodology. In the case of the functional J 2 , numerical experiments have been performed to reconstruct both the function and the kernel. For the minimization of the regularization functional J 2 , we alternate the minimization with respect to ρ and with respect to the detector response. For the minimization with respect to ρ, we use the former Mechlem algorithm. For the minimzation with respect to the detector response, we use a gradient descent algorithm.
The operator F i is linear with respect to the detector response and thus the Fréchet derivative of F i (Eq.100) for d i ∈ L 2 (B, [0, 1]) with respect to the detector response is: The adjoint of F i is thus the operator F * i : The gradient of the KL data term with respect to the detector function d i is thus Simulations details. We present here simulations to illustrate the former convergence results. We can not show that the solution satisfies the source condition, but we use different noise levels for the projection data s and for the forward operator and show the effect on the quality of the reconstruction. We assume that the detector response function is approximately known. The approximation of the detector response function d will be denoted d δ . Our aim is first to reconstruct ρ from the noisy data s δ and from the noisy kernel d δ using the regularization functional J 1 . Then, we give approximate solutions for the concentrations and the kernel using the regularization functional J 2 .
We have designed a simple two-dimensional 3-materials phantom, consisting of a large square of water at 1 g/ml, a small square of iodine at 10 g/ml, and a small square of gadolinium at 10 g/ml. The iodine and gadolinium squares are inside the water square, but do not overlap. The phantom has 2562 voxels. This phantom is displayed in Figure 1. Through this phantom, 725 parallel projections were simulated, with 362 rays per projection, using the AIR toolbox [13] to generate the sparse forward projection matrix. The line integrals obtained were then converted to photon counts, following the classical polychromatic Beer-Lambert attenuation law. We consider in this work 5 energy bins. The detector response functions can thus be decomposed into 5 energy response functions d i . The detector response was simulated according to the model presented in appendix A.2 of [25]. The detector responses for the five energy bins are displayed in Figure 2. In the end, the photon counts were corrupted with Poisson noise. This model therefore neglects pile-up, scatter, charge sharing and probably many other complex effects.
In the simulations, the noise on the data s is characterized by the value s δ − is 0.05 or 0.1. The noise on the operator is evaluated by an approximate value of sup ρ∈D F(ρ, d)−F(ρ, d δ ) 1 / F(ρ, d) 1 = δ F . This quantity is estimated by sampling the concentration maps ρ around the initial concentration ρ * . The initial guess concentration ρ * used in the simulations consists of squares of gadolinium and iodine centered at the positions corresponding of the ground truth. The size of the borders is three times the ones of the ground truth squares. In order to sample the set D and evaluate δ F , we have tested considered several phantoms. The support of the materials are varied between the empty set and the support of the first guess concentration ρ * enlarged by a factor 1.5. The concentrations are chosen between 0.5 and 1.5 times the true gadolinium, water and iodine concentrations. The initial guess detector response d * is obtained from the true dectector response function d with a convolution with a Gaussian kernel. It is such that sup ρ∈D F(ρ, d) − F(ρ, d * ) 1 / F(ρ, d) 1 = 0.1.
The iodine and gadolinium concentrations are not bounded away from 0 everywhere. In order to avoid divergencies, we have just taken into account the pixels in the images with values different from 0 for these materials. The lower box constraint is thus ensured even if the lower bound value d 1 is not precisely determined for these materials. Similar results are obtained with Eq.6 and a small constant .
The regularization parameter α has been chosen in order to obtain to largest decrease of the regularization functional, with an extensive sweeping of the parameter values, multiplying or dividing it by powers of 2, and choosing the largest possible regularization that did not cause significant amount of cross-talk. The parameter η is chosen such that the two regularization terms in Φ have the same order of magnitude at the end of the optimization. Several values have been tested. The best simulation results have been obtained with η = 10. In the reconstruction of the simulated data, we have compared the concentrations of the materials with the ground truth concentrations. In order to evaluate the quality of the reconstruction for each material m, the l 1 relative error is calculated with ρ m −ρ m 1 / ρ m 1 = P p=1 |ρ mp −ρ mp |/ P p=1 |ρ mp |, where ρ m be the ground truth concentration map and ρ m the estimated concentration map 5.3. Results and discussion. We have tested different noise levels for the data and the detector response. We present successively the results for the functional J 1 and the functional J 2 . Functional J 1 The decrease with the iterations of the KL data term measuring the mismatch between the measured photons counts and those simulated through the reconstructed volume are displayed in Fig.3 for the different noise levels investigated.     for the KL data term and for the reconstruction errors for the three materials. As expected, the quality of the reconstruction degrades when the noise level on the forward operator increases. Some cross-talk artifacts are also visible when the noise level increases. Functional J 2 For the functional J 2 , we started the simulation with a noisy detector response d * . A gradient descent step for the detector response is performed every 5 iterations. The decrease of the relative errors for the iodine, water and gadolinium  concentrations obtained with the alternate minimization algorithm are displayed in Figure 9, 10 and 11. The final concentrations maps are displayed in Figure 12. Large decreases of the errors for the three materials are observed with the minimization of the functional J 2 . The increase of the noise level on the forward operator degrades the accuracy of the reconstruction. The comparison of the thin and dashed lines on these figures shows the improvement achieved with the alternate minimization and the optimization of the detector response function. With these results, it can be seen that better reconstruction results are obtained with a few gradient steps for the detector response function. Yet, the solution seems to be trapped in a local minima after a few iterations. 6. Conclusion. In this work, we have studied inverse problems with noisy data and unknown operator. We have considered regularization functionals based on the Kullback-Leibler divergence as data term and regularization term. We have first considered the case where we want only to recover the solution of linear or nonlinear inverse problems. Then we consider a more general functional when the aim is to estimate also the forward operator. We have studied the convergence properties of the regularization methods based on these functionals. We have derived convergence rates based on source conditions and restrictions on the linearity of the direct