MOROZOV PRINCIPLE FOR KULLBACK-LEIBLER RESIDUAL TERM AND POISSON NOISE

Abstract. We study the properties of a regularization method for inverse problems corrupted by Poisson noise with the Kullback-Leibler divergence as data term. The regularization parameter is chosen according to a Morozov type principle. We show that this method of choosing the parameter is well-defined and that this a posteriori choice leads to a convergent regularization method. Convergence rates are obtained for this a posteriori choice of the regularization parameter when a source condition is satisfied.

1. Introduction. We consider in this work the linear inverse problem Y = AX, where A : L^1(Ω) → L^1(Ω) is a linear compact operator, continuous with respect to the weak topology of L^1(Ω), and Ω is a bounded open subset. The ideal data Y can be interpreted as a photon density with intensity y and exposure time t, such that Y = yt. The observed data G_t is a Poisson process with mean measure yt, and it can be written G_t = Σ_{i=1}^{N_t} δ_{x_i}, where the x_i denote the positions of the N_t detected photons.
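As an illustration, the observation model G_t = Σ_{i=1}^{N_t} δ_{x_i} can be simulated on a toy 1-D domain. The intensity y below and the rejection-sampling scheme are our own illustrative choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_poisson_process(y, t, domain=(0.0, 1.0)):
    """Draw a realization of the Poisson process G_t with mean measure y*t:
    N_t is Poisson with mean t*integral(y), and the photon positions x_i are
    i.i.d. with density proportional to y (rejection sampling)."""
    a, b = domain
    grid = np.linspace(a, b, 1001)
    mass = y(grid).mean() * (b - a)      # crude quadrature of the integral of y
    n_t = rng.poisson(t * mass)          # number of detected photons N_t
    y_max = y(grid).max()
    points = []
    while len(points) < n_t:             # rejection sampling of the positions x_i
        x = rng.uniform(a, b)
        if rng.uniform(0.0, y_max) <= y(x):
            points.append(x)
    return np.array(points)

y = lambda x: 1.0 + 0.5 * np.sin(2 * np.pi * np.asarray(x))  # toy photon density
photons = sample_poisson_process(y, t=200.0)                 # one draw of G_t
```

The number of points grows linearly with the exposure time t, which is the regime studied in Section 3.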
Poisson inverse problems appear in many astronomical and medical imaging problems, like positron emission tomography or multi-spectral computerized tomography [17,43,33,30]. This problem is ill-posed and must be regularized. A regularization functional is usually based on a weighted combination of a data residual term and of a regularization term [21]. When the residual term is quadratic, a classical choice for the regularization term is some Sobolev space norm, as in Tikhonov regularization [44]. For Poisson noise, quadratic data misfit functionals have to be replaced by the Kullback-Leibler (KL) functional [29]. In a variational framework, the unknown is obtained by the minimization of a functional which is the sum of a KL data term and of a regularization term which includes a priori knowledge on the solution X. A regularization parameter balances the two terms. The Kullback-Leibler divergence is a well-known example of Bregman distance. The Bregman distance is now a widely used tool to study optimization and signal processing algorithms [13]. It is very successful in establishing convergence rates [36,14]. It can also be used to build new data fidelity functionals. The solution obtained can be understood as a maximum a posteriori estimator in a Bayesian framework with exponential families. Inverse problems with the Kullback-Leibler divergence as data term have been much studied in the context of Poisson noise [6,7,2,34,35,12,10,20,28]. In a deterministic context, this regularization has been investigated by Pöschl et al. [34]. Under a statistical noise hypothesis, the problem has been investigated by Werner et al. [46,47]. Some convergence results have been obtained for the mean value of the Bregman divergence associated with the regularization term, measuring the distance between the reconstructed solution and the true solution.
As usual for inverse problems, the regularization parameter has to be chosen with care to ensure the convergence of the regularization method. In [46,47], an a priori choice of the regularization parameter is investigated, which requires knowledge of the smoothness of the solution. Some convergence results for a parameter chosen with the Lepskii balancing principle are also presented. Other methods have been proposed for Poisson noise: we can mention the Bardsley and Goldes approach [6], the Bertero method [9,48] and the Santos and De Pierro method [38]. The Bertero method is based on the Morozov discrepancy principle and on a finite dimensional approximation of the inverse problem [9], but it is not clear whether this a posteriori method of choice of the regularization parameter leads to a convergent regularization scheme. Recently, some regularization methods where the regularization parameter and the discretization level are chosen at the same time have been investigated [1]. The aim of this work is to analyze the convergence properties of the Bertero method, taking the discretization level into account.
This paper is organized as follows. In the second section, we summarize basic results about Poisson random variables, exponential families, the Bregman distances and the Kullback-Leibler divergence that will be used in the paper. We also present the Bertero approach. In section 3, we show that the Bertero-Morozov type principle is well-defined and that the regularization method for the choice of the regularization parameter is convergent. Some convergence rates are derived in Section 4 under suitable source conditions. Some numerical experiments are presented in Section 5. A conclusion ends this article.
2. Inverse problems with the Kullback-Leibler divergence data term. We summarize in this section well-known results about the Kullback-Leibler divergence that will be used in the following.
Definition 2.1. A regular exponential family is a family of probability density functions of the form p_{θ,b}(x) = p_0(x) exp(θx − b(θ)), where p_0 is a well-defined probability density function. The parameter θ is called the natural parameter of the exponential family and the function b the cumulant function, with domain Dom(b).
Definition 2.2. Let g : R → R be a strictly convex and differentiable function. The Bregman distance associated with g is defined by B_g(u, v) = g(u) − g(v) − g′(v)(u − v).
The probability density function of an exponential family can be expressed with the Bregman distance associated with the Fenchel dual of b. This is expressed by the following theorem [34,8,5]:
Theorem 2.3. Let p_{θ,b} be the probability density function of a regular exponential family distribution and B_{b*} the Bregman distance derived from b*, the Fenchel dual of b. Then p_{θ,b} is uniquely determined as p_{θ,b}(x) = F_{b*}(x) exp(−B_{b*}(x, μ)), where the function F_{b*} : R → R_+ is uniquely determined and where μ is the expectation of x with respect to p_{θ,b}.
For the Poisson distribution, the dual of the cumulant is the function denoted g in the following and defined by g(x) = b*(x) = x ln(x) − x. The Kullback-Leibler divergence between two functions u and v in its domain is given by:
KL(u, v) = ∫_Ω u ln(u/v) − u + v dx.
To avoid divergences, it is possible to define a regularized distance KL_ε(u, v) = KL(u + ε, v + ε), where ε is a small parameter [46,47]. From the definition of the Kullback-Leibler divergence, we have the following equality for any functions u, v and w in the domain of the Kullback-Leibler divergence:
(6) KL(u, w) = KL(u, v) + KL(v, w) + ∫_Ω (g′(v) − g′(w))(u − v) dx.
The following properties will be useful in the following [36,20].
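A direct discrete implementation of the divergence and a regularized variant may be sketched as follows; the grid discretization and the exact form KL(u + ε, v + ε) of the regularization are illustrative choices (the regularization used in [46,47] may differ in detail):

```python
import numpy as np

def kl(u, v, dx=1.0):
    """KL(u, v) = integral of u*ln(u/v) - u + v over the grid, for nonnegative
    functions sampled on a uniform grid of cell size dx (0*ln(0) := 0)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        integrand = np.where(u > 0, u * np.log(u / v), 0.0) - u + v
    return float(integrand.sum() * dx)

def kl_eps(u, v, eps, dx=1.0):
    """One way to regularize the divergence, finite even when u or v vanish."""
    return kl(u + eps, v + eps, dx)
```

Since g(x) = x ln(x) − x is strictly convex, kl(u, v) is nonnegative and vanishes only when u = v almost everywhere.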
Proposition 2.5. (i) For any non-negative u, v ∈ L^1(Ω), ‖u − v‖_1^2 ≤ ((2/3)‖u‖_1 + (4/3)‖v‖_1) KL(u, v). (ii) For any fixed v, the function KL(·, v) is lower semi-continuous with respect to the weak topology of L^1(Ω). (iii) For any fixed v, the function KL(v, ·) is lower semi-continuous with respect to the weak topology of L^1(Ω). (iv) Let A : L^1(Ω) → L^1(Ω) be a compact, linear operator. For any C > 0 and any non-negative u ∈ L^1(Ω), the sets {x ∈ L^1(Ω) : x ≥ 0, KL(u, Ax) ≤ C} are weakly compact in L^1(Ω).
2.2. Regularization functional with the Kullback-Leibler divergence data term and Morozov principle. The use of the Kullback-Leibler divergence as data similarity term can be understood in the Maximum a Posteriori (MAP) estimation framework. Let Y_t be a noisy perturbation of some unknown data Y = AX, p(Y_t|X) the conditional probability of Y_t given X, and p(X) and p(Y_t) the a priori distributions of X and Y_t. The continuous MAP estimation problem consists of maximizing the functional T(X) = p(Y_t|X) p(X) / p(Y_t) with respect to X. Taking the negative logarithm of T, the problem corresponds to the minimization of the logarithmic MAP estimator. In a deterministic context, some methods for an a priori choice of the regularization parameter have been proposed in [34]. Under a statistical noise hypothesis, the problem of the choice of this parameter has been studied by Werner et al. [46,47]. On the other hand, a Morozov type discrepancy principle has been proposed for Poisson inverse problems. The Bertero discrepancy principle is closely related to the Morozov discrepancy principle [21,11,3,4].
In its basic formulation for Gaussian noise, the Morozov principle consists in choosing the value of the parameter α such that, for a fixed τ > 1, the solution x_α^δ for noise level δ satisfies:
(9) ‖Ax_α^δ − y^δ‖ = τδ
where y^δ is the noisy data. The noise-free data y are assumed to be corrupted by noise in such a way that:
(10) ‖y − y^δ‖_{L^2(Ω)} ≤ δ
The Bertero principle generalizes this method of choice of the regularization parameter to Poisson noise. It is based on a finite dimensional approximation of the inverse problem and on the following observation [9,48]: if g is an m-dimensional Poisson variable with mean ḡ, where m may be the number of pixels of a detector, then the expectation of KL(g, ḡ) is close to m/2 for large counts. With the Bertero discrepancy principle, the regularization parameter α is chosen such that the finite dimensional minimizer x^α_{t,m} of the regularization functional satisfies KL(g_t, A_m x^α_{t,m}) ≈ m/2, where A_m is a finite dimensional approximation of the direct operator A. This principle works well in practice and some numerical examples are detailed in [9,48]. Yet, to our knowledge, this approach has not been justified and the properties of the regularization method with this choice of the regularization parameter have not been investigated. They are studied in the next section.
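For the Gaussian formulation of Eqs. (9)-(10), the principle can be sketched numerically. The smoothing operator, noise level, and τ below are illustrative choices; we exploit the fact that the Tikhonov residual is increasing in α to locate the discrepancy level by bisection:

```python
import numpy as np

# Morozov principle in the Gaussian case (Eq. 9): choose alpha by bisection so
# that the Tikhonov solution x_a of min ||Ax - yd||^2 + a||x||^2 satisfies
# ||A x_a - yd|| ~ tau * delta. Operator and noise level are toy choices.
rng = np.random.default_rng(1)
n = 50
i = np.arange(n)
A = np.exp(-0.5 * ((i[:, None] - i[None, :]) / 2.0) ** 2)
A /= A.sum(axis=1, keepdims=True)              # ill-conditioned smoothing operator
x_true = np.zeros(n)
x_true[20:30] = 1.0
y_exact = A @ x_true
delta = 0.01
noise = rng.standard_normal(n)
y_noisy = y_exact + delta * noise / np.linalg.norm(noise)   # ||y - yd|| = delta

def residual(alpha):
    """Discrepancy ||A x_alpha - yd|| of the Tikhonov solution, increasing in alpha."""
    x = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_noisy)
    return np.linalg.norm(A @ x - y_noisy)

tau = 2.0
lo, hi = 1e-10, 1e2                            # bracket of the crossing point
for _ in range(200):                           # geometric bisection on alpha
    mid = np.sqrt(lo * hi)
    if residual(mid) < tau * delta:
        lo = mid
    else:
        hi = mid
alpha_morozov = np.sqrt(lo * hi)
```

The same monotonicity-and-continuity argument is what Section 3 establishes for the Kullback-Leibler discrepancy in the Poisson case.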
3. The Morozov discrepancy principle for inverse problems with Poisson noise. In this section, we show that the use of the Morozov principle with the Kullback-Leibler functional is well-defined and leads to a convergent regularization technique. The proof has many points in common with the proofs in the framework of Banach spaces [3,4].
3.1. Poisson processes. The observed data G_t is a Poisson process, and it can be written G_t = Σ_{i=1}^{N_t} δ_{x_i}, where the x_i denote the positions of the N_t photons detected at time t. It can be seen as a random measure belonging to the space of integer-valued Borel measures on Ω [26,37]. For any measurable subset A, G_t(A) is an integer-valued Poisson random variable, with mean value denoted ν_t(A). We assume that the mean measure ν_t is absolutely continuous with respect to the Lebesgue measure and proportional to the exposure time t; we have thus ν_t = yt and ν_t(A) = t ∫_A y(x) dx, where y is the scaled photon density. For a measurable function φ : Ω → R, the integral of φ with respect to the Poisson measure is given by ∫_Ω φ dG_t = Σ_{i=1}^{N_t} φ(x_i). We assume that the Poisson measure G_t can be written G_t = t G̃_t, where the random measure G̃_t converges weakly almost surely as t → ∞ to a measure with density y.
In the following, in order to use the Kullback-Leibler distance, we will define some random intensity y_t ∈ L^1(Ω) which is a good approximation of the mean intensity y as t → ∞. The random measure G̃_t can be considered as a generalized stochastic process. Generalized random processes are the probabilistic version of the distributions of L. Schwartz [41,27,23]. We recall that D(Ω) is the space of infinitely differentiable functions with compact support in Ω and D′(Ω) the space of linear continuous functionals on D(Ω), with the corresponding weak convergence.
The random distribution G̃_t ∈ D′(Ω) can be regularized and approached as a limit of random regular functions obtained by truncation and regularization. We set n = int(t), where int is the integer part of t, and we define for n ≥ 1 the compact set K_n = {x ∈ Ω : d(x, ∂Ω) ≥ 1/n}. For all n ≥ 1, there exists a function χ_n ∈ C_c^∞(Ω) such that:
(15) 0 ≤ χ_n ≤ 1, χ_n|_{K_n} = 1, supp(χ_n) ⊂ K_n + B(0, 1/2n)
On the other hand, there is a function ξ ∈ C_c^∞(R^2) such that ξ ≥ 0, supp(ξ) ⊂ B(0, 1) and ∫ ξ = 1. We consider the function ξ_n(x) = (4n)^2 ξ(4nx) and y_t = T_{n(t)} = (χ_{n(t)} G̃_t) * ξ_{n(t)}. It can be shown that supp(T_n) ⊂ K_{4n} and y_t ∈ C_c^∞(Ω) almost surely, where C_c^∞(Ω) is the set of C^∞(Ω) functions with compact support in Ω. The function y_t defines a positive distribution and a positive measure, and ∫ φ y_t dx → ∫ φ y dx for any φ ∈ D(Ω) as t → ∞ almost surely [41,23,25].
Lemma 3.1 (Portmanteau type). Let {μ_n} be random measures with μ_n → μ in D′(Ω) almost surely. Then almost surely: (1) lim sup_n μ_n(F) ≤ μ(F) for every closed set F; (2) lim inf_n μ_n(O) ≥ μ(O) for every open set O; and lim_n μ_n(A) = μ(A) for every stochastic continuity set A of μ.
Proof. Let us assume μ_n → μ in D′(Ω) almost surely. Let F be a closed set and {φ_k} ⊂ D(Ω) a decreasing sequence of functions with 0 ≤ φ_k ≤ 1 and φ_k ↓ 1_F [41,23,25]; we have:
(18) lim sup_n μ_n(F) ≤ lim sup_n ∫ φ_k dμ_n = ∫ φ_k dμ
for all k ≥ 0 almost surely. As k → ∞, we obtain (1). By taking complements we obtain (2). Let A be a stochastic continuity set of the random measure μ, Int(A) its interior and Ā its closure; we have almost surely:
μ(Int(A)) ≤ lim inf_n μ_n(A) ≤ lim sup_n μ_n(A) ≤ μ(Ā).
Since μ(∂A) = μ(Ā) − μ(Int(A)) = 0, these inequalities show that lim μ_n(A) = μ(A) almost surely for all stochastic continuity sets A.
Assuming that Ω is a continuity set for the mean measure of G̃_t, we obtain G̃_t(Ω) → ∫_Ω y dx almost surely. We have thus constructed an approximation y_t of y which belongs to the space C_c^∞(Ω) ⊂ L^1(Ω) almost surely. The average of y_t is given by E[y_t] = (χ_{n(t)} y) * ξ_{n(t)}. As t → ∞, E[y_t] → y almost everywhere.
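The construction of y_t by mollification of the empirical measure can be sketched in 1-D. The bump function, bandwidth h, and uniform true density below are illustrative; the truncation χ_n is omitted since the toy domain is already compact:

```python
import numpy as np

rng = np.random.default_rng(2)

def smoothed_intensity(photons, t, grid, h):
    """Mollified approximation y_t of the photon density y: the empirical
    measure (1/t) * sum of Dirac masses at the photon positions is convolved
    with a compactly supported C-infinity bump of width h."""
    def bump(r):                              # xi(r) ~ exp(-1/(1-r^2)) on (-1, 1)
        out = np.zeros_like(r)
        inside = np.abs(r) < 1.0
        out[inside] = np.exp(-1.0 / (1.0 - r[inside] ** 2))
        return out
    r = np.linspace(-1.0, 1.0, 2001)
    c = bump(r).sum() * (r[1] - r[0])         # normalization so the kernel integrates to 1
    kernel = bump((grid[:, None] - photons[None, :]) / h) / (c * h)
    return kernel.sum(axis=1) / t             # y_t = (1/t) sum_i xi_h(x - x_i)

# photons drawn uniformly on [0, 1]: the true density is y = 1
t = 5000.0
photons = rng.uniform(0.0, 1.0, rng.poisson(t))
grid = np.linspace(0.0, 1.0, 200)
y_t = smoothed_intensity(photons, t, grid, h=0.05)
```

Away from the boundary, y_t fluctuates around the true density with an error that shrinks as t grows, which is the content of Proposition 3.8.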

3.2. Regularization functionals, discretization operators. We assume that the unique minimum norm solution x* of the inverse problem without noise belongs to a bounded, closed and convex subset B of L^2(Ω) ⊂ L^1(Ω). We will consider the reduced regularization functional defined as:
J_{α,y_t}(x) = KL(y_t, Ax) + αR(x),
where x_0 ∈ L^1(Ω) enters the definition of the regularization term R and α is a regularization parameter. We assume that the operator A : B → L^1(Ω) is linear and continuous. We consider also a sequence of finite dimensional subspaces {Y_n}_{n≥0} whose union is dense in L^1(Ω). We assume we have continuous approximation operators Π_n : B → B ∩ Y_n with ‖Π_n x − x‖_1 ≤ ρ_n for all x ∈ B, with ρ_n → 0 as n → ∞. For the sake of simplicity, we will assume in the following that the operator Π_n is linear. For some disjoint regions (A_i)_{1≤i≤m} ⊂ Ω, we consider the discrete approximation Π_m G_t of the Poisson process G_t defined by:
Π_m G_t = Σ_{i=1}^m (G_t(A_i)/|A_i|) 1_{A_i},
where 1_A denotes the indicator function of the set A and |A_i| is the Lebesgue measure of the set A_i. The domain Ω can be understood as a countable collection of boxes and G_t^i(ω) = G_t(A_i)(ω) as the number of particles in the box A_i. For each i, G_t^i(ω) is a Poisson random variable. Similarly, we define the discrete approximation Π_m G̃_t of G̃_t, and we have Π_m G̃_t = (1/t) Π_m G_t. We will also use the notation g_{t,m} = (g_t^i)_{1≤i≤m} for the Poisson variables (G_t(A_i))_{1≤i≤m} and (ḡ^i)_{1≤i≤m} for their expectation values. We can thus assume there is a constant ρ_m, depending on the measure of Ω \ ∪_{1≤i≤m} A_i, with ρ_m → 0 as m → ∞, controlling the discretization error of the Poisson process almost surely. The same discretization operator can be defined for f ∈ L^1(Ω):
Π_m f = Σ_{i=1}^m ((1/|A_i|) ∫_{A_i} f dx) 1_{A_i}.
This projection operator is linear and bounded. Given the smooth approximation y_t of G̃_t, one can consider Π_m y_t and the discrete values y_t^i = (1/|A_i|) ∫_{A_i} y_t dx. For given discretization levels m and n and x ∈ B ∩ Y_n, the discrete regularization functional is given by:
J_{α,y_t,m,n}(x) = KL(Π_m y_t, Π_m A x) + αR(x);
the function x ∈ B ∩ Y_n can be written as x = Π_n x′, with x′ ∈ B.
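The discretization operator Π_m can be sketched as a bin-averaging projection on a 1-D grid; the test function and bin counts below are illustrative:

```python
import numpy as np

def project_pm(f, grid, m):
    """Discretization operator: average f over m equal bins A_i and return
    the piecewise-constant function sum_i ((1/|A_i|) int_{A_i} f) 1_{A_i}."""
    edges = np.linspace(grid[0], grid[-1], m + 1)
    idx = np.clip(np.searchsorted(edges, grid, side="right") - 1, 0, m - 1)
    out = np.empty_like(f)
    for i in range(m):
        mask = idx == i
        out[mask] = f[mask].mean()     # bin average approximates the cell mean of f
    return out

grid = np.linspace(0.0, 1.0, 1000)
f = np.sin(2 * np.pi * grid) + 1.0
# the L1 discretization error rho_m decreases as the number of bins m grows
errors = [np.abs(project_pm(f, grid, m) - f).mean() for m in (4, 16, 64)]
```

The operator is linear, consistent with the boundedness and linearity claimed above, and the decreasing errors illustrate ρ_m → 0 as m → ∞.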
We will denote x^α_t the minimizer of the regularization functional J_{α,y_t} and Π_n x^α_{t,m,n} that of J_{α,y_t,m,n}. The proofs of the existence of these minimizers are detailed in [36,34]; they are based on the weak sequential compactness in L^1(Ω) of the sublevel sets of the regularization functionals. We will assume that y and y_t are bounded below and above by two strictly positive constants y_min and y_max. This assumption is reasonable for y_t since it is a smoothed version of the Poisson process; it implies that enough data have been collected with a long enough sampling time. Similar assumptions are made in [36,34] to ensure that the regularization method with the Kullback-Leibler divergence as discrepancy term is well-defined. To avoid divergences, it is also possible to use the regularized Kullback-Leibler distance defined in Section 2.1 for the data term. In the following, we denote R_min the minimum of R and X_min the set of its minimizers. For the regularization functional considered, R_min = 0 and X_min is the set of constant functions equal to x_0 almost everywhere. We denote D_{x*}R the Bregman distance associated with R evaluated at x*.
With this setting, the Kullback-Leibler distance KL(y_t, Ax^α_t) between the function y_t and the image of the reconstructed solution x^α_t is a random variable. In the next subsections, we show that this discrepancy term and its finite dimensional approximation satisfy some bounds almost surely as t → ∞, so that the Bertero principle is well-defined. In Section 4, we derive convergence rates in expectation.

3.3. Morozov principle for Poisson noise. Considering this reduced variable y_t, we will study the following discrepancy principle, directly derived from the Bertero principle, to choose the regularization parameter α. It is defined starting from the discrete version of the inverse problem. For given constants τ_3 > τ_2 > 1 and a given exposure time t, we will choose the discretization levels m(t) and n(t) and the regularization parameter α(t) such that x^α_{t,m,n} satisfies:
(31) τ_2 m(t)/(2t) ≤ KL(Π_m y_t, Π_m A Π_n x^α_{t,m,n}) ≤ τ_3 m(t)/(2t)
with m(t)/t → 0 as t → ∞. The basic facts necessary for the proof that this principle is well-defined are the following:
- we show that the KL discrepancy term is a continuous function of the regularization parameter α (see Appendix);
- we bound KL(Π_m y_t, Π_m A Π_n x*) with high probability, taking into account the Poisson statistics of Π_m G̃_t and the discretization errors.
We consider the set L of R-minimizing solutions for the data without noise y: L = {x ∈ B : Ax = y and R(x) ≤ R(z) for all z ∈ B with Az = y}. For α > 0, we define the set M_{α,y_t} of functions minimizing the functional J_{α,y_t}, and also the set of functions for which the functional J_{α,y_t} is bounded by a constant c: M_{α,y_t}(c) = {z ∈ B : J_{α,y_t}(z) ≤ c}. These sublevel sets M_{α,y_t}(c) are weakly sequentially compact in L^1(Ω) [36,34]. We will also use several times the following lemma:
Lemma 3.3. The sublevel sets M_{α,y_t,m,n}(c) = {z ∈ B ∩ Y_n : J_{α,y_t,m,n}(z) < c} are weakly sequentially compact.
Proof. The set B ∩ Y_n is closed and convex, and thus weakly closed. Proposition 2.5 (iv) and the weak lower semi-continuity of KL and R conclude the proof.
In the following, we will assume (H_1) that X_min ∩ L = ∅. This assumption means that no constant function equal to x_0 is a solution of the inverse problem for noise-free data. Let Y_min = A X_min and y_0 = Ax_0. For τ_3 > 1, we also assume (H_2) that the smoothed noisy data Π_m y_t satisfy, for all m and n large enough:
KL(Π_m y_t, Π_m y_0) > τ_3 m(t)/(2t).
This assumption is used to ensure the existence of a regularization parameter satisfying the discrepancy principle. We also make a third assumption (H_3):
(38) x ∈ M_{α,y} and x ∈ L ⇒ x ∈ X_min.
This assumption means that a function x ∈ L^1(Ω) that minimizes the continuous regularization functional for the noise-free data and that is an R-minimizing solution is a constant function. With assumption (H_1), it follows that M_{α,y} ∩ L = ∅.
For t, n and m fixed, we consider the discrepancy KL(Π_m y_t, Π_m A Π_n x^α_{t,m,n}) as a function of α.
Well-posedness of the regularization method for the Morozov principle. We now show that the Morozov principle of Eq. (31) is well-defined for m and n sufficiently large. In the first part of this subsection, we bound KL(Π_m G̃_t, Π_m A Π_n x*).
Let us consider the m-dimensional projection Π_m G̃_t of G̃_t with components g_{t,m} = (g_t^i)_{1≤i≤m}. The moments of a Poisson variable are characterized by Lemma 7.1 (Appendix). The following proposition bounds the deviation of KL(g_{t,m}, ḡ_m) from m/(2t):
Proposition 3.4. Let τ_1 > 0; there exists a constant C such that the deviation |KL(g_{t,m}, ḡ_m) − m/(2t)| exceeds τ_1 m/(2t) with a probability at most P_{m,t}, where P_{m,t} → 0 as m → ∞.
Proof. With the Chebyshev inequality, it suffices to bound the mean and the variance of KL(g_{t,m}, ḡ_m). With the definition of the Kullback-Leibler term, we obtain by a series development, for each index i with 1 ≤ i ≤ m, an expansion of KL(g_t^i, ḡ^i) in powers of the centered variable g_t^i − ḡ^i. Let us consider the finite sum of the KL distances between the data with and without noise. The variables (g_t^i)_{1≤i≤m} are assumed to be independent Poisson variables; using Lemma 7.1 to control the centered moments, there exists a constant C_1 bounding the mean and the variance of each term. Collecting the bounds, there is a constant C such that the announced deviation inequality holds. In the following, we will denote P_{m,t} the corresponding deviation probability bound.
Corollary 1. Let τ_1 > 0; there exists a constant C such that KL(Π_m G̃_t, Π_m y) deviates from m/(2t) by more than τ_1 m/(2t) with probability at most P_{m,t}.
We now bound the KL distance between Π_m G̃_t and Π_m A Π_n x*.
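The discrepancy level can be checked by Monte Carlo: for an m-dimensional Poisson count vector g with large means ḡ, E[KL(g, ḡ)] ≈ m/2 (dividing both arguments by t rescales this to the level m/(2t) used above). The means below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(3)

def kl_vec(g, gbar):
    """Discrete KL between a count vector g and its mean gbar (0*ln(0) := 0)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(g > 0, g * np.log(g / gbar), 0.0)
    return float((term - g + gbar).sum())

# E[KL(g, gbar)] ~ m/2 for an m-dimensional Poisson vector with large means:
# each component contributes ~ (g_i - gbar_i)^2 / (2 gbar_i), of expectation 1/2.
m = 200
gbar = rng.uniform(50.0, 150.0, m)           # hypothetical expected counts
samples = [kl_vec(rng.poisson(gbar), gbar) for _ in range(500)]
mean_kl = float(np.mean(samples))            # concentrates near m/2 = 100
```

The fluctuations around m/2 are of order sqrt(m), which is what the Chebyshev bound of Proposition 3.4 quantifies.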
Proposition 3.5. Let ρ_n be defined as in Section 3.2; there is a positive constant D such that KL(Π_m G̃_t, Π_m A Π_n x*) ≤ (1 + τ_1) m/(2t) + Dρ_n with a probability higher than P = 1 − P_{m,t}.
Proof. From the definition of the Kullback-Leibler distance, with y = Ax*:
KL(Π_m G̃_t, Π_m A Π_n x*) = KL(Π_m G̃_t, Π_m y) + I_1 + I_2 + I_3,
where I_1, I_2, I_3 are integral remainder terms depending on Π_m A(Π_n x* − x*). One can assume that the derivative g′(y) is bounded. These integrals depend on the discretization level and can be bounded from above by ρ_n, up to multiplicative constants. With Proposition 3.4, it follows that there is a positive constant D such that the announced bound holds with a probability higher than P = 1 − P_{m,t}.
It should be noted that this result also holds for nonlinear Lipschitz operators. This bound can be generalized to the smoothed approximation y_t.
Proposition 3.6. Let ρ_n be defined as in Section 3.2; there are positive constants D and E such that KL(Π_m y_t, Π_m A Π_n x*) ≤ (1 + τ_1) m/(2t) + Dρ_n + E(ρ_m + ‖G̃_t − y_t‖_1) with a probability higher than P = 1 − P_{m,t}.
Proof. From the definition of the Kullback-Leibler distance, KL(Π_m y_t, Π_m A Π_n x*) = KL(Π_m G̃_t, Π_m A Π_n x*) + I, where I is an integral term depending on Π_m(y_t − G̃_t). Assuming that the derivative g′(Π_m A Π_n x*) is bounded, the integral term can be bounded by E(ρ_m + ‖G̃_t − y_t‖_1); with Proposition 3.5, we obtain the result.
Corollary 2. Assume we can find τ′_1 > τ_1 and choose the discretization levels m(t), n(t) and the exposure time t such that E(‖G̃_t − y_t‖_1 + ρ_m) + Dρ_n ≤ (τ′_1 − τ_1) m(t)/(2t) with a probability higher than P = 1 − P_{m(t),t}. This probability P converges to 1 as t → ∞.
Proof. It follows from Proposition 7.7 (Appendix) that ‖G̃_t − y_t‖_1 → 0 almost surely as t → ∞, and thus m(t), n(t) and t can be chosen such that the condition of the corollary holds.
These conditions are satisfied for m(t) = t^β with β < 1, and then for n and t large enough. The value of n has to be set as large as possible. The interpretation of this corollary is that m, n and t have to be chosen such that the discretization errors and the error of the approximation of G̃_t by y_t are negligible with respect to the error due to the Poisson noise statistics.
Proposition 3.7. For fixed constants τ_3 > τ_2 > τ_1 > 1, and discretization levels m(t) and n(t) chosen according to Corollary 2, there exists a regularization parameter α(t) such that:
τ_2 m(t)/(2t) ≤ KL(Π_m y_t, Π_m A Π_n x^α_{t,m,n}) ≤ τ_3 m(t)/(2t)
with a probability higher than P.
Proof. With Proposition 7.3 (Appendix) and assumption (H_2), we obtain that, with a probability higher than P, the discrepancy takes values below τ_2 m(t)/(2t) for small α and above τ_3 m(t)/(2t) for large α. For a given data realization, the KL term is a continuous function of α (see Appendix), and thus there exists a value of α satisfying the Morozov discrepancy principle. As t → ∞, m(t) → ∞, P → 1, and the Bertero method of choice of the regularization parameter based on the Morozov principle is well-defined.

Convergence of the regularization method for the Morozov principle.
In this section, we show the convergence of the regularization method for this a posteriori choice of the regularization parameter. The proof is based on the asymptotic properties of the smoothed approximation y_t as t → ∞. We will use the following propositions, proved in the Appendix:
Proposition 3.8. Let us consider a sequence {t_n} with t_n → ∞. Then ‖y_{t_n} − y‖_1 → 0 almost surely as t_n → ∞.
Proposition 3.9. Let us consider a sequence {t_n} with t_n → ∞. There is a constant ρ_m, with ρ_m → 0 as m → ∞, such that ‖y_{t_n} − Π_m G̃_{t_n}‖_1 ≤ ρ_m almost surely as t_n → ∞.
The following proposition is independent of the discretization levels.
Proposition 3.10. Let {x_n} ⊂ B be a sequence such that KL(y_{t_n}, Ax_n) → 0 and lim sup_n R(x_n) ≤ R(x*) as t_n → ∞. Then {x_n} converges weakly in L^1(Ω) to an R-minimizing solution.
Proof. Let t_n → ∞; the sequence KL(y, Ax_n) is bounded and thus there exists x ∈ L^1(Ω) and a subsequence of {x_n}, also denoted {x_n}, weakly converging to x in L^1(Ω).
With the continuity of the operator A and the weak lower semi-continuity of KL, we obtain KL(y, Ax) ≤ lim inf_n KL(y, Ax_n). Since KL(y, Ax_n) = KL(y_{t_n}, Ax_n) + δ_n with δ_n = ∫_Ω g(y) − g(y_{t_n}) − g′(Ax_n)(y − y_{t_n}) dx, and the derivative g′ is bounded on [y_min, y_max], there exists a positive constant C_1 such that:
(88) |δ_n| ≤ C_1 ‖y − y_{t_n}‖_1.
From Proposition 3.8, we see that δ_n → 0 as t_n → ∞ almost surely, and KL(y, Ax) = 0. With the weak lower semi-continuity of R we obtain R(x) ≤ lim inf_n R(x_n) ≤ R(x*). The limit x is thus an R-minimizing solution, R(x) = R(x*), and x ∈ L. Therefore, we obtain R(x_n) → R(x*). With a subsequence argument, we see that the whole sequence converges weakly to a minimum R norm solution for the data without noise.
Proof. Let 0 < s < 1 and x+ ∈ L ∩ M_{α,y}. The regularization term R is convex; comparing the values of the functional at x+ and at x+ + s(x − x+), dividing by s and taking the limit as s → 0, we obtain the desired variational inequality.
Proof. From the definition of the Kullback-Leibler distance, it follows:
(107) KL(Π_m y_t, u) = KL(Π_m y, u) + I_1 + I_2,
where I_1 and I_2 are integral terms bounded by a constant times ‖y_t − y‖_1. The proposition then follows from Proposition 3.8.
Proposition 3.14. The regularization parameter α chosen according to the Morozov principle satisfies α → 0 almost surely as t → ∞.
Proof. Let t_k → ∞, m_k = m(t_k), n_k = n(t_k) and α_k = α(t_k) be chosen according to the Morozov principle for the regularization functional J_{α_k,y_{t_k},m_k,n_k}. Assume there exists some constant ᾱ > 0 such that, for some subsequence also denoted {α_k}, α_k ≥ ᾱ, and denote x_k the corresponding minimizer of the regularization functional J_{α_k,y_{t_k},m_k,n_k}. Let us denote {u_k} the corresponding sequence of minimizers at level ᾱ. With the Morozov principle and Proposition 8.2:
KL(Π_{m_k} y_{t_k}, Π_{m_k} A Π_{n_k} u_k) ≤ τ_3 m_k/(2t_k).
From Proposition 3.12, we obtain:
(115) lim_{k→∞} KL(y_{t_k}, A Π_{n_k} u_k) = 0
and
(116) lim sup_{k→∞} ᾱ R(Π_{n_k} u_k) ≤ ᾱ R(x*).
Therefore, we can apply Proposition 3.10 to {Π_{n_k} u_k} and we can assume that {Π_{n_k} u_k} converges to some z ∈ L. From the weak lower semi-continuity of the functional R and of the KL data term, we obtain, for any f ∈ L^1(Ω):
KL(y, Az) + ᾱR(z) ≤ KL(y, Af) + ᾱR(f).
The last inequality is satisfied almost surely as k → ∞, using Proposition 3.13. Thus z ∈ M_{ᾱ,y}. With assumption (H_3), we have z ∈ L ∩ X_min. We have assumed (H_1) that L ∩ X_min = ∅; we have thus reached a contradiction, and α → 0 as t → ∞ almost surely.
Proof. Let t > 0, and let m, n and α be chosen according to the Morozov principle. Using the minimizing property of Π_n x^α_{t,m,n}, we obtain:
KL(Π_m y_t, Π_m A Π_n x^α_{t,m,n}) + αR(Π_n x^α_{t,m,n}) ≤ KL(Π_m y_t, Π_m A Π_n x*) + αR(Π_n x*),
where x* is a solution of Ax* = y. The last inequality is satisfied almost surely as t → ∞. This implies that R(Π_n x^α_{t,m,n}) ≤ R(Π_n x*); from the former proposition, we obtain lim sup R(Π_n x^α_{t,m,n}) ≤ R(x*). The regularization method with the regularization parameter chosen according to the Morozov discrepancy principle is a convergent regularization method:
Proposition 3.16. Let t_k → ∞, α_k a sequence of parameters chosen according to the Morozov principle, and Π_{n_k} x^{α_k}_{t_k,m_k,n_k} the minimizer of the regularization functional J_{y_{t_k},α_k,m_k,n_k}. The sequence (Π_{n_k} x^{α_k}_{t_k,m_k,n_k}) has a convergent subsequence with respect to the weak L^1 topology, and the limit of each convergent subsequence is an R-minimizing solution.
Proof. From the definition of x^{α_k}_{t_k,m_k,n_k}, it follows that J_{α_k}(Π_{n_k} x^{α_k}_{t_k,m_k,n_k}) ≤ J_{α_k}(Π_{n_k} x*), where x* ∈ L. With Propositions 3.13 and 3.12, we obtain that KL(y_{t_k}, A Π_{n_k} x^{α_k}_{t_k,m_k,n_k}) → 0 and lim sup_k R(Π_{n_k} x^{α_k}_{t_k,m_k,n_k}) ≤ R(x*). With Proposition 3.10, we conclude the proof.
4. Convergence rate of the a posteriori method of choice of the parameter with variational source condition. In this section, we perform a statistical convergence analysis for the inverse problem and study error estimates for approximate solutions in the limit t → ∞. We derive convergence rates for the expectation value of the Bregman distance associated with R. Several types of source conditions are considered in the literature to obtain convergence rates [39]. We assume the following condition (C) holds: there is ω such that A*ω ∈ ∂R(x*) with ‖ω‖ < ∞. Under this assumption, a convergence rate can be obtained. We will use the following lemmas:
Lemma 4.1. The centered moments of a Poisson variable are well-defined for all n ≥ 1 and can be bounded in terms of its mean. The proof can be found in [18].
Lemma 4.2. The expectation of the Kullback-Leibler distance between a Poisson variable and its mean is bounded. Proof. This follows from a Taylor development and Lemma 7.1.

Proposition 4.3. Assume the source condition (C) is satisfied and the regularization parameter is chosen according to the Morozov principle; then a convergence rate in expectation is obtained for the Bregman distance D_{x*}R.
Proof. From the equality for the Bregman divergence, Eq. (6), and taking the expectation, we obtain a decomposition of E[D_{x*}R] into data and approximation terms. From Lemmas 4.1 and 4.2 with n = 1, we obtain a bound on the expected Kullback-Leibler terms. For each Poisson variable, the second derivative g″(Y_t^i) = 1/Y_t^i is bounded from above by a constant C_1. We can also assume ‖y_t − Π_m A Π_n(x_t^α)‖_∞ < C, where C is a constant independent of m and n. Thus, with the Cauchy-Schwarz inequality, and using the L^1 bounding property of the Kullback-Leibler distance (Proposition 2.5) and the Morozov principle, the data terms are bounded by a constant times m(t)/t. From the definition of x^α_{t,m,n}, we have almost surely as t → ∞, with x* ∈ L:
τ_2 m(t)/(2t) + αR(Π_n x^α_{t,m,n}) ≤ KL(Π_m y_t, Π_m A Π_n x^α_{t,m,n}) + αR(Π_n x^α_{t,m,n}) ≤ KL(Π_m y_t, Π_m A Π_n x*) + αR(Π_n x*),
and thus it follows that:
(158) R(Π_n x^α_{t,m,n}) ≤ R(Π_n x*);
this inequality is satisfied almost surely as t → ∞. Assuming now that the source condition holds, collecting the bounds yields the announced convergence rate.

5. A numerical experiment: spectral computerized tomography. Multi-energy computerized tomography, or spectral CT, is an imaging modality where the information received depends on the energy of each photon hitting the detector. This information is provided by a photon counting detector which gives an image for each energy bin. These projections allow us to separate the contributions of the different materials (e.g. soft tissues and bone) to the projections [30,40,19,42]. The measured projections are corrupted by Poisson noise. For weak nonlinearities, the spectral CT problem leads to a linear inverse problem that has to be regularized to obtain stable material decompositions. In the first subsection, we present the forward model relating the projected masses of the materials to the number of photons detected, together with the regularization functional. In the second subsection, we detail some numerical experiments.

5.1. Forward model for spectral CT. In this section, we present the forward model for spectral CT based on the attenuation model. We denote Ω the space that contains the 3-dimensional (3-D) object studied and x the voxel position. This object is imaged on a 2-D detector (S). The number of photons transmitted at energy E for each pixel u of the detector is denoted n(E, u). The attenuation follows a Beer-Lambert law:
n(E, u) = n_0(E) exp(−∫_{L(u)} μ(E, x) dx),
where n_0(E) is the source spectrum, L(u) is the line defined by the X-ray beam and μ(E, x) is the linear attenuation coefficient at energy E for the voxel x. The number of photons detected can be written as:
n_d(E′, u) = ∫ d(E′, E) n(E, u) dE,
where d(E′, E) is the detector response function: it describes the probability that a photon with energy E is detected at the energy E′. An integrated circuit counts the number of photons that are detected in the i-th energy bin. The number of photons detected in the i-th bin is:
s_i(u) = ∫ d_i(E) n(E, u) dE,
where d_i is the response function of the i-th energy bin of the detector. We assume we can write the attenuation coefficient as a sum of M basis functions that are separable in energy and space:
μ(E, x) = Σ_{m=1}^M τ_m(E) a_m(x).
After discretization, the inverse problem considered is a finite dimensional one. We consider a detector with P pixels and I energy bins; the projection data are thus the vector s ∈ R^{IP} = (s_{ij})_{1≤i≤I,1≤j≤P}. We want to recover the set of projected masses of the materials a ∈ R^{MP} = (a_{ij})_{1≤i≤M,1≤j≤P}. The discretization levels are thus n = MP and m = IP. We consider a cost function similar to the one in Eq. (23) to minimize in order to recover the material maps:
(174) J(s, F(a)) = KL(s, F(a)) + αR(a),
with α the regularization parameter and R(a) the regularization term stabilizing the solution. For the numerical simulations, the L^1 norm is replaced by a Huber type penalty functional.
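A sketch of the discrete forward model F(a) follows; the source spectrum, the attenuation curves τ_m(E), and the bin responses d_i below are illustrative toy choices, not the paper's experimental data:

```python
import numpy as np

def forward_spectral_ct(a, tau, n0, d_bins, energies):
    """Expected counts s_ij = int d_i(E) n0(E) exp(-sum_m tau_m(E) a_mj) dE.
    a: (M, P) projected masses, tau: (M, K) attenuation curves,
    n0: (K,) source spectrum, d_bins: (I, K) bin responses, K energy samples."""
    dE = energies[1] - energies[0]
    line_integrals = np.einsum("mk,mp->kp", tau, a)        # sum_m tau_m(E) a_mj
    attenuation = np.exp(-line_integrals)                  # Beer-Lambert law
    return np.einsum("ik,k,kp->ip", d_bins, n0, attenuation) * dE

K, M, P = 100, 2, 8
energies = np.linspace(20.0, 120.0, K)                     # keV grid
n0 = np.full(K, 1.0e3)                                     # flat toy spectrum
tau = 0.2 * np.stack([(energies / 60.0) ** -3.0,           # bone-like curve
                      (energies / 60.0) ** -1.0])          # tissue-like curve
edges = np.array([20.0, 50.0, 80.0, 120.0])                # I = 3 energy bins
d_bins = np.stack([((energies >= lo) & (energies < hi)).astype(float)
                   for lo, hi in zip(edges[:-1], edges[1:])])
a = np.abs(np.random.default_rng(4).normal(0.5, 0.2, (M, P)))
s = forward_spectral_ct(a, tau, n0, d_bins, energies)      # (I, P) mean counts
```

The observed data are then Poisson draws with these mean counts, which is the linear (in the attenuation exponent, weakly nonlinear overall) inverse problem described above.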

5.2. Numerical results. In the following, we test the Morozov principle for different discretization levels. We cannot be sure that the ground truth satisfies the source condition (C), but it is possible to illustrate the former results qualitatively. In order to minimize the regularization functional, we have used Expectation-Maximization type iterative methods [16,15,22,45]. In each case investigated, we study the decrease of the Kullback-Leibler functional as a function of the iterations. The algorithm is tested on a mouse phantom with two materials, bone and soft tissues. The number of incident photons is N = 10^3. The reconstruction algorithm is tested with M = 2 materials, various numbers of energy bins I, and various numbers of pixels P. Let a* be the ground truth projected mass map and â the estimated projected mass map. In order to evaluate the quality of the reconstruction, the relative mean square reconstruction error RMSE(a*, â) is computed over the M material maps. The first case investigated corresponds to I = 3 and P = 6820 pixels. The evolution of the Kullback-Leibler distance with the iterations is displayed in Figure 1 and the reconstructed image in Figure 2. In this case, the discretization level is low and the discretization error ρ_n cannot be neglected with respect to the error related to the Poisson noise; Corollary 2 then shows that the Morozov principle is not well-defined. At the end of the iterations, the RMSE between a* and â is 9.75. With this low discretization level, the reconstruction error is very large.
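The Expectation-Maximization type iterations mentioned above can be sketched, in their unregularized Richardson-Lucy form for a generic linear Poisson model (the actual reconstructions use penalized variants from [16,15,22,45]); the toy operator and phantom are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def em_kl(A, s, n_iter=200):
    """Multiplicative EM (Richardson-Lucy) iteration x <- x * A^T(s/(Ax)) / A^T 1.
    It preserves nonnegativity and decreases the data term KL(s, Ax); this is
    the unregularized sketch, penalized variants add the term alpha*R."""
    x = np.ones(A.shape[1])
    col_sums = A.T @ np.ones(A.shape[0])        # A^T 1, normalization factor
    for _ in range(n_iter):
        ratio = s / np.maximum(A @ x, 1e-12)    # guard against division by zero
        x = x * (A.T @ ratio) / col_sums
    return x

n = 40
i = np.arange(n)
A = np.exp(-0.5 * ((i[:, None] - i[None, :]) / 1.5) ** 2)
A /= A.sum(axis=0, keepdims=True)               # columns sum to one
x_true = np.zeros(n)
x_true[15:25] = 50.0
s = rng.poisson(A @ x_true).astype(float)       # Poisson count data
x_hat = em_kl(A, s)                             # nonnegative KL-fitting estimate
```

Monitoring KL(s, Ax) along such iterations is what the decrease curves of Figures 1, 3 and 5 report.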
The second case investigated corresponds to a high discretization level, I = 10 and P = 1737984 pixels (Figure 3 and Figure 4). In this case, the discretization errors can be neglected and it is possible to choose a regularization parameter α that fulfills the Morozov principle. A very small regularization parameter α = 10^{−5} can be chosen. At the end of the iterations, the reconstruction error is RMSE(a*, â) = 2.39. With this high discretization level, the reconstruction error is still large: on many pixels a low number of photons is detected, and, in accordance with Proposition 4.3, the Morozov principle is not an efficient way to choose the regularization parameter in this regime.

The last case studied corresponds to the parameters I = 3 and P = 108624. The evolution of the Kullback-Leibler distance with the iterations and the reconstructed images are displayed in Figures 5 and 6. In this case, the regularization parameter is 6.56. The final reconstruction error is RMSE(a*, â) = 0.78. This choice of the discretization levels is the best one: the discretization error ρ_n can be neglected, and the smaller discretization level m leads to a smaller reconstruction error.

6. Conclusion. In this work, we have studied a method of choice of the regularization parameter for Poisson inverse problems with the Kullback-Leibler divergence as data term, in a finite-dimensional setting, based on the Morozov-Bertero principle. We have shown that this method of choice is well-defined when the number of Poisson variables is large and when the discretization error on the unknown can be neglected. The convergence of the regularization method with this a posteriori choice is also demonstrated when the number of random variables is large. Assuming some source conditions, convergence rates in expectation are obtained for the Bregman distance between the true solution and the reconstructed solution.
Some numerical experiments show that the choice of the discretization levels is crucial to obtain good reconstructed images.

7. Appendix. In this appendix, we detail the proofs of some propositions used in the paper. The following lemma characterizes the moments of the Poisson distribution [18]:

Lemma 7.1. Let X ∼ Poisson(λ) be a one-dimensional Poisson variable. Then for any n ≥ 1,

E[(X − λ)^n] = Σ_{q=0}^{n} a_{n,q} λ^q,

where the coefficients a_{n,q} can be expressed in terms of the Stirling numbers of the second kind: for any p, q ≥ 0, S(p, q) denotes the number of partitions of a set of p elements into q nonempty disjoint subsets. For any n ≥ 1, the leading coefficient a_{n,n} = 0. For any n ≥ 3, a_{n,n−1} = 0.
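The moment structure behind Lemma 7.1 can be checked numerically: the raw Poisson moments are E[X^n] = Σ_{q} S(n, q) λ^q, and the central moments follow by binomial expansion. A small sketch (helper names are ours, not the paper's):

```python
from math import comb

def stirling2(p, q):
    """Stirling number of the second kind S(p, q), by the standard recursion
    S(p, q) = q*S(p-1, q) + S(p-1, q-1)."""
    if p == q:
        return 1
    if q == 0 or q > p:
        return 0
    return q * stirling2(p - 1, q) + stirling2(p - 1, q - 1)

def raw_moment(n, lam):
    """E[X^n] for X ~ Poisson(lam): sum_q S(n, q) * lam^q."""
    return sum(stirling2(n, q) * lam ** q for q in range(n + 1))

def central_moment(n, lam):
    """E[(X - lam)^n] by binomial expansion of the raw moments."""
    return sum(comb(n, k) * raw_moment(k, lam) * (-lam) ** (n - k)
               for k in range(n + 1))
```

For instance, the classical identities E[(X − λ)^2] = λ, E[(X − λ)^3] = λ and E[(X − λ)^4] = 3λ^2 + λ are recovered, consistent with the vanishing of the highest-order coefficients in λ.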
The following proposition deals with some monotonicity properties and limit properties of E, Φ and Ψ.
Proposition 7.2. For y_t fixed, 0 < α < α′, and m, n fixed. Proof. The proof of a) and b) can be found in [3,4]. For a given parameter α, R(Π_n x_min) = 0 for constant images x_min = x_0, and thus we obtain c). We have Π_n x^{α}_{t,m,n} ∈ M_{α,y_t,m,n}(c) for c large enough, for all α. For a given α, from the compactness property mentioned in section 3.3, we can thus construct a sequence (α_k) → ∞, α_k ≥ α, such that Π_n x^{α_k}_{t,m,n} converges weakly towards some Π_n x̄ ∈ Y_n as k → ∞.
With the weak lower-semicontinuity of R we obtain: Combining Eq. 187 and Eq. 189, we obtain assertion d).
Proposition 7.3. For a given realization of the data y_t, the functions Φ, Ψ and E are continuous functions of α.
Proof. Let {α_k} be a positive sequence with α_k → α > 0. For each value α_k, we consider the minimizer Π_n x^{α_k}_{t,m,n} in Y_n. Then sup_{k≥0} J_{α_k,y_t,m,n}(Π_n x^{α_k}_{t,m,n}) < ∞ and there is a weakly convergent subsequence, again denoted Π_n x^{α_k}_{t,m,n}, converging towards some Π_n x̄. With the continuity of the operator A and the weak lower-semicontinuity of KL and R, we obtain:

(190) J_{α,y_t,m,n}(Π_n x̄) ≤ lim inf_{k→∞} [ KL(Π_m Ḡ_t, Π_m A Π_n x^{α_k}_{t,m,n}) + α_k R(Π_n x^{α_k}_{t,m,n}) ]

(191) ≤ lim sup_{k→∞} [ KL(Π_m Ḡ_t, Π_m A Π_n x^{α_k}_{t,m,n}) + α_k R(Π_n x^{α_k}_{t,m,n}) ] ≤ J_{α,y_t,m,n}(Π_n x)

for all x ∈ B. Thus Π_n x̄ = Π_n x^{α}_{y_t,m,n} and E(α_k) → E(α). By a subsequence argument, the same result is true for the whole sequence.
We now show the continuity of Ψ and of the data term Φ with respect to α. Let us assume that KL(Π_m Ḡ_t, Π_m A Π_n x^{α_k}_{t,m,n}) does not converge to KL(Π_m Ḡ_t, Π_m A Π_n x^{α}_{t,m,n}). With the weak lower-semicontinuity of KL, we obtain:

(194) KL(Π_m Ḡ_t, Π_m A Π_n x^{α}_{t,m,n}) < lim sup_{k→∞} KL(Π_m Ḡ_t, Π_m A Π_n x^{α_k}_{t,m,n}) = c < ∞.

With the compactness property mentioned in section 3.3, we can consider a subsequence, again denoted Π_n x^{α_k}_{t,m,n}, such that KL(Π_m Ḡ_t, Π_m A Π_n x^{α_k}_{t,m,n}) → c.
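The continuity of the data term Φ established here (Proposition 7.3), together with its monotonicity in α, is what makes a Morozov-type parameter choice numerically tractable: one can search for the α at which the KL residual matches a prescribed noise level τ. A generic bisection sketch (the function `phi` and all names are hypothetical placeholders for α ↦ KL(data, F(a_α))):

```python
def morozov_alpha(phi, tau, alpha_lo=1e-8, alpha_hi=1e4, tol=1e-6, max_iter=100):
    """Bisection for the Morozov-type rule: find alpha with phi(alpha) ~= tau,
    assuming phi is continuous and nondecreasing in alpha (cf. Proposition 7.3)."""
    lo, hi = alpha_lo, alpha_hi
    if phi(lo) > tau or phi(hi) < tau:
        raise ValueError("tau is not bracketed by [phi(alpha_lo), phi(alpha_hi)]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if phi(mid) < tau:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

In practice each evaluation of `phi` requires solving the regularized problem for that α, so only a few bisection steps are affordable.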
In the next propositions, we prove that ‖y_t − y‖_1 and ‖y_t − Π_m Ḡ_t‖_1 converge towards zero as t → ∞. We will use the concentration inequality [37]: Proposition 7.4. For any measurable function f, one has: Proposition 7.5. Let us consider a sequence {t_n} with t_n → ∞. Then ∫_Ω (dḠ_{t_n} − y dx) → 0 almost surely as t_n → ∞.
Proof. From the former proposition, it follows that for u > 0 there is a positive constant C_1 such that: Let us consider a sequence {t_n} with t_n → ∞. We can deduce that for all u ≥ 4/(9C_1): For all ε > 0, taking u = ε t_n, it follows: We can thus apply the Borel-Cantelli theorem, and ∫_Ω (dḠ_{t_n} − y dx) → 0 almost surely.
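Proposition 7.5 is a law-of-large-numbers statement: the normalized total count N_t/t concentrates around ∫_Ω y dx at rate ~ t^{−1/2}. A Monte Carlo illustration under a toy setup (the value of the total intensity and all names are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
total_intensity = 5.0  # stands in for the total intensity  \int_Omega y dx

def normalized_deviation(t):
    """|N_t / t - \int y dx| for one draw N_t ~ Poisson(t * \int y dx)."""
    n_t = rng.poisson(total_intensity * t)
    return abs(n_t / t - total_intensity)

# deviations shrink as the exposure time t grows
devs = [normalized_deviation(t) for t in (1e2, 1e4, 1e6)]
```

The standard deviation of N_t/t is (∫y dx / t)^{1/2}, which matches the t^{−1/2} concentration rate used in the proof.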
Proposition 7.6. Let us consider a sequence {t_n} with t_n → ∞. The L^1 norm ‖y_{t_n} − y‖_1 → 0 almost surely as t_n → ∞.
Proof. We have obtained in section 3.1 that: (202) lim_{t_n→∞} ∫_Ω y_{t_n} dx = ∫_Ω y dx almost surely. Since y_{t_n} → y almost everywhere and almost surely as t_n → ∞, we obtain from the Lebesgue dominated convergence theorem that ‖y_{t_n} − y‖_1 → 0 as t_n → ∞ almost surely.
Proposition 7.7. Let us consider a sequence {t_n} with t_n → ∞. There is a sequence of constants ρ_m → 0 such that ‖y_{t_n} − Π_m Ḡ_{t_n}‖_1 ≤ ρ_m almost surely as t_n → ∞.
Proof. Similarly, we have: The first term tends to zero as t_n → ∞ almost surely by Proposition 7.5, and the second term tends to zero as t_n → ∞ almost surely by Proposition 7.6. With Eq. 25, we can assume that t_n is chosen such that, almost surely:

(205) ‖y_{t_n} − Π_m Ḡ_{t_n}‖_1 ≤ ρ_m.