NONLOCAL TV-GAUSSIAN PRIOR FOR BAYESIAN INVERSE PROBLEMS WITH APPLICATIONS TO LIMITED CT RECONSTRUCTION

Abstract. Bayesian inference methods have been widely applied to inverse problems owing to their ability to characterize the uncertainty of the estimation. The prior distribution of the unknown plays an essential role in Bayesian inference, and a good prior distribution can significantly improve the inference results. In this paper, we propose a hybrid prior distribution combining the nonlocal total variation (NLTV) regularization and a Gaussian distribution, namely the NLTG prior. The advantage of this hybrid prior is two-fold. First, the proposed prior models both texture and geometric structures present in images through the NLTV. Second, the Gaussian reference measure provides the flexibility of incorporating structural information from a reference image. Some theoretical properties are established for the hybrid prior. We apply the proposed prior to the limited tomography reconstruction problem, which is difficult due to severe data missing. Both maximum a posteriori and conditional mean estimates are computed through two efficient methods, and the numerical experiments validate the advantages and feasibility of the proposed NLTG prior.


1. Introduction. Bayesian inference methods [12,17] have been a popular tool for inverse problems. Such popularity is mainly due to their ability to quantify the solution uncertainties. A typical Bayesian treatment consists of assigning a prior distribution to the unknown and then updating the distribution based on the observed data, yielding the posterior distribution. It is clear that, as most practical inverse problems are highly ill-posed, the performance of the Bayesian inference depends critically on the choice of the prior distribution. Recently, considerable attention has been paid to the study of infinite dimensional Bayesian inverse problems, where the unknowns are functions of space or time, for example, medical images. For infinite dimensional problems, the Gaussian measures are arguably the most popular choice of prior distributions, as they have many theoretical and computational advantages in the infinite dimensional setting [37].
However, in many imaging problems, the images that one wants to recover are often subject to sharp jumps or discontinuities, and Gaussian prior distributions are not suitable for modeling such functions [43]. To this end, several non-Gaussian priors have been proposed to model images, e.g., [39]. Since these prior distributions differ significantly from Gaussians, many sampling schemes based on the Gaussian prior cannot be used directly. Moreover, it has been proven in [21] that the total variation (TV) based prior does not converge to a well-defined infinite-dimensional measure as the discretization dimension increases. To address these issues, a hybrid prior motivated by the TV regularization [33] in the deterministic setting was proposed in [43]. The hybrid prior is a combination of a TV term and a Gaussian distribution: it uses the TV term to capture the sharp jumps in the functions and the Gaussian distribution as a reference measure to ensure that the resulting prior converges to a well-defined probability measure on the function space in the infinite dimensional limit.
In this paper, we consider a typical imaging inverse problem: the X-ray computerized tomography (CT) problem with limited data. X-ray CT plays an important role in the clinical diagnosis of diseases. Let u denote the image to be reconstructed. Throughout this paper, we assume u ∈ L²(ℝ²) is supported in a domain Ω, and we only consider the two-dimensional parallel beam CT for simplicity. Then the sinogram (or the projection data) f is obtained by the following Radon transform [32]:

(1) f(α, s) = Pu(α, s) = ∫_ℝ u(s n + t n^⊥) dt,

where n = (cos α, sin α) and n^⊥ = (−sin α, cos α). Given a sinogram y = Pu on [0, π) × ℝ, the tomographic image u is reconstructed via the inverse Radon transform [1,27]:

(2) u = P^{−1} y.

The problem (1) becomes ill-posed whenever only limited data are available on a subset of [0, π) × ℝ, due to the reduced size of the detector [40,41] and/or the reduced number of projections [28,36,5]. In particular, if the projection data f is available only on a subset S_r ⊂ [0, π) × ℝ, then there exists a nontrivial function g, called the ambiguity of P, i.e., Pg = 0 on S_r [40]. As this ambiguity g is nonconstant in the region of interest (ROI) B(0, r) [40,42], the image reconstructed via (2) using y available only on S_r will be deteriorated by g. In the literature, numerous studies have been proposed to remove the ambiguity g caused by the restriction of y to S_r. These studies can be classified into two categories: known-subregion based approaches, related to the restoration of a signal from its truncated Hilbert transform, and sparsity model based approaches, such as total variation regularization. See [34,8,16] for detailed surveys.
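To make the forward map (1) concrete, here is a minimal discrete Radon transform sketch in NumPy. The nearest-neighbour sampling, the pixel-grid conventions, and the unit pixel spacing are our own simplifications for illustration, not the discretization used later in the experiments.

```python
import numpy as np

def radon(u, angles):
    """Naive discrete Radon transform: for each angle a and detector
    offset s, sum the image along the line t -> s*n + t*n_perp with
    n = (cos a, sin a) and n_perp = (-sin a, cos a), as in (1).
    Nearest-neighbour sampling; points outside the grid contribute 0."""
    N = u.shape[0]
    c = (N - 1) / 2.0                    # grid centre
    ts = np.arange(N) - c                # integration variable t
    ss = np.arange(N) - c                # detector coordinate s
    sino = np.zeros((len(angles), N))
    for i, a in enumerate(angles):
        n = np.array([np.cos(a), np.sin(a)])
        n_perp = np.array([-np.sin(a), np.cos(a)])
        for j, s in enumerate(ss):
            pts = s * n[None, :] + ts[:, None] * n_perp[None, :]
            ix = np.rint(pts[:, 0] + c).astype(int)   # column index
            iy = np.rint(pts[:, 1] + c).astype(int)   # row index
            ok = (ix >= 0) & (ix < N) & (iy >= 0) & (iy < N)
            sino[i, j] = u[iy[ok], ix[ok]].sum()
    return sino
```

For angle 0 this reduces to column sums of the image; production code would use interpolated line integrals and physical pixel spacing.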
In the literature, nonlocal regularization has been popular for imaging inverse problems as an improvement over total variation regularization. Nonlocal methods were originally proposed for natural image processing to restore repetitive patterns and textures: a heuristic copy-paste technique was first proposed for texture synthesis in [9], a more systematic nonlocal means filter was proposed in [3], and a nonlocal variational framework was established in [14]. The main idea of the nonlocal methods is to utilize the similarities present in an image as weights for restoration, smoothing or regularization. As an extension of TV, the nonlocal total variation (NLTV) regularization method is among the popular variational regularization tools due to its flexibility in recovering both texture and geometry patterns for diverse imaging inverse problems; see [44,45,30,23] for applications. It has been demonstrated that in many practical problems the NLTV method performs better than the standard TV, especially for recovering textures and structures of images. More definitions and details are presented in Section 2.2. NLTV has also been applied in the context of CT reconstruction, for example in [44,24,18].
Inspired by the success of nonlocal regularization, we propose to improve the hybrid TV-Gaussian (TG) prior proposed in [43] for image reconstruction by incorporating the nonlocal methods. We replace the TV term in the hybrid prior with an NLTV term, and we prove that the resulting new hybrid prior also leads to a well-defined posterior distribution in the infinite dimensional setting. To demonstrate the effectiveness of the hybrid NLTV-Gaussian (NLTG) prior, we apply it to the limited tomography problem, where only limited projection data are available. In particular, we consider the two common types of point estimates in the Bayesian framework with the NLTG prior: the maximum a posteriori (MAP) estimate and the conditional mean (CM) estimate. The MAP estimate consists of solving an optimization problem, while computing the CM estimate involves evaluating a high-dimensional integral [17,26]. Our numerical results show that the NLTG prior defined with a reference image can greatly improve the reconstruction quality compared to other conventional methods. Furthermore, we also quantify the uncertainty of the CM estimator, which is consistent with the noise level. The advantages of our model are the following: the NLTV term can better recover textures and structures of images, especially for highly ill-posed or severely data-missing inverse problems such as the limited tomography reconstruction considered in this paper; theoretically, the state space is extended to a function space larger than the H¹ space used for the TG prior, allowing a larger class of images; finally, the Gaussian measure provides some freedom to incorporate other prior information through the covariance matrix, such as structures from a reference image, which is particularly useful for the limited CT problem.
The remainder of this paper is structured as follows. In Section 2, we describe the proposed NLTG prior on a Hilbert space and present the related theoretical properties. In Section 3, we apply the proposed NLTG prior to the limited tomography problem and present the MAP and CM estimators. The numerical results of both MAP and CM estimates are shown in Section 4. Finally, we draw our conclusions in the last section.
2.1. The Bayesian framework and the TG prior. We first give a brief introduction to the basic setup of the Bayesian inference methods for inverse problems. We consider the forward model of the following generic form:

(3) y = F(u) + n,

where u is an unknown function (in this work we restrict ourselves to the situation where u is a real-valued function defined on ℝ², i.e., an image), y ∈ ℝ^m is the measured data, F is the forward operator, and n is an m-dimensional zero-mean Gaussian noise with covariance matrix Σ₀. Our goal is to estimate the unknown function u from the measured data y.
First, we assume that the unknown u lies in a Hilbert space of functions, say X. We then choose a probability measure defined on X, denoted by µ_pr, to be the prior measure of u. The posterior measure of u, denoted by µ^y, is represented through its Radon-Nikodym (R-N) derivative with respect to µ_pr:

(4) dµ^y/dµ_pr (u) ∝ exp(−Φ(u; y)),

where

(5) Φ(u; y) := ½ ‖F(u) − y‖²_{Σ₀}

is the data fidelity term of the deterministic inverse problem. In the Bayesian framework, the posterior distribution depends on the information from the data and on the prior knowledge represented by the prior distribution, and so the choice of the prior distribution plays an essential role in the Bayesian method.
As discussed in Section 1, probably the most popular prior in the infinite dimensional setting is the Gaussian measure, i.e., µ_pr = µ₀ = N(0, C₀), a Gaussian measure defined on X with zero mean and covariance operator C₀. Note that C₀ is a symmetric positive operator of trace class [22]. To better model functions with sharp jumps, the hybrid prior in [43] takes the form

(6) dµ_pr/dµ₀ (u) ∝ exp(−R(u)),

where R(u) represents additional prior information (or regularization) on u. In this case, one can write the R-N derivative of µ^y with respect to µ₀ as

(7) dµ^y/dµ₀ (u) ∝ exp(−Φ(u; y) − R(u)),

which in turn returns to the conventional formulation with Gaussian priors. Specifically, the TG prior chooses the state space X to be the Sobolev space H¹ and R(u) = λ‖u‖_TV = λ ∫ ‖∇u‖₂ dx, as introduced in [43, Section 2.3].
2.2. Nonlocal total variation. Here we provide the formulation of the NLTG prior, starting with the notation for nonlocal regularization. One may refer to [14,13,46,19] for details of the variational framework based on nonlocal operators, and to [25,44,45,29] for applications to inverse problems.
Let Ω ⊆ ℝ² be a bounded set, and x ∈ Ω. Given a reference image f : Ω → ℝ, we define a nonnegative symmetric weight function ω : Ω × Ω → ℝ as follows:

(8) ω(x, y) = exp(−⟨G_a, |f(x + ·) − f(y + ·)|²⟩ / h²),

where G_a is a Gaussian kernel with standard deviation a, h is a filtering parameter and ⟨·,·⟩ is the standard inner product in L²(Ω). Note that in general h corresponds to the noise level; conventionally we set it to be the standard deviation of the noise [25]. Let u : Ω → ℝ. Using the weight function ω in (8), we define the nonlocal (NL) gradient ∇_ω u : Ω × Ω → ℝ as

(9) ∇_ω u(x, y) := (u(y) − u(x)) √(ω(x, y)).

For a given p : Ω × Ω → ℝ, its NL divergence div_ω p : Ω → ℝ is defined by the standard adjoint relation with the NL gradient operator,

(10) ⟨∇_ω u, p⟩_{L²(Ω×Ω)} = −⟨u, div_ω p⟩_{L²(Ω)},

which leads to the explicit formula

div_ω p(x) = ∫_Ω (p(x, y) − p(y, x)) √(ω(x, y)) dy.

Now, we design the following functional based on the nonlocal operators:

(11) J(u) := ∫_Ω |∇_ω u|(x) dx, where |∇_ω u|(x) := ( ∫_Ω (u(y) − u(x))² ω(x, y) dy )^{1/2}.

We can see that the functional in (11) is analogous to the total variation (TV) seminorm, and the NLTG prior can be constructed similarly. We first need to specify the state space X, which should desirably be a separable Hilbert space. Recall that Ω is a bounded set in ℝ². As a convention, we choose Ω = [0, 1]² throughout this paper. (Note, however, that our analysis is valid for any bounded domain with C¹ boundary.) For a given reference image f : Ω → ℝ and a weight function ω : Ω × Ω → ℝ defined as in (8), we introduce

(12) X := { u ∈ L²(Ω) : ∫_Ω ∫_Ω |∇_ω u(x, y)|² dy dx < ∞ },

where the NL gradient ∇_ω u is defined as in (9). Obviously, we have X ⊆ L²(Ω). In addition, we present the following lemma, which in turn shows that the NLTV functional defined in (11) is well defined on L²(Ω).
For (14), we first note that, by (13), for each x ∈ Ω the corresponding mapping is in L²(Ω, dx), where we have specified the Lebesgue measure dx for clarity. Then, applying Hölder's inequality again, we obtain (14), and this concludes the proof of Lemma 2.1.
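As a concrete discrete illustration of the weight (8) and the NLTV functional (11), the following sketch computes dense patch-similarity weights and the resulting NLTV value for a small image. The flat box patch (in place of the Gaussian window G_a) and the dense, unnormalized weights are simplifications of our own for readability.

```python
import numpy as np

def nl_weights(f, h, patch=1):
    """Patch-similarity weights in the spirit of (8):
    w[x, y] = exp(-||patch_f(x) - patch_f(y)||^2 / h^2).
    A flat box patch replaces the Gaussian window G_a (a simplification)."""
    N = f.shape[0]
    pad = np.pad(f, patch, mode='edge')
    P = np.array([pad[i:i + 2 * patch + 1, j:j + 2 * patch + 1].ravel()
                  for i in range(N) for j in range(N)])   # all patches
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)   # patch distances
    return np.exp(-d2 / h ** 2)

def nltv(u, w):
    """Discrete analogue of the NLTV functional (11):
    sum_x sqrt( sum_y (u(y) - u(x))^2 * w(x, y) )."""
    v = u.ravel()
    diff2 = (v[None, :] - v[:, None]) ** 2   # squared NL differences
    return np.sqrt((diff2 * w).sum(axis=1)).sum()
```

Note that a constant image has zero NLTV, mirroring the seminorm property of (11); in practice the weights are sparsified (see Section 4) rather than stored densely.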
Theorem 2.2 states that µ^y in (7) is a well-defined probability measure on L²(Ω) and that it is Lipschitz continuous in the data y. Since the theorem is a direct consequence of the fact that Φ + R satisfies [38, Assumption 2.6], we omit the proof.
Theorem 2.2. Assume that F : L²(Ω) → ℝ^m satisfies A1-A2. Let R : L²(Ω) → ℝ be defined as in (16). For a given y ∈ ℝ^m, define µ^y as in (7). Then the following hold:
1. µ^y is a well-defined measure on L²(Ω).
2. µ^y is Lipschitz continuous in the data y with respect to the Hellinger distance.
More precisely, if µ^y and µ^{y′} are two measures corresponding to data y and y′ respectively, then for every δ > 0 there exists Z = Z(δ) > 0 such that for all y, y′ ∈ ℝ^m with max{‖y‖₂, ‖y′‖₂} ≤ δ, we have

d_Hell(µ^y, µ^{y′}) ≤ Z ‖y − y′‖₂.

As a result, the expectation of any polynomially bounded function g : L²(Ω) → K is continuous in y.
For practical purposes, it is important to consider the finite dimensional approximation of µ^y. In particular, we consider the following approximation:

dµ^y_{N₁,N₂}/dµ₀ (u) ∝ exp(−Φ_{N₁}(u) − R_{N₂}(u)),

where Φ_{N₁}(u) is the N₁-dimensional approximation of Φ(u), with F_{N₁} being the N₁-dimensional approximation of F, and R_{N₂}(u) is the N₂-dimensional approximation of R(u). Theorem 2.3 provides the convergence property of µ^y_{N₁,N₂}.

Theorem 2.3. Assume that F and F_{N₁} satisfy A1 with constants independent of N₁, and that R_{N₂} satisfies Proposition 1 (i)-(ii) with constants independent of N₂. Assume further that for all ε > 0 there exist two sequences {a_{N₁}(ε)} and {b_{N₂}(ε)}, both of which converge to 0, such that µ₀(X_ε) > 0 for all N₁, N₂ ∈ ℕ, where X_ε := {u ∈ L²(Ω) : |Φ(u) − Φ_{N₁}(u)| ≤ a_{N₁}(ε), |R(u) − R_{N₂}(u)| ≤ b_{N₂}(ε)}. Then we have

(19) d_Hell(µ^y, µ^y_{N₁,N₂}) → 0, as N₁, N₂ → ∞.
In particular, noting that L²(Ω) is a separable Hilbert space, we can consider the finite dimensional approximation of u, as presented in the following corollary: if F satisfies A1-A2, then the corresponding approximate posteriors converge in the same sense. The proofs of Theorem 2.3 and Corollary 1 follow directly from [43, Theorem 2.3, Corollary 2.4]; hence we omit them here.
3. Bayesian estimators. In this section, we discuss how to compute the two popular point estimators in the Bayesian setting. The first is the MAP estimator; following the same steps as in [8], we can show that computing the MAP estimate with the NLTG prior is equivalent to solving

(23) min_u Φ(u) + R(u) + ½ ‖C₀^{−1/2} u‖²,

where Φ(u) = ½ ‖Fu − y‖²_{Σ₀} with F being the linear forward operator of the limited tomography problem, and the last term is the square of the Cameron-Martin norm [37,43] with respect to the Gaussian measure µ₀ = N(0, C₀) defined on the Hilbert space X with zero mean and covariance matrix C₀.
To minimize (23), we adopt the widely used split Bregman method [15], which is equivalent to the alternating direction method of multipliers (ADMM). For the sake of clarity, we present the split Bregman method for (23) in Algorithm 1. The subproblem (24) is solved by the conjugate gradient method and the subproblem (25) is solved directly by soft-shrinkage.

Algorithm 1 Split Bregman method for (23)
1: Initialize u⁰, d⁰, b⁰.
2: for k = 0, 1, 2, · · · do
3:   Update u^{k+1} by solving the subproblem (24) with the conjugate gradient method.
4:   Update d^{k+1} by the soft-shrinkage (25).
5:   Update the Bregman variable b^{k+1} = b^k + ∇_ω u^{k+1} − d^{k+1}.
6: end for
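A toy instance of the split Bregman iteration in Algorithm 1 can be sketched in one dimension. Here a finite-difference operator D stands in for the NL gradient, the Cameron-Martin term of (23) is omitted, and the u-subproblem is solved exactly rather than by conjugate gradient; all of these are simplifications for illustration, not the setting of the paper.

```python
import numpy as np

def soft(x, t):
    """Soft-shrinkage, the closed-form solution of the d-subproblem (25)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def split_bregman(F, y, D, lam, mu, n_iter=200):
    """Split Bregman iteration for min_u 0.5*||F u - y||^2 + lam*||D u||_1.
    D stands in for the NL gradient; the u-subproblem is solved exactly
    here instead of by conjugate gradient."""
    n = F.shape[1]
    u = np.zeros(n)
    d = np.zeros(D.shape[0])   # splitting variable for D u
    b = np.zeros(D.shape[0])   # Bregman (dual) variable
    A = F.T @ F + mu * D.T @ D
    for _ in range(n_iter):
        u = np.linalg.solve(A, F.T @ y + mu * D.T @ (d - b))  # u-update
        d = soft(D @ u + b, lam / mu)                         # shrinkage
        b = b + D @ u - d                                     # Bregman update
    return u
```

With F the identity and D the forward-difference operator, this recovers 1-D TV denoising, which preserves piecewise constant signals.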
Another often used point estimator in the Bayesian setting is the conditional mean (CM), or the posterior mean, which is usually evaluated using samples drawn from the posterior distribution with Markov chain Monte Carlo (MCMC) methods. In this work we use the preconditioned Crank-Nicolson (pCN) algorithm developed in [7] for its property of being independent of the discretization dimensionality. Simply speaking, the pCN algorithm proposes according to

v = (1 − β²)^{1/2} u + β w,

where u is the present position, v is the proposed position, β ∈ [0, 1] is the parameter controlling the stepsize, and w ∼ N(0, C₀). The associated acceptance probability is

a(u, v) = min{1, exp(Φ(u; y) + R(u) − Φ(v; y) − R(v))}.

We describe the complete pCN algorithm in Algorithm 2.
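The pCN update above can be sketched in a few lines. The function below is a generic sampler, not the paper's Algorithm 2: `log_like` plays the role of −(Φ + R), and `C_sqrt` is any square root of C₀ (e.g. a Cholesky factor); the fixed seed and step parameters are our own choices.

```python
import numpy as np

def pcn(log_like, C_sqrt, n, beta=0.2, n_samples=5000, seed=0):
    """Preconditioned Crank-Nicolson sampler.  Proposal:
        v = sqrt(1 - beta^2) * u + beta * w,   w ~ N(0, C_0),
    accepted with probability min(1, exp(log_like(v) - log_like(u)))."""
    rng = np.random.default_rng(seed)
    u = np.zeros(n)
    ll_u = log_like(u)
    samples = []
    for _ in range(n_samples):
        w = C_sqrt @ rng.standard_normal(n)
        v = np.sqrt(1.0 - beta ** 2) * u + beta * w
        ll_v = log_like(v)
        if np.log(rng.random() + 1e-300) < ll_v - ll_u:   # MH accept/reject
            u, ll_u = v, ll_v
        samples.append(u.copy())
    return np.array(samples)
```

Because the proposal leaves the Gaussian reference measure invariant, the acceptance ratio only involves Φ + R, which is why the acceptance rate does not degenerate as the discretization is refined.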

4.1. Experimental setup.
In this section, we present some experimental results to demonstrate the performance of the proposed NLTG prior. In particular, we compare the results (both MAP and CM estimates) of the proposed method with those of the filtered back projection (FBP) method, the NLTV regularization model and the TG method. We use the 128 × 128 XCAT image [35], taking integer values in [0, 255], as the original image u_ori. The ground truth image u_gt is then generated by adding one round-shaped object, which stands for a tumor in the lung, and further adding a sinusoidal wave as an inhomogeneous background. Finally, the reference images u_ref, which can be considered as previous CT images of the same patient taken by the same CT scanner, are obtained by direct reconstruction from 500 projections of u_ori with Gaussian noise of standard deviation 5 and 20, respectively. The two images are denoted by u¹_ref and u²_ref, and the above mentioned four images are shown in Figure 1.
Given a reference image u_ref, the covariance matrix C₀ for the Gaussian measure term is computed as

(29) (C₀)_{ij} = exp(−|u_ref(x_i) − u_ref(x_j)|² / h²),

following the idea of the radial basis function kernel in machine learning [4] and the similarity weight in the nonlocal means filter [3]. Here h is a tuning parameter of the Gaussian prior. We note that this choice of covariance matrix differs from the usual Gaussian measure, as the correlation between the pixel values of u_ref is used instead of the spatial distance. This choice aims to bring structure and edge information of the reference image to the to-be-reconstructed image, which is especially important for reconstruction from highly missing data. We adopt this covariance matrix for the TG prior as well, for a fair comparison. For the NLTV weight (as well as the NLTV regularization), we only use the 10 largest weights and the 4 closest neighbors for each pixel, as adopted in [44].
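A sketch of one plausible reading of this construction: an RBF kernel on reference pixel intensities rather than spatial positions. The exact formula in (29) may differ (e.g. it may use patchwise similarity), and the jitter term is our own addition for numerical positive definiteness; treat this as illustrative only.

```python
import numpy as np

def reference_covariance(u_ref, h):
    """Covariance built from pixel-VALUE similarity of a reference image
    (RBF kernel on intensities rather than spatial distance), one reading
    of the construction around (29):
        C0[i, j] = exp(-(u_ref_i - u_ref_j)^2 / h^2).
    A small jitter keeps the matrix numerically positive definite."""
    v = u_ref.ravel().astype(float)
    d2 = (v[:, None] - v[None, :]) ** 2      # pairwise intensity distances
    C0 = np.exp(-d2 / h ** 2)
    return C0 + 1e-8 * np.eye(v.size)
```

Since the Gaussian RBF kernel is positive semidefinite in its argument, the resulting C₀ is a valid covariance: pixels with similar reference intensities (e.g. on the same side of an edge) become strongly correlated.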
To synthesize the limited projection data y, the forward operator F is taken as the discrete Radon transform followed by the restriction onto the discretization of [0, π) × [−½, ½]. In these experiments, we use 50 equally spaced projections with Gaussian noise of standard deviation 1 and 10. Finally, all the reconstructed images are improved by imposing the intensity bound constraint [0, 255] via

(30) u_rec = min {max (u*, 0), 255},

where u* denotes the output of each of the methods in comparison.
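The bound constraint (30) is a one-line post-processing step; a NumPy version:

```python
import numpy as np

def clip_intensity(u_star):
    """Post-processing step (30): u_rec = min(max(u*, 0), 255)."""
    return np.minimum(np.maximum(u_star, 0.0), 255.0)
```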

4.2. MAP results.
In solving (23), the filtering parameter h in both (8) and (29) is chosen as in [44]. The maximum outer iteration number is set to 80 and the regularization parameter λ is manually chosen so that we obtain the optimal restoration results. Table 1 summarizes the PSNR and SSIM values of all the methods. As we can see from the table, our NLTG prior consistently outperforms the other reconstruction methods, namely FBP, TV, TGV [2], TG and NLTV. The reconstructed images obtained with the different methods are presented in Figures 2 and 3. As can be expected, TV, TGV and FBP cannot reconstruct reasonable results due to the limited projections. The TG prior based MAP estimate shows better performance, since our proposed Gaussian covariance matrix (29) provides some structure information from the reference image as prior. We can also see that, compared to the TG prior, the NLTG prior shows the advantage of NLTV in exploiting the similarity within the image. In addition, compared to the NLTV regularization, the NLTG prior obtains better recovery results, thanks to the presence of the Gaussian term, which extracts more structure information through the covariance matrix computed from the reference image. As we can see from the figures, the visual improvements are consistent with the improvements in the indices. We also note that, in terms of computation time, the hybrid priors TG and NLTG are more time-consuming due to the extra covariance matrix term. In practice, this can be accelerated using a low dimensional approximation of the inverse of the covariance matrix, as discussed in [47].

4.3. CM results. For both the TG and NLTG priors, we perform the pCN approach with 9.5 × 10⁵ samples and another 0.5 × 10⁵ samples as the pre-run. The stepsize β has been chosen so that the acceptance probability is around 25%. The CM results are shown in Figure 4, and the PSNR and SSIM values are shown in Table 2.
One can see from the figures and Table 2 that the CM estimate with the NLTG prior also outperforms that with the TG prior in the different scenarios. Compared to the MAP results, the PSNR of the CM estimator is smaller under the low noise level but higher under the high noise level. Nonetheless, the SSIM values of the MAP estimator are consistently better than those of the CM estimator. One can also see from the images that the MAP estimator preserves sharp edges better while efficiently suppressing the noise at the low noise level, whereas the CM estimator provides slightly more details at the high noise level. The main advantage of the Bayesian techniques is that they can measure the uncertainty in the estimates. Figure 5 summarizes the 95% confidence interval (CI) gaps for each setting. It is easy to see that the CI gaps with the TG prior are larger than those with the NLTG prior, as one would expect, since the structural similarity extracted by the NLTG prior improves the reconstruction more than the local edge sharpness enhanced by the TG prior. As expected, the uncertainty is higher at the higher noise level. It is also worth noting that the edges have higher uncertainty than the smooth regions. This may be due to the fact that edges are always more difficult to preserve in the CM estimator with both the TG and NLTG priors, which leads to a wider uncertainty interval.

5. Conclusions. In this paper, we consider Bayesian inference methods for infinite dimensional inverse problems, and in particular we propose a prior distribution that combines the nonlocal method with a standard Gaussian distribution. Theoretically, we show that the proposed prior distribution leads to a well-behaved posterior measure in the infinite dimensional setting. The proposed prior is applied to an inverse problem with severe data missing: limited CT reconstruction. The numerical experiments demonstrate that the performance of the proposed NLTG prior is superior to that of the existing and adapted methods. The comparison of the MAP and CM estimators also provides some insights for solving high dimensional imaging inverse problems. We believe that the proposed NLTG prior distribution can be useful in a large class of image reconstruction problems, especially when reference images are available.

Figure 5. The 95% confidence interval for different sinogram data noise levels and reference images. The range of the values is from 0 (black) to 100 (whitest). Upper: NLTG; Lower: TG.