A SCALED GRADIENT METHOD FOR DIGITAL TOMOGRAPHIC IMAGE RECONSTRUCTION

. Digital tomographic image reconstruction uses multiple x-ray projections obtained along a range of diﬀerent incident angles to reconstruct a 3D representation of an object. For example, computed tomography (CT) generally refers to the situation when a full set of angles are used (e.g., 360 degrees) while tomosynthesis refers to the case when only a limited (e.g., 30 degrees) angular range is used. In either case, most existing reconstruction algorithms assume that the x-ray source is monoenergetic. This results in a simpliﬁed linear forward model, which is easy to solve but can result in artifacts in the reconstructed images. It has been shown that these artifacts can be reduced by using a more accurate polyenergetic assumption for the x-ray source, but the polyenergetic model requires solving a large-scale nonlinear inverse problem. In addition to reducing artifacts, a full polyenergetic model can be used to extract additional information about the materials of the object; that is, to provide a mechanism for quantitative imaging. In this paper, we develop an approach to solve the nonlinear image reconstruction problem by incorporating total variation (TV) regularization. The corresponding optimization problem is then solved by using a scaled gradient descent method. The proposed algorithm is based on KKT conditions and Nesterov’s acceleration strategy. Experimental results on reconstructed polyenergetic image data illustrate the eﬀectiveness of this proposed approach.


1.
Introduction. Since its invention by Hounsfield in 1972, x-ray computerized tomography (CT) has been a modern and widely used technique to visualize the inner structure of various bodies, and it has become an important tool in the medical diagnosis of many different diseases. Typically, in x-ray CT imaging, an object is illuminated with a series of x-rays from multiple angles, and the change in the intensity of x-rays along a series of linear paths is measured. Then a relationship between the intensity change and material attenuation coefficients is determined by Beer's law [17]. Finally, a specialized algorithm is used to reconstruct material attenuation coefficients and accordingly reconstruct the image in question. We will use the term full CT to refer to the situation when the x-ray source and detector make a complete rotation around the object, to distinguish from the term tomosynthesis, which refers to limited angle CT, where the rotation angle is 60 degrees or less. While the resolution in the direction orthogonal to the x-ray source of a tomosynthesis reconstructed image is not as good as it would be for full CT, an advantage of tomosynthesis is that it requires significantly less radiation dose to the patient compared to full CT [21,29,30].
In a real imaging system (either full CT or tomosynthesis), the x-ray field emitted from the source is polyenergetic, and an accurate polyenergetic model for image reconstruction requires solving a large-scale nonlinear inverse problem to compute the attenuation coefficients. This is mathematically and computationally difficult. Thus, traditional reconstruction techniques use a simplified model that assumes the x-ray source is monoenergetic, which only requires solving a linear inverse problem. Although it is much easier to solve the simplified linear model, ignoring the energy dependence in the mathematical model leads to so-called beam hardening artifacts [4,9,14].
In recent years, several approaches to address the polyenergetic tomographic image reconstruction problem have been proposed; for full CT see [3,15,16,26,27], and for tomosynthesis see [10,13]. In addition to reducing beam hardening artifacts, the additional information provided by a polyenergetic source model has the potential to allow for quantitative imaging; specifically, to identify material composition of objects being imaged. In 1976, Alvarez and Macovski [1] and Macovski et al. [24] proposed using dual energy CT, where two full sets of CT projection data are obtained and each uses distinct (assumed monoenergetic) x-ray energies. With the monoenergetic assumption, each set of projection data corresponds to a distinct linear inverse problem. This idea has been extended to whole-body CT, including generalizations of using three or more distinct energies; see, for example [2,20,31,32]. This pioneering work on multispectral CT is an important contribution to diagnostic medical imaging. However, it is important to emphasize that these schemes still assume that the x-ray source is monoenergetic, which is not physically accurate. Moreover, because they require collecting two or more full sets of data (i.e., two or more complete scans), there is a significant increase in radiation dose to the patient.
In this paper, we consider both full CT and tomosynthesis problems and we use the nonlinear polyenergetic model of the x-ray source. The potential of conducting quantitative imaging in this situation (for breast imaging) was demonstrated in [10,13]. The numerical approach used in these previous papers, which applied gradient descent to an un-regularized Poisson negative log-likelihood objective function with no explicit constraints, illustrated clearly that beam hardening artifacts could be significantly reduced with the polyenergetic model. The purpose of this paper is to investigate the use of bound constraints and more sophisticated regularization methods that provide better quantitative imaging capabilities. Specifically, we use total variation (TV) regularization, and solve the corresponding optimization problem with a scaled gradient descent method. The proposed algorithm is based on KKT conditions and Nesterov's acceleration strategy.
The remainder of the paper is organized as follows. Section 2 reviews the polyenergetic model developed in [10]. Our proposed method is described in Section 3. Numerical results are presented in Section 4, and some conclusions are given in Section 5. 2. Mathematical model. In this section, we review the polyenergetic x-ray computerized tomography image formation process with material parameterization, and the corresponding minimization problem to be considered.
2.1. Polyenergetic x-ray transmission with material parameterization. In this subsection, we describe the polyenergetic x-ray computerized tomography image formation process with multimaterial decomposition. To simplify some of the notation, we will show a 2-D image formation process, but the mathematical model extends naturally to 3-D imaging.
Let b (θ) d be the quantity measured at the dth bin of a digital x-ray detector, obtained when the x-ray source is at an incident angle θ. Using Beer's law [17], these measured values are related to the object's attenuation (that is, the fraction of the x-ray absorbed or scattered by the object) through the integral equation where • t∈Γ denotes integration along the ray Γ; • N d is the number of bins in the digital x-ray detector; • N θ is the number of projection images obtained as the x-ray source is moved to a new position, which is defined by an angle θ; • ε represents the energies that are emitted by the source x-ray beam, which range from, for example, 10 keV to 28 keV; • s(ε) is the energy fluence, which is a product of the x-ray energy with the number of incident photons at that energy; • µ( x, ε) is the linear attenuation coefficient, which depends on the energy of the x-ray beam and on the material in the object at the position x; lower energy will be attenuated more than higher energy, and denser materials such as bone or calcifications will attenuate more than soft tissue; • η (θ) d represents additional contributions (noise) measured at the detector, which can include x-ray scatter and electronic noise.
Discretizing the integrals in (1) leads to the discrete image formation model where we use the notation • N p = K × L is the number of pixels in the discretized 2-D object; • N ε is the number of discrete energy levels (in general N ε N p ); • s i , i = 1, 2, . . . , N ε are discrete values of the energy fluence function, s(ε); • a (θ) d,k is the length of the path traveled by the x-ray that passes through pixel (k, ), contributing to bin d of the detector.
We remark that this model easily extends to 3D discretized objects with 2D detectors.
The linear attenuation coefficients µ k ,i , of the composite material making up the object (e.g., the breast), can be approximated as a linear combination of individual materials. That is, suppose that there are N m distinct materials making up the object, we have • c m,i are known as linear attenuation coefficients for the mth material in pixel (k, ) at the ith discrete x-ray energy value; these coefficients, for a variety of specific materials, can be obtained from [7], using the chemical compositions of each material as determined by Hammerstein et al. [18]; • w k ,m are unknown weight fractions (or percentages) of the mth material in the (k, )th pixel of the object. We also assume that the weight fractions for each pixel should add to 1 (or percentages should add to 100), which means Nm m=1 w k ,m = 1, k = 1, 2, . . . , K, = 1, 2, . . . , L, or, equivalently, Substituting relations (3) and (4) into (2), we obtain To help simplify some of the mathematical formulations and discussion in the rest of the paper, it is useful to rewrite equation (5) in matrix-vector form. To do this, we use the following notation: d,k . • Let w k ,m+1 be denoted as x k ,m for k = 1, 2, . . . , K, = 1, 2, . . . , L and m = 1, 2, . . . , N m − 1, which represent the unknown weight fractions; and let x m be the vector of length N p containing the unknown weight fractions for the (m + 1)th material w k ,m+1 (that is, the vector with entries w k ,m+1 or equivalently or x k ,m ), and define the matrix X as • Let c m be a vector of length N ε containing the known attenuation coefficients of the mth material at the various discrete energy levels (that is, the vector with entries c i,m ), and define the matrix C as If we now define 1 be the vector of length N p with all elements equal one, then (5) can be written in matrix-vector form as Then a discrete model of (5) that comprises all projections can be written in matrixvector form as ). Then our goal is solving large-scale inverse problems of the form In the Poisson statistical model [33], which is commonly used for medical imaging applications, it is assumed that the measured data is a realization of a Poisson random variable, and hence b n ∼ Poisson([K(X)s] n + η n ), n = 1, 2, ......, N, where N = N θ N d , and b n , [K(X)s] n , and η n are, respectively, the nth elements of the vectors b, K(X)s, and η. The likelihood of observing b given X is described by the function In the case of a well-posed problem, one can find an approximation of X by computing the maximum of this likelihood function, or equivalently, by computing the minimum of the negative log-likelihood. However, in the case of severely ill-posed problems, such as tomosynthesis, it is typically recommended to incorporate regularization, or in the Bayesian framework, to include additional prior information on the unknown X. In particular, suppose that the probability density of X, which we denote as p(X), is known. The most frequently used priors are of the Gibbs type, which have the form where Z is a normalization constant, λ is a positive regularization parameter, and Ω(X) is a functional, possibly convex. From Bayes' formula we obtain the a posteriori probability density of X for a given value b: By maximizing the a posteriori estimate (9) of the unknown X, or equivalently, by minimizing the negative log of the posteriori function (9), and notice that the elements of X lie in [0, 1], (we ignore the constant terms associated with b n ! and p(b), and add b n ln(b n ) − b n to the negative log of the posteriori function (9)), we obtain the following minimization problem which has a close relationship with Kullback-Leibler divergence [22]. Since the publication of the seminal paper [28], there have been extensive studies on using TV regularization for inverse problems in imaging applications; see for example [11,12]. We have compared with Tikhonov and TV regularization schemes (as well as composite approaches that use a combination of Tikhonov and TV), and find that TV is generally most effective for the tomosynthesis image reconstruction problem. We therefore take Ω as the TV functional in (10); see Section 3 for the explicit definition of Ω.
The rest of this section is devoted to outlining the algorithm we use to solve the optimization problem (10), including computation of the gradient.
3.1. General form of the algorithm. Let ∇Φ(X) denote the gradient of Φ with respect to X, and ∇Φ n,m (X) the derivative of Φ(X) with respect to X n,m . The Karush-Kuhn-Tucker (KKT) conditions [8] defining the solution X * of (10) satisfies: Accordingly, these conditions lead us to solving the following system of nonlinear equations: (12) min(X n,m , 1 − X n.m )∇Φ n,m (X) = 0, which amounts to solving the following system of nonlinear equations: Based on the nonlinear equations (13) and (14), we propose a scaled gradient descent method for solving the minimization problem (10). Our proposed method is very similar to the multiplicative iterative method proposed in [11], however, the iteration in our algorithm is different from that of [11]. In principle, in each iteration, our strategy is to update all of the X n,m terms using (13) for 0 ≤ X n,m ≤ 1 2 , and then update all of the X n,m terms using (14) for 1 2 < X n,m ≤ 1. We also combine an acceleration strategy in the proposed algorithm.
To describe our method, we first introduce some notations needed for the method. For a constant c, we write This equation immediately provides an updating scheme for X n,m : By writing the updating formulae (16) and (17) into a single formula, we have which leads to the following scaled gradient expression: This update has the disadvantage that, if X k n,m = 0 or X k n,m = 1, then S k n,m = 0 so that X k n,m cannot be further improved [11,23]. We adopt the following strategy used in [11] to treat this situation. We keep S k n,m = 0, if X k n,m = 0 and ∇Φ n,m (X k ) ≥ 0, or if X k n,m = 1 and ∇Φ n,m (X k ) ≤ 0, since they satisfy the KKT conditions. For those n, m where X k n,m = 0 with ∇Φ n,m (X k ) < 0, we set S k n,m = k0 [∇Φn,m(X k )] + , with a nonzero constant k 0 . This k 0 must also ensure the corresponding X k+ 1 2 = k0 [∇Φn,m(X k )] + ∇Φ n,m (X k ) ≤ 1, and so we have (21) S k n,m = min Similarly, for those n, m where X k n,m = 1 with ∇Φ n,m (X k ) > 0, we set S k n,m = 1−k0 [∇Φn,m(X k )] − . This 1 − k 0 must also ensure the corresponding X k+ 1 [∇Φn,m(X k )] − ∇Φ n,m (X k ) ≥ 0, and so we have .
We choose k 0 = 2e − 2 in this paper. To keep the algorithm more stable, we also take the following strategies: for the iteration, if X k n,m [∇Φn,m(X k )] + > 1, we use the iteration (17) to update X k n,m , and similarly, [∇Φn,m(X k )] − < 0, we use the iteration (16) to update X k n,m . Thus the scale matrix is determined in each iteration. By incorporating a line search strategy, we get the following iteration: where the scale matrix S k is a diagonal matrix with diagonal entries S k n,m ≥ 0, and α k is a step size determined by the line search.
Generally, a line search forces objective function decrease at every iteration. Here, we take the following Armijo like rule: where ·, · denotes inner product, D k = X k+1/2 − X k , 0 < µ < 1 is a fixed parameter such as µ = 10 −5 . Concretely, we first set α = α 0 , and if (24) Φ(X k + αD k ) ≤ Φ(X k ) + µα ∇Φ(X k ), D k , then we update the iteration X k+1 = X k + αD k and stop; otherwise, we reset α = ρα (using, for example, ρ = 0.5) and reevaluate the line search condition. Notice that because S k is non-negative definite, iteration (23) with α k = 1 (which is actually X k+ 1 2 ) cannot move along the uphill direction. Thus if Φ(X k+1 ) > Φ(X k ), it indicates the necessity of reducing the step size. Therefore, we use the initial step length α 0 = 1 in the line search.

3.2.
Acceleration and stabilization. Gradient based methods have the disadvantage of slow convergence rate. To accelerate the speed of an algorithm, we may generate new iteration steps utilizing the two previous iterates, and this strategy often improves the performance of gradient based methods. For example, it is well known for linear systems that if X k and X k−1 are two iterates of the Gauss-Seidel method, and ω = (1 + ν) > 1 is a relaxation parameter (over relaxation), then the new iterate Z k = ωX k + (1 − ω)X k−1 = X k + ν(X k − X k−1 ) generates the SOR iterate. This extrapolation strategy can also be used to design fast gradient based methods to solve convex optimization problems. Specifically, suppose X k and X k−1 are two iterates of a gradient based method for solving convex optimization problems, then it is proposed in [5,6,25] to generate a new iterate Z k = X k + ν k (X k − X k−1 ), where ν k is adaptively updated by (25) ν , and t 0 = 1.
It has been shown that this strategy to automatically update ν can significantly improve convergence speed for convex optimization problems. However, since our problem is nonconvex, the iterates may oscillate seriously and even diverge. To overcome this drawback, we adopt the idea suggested in [6] that incorporates a monotone version of the fast proximal gradient method. Specifically, we generate a new iterate which helps to stabilize the behavior of our algorithm. We now state our algorithm explicitly in the following.
Step 4: Line search: Step 5: Compute X k+1 = argmin Φ(Z k+1 ), Φ(X k ) ; Step 6: Compute where Prj is a projection operator. That is, let Y = P rj{X}, then Proof. First consider 0 < Y k n,m ≤ 1 2 . According to (18), for those Y k n,m , 0 < Y k n,m ≤ 3.3. Gradient of the objection function. In our algorithm, we need to evaluate the gradient of the objective function Φ(X). So we need to compute ∇L(X), and ∇Ω(X). To compute the gradient of the TV term, we consider the slightly modified approximation of the TV function where is a small positive number that is used to avoid numerical difficulties when computing the gradient. This approximation is typical when working with TV; see, for example, [33]. The performance is not sensitive to the selection of this parameter; in our experiments we choose to be on the order of machine precision. If we use the notation |∇(X) k ,m | 2 = (∇ 1 X) 2 k ,m + (∇ 2 X) 2 k ,m + , then we have for k = 1, 2, . . . , K, = 1, 2, . . . , L, m = 1, 2, . . . , N m − 1, and ∇ T 2 : R K×L → R K×L is defined by for k = 1, 2, . . . , K, = 1, 2, . . . , L, and m = 1, 2, . . . , N m − 1.
For the gradient of L(X), let e T k A denote the kth row of matrix A, where e k is the kth standard unit vector (that is, the kth column of the identity matrix). Similarly, we denote the kth row of matrix C by e T k C. Then we have
Because of the structure of attenuation coefficients for known materials, with proper ordering of the columns of c i we can assume that C is a nonnegative matrix.
4. Experimental results. In this section, we illustrate the feasibility of the proposed algorithm for both the polyenergetic tomosynthesis and x-ray CT problems. Specifically, we consider 128 × 128 images that simulate breast tissue with two materials, glandular and adipose. In one test image (see top of Figure 1) we assume the material is made up of distinct materials in various regions. That is, each region is either 100% adipose and 0% glandular, or 0% adipose and 100% glandular. In the second test problem, we consider the situation where regions are as in the first test image, but also contain regions with a mix of 50% adipose and 50% glandular materials (see the top of Figure 2). The test images were generated using the phantoms provided in the MATLAB package AIRtools [19]. All computations in our experiments were performed in double precision using MATLAB R2014a on an Intel(R) Core(TM) i5-4200 CPU @1.6 GHz, 4.00 GB RAM. We stop our algorithm when the approximation obtained at iteration k satisfies the inequality or when the maximum number of iterations exceeds 20000.
To provide a measure of quantitative performance, we use the relative error (RErr): where X and X denote the approximate (reconstructed) solution and the original true solution, respectively. For each experiment we use AIRtools [19] to construct a fan beam projection matrix A, assuming 70 cm from source to detector, 2.5 cm air gap between object and detector and an object size of 50 cm (these quantities are typical for breast tomosynthesis). In addition, for an initial guess we always use a random set of weights X 0 , with 0 ≤ X 0 ≤ 1. Experiment 1. For the first experiment, we test the algorithm for the full CT case. We use AIRtools to construct a fan beam projection matrix A with the angular range of 0 to 359 degrees. Then we use (7) to obtain b, b = exp(−A(1c T 1 + X C T ))s + η. Poisson noise was incorporated so that the relative difference between the noise free data and noisy data was 1e-3.
We take k 0 = 10 −3 , µ = 10 −4 , ρ = 0.5, = eps, the machine epsilon (for double precision, this is approximately 2.2204 · 10 −16 ). The regularization parameter is chosen as λ = 10 −6 . The obtained results are shown in Figures 1 and 2, where in each case the top images show the two true materials (left is adipose, right is glandular), and the middle images show the corresponding reconstructions. The bottom of these figures display the sinogram data on the left, and algorithm convergence history (i.e., the RErr plots) on the right. From the plots of the relative error and the quality of the restored image, we see that our algorithm is very effective. Experiment 2. For the second experiment, we test the limited angle (tomosynthesis) case with an angular range of -30 to 30 degrees. We take k 0 = 0.02, µ = 10 −5 , λ = 10 −7 , ρ = 0.5, and = eps, the machine epsilon (for double precision, this is approximately 2.2204 · 10 −16 ). Moreover, we corrupt the data with Poisson noise so that the relative difference between the noise and noise free data is 10 −3 . The obtained results for this case are shown in Figures 3 and 4. As with the previous experiment, the top rows of these figures show the true (left) and reconstructed (right) weights for the first material, and the second rows show analogous results for the second material. In the bottom rows we display the data (i.e., sinogram) on the left, and the RErr plots on the right.
As previously mentioned, we collect the projections within a very limited angle range for the tomosynthesis problem, so we do not expect to obtain high-quality resolution in all directions. In particular, if the source at 0 degrees is at the top of the object, then we would expect the resolution to be much worse for slices in the vertical direction than those in the horizontal direction. To confirm this observation, we compare the sum of all pixels along each column then along each row, and the relative errors of coarse samplings (in both horizontal and vertical directions) of the true and reconstructed weights. These results, which are shown in Figures 5 and 6, show that we indeed obtain better results with coarse vertical (and fine horizontal) resolution than that with coarse horizontal (and fine vertical) resolution.
We also notice that, in Figure 3, the relative error decrease first and then increase as the iteration number increases, this phenomenon is called semi-convergence. Some more elegant strategy for the regularization parameter selection may reduce this phenomenon. Experiment 3. In the final experiment, we begin with an object that has finer details in the horizontal direction than in the vertical direction. Specifically, we assume the same geometric dimensions of the object as in the previous experiments, but here we use a coarse discretization of 16 × 128 (that is, we have thick, uniform horizontal slices). We form the measured data, b = exp(−A(1c T 1 + X C T ))s + η, and again incorporate Poisson noise so that the relative difference between the noise free data and the noisy data is 0.001. The obtained results are in Figures 7 and 8. As with the previous two experiments, the top rows of these figures show the true (left) and reconstructed (right) weights for the first material, and the second rows show analogous results for the second material. In the bottom rows we display the data (i.e., sinogram) on the left, and the RErr plots on the right. From the figures, we see that the obtained results are acceptable.

5.
Conclusions. Based on KKT conditions, and extending idea from fast gradient methods for solving convex optimization problems, we develop a scaled gradient method for polyenergetic digital tomographic image reconstruction. We show that by incorporating TV regularization and bound constraints that the method can produce results that allow for quantitative imaging.   . From left to right and upper to bottom: sum along each row for first material, sum along each column for first material, sum along each row for second material, sum along each column for second material, relative error of reduced resolution image along rows and columns for first material, relative error of reduced resolution image along rows and columns for second material.  Figure 6. From left to right and upper to bottom: sum along each row for first material, sum along each column for first material, sum along each row for second material, sum along each column for second material, relative error of reduced resolution image along rows and columns for first material, relative error of reduced resolution image along rows and columns for second material.