χ 2 TEST FOR TOTAL VARIATION REGULARIZATION PARAMETER SELECTION

. Total Variation (TV) is an eﬀective method of removing noise in digital image processing while preserving edges. The scaling or regularization parameter in the TV process deﬁnes the amount of denoising, with a value of zero giving a result equivalent to the input signal. The discrepancy principle is a classical method for regularization parameter selection whereby data is ﬁt to a speciﬁed tolerance. The tolerance is often identiﬁed based on the fact that the least squares data ﬁt is known to follow a χ 2 distribution. However, this approach fails when the number of parameters is greater than or equal to the number of data. Typically, heuristics are employed to identify the tolerance in the discrepancy principle and this leads to oversmoothing. In this work we identify a χ 2 test for TV regularization parameter selection assuming the blurring matrix is full rank. In particular, we prove that the degrees of freedom in the TV regularized residual is the number of data and this is used to identify the appropriate tolerance. The importance of this work lies in the fact that the χ 2 test introduced here for TV automates the choice of regularization parameter selection and can straightforwardly be incorporated into any TV algorithm. Results are given for three test images and compared to results using the discrepancy principle and MAP estimates. so that regularization is no longer necessary at the end of the iterative procedure. Alternatively, we suggest solving a single regularized problem with an optimally chosen regularization parameter. Solving a single regularized linear problem rather than iterating a sequence of regularized linear problems is advantageous for nonlinear or computationally large problems.


(Communicated by Weihong Guo)
Abstract. Total Variation (TV) is an effective method of removing noise in digital image processing while preserving edges. The scaling or regularization parameter in the TV process defines the amount of denoising, with a value of zero giving a result equivalent to the input signal. The discrepancy principle is a classical method for regularization parameter selection whereby data is fit to a specified tolerance. The tolerance is often identified based on the fact that the least squares data fit is known to follow a χ 2 distribution. However, this approach fails when the number of parameters is greater than or equal to the number of data. Typically, heuristics are employed to identify the tolerance in the discrepancy principle and this leads to oversmoothing. In this work we identify a χ 2 test for TV regularization parameter selection assuming the blurring matrix is full rank. In particular, we prove that the degrees of freedom in the TV regularized residual is the number of data and this is used to identify the appropriate tolerance. The importance of this work lies in the fact that the χ 2 test introduced here for TV automates the choice of regularization parameter selection and can straightforwardly be incorporated into any TV algorithm. Results are given for three test images and compared to results using the discrepancy principle and MAP estimates.
1. Introduction. Most image processing problems take a two-dimensional signal as input and produce an improved image that comes in the form of a set of parameters related to the image. Since the input signal typically contains noise or blurring, a regularization term is included in the inversion. A regularization parameter that controls the amount of regularization is necessary for the inversion. Unfortunately, the choice of regularization parameter is often overlooked and an inversion is run many times with different parameter values until a clear image is produced. This process can be improved by solving a sequence of regularized problems iteratively with the regularization parameter decreasing to zero. In this case the initial inverse estimate is updated at each iterate so that regularization is no longer necessary at the end of the iterative procedure. Alternatively, we suggest solving a single regularized problem with an optimally chosen regularization parameter. Solving a single regularized linear problem rather than iterating a sequence of regularized linear problems is advantageous for nonlinear or computationally large problems.
The difficulty with choosing the optimal regularization parameter is that essentially the regularization term contains prior information about the unknown parameters and there is potential to add bias to the solution. This is addressed in Bayesian methods by identifying appropriate prior probability models [38]. Bayesian methods find a range of parameter sets in the form posterior distributions. Rather than estimate a posterior distribution for the parameters, we instead consider the inverse problem as a statement about the prior and posterior means of the parameters and input signal, and use standard deviations to weight them.
If the error in data and initial parameter estimates are assumed to be normal, the maximum a posteriori probability (MAP) estimate is found solving a least squares problem with Tikhonov regularization. Choice of regularization parameter for Tikhonov regularization is well studied and common methods include the Discrepancy Principle, L-curve and generalized cross-validation (GCV) [20]. The validity of all of these approaches relies on knowing the optimal solution of the quadratic least squares functional, which is not possible for the TV functional. For example, in the L-curve approach, the norm of the data residual vs. the solution can be plotted, but with the TV functional there is no guarantee that the curve will be L-shaped with a corner containing the optimal regularization parameter value.
Various authors have been able to exploit quadratic aspects of TV regularization to find regularization parameters. In [27,37] the choice of scalar regularization parameter is made with the discrepancy principle applied to the data residual. The discrepancy principle relies on estimates of the data error or the standard deviation of data noise. In [41] they assume the noise is from a Poisson or multiplicative Gamma distribution and use this property to find the expected value of the data error. The discrepancy principle can lead to oversmoothing when data error is not known because it is difficult to estimate the degrees of freedom in the residual. In [47] the discrepancy principle is used with correct, equivalent degrees of freedom [44]. The equivalent degrees of freedom can be found for a quadratic functional and while TV is not quadratic, such a functional arises in their particular iterative primal-dual proximal point algorithm for TV optimization.
In [29] the Unbiased Predictive Risk Estimator (UPRE) is extended to TV regularization where they approximate the TV term with a quadratic functional. Again the quadratic approximation of the TV functional simplifies the degrees of freedom choice. As an alternative to approximating the TV functional with a quadratic functional, SURE methods use Stein's estimate for the correct degrees of freedom in the predictive risk estimator. Stein's degrees of freedom estimate involves gradients which in [9] they calculate with finite differences. Alternatively, in [4] the SURE-LET algorithm assumes the optimal parameter estimates are a linear combination of denoising functions. This linear approximation allows for simpler estimates of the gradient calculation in the degrees of freedom estimate.
The discrepancy principle and predictive risk methods rely on variance estimators of the data and empirically give a Bayesian framework, at least for the observations. In [11] they use TV regularization with the RichardsonLucy algorithm, which takes into consideration the Poisson nature of the noise, for 3D imaging of biological specimens. However, the TV regularization parameter is chosen in an ad hoc manner. Alternatively in [24] they exploit GCV to determine how to iteratively update the regularization parameter in a variable splitting algorithm. They use GCV to determine the regularization parameter for the quadratic part of the split and then step through regularization parameter values to identify the best choice in the TV minimization part of the split.
Another approach to TV regularization parameter selection is bi-level optimization. Bi-level optimization is similar to the discrepancy and predictive risk methods because there are two levels of optimization. In the discrepancy and predictive risk methods, the low level optimization is the regularized inverse problem that finds parameter estimates, and the high-level optimization problem finds the regularization parameter(s) according to a specified loss function. In the discrepancy principle the loss function is mean-squared error while in UPRE and SURE methods the loss function is predictive risk. Bi-level optimization in [10,26] also solves the regularized inverse problem in the low level optimization but they set up the highlevel loss function as training problem that finds regularization parameters which minimizes the difference between the estimated parameters and the training set value for the parameters. Another bi-level optimization method [21,22] solves the Fenchel-predual problem in a low-level optimization because it involves minimizing a smooth objective function. The high-level problem optimizes a loss function so that weighted data residual lies within upper and lower bounds of the data noise variance, which is similar to a discrepancy type principle.
Here we derive an approach that does not rely on a quadratic interpretation of TV regularization nor does it require training data. We identify correct degrees of freedom for predictive risk but do not need to estimate degrees of freedom using Stein's theorem because we use the regularized residual. With the regularized residual no estimation procedure is necessary for the degrees of freedom as long as the smoothing operator A is full rank. Therefore the regularization parameter selection algorithm proposed here using a χ 2 test on the regularized residual can be used with any TV optimization algorithm.
In Section 2 we first explain why the TV functional is a χ 2 random variable. Then in Section 3 we derive the χ 2 test for TV regularization using the TV regularized residual. Knowing the correct degrees of freedom is crucial to the method and we use recent results on the degrees of freedom of the Lasso method [13,43] to derive the degrees of freedom with a full rank blurring matrix. In Section 4 we show results from three different images, two types of noise, three different noise levels and three TV algorithms: deconvtv [7], FTVd [46] and tvdeconv [15]. Successful image reconstruction is measured through Improved Signal to Noise Ration (ISNR). Given the exact image, the regularization parameter that gives the highest ISNR is calculated and compared to the ISNR with regularization parameter selection algorithms for a MAP estimate, the discrepancy principle and the χ 2 test.
2. Least squares estimators. Regularization parameter selection is well studied for Tikhonov regularization [20] in linear problems,. We will briefly review it in a framework that introduces the χ 2 test strategy proposed here for TV and explains how the Discrepancy principle can be viewed as a χ 2 test.
Linear inverse problems are often stated as where for two dimensional m × n greyscale image reconstruction problems d ∈ R mn is a vector of noisy observations, A ∈ R mn×mn is a known model, and x ∈ R mn contains the unknown information about the true image that has m columns and n rows. The vector represents noise in the observations and it is assumed it has zero mean with variance σ 2 . In image processing, these vectors and matrix are specified by concatenating in a column-or row-wise fashion pixel values for the image. The matrix A can use used to represent a shift in the image, in which case the image is noisy, or A may represent a blur filter induced by motion in the image. In either case, a ij acts on the value of the image pixel at the point indexed by (i, j). For example an image represented by 256 × 256 pixels, has m = n = 256. A least squares fit of the unknowns to the observation solves the minimization problem Typically this optimization problem is ill-posed so a regularization term, or constraints, are added to it. The focus of this work is when the regularization term is a TV functional, but first we discuss least squares regularization which takes the form: The matrix L is a q × mn shaping matrix with no restriction on q, while x p is an initial guess that is often taken to be zero. If α 2 is chosen large, the regularization term dominates and more weight will be placed on the second misfit, hence the estimatex will be "close" to x p . If L is chosen to be the identity, there can be significant smoothing of the solution which may not be desirable in imaging applications. If L is diagonal or dense, non-smooth least squares estimates result [34], and applying L has the effect of applying a spatially dependent regularization parameter [12]. This regularization process can also be viewed as constrained linear inversion [8] or ridge regression [30]. In Bayesian inference the unknown parameters x are treated as random variables. If x p is the mean of x and α 2 L T L is the inverse error covariance, the optimal parameters found by Tikhonov regularization can be viewed as the maximum a posteriori estimate when error in the initial estimate and data are normally distributed. Using Bayes theorem, the optimal parameters are described by the posterior distribution of x given by p(x|d) = p(d|x)p(x).
In many applications it is often the case that the error distributions are not normal. Reasonable estimates can still be found with Tikhonov regularization if the errors are not normal as long as bias in the data and initial guess are removed, and the misfits are weighted by their inverse covariances. In practice this can be done with M-estimators which use initial estimates of the parameters, and data weighted with a diagonal matrix containing inverse variances of their errors. The difficulty then lies in estimating variances for errors in initial parameter estimates. Estimating this prior information is the focus of many Bayesian methods, for example in [3] they use a hierarchal Bayesian model for TV in blind deconvolution. Estimation of initial parameter errors with a scalar can be viewed as equivalent to regularization methods that focus on choice of parameter α 2 . The following χ 2 tests are often used to find α.
Proof. It is well known that the first term in the sum is distributed as χ 2 mn [5], since it is the sum of squares of mn normally distributed random variables. Similarly the second term is distributed as χ 2 q , and since they are independent, their sum is The Discrepancy Principle can be viewed as a χ 2 test using just the data misfit, i.e. a value of α 2 is found so that the optimal estimatex defined by (3) satisfies where τ is a predetermined real number. Ifx is independent of d then τ = 1 satisfies the χ 2 test, however this is not the case becausex is the least squares solution and depends on d. When there is the same number of data as unknowns we cannot appropriately reduce the degrees of freedom in the χ 2 test because there no χ 2 distribution with fewer than one degrees of freedom. In this case we can use the regularized residual as a χ 2 test [31,32] for choice of α 2 . It is based on the following theorem.
. . , q are independent, not necessarily normally distributed, but have mean zero and standard deviation σ and α −1 , respectively. Then for large mn Proof. Shown in [32].
Theorem 1 shows that the regularized residual is a χ 2 random variable and it gives the corresponding degrees of freedom, q. This suggests the following χ 2 test for choice of α 2 in Tikhonov regularization: A more general formula for the degrees of freedom is given in [14] df In particular for Tikhonov regularization the degrees of freedom is df (Ax(α)) = tr(N(α)) where N(α) = A(A T A + αI) −1 A T . This gives the χ 2 test for the residual in the discrepancy principle: The χ 2 test for regularization parameter selection is similar to finding a parameter that minimizes an estimate of the predictive risk E d − Ax 2 2 . The unbiased estimate of the mean squared error (MSE) is For Tikhonov regularization the degrees of freedom are known (i.e. it is the effective degrees of freedom), and the Unbiased Predictive Risk Estimator method (UPRE) finds α ∈ argmin 3. Least squares with total variation regularizaton. Ill-conditioning of the image reconstruction problem defined by (2) is often alleviated by TV regularization because TV results in reconstructions that maintain sharp edges in the image. With this type of regularization the prior information is a gradient vector containing the finite difference approximation to partial derivatives with appropriate boundary conditions. Here we use one sided differences at the boundaries and the anisotropic discrete total variation functional denoted as where D is a mn × mn matrix containing the stacking of the difference approximations. The TV image reconstruction problem is typically written as We can still view the Discrepancy Principle for TV as a χ 2 test since it only acts on the residual. However, we must determine the reduced degrees of freedom. Since there is no closed form expression forx for nonlinear smoothers like TV, it is more difficult to find the effective degrees of freedom than it is with Tikhonov regularization. In addition, based on the discussion in Section 2, it may appear that the degrees of freedom are effectively zero for square matrices A as is the case with the Tikhonov solution. However, we do not observe this in practice.
In cases where the reconstruction operator A is not linear, or regularization operators other than quadratic Tikhonov are used, Stein's Lemmas [Stein 1981] is often used to determine the degrees of freedom in the predictive risk estimator. In particular, Stein shows that Using this estimate of degrees of freedom gives Stein's Unbiased Risk Estimation (SURE) methods. The challenge with SURE methods is in estimating the gradient.
In [9] they use finite differences to estimate the gradient while in [4] they assume the optimal estimatex is a linear combination of denoising functions which simplifies the calculations. Unbiased degrees of freedom estimates for TV are obtained in [13] using Stein's lemma, while in [43] they derived them from Karush-Kuhn-Tucker (KKT) conditions. Here's we use the estimate from [43]: where A is the active set of the TV, or Lasso, estimate and consists of indices of Dx that are nonzero. In addition, D −A is the derivative matrix D after removing rows that are indexed by A and null(D −A ) is its nullspace. The expected value of the dimension of the linear subspace A(null(D −A )) gives the degrees of freedom.
The degrees of freedom estimate in (8) may be used in a Discrepancy principle or SURE method for TV to get the correct degrees of freedom, however, it can be difficult to compute. As an alternative to just using the predictive risk estimator, i.e. the data residual, here we use the degrees of freedom estimate (8) for the TV regularized residual λ 2 d − Ax(λ) 2 2 + T V (x(λ)). In particular when A is full rank, we have identified a χ 2 test on the regularized residual that does not require direct calculation of (8). This χ 2 test will be developed in Section 3.1.
3.1. TV functional and Laplacian distribution. TV can be viewed as a sparsity prior in the gradient domain that conforms to a Laplacian distribution with zero mean. We assume that pixel intensities are independent, which may not be accurate, but is a good first approximation to reality. An important consideration in modeling the TV functional as differentially Laplacian is the assumption that coefficients sum to zero, which is the case with pixel intensities.
Define y i = x i+1 − x i , and assume it follows a Laplacian distribution with probability density function where θ is the mean of y i and 2β 2 is the variance. In [16] they suggest that the standard deviation can be estimated by the observed pixel differences. If we also assume the data are estimates of the mean of Ax, and the errors follow a normal distribution, then the MAP estimate is Lemma 2. TV regularization parameter λ defined by (7) gives the MAP estimate when λ = β/σ 2 .
Lemma 2 was observed in [16] and is the basis for a regularization parameter selection method given in Algorithm 1.
Algorithm 1 TV as MAP estimate; Laplace hyperparameter from pixel differences. %The standard deviation of elements in array D 4: λ = β/σ 2 3.2. χ 2 test for TV regularization parameter selection. We now develop a χ 2 test for TV regularization which intuitively may not be apparent because the TV term is in L 1 , not L 2 . However, Laplace random variables can form a χ 2 distribution as described in the next two Propositions.
Proof. Laplace random variables are equivalent to the difference of two exponential random variables, i.e. for random variable Y with same density as y i where X 1 , X 2 are exponentially distributed. This can be seen by manipulating the appropriate characteristic functions [25]. Now the probability density function of a χ 2 random variable h with 2 degrees of freedom is f (h) = 1/2e −h/2 , for h ≥ 0. This is also the exponential distribution with parameter 1/2 and we can therefore write where H 1 , H 2 ∼ χ 2 2 and i.i.d. [25]. Proof. The sum of n independent χ 2 random variables is also χ 2 with degrees of freedom equal to the sum of degrees of freedom of each term.
Proof. As described in Proposition 1 the first term in the sum is distributed as χ 2 m while the second term is χ 2 2n by Theorem 3. Since they are assumed independent their sum is χ 2 m+2n .
Theorem 3 is illustrated in Figure (1) where histograms of random variables sampled 5000 times are plotted together with theoretical probability distributions. The i came from a normal distribution with mean 0 and σ = 0.07, while y i came from a Laplace distribution with θ = 0 and β = 100σ 2 . In addition there were 300 elements in and 1000 in y. In this idealized example, the histograms give excellent agreement with the theoretical probability distributions.
Equation (10) gives a χ 2 test for TV, however, if = d − Ax and y = T V (x), then the terms in the sum are not independent and the degrees of freedom must be reduced. In particular Theorem 4. Suppose that (d − Ax) i ∼ N (0, σ) and (Dx) i ∼ Laplace(θ, β) with A and D full rank. Then Proof. It was shown in [43] that i.e. the degrees of freedom in the TV estimate is the expected value of the dimension of the nullspace of D −A . They also show that when A and D are full rank df (Ax(λ)) = df (Dx(λ)) = df (x(λ)).
Note that Theorem 4 shows that when both A and D are full rank, we do not need to calculate df (Ax(λ)) when using the regularized residual. This gives in the following χ 2 test for choice of λ in TV regularization: decreases with increasing λ.
Proof. As λ increases, the data residual decreases since more weight is placed on the data fit in the optimization problem. In addition, since TV(x) remains bounded, the second term in the sum will approach zero for large λ.

Remark 1.
Whenx has a bounded gradient it may be the case that f (λ) > mnσ 2 for all λ. Given a large enough value of σ there will be a λ that passes the χ 2 test which implies the data are not as accurate as indicated by the standard deviation estimate σ. Similarily it could be that f (λ) < mnσ 2 for all λ whereby a smaller value of σ is needed for there to exist a root, meaning the data are more accurate than anticipated. If it is possible to find a λ so that f (λ) = mnσ 2 there is a unique λ that passes the χ 2 test.
Remark 2. Whenx does not have a bounded gradient there may not be a unique λ for which f (λ) = 0 or f (λ) may have multiple extremum. For λ ≈ 0 TV regularization givesx with TV(x) small. As λ increases d − Ax(λ) 2 2 will decrease but TV(x) may increase. A smooth or piecewise constant image results in a small value of TV(x) while large values of TV(x) indicate a highly textured image. This prior understanding of the image can be used to choose a non-unique λ. In particular if the image is assumed to be smooth the best choice of λ is the smallest value for which f (λ) = 0.
The proposed method for TV regularization parameter selection is given in Algorithm 2. To find the optimum we must solve the TV problem multiple times with different λ to determine which satisfies the χ 2 test. We also give an algorithm for the discrepancy principle in Algorithm 3 which similarly solves the TV problem multiple times with different λ, and automatically computes the regularization parameter as is done in [27]. In both cases there is an outer iteration that finds the regularization parameter according to a discrepancy or χ 2 principle, and each also contains an inner iteration that find the TV parameter estimates. In this work we have described why the discrepancy principle has the incorrect degrees of freedom in Algorithm 3, and developed a χ 2 test for the regularized residual that contains the correct degrees of freedom for TV in Algorithm 2. In both cases we tested the algorithms with inner iterations that find the TV parameter estimates using three publicly available TV codes: deconvtv [7], FTVd [46] and tvdeconv [15]. The first two codes use an augmented Lagrangian method while the third uses the Split Bregman method. Numerical results from Algorithms 2 and 3 are presented in Section 4 along with results from the MAP estimate and the estimate that gives maximum ISNR.
We compared BSNRs of 20, 30 and 40 dB, where the lower levels represent more noise.
The three images we chose to test the algorithms were the Camerman image, an artificial MRI, and a Mountain image as seen in Figures 2-4. The blurred images are in the top row of the Figures with BSNR 20, 30 and 40 ranging from left to right. The Camerman image with Gaussian noise is shown in Figure 2 while the MRI and Mountain images are shown with uniform noise in Figures 3-4. In all experiments the standard deviation used to generate noise in the images was also used to reconstruct the images. Although this may not be realistic, it allows us to determine the effectiveness of Algorithms 1-3 in the ideal case.

4.2.
ISNR. The optimal value of λ was identified to be that which gives the best Improved Signal-to-Noise Ratio (ISNR) defined by: The ISNR is an estimate of the perceptual quality of an image with larger values indicating better quality. The value of λ that gives the largest ISNR is also the value that gives the smallest mean square error (MSE).
Since we have the exact image, we are able to determine the optimal value of λ that gives the highest ISNR value. This is illustrated in Figures 5 and 6. In all Figures the graph associated with the right vertical axis is the ISNR value for a range of λ. There is a clear maximum ISNR value in all cases and the value of λ where this occurs is the λ used to reconstruct the images labeled "Maximum ISNR". The resulting ISNR from reconstructed images using this λ is also reported in Tables 1-2. Figures 5 and 6 also plot on the left vertical axis is the value the regularized residual in (12), or cost function, for all possible λ along with the targeted value of the cost function which is mnσ 2 . Where the cost function and the targeted value intersect will be the value of λ that satisfies the TV χ 2 test, and the value of λ chosen for the χ 2 method. Results from reconstructed images using this λ are reported in Tables 1-6. 4.3. Reconstructed images. The images were reconstructed using regularization parameters found by Algorithms 1-3 and the regularization parameter that gives the largest ISNR. These regularization parameters were used to reconstruct the images in the bottom three rows of Figures 2-4. The left column shows results from BSNR of 20, the middle BSNR of 30 and the right BSNR of 40. In all cases the MAP reconstruction looks the worst, while the χ 2 reconstruction has a higher contrast than the reconstruction that produces the best possible ISNR. The reconstructed image using the discrepancy principle looks similar to the χ 2 reconstruction, even though the discrepancy principle has the incorrect degrees of freedom, so the reconstructed image is not shown here.
The resulting ISNR values from the reconstructions are given in Tables 1 and  2. Algorithm 1, using the MAP estimate as suggested in [16], relies on estimating the Laplace distribution parameter β with the data, and in all cases this gives the lowest ISNR value. We also note that the resulting ISNR values when using the discrepancy principle are consistently less than when using the TV χ 2 test. This is to be expected because the discrepancy principle does not have the correct degrees of freedom. However, the resulting ISNR values for the χ 2 test and the discrepancy principle are very close in value. This indicates that 2T V (x)/λ is small and does not significantly change the value of the data residual.    The peak signal to noise ration (PSNR) was also calculated to measure how close the reconstructed images are to the true image. A higher PSNR value indicates a better reconstruction and these values are given in Tables 3 and 4. The regularization parameter that gives the highest PSNR is the one that also gives the maximum ISNR. The PSNR values resulting from the regularization parameter found with the χ 2 test are consistently larger than discrepancy principle. The PSNR values for both methods are very similar and have values close to the value when a regularization parameter with maximum ISNR is used.
The structural similarity indices (SSIM) that result when using regularization parameters from each of the methods are given in Tables 5 and 6. It measures the perceived difference between the true and reconstructed images with possible values between 0 (no correlation in the images) and 1. With this metric the regularization parameter that gives maximum ISNR doesn't always result in the highest SSIM. For the cameraman and MRI images with BSNR of 20 and 40 there are instances where the χ 2 test results in a slightly larger SSIM. This difference is reflected in Figures  2-3 where we see more contrast in the figures reconstructed with regularization parameters found by the χ 2 test, than with the regularization parameters that give the best ISNR. TV is often the preferred method to recover images with fine structures, details, and textures. The mountain image has these properties and we see in Tables 5-6 that the SSIM is consistently smaller for all methods as compared to the other images. The SSIM that results from the mountain image for all BSNR values in largest when using the maximum ISNR method, while the SSIM is smaller when using discrepancy principle than when using the χ 2 test. However, values of T V (x) are not large enough to significantly differentiate the discrepancy principle from the χ 2 test. 5. Summary and conclusions. The effectiveness of TV relies on appropriate choice of regularization parameter λ as defined by (7). If λ is too small the image will not appropriately fit the data while if it is too large, the image will be under smoothed. A suitable choice of λ is often found by trying different values until one is found that results in a clear image. This is not the most efficient way of choosing the parameter, and may not be an option if the TV optimization problem is part of a greater iteration scheme, such as that for nonlinear problems. In addition, when the exact image is unknown, it may be the case that even though the image is clear, it does not correctly identify key features. Here we propose an automated method for regularization parameter selection that utilizes statistical properties of TV and can be viewed as finding a value of λ that passes a χ 2 test.  Table 3. PSNR: Gaussian blur with variance 9.
The TV estimate is the MAP estimate if the regularization term is viewed as prior information about pixel differences from a Laplace distribution. This prior information assumption is common in the statistical modeling of natural images [40]. In this work we exploit this assumption to show that the objective function    for TV regularization follows a χ 2 distribution. The discrepancy principle similarly exploits χ 2 properties of the least squares data residual. However, when data are used to find a regularization parameter, the degrees of freedom in the χ 2 test must be reduced, otherwise the image will be over-smoothed. For Tikhonov regularization the analytical expression for the minimum of the objective function can be used to determine the equivalent degrees of freedom [45]. Since there is no such analytical expression for the TV minimizer it has been an open problem to determine the appropriate degrees of freedom for the TV χ 2 test. Here we use recent results concerning the degrees of freedom for the lasso fit [43] to derive a χ 2 test with appropriately reduced degrees of freedom for TV regularization parameter selection. Results were shown for three different images, two types of noise, three noise levels and three TV algorithms: deconvtv [7], FTVd [46] and tvdeconv [15]. In each case the χ 2 test for regularization parameter selection, Algorithm 2, performed best. In addition, since the exact images are known, we also computed regularization parameters that gave optimal ISNR and showed that the χ 2 test gives near optimal parameters.
In all test cases Algorithm 1, which uses a rough estimate of the hyperparameter in TV as a MAP estimate, gave the worse results. Algorithm 3 uses the discrepancy principle with incorrect degrees of freedom and it consistently performed worse than the χ 2 test in Algorithm 2. However, the improvement was not significant because T V (x) is typically small compared to the L 2 norm of the data residual. While the discrepancy principle with incorrect degrees of freedom gave similar results to the χ 2 test we do not recommend using the discrepancy principle in this manner because images with large T V (x) will be over-smoothed. In addition, there is no significant increase in computational cost to use the correct χ 2 test over the discrepancy principle since the only difference in the methods is evaluation of the TV regularization operator at the parameter estimate.