RWRM: RESIDUAL WASSERSTEIN REGULARIZATION MODEL FOR IMAGE RESTORATION

Existing image restoration methods mostly make full use of various image prior information. However, they rarely exploit the potential of residual histograms, especially their role as an ensemble regularization constraint. In this paper, we propose a residual Wasserstein regularization model (RWRM), in which a residual histogram constraint is subtly embedded into a class of variational minimization problems. Specifically, utilizing the Wasserstein distance from optimal transport theory, this scheme is achieved by enforcing the observed image residual histogram to be as close as possible to the reference residual histogram. Furthermore, the RWRM unifies the residual Wasserstein regularization and the image prior regularization to improve image restoration performance. The robustness of parameter selection in the RWRM makes the proposed algorithms easy to implement. Finally, extensive experiments have confirmed that our RWRM is effective when applied to Gaussian denoising and non-blind deconvolution.


1. Introduction. Image restoration, which aims to estimate the original clean image x from the observed image y, is a classical but still active research topic in low-level vision [22] [31]. A widely used data model [6] [49] [33] can be formulated as y = Kx + n, where n is additive white Gaussian noise (AWGN), and K is a degenerate operator: typically an identity operator in image denoising, a blurring operator in image deblurring, a down-sampling operator in image super-resolution, or a random projection operator in compressed sensing [7] [1]. These inverse problems are of great concern because they are indispensable steps in many practical applications.
In recent years, another type of image prior modeling method has been to utilize histogram approximation. It is well known that a histogram is a discrete approximation of the probability density of a continuous random variable. For an image, its histogram is defined as the statistical relationship between the image gray levels and the corresponding pixel numbers [10]. In many fields of image processing, histograms of the pixel or gradient domains can provide effective information about image statistics. For instance, the texture-enhanced image denoising algorithm GHP [49] enforces the denoised image gradient histogram to be close to the reference gradient histogram of the original image. GHP [49] achieves good restoration results at the expense of a complex reference gradient histogram estimation.
Similarly, in UniHIST [24], the Wasserstein distance is used to minimize the difference between the histogram of the restored image and the reference histogram of the original image in the pixel and gradient fields. For UniHIST denoising [24], the images involved are pattern images, not natural images, and the reference histogram is assumed to be known. In [39], a variational model with the image histogram prior is proposed, and two equivalent convex relaxations of the given problem are discussed. In addition, in [29] [40] [9], various variants of image histogram prior are employed for image color transfer, image synthesis and texture reconstruction, respectively. It can be found that the image prior information is crucial in these applications.
Different from the above image restoration models that only utilize image prior regularizations, in this paper a residual Wasserstein regularization model (RWRM) is designed to regularize the residual image n = y − Kx, i.e., the difference between the degraded observation y and the image formation model Kx. It turns out that integrating the residual Wasserstein regularization with image priors is effective in boosting image restoration performance.
In our RWRM, the benefits of using the residual Wasserstein regularization are two-fold. First, without the complicated reference histogram estimation like GHP [49], the reference Gaussian residual histogram is readily available and is stable. Second, complex manual parameter adjustments in many non-convex optimization problems are not required. This is because the parameter selection is robust, which facilitates the experiment implementation.
In this work, we employ the Wasserstein distance to force the observed image residual histogram close to the reference Gaussian residual histogram. Further, the residual regularization and the image prior regularization are coupled to restore the image. In terms of restoration performance, our RWRM outperforms the popular BM3D [5], NCSR [6] and GHP [49]. We do not claim that our RWRM achieves the best restoration results. Rather, we want to illustrate, from the residual estimation perspective, that the proposed RWRM can be combined with other variational methods to improve image restoration quality.
The main contributions of this paper can be summarized as follows: (1) We propose a residual Wasserstein regularization model (RWRM) in which the residual histogram constraint is embedded within a class of variational problems. Many existing image priors can be incorporated into the model to enhance the output image quality.
(2) The histogram matching by the Wasserstein distance effectively minimizes the discrepancy between the estimated residual histogram and the reference residual histogram. Further, the RWRM is applied to Gaussian denoising and non-blind deconvolution.
(3) The histogram matching and Chambolle's duality projection algorithm are alternated to solve the RWRM. The robustness of parameter selection makes our optimization algorithms easy to implement.
The rest of this paper is organized as follows. Section 2 provides a survey of the related work. Section 3 proposes the residual Wasserstein regularization model (RWRM) for image restoration. Section 4 presents the application of the RWRM in Gaussian denoising. The RWRM applied to non-blind deconvolution is solved in Section 5. Finally, Section 6 concludes the work.
2. Related work.
2.1. Literature analysis. Image restoration methods can be divided into two categories: model-based approaches and learning-based methods. Model-based approaches make full use of image and noise prior models to reconstruct the clean image, while learning-based methods manage to learn a mapping function from the degraded image to the clean image [38] [17] [14]. Here we briefly review some approaches that impose the Wasserstein constraint in image processing.
In the context of histogram matching, Woodford et al. [45] utilized a MAP estimation framework, namely the marginal probability field (MPF) model, to match a histogram of low-level image features such as gradients and texels for a range of image applications, including image segmentation, texture denoising and texture synthesis. Heeger and Bergen [16] implemented texture synthesis of an image that matches the texture histogram of a given image. Schmidt et al. [35] reconstructed images by matching the gradient distribution with an efficient Gibbs sampler, and achieved an excellent denoising effect. It can be found that these methods require multiple samples and complex calculations to restore a clean image.
In [3], Cho et al. developed iterative distribution reweighting (IDR), which enforces a global constraint on gradients in the process of image restoration. The approach minimizes the KL divergence between the restored image gradient distribution and the estimated reference distribution. IDR enhances the visual details of the reconstructed images with a high-precision parametric density fitting.
Swoboda et al. [39] presented a histogram prior based variational model. They mainly discussed two convex relaxations of the given original problem, and gave an equivalent proof of these two relaxations. By minimizing the Wasserstein distance between the discrete distributions of some features, a variational approach for image synthesis and image restoration [40] was presented. In [9], a high-resolution patch of the image texture was assumed to be known; Gheche et al. employed the Wasserstein distance to compare the histograms of the intensity values, the gradient and the Laplacian, so that the synthesized image has similar detail information to the high-resolution patch. In [29], the sliced Wasserstein distance between two d-dimensional N-point clouds was utilized as a statistical fidelity term for the color transfer application.
Zuo et al. [49] presented a texture-enhanced image denoising approach, which adds a gradient penalty term to a non-local patch-based image model. This method improves the image texture structure while removing noise by enforcing the gradient histogram of the latent denoised image to be as close as possible to the reference gradient histogram of the original clear image. Recently, Mei et al. [24] presented a unified method to enforce histogram constraints in pattern image denoising and non-blind deconvolution. The critical factor in their work is to minimize the difference between the histograms of the restored image and the reference histograms of the original image in the pixel and gradient domains by employing the Wasserstein distance.
In W-LDMM [15], a Wasserstein-driven low-dimensional manifold model was proposed to implement image denoising and image inpainting. W-LDMM makes full use of the complementarity of Wasserstein distance noise estimation and the low-dimensional manifold. In [11], convergence in the Wasserstein metric with applications to image restoration was discussed. In [47], a low-dose CT image denoising method was introduced utilizing the Wasserstein distance in generative adversarial networks (GANs). In [46], a single-image super-resolution method combining Wasserstein generative adversarial networks and a deep residual network was proposed, which aims to generate realistic images with finer texture details. Similarly, in [41], Wasserstein GANs were also used for CT image reconstruction from sparse angles.
It can be found that the existing image restoration models almost all use various image prior information, but rarely exploit the residual information. As discussed in Section 1, considering the simplicity and validity of the Gaussian residual histogram, we propose the residual Wasserstein regularization model (RWRM) to restore images. The reference Gaussian residual histogram is readily available, and it is stable under certain conditions. By means of Wasserstein distance histogram matching, the estimated residual histogram can be driven close to the reference residual histogram. Moreover, we subtly impose the residual histogram constraint on image restoration models for higher quality restorations, including Gaussian denoising and non-blind deconvolution. In the following we focus on the theory of the Wasserstein distance as a dissimilarity measure between two histograms.

2.2. Wasserstein distance. It is well known that the Wasserstein distance [26] [29] [27], which originates from the theory of optimal transport [43], is defined as the minimal cost that must be paid to convert one histogram into the other; it is a natural metric between two histograms. Intuitively, if we regard two histograms as two piles of sand on the ground, each grain of sand is an observation sample. To quantify the difference between the two distributions, we measure the minimum amount of sand that has to be moved so that the two distributions become exactly the same. Therefore, the Wasserstein distance is the minimal total travelled ground distance weighted by the amount of sand moved.
For two probability distributions p and q defined on the real line R, the Wasserstein distance between p and q can be expressed as the solution to the following Monge problem [43]:

(1) W(p, q) = inf_φ E_{x∼p} |x − φ(x)|,

where the random variable x obeys the distribution p, while the transformed variable φ(x) follows the distribution q. The infimum in Eq. (1) is taken over all deterministic functions φ : R → R that map the random variable x into another variable φ(x). One can see that the Wasserstein distance is a true statistical metric between two given probability distributions, and it is zero if and only if p and q are the same distribution. Given a probability distribution p on R, we can define its cumulative distribution function (cdf) by the formula F_p(x) = ∫_{−∞}^{x} p(τ) dτ, where F_p is also called a histogram equalization transform [13]. In addition, the percentile function of the probability distribution p (the pseudo-inverse of F_p) is defined as F_p^{-1}(t) = inf{x ∈ R : F_p(x) > t}. In the comprehensive introduction to optimal transport theory [43], a fundamental result is that the optimal φ in Eq. (1) has the following closed-form solution:

(2) φ*(x) = F_q^{-1}(F_p(x)).

Noting that the Wasserstein distance used to measure the difference between the two distributions is expressed as an expected value, we can introduce another form of the Wasserstein distance:

(3) W(p, q) = E_{x∼p} |x − F_q^{-1}(F_p(x))|.

By representing x = (x_1, x_2, · · · , x_n)^T as n independent samples drawn from the distribution p, and treating the histogram h_q as the discrete approximation of another distribution q, we can introduce the equivalent discrete definition of the Wasserstein distance as follows:

(4) Ŵ(h_x, h_q) = min_{φ̂} (1/n) Σ_{i=1}^{n} |x_i − φ̂(x_i)|,

where the function φ̂ maps x_i to ξ_i = φ̂(x_i), such that the transformed samples ξ = (ξ_1, ξ_2, · · · , ξ_n)^T obey the histogram h_q. Similar to the continuous situation just described, the optimal φ̂ for Eq. (4) also has a closed-form solution [43]:

(5) φ̂(x_i) = F_{h_q}^{-1}(F_{h_x}(x_i)),

where F_{h_x} and F_{h_q}^{-1} are the cumulative distribution function and the percentile function of the histograms h_x and h_q, respectively. For an input image x = (x_1, x_2, · · · , x_n)^T, the histogram matching operation φ̂ ensures that the output samples ξ = (ξ_1, ξ_2, · · · , ξ_n)^T match the given histogram h_q. Eqs. (4) and (5) constitute the theoretical basis of the coming image restoration.
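For concreteness, Eqs. (4) and (5) can be sketched in a few lines of NumPy. The rank/quantile construction of the empirical cdf and percentile functions below is an illustrative choice, not the authors' implementation:

```python
import numpy as np

def histogram_matching(x, ref):
    """Closed-form optimal map of Eq. (5): xi_i = F_{h_ref}^{-1}(F_{h_x}(x_i)),
    implemented by rank (empirical cdf) and interpolated-quantile (percentile) matching."""
    x = np.asarray(x, dtype=float).ravel()
    ref_sorted = np.sort(np.asarray(ref, dtype=float).ravel())
    ranks = np.argsort(np.argsort(x))                  # rank of each sample of x
    t = (ranks + 0.5) / x.size                         # empirical cdf values F_{h_x}(x_i)
    grid = (np.arange(ref_sorted.size) + 0.5) / ref_sorted.size
    return np.interp(t, grid, ref_sorted)              # percentile function F_{h_ref}^{-1}

def wasserstein_1d(x, ref):
    """Discrete Wasserstein distance of Eq. (4): mean transport cost of the optimal map."""
    x = np.asarray(x, dtype=float).ravel()
    return np.mean(np.abs(x - histogram_matching(x, ref)))
```

Because the optimal one-dimensional transport preserves ranks, matching a sample set to itself gives a distance of zero, and matching to a shifted reference moves each sample to the reference value of the same rank.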
3. The proposed residual Wasserstein regularization model for image restoration. In this section, we present the proposed residual Wasserstein regularization model (RWRM) and do some necessary analysis. Then, the basic approach of solving the RWRM is given.

3.1. The residual Wasserstein regularization model (RWRM). The observed version y of an underlying clean image x is usually defined by the model:

(6) y = Kx + n,

where n is the AWGN with zero mean and variance σ², and K is a linear operator which accounts for some damage to the image. Based on the maximum-a-posteriori (MAP) estimation, the desired image x can be obtained by solving the optimization problem

(7) arg min_x { (1/(2σ²)) ||y − Kx||₂² + λ R(x) },

where the specific form of the regularization term R(x) depends on the utilized image priors, and λ is a scale parameter. Different from most image restoration models, which only contain image prior regularization, we deal with image restoration by regularizing the residual image n = y − Kx, i.e., the difference between the degraded observation y and the image formation model Kx. Fully utilizing this residual regularization strategy, we propose the following framework of the residual Wasserstein regularization model (RWRM) for image restoration:

(8) arg min_{x, n} { (1/(2σ²)) ||n||₂² + λ R(x) + β Ŵ(h_n, h_g) }, s.t. n = y − Kx,

where h_n is the estimated residual histogram, and h_g is the reference Gaussian residual histogram (ground truth). Ŵ(h_n, h_g) denotes the Wasserstein distance between h_n and h_g, and β ≥ 0 is the Wasserstein regularization parameter.
It is not difficult to find that the framework (8) actually embeds the residual Wasserstein regularization into the existing image restoration models. It combines residual estimation and image prior regularization organically. For various image priors R(x), for instance, total variation (TV) [33], wavelet sparsity [23], nonlocal total variation (NLTV) [12] and sparse-representation modeling [22] [8], we can exploit different image restoration models. It can be seen that the proposed RWRM framework (8) has extensive restoration capabilities.
Without loss of generality, in our work we choose the classic total variation [33] as the regularization term to illustrate the effectiveness of the proposed RWRM framework; in other words, in Eq. (8), R(x) = ||∇x||₁. Therefore, we mainly analyze the following residual Wasserstein regularization model (RWRM) for image restoration:

(9) arg min_{x, n} { (1/(2σ²)) ||n||₂² + λ ||∇x||₁ + β Ŵ(h_n, h_g) }, s.t. n = y − Kx,

where ∇x denotes the discrete gradient of the true image x, and ||∇x||₁ represents the sum of the lengths of the gradient vectors at all pixels.
The reasons for using the Wasserstein distance for residual regularization are twofold. On the one hand, the reference Gaussian residual histogram is readily available and is stable under certain conditions. On the other hand, the parameter selection in the RWRM is robust, making specific experiments easy to implement (see Fig. 11).
3.2. The basic approach of solving the RWRM. In order to solve the proposed RWRM (9), we first introduce an auxiliary variable ξ, and Eq. (9) is split into:

(10) arg min_{x, n, ξ} { (1/(2σ²)) ||n||₂² + λ ||∇x||₁ + (β/2) ||n − ξ||₂² }, s.t. n = y − Kx, h_ξ = h_g.

This problem can be solved by alternately minimizing Eq. (10) with fixed n, x (n and x are treated simultaneously) and with fixed ξ, respectively. Specifically, fixing n and x, we get the following optimization problem:

(11) arg min_ξ ||n − ξ||₂², s.t. h_ξ = h_g.

Using the closed-form solution of the Wasserstein distance, the optimal ξ is given by the histogram matching operation as follows:

(12) ξ_i = F_{h_g}^{-1}(F_{h_n}(n_i)),

where i denotes the i-th element.
Fixing ξ, the (n, x) problem in Eq. (10) reduces to

(13) arg min_{n, x} { (1/(2σ²)) ||n||₂² + λ ||∇x||₁ + (β/2) ||n − ξ||₂² }, s.t. n = y − Kx.

The above problem can be written as an unconstrained optimization problem:

(14) arg min_x { (1/(2σ²)) ||y − Kx||₂² + λ ||∇x||₁ + (β/2) ||y − Kx − ξ||₂² }.

Furthermore, by completing the square in the two quadratic terms, Eq. (14) can be converted into

(15) arg min_x { ((1 + βσ²)/(2σ²)) ||y − Kx − (βσ²/(1 + βσ²)) ξ||₂² + λ ||∇x||₁ }.

For simplicity of notation, we set γ = 1 + βσ². Then Eq. (15) is rewritten as

(16) arg min_x { (γ/(2σ²)) ||ỹ − Kx||₂² + λ ||∇x||₁ }, with ỹ = y − (βσ²/γ) ξ.

For different image restoration tasks, different approaches can be utilized to solve Eq. (16). For example, for Gaussian denoising, Chambolle's duality projection algorithm [2] can be employed to solve problem (16). For non-blind deconvolution, the optimization problem can be solved by the proximal forward-backward splitting (PFBS) method [4].
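The completion-of-squares step that collapses the two quadratic penalties into a single γ-weighted quadratic can be checked numerically. The exact weighting below follows our reconstruction of Eqs. (14)-(16) and should be read as an assumption:

```python
import numpy as np

# Check that (1/(2s^2))*u^2 + (b/2)*(u - xi)^2 equals
# (g/(2s^2))*(u - (b*s^2/g)*xi)^2 plus an x-independent constant, with g = 1 + b*s^2.
rng = np.random.default_rng(0)
sigma, beta = 25.0, 1.5
gamma = 1.0 + beta * sigma**2                 # the substitution gamma = 1 + beta*sigma^2

u = rng.normal(size=1000)                     # stands for the residual y - Kx
xi = rng.normal(size=1000)                    # the histogram-matched auxiliary variable

lhs = u**2 / (2 * sigma**2) + (beta / 2) * (u - xi)**2
rhs = (gamma / (2 * sigma**2)) * (u - (beta * sigma**2 / gamma) * xi)**2
const = (beta / 2) * (1 - beta * sigma**2 / gamma) * xi**2   # remainder, independent of x

assert np.allclose(lhs, rhs + const)
```

Since the remainder does not involve x, minimizing Eq. (14) over x is equivalent to minimizing the single quadratic of Eq. (16).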
Once x̂ in Eq. (16) is obtained, we can estimate the residual n by

(17) n̂ = y − K x̂.

In summary, in order to solve our proposed RWRM (9), we first need to split it into Eq. (10). The key steps in dealing with problem (10) are implementing the histogram matching operation (12) and solving the optimization problem (16). The specific solution algorithms and experimental results of the RWRM applied to Gaussian denoising and non-blind deconvolution are presented in the next two sections.
4. Application of RWRM in Gaussian denoising. In this section, we describe the first application of the RWRM: Gaussian denoising. We first present the RWRM, then give the solution procedure, and finally describe the experimental results.
4.1. The proposed RWRM for Gaussian denoising. We consider the frequently-used image degradation model y = x + n, where n is AWGN with zero mean and variance σ². Following the design of the previous Eq. (9), we propose the following RWRM for Gaussian denoising:

(18) arg min_{x, n} { (1/(2σ²)) ||n||₂² + λ ||∇x||₁ + β Ŵ(h_n, h_g) }, s.t. n = y − x,

where h_n is the estimated residual histogram in the observed image y, h_g is the reference Gaussian residual histogram, and Ŵ(h_n, h_g) denotes the Wasserstein distance between h_n and h_g. Obviously, the RWRM unifies the image prior and the residual estimation.
Due to the fact that the gradient of a clean image follows a Laplacian distribution [33], the parameter λ reflects the penalty on the gradient Laplacian distribution of the clean image. Here β is the Wasserstein regularization parameter. It can be found that when β = 0, Eq. (18) reduces to the popular Rudin-Osher-Fatemi (ROF) model [33], namely the TV-L2 model. Therefore, we present an approach to integrate the residual Wasserstein regularization with the TV-L2 model for Gaussian denoising.

4.2. The solution of the RWRM for Gaussian denoising. According to the analysis in Section 3.2, when solving the RWRM applied to Gaussian denoising, we also need to specifically solve the following optimization problem:

(19) arg min_x { (γ/(2σ²)) ||ỹ − x||₂² + λ ||∇x||₁ }, with ỹ = y − (βσ²/γ) ξ,

which is an ROF-type problem with the effective smoothing weight θ = λσ²/γ. Problem (19) can be efficiently solved by Chambolle's duality projection algorithm [2]. Hence, the solution of (19) is obtained by

(20) x̂ = ỹ − θ div p,

where the dual vector field p can be solved by the fixed-point method below:

(21) p^{m+1} = ( p^m + τ ∇(div p^m − ỹ/θ) ) / ( 1 + τ |∇(div p^m − ỹ/θ)| ),

initializing p⁰ = 0 and iterating with τ ≤ 1/8 to ensure convergence; please see [2] for more details.
To sum up, in solving the proposed model Eq. (18) for Gaussian denoising, we need to alternate the two main steps: histogram matching operation Eq. (12) and Chambolle's duality projection Eq. (20). The solution process of RWRM applied to Gaussian denoising can be summarized as Algorithm 1.
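A minimal NumPy sketch of this alternation (rank-based histogram matching, then Chambolle's dual projection) is given below. The constant θ = λσ²/γ and the recentered data ỹ follow our reading of Eq. (19), and all helper names are ours; treat this as an illustrative sketch, not the authors' exact implementation:

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with Neumann boundary conditions."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def chambolle_rof(g, theta, tau=0.125, iters=60):
    """Solve min_x ||x - g||^2/(2*theta) + ||grad x||_1 by Chambolle's dual projection [2]."""
    px = np.zeros_like(g); py = np.zeros_like(g)
    for _ in range(iters):
        gx, gy = grad(div(px, py) - g / theta)
        denom = 1.0 + tau * np.sqrt(gx**2 + gy**2)
        px = (px + tau * gx) / denom
        py = (py + tau * gy) / denom
    return g - theta * div(px, py)

def match_histogram(n, ref):
    """Rank matching in the spirit of Eq. (12): push residual n onto reference samples."""
    flat = n.ravel()
    t = (np.argsort(np.argsort(flat)) + 0.5) / flat.size
    ref_sorted = np.sort(ref.ravel())
    grid = (np.arange(ref_sorted.size) + 0.5) / ref_sorted.size
    return np.interp(t, grid, ref_sorted).reshape(n.shape)

def rwrm_denoise(y, sigma, lam, beta, J=5, seed=0):
    """Sketch of the alternation: histogram matching, then Chambolle projection."""
    gamma = 1.0 + beta * sigma**2
    ref = sigma * np.random.default_rng(seed).standard_normal(y.size)  # Gaussian reference
    x = np.full(y.shape, y.mean())            # crude initialization of the latent image
    for _ in range(J):
        n = y - x                             # current residual estimate
        xi = match_histogram(n, ref)          # histogram matching step, Eq. (12)
        y_tilde = y - (beta * sigma**2 / gamma) * xi   # assumed data recentering
        x = chambolle_rof(y_tilde, lam * sigma**2 / gamma)  # TV step, Eqs. (20)-(21)
    return x
```

Note that for β = 0 the loop degenerates to a single ROF solve, matching the TV-L2 baseline discussed above.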

4.3. Experimental results for Gaussian denoising. We do not claim that our RWRM has the best restoration performance. We only want to show that in the RWRM the residual Wasserstein regularization can be combined with the TV constraint to improve the output image quality. In this section, we compare the RWRM with TV-L2 and with some popular methods such as BM3D [5], NCSR [6], GHP [49] and the learning-based method LDCNN [48], and finally illustrate the robustness of parameter selection. To verify the performance of the RWRM for Gaussian denoising, we apply it to ten natural images, whose scenes are shown in Fig. 1. All the experimental images are 512 × 512 grayscale images whose gray levels range from 0 to 255. The noisy images are generated by adding AWGN with σ = 25 to the clean images. We first discuss the parameter settings in our algorithm, and then evaluate the performance with PSNR values.
1) Parameter settings. In the following we will discuss the parameter settings in two cases: β = 0 and β ≠ 0.
When β = 0, the proposed model is the TV-L2 minimization problem. Here there is only one model parameter λ > 0, which is introduced to control the trade-off between the regularity of the desired image x and the fidelity to the noisy observation y. In the TV-L2 model, the adjustment of the regularization parameter λ is challenging, because it is difficult to find a standard formula for estimating λ. The general method is to adjust λ so that the solution satisfies the discrepancy principle, i.e., to make the variance of the output residual y − x as close as possible to the noise variance σ². The main problem here is that λ is the weight of the image regularization, and the image regularization metrics vary from image to image.
In this work, we present a method for selecting the regularization parameter λ in the TV-L2 model. As one can see from Fig. 2, when λ is small, the TV-L2 removes the noise but blurs the image edges; when λ is large, the TV-L2 retains the edges but also leaves noise. Faced with this dilemma, the TV-L2 model with a moderate regularization parameter λ can remove noise while retaining details such as image edges. Specifically, for different noisy images, we experimentally choose different parameters λ, which are integer multiples of 100 ranging from 1500 to 1800. In Fig. 3 (a), taking the image "Boat" as an example, the relationship between the PSNR and the parameter λ is presented. How to choose the optimal λ is important, and readers interested in this topic can refer to the literature [37] [34] for more details. When β ≠ 0, the RWRM contains two parameters, λ and β. For a fair comparison, according to the above method, we choose the regularization parameter λ that achieves the best denoising effect when β = 0. In the RWRM, the Wasserstein regularization parameter β is easy to adjust. We believe that β should be proportional to the parameter λ and inversely proportional to the noise standard deviation σ. Based on specific experimental experience, we employ β = λ/(0.8σ) to estimate the parameter β. The graph of the PSNR affected by the parameters λ and β is presented in Fig. 3 (b). One can see that the optimal PSNR is almost the same under different parameter settings. In addition, we take the identity matrix as the initialization of the residual n in Algorithm 1.
In Gaussian denoising, the histogram matching and the Chambolle duality projection are alternately performed 5 times, so the iteration number is set to J = 5. We perform parameter estimation according to the formula β = λ/(0.8σ) with λ = 1800; for example, for σ = 60, β = 37.5. In the experiment, we take τ = 1/8 to ensure the convergence of the fixed-point iteration in the Chambolle duality projection algorithm.
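The empirical rule can be checked directly; it reproduces the value quoted above:

```python
def beta_from_lambda(lam, sigma):
    """Empirical Wasserstein-weight rule used for Gaussian denoising: beta = lambda / (0.8 * sigma)."""
    return lam / (0.8 * sigma)

print(beta_from_lambda(1800, 60))  # 37.5, as in the text
```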
2) Analysis of experimental data. In Gaussian denoising, we make use of the reference Gaussian residual histogram h_g. For all the experimental images, we employ a 256-bin histogram h_g. The peak signal-to-noise ratio (PSNR) and the structural similarity index measurement (SSIM) are the evaluation metrics for our image restoration; the larger the PSNR and SSIM, the better the restoration effect. Firstly, the mean square error (MSE) is defined as follows:

MSE = (1/(MN)) Σ_{i=1}^{M} Σ_{j=1}^{N} (x₀(i, j) − x(i, j))²,

where M × N is the size of the image, and x₀ and x represent the original and denoised images, respectively. On this basis, the definition of the PSNR can be expressed as

PSNR = 10 log₁₀ (255² / MSE).

Secondly, SSIM is mainly used to measure the degree of similarity between two images; the higher the value of SSIM, the closer the contents of the two images. SSIM is defined as follows:

SSIM(A, B) = ( (2 μ_A μ_B + C₁)(2 σ_{AB} + C₂) ) / ( (μ_A² + μ_B² + C₁)(σ_A² + σ_B² + C₂) ),

where μ_A and μ_B are the means of two different images A and B, respectively, σ_A² and σ_B² are the variances of A and B, σ_{AB} is the covariance of A and B, and C₁ = (0.01 × 255)² and C₂ = (0.03 × 255)² are constants used to maintain stability.
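These metrics transcribe directly into NumPy. Note that the SSIM below uses global statistics over the whole image, per the definition above; practical SSIM implementations average this quantity over local windows:

```python
import numpy as np

def psnr(x0, x, peak=255.0):
    """Peak signal-to-noise ratio between the original x0 and the denoised x."""
    mse = np.mean((np.asarray(x0, float) - np.asarray(x, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(a, b, peak=255.0):
    """Single-window SSIM with global image statistics."""
    a = np.asarray(a, float); b = np.asarray(b, float)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = np.mean((a - mu_a) * (b - mu_b))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```

As a sanity check, an image compared with itself yields SSIM = 1, and a constant error of 5 gray levels gives PSNR = 10 log₁₀(255²/25).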
In Table 1, we show the denoised PSNR and SSIM of the RWRM and the TV-L2. The average PSNR and SSIM values of the RWRM are 1.13 dB and 0.0743 higher than those of the TV-L2, respectively. In addition, the PSNR comparison curves of the two methods on the ten images are shown in Fig. 4. It can be found that the PSNR of the denoised image obtained by the RWRM is consistently better than that of the TV-L2 model. These results indicate that introducing the residual Wasserstein regularization on top of the TV-L2 model can significantly improve the denoising performance; accordingly, the PSNRs of the RWRM and the TV-L2 in Fig. 4 are positively correlated.
In Fig. 5, we show the distribution curves of the residuals estimated by TV-L2 and the RWRM, compared with the real Gaussian distribution curve (ground truth). It can be found that the RWRM has an obvious advantage in residual distribution estimation, which is due to the Wasserstein-distance-based distribution matching.
3) Subjective evaluation.
Figure 5. The distribution comparison for the final residuals of TV-L2 (green), RWRM (red), and the ground truth (blue). One can see that the RWRM residual estimate is more accurate than that of TV-L2.
The experimental output images of the RWRM and the TV-L2 are presented in Fig. 6 and Fig. 7. One can see that the RWRM improves the visual quality in the following aspects: firstly, it acquires clearer restored results than the TV-L2; secondly, in some cases the TV-L2 might blur image structures, while our method recovers detail features such as image edges better. In a word, we can conclude that in the proposed RWRM the residual Wasserstein constraint is effective for Gaussian denoising.

4.3.2. Comparison with some popular methods. Here, we compare the proposed RWRM with some popular methods, including BM3D [5], NCSR [6], GHP [49] and the learning-based method LDCNN [48]. The codes of all competing denoising methods originate from the authors, and during the experiments we use the parameter settings suggested by the authors. We implement the experiments on one PC with two Intel CPUs (2.90 GHz) and 4.0 GB RAM in the Matlab R2017a environment. Our denoising method takes 16 s to process a 256 × 256 image, which is much faster than NCSR (321 s) and GHP (324 s). BM3D processes a 256 × 256 image in 2 s, but it should be noted that BM3D is mainly implemented in the C language. It should also be mentioned that the state-of-the-art method UniHIST [24] only deals with pattern image denoising, not natural images; therefore, there is no comparison with UniHIST.
In order to show the quantitative experimental results, we use the eight images shown in Fig. 8 as test images. We set five cases with noise standard deviation σ = 60, 70, 80, 90 and 100, respectively. The PSNR and SSIM results of the four denoising methods are presented in Table 2. One can see that the RWRM has significantly better denoising performance than NCSR [6] and GHP [49], and its overall performance is comparable to that of BM3D [5]. However, we should note that BM3D is a block-matching denoising algorithm whose computation amount is very large. As the noise level increases, the denoising performance of the RWRM gradually surpasses that of BM3D.
On the other hand, Fig. 9 shows the output images of the four denoising methods acting on test image 8 when the noise standard deviations are 60, 80, and 100, respectively. We can find that the proposed RWRM achieves better image recovery. This is because the RWRM makes the estimated residual histogram better approximate the true Gaussian residual histogram; an accurate residual estimate helps to restore the image. Finally, we compare the image denoising effect of the proposed RWRM with that of the learning-based method LDCNN [48]. The test images used in the comparison experiment are the 8 images shown in Fig. 8. Table 3 shows the PSNR and SSIM results of the proposed RWRM and LDCNN [48] in image denoising, and Fig. 10 shows the visual effects of both methods on test image 5 with σ = 40. It can be found that, although the numerical indices of the RWRM are not as good as those of LDCNN, the visual effects of the two methods have their own advantages and disadvantages, implying that the proposed method is effective.

4.3.3. Robustness of parameter selection. A great advantage of our RWRM is the robustness of parameter selection. More specifically, when β = 0 in the proposed model, the PSNR of the denoised image varies greatly with the parameter λ. However, when β ≠ 0 (with β fixed), the PSNR of the denoised image varies very little with λ.
From Fig. 11, we can see that if the original parameter λ (when β = 0) is inaccurate, the result may be much worse. However, after adding the Wasserstein histogram constraint, the result is not sensitive to changes of the parameter λ. Therefore, the proposed method is robust to the parameter choices, which makes our RWRM easier to implement on top of existing image restoration methods. This also shows that in the RWRM the residual Wasserstein regularization term plays the dominant role.
Figure 11. The curve graphs of parameter robustness. One can see that λ has a strong influence on the PSNR in TV-L2 (β = 0), while in the RWRM (β ≠ 0) the PSNR is little affected by λ. Thus, the parameters in our RWRM are robust.

5. Application of RWRM in non-blind deconvolution. In this section, we first present the RWRM applied to non-blind deconvolution, then provide the solution process of the model, and finally give the experimental evaluation.
5.1. The proposed RWRM for non-blind deconvolution. Non-blind deconvolution tries to restore a latent clear image x from its blurry and noisy observation y = Kx + n, where K is a known blurring kernel, and n is AWGN with variance σ². Unlike most image restoration models, which only employ image priors, we combine the residual regularization and the image prior to present the following residual Wasserstein regularization model (RWRM) for non-blind deconvolution:

(22) arg min_{x, n} { (1/(2σ²)) ||n||₂² + λ ||∇x||₁ + β Ŵ(h_n, h_g) }, s.t. n = y − Kx,

where h_n is the estimated residual histogram in the observed image, h_g is the real Gaussian histogram (ground truth), and Ŵ(h_n, h_g) denotes the Wasserstein distance between h_n and h_g.

5.2. The solution of the RWRM for non-blind deconvolution. According to the analysis in Section 3.2, when solving the RWRM applied to non-blind deconvolution, we need to further explicitly solve the following optimization problem:

(23) arg min_x { (γ/(2σ²)) ||ỹ − Kx||₂² + λ ||∇x||₁ }, with ỹ = y − (βσ²/γ) ξ.

It is well known that solving problem (23) is sometimes difficult, since it involves inverting the operator K. By combining the principles of splitting and decomposition, we employ a proximal forward-backward splitting (PFBS) strategy [4] to deal with this problem. After rescaling the objective by σ²/γ, problem (23) takes the form arg min_x { (1/2) ||ỹ − Kx||₂² + J(x) } with J(x) = (λσ²/γ) ||∇x||₁, and the PFBS algorithm [4] applied to it can be described as

(24) x^{k+1} = Prox_{δJ}( x^k − δ K^T (K x^k − ỹ) ),

where the positive step size satisfies 0 < δ < 2/||K^T K||, and the proximal operator is defined as follows:

(25) Prox_{δJ}(v) = arg min_x { (1/2) ||x − v||₂² + δ J(x) }.

In other words, by introducing an auxiliary variable v^k, the minimization problem (23) can be solved by alternately iterating the following two-step proximal forward-backward splitting (PFBS) [4]:

(26) v^k = x^k − δ K^T (K x^k − ỹ),
(27) x^{k+1} = Prox_{δJ}(v^k).

We know that Eq. (27) can also be rewritten as follows:

(28) x^{k+1} = arg min_x { (1/(2θ)) ||x − v^k||₂² + ||∇x||₁ },

where θ = δλσ²/γ. Problem (28) can be efficiently solved by Chambolle's duality projection algorithm [2], so the solution of (28) is obtained by

(29) x^{k+1} = v^k − θ div p,

where the dual vector field p can be solved by the fixed-point method: initializing p⁰ = 0 and iterating

(30) p^{m+1} = ( p^m + τ ∇(div p^m − v^k/θ) ) / ( 1 + τ |∇(div p^m − v^k/θ)| ).

Based on the analysis in Section 3.2 and the above, the solution process of the proposed RWRM for non-blind deconvolution is summarized in Algorithm 2.
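The forward-backward structure above can be sketched for a circular-convolution blur applied via the FFT. To keep the sketch self-contained, an ℓ1 soft-thresholding prox stands in for the TV prox that the paper computes with Chambolle's projection; the function and its parameters are illustrative assumptions:

```python
import numpy as np

def pfbs_deconv(y_tilde, kernel_fft, theta, iters=200):
    """PFBS sketch for min_x 0.5*||y_tilde - K x||^2 + theta*||x||_1, where K is a
    circular convolution given by its frequency response kernel_fft. The backward
    (prox) step uses soft-thresholding as a stand-in for the TV proximal operator."""
    K  = lambda u: np.real(np.fft.ifft2(np.fft.fft2(u) * kernel_fft))
    Kt = lambda u: np.real(np.fft.ifft2(np.fft.fft2(u) * np.conj(kernel_fft)))
    L = np.max(np.abs(kernel_fft)) ** 2      # ||K^T K|| for a circular convolution
    delta = 1.0 / L                          # satisfies 0 < delta < 2 / ||K^T K||
    x = y_tilde.copy()
    for _ in range(iters):
        v = x - delta * Kt(K(x) - y_tilde)   # forward (gradient) step
        x = np.sign(v) * np.maximum(np.abs(v) - delta * theta, 0.0)  # backward (prox) step
    return x
```

With the identity kernel (all-ones frequency response) the iteration reaches its fixed point, a single soft-thresholding of the data, after one step, which makes the two-step structure easy to verify.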

5.3. The experimental results of non-blind deconvolution. For non-blind deconvolution, we first compare the proposed RWRM with the TV-L2, and then compare it with some representative methods.
Algorithm 2. The RWRM for non-blind deconvolution.
1. Input: degraded image y, n = eye(size(y)), j = 1, k = 1, x^1, λ, β, δ, τ.
2. while j ≤ J do
3.   Update ξ by the histogram matching operation.
4.   while k ≤ M do
5.     Update v by the gradient descent step (26).
6.     Update θ.
7.     Fixed-point iteration: initialize p = 0 and iterate.
8.     Implement Chambolle's duality projection algorithm to update x via (29).
9.   end while
10. end while
11. Output: restored image x.

For the RWRM applied to non-blind deconvolution, we first use the ten images shown in Fig. 12 as test images. Here, we mainly illustrate the role of the residual Wasserstein regularization by comparing the RWRM (β ≠ 0) with the TV-L2 (β = 0). As before, we first discuss the experimental parameter settings, then analyze the experimental data, and finally give the subjective evaluation.

1) Parameter settings: In our non-blind deconvolution experiments, the blurry images are synthesized by first applying a commonly used 5 × 5 Gaussian blur kernel with standard deviation 1.5 and then adding AWGN with σ = 25.
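The degraded inputs described above (a 5 × 5 Gaussian blur kernel with standard deviation 1.5, followed by AWGN with σ = 25) can be synthesized with a few lines of numpy. The padding mode and random seed are illustrative assumptions:

```python
import numpy as np

def gaussian_kernel(size=5, std=1.5):
    """Normalized 2-D Gaussian blur kernel of the given size and std."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * std**2))
    return k / k.sum()

def blur(x, k):
    """'Same'-size 2-D convolution with symmetric boundary padding."""
    ks = k.shape[0]; r = ks // 2
    xp = np.pad(x, r, mode="symmetric")
    out = np.zeros_like(x, dtype=float)
    for i in range(ks):
        for j in range(ks):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def degrade(x, k, sigma=25.0, seed=0):
    """y = Kx + n: blur with kernel k, then add AWGN of std sigma."""
    rng = np.random.default_rng(seed)
    return blur(x, k) + rng.normal(0.0, sigma, size=x.shape)
```

The same routine with a 3 × 3 kernel (std 2) and σ = 15 produces the degraded images used later for the comparison with representative methods.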
When β = 0, there is only one model parameter, λ. Through experiments, we find that, in general, a larger λ yields a higher PSNR of the restored images; however, once λ becomes too large, the PSNR begins to decrease. Here we apply a strategy similar to the one used in Gaussian denoising to select λ. For different images, we choose the optimal λ from the integer multiples of 100 ranging from 1700 to 2000. Taking the test image "Bacteria" as an example, Fig. 13 shows the curve of the PSNR with respect to the parameter λ.

Figure 13. The graph of the PSNR with λ in the RWRM (β = 0) applied to non-blind deconvolution.
When β ≠ 0, the proposed model has two parameters, λ and β. For the scale parameter λ, we still choose the optimal value found when β = 0. For the Wasserstein regularization parameter, we utilize β = λ/(0.45σ), where σ is the noise standard deviation. We again employ the image "Bacteria" to illustrate the choice of β. As shown in Fig. 14, with the increase of β (λ = 1800), the PSNR of the restored image first increases rapidly and then decreases slowly. It can be found from Fig. 14 that the optimal parameter is β = 160.
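As a sanity check, the adaptive rule β = λ/(0.45σ) reproduces the optimal β = 160 found for "Bacteria" at λ = 1800 and σ = 25 (a trivial sketch; the function name is ours):

```python
def adaptive_beta(lam, sigma):
    """Wasserstein regularization weight: beta = lambda / (0.45 * sigma)."""
    return lam / (0.45 * sigma)

# lambda = 1800, sigma = 25  ->  beta = 1800 / 11.25 = 160
```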
In solving the RWRM, a step size δ that is either too large or too small in the gradient descent will increase the number of iterations; a step size in the vicinity of 1 is appropriate. Therefore, we set δ = 1.1. The initial input image x^1 is obtained by Tikhonov regularization [42] of the degraded image y. In addition, for the noise initialization in Algorithm 2, we select an identity matrix of the same size as the observed image.
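The Tikhonov-regularized initialization x^1 admits a closed-form solution per Fourier mode under circular boundary conditions. A numpy sketch follows; the regularization weight α and the kernel-centering convention are our own illustrative assumptions, not taken from [42]:

```python
import numpy as np

def tikhonov_init(y, kernel, alpha=0.01):
    """x1 = argmin_x ||Kx - y||^2 + alpha*||x||^2, solved per Fourier mode:
    Xhat = conj(Khat) * Yhat / (|Khat|^2 + alpha), assuming circular blur."""
    kh, kw = kernel.shape
    kpad = np.zeros_like(y, dtype=float)
    kpad[:kh, :kw] = kernel
    # shift so the kernel center sits at the (0, 0) frequency origin
    kpad = np.roll(kpad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    Khat = np.fft.fft2(kpad)
    Yhat = np.fft.fft2(y)
    Xhat = np.conj(Khat) * Yhat / (np.abs(Khat)**2 + alpha)
    return np.real(np.fft.ifft2(Xhat))
```

With α > 0 the inversion is stable even where the blur transfer function is small, which is precisely why a Tikhonov solution makes a reasonable starting point x^1 for the iterative scheme.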
In the implementation of Algorithm 2, the number of outer iterations is J = 5, and the number of inner iterations is M = 90. In the experiments, we set λ = 1800 and, according to the formula β = λ/(0.45σ), adaptively select the parameter β. In the fixed-point iteration, τ = 1/8 is chosen to ensure its convergence.
2) Analysis of experimental data

Figure 14. The graph of the PSNR with β (λ = 1800) in the RWRM applied to non-blind deconvolution.
For the TV-L2 (β = 0) and our RWRM (β ≠ 0), the quantitative experimental results are shown in Table 4. One can see that the RWRM achieves a higher PSNR on every experimental image; notably, the average PSNR of the RWRM over the ten images is 0.65 dB higher than that of the TV-L2. In addition, Fig. 15 plots the PSNR gain of the RWRM over the TV-L2 on the ten images. It can be concluded that introducing the residual Wasserstein regularization on top of the TV-L2 is effective.

Figure 15. In non-blind deconvolution, PSNR comparison curves of the TV-L2 (β = 0) and the RWRM (β ≠ 0) on the ten test images.

3) Subjective evaluation
In order to reflect the deblurring visual effects, Fig. 16 shows the resulting images of the TV-L2 and the RWRM. Compared with the TV-L2, our RWRM produces more visually appealing restoration results. Furthermore, Fig. 17 zooms in on local details: the RWRM well enhances image detail features such as edges, which are often over-smoothed by other deconvolution methods, and thus recovers better edge information.

5.3.2. Comparison with some representative methods. We compare the RWRM with two other representative methods, NL-H1 [21] and NL-TV [21], on another eight test images shown in Fig. 18. In the experiments, the blurry images are synthesized by first applying a commonly used 3 × 3 Gaussian blur kernel with standard deviation 2 and then adding AWGN with σ = 15. The experimental results are shown in Table 5. From the perspective of PSNR and SSIM, our RWRM has advantages over the competing methods. Fig. 19 shows a visual comparison of the three approaches; one can see that the RWRM retains better details such as edges, which implies the effectiveness of the proposed method.

Figure 18. Another eight test images for non-blind deconvolution. From left to right, they are labeled as 1 to 8 respectively.

6. Conclusion and future work. In this paper, we have introduced a residual Wasserstein regularization model (RWRM) which integrates the residual histogram constraint with image priors for image restoration. Concretely, with the Wasserstein distance, we minimize the discrepancy between the estimated residual histogram and the reference residual histogram. The effective cooperation of the residual Wasserstein regularization and total variation improves the output image quality in both Gaussian denoising and non-blind deconvolution.
Our RWRM differs from other histogram-constrained image restoration methods in two respects. Firstly, one of its advantages is simplicity: it requires no complex reference histogram estimation in the pixel or gradient domain. Secondly, the robustness of model parameter selection and the fast implementation of the corresponding algorithms make the RWRM easier to apply to image restoration.
Based on the effectiveness of the RWRM, the basic idea of our future work is to combine the residual Wasserstein regularization with other image priors for image restoration. One strategy is to utilize the residual Wasserstein regularization and the low-dimensional manifold prior [36] for other image restoration applications, such as blind inpainting and super-resolution.