REMOVING RANDOM-VALUED IMPULSE NOISE WITH RELIABLE WEIGHT

Abstract. In this paper, we present a patch-based weighted means filter for removing impulse noise by adapting the fundamental idea of the non-local means filter to random-valued impulse noise. Our approach is to give each pixel a weight that evaluates the probability that the pixel is contaminated by impulse noise, which we call the Reliable Weight of the pixel. With the help of the Reliable Weights we introduce a similarity function to measure the similarity among patches of an image contaminated by random impulse noise. It turns out that the similarity function is highly robust to interference from impulse noise. We then incorporate the Reliable Weights and the similarity function into a filter designed to remove random impulse noise. Under suitable conditions, we establish two convergence theorems to demonstrate that our method is feasible. Simulation results confirm that our filter is competitive with recently proposed methods.


1. Introduction. Random-valued impulse noise can be systematically introduced into digital images during acquisition and transmission [19]. Impulse noise is characterized by replacing a portion of an image's pixel values with random values, leaving the remainder unchanged. In most applications, denoising is fundamental to subsequent image processing operations, such as edge detection, image segmentation, object recognition, etc. The goal of denoising is to effectively remove noise from a noisy image while keeping its features intact. To this end, a variety of techniques have been proposed.
The Median Filter and its extensions [37,8,27,31,41] can suppress noise with high computational efficiency. However, since such filters are applied to the entire image without prior identification of the corrupted pixels, they tend to modify pixels that are not affected by noise. Thus the Median Filter and its extensions often remove desirable image details and blur the image.
In order to better remove impulse noise, we must consider the special properties of the noise. An important characteristic of this type of noise is that only some of the pixels are corrupted and the rest are noise-free. This characteristic suggests that the noisy pixels should be detected first and filtered afterward, i.e., replaced by some estimated values, whereas the undisturbed pixels should be left unchanged. By using this approach, a whole new class of impulse noise filters has been built, for example, Switching Median (SM) Filter [41], Multistate Median (MSM) Filter [13], Adaptive Center Weighted Median (ACWM) Filter [12], the Peak-and-Valley Filter [47,4], Signal-Dependent Rank-Order Mean (SD-ROM) Filter [1], Conditional Signal-Adaptive Median (CSAM) Filter [35], the Pixel-Wise Median of the Absolute Deviations (PWMAD) Filter [14], Modified Threshold Boolean Filter (MTBF) [2], Jarque-Bera Test Based Median (JM) Filter [6], Two-Output Nonlinear Filter [39], Iterative Median Filter [46], etc. The main drawback of these filters is that they use only median values or their variations to restore the noisy pixels, and hence they usually cannot preserve the image details even when the images are only mildly corrupted, say with a noise ratio below 30%.
Recently, many edge-preserving regularization methods have been proposed to remove impulse noise [34,53,36,45,54,51,30,5,48,3,32,52,23,29]. For example, Nikolova [34] used a non-smooth data-fitting term along with edge-preserving regularization. In order to improve this variational method in removing impulse noise, a two-stage method was proposed in [9] and [10]. This method is efficient in dealing with high noise ratios, e.g., ratios as high as 90% for salt-and-pepper impulse noise and 50% for random-valued impulse noise, but its performance is impaired by the inaccuracy of the noise detector in the first phase. In order to find a better noise detector, especially for random-valued impulse noise, Garnett et al. [18] introduced a local image statistic called ROAD (Rank-Ordered Absolute Difference) to identify the impulse noisy pixels and incorporated it into a filter designed to remove additive Gaussian noise. They thus proposed a trilateral filter (TriF) capable of removing mixed Gaussian and impulse noise. This method also performs well for removing impulse noise. However, when the noise level is high, it blurs the images significantly. Delon and Desolneux [15,16], Li et al. [28] and Hu et al. [21] introduced patch-based approaches to deal with impulse noise and a mixture of Gaussian noise and impulse noise, respectively. Dong et al. [17] amplified the differences between noisy pixels and noise-free pixels in ROAD, thus introducing a modified ROAD statistic called ROLD (Rank-Ordered Logarithmic Difference). Xiong and Yin [50] proposed a detection mechanism for impulse noise and then constructed a filter which efficiently removes mixed impulse and Gaussian noise. Lien et al. [30] employed a decision-tree-based impulse noise detector and an edge-preserving filter to reconstruct the intensity values of noisy pixels at low hardware cost.
In the two-step approaches (first detecting the impulse noisy points then filtering the noise), one problem is that the distinction between noise free pixels and impulsive ones cannot always be detected, and some noise free pixels may be detected as noisy ones. In this paper we will develop a one step approach. In a conference communication [24] we introduced a new filter called Reliable Radiometric Weight Filter (RRWF), based on the weight optimization. The objective of this paper is to develop the theoretical aspects of this filter.
In our approach we do not try to classify pixels into noise-free and impulsive ones by detection; instead we introduce, for each pixel, what we call the Reliable Weight, which measures the probability that the pixel is noise-free; the Reliable Weight of a pixel reflects the reliability level for the pixel to be noise-free. To define the Reliable Weights, we first introduce a standardized version of the ROAD statistic, called SROAD, to obtain a more stable statistic for the detection of impulse noisy pixels; we then get the Reliable Weight by using the Gaussian function and a truncation of the SROAD statistic. By means of the Reliable Weights, a similarity function between two patches is introduced to measure the similarity among patches of an image contaminated by impulse noise; this function is highly robust to interference from impulse noise. Finally, we incorporate the Reliable Weights and the similarity function into a filter designed to remove random-valued impulse noise. Extensive experimental results show that our method performs significantly better than many known techniques.
Like the filters introduced in Li et al. [28] and Hu et al. [21], the filter RRWF is a one-step patch-based approach in which we do not judge whether a pixel is noise-free, but put very small weights in the weighted means on pixels which are likely to be impulse noisy. However, compared with [21] (which is an improvement on [28]), there are two major differences in the present work: (1) The use of the SROAD statistic rather than the ROAD statistic makes the choice of the parameter $H$ much more stable (it is not sensitive to the choice of the size $R$ of the windows used for the calculation of the SROAD statistic); accordingly the choice of the parameter $H$ is less sensitive to the value of $p$ (since the choice of $R$ depends on the value of $p$). (2) In the weighted means here we use the optimal weights obtained in [25] instead of the usual Gaussian weights. Simulation results show that the performance of the filters is very similar for small values of $p$ ($p \le 40\%$), but the new filter performs significantly better for large values of $p$ ($p > 40\%$), especially for $p \ge 60\%$. The main point is that the ROAD statistic was used in [21], and the parameters proposed therein are no longer suitable for $p \ge 60\%$. Recall that the TriF introduced in [18] is also a one-step approach, but not patch-based; the patch-based filter proposed in [28] is an improvement of the TriF. Compared with the filters proposed in [18] and [28], in addition to the two aspects mentioned above, RRWF has one further advantage: the weights $w_H(x)$ and $w_{H_1}(x)$ (see (17) and (25)) used here are one-dimensionally indexed ($x \in I$) rather than two-dimensionally indexed ($(x_0, x) \in I^2$), in the sense that here we use the same weights $w_H(x)$ and $w_{H_1}(x)$ for each search window centered at $x_0 \in I$, while in [18] and [28] one needs to calculate new weights $w_H(x_0, x)$ depending on both $x$ and $x_0$ whenever the center $x_0$ of the search window varies. This significantly reduces the computation time.
Two convergence theorems will be established in this paper, which not only justify the convergence of the filter RRWF to the original image u, but also give hints on the choice of parameters. More precisely we will first introduce an oracle filter by means of the similarity distance |u(x) − u(x 0 )| and prove its convergence, see Theorem 3.3. We then construct an estimate of the similarity distance |u(x) − u(x 0 )|, which is proved to be convergent, see Theorem 3.2.
The outline of this paper is as follows. In Section 2, we give a brief review of the Optimal Weights Filter [25] and the TriF with ROAD statistic [18]. In Section 3, we introduce Reliable Weights, construct our new filter and establish its convergence. The computational algorithm, the choice of parameters and the simulation results to demonstrate the performance of the new filter are presented in Section 4. Section 5 is devoted to the proofs of the convergence theorems. Conclusions are drawn in Section 6.

2. Preliminaries.
2.1. Impulse noise model. An image containing random-valued impulse noise can be described as follows:
$$v(x) = \begin{cases} n(x) & \text{with probability } p, \\ u(x) & \text{with probability } 1 - p, \end{cases}$$
where $u$ is the original image, $v$ is the observed image, $n(x)$, $x \in I$, are independent random variables uniformly distributed on $\mathcal{G} = \{255 \times \frac{i}{G} \mid i = 0, 1, \cdots, G\}$, $G$ is a positive integer, and $p$ denotes the proportion of noisy pixels. In order to facilitate the theoretical explanation, we assume that the original image is defined continuously in the unit square $I_0 = [0, 1] \times [0, 1]$, but observed on the set $I$ of $N \times N$ pixels. The goal is to recover the original image $u(x_0)$, for any $x_0 \in I_0$, from the observed image $v(x)$, $x \in I$.
There are two forms of impulse noise: salt-and-pepper noise ($G = 1$) and random-valued impulse noise ($G = 255$). Salt-and-pepper noise is the simpler of the two: it is easy to find the noisy pixels of a salt-and-pepper noisy image because $n(x)$ takes only the two values 0 and 255, but it is very difficult to identify the noisy pixels of a random-valued impulse noisy image. Therefore there are excellent noise detectors for detecting salt-and-pepper noise and for removing it even when the noise ratio is higher than 90% (see e.g. [9,40,43,22,42,33,20]). In this paper, we focus only on random-valued impulse noise with $G = 255$. The dynamic range of the images is $\mathcal{G} = \{0, 1, \cdots, 255\}$, and the noise $n(x)$ is uniformly distributed on $\mathcal{G}$.
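The noise model of this subsection can be simulated with a few lines of NumPy. The following is a minimal sketch, not the code used for the experiments of this paper; the function name `add_random_impulse_noise` and its parameter `levels` (playing the role of $G$) are hypothetical names introduced for illustration.

```python
import numpy as np

def add_random_impulse_noise(u, p, levels=255, seed=None):
    """Replace each pixel, independently with probability p, by a value
    drawn uniformly from {255 * i / levels : i = 0, ..., levels}."""
    rng = np.random.default_rng(seed)
    u = np.asarray(u, dtype=np.float64)
    hit = rng.random(u.shape) < p                   # True where an impulse occurs
    i = rng.integers(0, levels + 1, size=u.shape)   # i in {0, ..., levels}
    n = 255.0 * i / levels                          # impulse values n(x)
    v = np.where(hit, n, u)                         # observed image v
    return v, hit

# Random-valued impulse noise (levels = 255) on a constant test image.
u = np.full((64, 64), 128.0)
v, hit = add_random_impulse_noise(u, p=0.4, seed=0)
```

Setting `levels=1` yields salt-and-pepper noise, since the impulse then takes only the values 0 and 255. Note that `hit` marks the pixels selected for replacement; a selected pixel can, by chance, receive its original value.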

2.2.
Review on the optimal weights filter. In this section we briefly review the Optimal Weights Filter in order to adapt it for removing the impulse noise. Based on similarities among local patches, the Optimal Weights Filter [25] was initially introduced to deal with the Gaussian noise model, where u is the original image defined on the unit square I 0 , v the observed one defined on the lattice I, ε is the Gaussian noise: for x ∈ I, ε(x) are independent Gaussian variables with mean 0 and standard deviation σ > 0. For each point x is the closest point in I of x, which lies in the left and lower side of x. For any point x ∈ I 0 and a positive odd integer d, denote by the square window with center x containing d × d pixels of I, where · ∞ denotes the supremum norm: z ∞ = max{|z 1 |, |z 2 |} for z = (z 1 , z 2 ), d−1 2N represents half of the edge size of the window. For simplicity, with abuse of language, we also say that N x,d is a window with centre x. Notice that by out notation, we have, for x ∈ I 0 and t ∈ I, Windows of different sizes will be used. In the following, for x 0 , x ∈ I 0 and positive odd integers d, D, R, we will use N x,d for similarity patches, N x0,D for search windows, and N x,R for the detection window of the ROAD statistic and SROAD statistic which will be introduced later. For convenience, we extend the definition of the observed image v to the whole unit square I 0 by setting formed by the values v(y) of the observed noisy image at pixels y ∈ N x,d , arranged in the lexicographical order, will be called data patch or similarity patch centered at x. For any x 0 , x ∈ I 0 , define which measures the similarity between the data patches v(N x0,d ) and v(N x,d ).
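The Optimal Weights Filter [25] is defined by
$$\hat{u}(x_0) = \frac{\sum_{x \in N_{x_0,D}} \kappa_{\mathrm{tr}}\big(\hat{\rho}_{x_0}(x)/a\big)\, v(x)}{\sum_{x \in N_{x_0,D}} \kappa_{\mathrm{tr}}\big(\hat{\rho}_{x_0}(x)/a\big)}, \tag{6}$$
where $a > 0$ is a number depending on $\{\hat{\rho}_{x_0}(x) \mid x \in N_{x_0,D}\}$ and $\sigma^2$, whose value can be calculated by the algorithm presented in Remark 1 below, and $\kappa_{\mathrm{tr}}$ is the usual triangular kernel:
$$\kappa_{\mathrm{tr}}(t) = (1 - |t|)^+, \tag{7}$$
with $(\cdot)^+$ denoting the positive part function: $(a)^+ = \max\{a, 0\}$. The Optimal Weights Filter is constructed by minimizing a tight bound of the quadratic risk. It is shown in [25] that the optimal weights are given by the formula (6) via the triangular kernel (7). This minimization procedure also gives an exact formula for the bandwidth $a$, as stated below.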
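Remark 1. According to Remark 1 of [25], the bandwidth $a > 0$ is the solution of
$$\sum_{x \in N_{x_0,D}} \hat{\rho}_{x_0}(x)\,\big(a - \hat{\rho}_{x_0}(x)\big)^+ = \sigma^2$$
and can be calculated as follows. Sort the set $\{\hat{\rho}_{x_0}(x) \mid x \in N_{x_0,D}\}$ in ascending order $\rho_1 \le \rho_2 \le \cdots \le \rho_M < \rho_{M+1} = +\infty$, where $M$ is the number of pixels in $N_{x_0,D}$. Let
$$a_k = \frac{\sigma^2 + \sum_{i=1}^{k} \rho_i^2}{\sum_{i=1}^{k} \rho_i}, \quad 1 \le k \le M,$$
and
$$k^* = \max\{1 \le k \le M : a_k \ge \rho_k\} = \min\{1 \le k \le M : a_k < \rho_k\} - 1,$$
with the convention that $a_k = \infty$ if $\rho_k = 0$ and that $\min \emptyset = M + 1$. The bandwidth $a > 0$ can be expressed as $a = a_{k^*}$. Moreover, $k^*$ is also the unique integer $k \in \{1, \cdots, M\}$ such that $a_k \ge \rho_k$ and $a_{k+1} < \rho_{k+1}$ if $k < M$.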
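The sorting procedure of Remark 1 translates directly into code. The sketch below assumes that the bandwidth equation has the form $\sum_i \rho_i (a - \rho_i)^+ = \sigma^2$ and that the weights are the normalized triangular kernel values $\kappa_{\mathrm{tr}}(\rho_i / a)$; the function names `optimal_bandwidth` and `triangular_weights` are hypothetical and not taken from [25].

```python
import numpy as np

def optimal_bandwidth(rho, sigma):
    """Solve sum_i rho_i * max(a - rho_i, 0) = sigma**2 for a > 0 by sorting."""
    r = np.sort(np.asarray(rho, dtype=np.float64).ravel())
    num = sigma ** 2 + np.cumsum(r ** 2)      # sigma^2 + sum_{i<=k} rho_i^2
    den = np.cumsum(r)                        # sum_{i<=k} rho_i
    a = np.full(r.shape, np.inf)              # convention: a_k = inf while den = 0
    np.divide(num, den, out=a, where=den > 0)
    k_star = np.nonzero(a >= r)[0].max()      # largest k with a_k >= rho_k
    return a[k_star]

def triangular_weights(rho, a):
    """Normalized weights kappa_tr(rho / a) with kappa_tr(t) = max(1 - |t|, 0)."""
    w = np.maximum(1.0 - np.abs(np.asarray(rho, dtype=np.float64)) / a, 0.0)
    return w / w.sum()

rho = np.array([0.0, 1.0, 2.0, 10.0])
a = optimal_bandwidth(rho, sigma=2.0)
weights = triangular_weights(rho, a)
```

For this toy input the candidate bandwidths are $a_1 = \infty$, $a_2 = 5$, $a_3 = 3$, $a_4 = 109/13$, so $k^* = 3$ and $a = 3$; one checks that $1 \cdot (3 - 1) + 2 \cdot (3 - 2) = 4 = \sigma^2$. The patch at distance 10 lies beyond the bandwidth and receives weight zero.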
For an approximation of where σ 2 is the variance of Gaussian noise. The filter needs an estimate of ρ u,x0 (x) without the square. As shown in [25], in practice, rather than extracting the root in (10), good denoising results are obtained by using the approximation The fact that ρ x0 (x) is a reasonable estimator of ρ u,x0 is justified by the convergence results in [25] (cf. Theorems 3 and 4 of [25]).

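2.3. Review on the ROAD statistic and the TriF. Garnett et al. [18] introduced the ROAD (Rank-Ordered Absolute Differences) statistic to detect points contaminated by impulse noise. For any pixel $x_0 \in I$, the ROAD statistic at $x_0$ with the $R \times R$ detection window $N_{x_0,R}$ is defined by
$$\mathrm{ROAD}(x_0) = \sum_{i=1}^{K} r_i(x_0),$$
where $r_i(x_0)$ is the $i$-th smallest value of $|v(x) - v(x_0)|$ over $x \in N_{x_0,R} \setminus \{x_0\}$. In [18] it is advised to use $R = 3$ and $K = 4$ (half of the cardinality of the detection window $N_{x_0,R} \setminus \{x_0\}$). Note that if $x_0$ is an impulse noisy point, then the value of $\mathrm{ROAD}(x_0)$ is large; otherwise it is small. The impulsive weight
$$w_I(x) = \exp\left(-\frac{\mathrm{ROAD}(x)^2}{2\sigma_I^2}\right)$$
is introduced to express how impulse-like the pixel $x$ is. The parameter $\sigma_I$ determines the approximate threshold above which to penalize high ROAD values. The joint impulsivity between $x_0$ and $x$ is defined by
$$J(x_0, x) = 1 - \exp\left(-\left(\frac{\mathrm{ROAD}(x_0) + \mathrm{ROAD}(x)}{2}\right)^2 \Big/ \big(2\sigma_J^2\big)\right),$$
where the parameter $\sigma_J$ controls the shape of the function $J(x_0, x)$. It is obvious that $J(x_0, x)$ takes values in the interval $[0, 1]$. If at least one of $x_0$ or $x$ is impulse-like and has a high ROAD value with respect to $\sigma_J$, then $J(x_0, x) \approx 1$. If neither pixel is impulse-like, then neither has a high ROAD value, and $J(x_0, x) \approx 0$. The TriF (cf. [18]) is given by
$$\mathrm{TriF}(v)(x_0) = \frac{\sum_{x \in N_{x_0,D}} w_{\mathrm{tr}}(x_0, x)\, v(x)}{\sum_{x \in N_{x_0,D}} w_{\mathrm{tr}}(x_0, x)}, \qquad w_{\mathrm{tr}}(x_0, x) = w_S(x_0, x)\, w_R(x_0, x)^{1 - J(x_0, x)}\, w_I(x)^{J(x_0, x)},$$
where $w_S(x_0, x) = \exp\big(-|x_0 - x|^2 / (2\sigma_S^2)\big)$ and $w_R(x_0, x) = \exp\big(-(v(x_0) - v(x))^2 / (2\sigma_R^2)\big)$ are the spatial and radiometric weights of the bilateral filter. This filter has been shown to be efficient in removing mixed noise composed of Gaussian noise and random impulse noise.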
In [18], the TriF is shown to be efficient when the noise ratio is lower than 50%. In that paper, for the calculation of the ROAD statistic, it is suggested to use 3 × 3 windows if the noise ratio is less than 25%, and 5 × 5 windows otherwise.
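To make the ROAD statistic concrete, the following sketch computes it for every pixel of an image as the sum of the $K$ smallest absolute differences to the neighbours in the $R \times R$ detection window. The function name `road` and the mirror handling of the image border (`mode="reflect"`) are assumptions made for this illustration; they are not prescribed by [18].

```python
import numpy as np

def road(v, R=3, K=4):
    """ROAD at each pixel: sum of the K smallest |v(x0) - v(x)| over the
    R x R window centered at x0, excluding x0 itself."""
    v = np.asarray(v, dtype=np.float64)
    h = R // 2
    padded = np.pad(v, h, mode="reflect")   # border handling: an assumption
    rows, cols = v.shape
    diffs = []
    for dy in range(-h, h + 1):
        for dx in range(-h, h + 1):
            if dy == 0 and dx == 0:
                continue                     # skip the center pixel
            shifted = padded[h + dy:h + dy + rows, h + dx:h + dx + cols]
            diffs.append(np.abs(v - shifted))
    ranked = np.sort(np.stack(diffs, axis=0), axis=0)  # rank-order per pixel
    return ranked[:K].sum(axis=0)

# A flat image with one impulse at the center.
img = np.full((5, 5), 100.0)
img[2, 2] = 200.0
r = road(img)
```

In this toy image the impulse at the center has ROAD value $4 \times 100 = 400$, while every other pixel has at most one deviating neighbour, so its four smallest differences are all zero.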
3. Reliable weight filter and its convergence theorems.

3.1. Reliable weights.
It is impossible to distinguish noise-free pixels from impulsive ones exactly. Our idea is not to detect whether a pixel is contaminated by impulse noise, but to give each pixel a Reliable Weight which efficiently measures the probability that the pixel is impulse noisy, and then to use the Reliable Weights to define a patch-based filter like the Non-Local Means Filter.
We first improve the ROAD statistic by standardizing it. The ROAD statistic introduced in [18] is known to be efficient in removing impulse noise. However, this statistic is too sensitive to the detection window size $R$. As an example, take  the value of R should be chosen). To overcome this difficulty, we introduce the standardization of the ROAD statistic, which we call SROAD:
$$\mathrm{SROAD}(x_0) = \frac{1}{K} \sum_{i=1}^{K} r_i(x_0), \quad x_0 \in I,$$
where $r_i(x_0)$ denotes the $i$-th smallest value of $|v(x) - v(x_0)|$ over $x \in N_{x_0,R} \setminus \{x_0\}$. We also extend the definition of SROAD to $I_0$. If in the definition we remove the factor $1/K$, we get the ROAD statistic introduced by Garnett et al. [18]. The factor $1/K$ makes the statistic less sensitive to the variation of the detection window size $R$, as well as to the choice of the parameter $K$. The important feature of the SROAD statistic, compared to the ROAD statistic, is that it is an average measure of the impulse characteristic that is relatively independent of the detection window size. In the above example, the SROAD value of the center point is the same value 35 in both cases $R = 3$ (with $K = 4$) and $R = 5$ (with $K = 12$). Accordingly, the SROAD statistic is also less sensitive to the impulse level $p$, since the choice of the detection window size $R$ depends on the impulse level. We now define the Reliable Weight $w_H(x)$ of a pixel $x$ in terms of the SROAD statistic by (17), where $b$ is the threshold for $\mathrm{SROAD}(x)$ under which the pixel $x$ is considered to be noise-free ($x \in I$ is noise-free when $\mathrm{SROAD}(x) < b$, but in our algorithm we will not judge whether a pixel is noise-free), and $H$ is a parameter. The Reliable Weight $w_H(x)$ indicates how reliable the information of the pixel $x$ is. By definition the value of $w_H(x)$ lies in the range $[0, 1]$. If $w_H(x) = 1$, we consider that the pixel $x$ is noise-free and its intensity value will remain the same in the process of denoising (the weighted means), while $w_H(x) = 0$ means that the pixel $x$ has completely lost its information and will not contribute to the weighted means. When $w_H(x) \in (0, 1)$, the pixel is partially reliable; the larger the value of $w_H(x)$, the more the pixel $x$ contributes to the weighted means. The reason why we introduce the threshold $b$ in the definition of the Reliable Weight is that differences of intensity below some value can be ignored. For example, for an 8-bit gray-level image, a difference of less than 8 is not visually noticeable [7]. So it is reasonable to set $w_H(x) = 1$ when $\mathrm{SROAD}(x) \le 4$, namely $b = 4$. Simulation results confirm that it is reasonable to take $b = 4$ (see Fig. 1).
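The construction of the Reliable Weight (a Gaussian function applied to a truncation of SROAD at the threshold $b$) can be sketched as follows. The exact normalization of the Gaussian in (17) is not reproduced here: the factor $2H^2$ in the exponent, the value $H = 20$ and the function names are assumptions made purely for illustration.

```python
import numpy as np

def sroad_from_road(road_values, K):
    """SROAD is the ROAD statistic divided by K (mean of the K smallest differences)."""
    return np.asarray(road_values, dtype=np.float64) / K

def reliable_weight(sroad_values, H, b=4.0):
    """Gaussian of the truncated SROAD: equal to 1 when SROAD <= b and
    decaying towards 0 as SROAD grows (the 2*H**2 normalization is assumed)."""
    excess = np.maximum(np.asarray(sroad_values, dtype=np.float64) - b, 0.0)
    return np.exp(-excess ** 2 / (2.0 * H ** 2))

# ROAD values of three pixels computed with K = 4.
road_values = np.array([8.0, 16.0, 400.0])
w = reliable_weight(sroad_from_road(road_values, K=4), H=20.0)
```

Here the first two pixels have SROAD values 2 and 4, which do not exceed $b = 4$, so their weights equal 1; the third pixel has SROAD value 100 and receives a weight close to 0.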
Figs. 2-4 show, for the Lena image, the mean value for each of the three statistics ROAD, ROLD and Reliable Weights, on the set of noisy pixels and on the set of uncorrupted pixels, as a function of the impulse noise probability, with standard deviation error bars demonstrating the significance of the difference. The first chart illustrates the means of ROAD values with R = 3 (3 × 3 windows) and K = 4 ( cf. Garnett et al. [18] ), the second one displays the means of ROLD values with R = 5 (5×5 windows) and K = 12 ( cf. Dong et al. [17] ), and the last one gives the means of Reliable Weights values with our adaptive parameters. We can see from the charts that the distance of the mean values of Reliable Weights between noisy pixels and uncorrupted pixels is larger than those for the ROAD and ROLD statistics (the difference is very significant while comparing with ROAD, and remains interesting while comparing with ROLD), so that it is easier to separate noisy pixels from noise-free pixels while using Reliable Weights rather than the ROAD statistic or ROLD statistic. Thus, the Reliable Weights improve the accuracy in the detection of impulse noisy points, compared with ROAD and ROLD. In the next section, we will explain how the Reliable Weights contribute into our filter.

3.2.
Similarity function between two patches. For x 0 , x ∈ I 0 , we define the similarity function, also called radiometric distance between v(N x0,d ) and v(N x,d ) by (18) where the smoothing kernel κ is defined by It is possible to use the Gaussian kernel for the choice of κ, but the results are then a bit less precise. We use the product w H (x + t)w H (x 0 + t) of Reliable Weights of the pixels x + t and x 0 + t to reduce the effects of impulse noise, so that the similarity function has significant anti impulse noise interference ability.
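The effect of the product of Reliable Weights on the patch comparison can be illustrated numerically. The sketch below is not formula (18): it assumes a squared distance in which each squared difference is weighted by $\kappa(t)\, w_H(x+t)\, w_H(x_0+t)$ and the result is normalized by the sum of these weights. The normalization, the function name `weighted_patch_distance` and the uniform kernel are assumptions.

```python
import numpy as np

def weighted_patch_distance(p0, p1, w0, w1, kappa):
    """Squared patch distance in which each pixel pair is weighted by
    kappa * w0 * w1, so that unreliable pixels are effectively ignored."""
    joint = kappa * w0 * w1
    total = joint.sum()
    if total == 0.0:
        return np.inf            # no reliable pixel pair to compare
    return float((joint * (p0 - p1) ** 2).sum() / total)

# Two identical 3 x 3 patches, except for one impulse in the second patch.
p0 = np.full((3, 3), 100.0)
p1 = p0.copy()
p1[1, 1] = 255.0
ones = np.ones((3, 3))
w1 = ones.copy()
w1[1, 1] = 0.0                   # the impulse pixel gets reliable weight 0

d_reliable = weighted_patch_distance(p0, p1, ones, w1, ones)
d_plain = weighted_patch_distance(p0, p1, ones, ones, ones)
```

With the reliable weights the two patches are recognized as identical (distance 0), whereas the unweighted distance is dominated by the single impulse.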
We need to suppose that the impulse noisy pixels can be separated from the noise-free ones by using the reliable weights $w_H(x)$. We say that a point $x \in I$ is contaminated by impulse noise if $v(x) \neq u(x)$. Let $B$ be the set of pixels $x \in I$ contaminated by impulse noise, and $\overline{B} = I \setminus B$ its complement. The following hypothesis implies that with high probability $w_H(x)$ is small for all contaminated pixels $x \in B$ and large for non-contaminated pixels $x \in \overline{B}$, where $H$ is a parameter whose value will be specified in the theorems below. The validity of Hypothesis 1 is confirmed by simulation results.
Theorem 3.2. Assume Hypothesis 1 with H the parameter in the definition of the reliable weights (17). Assume also the Hölder condition (20). Then for x 0 , x ∈ I 0 , The proof is given in Section 5.1.
Consider the similarity distance given by (22) ρ where η > 0 is a parameter. The estimate ρ H,κ,x0 (x) will play the same role as the estimate ρ x0 (x) defined by (11) used for the Gaussian noise case. The introduction of η allows to increase the contribution in the weighted means of the patches which are very similar. By Theorem 3.2, ρ H,κ,x0 (x) is a convergent estimator of

3.3.
Construction of reliable radiometric weight filter. In this section, a new filter is designed to remove random-valued impulse noise using the Reliable Weights and the similarity function. Inspired by the construction of the Optimal Weights Filter [25] originally proposed to remove the Gaussian noise, for the impulse noise model we introduce the radiometric weight between the patches v(N x0,d ) and v(N x,d ) by (24) w where ρ H,κ,x0 (x) is the estimate of ρ x0 (x) (see (22) and (23) respectively), the bandwidth a H > 0 is the solution of the equation with σ > 0 a parameter. The calculation of the bandwidth a H > 0 is done using the algorithm presented in Remark 1 with ρ x0 (x) and a replaced by ρ H,κ,x0 (x) and a H . We will choose η = √ 2b = 4 √ 2, see Section 4.1. In the original Optimal Weights Filter [25] designed to remove the Gaussian noise, σ is the standard deviation of the Gaussian noise. In the impulse noise model, there is no Gaussian noise, so that σ = 0; in order to use the above formula we can suppose that in the impulse noise case there is still a Gaussian noise with small value of σ > 0. As differences of gray values smaller than 8 are not visually noticeable, it would be reasonable to take the value of σ around 8; the existence of the impulse noise would increase the value of σ. Experiments show that a good choice of σ is σ = 10 if p ≤ 50. For a more complete discussion about the choice of σ see Section 4.1.
The radiometric weight w(x 0 , x) decreases as the radiometric distance between v(N x0,d ) and v(N x,d ) increases. The Reliable Weights w H1 (x) introduced in Section 3.1 can be used to improve the radiometric weights w(x 0 , x), in such a way that pixels with larger reliable weights contribute more in the weighted mean. Notice   that H and H 1 may take different values. The choice of their values will be discussed in the next section. If w H1 (x 0 ) = 1, we can consider that x 0 is noise free without risk (due to the choice b = 4), so that the intensity value of the pixel remains Then the filter RRWF is defined by: for each x 0 ∈ I 0 , As the radiometric weights w D (x 0 , x) take into account not only large impulses (which are detected with high reliability as impulse noisy points in a two-step approach) but also small impulses (which may be detected by error as noise-free in a two-step approach), the new filter can remove not only the larger outliers, but also smooth away smaller impulses without blurring edges. It can still remove most of the noise while preserving image details, even when the random-valued impulse noise ratio is as high as 60%.
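The way the Reliable Weights enter the final weighted mean can be sketched for a single pixel. The sketch assumes that the combined weight of a pixel $x$ in the search window is the product of its Reliable Weight $w_{H_1}(x)$ and a precomputed triangular kernel value, and that a fully reliable center pixel is left unchanged; the function name `rrwf_pixel`, its arguments and the fallback for an all-zero weight sum are hypothetical.

```python
import numpy as np

def rrwf_pixel(v_center, w_center, v_window, w_window, kernel_window):
    """Weighted mean over a search window, with weights w_H1(x) * kernel(x).
    A center pixel with reliable weight 1 keeps its observed value."""
    if w_center == 1.0:
        return float(v_center)
    weights = np.asarray(w_window, dtype=np.float64) * np.asarray(kernel_window, dtype=np.float64)
    total = weights.sum()
    if total == 0.0:
        return float(v_center)   # fallback when no pixel is usable (assumption)
    return float((weights * np.asarray(v_window, dtype=np.float64)).sum() / total)

v_window = np.array([100.0, 102.0, 250.0, 98.0])   # observed values in the window
w_window = np.array([1.0, 1.0, 0.0, 1.0])          # reliable weights w_H1(x)
kernel_window = np.array([1.0, 0.5, 1.0, 0.5])     # triangular kernel values

restored = rrwf_pixel(250.0, 0.0, v_window, w_window, kernel_window)
unchanged = rrwf_pixel(101.0, 1.0, v_window, w_window, kernel_window)
```

In this toy example the impulse value 250 has reliable weight 0 and does not enter the mean, so the restored value is $(100 + 0.5 \cdot 102 + 0.5 \cdot 98) / 2 = 100$.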
To justify the convergence of the filter u D (x 0 ), we introduce the oracle filter u D (x 0 ), which is obtained while replacing ρ H,κ,x0 (x) by ρ x0 (x) in (25) and (26): with a > 0 defined by The following theorem shows that the oracle filter u D (x 0 ) converges to u(x 0 ) under suitable conditions. By Theorem 3.2, we know that ρ H,κ,x0 (x) is a convergent estimator of ρ x0 (x). This justifies that the filter RRWF u D (x 0 ) is also a reasonable estimator of u(x 0 ). Theorem 3.3. Assume Hypothesis 1 with H replaced by H 1 from (28). Assume also the Hölder condition (20). Then, for every x 0 ∈ I 0 , in probability, as D → ∞ and D/N → 0.
The proof of this theorem is deferred to Section 5.

Fig. 9(a)-(d)). Our experiments were carried out in the same way as in [17] in order to produce comparable results. The authors of [17] kindly provided us with their set of noisy images, restored images and PSNR values.

4.1. Algorithm and parameters.
The computational algorithm is given in Algorithm 1.
The parameters we give here are suited to 8-bit gray-level images with which we do simulations.
In Section 3.1 we have already discussed the choice of the parameter $b$ (appearing in the definition of the reliable weight (17)): a good choice is $b = 4$, starting from the fact that differences of gray values less than 8 are not visually noticeable (see [7]). For the parameter $\eta$ introduced in Eq. (22), the same consideration leads to the choice $\eta = \sqrt{2}\, b$, which is confirmed to be good by simulation results. We next discuss the choice of the other parameters $d$, $D$, $\sigma$, $R$, $K$, $H$ and $H_1$ based on experimental results. In the series of experiments that we did, we assume that the parameters $p$, $d$, $D$, $\sigma$, $R$, $K$, $H$, $H_1$ take values in the sets $A_1, \ldots, A_8$, respectively. For each $p \in A_1$, we maximize the PSNR as a function of $d$, $D$, $\sigma$, $R$, $K$, $H$ and $H_1$, and we obtain simple formulas (as functions of $p$) for the values of $d$, $D$, $\sigma$, $R$, $K$, $H$ and $H_1$ at which the maximum is attained. To do this, for each $p$, we let $d$, $D$, $\sigma$, $R$, $K$, $H$ and $H_1$ run over the set $A_2 \times A_3 \times A_4 \times A_5 \times A_6 \times A_7 \times A_8$ and record the maximal values of PSNR and the maximizers $d$, $D$, $\sigma$, $R$, $K$, $H$ and $H_1$. In our simulations, the probability $p$ is assumed to be known.
In the following, we find expressions to approximate the maximizers d, Note that our choice of parameters is different from that proposed in [18]: here we find suitable to use detection window size R given by (33), while [18] proposes R = 3 in most cases; for impulse noise with probability p = 40% our formula gives R = 5 for the calculation of SROAD, while [18] proposes to use always R = 3. Simulation results show that our choice of parameters give better restoration results. Moreover, our filter can remove efficiently impulse noise until the noise level p = 60% (using always the above formulas for the choice of parameters), while the filters proposed in [18] and [21] do not work well for p = 60%. An important point is that the choice of parameters proposed in [18] and [21] are no longer suitable for p = 60%, mainly due to the non-stability of the ROAD statistic used therein.
All the formulas above are initially obtained from experiments using the Lena image, but they work also well for other images like Baboon, Bridge, Pentagon, etc.

4.2. Comparison of PSNR values.
We first concentrate on directly comparable and quantitative measures of restoration quality. We evaluate the performance by using the Peak Signal-to-Noise Ratio (PSNR) [7]. If $u$ is the original image and $\hat{u}$ is a restored image of $u$, the PSNR of $\hat{u}$ is given by
$$\mathrm{PSNR}(\hat{u}) = 10 \log_{10} \frac{255^2}{\mathrm{MSE}}, \tag{37}$$
where MSE is the mean squared error between $\hat{u}$ and $u$ over the pixels of $I$. A larger PSNR value signifies a better restoration result.

Algorithm 1.
  for each x ∈ I do
    calculate w_H(x) and w_{H_1}(x) defined by (17)
  end for
  for each x_0 ∈ I do
    if w_{H_1}(x_0) = 1 then
      keep the observed value v(x_0)
    else
      calculate the similarity distances by (22) for all x ∈ N_{x_0,D}
      calculate the bandwidth a_H as in Remark 1
      for all x ∈ N_{x_0,D} do
        calculate the weight w(x_0, x)
      end for
      calculate the restored value at x_0 by (26)
    end if
  end for

To avoid undesirable border effects, in our simulations we mirror the image outside the image limits symmetrically with respect to the border. At the corners, the image is extended symmetrically with respect to the corner pixels. In Table 1, we list the best PSNR values from all considered methods for the four images with $p \in \{20\%, 40\%, 60\%\}$. The best values are in bold so that they can be compared easily with the values from our method. From Table 1, it is clear that our method RRWF provides significant improvement over all other algorithms for the cases of Baboon, Lena and Pentagon. In the case of Bridge, PWMF and our algorithm both provide satisfactory denoising performance.
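The PSNR criterion and the mirror extension of the image at the border can be sketched with NumPy as follows. The function names `psnr` and `mirror_extend` are hypothetical, and the padding mode `"symmetric"` is an assumption: depending on whether the mirror axis passes through the border pixels or between pixels, `"reflect"` may be the closer match to the extension used in the simulations.

```python
import numpy as np

def psnr(u, u_hat):
    """PSNR in dB for 8-bit images: 10 * log10(255^2 / MSE)."""
    u = np.asarray(u, dtype=np.float64)
    u_hat = np.asarray(u_hat, dtype=np.float64)
    mse = np.mean((u - u_hat) ** 2)
    if mse == 0.0:
        return np.inf            # identical images
    return 10.0 * np.log10(255.0 ** 2 / mse)

def mirror_extend(v, margin):
    """Extend the image by `margin` pixels on each side by mirroring."""
    return np.pad(np.asarray(v), margin, mode="symmetric")

u = np.zeros((4, 4))
value = psnr(u, u + 25.5)        # MSE = 25.5^2, so 255^2 / MSE = 100
extended = mirror_extend(np.array([[1, 2], [3, 4]]), margin=1)
```

In this toy example a uniform error of 25.5 gray levels gives a PSNR of 20 dB, and the $2 \times 2$ image is extended to a $4 \times 4$ image whose central block is the original.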
For a large set of images (see Fig. 9(e)-(t)), we compare our algorithm with the recent methods FWNLM, PWMF and PARIGI, see Tables 2-4. PWMF does not work for the case $p = 60\%$, so we do not include the results of PWMF in Table 4. From Tables 2-4 we can see that our method performs better for most of the images.
In Fig. 5 we give simulation results for different sizes of N , which are in line with our theoretical results, see comments after Theorem 3.3.

4.3. Image quality.
Our main goal is to ensure that our approach provides improved denoising and visually better results. To compare the results subjectively, we enlarge portions of the images restored by some of the methods listed in Table 1. Figs. 10 and 11 show the results of restoring 40% corrupted images of Baboon and Lena respectively. In the images restored by ACWM-EPR [10] and TriF [18], we can see that some noticeable noise remains in the face of Lena, and some details are lost in the hair around the mouth of the baboon. The visual quality of the images restored by ROLD-EPR [17] is clearly improved, but we can still find some noise around the nose of the baboon, and the face of Lena is not smooth enough. The images restored by PARIGI [15,16] are too smooth and some details are removed. Our restored images are quite good: they retain the abundance of image details and also keep the continuity of the details. To further compare the capability of ROLD-EPR and our method, we provide the square error images calculated as in (38), where $\hat{u}(x)$ is the restored gray value, $u(x)$ is the original gray value, $\mathrm{Difference}(x)$ is the square error value and $\Gamma$ is a parameter to control the value range of the square error image for better results. The blue image regions correspond to high-confidence estimates. As shown in Fig. 14, the Difference images show that our results are better than the others. For example, there are significantly fewer hot points in our Difference image than in the ROLD-EPR and PARIGI Difference images. Figs. 12 and 13 show the results of restoring 60% corrupted images of Bridge and Pentagon. There are still many noticeable noise patches in the images restored by ACWM-EPR [10]. In contrast, the images restored by TriF [18] are blurred significantly and lose many details. ROLD-EPR [17] performs better, but it still leaves a little noise and removes some important details (see Fig. 12). Our method can suppress the noise successfully while preserving more details.

4.4. Computational complexity.
Recall that the TriF [18] is defined by the weighted average estimate
$$\mathrm{TriF}(v)(x_0) = \frac{\sum_{x \in N_{x_0,D}} w_{\mathrm{tr}}(x_0, x)\, v(x)}{\sum_{x \in N_{x_0,D}} w_{\mathrm{tr}}(x_0, x)},$$
in which the trilateral weight is $w_{\mathrm{tr}}(x_0, x) = w_S(x_0, x)\, w_R(x_0, x)^{1 - J(x_0, x)}\, w_I(x)^{J(x_0, x)}$, where $w_S$, $w_R$ and $w_I$ are the spatial, radiometric and impulsive weights of [18] and $J(x_0, x)$ is the joint impulsivity weight. Instead, in our method, we calculate $n \times D^2$ times the reliable weight $w_r(x_0, x) = w_{H_1}(x)\, \kappa_{\mathrm{tr}}$. The main computational advantage of our approach is that we do not need to calculate the joint impulsivity weights, whose computational complexity is of order $O(D^2 \times n)$. Moreover, the calculation of the trilateral weight $w_{\mathrm{tr}}(x_0, x)$ is more complicated than that of the reliable weight $w_r(x_0, x)$, because of the presence of the power operation in the definition of $w_{\mathrm{tr}}(x_0, x)$. This explains why our filter RRWF (26) is much faster than the TriF [18]. See Table 5 for a comparison of the average runtimes on a set of $512 \times 512$ images.
In Table 5 we also compare the speed with the recently proposed algorithms FWNLM [49], PWMF [21] and PARIGI [15,16]. From the table, we see that our filter RRWF (26) is also much faster than FWNLM [49] and PARIGI [15,16], and significantly faster than PWMF [21].

5. Proofs of the convergence theorems.

5.1. Proof of Theorem 3.2.
Proof. Fix $x_0, x \in I_0$. Let $\Delta v(t) = v(x_0 + t) - v(x + t)$. Then where $J_1 =$ By the definition of $B$ we have $u(y) = v(y)$ when $y \in \overline{B}$. So when $x_0 + t,\, x + t \in \overline{B}$, $\Delta v(t) = u(x_0 + t) - u(x + t)$. Since the image $u$ satisfies the Hölder condition and Therefore, for any $t \in N_{0,d}$ such that $x_0 + t \in \overline{B}$ and $x + t \in \overline{B}$, we get $\Delta v(t)^2 = (u(x_0 + t) - u(x + t))$ in probability. Thus, taking into account that $d/N \to 0$ as $d \to \infty$ and $N \to \infty$, from (39) we see that in probability, which gives the desired result.

5.2. Proof of Theorem 3.3.
Proof. The proof is similar to that of Theorem 3.2. Fix $x_0 \in I_0$. Let Using the fact that $w_{H_1}(x) \le \sup_{z \in B} w_{H_1}(z)$ for $x \in B$, and $w_{H_1}(x) \ge \inf_{z \in \overline{B}} w_{H_1}(z)$ for $x \in \overline{B}$, we have where $\lambda := \sup_{z \in B} w_{H_1}(z) \,/\, \inf_{z \in \overline{B}} w_{H_1}(z)$. By Hypothesis 1, $\lambda \to 0$ in probability. Applying Theorem 1 of Pruitt [38] to the numerator and the denominator, and using the fact that the kernel $\kappa_{\mathrm{tr}}$ is bounded and the sum in probability. On the other hand, since $v(x)$ are bounded by a constant $V_{\max}$ we have $R_2 \le V_{\max} J_2$. This implies that in probability. Hence, from (40) and the condition that $D/N \to 0$ and $D \to \infty$, it follows that $\hat{u}_D(x_0) \to u(x_0)$ in probability, which proves the desired result.

6. Conclusions. In this paper, we introduce the notion of Reliable Weight to measure the probability for a pixel to be noise-free in an image corrupted by random impulse noise. Combining the Reliable Weight with the technique of the Optimal Weights Filter [25,26], we get an efficient filter for removing random-valued impulse noise. Under suitable assumptions we prove the convergence of the filter. Simulation results show that our filter is competitive, both visually and quantitatively, compared with a number of known filters.