REDUCING SPATIALLY VARYING OUT-OF-FOCUS BLUR FROM NATURAL IMAGE

In this paper, we focus on the challenging problem of removing spatially varying out-of-focus blur from a single natural image. We first propose an effective method to estimate the blur map by a total variation refinement of the Hölder coefficient, and then discuss the properties of the corresponding kernel matrix. A tight-frame based energy functional, whose minimizer is related to the optimal defocus result, is thus built. To tackle the functional more efficiently, we describe a numerical procedure based on an accelerated primal-dual scheme. To verify the effectiveness of our method, we compare it with some state-of-the-art schemes on both synthesized and natural images. Experimental results demonstrate that the proposed method performs better than the compared methods.


1. Introduction. In the past decades, with the increasing availability of portable digital imaging devices of high performance and low price, digital images have attracted more and more attention in our daily life. Generally, digital images can be divided into many categories, such as natural images, remote sensing images, medical images, microscopic images, etc. All of these images may suffer from a common distortion: blur. The factors that cause blur include defocus, camera shake, motion, etc. [12]. Among them, out-of-focus blur is the most common one. It arises from the trade-offs between aperture size, depth of field, and exposure time when capturing images, and it becomes the main drawback when using a large aperture to gather enough light and prevent motion blur [39]. How to measure and remove this kind of blur is a challenging task in many image applications such as computer vision and image processing.
From the pixel-level perspective, out-of-focus blur is caused by a pixel capturing light from its surrounding points, and it is commonly characterized by a point spread function (PSF), or "kernel"; i.e., the blurred image can be seen as a convolution of the clear one with the PSF. Based on this idea, many outstanding methods have been developed for deblurring (or deconvolution) problems [8-10, 16, 22, 30, 33, 36, 44, 48].
Most of these methods assume that the blur is uniform on the whole image. Unfortunately, the blur typically varies over the image plane. Such spatially-varying blur is much more difficult to detect and analyze than the spatially-invariant one. Consequently, most methods need multiple images and fuse them to obtain a deblurred result.
The removal of spatially-varying blur using a single image has also been studied in recent years. The main challenges of this issue involve two aspects [15]: 1) it is a blind deconvolution problem, since both the clear image and the blur kernel are unknown; 2) it is a very complex problem, because the blur kernel differs from pixel to pixel (and hence cannot be modeled as a circulant matrix). Due to this complexity, many existing methods, such as [2,15,35,41], first segment the blurred image into several regions, each with a uniform blur kernel, and then handle the spatially-invariant deblurring in each region to obtain a global deblurred result [32]. We collectively call these methods segmentation-based deblurring methods. They are effective for images whose blur kernel differs from region to region. However, since many natural images cannot be segmented reasonably (Figure 1 is a convincing example), segmentation-based deblurring methods are limited in practical applications.
The kernel of out-of-focus blur is usually approximated by a Gaussian function g(x, σ(x)) (or simply g(x)), where the standard deviation σ(x) measures the blur level at pixel x [52]. Therefore, intuitively, we can first estimate σ(x) based on certain properties, use σ(x) to generate the kernel g(x, σ(x)) for every pixel, and then remove the blur using g(x, σ(x)) directly. Here σ, of the same size as the original blurred image, is generally termed the blur map. Based on the above ideas, in 2007, Bae and Durand extracted the blur scale at edges by calculating the first and second derivatives of the input image, and obtained a blur map by propagating the blur scale using an interpolation method [1]. In 2009, Tai and Brown first built a coarse blur map using their local contrast (LC) prior, and then refined it by exploiting a Markov random field formulation [40]. In 2011, Zhuo and Sim re-blurred the input image using a Gaussian kernel, and calculated the blur amount at edges from the ratio between the gradients of the input and re-blurred images; they then obtained the full blur map by propagating the blur amount to the entire image [52]. In 2013, Zhu et al. presented a method for coherent defocus blur estimation based on local probability estimation and coherent map labeling [51]. Moreover, Shen et al. generated a blur map by combining Tai's and Zhuo's methods; they then performed deblurring based on an L1-L2 optimization with different blur scales, and used the blur map for scale selection to reconstruct an all-in-focus image [39]. This method is rather effective.
In this paper, we propose a new method for blind spatially-varying out-of-focus blur removal. Our contributions are as follows:
• Firstly, we analyze the properties of the Hölder coefficient [20,29], estimate a blur map based on it, and then refine the blur map using a total variation (TV) based algorithm [38]. This can be regarded as an interesting complement to previous approaches for estimating the blur map.
• Secondly, we discuss some properties and the bound of the corresponding kernel matrix, which are important for our spatially variant blur removal task.
• Finally, we build an energy functional for deblurring based on the framelet system, and describe a numerical procedure for our energy based on an accelerated primal-dual (APD) scheme. To the best of our knowledge, this is the first study to introduce the APD algorithm into the spatially-varying out-of-focus blur removal realm.
The rest of the paper is organized as follows. In Section 2, we estimate the blur map and discuss the properties of the kernel matrix. The proposed deblurring model and numerical scheme with its convergence property are addressed in Section 3. Various experiments on synthesized and real blurry images are then reported in Section 4 to illustrate the superior performance of our approach. Finally, some concluding remarks are given in Section 5.
2. Blur map estimation and kernel matrix construction. As aforementioned, for spatially varying blur, let b : Ω → R be a blurred image of size M × N, where b can be seen as a convolution of its clear version f at each pixel x. Formally, we have
(1) b(x) = (g ∗ f)(x) + n(x),
where ∗ is the convolution operator and n is an additive noise. Since only b(x) is given, we need to estimate both g(x) and f(x). That is, (1) is an under-constrained problem, so we should add some assumptions to obtain a reasonable result.
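To make the pixelwise model concrete, the following sketch simulates a spatially varying out-of-focus blur: every output pixel is a Gaussian-weighted average of its neighbours, using the pixel's own standard deviation σ(x). It is a naive illustration only; the 3σ truncation radius and the replicated-border handling are our assumptions, not prescriptions from the text.

```python
import numpy as np

def spatially_varying_blur(f, sigma_map):
    """Blur each pixel of f with its own Gaussian kernel of std sigma_map[i, j]."""
    M, N = f.shape
    b = np.zeros_like(f, dtype=float)
    for i in range(M):
        for j in range(N):
            s = sigma_map[i, j]
            if s <= 0:                            # in-focus pixel: copy through
                b[i, j] = f[i, j]
                continue
            r = max(1, int(np.ceil(3 * s)))       # truncate the kernel at 3*sigma
            acc, wsum = 0.0, 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    w = np.exp(-(dx * dx + dy * dy) / (2 * s * s))
                    yy = min(max(i + dy, 0), M - 1)   # replicate border pixels
                    xx = min(max(j + dx, 0), N - 1)
                    acc += w * f[yy, xx]
                    wsum += w
            b[i, j] = acc / wsum                  # normalised weighted average
    return b
```

Running it on a step image with σ growing from left to right leaves the in-focus left columns untouched and progressively smooths the right ones.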
In what follows, we first estimate the blur map using the Hölder coefficient and a TV based algorithm. Then we discuss some properties of the corresponding blur kernel matrix.
2.1. Blur map estimation. Generally, a blurred image is smoother than its clear version. That is, the more blurred the image, the smaller the overall intensity difference within a neighborhood. Thus, we can use a quantity that describes the overall difference in ω_x to measure the degree of blurring at x, where ω_x denotes a neighborhood of x. Broadly speaking, the Hölder coefficient can roughly satisfy this requirement.
As a powerful regularity measure, the Hölder coefficient has been applied to many image processing tasks, such as image interpolation and denoising [28,31,34,42,43]. Formally, the Hölder coefficient of the image b in ω_x is defined as [20,29]
(2) [b]^β_{ω_x} := sup_{y,z ∈ ω_x, y ≠ z} |b(y) − b(z)| / |y − z|^β,
where β ≥ 0 is an adjustable parameter. When β = 0, (2) is actually equivalent to the oscillation sup_{y,z ∈ ω_x} |b(y) − b(z)|. Hence, a suitably normalized version of (2) can be regarded as the normalized Hölder coefficient (NHC) (remark that we set |ω_x| = 7 × 7 and β = 2 for the NHC hereinafter, where |ω_x| is the area of ω_x). Figure 2 presents two histograms of the NHC for 1000 randomly selected natural in-focus images (red curve) and 1000 randomly selected natural blurred images (blue curve), respectively. The x-axis is the NHC score and the y-axis is the percentage of pixels. It can be seen that the in-focus images roughly have larger NHC than the blurred ones, and the distribution peak of the in-focus images is around 0.75, which is much greater than that of the blurred images (around 0.35).
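A brute-force computation of (2) on a 7 × 7 patch illustrates this discriminative behaviour. The exact normalisation used for the NHC is not reproduced here, so the sketch below returns the raw coefficient; the patch size and β = 2 follow the text.

```python
import numpy as np

def holder_coefficient(patch, beta=2.0):
    """sup over pixel pairs y != z of |b(y) - b(z)| / |y - z|**beta."""
    h, w = patch.shape
    coords = [(i, j) for i in range(h) for j in range(w)]
    best = 0.0
    for a in range(len(coords)):
        ya, xa = coords[a]
        for c in range(a + 1, len(coords)):
            yc, xc = coords[c]
            dist = np.hypot(ya - yc, xa - xc)
            best = max(best, abs(patch[ya, xa] - patch[yc, xc]) / dist**beta)
    return best
```

A sharp step patch scores 1.0 (adjacent pixels differ by the full jump at distance 1), while the same jump spread over a 7-pixel ramp scores only 1/6, matching the claim that blur lowers the coefficient.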
On the other hand, numerous experiments we have done suggest that the NHC varies roughly inversely with the level of the blur. That is, the NHC can indicate the blur level, and the blur level is positively associated with the inverse NHC. For simplicity, we assume this positive correlation is linear. Then, a rough blur map can be obtained from the inverse NHC, where σ̃ is a rough version of σ, and C is a predefined constant. In our experiments, we find that C = |ω_x|/(2π) is proper. Figure 3(a) is the estimated blur map σ̃ from a blurry image (Figure 1(a)). As we can see, it is reasonably good. However, the blur map σ̃ is relatively rough due to the existence of noise and soft edges, so we should refine it. In practice, the blur map is somewhat smooth and piecewise constant. Thus, we adopt a TV based algorithm to perform the blur map refinement [38]. Formally, the blur map refinement problem can be formulated as minimizing the following energy, where ν > 0 is a balance parameter, ∇ is the gradient operator, ⊙ is componentwise multiplication, and Π is an edge indicator defined with a positive parameter ξ. Indeed, (5) can be efficiently solved by iterating the following equations based on the Chambolle-Pock algorithm [13], where τ and ς are two positive parameters, and p is an auxiliary variable initialized to the identity matrix I. Figure 3(b) shows the refined result using Figure 3(a) as the data fidelity term. We can see that the refined blur map σ contains less noise and is more reasonable.
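Since the exact refinement energy (5) (including the edge indicator Π) is not reproduced above, the sketch below solves the plain TV surrogate min_σ (ν/2)‖σ − σ̃‖² + TV(σ) with Chambolle-Pock primal-dual iterations; the step sizes and the omission of Π are our simplifications.

```python
import numpy as np

def grad(u):
    """Forward differences with zero at the far boundary."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Negative adjoint of grad (standard discrete divergence)."""
    d = np.zeros_like(px)
    d[:, 0] += px[:, 0]; d[:, 1:] += px[:, 1:] - px[:, :-1]
    d[0, :] += py[0, :]; d[1:, :] += py[1:, :] - py[:-1, :]
    return d

def refine_tv(sigma_rough, nu=8.0, n_iter=200):
    """Chambolle-Pock for min_u (nu/2)||u - sigma_rough||^2 + TV(u)."""
    tau = sc = 0.25                    # tau * sc * ||grad||^2 <= 0.5 < 1
    u = sigma_rough.copy(); ub = u.copy()
    px = np.zeros_like(u); py = np.zeros_like(u)
    for _ in range(n_iter):
        gx, gy = grad(ub)
        px, py = px + sc * gx, py + sc * gy
        mag = np.maximum(1.0, np.hypot(px, py))   # project dual onto |p| <= 1
        px, py = px / mag, py / mag
        u_old = u
        u = u + tau * div(px, py)                 # primal gradient step
        u = (u + tau * nu * sigma_rough) / (1.0 + tau * nu)  # prox of fidelity
        ub = 2.0 * u - u_old                      # extrapolation, theta = 1
    return u
```

On a noisy piecewise-constant map the refined output has lower error than the rough input, while the jump between the two blur regions is preserved by the TV term.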

2.2. Kernel matrix analysis.
After calculating the blur map σ, we can obtain the kernel g by the definition of the Gaussian function, and then solve for f through (1). However, since σ differs at each pixel, we would have to solve (1) pixel by pixel; consequently, the time complexity of obtaining the whole deblurred image is very high. Fortunately, (1) can be rewritten as
b = Af + n,
where b, f and n are the vectorized versions of b, f and n after column concatenation, respectively. The kernel matrix A (of size MN × MN) is the matrix form of the kernel g.
The construction of the kernel matrix A is rather straightforward [25,46]. One can easily generate it following the steps of the spatially-invariant kernel matrix construction. The only difference is that all rows of a spatially-invariant kernel matrix share the same variance, while the matrix A may possess different variances row by row. Besides, we have the following two properties:
Proposition 2.1. Let λ_m be the largest eigenvalue of A. Then λ_m ≤ 1 holds.
Proof. See Appendix.
Proposition 2.2. Let x = (x₁, x₂) and y = (y₁, y₂), let diag(σ) be the diagonal matrix expanded from σ, and let Aᵢ be the matrix form of the 1-D Gaussian kernel gⁱ along the i-axis. Then, we have A = √(2π) diag(σ) A₁A₂.
Proof. See Appendix.
Given the induced norm of A defined as ‖A‖ := sup_{ϕ≠0} ‖Aϕ‖₂/‖ϕ‖₂, the bound of ‖A‖ can be guaranteed by the following property:
Proposition 2.3. ‖A‖ ≤ √(2π) σ_m ‖A₁‖ ‖A₂‖, where σ_m = max_x σ(x).
Proof. See Appendix.
2.2.1. Sparse kernel matrix analysis. Theoretically, g(x, σ(x)) is non-zero at every point of the image, meaning that the kernel support N_x should be of the same size as the entire image. In practice, when calculating g(x, σ(x)), the weights of pixels at a distance greater than 3σ(x) are small enough to be considered effectively zero [24]. Thus the contributions from pixels outside that range can be ignored, and we can set the radius of N_x to r_x = 3σ(x) in our convolution process.
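The 3σ(x) cut-off is easy to sanity-check numerically: for a normalised 1-D Gaussian, the weight ignored by truncating at radius 3σ is a fraction of a percent (the value 1.7 below is an arbitrary example σ, our choice).

```python
import numpy as np

sigma = 1.7
x = np.arange(-30, 31)                       # generous discrete support
w = np.exp(-x**2 / (2 * sigma**2))
w /= w.sum()                                 # normalise the kernel weights
tail = w[np.abs(x) > 3 * sigma].sum()        # mass discarded by r_x = 3*sigma
print(f"discarded mass: {tail:.4%}")
```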
In this setting, A is obviously a sparse matrix, and each row/column has at most (2r + 1)² non-zero elements, where r = max_x {r_x}. The induced norm of this sparse kernel matrix then has the following simple property:
Proposition 2.4. The norm of the sparse kernel matrix A is bounded by ‖A‖ < (2r + 1)².
Proof. See Appendix.
The time complexity of generating the sparse matrix A is O(r²MN). Since r ≪ MN, the generation of the kernel matrix is very fast. In practice, only about 0.75 seconds is needed in Matlab for an image of size 256 × 256.
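A direct row-by-row sparse construction matches this complexity; scipy's LIL format is used below for incremental assembly (the boundary clipping is our assumption). On a small test map the assembled matrix is row-stochastic, so the modulus of its largest eigenvalue is at most 1, consistent with Proposition 2.1.

```python
import numpy as np
from scipy.sparse import lil_matrix

def build_kernel_matrix(sigma_map):
    """Sparse MN x MN matrix A: the row of pixel (i, j) holds its truncated Gaussian."""
    M, N = sigma_map.shape
    A = lil_matrix((M * N, M * N))
    for i in range(M):
        for j in range(N):
            row = i * N + j
            s = sigma_map[i, j]
            if s <= 0:
                A[row, row] = 1.0            # in-focus pixel: identity row
                continue
            r = max(1, int(np.ceil(3 * s)))  # sparse support, radius 3*sigma
            ys = np.clip(np.arange(i - r, i + r + 1), 0, M - 1)
            xs = np.clip(np.arange(j - r, j + r + 1), 0, N - 1)
            w = {}
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    col = ys[dy + r] * N + xs[dx + r]
                    w[col] = w.get(col, 0.0) + np.exp(-(dx * dx + dy * dy) / (2 * s * s))
            total = sum(w.values())
            for col, v in w.items():
                A[row, col] = v / total      # normalise each row to sum to 1
    return A.tocsr()
```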
3. Deblurring formulation based on framelet system. In this section, before presenting the deblurring formulation for spatially varying out-of-focus blur, we first give a brief introduction to the framelet system. For simplicity, we only show the framelets in the univariate setting; the framelets in the bivariate setting can be obtained by the tensor product of the univariate ones. Readers interested in framelets can refer to [7,8,11,14,21] for more details.

3.1. Framelets and image representation.
Firstly, we give the definition of a tight frame. A countable function subset X ⊂ L²(R) is called a tight frame of L²(R) if
(9) f = Σ_{h∈X} ⟨f, h⟩ h, ∀ f ∈ L²(R),
(10) ‖f‖² = Σ_{h∈X} |⟨f, h⟩|², ∀ f ∈ L²(R),
where ⟨·, ·⟩ and ‖·‖ = ⟨·, ·⟩^{1/2} are the inner product and the norm of L²(R), respectively.
An arbitrary orthonormal basis of L²(R) is a tight frame since it meets both (9) and (10). In other words, tight frames are a generalization of orthonormal bases: they relax the requirements of orthogonality and linear independence to bring in redundancy, which has been verified useful in many applications such as deblurring [8].
Given a finite set Φ := {φ₁, . . . , φ_r} ⊂ L²(R), a wavelet system X(Φ) is defined as the collection of dilations and shifts of Φ, i.e.,
X(Φ) := {2^{k/2} φ_j(2^k · − l) : 1 ≤ j ≤ r; k, l ∈ Z}.
When X(Φ) forms a tight frame, it is termed a wavelet tight frame, and each φ_j is termed a (tight) framelet.
To construct compactly supported wavelet tight frames X(Φ), one generally first obtains a compactly supported refinable function ψ ∈ L²(R) with a refinement mask (low-pass filter) g₀ such that
ψ = 2 Σ_l g₀(l) ψ(2 · − l).
Then, for a given ψ, the construction of a wavelet tight frame amounts to finding an appropriate set of framelets Φ := {φ₁, . . . , φ_r} ⊂ L²(R), defined as
φ_j = 2 Σ_l g_j(l) ψ(2 · − l), j = 1, . . . , r,
where each g_j is a high-pass filter. The unitary extension principle (UEP) in [37] asserts that the system X(Φ) forms a tight frame in L²(R) provided that the filters g₀, g₁, . . . , g_r satisfy
Σ_{j=0}^{r} ζ_{g_j}(ω) conj(ζ_{g_j}(ω + γπ)) = δ(γ), γ ∈ {0, 1},
for almost all ω ∈ R. Here ζ_g(ω) = Σ_l g(l) e^{ilω} and δ(γ) is the Kronecker delta. Thus, the construction of the framelets Φ essentially reduces to designing the filters g₁, . . . , g_r for a given g₀ such that the UEP holds.
As a simple application of the UEP, a piecewise linear B-spline can be used as the refinable function ψ [4,37]. The corresponding filters are
g₀ = (1/4)[1, 2, 1], g₁ = (√2/4)[1, 0, −1], g₂ = (1/4)[−1, 2, −1].
In the numerical scheme, the framelet transform can be represented by a matrix W. The processes of generating such matrices have been detailed in [7,11]; we omit them here for readability.
With the matrix W, the frame transformation (decomposition) process can be easily described. Let f be a vector; the frame coefficient vector u is given by u = Wf. Besides, the frame reconstruction process can be expressed as f = Wᵀu, where Wᵀ is the inverse framelet transform. More importantly, the tight frame W satisfies f = WᵀWf, that is, WᵀW = I. Generally, WWᵀ ≠ I, unless in the orthonormal case.
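The identity WᵀW = I can be verified numerically. The sketch below implements a one-level, undecimated (translation-invariant) framelet transform with periodic boundaries via the FFT, using the standard piecewise linear B-spline filters from the UEP example; the paper's actual W may use different boundary conditions and multiple levels, so this is an illustrative stand-in.

```python
import numpy as np

# piecewise linear B-spline framelet filters from the UEP construction
h0 = np.array([1.0, 2.0, 1.0]) / 4.0
h1 = np.array([1.0, 0.0, -1.0]) * (np.sqrt(2.0) / 4.0)
h2 = np.array([-1.0, 2.0, -1.0]) / 4.0

def filter_fft(h, n):
    """FFT of the 3-tap filter h zero-padded to length n, centre tap at index 0."""
    hp = np.zeros(n)
    hp[:3] = h
    return np.fft.fft(np.roll(hp, -1))

def framelet_decompose(f):
    """W: one undecimated level, each band filtered circularly with h_j."""
    F = np.fft.fft(f)
    return [np.real(np.fft.ifft(F * filter_fft(h, len(f)))) for h in (h0, h1, h2)]

def framelet_reconstruct(coeffs):
    """W^T: filter each band with the time-reversed filter and sum the bands."""
    n = len(coeffs[0])
    out = np.zeros(n)
    for u, h in zip(coeffs, (h0, h1, h2)):
        out += np.real(np.fft.ifft(np.fft.fft(u) * np.conj(filter_fft(h, n))))
    return out
```

Since Σ_j |ĥ_j(ω)|² = 1 for these filters (the UEP condition at γ = 0), `framelet_reconstruct(framelet_decompose(f))` returns f up to round-off, i.e., WᵀW = I, while the stacked transform remains redundant (WWᵀ ≠ I).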
Finally, it should be noted that we use the same decomposition and reconstruction algorithms as [8,18] hereinafter, and we work in the bivariate case for our deblurring task. The corresponding transform matrix, still denoted by W, can be readily obtained by the Kronecker product of the matrices corresponding to the univariate frame transform.
3.2. The proposed energy model. To restore the clear image f, one needs to solve the following minimization problem,
min_f R(f) + F(f),
where R(f) is a regularization term and F(f) is a data fidelity term.
Generally, the regularization term is derived from some prior assumptions (or properties) of the underlying solutions. One popular assumption is sparsity in some domain or basis. Since images usually have sparse representations or approximations in the framelet domain, and this assumption has proven effective in many image deblurring works [8,9], we use this prior and take R(f) = ‖Wf‖₁, where W is the framelet decomposition operator.
The fidelity term F(f) is determined by the type of noise distribution. For example, the ℓ₁- and ℓ₂-norm functions are usually used for impulse and Gaussian noises, respectively. However, it is difficult to determine the type of noise in real applications. What's more, the noise in a real image seldom follows a single specific distribution. Thus, it is necessary to build a function that can remove unknown as well as mixed types of noise. For this purpose, we will use
F(f) = (1/p)‖Af − b‖ₚᵖ.
Here, the ℓₚ-norm function has been proven effective for unknown and mixed types of noise, and has been successfully used in image processing realms [27].
Putting things together, our total variational model is rather simple:
(11) min_f ‖Wf‖₁ + (μ/p)‖Af − b‖ₚᵖ,
where μ is a positive parameter balancing the regularization and fidelity terms.
Since W is a linear operator, it is obvious that the above energy is strictly convex. Therefore, the minimization of (11) admits a unique solution.

3.3. Numerical algorithm.
In what follows, we detail the numerical algorithm for our blind out-of-focus deblurring energy (11). Clearly, (11) is an ℓ₁-norm-based minimization problem. Many efficient methods have been presented to solve this kind of problem, such as the augmented Lagrangian method (ALM) [47], the alternating direction method of multipliers (ADMM) [5], and split Bregman iteration (SB) [23]. In this paper, we adopt a primal-dual algorithm for (11).
The saddle-point formulation of (11) can be derived as
(12) min_f max_d ⟨d, Wf⟩ + (μ/p)‖Af − b‖ₚᵖ − δ_D(d),
where d is the dual variable. The convex set D is given by D = {d : ‖d‖_∞ ≤ 1}, and the function δ_D is the indicator function of the set D, defined as δ_D(d) = 0 if d ∈ D and +∞ otherwise. Since W is a linear operator, the induced norm of W has the following property:
Proposition 3.1. ‖W‖ = 1.
Proof. Since WᵀW = I (see [8]), we have ‖Wf‖₂² = ⟨f, WᵀWf⟩ = ‖f‖₂², and thus ‖W‖ = 1 directly.
Since both (1/p)‖Af − b‖ₚᵖ and δ_D(d) are proper, convex and lower semi-continuous, (12) is exactly the same type of problem as in [13]. Therefore, the primal-dual algorithm is appropriate for solving our problem (11).
The regular primal-dual algorithm for (12) is as follows:
(13) d^{i+1} = argmax_d ⟨d, W f̄^i⟩ − δ_D(d) − (1/(2ς_i))‖d − d^i‖₂²,
(14) f^{i+1} = argmin_f ⟨d^{i+1}, Wf⟩ + (μ/p)‖Af − b‖ₚᵖ + (1/(2τ_i))‖f − f^i‖₂²,
f̄^{i+1} = f^{i+1} + θ_i(f^{i+1} − f^i),
where τ_i, ς_i and θ_i are positive parameters.
From (13), the solution for d can be readily given by the componentwise projection
d^{i+1} = (d^i + ς_i W f̄^i) / max(1, |d^i + ς_i W f̄^i|).
Due to the existence of the ℓₚ-norm, (14) is not easy to solve. In this case, we can replace (1/p)‖Af − b‖ₚᵖ by its linear approximation using the first-order Taylor series expansion at f^i,
(1/p)‖Af − b‖ₚᵖ ≈ (1/p)‖Af^i − b‖ₚᵖ + ⟨∇F(f^i), f − f^i⟩.
Then (14) becomes a simple quadratic problem, and f can be given by the following iterative step,
f^{i+1} = f^i − τ_i (Wᵀd^{i+1} + μ∇F(f^i)).
Taking all of the above into account, the overall numerical procedure of the proposed method can be summarized as (18). It is clear that (18) can be considered a linearized version of the primal-dual algorithm. Here, we further use an accelerated primal-dual (APD) scheme to accelerate (18). The detailed APD is summarized in Algorithm 1. As discussed in [17], the iteration cost of Algorithm 1 is about the same as that of (18). However, by specifying a different selection of β_i, the convergence of Algorithm 1 can be significantly improved.
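A compact 1-D sketch of the linearized primal-dual loop, including a relative-error stopping rule: here p = 2 (so the "linearized" fidelity gradient is exact), a forward-difference matrix D stands in for the framelet transform W, and all step sizes are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def lpd_deblur(A, b, mu=2.0, n_iter=300, tau=0.1, sigma_d=0.2, rho=1e-4):
    """Linearized primal-dual for min_f ||D f||_1 + (mu/2) ||A f - b||^2."""
    n = len(b)
    D = np.eye(n, k=1) - np.eye(n)                 # forward differences, ||D||^2 <= 4
    f, fb = b.copy(), b.copy()
    d = np.zeros(n)
    for _ in range(n_iter):
        d = np.clip(d + sigma_d * (D @ fb), -1.0, 1.0)   # dual step + projection onto D
        f_old = f
        g = D.T @ d + mu * (A.T @ (A @ f - b))           # W^T d + fidelity gradient
        f = f - tau * g                                  # linearized primal step
        fb = 2.0 * f - f_old                             # extrapolation, theta = 1
        rel = np.linalg.norm(f - f_old) / max(np.linalg.norm(f_old), 1e-12)
        if rel < rho:                                    # relative-error stopping rule
            break
    return f
```

On a small circulant blur, the iterates drive the objective below its value at the initial guess f = b.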
For convergence assessment, we use the relative error as our stopping criterion. Classically, the relative error for the proposed algorithm is defined as
‖f^{i+1} − f^i‖₂ / ‖f^i‖₂.
Given a prescribed small positive parameter ρ, for instance 10⁻⁴, when the relative error is less than ρ, our algorithm can be considered to have reached a steady state. The convergence of Algorithm 1 can be guaranteed by the following theorem.
Algorithm 1: An accelerated primal-dual scheme for the proposed model (11).
Theorem 3.1. Choose the parameters β_i, τ_i, ς_i and θ_i such that condition (20) holds for all i ≥ 1, and let (f^i, d^i) be the sequence derived from Algorithm 1. Then the following conclusions hold: (a) there exists a saddle-point (f*, d*) such that (f^i, d^i) converges to (f*, d*); (b) the limit f* is exactly the solution of (11).
Proof. (a) can be immediately proved following Theorem 2.1 in [17]. Moreover, according to Proposition 3.1 in [19], (b) is equivalent to (a). Thus Theorem 3.1 is proved.

4. Experimental results and analysis. In this section, in order to examine the effectiveness of the proposed method, we present and analyze experimental results on some synthesized and real blurry images. Note that all the following experiments are implemented in Matlab R2013a on an Intel(R) 3.33 GHz PC with 12 GB RAM. In what follows, we set the parameters of energies (5) and (11) as ξ = 0.01 and ν = 8; set the stopping criterion of Algorithm 1 as ρ = 10⁻⁴; and set the parameters of the APD algorithm according to (20), i.e., β_i = (i+1)/2, τ_i = 1, ς_i = i/(2+i) and θ_i = (i−1)/i.
4.1. Effect of blur map. In the first part of the experiments, we verify the effect of our blur map generation algorithm using some synthesized blurry images. For comparison, the results of two other outstanding blur map generation methods, i.e., Zhuo and Sim's method (ZS) [52] and Shen et al.'s method (SHP) [39], are also presented.
In Figure 4, we synthesize an image using four squares, each possessing a constant intensity (see (a)), and then blur the 191st-210th and 271st-330th columns of the image with blur levels 1 and 4, respectively. The blurry image is shown in (b). Obviously, there are three edges in (b): the left edge is sharp, the middle edge is slightly blurred, while the right edge is severely blurred.
The blur maps of Figure 4(b) generated by the ZS [52], SHP [39] and proposed methods are shown in (c)-(e), respectively. From these results, it can be seen that the result of ZS is gradient-related (see Figure 4(c) and the green line of (g)), which means that it changes according to the variation of image structures rather than the real blur. SHP's result seems acceptable (see Figure 4(d) and the blue line of (g)).
However, it has a jump around the 100-th column, where there is actually no blur. On the contrary, the proposed result is visually closer to the ground truth (see Figure 4(e) and the red line of (g)).
Besides, we compare our method against four other state-of-the-art methods, i.e., the 'deconvblind' function in Matlab (hereinafter referred to as "Matlab"), Xu and Jia's method (XJ) [49], Cai et al.'s method (CJLS) [8] and the SHP method [39], in which the first three are spatially invariant blind deblurring methods, while the last one is a spatially variant blind deblurring method. We note that all parameters of the above algorithms are set according to the authors' recommendations, and we set the initial PSF of 'deconvblind' as a Gaussian distribution of size 4 × 4.
The corresponding deblurred results are shown in Figures 6 and 7, in which the first to fifth columns are respectively the results of the Matlab, XJ, CJLS [8], SHP [39] and proposed methods. The above analyses and observations are also confirmed by the image quality indicators PSNR, SSIM, and SI. As shown in Figure 8, our results outperform the others with respect to all metrics; e.g., the PSNR of our result (A5) is at least 5 larger than that of the other methods. The other three kinds of results present similar behavior. This shows that our method performs well on these images, and the results are of good quality with few artifacts.

4.3. Real images.
In the third part, we present comparison studies on some real blurred images. Since the reference image is not available, here we only use the aforementioned SI as the quality metric. We again compare our method with the aforementioned four methods, i.e., the 'deconvblind' function in Matlab, the XJ method [49], the CJLS method [8], and the SHP method [39]. The comparison results are shown in Figures 9-11, and the corresponding quality results are given in Figure 8.
In these three figures, (a) shows the original input image, and (b)-(f) illustrate the results generated by the Matlab, XJ [49], CJLS [8], SHP [39] and proposed methods, respectively. The deblurred results of these methods differ from each other with different types of image content and blurring degrees. However, from an overall perspective, the proposed method consistently works well and can recover details with effective quality and few artifacts, while the results of the other four compared methods seem inferior to the proposed outcomes.
In detail, the Matlab, XJ and CJLS methods do not perform very well in any of the figures. For example, as seen in Figure 9, compared with the original image (a), (b) and (c) are somewhat clear in the close-shot while blurry in the long-shot. (d) is relatively clearer than (b) and (c), yet it contains many artifacts. The performance of the SHP method relies on the separability of the blur map, and it yields blurry results in (e). Compared with (a)-(e), the content in (f) is more pleasing. As shown in the zoom of (f), the word 'Rafael' is much clearer and contains fewer artifacts.
Therefore, the results of the proposed method, which are efficient and visually pleasant with few noticeable artifacts, are consistent across these images and, in general, outperform those of the other methods.

4.4. Efficiency analysis.
In the last part, to evaluate the overall computational efficiency, we test all the compared methods and the proposed method on the aforementioned eight blurred images, i.e., Figure 5(e)-(h) and Figures 9-11(a). Please note that the XJ method is implemented in C [50], while the others are implemented in Matlab. The CPU time requirements are presented in Table 2.
According to Table 2, the Matlab and XJ methods are the fastest among the five compared methods. However, as shown in the above two parts, the results of these two methods are not satisfactory in either performance or visual quality. Taking Figure 6 as an example, we can see that the PSNR of our method is nearly 7 larger than that of Matlab, and 5 larger than that of the XJ method. As for the other three methods, ours is the best one.
Moreover, the CPU time of the proposed method can still be reduced in at least two ways: (1) using a computer with a better configuration, since ours is not powerful; (2) employing parallel computing techniques, since some time-consuming procedures, such as the calculation of the NHC (see (4)), can be computed in parallel.

5. Conclusions. We have introduced a tight-frame based method for spatially-varying out-of-focus blur removal. The proposed method first generates the blur map using the normalized Hölder coefficient and a TV based refining algorithm; we then discussed the properties of the corresponding kernel matrix. A tight-frame based energy functional was thus built. To tackle the functional more efficiently, we used an APD algorithm to obtain the minimizing solution. The performance of the proposed method has been compared with some state-of-the-art algorithms on both synthesized and real images. The quantitative and qualitative results on the test images demonstrate that our method can generate reasonable blur maps and can remove the blur efficiently with few artifacts.
To date, spatially-varying deblurring is still a highly challenging task, and there are many open questions to be investigated. Our method is effective, but it can also be improved. For example, in the blur map estimation step, we cannot distinguish whether a blurry region is caused by defocus or by an intrinsically smooth texture. Future research will aim to develop more precise blur map estimation methods. Besides, the existing spatially-varying deblurring methods mainly estimate the blur kernel and the deblurred image independently; we will extend our work to obtain them simultaneously for better deblurring results in the future.
Baptist University, Hong Kong. The research of F. Li was supported by the National Science Foundation of China (No. 11671002) and the Science and Technology Commission of Shanghai Municipality (STCSM) (No. 13dz2260400).

Appendix.
Proof of Proposition 2.1.

Proof. Let R_j = Σ_{k≠j} |A_{jk}|, where A_{jk} is the element of A at position (j, k). By the Gershgorin circle theorem [26], all the eigenvalues of A are located in the union of the MN discs
G(A) = ∪_j {z ∈ C : |z − A_{jj}| ≤ R_j}.
We know that A_{jk} ≥ 0 and Σ_k A_{jk} = 1, which implies that G(A) ⊂ U, where U is the closed unit disc centered at the origin. Thus, all the eigenvalues of A are bounded by the radius of U, i.e., λ_m ≤ 1.
Proof of Proposition 2.2.
Proof. Writing the 2-D Gaussian kernel as a product of 1-D Gaussian kernels along the two axes, (g ∗ f)(x) in (1) can be computed axis by axis. The matrix form of this factorization for the whole image is exactly √(2π) diag(σ) A₁A₂. Thus Proposition 2.2 is proved.
Proof of Proposition 2.3.
Proof. Using Proposition 2.2, we can readily obtain the following inequality from the compatibility of matrix norms:
‖A‖ ≤ √(2π) ‖diag(σ)‖ ‖A₁‖ ‖A₂‖.
For any matrix, the induced norm is essentially the largest singular value. Thus, the induced norm of the diagonal matrix diag(σ) is readily given as ‖diag(σ)‖ = σ_m.
To obtain the bound of ‖A₁‖, we first look for a bound on ‖A₁ϕ‖₂² in (23), where A¹_{jk} and ϕ_k are the elements of A₁ and ϕ at positions (j, k) and k, respectively. We know that A₁ is the matrix form of g¹, and g¹(x) is a 1-D Gaussian function. Let j = (x₂ − 1)M + x₁; the j-th row of A₁ is exactly constructed from the discrete version of g¹(x). Besides, for g¹(x) with an arbitrary variance, when |y₁ − x₁| = 0, g¹(x)(y₁) < 1 always holds. Furthermore, due to the symmetry of the Gaussian function, when |y₁ − x₁| = 1, there are three values greater than or equal to g¹(x)(y₁); that is, g¹(x)(y₁) < 1/3. Analogously, g¹(x)(y₁) < 1/(2k + 1) holds when |y₁ − x₁| = k.
Thus, under symmetric boundary conditions, an upper bound on A¹_{jk} follows, and (23) can be further estimated accordingly. Thus, we have ‖A₁‖ < π²/2 − 2. Since the 1-D Gaussian function g²(x) is the same as g¹(x) except for the direction of convolution, the bound of ‖A₂‖ is the same as that of ‖A₁‖.
Therefore, taking all of the above into account, we have ‖A‖ ≤ √(2π) σ_m ‖A₁‖ ‖A₂‖ < √(2π) σ_m (π²/2 − 2)², which proves Proposition 2.3.
Proof of Proposition 2.4.
Proof. Since |A_{jk}| < 1 and each row/column has at most (2r + 1)² non-zero elements, we have ‖A‖ ≤ √(‖A‖₁ ‖A‖_∞) < (2r + 1)².