COLOR IMAGE PROCESSING BY VECTORIAL TOTAL VARIATION WITH GRADIENT CHANNELS COUPLING

,


(Communicated by Sung Ha Kang)
Abstract. We study a regularization method for color images based on the vectorial total variation approach along with channel coupling for color image processing, which facilitates the modeling of inter channel relations in multidimensional image data. We focus on penalizing channel gradient magnitude similarities by using L 2 differences, which allow us to explicitly couple all the channels along with a vectorial total variation regularization for edge preserving smoothing of multichannel images. By using matched gradients to align edges from different channels we obtain multichannel edge preserving smoothing and decomposition. A detailed mathematical analysis of the vectorial total variation with penalized gradient channels coupling is provided. We characterize some important properties of the minimizers of the model as well as provide geometrical results regarding the regularization parameter. We are interested in applying our model to color image processing and in particular to denoising and decomposition. A fast global minimization based on the dual formulation of the total variation is used and convergence of the iterative scheme is provided. Extensive experiments are given to show that our approach obtains good decomposition and denoising results in natural images. Comparison with previous color image decomposition and denoising methods demonstrate the advantages of our approach.
1. Introduction. Techniques based on variational and partial differential equations (PDEs) are widely used in various image processing tasks. In particular, classical ill-posed problems such as image restoration and denoising can be effectively tackled by variational energy minimization models. One of the famous method is the total variation (T V ) regularization which was original proposed for edge preserving image restoration [58]. The well-known Rudin-Osher-Fatemi (ROF) image denoising model seeks a minimizer of the sum of a fidelity term measured in the square of L 2 -norm along with the T V regularization, where f : Ω ⊂ R 2 → R is the given noisy image, BV (Ω, R) is space of bounded variation functions, and λ ≥ 0 is the regularization parameter. The T V term acts as an important regularizer penalizing the function's gradient [63,36]: • Sharp edges are recovered without noise and avoids discriminate smoothing associated with L 2 regularization functionals. • Edge locations are preserved, and under certain conditions edge locations kept exactly. This is important as dislocations of edges while smoothing is detrimental to further higher level image processing tasks. Despite its usefulness in image restoration, T V regularization is known to exhibit blocky or staircasing artifacts in solutions [47]. To avoid such artifacts various modifications has been studied in the recent years like higher order regularization [38,41] and adaptive or weighted T V regularization [64,62,56,52,57]. For example, consider the weighted T V regularization, with function g, an edge indicator function that vanishes at object boundaries, g(x) = (1+β |∇f (x)| 2 ) −1 where β > 0 is a constant parameter. Recently, other nonsmooth data fidelity terms [45] have found to be useful in various image processing applications [65,25,72,5,44,6,40,73,34,29,35,33,43] where the properties of corresponding minimization formulations are very well stablished [42,46].
Bresson et al [14] have considered a similar regularization model with a L 1 data fidelity term, (3) min We refer to [22,30] for the geometric motivation and properties of L 1 fidelity term. The minimization of the gT V + L 1 model given by (3) yields a contrast invariant filter [22,27,61] and well preserves contrasted features at different scales. The weighted T V norm allows a better preservation of corners and sharp angles in shape denoising. Moreover the introduction of such as function allows to establish a link between gT V and the geodesic active contours model introduced by [17,50,19] as an improvement of the original snakes model [39]. The variational minimization problem (3) involves the non-differentiable total variation regularization. Nevertheless, it can be solved by standard calculus of variations and Euler-Lagrange equations toward a smooth approximation of the L 1 -norm driving the function u towards a minimum [9,23,14]. Following the work in [24,20,5,8,7] a convex regularization of the variational model is implemented in [14] by applying a dual formulation of the T V norm. While most existing work involving T V focuses on scalar valued functions, the generalization to vector-valued images remains an important challenge. The main idea of the vectorial total variation (V T V ) approach is to extend the T V definition for vector-valued functions u = (u 1 , . . . , u N ) : Ω → R N , also referred as multispectral images, with u i ∈ BV (Ω, R) such that in the scalar case, N = 1, both definitions coincide. Two important notions of vectorial total variation can be considered: • Channel by channel summation: Total variation for vector (multichannel) functions can simply be channel by channel summation [2]. This simple modelling does not assume any coupling between channels and the multidimensional T V consist of summing up the contributions of separate channels. Therefore, chromatic edges can be lost and color smearing can occur between inter-channel edges, see [13,31] for discussions. The V T V expression is given by, • Implicit coupling with Euclidean norm: In this approach, an implicit coupling between channels is considered such that each channel uses information coming from other channels [11,13]. The V T V definition is given by, Coupling channels in a variational/PDE setting for improving denoising and other image processing performance has been done by many researchers in the past. One of the earlier works is by Blomgren and Chan [11] who utilized the implicit coupling via the V T V 2 above. However as noted in [66] color smearing can still appear near edges and the staircasing of flat regions can persist. Bresson and Chan [13] used a dual formulation to decrease the computational expense via T V minimization. A multiscale approach along with Moreau-Yosida based primaldual approach was considered by Dong and Hintermuller [28]. Brook et al [15] used the inter-channel term E c (u) = 3 i=1 ∇ x u i × ∇ y u i 2 , where the derivatives (∇ x u, ∇ y u) are computed in their corresponding (x, y) axis directions. This helps in aligning the color edges as in our model. Moreover the intrachannel term is chosen to be isotropic, φ( ∇u i ) = ∇u i 2 . The corresponding wellposedness of the minimization problem and its numerical convergence are not proven. Also, the term |∇ x u × ∇ y u| 2 is of order |∇u| 4 and cannot be bounded below by c |∇u| 4 for some c > 0, thus we cannot prove a similar result such as Theorem 2.4 in our work. More recently Ehrhardt and Arridge [32] use a similar alignment for coupling parallel level sets. A closely related approach denoise color or multichannel is to use anisotropic diffusion [59,66,12,55]. They typically involve similar coupling as in the variational approaches or selectively denoise color images in pure chromaticity spaces. Tschumperle and Deriche [68] unified such vector valued diffusion PDEs into one common framework via tensor diffusion. Goldluecke et al [36] use implicit coupling via a natural vectorial total variation motivated from geometric measure theory. Ono and Yamada [48] decorrelate the different channels along with vectorial total variation.
In this paper, we study V T V with an explicit channel coupling term which uses component gradients of a multispectral image. Our aim here is to demonstrate that it is possible to perform an efficient minimization of the V T V with proposed coupling term. This paper, besides developing new coupling models, also proposes a new numerical scheme based on Chambolle's dual minimization [20] to perform the algorithm evolution in an efficient and fast way. The aims of this paper is to introduce a regularizing term allowing to generalize the T V term to multispectral images. The main contributions of this paper are summarized as follows: (1) Introduction of the new vectorial total variation with coupling channels (V T V ) approach which penalizes different image gradient components in order to perform edge preservation across channels. (2) Presentation of the different properties related to the image processing schemes involving the new V T V norm. In this context, we show the existence of solutions to the decomposition model. We also consider the L 1 data fidelity term along with V T V regularization with coupling for image decomposition. The coupling terms provide better edge alignment and can also be used in a PDE formulation [55,51,53]. Here we utilize weighted T V regularization along with coupling gradient channels for obtaining decomposition and denoising of images. Our aim here is to provide a detailed mathematical analysis of the variational minimization method and provide some applications in decomposition and denoising of noisy images. Experimental results with other related variational/PDE color image denoising schemes indicate that the proposed approach consistently outperforms them in terms of different error metrics. The rest of the paper is organized as follows. Section 2 explains the model in detail using the theory of level sets, and in Section 3 we describe the implementation of the proposed model using the dual minimization. Section 4 provides some experimental results. Section 5 concludes the paper.
2. Vectorial total variation with channel coupling.

2.1.
Coupling channels. For motivating our V T V scheme, we first consider a scenario where two 1-D spatially varying edges meet ( Figure 1, First Row). Let us consider the signals u 1 and u 2 (Figure 1(a)) satisfying the following relation in terms of gradient magnitudes: |∇u 1 | > |∇u 2 | (Figure 1(b)). In order to define a unique correlated edge we will penalize the expression ∇u 1 − ∇u 2 (Figure 1(c)) in terms of L 2 norm. Caselles et al [18] have shown that the geometry is constrained by the intensity channel along and the image level lines contain all the salient information. Then, the main idea of the coupling terms is to constrain the regularization near discontinuities by penalizing large derivations in gradients between different channels, therefore making them to look different to the normal Red, Green and Blue image gradient magnitudes. Second row in Figure 1 shows the behavior of the proposed coupling term over a synthetic image defined by the intersection of three different components. This synthetic image is composed of discs determined by pure chromatic colors of red, green, and blue. The gradient differences for each channel clearly differentiates from channel-wise gradients (|∇u i | 2 ) since the coupling term is able to align edges over different channels. This in turn helps to keep the edge contrast during the further vectorial total variation smoothing process. Intersected edges over different colors are penalized by minimizing the terms J(u k , u l ). From the geometrical point of view, the proposed coupling term is well suited as it does not degrade the edge contrast quality of non-correlated channels, as shown in third row of Figure 1. In this case, a synthetic image where the green color circle does not intersect with the red and blue circles. Behavior of the coupling channels for a color image with crossing edges between different channels Input image Behavior of the coupling channels for a color image with crossing edges over two color channels Input image Second-Third Rows: Show the edge contrast quality due to the proposed coupling terms over two color images, and the differences with the gradient magnitude components.
In general, we propose the coupling term J(u k , u l ) = Ω |∇u k − ∇u l | 2 dx for two given channels u k and u l . Then, the penalizing term extending the T V term for a multispectral image u : Ω → R N , with Ω ⊂ R M a bounded open domain, is given by the following regularization term, where it is expected to penalize large differences between gradient images [55,51]. We further define the coupling-V T V of u to be the following, where α ≥ 0 is a parameter for the cross-coupling term. Thus, the above model unifies the weighted T V regularization with coupling terms for multichannel image processing. We next provide detailed mathematical analysis of the coupled V T V method along with L 1 fidelity case. Note that the L 2 -norm is used in the proposed term to smooth common salient edges within the different channels as well as to distinguish between scaled edge features with nonuniform intensities. A general framework with J(u k , u l ) = Ω ψ(|∇u k − ∇u l |) dx for a convex function ψ can also be accommodated within our framework studied here and we restrict ourselves to the canonical example of L 2 differences.

2.2.
Analysis of the coupling V T V model. Under definition (7) and following the original T V theory in [4,2] we define for N ≥ 1 the space BV (Ω, R N ) of vector valued functions as the set of functions u ∈ L 1 (Ω, R N ) such that V T V (u) < ∞ according to equation in (7). The space BV (Ω, R N ) endowed with the following norm: where u BV (Ω,R N ) := V T V 1 (u), is a Banach space. We recall further properties of V T V and a decomposition property which extends to our coupling with V T V .
• Compactness: Every sequence {u n } ∈ BV (Ω, R N ) such that u n L ∞ (Ω,R N ) ≤ M , for all n ∈ N, admits a subsequence {u n k } converging in L 1 (Ω, R N ) to a function u ∈ BV (Ω, R N ). • Decomposition of u BV (Ω,R N ) : The functional V T V can be decomposed as follows where D c u is the Cantor part of Du, J u is the set of all jump points of u, u + k , u − k are the jump functions and H M −1 denotes the (M-1)-dimensional Hausdorff measure in R M with Ω ⊂ R M .
Given u ∈ BV (Ω, R), the gT V term is defined as: (11) gT We also adopt the notation [14] (12) We recall that the perimeter of a Borel measurable set Γ ⊂ Ω is defined by where 1 Σ is the characteristic function of Σ. For a given image u and any ξ ∈ R, we define the upper level set We should also use the co-area formula where γ ξ is the boundary of the set Γ(u, ξ).
The following result shows that our proposed V T V term decouples the level sets of the given multichannel image for each channel, making it a geometry problem for each channel's level set. The proof relies the perimeter (13) and its connection to the T V regularization.
Proof. From the standard total variation based co-area formula (15), Now, let f k = ∇ xi u k and f l = ∇ xi u l . We have that Let us observe that and similarly We also set By using the identities in (18), (19), (20) and (21), we get A combination of (17) and (22) leads to the desired result.
2.3. Vectorial T V with L 1 fidelity term. The gT V + L 1 model is one of the most influential variational PDE-based image decomposition models in image processing [22,14]. This decomposition model separates a grayscale image in cartoon and texture components while preserving main features such as edges. For multispectral images, f : Ω → R N , we extend the gT V + L 1 model to the vectorial case called gV T V + L 1 by considering the following variational problem: Note that in the edge indicator function the noisy image f is smoothed with a Gaussian kernel 1 . When g ≡ 1 we refer to our model as V T V + L 1 .

Remark 1.
Let us note that (23) with L 2 image fidelity term is straightforward [54,55], By following [22] and the result from Theorem 2.1, we show that the gV T V + L 1 based functional (23) can also be expressed in terms of level sets. That is, the following theorem proves the equivalence of the energy functional in (23) based on V T V with L 1 data fidelity term and each channel level sets via the co-area formula.
Theorem 2.2. Given a noise image f and a function u ∈ BV (Ω, R N ) we have (25) Proof. The result follows from the weighted co-area formula (15), equation (22) and by noticing that, Following the level set formulation of the functional E 1 gV T V given in Theorem 2.2, and the convexity of the minimization problem (23), we can consider the following geometric sub-problems (27) min With the geometrical minimization formulation (27), we can prove the monotonicity property, which is in turn a generalization of the property called the submodularity of P er functional [1].
Given an image f k and two minimizers Γ 1 and Γ 2 with finite parameter of the minimization problem (27) respectively, the following identities are satisfied In order to prove (31) we observe the following identities, (33), (34), (35) and (36) combined all together, it follows Similarly, the same previous reasoning can be applied to coupling term G 3 , and we get By considering the sub modularity property of P er functional and the identities (37) and (38) that To prove the identity (32), we apply a similar reasoning as before to the term The following theorem proves that the our gV T V + L 1 energy minimization problem (23)  Proof. For the proof of this theorem see Appendix A.
The parameter λ in our minimization functional (23) is crucial in experimental results (see Section 4.2 which shows the importance of λ with respect to image denoising and different error metrics). The value of λ determines how much of important geometrical features are retained. We extend the work of Scherzer et al [60] to the vectorial case and study G(Ω) color spaceá la Y. Meyer [42] for obtaining important quantitative guidelines, see Appendix B.
3. Dual minimization based implementation. Implementation of V T V with coupling in (23) can be done in a variety of ways. The dual minimization technique for total variation introduced by Chambolle [20] is an attractive one as it is proven to be very efficient. We adapt dual minimization to our model and provide a convergence result for the alternative minimizations. Following [14] we use a splitting scheme by introducing an auxiliary variable v, or equivalently the subproblems where θ is chosen to be small so that f k ∼ u k + v k , where (u k , v k ) represents the cartoon and texture components respectively. Since the functionalẼ 1 k is convex, (42) can be split into two alternating minimization problems and a fast numerical scheme based on Chambolle's dual minimization [20] can be used.
The solution of the minimization problem (42) is given by the following alternating minimization steps: 1. Fixing v k , the minimization problem in u k is: The solution of (43) is given by which is solved using a fixed point method: p 0 k = 0 and We remark that (43) can also been implemented using algorithms which solve T V minimization problems of the form min u {gT V (u) + λ u − f 2 }, such as the split Bregman [37], fast gradient based (FISTA) [10], and the firstorder primal-dual (due to Chambolle and Pock) [21] algorithms, by considering and the solution is found as By implementing the algorithm, we get the sequences, The following theorem provides a proof of convergence of these sequences based on the coerciveness of the functionals in (42). This result is supported by experimental results, see for example Figure 7 energy plots. To reduce the computational cost one can utilize faster alternative algorithmic implementations such as split Bregman [37], FISTA [10] or Chambolle-Pock first order primal-dual [21]. We consider here color image processing, that is, N = 3, for RGB channels and extension of the cross-channel term (6) to cases N > 3 is straightforward [51].

4.1.
Decomposition and multiscale decomposition results. Image decomposition provides a separation of a given image into cartoon (piecewise constant) and texture components. For obtaining decomposition results we choose λ = 1/2 in our scheme, since with this value we are able to obtain nice cartoon (smooth) regions along with edge preservation. A better justification for a practical choice of the λ parameter is displayed in Tables 1-3 for the denoising results. In Figure 2 we use two images from the Mosaic art database to first show the cartoon component result using the proposed V T V model with L 1 (23) and L 2 (24) fidelity terms. We also display the result incorporating the weight function g in the proposed model. By comparing the cartoon component of the proposed gV T V + L 1 against the gV T V + L 2 scheme for the input image Figure 2(a) we see that they behave qualitatively differently. For example, different shapes of the input image are preserved well in our scheme using the L 1 fidelity terms whereas the result using the implementation with the L 2 fidelity terms looks blurred in the final result. Canny edge map computed with all the three channels using the MATLAB's command edge(Input,'canny') for both different results (implementing gV T V + L 1 and gV T V + L 2 schemes) indicates that chromatic edges are preserved coherently and the spatial location is intact. Figure 2   proposed V T V with coupling channels approach by using edge indicator functions g k = (1 + |∇f k |) −1 and g k = 1. As can be seen from the corresponding cartoon and edge map components, the result using edge indicator function contains smaller scale chromatic edges compared with setting g k = 1 in our implementation. Figure 3 shows different texture components for Boats and Goldhill images. Figure 3(c-d) shows the texture component given by the proposed scheme and its scaled version. The scaled texture component image is obtained by linearly transforming its range to [0, 255] and making differences outside this range to saturate at 0 and 255. Figure 3(f-h) shows the texture component results in RGB channels, proving that the texture available in each channel is captured with our scheme. Figure 4 shows the proposed color coupling channels for the cartoon components of close-up from M otorcycle & Lighthouse images. As can be seen, different channel edges are captured by the coupling terms (bottom row) and the final cartoon (smooth) components are devoid of small scale texture details. Moreover, the smoothing is edge preserving due to the weighted T V regularization. Figure 5 provides cartoon and texture components for a variety of natural color images using our scheme as stopping parameter increases from = 10 −4 to = 10 −2 . We notice that when the stopping time decreases texture details are gradually lost in the cartoon components and they are added to the texture component.
We next show in Figure 6 a comparison of the cartoon and texture components for the Lighthouse image using different vectorial total variation terms along with L 1 data fidelity: gV T V 1 + L 1 and gV T V 2 + L 1 with our coupling V T V method. We can observe that cartoon component results look visually similar for all three

Input images
Cartoon components   models, although our scheme retains the structures better. This can be observed in the color distribution of cartoon components using the corresponding histograms. It can also be noted from scaled texture components that edges are smoother for our proposed gV T V + L 1 scheme. Figure 7 depicts cartoon-texture decomposition experiments using different natural images from Berkeley Segmentation data set (BSDS500), using our gV T V + L 1 model, compared with gV T V 1 + L 1 and gV T V 2 + L 1 schemes. A close look at the cartoon components reveals that our scheme obtains better preservation of small scale geometrical structures. For both the Butterf ly, Lady images we show corresponding energy decrease against number of iterations of the dual minimization implementation for different V T V schemes (see Theorem 3.1).
Following Yin et al [73] and Tang et al [67] we can make the gV T V +L 1 (23) model with multiscale parameter λ k = λ 0 /2 k in the L 1 -fidelity terms with k = 1, 2, . . . and λ 0 a given value. Figure 8 displays 5 scales of our decomposition for the close-up of Barbara color image. Cartoon components (Figure 8(b-e)) retain salient edges during increasing steps and corresponding (scaled) texture components (Figure 8(gj)) start to show progressive capture from small scale textures to big scales.

4.2.
Denoising results. Note that the decomposition provides cartoon -piecewise cartoon component which is obtained using a weighted T V (gV T V ) regularization in an edge preserving way. This allows us to apply our model to color image denoising as well, with u being the denoised image and v noise/texture part. To compare the schemes quantitatively in denoising we report our results using three standard error metrics: mean squared error (MSE), peak signal to noise ratio (PSNR) [4], and mean structural similarity measure (MSSIM) [71]. We also fix the stopping parameter = 10 −2 for all the results.
where |Ω| is the size of the image domain and u 0 is the original noise-free color image. 2. PSNR is given in decibels (dB). A difference of 0.5 dB can be identified visually. Higher PSNR value indicates optimum denoising capability.
PSNR(u) := 10 * log 10 3. MSSIM index is in the range [0, 1]. MSSIM value near one implies the optimal denoising capability of the scheme [71] and is mean value of the SSIM metric. SSIM uses a combined product of luminance, contrast and structure comparison measures. The SSIM is calculated between two windows ω 1 and ω 2 of common size N × N , where µ ωi the average of ω i , σ 2 ωi the variance of ω i , σ ω1ω2 the covariance, c 1 , c 2 stabilization parameters, see [71] for more details.   Tables 1, 2 and 3 compare the MSE, PSNR and MSSIM performance measures for the gV T V 1 + L 1 , gV T V 2 + L 1 and the proposed gV T V + L 1 schemes by using different λ values weighting the L 1 fidelity term averaged over the complete BSDS500, respectively. The idea behind choosing different λ values is to determine which value is able to capture important geometrical features (see Theorem B.5 in Appendix B related to the choice of λ). The proposed vectorial total variation approach with coupling channels scheme performed better than other vectorial total variation approaches over different λ values. We also see that MSE, PSNR and MSSIM value change over different λ values, with decreasing MSE values and increasing PSNR and MSSIM values. We note that λ = 1 is the best choice for multichannel edge preservation property followed by λ = 1/2 k with k = 0 which tends to over-smooth the images. Note that the MSSIM is a better error metric than MSE and PSNR as it provides a quantitative way of measuring the structural similarity of denoised image against the original noise-free image. Different images for visualizing the improvements of the proposed gV T V + L 1 scheme are shown in Figure 9, and Figure 10 where we can see the effect of different V T V regularizations corresponding to λ choices.
Next we use the digitized color images taken from the USC-SIPI standard image dataset to compare the performance of the proposed scheme (23) (Our) together with the multiscale case (M-Our) along with following schemes: variational regularization (BKS, gradient descent with finite differences) [15], vector valued regularization with PDEs of (TD, spatial discretization with centered finite differences) [68], regularization of vector signals based on vectorial T V norm (BC, Chambolle's dual approach [20]) [13], Perona and Malik with a coupling term [55] (PS, finite differences), multiscale vectorial L τ -T V scheme for color images (DHC, using a unified Table 1. Average MSE values for the gV T V +L 1 proposed model and the gV T V 1 + L 1 , gV T V 2 + L 1 schemes (see equations (4) and (5)) over different Gaussian noise levels and λ parameters on the complete Berkeley segmentation database (BSDS500). Lower value indicates better result.

Method
Noise  Table 2. Average PSNR values for the gV T V + L 1 proposed model and the gV T V 1 +L 1 , gV T V 2 +L 1 schemes (see equations (4) and (5)) over different Gaussian noise levels and λ parameters on the complete Berkeley segmentation database (BSDS500). Higher value indicates better result.  Table 3. Average MSSIM values for the gV T V + L 1 proposed model and the gV T V 1 +L 1 , gV T V 2 +L 1 schemes (see equations (4) and (5) Moreau-Yosida based primal-dual approach) [28] which uses the V T V 2 regularization, vectorial total variation derived from the generalized Jacobians from geometric measure (GSC, primal-dual method [21]) [36], multichannel gradients combined by parallel level sets (EA, central differencing scheme) [32], and separate gradient measurement of the luminance component and that of the chrominance (OY, primal-dual method [21]) [48]. In these results, we do not average the PSNR and the MSSIM error metrics over the RGB channels. Instead, we apply directly the PSNR error metric for multichannel images given by equation (46) and (47), respectively. In case of the MSSIM error metric we adapt it to color images by converting the color image into a grayscale (for example, using MATLAB's command rgb2gray) and then compute MSSIM for the converted grayscale image. Results are displayed in Table 4 for seventeen standard test images of size 256 × 256 and 512 × 512. Our proposed vectorial total variation with coupling tends to perform better than    other variational/PDEs base diffusion methods by improving color enhancement properties along with edge preservation. Although, we are not comparing with other state-of-the art schemes (non variational-PDE methods) in terms of PSNR and MSSIM, the last two columns in Table 4 show the results for non-local means (BCM [16]) and block matching 3D filtering (DFKE [26]). These results evidence that the proposed model is able to remove noise while preserving edges (according to PSNR (dB) & MSSIM values, which are relatively close to these state of the art schemes). Finally, Figure 11 shows a visual comparison of different methods on sub-images (size 330 × 360) of the noisy Lena test image with σ = 20 for the T V based schemes OY [48], GSC [36] and DHC [28]. Visual comparison of various regions show that color smearing artifacts occur in chromatic edges in other schemes due to channel mixing and diffusion transfer and also some staircasing artifacts in flat regions are visible. The method noise comparison shows that our multiscale method leaves no visible structure in the residue in comparison with other schemes. The parameters  involved in all the comparison schemes are chosen based on maximum peak signalto-noise ratio (PSNR).

Conclusion.
In this paper, we study a multichannel image restoration scheme with total variation regularization and L 1 fidelity. By using matched gradients along with cross channels to align the edges our scheme performs multichannel smoothing and restoration. We provided a complete mathematical characterization of our vectorial total variation along with channel coupling model using its geometrical formulation, which provides some insights for the original model. Utilizing the notion of G-sets and G-values, we give a characterization of a solution of the proposed minimization problem in terms of the λ parameter, as well as the ability of such a parameter to decouple image features at different level sets. A fast dual minimization based implementation was used and experimental results indicate the proposed scheme provides better smoothing than other related models. Experimental results with other related color image denoising schemes indicate that the proposed approach consistently outperforms other color based restoration methods in terms of the perceptually guided PSNR (dB), MSSIM and MSE error metrics.   [48], (e) GSC [36] and (f) DHC [28].
Appendix A. Existence of solution and convergence of the dual minimization based scheme.
Proof of Theorem 2.4. Let C := inf E 1 gV T V (u), and let {u n } ∞ n=1 ⊆ BV (Ω, R N ) be a minimizer sequence for the energy E 1 gV T V , i.e., Then, for a given k = 1, . . . , N , it follows and consequently sup n∈N Ω |∇u n k | dx < ∞. Moreover, Then, we have that {u n } ∞ n=1 is bounded in BV (Ω, R N ). By compactness property of the BV (Ω, R N ) space there is a subsequence also denoted by {u n } ∞ n=1 , strongly convergent to an element u * = (u * 1 , . . . , u * N ) ∈ L 1 (Ω, R N ). Therefore, it follows that u * ∈ BV (Ω, R N ) and We also notice that the fidelity term and the coupling term in E 1 gV T V are convex in u, then both term are lower semicontinuous with respect to the BV (Ω, R N ) weak * -topology and therefore and (51) Finally, combining inequalities (49), (50) and (51) on a suitable sequence gV T V . Proof of Theorem 3.1. Since we are solving a splitting scheme defined by the subproblems (43) and (45), the following inequalities hold . We also have that the sequenceẼ 1 gV T V (u k , v k ) is non increasing and bounded by zero. Then, there exist n ∈ R such that n = lim k→∞Ẽ gV T V is coercive, we have that the sequence (u k , v k ) is bounded in BV (Ω, R N ) × BV (Ω, R N ) allowing us to extract a subsequence (u kr , v kr ) converging to (u * , v * ), i.e., (u kr , v kr ) kr→∞ − −−−− → (u * , v * ) and by (44) we have (53) u Considering that 1.Ẽ 1 gV T V (u kr+1 , v kr ) ≤Ẽ 1 gV T V (u, v kr ) for all u ∈ BV (Ω, R N ), and 2.Ẽ 1 gV T V (u kr , v kr ) ≤Ẽ 1 gV T V (u kr , v) for all v ∈ BV (Ω, R N ), it follows from (53) that for all u ∈ BV (Ω, R N ) Appendix B. The color G(Ω) space. Characterizing the minimizers of nondifferentiable regularization functional has been analyzed by many authors. For grayscale images, Y. Meyer [42] defined the space G(R 2 ) to model texture using the ROF scheme [58]. Here, we extend the notion of G-sets and G-values studied in [60] to the vectorial case. For an observed color image f , we show a relation between the solutions of the minimization problem (23) and the parameter λ which balances the gV T V regularization term and L 1 fitting term. This will help us to correctly model the cartoon and texture decomposition, i.e., f ≈ u + v. A quantitative study using MSE, PSNR (db) and MSSIM error metrics is given on different selections for the λ in denoising, see Section 4.2 and Tables 1, 2 and 3. See also [3,70,69,49,73,31,61] for related efforts to study T V models in this context. The following definition is for vectorial G-values using measurable functions.
We define the G-value of Ψ as follows:

Remark 2.
There is no distinction between the set Ψ and the function Ψ. In case Ψ is a single value and measurable then G( Ψ) is the G-norm of Ψ as defined in [31].
We next derive the definition of G-sets for the vectorial case. Let ∂|g| denote the set-valued subdiferential of g 1 , i.e., (59) ∂|g An important G-value property of ∂|g| is given by the following lemma which relates the parameter λ and our gV T V energy.
for all h ∈ BV (Ω, R N ). Moreover Definition B.4. Let f ∈ L 1 (Ω, R N ) and u ∈ BV (Ω, R N ). Assume that for every h ∈ BV (Ω, R N ) We define the G-set for the vectorial case as Note that for λ ∈ G u (∂|u − f |) it follows that for every h ∈ BV (Ω, R N ) and thus An equivalence relation between finding a solution of the minimization problem (23) and choosing an appropriate parameter λ for the gV T V + L 1 model is stated in the next theorem.
Theorem B.5. For λ > 0, u λ is a minimizer of the vectorial gV T V +L 1 model (23), if only if λ satisfies Proof. If u λ = (u λ 1 , . . . , u λ N ) is a minimizer of (23), we have that for any h ∈ BV (Ω, R N ) and = ( 1 , . . . , N ) with k > 0 for every k = 1, . . . , N the following inequalities (67) The convexity of gT V (u λ k ) + α N l=1 J k,l (u λ k , u λ l ) implies for all 0 < k < 1. Since η( k , u λ k , h k ) → {u k =f k } |h k | dx, we found by using (69) and letting → 0 that Conversely by (66) and the convexity of | · | it follows that (71) The proposed gV T V + L 1 model is expected to discriminate between different features in an image during regularizing while preserving strong geometrical features such as edges and corners. For this purpose, it is essential to get bounds on the values for the parameter λ that can provide a guideline in experiments. In the next theorem, it is shown that for some particular selections of the parameter λ, we can separate different level sets of an image at multiple scales.
In order to prove that u N1 = 0, we proceed by contradiction by assuming that u N1 = 0. Given N 3 = N \ (N 1 ∪ N 2 ) with u N3 = 0, we notice that Therefore, On the other hand, since u is a minimizer (81) and due to the fact that f N3 = 0, we get the inequality which is different from the identity in (80). Then, considering the definition of the subset N 3 , it necessarily happens that u N3 = 0, and we get (83) We also have from the convexity of | · | that From inequalities in (84) and (85) we get and at the same time from Lemma B.3, it happens which is a contradiction with (86). Then, we have u N1 = 0.