MULTIPLICATIVE NOISE REMOVAL WITH A SPARSITY-AWARE OPTIMIZATION MODEL

Abstract. Restoration of images contaminated by multiplicative noise (also known as speckle noise) is a key issue in coherent image processing. The images under consideration are often highly compressible in certain suitably chosen transform domains. By exploiting this intrinsic feature of images, this paper introduces a variational restoration model for multiplicative noise reduction that consists of a term reflecting the observed image and the multiplicative noise, a quadratic term measuring the closeness of the underlying image, in a transform domain, to a sparse vector, and a sparse regularizer for removing multiplicative noise. Unlike popular existing models, which focus on preserving convexity, the proposed sparsity-aware model may be nonconvex, depending on the conditions on the parameters of the model, in order to achieve the optimal denoising performance. An algorithm for finding a critical point of the objective function of the model is developed based on coupled fixed-point equations expressed in terms of the proximity operators of the functions appearing in the objective function. Convergence analysis of the algorithm is provided. Experimental results demonstrate that the proposed iterative algorithm is sensitive to the initialization. We observe that the proposed method with SAR-BM3D filtered images as initial estimates can remarkably outperform several state-of-the-art methods in terms of the quality of the restored images.

1. Introduction. Multiplicative noise (i.e., speckle noise) appears naturally in coherent imaging systems, such as ultrasound imaging [31], laser imaging [27], and synthetic aperture radar (SAR) [24], due to random interference between coherent returns. Intensity images observed through coherent imaging systems usually have low signal-to-noise ratios (SNRs) and high noise levels [2]. Suppressing noise in these images is therefore of vital importance for follow-up processing by human observers or computer programs. The goal of this paper is to develop efficient methods for the restoration of images contaminated by multiplicative noise.
An image f degraded by multiplicative noise is modeled in the form
(1) f = u ⊙ η,
where u ⊙ η denotes the componentwise multiplication of a clean image u and noise η. We assume that each component of the clean image u and of the noise η is positive, and so is that of f. The multiplicative noise in each pixel is usually assumed to follow a Gamma distribution with probability density function [13]
p(η) = (L^L / Γ(L)) η^{L−1} e^{−Lη}, η ≥ 0,
where Γ is the classical Gamma function and L is the number of looks. Hence η has mean 1 and standard deviation 1/√L; a high number of looks L corresponds to a high signal-to-noise ratio of the observed image. Using the Bayesian maximum a posteriori (MAP) estimator for multiplicative Gamma noise, various variational models [1, 4, 10-12, 14, 16, 17, 19, 28, 32] for removing multiplicative noise have been proposed recently. In [1], a model referred to as the AA model was introduced; it has a nonconvex data fidelity term and a convex total variation regularization term. Through the logarithm transform, many methods form convex optimization models in terms of the log-image data, and the exponential of the obtained minimizer is taken as the restored image; see, e.g., [4,12,14,16,28] and the references therein. The mth root transformation for relaxing the nonconvexity of the AA model was considered in [17,32]. Based on the statistics of the noise, an additional quadratic penalty term was added in [11] to the AA model to obtain a strictly convex model under certain conditions. In addition, the convex I-divergence (I-DIV) model [29], originally designed for dealing with Poisson noise, has been used for speckle reduction. Most recently, by using the Box-Cox transformation, the multiplicative noise removal problem was converted into an additive noise removal problem, followed by applying the BM3D method to obtain the final recovered image [15].
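To make the noise model concrete, the following minimal numpy sketch (illustrative only; the function and variable names are ours) draws Gamma-distributed speckle with the stated mean and standard deviation and applies it componentwise as in (1):

    import numpy as np

    rng = np.random.default_rng(0)

    def speckle(u, L):
        """Corrupt a positive image u with multiplicative Gamma noise.

        Each noise sample follows Gamma(shape=L, scale=1/L), which has
        mean 1 and standard deviation 1/sqrt(L), matching the pdf above.
        """
        eta = rng.gamma(shape=L, scale=1.0 / L, size=u.shape)
        return u * eta  # componentwise multiplication f = u * eta, as in (1)

    # A larger number of looks L yields a higher SNR in the observed image.
    f = speckle(np.full((64, 64), 100.0), L=4)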
In our recent work [20], the following model was proposed in the log-image domain:
(2) min { ⟨x + f ⊘ e^x, 1⟩ + α‖√(e^x ⊘ f) − β1‖₂² + λ‖x‖_TV : x ∈ R^n },
where ⟨·, ·⟩ and ‖·‖₂ denote the standard inner product and the usual ℓ₂-norm over R^n, respectively, α is a positive number, 1 denotes the vector having one for all components, β ≥ 1 depends on the noise level, and λ is a positive regularization parameter. The term ‖x‖_TV denotes the total variation (TV) of x. The operation f ⊘ w denotes the componentwise division of f by w for f, w ∈ R^n_+. Let N be the set of all natural numbers and N_n := {1, 2, ..., n}. The exponential function at x ∈ R^n is defined as e^x := (e^{x_i} : i ∈ N_n) and the square root function at w ∈ R^n_+ as √w := (√(w_i) : i ∈ N_n). The three terms in the objective function of (2) are a data fitting term resulting directly from the presence of the multiplicative noise, a quadratic term reflecting the statistics of the noise, and a sparse regularizer.
Numerical experiments presented in [20] show that this model compares favorably to several state-of-the-art reference models in terms of the peak signal-to-noise ratios of the denoised images and the CPU time consumed. However, certain restrictions were imposed on the parameters α and β to ensure the convexity of the optimization model (2), which limits the ability of the model to yield denoised images of high quality. Furthermore, TV regularization is optimal only for describing piecewise constant images and sometimes causes undesirable staircase artifacts. We can view the TV regularization in a different way: the total variation ‖x‖_TV can be rewritten as ψ(Hx), where ψ is an ℓ_{2,1}-like sparsity-promoting function and H is formed from the first-order difference matrix. Therefore, the total variation regularization prefers Hx to be sparse. Unfortunately, Hx is not sparse in general, and, as a result, the denoised image obtained with model (2) may suffer from over-constraints on its sparsity under the first-order difference matrix H. Rather, Hx is only approximately sparse, in the sense that it consists of a few significant entries while the rest of the entries are essentially negligible.
In this paper we propose a new model for multiplicative noise removal that addresses the aforementioned issues of model (2). The model is nonsmooth and may be nonconvex, for the sake of restoration performance. We characterize the solutions of the model via coupled fixed-point equations derived from the proximity operators of the functions associated with the model. An algorithm based on this characterization is developed to find the solutions of the fixed-point equations, and its convergence analysis is provided. The performance of our method is demonstrated with numerical examples.
The rest of this paper is organized as follows. In section 2, we introduce our model for multiplicative noise removal and study its mathematical properties. In section 3, we develop an algorithm for the model and analyze its convergence. In section 4, we specify the proposed model regularized by TV and present the corresponding algorithm for solving it. We review some existing models for multiplicative noise removal in section 5. We demonstrate the performance of our model with numerical examples in section 6 and conclude in section 7.

2. Proposed model and its characterization. This section contains our main results concerning multiplicative noise removal. We present a variational model for restoring images degraded by multiplicative noise, motivated by the issues in model (2) mentioned above. We study the existence of solutions to the proposed model and explore conditions that make the model convex. The solutions of the model are characterized in terms of the fixed-points of a nonlinear map formed via the proximity operators of the functions appearing in the model. Based on this characterization, we develop an algorithm for finding solutions of the model and provide its convergence analysis.

2.1. Model. An image is often assumed to have a sparse representation under a basis having vanishing moments. The total variation [26] and wavelets (such as orthogonal wavelets, framelets, curvelets, bandelets, and the discrete cosine transform [8,9]) are appropriate bases for fulfilling this requirement. Let H denote the transform matrix associated with the filters of a wavelet system. For an image x, Hx is the vector of the wavelet coefficients of x. From the viewpoint of wavelet analysis, the vector Hx is approximately sparse; that is, this vector has few significant components while the rest of its components are essentially negligible. With this in mind, we propose the following model to reconstruct the clean image u in (1):
(3) min { J(x, y) : (x, y) ∈ R^n × R^m },
where for (x, y) ∈ R^n × R^m we define
(4) J(x, y) := Φ(x) + h(x, y) + Ψ(y)
with
(5) Φ(x) := ⟨x + f ⊘ e^x, 1⟩ + α‖√(e^x ⊘ f) − β1‖₂²,
(6) h(x, y) := (µ/2)‖Hx − y‖₂²,
(7) Ψ(y) := λψ(y),
where α, β, µ, and λ are all positive. In (6), we assume that the matrix H of size m × n is able to sparsify the image x. We assume that ψ in (7) is a proper lower semicontinuous convex function on R^m that promotes sparsity; examples of such functions include the ℓ_p norm with 0 ≤ p ≤ 1 and the ℓ_{2,1} norm, among many others. Suppose that a pair (x*, y*) is a solution of problem (3). Then e^{x*} is viewed as an approximation to the clean image u in (1).
Let us explain the proposed model (3). There are two variables, x and y, in the model. The first term Φ(x), inherited from the first two terms in model (2), reflects the presence of the multiplicative noise and the statistics of the noise. The third term Ψ(y) enforces the sparsity of y. The image x is assumed to be approximately sparse in its information content under the transform H; we leverage this approximate sparsity by measuring the closeness of Hx to y with their Euclidean distance in the second term.
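For concreteness, here is a small sketch of evaluating the objective (4), assuming the forms of Φ, h, and Ψ given in (5)-(7) (the helper names are ours; psi is any sparsity-promoting function, such as the one defined in (30) below):

    import numpy as np

    def objective_J(x, y, f, H, alpha, beta, mu, lam, psi):
        """Evaluate J(x, y) = Phi(x) + h(x, y) + Psi(y) from (4)-(7)."""
        phi = np.sum(x + f * np.exp(-x))                             # data fitting
        phi += alpha * np.sum((np.sqrt(np.exp(x) / f) - beta) ** 2)  # noise statistics
        h = 0.5 * mu * np.sum((H @ x - y) ** 2)                      # closeness of Hx to y
        return phi + h + lam * psi(y)                                # sparse regularizer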

2.2. Properties. In this subsection, we discuss convexity properties of model (3), which will be utilized in the next subsection to characterize the solutions of the model. We first give a condition ensuring that the function J(x, y), (x, y) ∈ R^n × R^m, is convex with respect to (x, y), which leads to the uniqueness of the solution of model (3). We then show that, under relaxed conditions, even though J may be nonconvex with respect to (x, y), it remains convex with respect to each of x and y separately.

Proposition 1. Let α, β, λ, and µ be positive numbers, H be an m × n matrix, and ψ be a proper lower semicontinuous convex function that is coercive. Then the optimization problem (3) has at least one solution. Moreover, if αβ⁴ ≤ 4096/27, then the solution of (3) is unique.

Proof. Since the vectors u and η in (1) are positive, so is f. Hence, Φ is a continuous function on R^n and is coercive, that is, lim_{‖x‖₂→+∞} Φ(x) = +∞, which can be verified directly. Together with the coercivity assumption on Ψ, the function J in (3) is coercive on R^n × R^m, that is, lim_{‖(x,y)‖₂→+∞} J(x, y) = +∞. It follows that there exists at least one solution to the optimization problem (3).
Next, we prove that problem (3) has only one solution under the condition αβ⁴ ≤ 4096/27 and the convexity assumption on Ψ. To this end, it suffices to show that Φ + h is strictly convex on R^n × R^m. By Theorem 2.4 in [20], Φ in (5) is strictly convex on R^n if αβ⁴ ≤ 4096/27; that is,
(8) Φ(λx + (1 − λ)x̃) < λΦ(x) + (1 − λ)Φ(x̃)
holds for all distinct points x and x̃ in R^n and λ ∈ (0, 1). We further notice that h is convex on R^n × R^m and that h(x, ·) is strictly convex on R^m for any fixed x ∈ R^n; that is, for any (x, y) and (x̃, ỹ) in R^n × R^m and λ ∈ [0, 1],
(9) h(λ(x, y) + (1 − λ)(x̃, ỹ)) ≤ λh(x, y) + (1 − λ)h(x̃, ỹ),
and for any fixed x ∈ R^n, any two distinct points y and ỹ in R^m, and λ ∈ (0, 1),
(10) h(x, λy + (1 − λ)ỹ) < λh(x, y) + (1 − λ)h(x, ỹ).
By (8)-(10), for all distinct pairs (x, y) and (x̃, ỹ) in R^n × R^m and λ ∈ (0, 1),
(Φ + h)(λ(x, y) + (1 − λ)(x̃, ỹ)) < λ(Φ + h)(x, y) + (1 − λ)(Φ + h)(x̃, ỹ),
which, by the definition of strict convexity, implies the strict convexity of Φ + h. As a consequence, J is strictly convex on R^n × R^m.
Next, we explore other choices of α and β that ensure the convexity of the function Φ + (ρ/2)‖·‖₂² for some ρ > 0. To this end, we adopt the following assumption on the positive numbers α, β, and ρ:

H1. The parameters α, β, and ρ are positive and satisfy one of the following three conditions: (i) αβ⁴ ≤ 4096/27; (ii) ρ ≥ αβ²/16; (iii) ρ < αβ²/16 and q(t₊²) ≥ 0, where q(s) := 1 + αs² − (αβ/2)s^{3/2} + ρs for s ≥ 0 and t₊ := (3β + √(9β² − 128ρ/α))/16.

Associated with the parameters α, β, and ρ in H1, for a fixed positive number a, we define
(12) g_a(t) := t + a e^{−t} + α(√(e^t/a) − β)² + (ρ/2)t², t ∈ R,
which can be viewed as the one-dimensional counterpart of Φ + (ρ/2)‖·‖₂². With the function g_a, the role of the conditions in H1 becomes clear in the following result.
Lemma 2.1. Let H1 hold. Then the function g_a given in (12) is strictly convex on R.
Proof. Note that
g_a''(t) = a e^{−t} + α(e^t/a − (β/2)√(e^t/a)) + ρ
is not constant on any open interval of R. Then g_a''(t) ≥ 0 for all t ∈ R ensures the strict convexity of g_a. Indeed, for any two distinct numbers t₁ and t₂ with t₁ < t₂ and any τ ∈ (0, 1), by the mean-value theorem there exist ξ₁ ∈ (t₁, τt₁ + (1 − τ)t₂) and ξ₂ ∈ (τt₁ + (1 − τ)t₂, t₂) such that
g_a(τt₁ + (1 − τ)t₂) − g_a(t₁) = (1 − τ)(t₂ − t₁) g_a'(ξ₁)
and
g_a(t₂) − g_a(τt₁ + (1 − τ)t₂) = τ(t₂ − t₁) g_a'(ξ₂).
Hence,
τ g_a(t₁) + (1 − τ) g_a(t₂) − g_a(τt₁ + (1 − τ)t₂) = τ(1 − τ)(t₂ − t₁)(g_a'(ξ₂) − g_a'(ξ₁))
(13) = τ(1 − τ)(t₂ − t₁) ∫_{ξ₁}^{ξ₂} g_a''(t) dt > 0.
The inequality in (13) holds because g_a'' is nonnegative and not constant on any open interval of R. We conclude that g_a is strictly convex. In the rest of the proof, we verify g_a'' ≥ 0 on R under each of the conditions in H1.
If the first condition of H1 holds, then g_a''(t) ≥ ρ for all t ∈ R, as a direct consequence of Lemma 2.3 in [20] (or Theorem 1 in [18]).
If the second condition of H1 holds, then, writing s := √(e^t/a) and noting that a e^{−t} = 1/s², we have g_a''(t) = 1/s² + α(s² − (β/2)s) + ρ ≥ 1/s² + ρ − αβ²/16 > 0, since min_{s>0}(s² − (β/2)s) = −β²/16. Finally, we show g_a'' ≥ 0 under the third condition of H1. Note that g_a''(t) = a e^{−t} q(e^t/a), where q(s) = 1 + αs² − (αβ/2)s^{3/2} + ρs with s > 0. Hence, g_a''(t) ≥ 0 for all t ∈ R if and only if q(s) ≥ 0 for all s ≥ 0. Since ρ < αβ²/16, the derivative of q,
q'(s) = 2αs − (3αβ/4)s^{1/2} + ρ = 2α(√s − t₋)(√s − t₊), with t± := (3β ± √(9β² − 128ρ/α))/16,
has the two positive roots s = t₋² and s = t₊². It follows that q achieves its local maximum at t₋², with q(t₋²) > q(0) = 1, and its local minimum at t₊², with q(t₊²) ≥ 0 by the given condition. We conclude that q(s) ≥ 0 for all s ≥ 0.
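The three conditions in H1 can be checked numerically. The sketch below (the function name is ours) follows the quantities q and t₊ used in the proof above:

    import numpy as np

    def h1_holds(alpha, beta, rho):
        """Return True if (alpha, beta, rho) satisfies one of the H1 conditions."""
        if alpha * beta ** 4 <= 4096.0 / 27.0:       # first condition
            return True
        if rho >= alpha * beta ** 2 / 16.0:          # second condition
            return True
        # third condition: q at its local minimum t_plus**2 must be nonnegative
        t_plus = (3.0 * beta + np.sqrt(9.0 * beta ** 2 - 128.0 * rho / alpha)) / 16.0
        s = t_plus ** 2
        q = 1.0 + alpha * s ** 2 - 0.5 * alpha * beta * s ** 1.5 + rho * s
        return q >= 0.0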
An extension of Lemma 2.1 to higher dimensions is as follows.

Lemma 2.2. If the parameters α, β, and ρ satisfy H1, then the function Φ + (ρ/2)‖·‖₂² is strictly convex on R^n.

Proof. For x ∈ R^n, we clearly have Φ(x) + (ρ/2)‖x‖₂² = Σ_{i=1}^n g_{f_i}(x_i), where g_a is given in (12). By Lemma 2.1, g_{f_i} is strictly convex for each 1 ≤ i ≤ n, and so is Φ + (ρ/2)‖·‖₂².
The next result is a direct consequence of Lemma 2.2. For the matrix H in (6), we define
(14) a := min{‖H_j‖₂² : j = 1, 2, ..., n},
where H_j is the jth column of the matrix H.

Lemma 2.3. If a given in (14) and µ are positive, H is an m × n matrix, and the parameters α, β, and ρ with ρ := aµ fulfill H1, then for a given y ∈ R^m, the function J_y(x) := Φ(x) + h(x, y), the sum of the first two terms in the objective function of (3), is strictly convex with respect to each component of x.

Proof. Let D := diag(‖H_1‖₂², ..., ‖H_n‖₂²). Then
J_y(x) = Φ(x) + (µ/2) x^⊤ D x + r(x, y), where r(x, y) := (µ/2)(x^⊤(H^⊤H − D)x − 2⟨Hx, y⟩ + ‖y‖₂²).
Note that all diagonal entries of H^⊤H − D are zeros; in other words, r(x, y) does not contain terms with x_j² for any j. Therefore, the strict convexity of J_y(x) with respect to each component x_j is completely determined by that of Φ(x) + (µ/2) x^⊤ D x. As a function of x_j alone, the latter equals g_{f_j}(x_j) + ((µ‖H_j‖₂² − ρ)/2) x_j², up to terms not involving x_j, where ρ = aµ ≤ µ‖H_j‖₂². Since g_{f_j} is strictly convex by Lemma 2.1 and the additional quadratic term is convex, J_y(x) is indeed strictly convex with respect to each component of x for any fixed y.
Lemma 2.4. If α, β, λ, and µ are positive numbers, H is an m × n matrix, and ψ in (7) is convex, then for any fixed x ∈ R^n the function J(x, ·) is strictly convex on R^m.

Proof. For any fixed x ∈ R^n, the function h(x, ·) = (µ/2)‖Hx − ·‖₂² is strictly convex. Together with the convexity of Ψ, we conclude that J(x, ·) is strictly convex with respect to y.
In the next subsection, we apply the results developed in this subsection to characterize the solutions of model (3).

2.3. Characterizations. From Proposition 1, we know that the optimization problem (3) always has a solution. In this subsection, we characterize its solutions through the notions of the proximity operator and the subdifferential, two fundamental tools in nonlinear optimization.
Recall that for a proper and lower semicontinuous function ϕ : R^d → (−∞, +∞], the domain of ϕ is defined by dom ϕ := {z ∈ R^d : ϕ(z) < +∞}. For a given z ∈ dom ϕ, the Fréchet subdifferential of ϕ at z, denoted by ∂ϕ(z), is the set of all vectors w ∈ R^d which satisfy
lim inf_{v→z, v≠z} (ϕ(v) − ϕ(z) − ⟨w, v − z⟩) / ‖v − z‖₂ ≥ 0.
When z ∉ dom ϕ, we set ∂ϕ(z) := ∅. If 0 ∈ ∂ϕ(z), then z is called a critical point of ϕ. Fermat's rule states that if z ∈ R^d is a local minimizer of ϕ, then 0 ∈ ∂ϕ(z).
If ϕ₁ : R^d → (−∞, +∞] and ϕ₂ : R^d → (−∞, +∞] are subdifferentiable at z, then ϕ₁ + ϕ₂ is subdifferentiable at z and ∂(ϕ₁ + ϕ₂)(z) ⊃ ∂ϕ₁(z) + ∂ϕ₂(z). In particular, if ϕ₁ is differentiable at z, then ∂(ϕ₁ + ϕ₂)(z) = ∇ϕ₁(z) + ∂ϕ₂(z). As an application to our problem (3), we have that for (x, y) ∈ dom J,
(15) ∂J(x, y) = {∇Φ(x) + ∇ₓh(x, y)} × (∇_yh(x, y) + ∂Ψ(y)),
where ∇ₓ and ∇_y are the gradient operators with respect to x and y, respectively.

The proximity operator is another fundamental tool for developing algorithms for nonsmooth and nonconvex optimization problems. Let ϕ : R^d → (−∞, +∞] be a proper and lower semicontinuous function. The proximity operator of ϕ at z ∈ R^d is defined by (see, e.g., [3,22])
prox_ϕ(z) := argmin { (1/2)‖v − z‖₂² + ϕ(v) : v ∈ R^d }.
The proximity operator of ϕ is in general a set-valued operator from R^d to 2^{R^d}. If inf_{R^d} ϕ > −∞, then the set prox_ϕ(z) is nonempty and compact. In addition, we say that a function g is supercoercive if lim_{‖v‖₂→+∞} g(v)/‖v‖₂ = +∞. The following result characterizes the relationship between the proximity operator and the subdifferential of a proper lower semicontinuous (possibly nonconvex) function. This proposition will serve as a basic tool for the algorithmic development of model (3).

Proposition 2. Let ϕ : R^d → (−∞, +∞] be a proper lower semicontinuous function with inf_{R^d} ϕ > −∞, and let z, w ∈ R^d. If
(16) z ∈ prox_ϕ(z + w),
then
(17) w ∈ ∂ϕ(z).
Conversely, if the function (1/2)‖·‖₂² + ϕ is supercoercive and convex, then (17) implies (16); if, in addition, (1/2)‖·‖₂² + ϕ is strictly convex, then (17) implies z = prox_ϕ(z + w).
Proof. If the vectors z, w ∈ R^d satisfy (16), it follows from the definition of the proximity operator that
(18) z ∈ argmin { (1/2)‖v − (z + w)‖₂² + ϕ(v) : v ∈ R^d }.
Fermat's rule then implies that 0 ∈ z − (z + w) + ∂ϕ(z), which is (17). Next we show the converse part of the proposition. Since (1/2)‖·‖₂² + ϕ is convex, (17) is equivalent to z + w ∈ ∂((1/2)‖·‖₂² + ϕ)(z), and the subgradient inequality gives
(19) (1/2)‖v‖₂² + ϕ(v) ≥ (1/2)‖z‖₂² + ϕ(z) + ⟨z + w, v − z⟩ for all v ∈ R^d.
Since (1/2)‖·‖₂² + ϕ is supercoercive, the function (1/2)‖· − (z + w)‖₂² + ϕ, which differs from it only by a linear term and a constant, is coercive and proper; by (19), it has z as one of its minimizers. Hence, (16) holds. The strict convexity of (1/2)‖·‖₂² + ϕ clearly implies the strict convexity of (1/2)‖· − (z + w)‖₂² + ϕ. It follows that (1/2)‖· − (z + w)‖₂² + ϕ has a unique minimizer, that is, the inclusion (18) becomes z = prox_ϕ(z + w).
The following two examples illustrate the role of the convexity assumption in the converse part of Proposition 2. Example 2.5 shows that the converse part may apply even when ϕ itself is concave, provided that (1/2)|·|² + ϕ is strictly convex. Example 2.6 shows that (17) cannot lead to (16) when this convexity fails.
Example 2.5. Given a real number c, define ϕ(z) := cz² for z ∈ R. We have 0 ∈ ∂ϕ(0) and
prox_ϕ(0) = argmin { (1/2)z² + cz² : z ∈ R } = argmin { (c + 1/2)z² : z ∈ R },
so that prox_ϕ(0) = {0} if c > −1/2, prox_ϕ(0) = R if c = −1/2, and prox_ϕ(0) = ∅ if c < −1/2. In particular, if c ∈ (−1/2, 0), then ϕ is concave, yet (1/2)z² + ϕ(z) = (c + 1/2)z² is supercoercive and strictly convex; identifying both z and w in Proposition 2 with zero, the above result is consistent with the converse part of Proposition 2.

Example 2.6. On the other hand, if ϕ is such that (1/2)|·|² + ϕ is supercoercive but nonconvex, then the conditions of the converse part of Proposition 2 are not fully satisfied, and one may have 0 ∈ ∂ϕ(0) while 0 ∉ prox_ϕ(0).
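A quick numerical illustration of Example 2.5 (a grid-search sketch; the discretization is ours):

    import numpy as np

    z = np.linspace(-5.0, 5.0, 100001)   # grid containing 0
    for c in (0.3, -0.3, -0.7):
        F = 0.5 * z ** 2 + c * z ** 2    # objective defining prox_phi(0)
        print(c, z[np.argmin(F)])
    # For c > -1/2 the unique minimizer is 0; for c = -1/2 every z is a
    # minimizer; for c < -1/2 the objective is unbounded below, so
    # prox_phi(0) is empty (the grid minimum at the boundary is an artifact).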
With the above preparations, we now characterize the solutions of model (3) in terms of the proximity operators.
Proposition 3. Let α, β, ρ, and λ be positive numbers, H be an m × n matrix, and ψ be a proper convex function. If α, β, and ρ satisfy H1 and the pair (x, y) ∈ R^n × R^m is a critical point of the optimization problem (3), then for any σ > 0 the pair (x, y) satisfies the following coupled equations:
(20) x = prox_{(1/ρ)Φ}( x − (µ/ρ)H^⊤(Hx − y) ),
(21) y = prox_{(1/σ)Ψ}( y − (µ/σ)(y − Hx) ).
Conversely, if a pair (x, y) ∈ R^n × R^m satisfies (21) and
(22) x ∈ prox_{(1/ρ)Φ}( x − (µ/ρ)H^⊤(Hx − y) ),
then the pair (x, y) is a critical point of J in (3). Moreover, if a given in (14) is positive and the parameters α, β, and ρ with ρ := aµ satisfy H1, then (x, y) is a (local) minimizer or a saddle point (but not a (local) maximizer) of J. In particular, if the first condition in H1 holds, then (x, y) is a global minimizer of (3).
Proof. If (x, y) ∈ R^n × R^m is a critical point of the optimization problem (3), then 0 ∈ ∂J(x, y), which, by using (15), can be equivalently written as
(23) −(1/ρ)∇ₓh(x, y) ∈ ∂((1/ρ)Φ)(x) and −(1/σ)∇_yh(x, y) ∈ ∂((1/σ)Ψ)(y),
where ∇ₓh(x, y) = µH^⊤(Hx − y) and ∇_yh(x, y) = µ(y − Hx). Under H1, by Lemma 2.2 the function (1/ρ)Φ + (1/2)‖·‖₂² is strictly convex, which, together with the first inclusion in (23) and Proposition 2, yields equation (20). Since Ψ is convex, for any σ > 0 the function Ψ + (σ/2)‖·‖₂² is coercive and strictly convex. It follows from Proposition 2 that the second inclusion in (23) leads to equation (21).
Next, we show the converse part of the proposition. By Proposition 2, relations (22) and (21) lead to (23), which is equivalent to the relation 0 ∈ ∂J(x, y); that is, (x, y) is a critical point of J. Further, by Lemmas 2.3 and 2.4, the set of critical points of J contains no (local) maximizer; therefore, the pair (x, y) cannot be a (local) maximizer of (3). In particular, if α and β satisfy αβ⁴ ≤ 4096/27, then, by Proposition 1, the pair (x, y) must be the unique minimizer of (3).
We next show that a global minimizer of model (3) is a solution of the fixed-point equations (20) and (21). To this end, we recall the so-called descent lemma, which is essential in our analysis.

Lemma 2.7 (descent lemma). Let g : R^d → R be differentiable with an L-Lipschitz continuous gradient. Then for all u, v ∈ R^d,
g(v) ≤ g(u) + ⟨∇g(u), v − u⟩ + (L/2)‖v − u‖₂².

A proof of the descent lemma can be found, for example, in [23].

Proposition 4. Let α, β, ρ, λ, σ, and µ be positive numbers, H be an m × n matrix, and ψ be a proper convex function. If min{ρ, σ} ≥ µ(‖H‖₂² + 1) and the pair (x, y) ∈ R^n × R^m is a solution of model (3), then (x, y) satisfies equations (20) and (21).
Proof. Let L := µ(‖H‖₂² + 1). First, one can directly verify that the gradient of h in (6) is L-Lipschitz continuous. By the descent lemma and the condition min{ρ, σ} ≥ µ(‖H‖₂² + 1), for any (v, z) ∈ R^n × R^m,
J(v, z) ≤ Φ(v) + Ψ(z) + h(x, y) + ⟨∇ₓh(x, y), v − x⟩ + ⟨∇_yh(x, y), z − y⟩ + (ρ/2)‖v − x‖₂² + (σ/2)‖z − y‖₂².
On the other hand, since the pair (x, y) ∈ R^n × R^m is a solution of model (3), we have that
J(x, y) = Φ(x) + h(x, y) + Ψ(y) ≤ J(v, z).
Combining the above inequalities yields
Φ(x) + Ψ(y) ≤ Φ(v) + Ψ(z) + ⟨∇ₓh(x, y), v − x⟩ + ⟨∇_yh(x, y), z − y⟩ + (ρ/2)‖v − x‖₂² + (σ/2)‖z − y‖₂².
Since the above inequality holds for an arbitrary pair (v, z), taking z = y and minimizing over v, and then taking v = x and minimizing over z, shows that x and y satisfy (20) and (21), respectively.
3. Convergence analysis. The coupled fixed-point equations (20) and (21) presented in Proposition 3 naturally lead to an iterative scheme for finding a critical point of the objective function of model (3). The scheme is as follows: choose an initial estimate (x^{(0)}, y^{(0)}) ∈ R^n × R^m and iterate, for k = 0, 1, ...,
(24)
x^{(k+1)} = prox_{(1/ρ)Φ}( x^{(k)} − (µ/ρ)H^⊤(Hx^{(k)} − y^{(k)}) ),
y^{(k+1)} = prox_{(1/σ)Ψ}( y^{(k)} − (µ/σ)(y^{(k)} − Hx^{(k+1)}) ).
We rewrite the iterative scheme (24) in a compact form. For the positive parameters ρ and σ, the proper lower semicontinuous function Φ, and the convex function Ψ, we define a set-valued mapping T : R^{n+m} → 2^{R^{n+m}} at u := (x, y) ∈ R^n × R^m by
(25) T(u) := prox_{(1/ρ)Φ}(x) × prox_{(1/σ)Ψ}(y).
And we define R₁ and R₂ to be the maps collecting the gradient steps on the x-block and on the y-block of (24), respectively, and set R := R₁ + R₂.
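A minimal sketch of one sweep of scheme (24), assuming the update ordering written above (the names are ours; prox_Phi and prox_Psi stand for callables evaluating prox_{(1/ρ)Φ} and prox_{(1/σ)Ψ}, sketches of which are given after Algorithm 1 in section 4):

    def fixed_point_sweep(x, y, H, mu, rho, sigma, prox_Phi, prox_Psi):
        """One iteration of (24): gradient steps on h followed by the proxes."""
        x_new = prox_Phi(x - (mu / rho) * (H.T @ (H @ x - y)))
        y_new = prox_Psi(y - (mu / sigma) * (y - H @ x_new))  # uses the updated x
        return x_new, y_new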
With the above notation, the iterative scheme (24) can be rewritten in the compact equivalent form
(26) u^{(k+1)} ∈ (T ∘ R)(u^{(k)}).
We next analyze the convergence of the iterative scheme (24).

Lemma 3.1. The mapping T ∘ R given through (25) is continuous.

Proof. Note that the mapping T consists of the two proximity operators and that R is continuous. The conclusion follows directly from the fact that the proximity operator of a proper lower semicontinuous function is continuous [7].
We denote the generated sequence by W := {u^{(k)} := (x^{(k)}, y^{(k)}) : k ∈ N}. To prove the convergence of the sequence W, the following lemmas are useful.
Lemma 3.2. Let α, β, ρ, λ, σ, and µ be positive numbers, H be an m × n matrix, ψ be a proper convex function, and W be the sequence generated by the iterative scheme (24). If the following conditions hold: (i) the sequence W is bounded, and (ii) lim_{k→∞} ‖u^{(k+1)} − u^{(k)}‖₂ = 0, then there exists a subsequence of W that converges to a solution of the coupled relations (20) and (21).
Lemma 3.3. Let α, β, ρ, λ, σ, and µ be positive numbers, H be an m × n matrix, ψ be a proper convex function, and W be the sequence generated by the iterative scheme (24). If µ < σ and µ‖H‖₂² < ρ, then (i) the sequence W is bounded and the sequence {J(u^{(k)}) : k ∈ N} is decreasing and convergent; and (ii) lim_{k→∞} ‖u^{(k+1)} − u^{(k)}‖₂ = 0.

Proof. A direct calculation based on the definition of the two proximity operators in (24) shows that
Φ(x^{(k+1)}) + ⟨∇ₓh(x^{(k)}, y^{(k)}), x^{(k+1)} − x^{(k)}⟩ + (ρ/2)‖x^{(k+1)} − x^{(k)}‖₂² ≤ Φ(x^{(k)})
and
Ψ(y^{(k+1)}) + ⟨∇_yh(x^{(k+1)}, y^{(k)}), y^{(k+1)} − y^{(k)}⟩ + (σ/2)‖y^{(k+1)} − y^{(k)}‖₂² ≤ Ψ(y^{(k)}).
Since h is quadratic, the cosine rule (the exact second-order expansion of h) gives
h(x^{(k+1)}, y^{(k)}) = h(x^{(k)}, y^{(k)}) + ⟨∇ₓh(x^{(k)}, y^{(k)}), x^{(k+1)} − x^{(k)}⟩ + (µ/2)‖H(x^{(k+1)} − x^{(k)})‖₂²,
h(x^{(k+1)}, y^{(k+1)}) = h(x^{(k+1)}, y^{(k)}) + ⟨∇_yh(x^{(k+1)}, y^{(k)}), y^{(k+1)} − y^{(k)}⟩ + (µ/2)‖y^{(k+1)} − y^{(k)}‖₂².
Adding these relations and using the inequality ‖H(x^{(k+1)} − x^{(k)})‖₂² ≤ ‖H‖₂² ‖x^{(k+1)} − x^{(k)}‖₂², we get
(28) J(u^{(k+1)}) + ((ρ − µ‖H‖₂²)/2)‖x^{(k+1)} − x^{(k)}‖₂² + ((σ − µ)/2)‖y^{(k+1)} − y^{(k)}‖₂² ≤ J(u^{(k)}).
Since both Φ and Ψ are bounded below and J is coercive, when µ < σ and µ‖H‖₂² < ρ, inequality (28) implies that item (i) holds and that the sequence {J(u^{(k)}) : k ∈ N} is decreasing and convergent. Hence, summing inequality (28) over k, we can further conclude that item (ii) holds as well.
Theorem 3.4. Let α, β, ρ, λ, σ, and µ be positive numbers, let H be an m × n matrix, let ψ be a proper convex function on R^m, and let W be the sequence generated by the iterative scheme (24). If µ < σ and µ‖H‖₂² < ρ, and J has only a finite number of critical points on R^n × R^m, then the sequence W converges to a critical point, say (x, y), of J. Moreover, if the number a given in (14) is positive and the parameters α, β, and µ satisfy H1 (with ρ in H1 being aµ), then (x, y) is a (local) minimizer or a saddle point (but not a (local) maximizer) of J. In particular, if the first condition of H1 holds, then (x, y) is a solution of model (3).
Proof. By Lemma 3.3, the sequence W is bounded and lim_{k→∞} ‖u^{(k+1)} − u^{(k)}‖₂ = 0. By Lemma 3.2, there exists a subsequence {u^{(k_j)} : j ∈ N} of W that converges to a solution of the coupled relations (20) and (21), denoted by u := (x, y). The subsequence {u^{(k_j+1)} : j ∈ N} then also converges to u. By the iteration scheme (26) and Lemma 3.1, u is a fixed-point of T ∘ R; therefore, all accumulation points of the sequence W are fixed-points of the mapping T ∘ R. Since the number of critical points of J on R^n × R^m is finite, the number of accumulation points of W is finite. Recalling the result from point-set topology that if W is bounded and lim_{k→∞} ‖u^{(k+1)} − u^{(k)}‖₂ = 0 then the set of accumulation points of W is connected, we conclude that W has only one accumulation point. Hence, the sequence W converges to u, a fixed-point of T ∘ R. The proof is completed by applying Proposition 3.

4. Algorithm for the proposed model regularized by TV. In this section, we restrict ourselves to the choices of Ψ (i.e., ψ) and H arising from the TV norm [26], although many other choices, for example, ψ being the ℓ₁ norm and H being a matrix from a tight framelet system, can be considered. For ease of exposition, we assume that the vector z ∈ R^n is formed from a square image of size √n × √n by sequentially concatenating the columns of the image.
Let D denote the √n × √n first-order difference matrix defined by the equations (Dz)₁ = 0 and (Dz)_i = z_i − z_{i−1} for i = 2, ..., √n, and choose H to be the m × n matrix with m = 2n given by
(29) H := [ I_{√n} ⊗ D ; D ⊗ I_{√n} ],
where I_{√n} is the √n × √n identity matrix and the notation P ⊗ Q denotes the Kronecker product of the matrices P and Q. It is known that ‖H‖₂² = 8 sin²((√n − 1)π/(2√n)) (see, e.g., [21]). From the structure of H, one can easily compute a = min{‖H_j‖₂² : j ∈ N_n} in (14), which is positive. We further define ψ : R^m → R in (7) at y ∈ R^m by
(30) ψ(y) := Σ_{i∈N_n} √(y_i² + y_{n+i}²),
which is convex.
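The matrix H in (29) is easy to assemble with sparse Kronecker products. The following sketch (our code, using scipy, with D as reconstructed above) also checks ‖H‖₂² against the closed form and computes a from (14):

    import numpy as np
    from scipy.sparse import diags, identity, kron, vstack

    def tv_matrix(N):
        """Build H = [I kron D; D kron I] of (29) for an N x N image (n = N*N)."""
        # D: (Dz)_1 = 0 and (Dz)_i = z_i - z_{i-1} for i >= 2
        D = diags([np.r_[0.0, np.ones(N - 1)], -np.ones(N - 1)], offsets=[0, -1])
        I = identity(N)
        return vstack([kron(I, D), kron(D, I)]).tocsr()

    N = 8
    H = tv_matrix(N)
    print(np.linalg.norm(H.toarray(), 2) ** 2)          # spectral norm squared
    print(8 * np.sin((N - 1) * np.pi / (2 * N)) ** 2)   # closed form from [21]
    print((H.multiply(H)).sum(axis=0).min())            # a in (14)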
Lemma 4.1. Let α, β, µ, and λ be positive numbers. If ψ is defined by (30) and H is the 2n × n matrix in (29), then the function J defined by (4) has only a finite number of critical points.
Since the left-hand side of (31) is an exponent-based function, equation (31) with respect to x has a finite number of roots on Ω; hence, by relation (32), there is a finite number of corresponding y-solutions. The finite number of solutions of the coupled equations (31) and (32) on Ω × R^m, together with the fact that there are at most two critical points of J on (Ω₁ ∪ Ω₂) × R^m, implies that J(x, y), (x, y) ∈ R^n × R^m, has only a finite number of critical points. This completes the proof.
We remark that Lemma 4.1 above also holds for many other popular choices of ψ and H, e.g., ψ being the ℓ₁ norm and H being a matrix from a tight framelet system.

Theorem 4.2. Let α, β, ρ, λ, σ, and µ be positive numbers, H be the 2n × n matrix in (29), ψ be the proper convex function defined by (30), and W be the sequence generated by the iterative scheme (24). If µ < σ and µ‖H‖₂² < ρ, then the sequence W converges to a critical point, say (x, y), of J in model (3). Moreover, if a given in (14) is positive and the parameters α, β, and µ satisfy H1 (with ρ in H1 being aµ), then (x, y) is a (local) minimizer or a saddle point (but not a (local) maximizer) of J. In particular, if the first condition of H1 holds, then (x, y) is a solution of model (3).
Proof. The result can be readily obtained by Lemma 4.1 and Theorem 3.4.
In summary, based on the iterative scheme (24) and Theorem 4.2, we end up with Algorithm 1 for model (3).

Algorithm 1 Fixed-point algorithm based on the proximity operators for model (3).
Input: noisy image f > 0 in R^n; parameters λ > 0, µ > 0, α > 0, and β ≥ 1.
Initialization: x^{(0)} and y^{(0)} = 0; positive numbers σ and ρ such that µ < σ and ρ/µ > 8 sin²((√n − 1)π/(2√n)).
Iterate: update x^{(k+1)} and y^{(k+1)} by scheme (24) until convergence or a stopping criterion is satisfied.
Output: write the final iterate x^{(k+1)} as x; the restored image is u = e^x.

Implementing Algorithm 1 requires a convenient way of evaluating the two proximity operators prox_{(1/ρ)Φ} and prox_{(λ/σ)ψ}. Firstly, the operator prox_{(λ/σ)ψ} for ψ in (30) has an explicit form: it shrinks each pair (y_i, y_{n+i}), i ∈ N_n, jointly toward the origin,
(prox_{(λ/σ)ψ}(y))_i = max{r_i − λ/σ, 0} (y_i / r_i), with r_i := √(y_i² + y_{n+i}²),
and likewise for the (n+i)th component (with the convention 0/0 = 0). Secondly, we describe how to evaluate the operator prox_{(1/ρ)Φ} at x ∈ R^n. Let z = prox_{(1/ρ)Φ}(x). Then, by the definition of the proximity operator, z is a solution of the equation
ρ(z − x) + ∇Φ(z) = 0.
For parameters α, β, and ρ fulfilling assumption H1, this equation has a unique solution by Lemma 2.2; since Φ is separable, the solution can be determined efficiently, componentwise, by Newton's method.
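Sketches of the two proximity operators (our code; the pair-shrinkage follows the explicit form above for ψ in (30), and the Newton solver follows the scalar equation above, with the derivatives of Φ taken from (5)):

    import numpy as np

    def prox_psi(y, t):
        """prox of t*psi with psi in (30): joint shrinkage of pairs (y_i, y_{n+i})."""
        n = y.size // 2
        g = y.reshape(2, n)              # row 0: y_1..y_n, row 1: y_{n+1}..y_{2n}
        r = np.sqrt((g ** 2).sum(axis=0))
        scale = np.where(r > 0, np.maximum(r - t, 0.0) / np.maximum(r, 1e-16), 0.0)
        return (g * scale).ravel()

    def prox_phi(x, f, alpha, beta, rho, iters=20):
        """Solve rho*(z - x) + Phi'(z) = 0 componentwise by Newton's method."""
        z = x.copy()
        for _ in range(iters):
            s = np.sqrt(np.exp(z) / f)                                  # s = sqrt(e^z / f)
            grad = 1.0 - f * np.exp(-z) + alpha * (s ** 2 - beta * s)   # Phi'(z)
            hess = f * np.exp(-z) + alpha * (s ** 2 - 0.5 * beta * s)   # Phi''(z)
            z -= (rho * (z - x) + grad) / (rho + hess)
        return z

Under H1 the Newton denominator rho + Phi''(z) stays positive, which is why the componentwise iteration is well defined; these two helpers can be passed to the fixed_point_sweep sketch of section 3.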

5. A review of several existing models. To prepare for a comparison of the performance of our proposed model (3) for multiplicative noise removal with existing models in the next section, we briefly review several state-of-the-art models and their associated algorithms: the DZ model in [11], the TwL-mV model in [17], the I-DIV model in [29], and the HMNZ model in [14].
The DZ model in [11] augments the AA model with a quadratic penalty term; its associated algorithm alternately updates a dual variable p and the image u, where the dual update involves the set Y := {q ∈ R^{2n} : max_{i∈N_n}(q_i² + q_{i+n}²) ≤ 1}. The vector p^{(k+1)} is the projection of p^{(k)} + λβH u^{(k)} onto the convex set Y, which can be easily computed. As suggested in [11], the vector u^{(k+1)} is approximated by using Newton's method followed by one projection step onto the convex set X.
The TwL-mV model in [17] is a two-level relaxation of the mV model (see [32]) obtained by using the mth root transformation [32] and the concave conjugate. In the resulting model (37), λ > 0, m ≥ 1, s ≥ 1, U := [c, C]^n, and ^m√U := [^m√c, ^m√C]^n with both c and C being positive, and H and ψ are given by (29) and (30), respectively. The alternating algorithm (38) was suggested in [17] for solving (37). It is easy to see that a^{(k+1)} = m(v^{(k)})^s. The update v^{(k+1)} is approximated by the iterative scheme (39), where Proj_{^m√U} is the projection operator onto the set ^m√U and 1 ≤ ℓ ≤ T for a pre-given positive integer T. The output v^{(k,T+1)} is taken as v^{(k+1)} in the second step of (38).
The I-DIV model in [29] reads as
(40) min { ⟨u − f ⊙ log u, 1⟩ + λψ(Hu) : u ∈ R^n_+ },
where λ > 0 and H and ψ are given by (29) and (30), respectively. This model can be equivalently written as a constrained problem and solved by the alternating direction method of multipliers (ADMM).

The HMNZ method in [14] is a learned-dictionary-based despeckling method. The first phase of the method learns a dictionary by the K-SVD method. Let D be the learned dictionary obtained from log f, let R_{i,j} ∈ R^{d×n} be the matrix corresponding to the extraction of the patch located at (i, j), and let Dα_{i,j} be the sparse dictionary representation obtained for each patch R_{i,j} log f. With weights λ, µ > 0, the second phase solves a variational model that combines the dictionary representations with the regularizer ψ(H·), where H and ψ are given by (29) and (30), respectively; the model is solved by the Chambolle-Pock algorithm.

6. Experiments. In this section, we illustrate the effectiveness of the proposed model (3) and the corresponding algorithm (i.e., Algorithm 1) in producing high-quality images from images corrupted by multiplicative noise. In our experiments, we use the test images "Cameraman", "Lena", and "Peppers" of size 512 × 512, "Remote1" of size 768 × 574, and "Remote2" of size 632 × 540, as shown in Fig. 1. To generate the observed images f in (1), we degrade the original test images by multiplicative Gamma noise at various noise levels. In order to have a thorough comparison, we compare our method with several state-of-the-art methods: the DZ method in [11], the TwL-mV method in [17], the HMNZ method in [14] (which remarkably outperforms the BS [5], DFN [12], DRSM [29], and NL [30] methods in terms of the quality of the restored images, as discussed in [14]), and the I-DIV method in [29]; these four methods were reviewed briefly in the previous section. All the algorithms are run under Windows 7 using MATLAB 7.0 (R14) on an Intel Dual-Core i5-4570 CPU 3.20 GHz PC with 4.0 GB of RAM. The execution time of each method is measured in seconds. We terminate the tested algorithms when ‖u^{(k+1)} − u^{(k)}‖₂ / ‖u^{(k)}‖₂ ≤ 3 × 10^{−4}, where u^{(k)} is the resulting image of the kth iteration produced by the underlying algorithm. The quality of the results is evaluated in terms of the peak signal-to-noise ratio (PSNR) in dB,
PSNR := 10 log₁₀( 255² n / ‖û − u‖₂² ),
where u is the original image, n is the total number of pixels, and û is the image recovered by a tested algorithm; a small helper implementing this metric is sketched below. Two types of experiments are designed: Experiment 1 discusses the influence of the initialization x^{(0)} on the performance of our proposed Algorithm 1, and Experiment 2 focuses on the comparison of our proposed model (3) with the DZ, TwL-mV, HMNZ, and I-DIV methods.
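A small helper for the PSNR metric defined above (a sketch; peak value 255 as in the formula):

    import numpy as np

    def psnr(u, u_hat, peak=255.0):
        """Peak signal-to-noise ratio in dB between original u and restored u_hat."""
        mse = np.mean((np.asarray(u, float) - np.asarray(u_hat, float)) ** 2)
        return 10.0 * np.log10(peak ** 2 / mse)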
6.1. Experiment 1. In this subsection, we present a numerical study of how the initial estimate x^{(0)} affects the performance of our Algorithm 1. Besides x^{(0)} = log(f), we also consider initial estimates obtained from the images pre-filtered by BM3D [6] and by SAR-BM3D [25]. We note that BM3D represents the state-of-the-art algorithm for recovering images affected by white Gaussian noise [6], and SAR-BM3D is a version of BM3D adapted to denoising SAR images by taking into account their peculiar features.

By using x^{(0)} = log(f) as the initial estimate, the parameters involved in Algorithm 1 are chosen differently for multiplicative noise at different levels so as to obtain convergent, stable PSNR values as high as possible. The values of these parameters are listed in Table 1. With these parameters, we can readily verify that model (3) is convex for L = 10 and 6 but nonconvex for L = 4 and 2. Fig. 2 illustrates the curves of PSNR values of the updates against the number of iterations for the parameters in Table 1.

According to Theorem 4.2, Algorithm 1 may converge to a local minimizer or a saddle point, but not a local maximizer, of J. To examine this, three different initializations, x^{(0)} = 10·ones(size(f)), x^{(0)} = 15·ones(size(f)), and x^{(0)} = 50·ones(size(f)), are tested for L = 4, and two different initializations, x^{(0)} = 15·ones(size(f)) and x^{(0)} = 50·ones(size(f)), for L = 2. Here ones(size(f)) is the matrix of all ones with the same size as f. Fig. 3 illustrates the curves of PSNR values of the updates against the number of iterations. One can see that for L = 4, Algorithm 1 with x^{(0)} = 10·ones(size(f)) converges to a stable PSNR of -47.43 dB, while Algorithm 1 with x^{(0)} = 15·ones(size(f)) and x^{(0)} = 50·ones(size(f)) converges to a stable PSNR of -84.52 dB; for L = 2, Algorithm 1 with x^{(0)} = 15·ones(size(f)) and x^{(0)} = 50·ones(size(f)) uniformly converges to a stable PSNR of -122.58 dB. All these stable PSNR values for L = 4 and 2 differ from the stable values in the third and fourth subfigures of Fig. 2(a), respectively. These results confirm that J with the parameters in Table 1 has more than one critical point for L = 4 and 2, and that the restored images may be sensitive to the selected initial estimates. Experimentally, we find that, for L = 4 and 2, if the value of each entry of x^{(0)} is restricted to the interval (−∞, log 255], Algorithm 1 consistently converges to the respective stable PSNR values shown in the third and fourth subfigures of Fig. 2(a).
6.2. Experiment 2. Motivated by the observations in our first experiment, we use log(SAR-BM3D(f )) as the initial estimate and re-select the parameters for Algorithm 1 to make the PSNR value of a restored image as high as possible under the prescribed tolerance TOL = 3 × 10 −4 . These parameters are listed in Table 2.
For the purpose of illustration, the solid lines record the PSNR value (in Fig. 4(a)) and the relative error (in Fig. 4(b)) of the denoised image at each iteration for the noise levels L = 10, 6, 4, and 2, up to the first 12000 iterations; the dashed lines in Fig. 4 are generated from Algorithm 1 with log(f) as the initial estimate. Table 3 reports the PSNR values of the noisy images restored by our proposed Algorithm 1 with log(SAR-BM3D(f)) as the initial estimate, the I-DIV method, the DZ method, the TwL-4V method, the HMNZ method, and SAR-BM3D. Correspondingly, Table 2 lists the parameter values for our method, the I-DIV method, the DZ method, the HMNZ method, and the TwL-4V method at the various noise levels. (Table 3: PSNR (dB) and CPU time (s) for I-DIV [29], DZ [11], TwL-4V [17], HMNZ [14], our algorithm (Algorithm 1 with x^{(0)} = log(SAR-BM3D(f))), and SAR-BM3D [25] for the test images of Fig. 1 corrupted by multiplicative noise with L = 10, 6, 4, 2, respectively.)

We can see that our method outperforms the other five state-of-the-art methods in terms of PSNR values. Especially in the low-noise case L = 10, it is superior to the I-DIV method, the DZ method, the TwL-4V method, and SAR-BM3D by about 1.0 dB. We remark that, unlike the formulation of multiplicative noise in model (3) and in the DZ, TwL-4V, I-DIV, and HMNZ models, the multiplicative noise model discussed in SAR-BM3D [25] follows a square-root Gamma distribution. This is the reason why the PSNR values of the noisy images at various noise levels in Table 3 of the present paper are inconsistent with those in Table I of [25] (e.g., see the results for the "Lena" image in both tables). Focusing on the CPU time reported in Table 3, we see that our Algorithm 1 is time-consuming in comparison with the other four methods. This is due to the fact that the total execution time of Algorithm 1 consists of two components, the time for initializing x^{(0)} by running SAR-BM3D on f and the time for the subsequent fixed-point iterations listed in Algorithm 1; SAR-BM3D accounts for a large proportion of the time, while the fixed-point iterative scheme itself is very fast. For visual comparison, the denoised "Cameraman" and "Remote1" images produced by the various methods are displayed in Figs. 5 and 6. We observe that the images restored by our proposed method have fewer artifacts than those produced by I-DIV, DZ, TwL-4V, and HMNZ, and keep sharp edges and fine details (e.g., see the background and the remote columnar building in the "Cameraman" image).

7. Conclusion. This paper introduces a new variational model for multiplicative noise removal that exploits the approximate sparsity of the underlying images. In order to solve the proposed model, we first study the convexity of its objective function and give conditions that ensure the convexity of the model. We then relax these conditions so that the model may be nonconvex, and we show that the critical points of the objective function of the (nonconvex) model can be viewed as the solutions of coupled fixed-point equations formulated in terms of proximity operators. In particular, under certain conditions, we show that a global minimizer of the model is a solution of the fixed-point equations. Based on the fixed-point equations, a proximity algorithm is developed, and we prove that the sequence generated by the algorithm converges to a critical point of the objective function of the model in some specific cases. Furthermore, we examine the suitability of the model and the corresponding algorithm for restoring images contaminated by multiplicative noise. Our numerical results indicate that the proposed method evidently performs better than existing state-of-the-art methods in terms of PSNR values.