A PENALTY DECOMPOSITION METHOD FOR NUCLEAR NORM MINIMIZATION WITH $\ell_1$ NORM FIDELITY TERM

Abstract. It is well known that nuclear norm minimization problems are widely used in numerous fields such as machine learning, system identification, and image processing, and they have attracted considerable attention and seen good progress. Many researchers have contributed to the nuclear norm minimization problem with an $\ell_2$ norm fidelity term, which can only deal with problems corrupted by Gaussian noise. In this paper, we propose an efficient penalty decomposition (PD) method for solving the nuclear norm minimization problem with an $\ell_1$ norm fidelity term. One subproblem admits a closed-form solution due to its special structure, and the other also admits a closed-form solution once its quadratic term is linearized. Convergence results for the proposed method are given as well. Finally, the effectiveness and efficiency of the proposed method are verified by numerical experiments.

1. Introduction. The nuclear norm minimization problem with linear equality constraints can be formulated as
$$\min_{X\in\mathbb{R}^{m\times n}} \|X\|_*, \quad \text{s.t.}\ \mathcal{A}(X)=b, \tag{1}$$
where $X\in\mathbb{R}^{m\times n}$ is an unknown matrix, $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^p$ is a linear map and $b\in\mathbb{R}^p$. In (1), $\|X\|_*$ is the nuclear norm of the matrix $X$ [19], which is defined as the sum of its positive singular values. Assume that the matrix $X$ has $r$ positive singular values $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_r>0$; then its nuclear norm is formulated as
$$\|X\|_* = \sum_{i=1}^{r}\sigma_i.$$
In some sense, (1) is the tightest convex relaxation of the matrix rank minimization problem [19]
$$\min_{X\in\mathbb{R}^{m\times n}} \operatorname{rank}(X), \quad \text{s.t.}\ \mathcal{A}(X)=b. \tag{2}$$
The nuclear norm minimization problem (1) is convex; it is equivalent to a semidefinite programming (SDP) problem and admits many effective solvers such as SeDuMi [25] and SDPT3 [28]. Unfortunately, neither solver is practical for higher-dimensional problems. Subsequently, Cai et al. presented the singular value thresholding (SVT) algorithm [8], which overcomes this disadvantage by solving
$$\min_{X\in\mathbb{R}^{m\times n}} \tau\|X\|_* + \frac{1}{2}\|X\|_F^2, \quad \text{s.t.}\ \mathcal{A}(X)=b, \tag{3}$$
where $\tau\ge 0$ is a given parameter and $\|\cdot\|_F$ is the Frobenius norm. Obviously, the solution of (3) converges to that of (1) as $\tau\to\infty$. Nevertheless, when the matrix is not of very low rank, its efficiency is not remarkable. Afterwards, the fixed point continuation with approximate SVD (FPCA) method [19] of Ma et al. was applied to solve
$$\min_{X\in\mathbb{R}^{m\times n}} \tau\|X\|_* + \frac{1}{2}\|\mathcal{A}(X)-b\|_2^2. \tag{4}$$
Moreover, Toh and Yun introduced an accelerated proximal gradient (APG) method [26] for solving the Lagrangian version of (4). More recently, Yang and Yuan developed an alternating direction method (ADM) [31], where the closed-form solutions of some subproblems are easily obtained by a linearization technique. Other ADMM-type algorithms can be found in [29], [10], [17], etc. In many applications, the measurement $b$ in (1) may be contaminated by a small amount of noise.
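The thresholding operator that SVT applies at every iteration can be sketched in a few lines of Python/NumPy (an illustrative sketch of our own, not code from [8]; the function name `svt` is ours):

```python
import numpy as np

def svt(Y, tau):
    """Singular value thresholding: D_tau(Y) = U (Sigma - tau*I)_+ V^T."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Thresholding the identity shrinks every singular value (all equal to 1)
# by tau, so svt(I, 0.4) = 0.6 * I.
print(svt(np.eye(3), 0.4))
```

The operator keeps only singular values above `tau`, which is what drives the iterates toward low rank.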
In consideration of noise, quite a few researchers have presented two typical noisy models with an $\ell_2$ norm fidelity term, which deal effectively with Gaussian noise. One is the inequality constrained nuclear norm minimization problem
$$\min_{X\in\mathbb{R}^{m\times n}} \|X\|_*, \quad \text{s.t.}\ \|\mathcal{A}(X)-b\|_2\le\delta, \tag{5}$$
where $\delta\ge 0$ is a parameter reflecting the noise level. Obviously, model (5) reduces to (1) when $\delta=0$.
The other one is the nuclear norm regularized least squares problem
$$\min_{X\in\mathbb{R}^{m\times n}} \gamma\|X\|_* + \frac{1}{2}\|\mathcal{A}(X)-b\|_2^2, \tag{6}$$
where $\gamma\ge 0$ is a regularization parameter. When $\gamma$ and $\delta$ are chosen appropriately, model (5) is equivalent to (6). Unfortunately, it has been demonstrated that the $\ell_2$ norm fidelity term is less valid for non-Gaussian additive noise because it may amplify the influence of the noise [21]. As we know, low rank matrix recovery problems arise widely in system identification [19], [22], [12], machine learning [14], [13], [30] and video denoising [15]. In addition, impulsive noise, Gaussian noise and their mixture widely exist in corrupted images and videos [1], [20]. Therefore, it is necessary to find a substitute formulation that overcomes the disadvantage of the $\ell_2$ norm fidelity term. The literature [2], [6], [9], [23] has demonstrated that the $\ell_1$ norm fidelity term is particularly efficient for non-Gaussian additive noise such as impulsive noise. Thus, in this paper, we consider the nuclear norm minimization problem with an $\ell_1$ norm fidelity term, namely
$$\min_{X\in\mathbb{R}^{m\times n}} \|X\|_* + \frac{1}{\nu}\|\mathcal{A}(X)-b\|_1, \tag{7}$$
where the observation $b\in\mathbb{R}^p$ might contain some noise and $\nu>0$ is a penalty parameter. It is well known that when the penalty parameter $\nu$ is below a certain threshold, the $\ell_1$ fidelity term makes (7) an exact penalty problem, which then reduces to problem (1). Model (7) can efficiently handle not only impulsive noise but also Gaussian noise or their mixture. Given the two nonsmooth terms in the objective function, we transform the convex, nonsmooth objective into a variable-separated form by introducing an auxiliary variable $r$: problem (7) is equivalently converted into
$$\min_{X,r}\ \|X\|_* + \frac{1}{\nu}\|r\|_1, \quad \text{s.t.}\ \mathcal{A}(X)-b=r. \tag{8}$$
Then we construct an efficient algorithm to solve problem (8) according to its separable structure.
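The robustness behind this choice can already be seen in one dimension: the influence of a single residual on the $\ell_2$ term grows linearly with its size, while for the $\ell_1$ term it is bounded. A minimal numerical illustration (our own, not taken from the cited works):

```python
import numpy as np

# Residuals from a hypothetical fit: small Gaussian-like errors plus one
# impulsive outlier in the last position.
residuals = np.array([0.1, -0.2, 0.05, 50.0])

# Influence of each residual t on the fidelity term:
# d/dt (t^2 / 2) = t       -> unbounded, the outlier dominates the l2 fit;
# d/dt |t|      = sign(t)  -> bounded, the outlier counts like any point.
l2_influence = residuals
l1_influence = np.sign(residuals)
print(np.abs(l2_influence).max())  # 50.0: the spike dominates
print(np.abs(l1_influence).max())  # 1.0: the spike is capped
```

This is why an impulsive spike can distort the $\ell_2$ fit arbitrarily badly but moves the $\ell_1$ fit no more than any ordinary data point.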
Recently, Lu and Zhang [18] introduced a penalty decomposition (PD) method for solving the general rank minimization problem and tested its performance on matrix completion and nearest low-rank correlation matrix problems. Inspired by the good performance of their method, in this paper we extend it to solve problem (8) and compare it with the well-known algorithm FPCA. Both of our subproblems admit closed-form solutions, and the block coordinate descent (BCD) method, i.e. the Gauss-Seidel method, performs well when each subproblem can be solved efficiently [18]. We therefore apply the BCD method to minimize each penalty subproblem.
Previously, the convergence of the BCD method was studied in [18], [4], [5], [27]. If the objective function $f$ is strictly convex and differentiable, then the sequence generated by this method converges to a stationary point of $f$ [5]. More recently, in [4], a global sublinear rate of convergence of the block coordinate gradient projection algorithm for smooth convex programming problems was established. In the nonconvex and nonsmooth setting, Attouch et al. [3] studied the convergence properties of the proximal forward-backward (PFB) algorithm applied to minimizing a nonconvex and nonsmooth function $f$. Moreover, based on the Kurdyka-Lojasiewicz (KL) property of $f$, Bolte et al. [7] established global convergence results for the proximal alternating linearized minimization (PALM) algorithm for nonconvex and nonsmooth minimization problems. Problem (8) is convex and nonsmooth, so the convergence of our proposed algorithm follows readily from [7]. Finally, the efficiency of the proposed algorithm is demonstrated by numerical experiments.
The paper proceeds as follows. In Section 2, we apply the PD method to the equality constrained problem (8) and give the convergence result. Numerical results are presented in Section 3. Finally, we conclude the paper with some remarks in Section 4.

2. Algorithm. In this section, we construct our PD method for solving problem (8) and give the convergence result of the proposed method.
2.1. Penalty decomposition method for problem (8). In this subsection, we mainly focus on solving problem (8) via the PD method. Given a parameter $\mu>0$, the corresponding quadratic penalty function of (8) is
$$P_\mu(X,r) = \|X\|_* + \frac{1}{\nu}\|r\|_1 + \frac{\mu}{2}\|\mathcal{A}(X)-b-r\|_2^2, \tag{9}$$
where $\mu$ is a penalty parameter. Because of the strong convexity, the solution of penalty problem (9) is unique. In addition, as $\mu\to\infty$, the solution of problem (9) converges to that of (8). Now we apply the block coordinate descent (BCD) method to minimize the penalty problem (9), i.e.,
$$X^k_{l+1} = \arg\min_X P_\mu(X, r^k_l), \qquad r^k_{l+1} = \arg\min_r P_\mu(X^k_{l+1}, r).$$
For simplicity, we use $l$ and $k$ to denote the inner and outer iteration indices, respectively. Given $(X^k_l, r^k_l)$, the main steps are to solve the two penalty subproblems, which can be described as follows.

Firstly, for fixed $(X^k_l, r^k_l)$, the minimizer $X^k_{l+1}$ of (9) with respect to $X$ is given by
$$X^k_{l+1} = \arg\min_X\ \|X\|_* + \frac{\mu}{2}\|\mathcal{A}(X)-b-r^k_l\|_2^2. \tag{11}$$
Obviously, its analytic solution is not available due to the linear mapping $\mathcal{A}$. Thus we linearize the quadratic term to get a closed-form solution of the above subproblem. Let $G(X) = \frac{1}{2}\|\mathcal{A}(X)-b-r^k_l\|_2^2$; it is clear that $G(X)$ is a quadratic function. Then let $g^k$ be the gradient of $G(X)$ at $X^k_l$, which can be represented as
$$g^k = \mathcal{A}^*\big(\mathcal{A}(X^k_l)-b-r^k_l\big),$$
where $\mathcal{A}^*$ is the adjoint of $\mathcal{A}$. Hence, the $X$-subproblem (11) can be approximately solved by the following model with proximal parameter $\tau>0$:
$$X^k_{l+1} = \arg\min_X\ \|X\|_* + \mu\Big(\langle g^k, X-X^k_l\rangle + \frac{1}{2\tau}\|X-X^k_l\|_F^2\Big). \tag{15}$$
To get the exact solution of (15), let $Y = X^k_l - \tau g^k$; then the solution of the $X$-subproblem can be reformulated as
$$X^k_{l+1} = \arg\min_X\ \|X\|_* + \frac{\mu}{2\tau}\|X-Y\|_F^2. \tag{16}$$
According to the theorem in [8], [19], we first compute the singular value decomposition (SVD) of the matrix $Y\in\mathbb{R}^{m\times n}$ of rank $r$, i.e.,
$$Y = U\Sigma V^T, \qquad \Sigma = \operatorname{diag}(\sigma_1,\dots,\sigma_r),$$
where $U$ and $V$ are $m\times r$ and $n\times r$ matrices with orthonormal columns, and $\sigma_i$ is the $i$-th positive singular value of $Y$. Then $X^k_{l+1}$ can be written directly as
$$X^k_{l+1} = D_{\tau/\mu}(Y), \tag{17}$$
where $D_{\tau/\mu}(Y) = U(\Sigma - (\tau/\mu)I)_+ V^T$ and $(t)_+ := \max\{t,0\}$.

Secondly, for fixed $X^k_{l+1}$, we obtain $r^k_{l+1}$ by minimizing (9) with respect to $r$, that is,
$$r^k_{l+1} = \arg\min_r\ \frac{1}{\nu}\|r\|_1 + \frac{\mu}{2}\|\mathcal{A}(X^k_{l+1})-b-r\|_2^2 = \mathrm{Shrink}\Big(\mathcal{A}(X^k_{l+1})-b,\ \frac{1}{\nu\mu}\Big), \tag{18}$$
where $\mathrm{Shrink}(\cdot,\cdot)$ is the well-known vector shrinkage operator [19], $\mathrm{Shrink}(a,\kappa) = \mathrm{sign}(a)\max\{|a|-\kappa, 0\}$.
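The two closed-form updates above (the matrix shrinkage step (17) and the vector shrinkage step) can be sketched for the matrix completion case, where $\mathcal{A}$ samples the observed entries. This is an illustrative Python sketch under our own naming; `inner_step`, `svt` and `shrink` are hypothetical helper names, not the authors' code:

```python
import numpy as np

def svt(Y, tau):
    """Matrix shrinkage (17): U (Sigma - tau*I)_+ V^T."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(a, kappa):
    """Vector shrinkage: sign(a) * max(|a| - kappa, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)

def inner_step(X, r, b, mask, mu, nu, tau):
    """One BCD pass over the penalty function P_mu, specialised to
    matrix completion, where A(X) = X[mask] samples observed entries."""
    # X-update: the gradient of G(X) = 0.5*||A(X) - b - r||^2 is
    # A^*(A(X) - b - r); step to Y = X - tau*g, then threshold the
    # singular values of Y at tau/mu.
    g = np.zeros_like(X)
    g[mask] = X[mask] - b - r
    X_new = svt(X - tau * g, tau / mu)
    # r-update: the exact minimiser is soft thresholding of the residual.
    r_new = shrink(X_new[mask] - b, 1.0 / (nu * mu))
    return X_new, r_new
```

For $\tau \le 1$ one pass never increases $P_\mu$ here, since the sampling operator has unit spectral norm, so the linearized model in the $X$-update majorizes the quadratic term.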
In light of all the analysis above, we give the following framework of the PD method for solving problem (8), which we denote by PD-$\ell_1$.

Algorithm PD-$\ell_1$:
Step 0. Choose an initial point $(X^0_0, r^0_0)$, an initial penalty parameter $\mu_0>0$ and a growth factor $\rho>1$. Set $k=0$.
Step 1. Set $l = 0$ and find an approximate solution $(X^k, r^k)$ of the penalty problem (9) by the following steps (1)-(4):
(1) Compute $X^k_{l+1}$ via (17) with fixed $X^k_l$ and $r^k_l$.
(2) Compute $r^k_{l+1}$ via (18) with fixed $X^k_{l+1}$.
(3) If the stopping criterion for the inner iterations is satisfied, set $(X^k, r^k) = (X^k_{l+1}, r^k_{l+1})$ and go to (4); otherwise, set $l = l+1$ and go to (1).
(4) Set $\mu_{k+1} = \rho\,\mu_k$ and $(X^{k+1}_0, r^{k+1}_0) = (X^k, r^k)$.
Step 2. If the outer stopping criterion is satisfied, stop; otherwise, set $k = k+1$, and start the outer loops.
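Putting the updates and the outer penalty loop together, the whole framework can be sketched for matrix completion as follows. This is a toy Python instantiation under parameter choices of our own (fixed iteration counts instead of proper stopping tests); it is a sketch of the scheme above, not the authors' implementation:

```python
import numpy as np

def svt(Y, tau):
    """Singular value thresholding, update (17)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(a, kappa):
    """Vector soft thresholding for the r-update."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)

def pd_l1(b, mask, shape, nu=1.0, mu=1.0, rho=2.0, tau=1.0,
          outer=8, inner=30):
    """PD method for min ||X||_* + (1/nu)||A(X) - b||_1, A(X) = X[mask]."""
    X = np.zeros(shape)
    r = np.zeros_like(b)
    for _ in range(outer):                   # outer loop: mu -> rho * mu
        for _ in range(inner):               # inner loop: BCD on P_mu
            g = np.zeros_like(X)
            g[mask] = X[mask] - b - r        # g = A^*(A(X) - b - r)
            X = svt(X - tau * g, tau / mu)   # X-update (17)
            r = shrink(X[mask] - b, 1.0 / (nu * mu))  # r-update
        mu *= rho
    return X

# Toy run: recover a rank-2 matrix from 60% of its entries (noiseless).
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
mask = rng.random(M.shape) < 0.6
X = pd_l1(M[mask], mask, M.shape)
print(np.linalg.norm(X - M) / np.linalg.norm(M))  # relative recovery error
```

Increasing $\mu$ geometrically mirrors Step 1(4); in practice the inner loop would stop once successive iterates are close, rather than after a fixed count.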
Remark 1. The above algorithm is inspired by the PD method in [18]. Differently, we extend the PD method to solve model (8), which can efficiently handle not only impulsive noise but also Gaussian noise or their mixture. One subproblem admits a closed-form solution due to its special structure, and the other also admits a closed-form solution once its quadratic term is linearized.
Remark 2. For the above algorithm, the termination condition can also be the relative error between the original matrix $M$ and the optimal solution $X^*$ produced by PD-$\ell_1$, that is,
$$\frac{\|X^*-M\|_F}{\|M\|_F} \le \epsilon$$
for some $\epsilon > 0$.
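In a synthetic test where the true matrix $M$ is known, the criterion of Remark 2 amounts to the following check (a trivial Python sketch; the function name is ours):

```python
import numpy as np

def rel_err_stop(X, M, eps=1e-3):
    """Remark-2 criterion: stop once ||X - M||_F / ||M||_F <= eps.
    Usable only in synthetic tests where the true matrix M is known."""
    return np.linalg.norm(X - M) / np.linalg.norm(M) <= eps

M = np.eye(4)
print(rel_err_stop(M + 0.1, M))    # relative error 0.4 / 2 = 0.2: keep going
print(rel_err_stop(M + 1e-5, M))   # relative error 2e-5: stop
```

On real data, where $M$ is unknown, one would instead monitor the change between successive iterates.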

2.2. Convergence analysis. In this subsection, we first give the convergence of the inner iterations of PD-$\ell_1$. Since the quadratic penalty function of (8) is convex and nonsmooth, it can be seen as a special case of Problem (M) in [7] with $f(X) = \|X\|_*$, $g(r) = \frac{1}{\nu}\|r\|_1$ and $H(X,r) = \frac{\mu}{2}\|\mathcal{A}(X)-b-r\|_2^2$. Obviously, the objective function satisfies the KL property. Moreover, both $X \mapsto H(X,r)$ (for fixed $r$) and $r \mapsto H(X,r)$ (for fixed $X$) are $C^{1,1}$. Hence, from Theorem 1 in [7], we give the following convergence theorem for the inner iterations of PD-$\ell_1$ without proof.
Theorem 2.1. Let the sequence $\{(X_l, r_l)\}$ generated by the above BCD method be bounded. Then:
(i) the sequence $\{(X_l, r_l)\}$ has finite length, that is,
$$\sum_{l=0}^{\infty} \|(X_{l+1}, r_{l+1}) - (X_l, r_l)\|_F < \infty;$$
(ii) the sequence $\{(X_l, r_l)\}$ converges to an optimal solution $(X^*, r^*)$ of the penalty problem (9).
Next, by the theory of the penalty function method [11], [24] and the convergence results for the PD method in [18], we describe the convergence property of the proposed algorithm. The following convergence result shows that the proposed method can solve problem (7) effectively.
Theorem 2.2. Let $\epsilon = 0$ in PD-$\ell_1$. Suppose that a solution of the equality constrained problem (8) exists, and let the sequence $\{(X^k, r^k)\}$ be generated by the above PD-$\ell_1$ method. Then any accumulation point $(X^*, r^*)$ of the sequence $\{(X^k, r^k)\}$ is an optimal solution of (8).
Proof. Let $(\bar X, \bar r)$ be an optimal solution of problem (8), that is, $\mathcal{A}(\bar X) - b = \bar r$ and
$$\|\bar X\|_* + \frac{1}{\nu}\|\bar r\|_1 \le \|X\|_* + \frac{1}{\nu}\|r\|_1 \quad \text{for all } (X,r) \text{ with } \mathcal{A}(X)-b=r.$$
Since $(X^k, r^k)$ minimizes $P_{\mu_k}(X,r)$ for each $k$, we have
$$P_{\mu_k}(X^k, r^k) \le P_{\mu_k}(\bar X, \bar r) = \|\bar X\|_* + \frac{1}{\nu}\|\bar r\|_1,$$
which leads to the inequality
$$\|X^k\|_* + \frac{1}{\nu}\|r^k\|_1 + \frac{\mu_k}{2}\|\mathcal{A}(X^k)-b-r^k\|_2^2 \le \|\bar X\|_* + \frac{1}{\nu}\|\bar r\|_1. \tag{19}$$
By rearranging this expression, we obtain
$$\|\mathcal{A}(X^k)-b-r^k\|_2^2 \le \frac{2}{\mu_k}\Big(\|\bar X\|_* + \frac{1}{\nu}\|\bar r\|_1 - \|X^k\|_* - \frac{1}{\nu}\|r^k\|_1\Big). \tag{20}$$
Suppose that $(X^*, r^*)$ is an accumulation point of $\{(X^k, r^k)\}$, so that there is an infinite subsequence $K$ such that $\lim_{k\in K,\, k\to\infty}(X^k, r^k) = (X^*, r^*)$.
By taking the limit as $k\to\infty$, $k\in K$, on both sides of (20), we obtain
$$\|\mathcal{A}(X^*)-b-r^*\|_2^2 \le \lim_{k\in K,\, k\to\infty} \frac{2}{\mu_k}\Big(\|\bar X\|_* + \frac{1}{\nu}\|\bar r\|_1 - \|X^k\|_* - \frac{1}{\nu}\|r^k\|_1\Big) = 0,$$
where the last equality follows from $\lim_{k\to\infty}\mu_k = +\infty$. Therefore $\mathcal{A}(X^*) - b - r^* = 0$, so that $(X^*, r^*)$ is feasible. Moreover, taking the limit as $k\to\infty$, $k\in K$, in (19) and using the nonnegativity of $\mu_k$ and of $\|\mathcal{A}(X^k)-b-r^k\|_2^2$, we have
$$\|X^*\|_* + \frac{1}{\nu}\|r^*\|_1 \le \|\bar X\|_* + \frac{1}{\nu}\|\bar r\|_1.$$
Since $(X^*, r^*)$ is a feasible point whose objective value is no larger than that of the optimal solution $(\bar X, \bar r)$, we conclude that $(X^*, r^*)$, too, is an optimal solution of (8).
3. Numerical experiments. In this section, we carry out numerical experiments to illustrate that the model (7) with $\ell_1$-fidelity can handle different noise settings and performs better than the models with $\ell_2$-fidelity. Firstly, we test the PD-$\ell_1$ method against the state-of-the-art algorithm FPCA on a series of matrix completion problems with different noise settings. Then, we test a gray image to illustrate graphically the practical performance of PD-$\ell_1$. Finally, we apply the proposed PD-$\ell_1$ to solve problem (7) with different experimental settings. All experiments are performed under Windows 7 Premium and MATLAB 2016a on a Lenovo laptop with an Intel Core CPU at 2.5 GHz and 4 GB of memory.
In all performance assessments, we first generate a real low rank matrix $M = M_L M_R^T$, where $M_L\in\mathbb{R}^{m\times r}$ and $M_R\in\mathbb{R}^{n\times r}$ are generated with independent identically distributed Gaussian entries. We produce $M$ via the MATLAB script "randn(m, r) × randn(r, n)". Obviously, the rank of $M$ is $r$. We use sr = $p/mn$ to denote the sampling ratio, where $p$ is the number of measurements. Additionally, the number of degrees of freedom of a real-valued rank-$r$ matrix is $dr = r(m+n-r)$. In all tests, the quadratic penalty parameter $\mu$ is chosen as $\mu = 2.5/\min(m,n)$ by experience [31]. Furthermore, we set the proximal parameter $\tau = 1.0$ for matrix completion problems. In the noisy cases, the measurement $b$ is produced by $b = \mathcal{A}(M) + \omega_G + \omega_I$, where $\omega_G$ is additive Gaussian noise of zero mean and standard deviation $\sigma$, generated by $\omega_G = \sigma \times \mathrm{randn}(p, 1)$, and $\omega_I$ is impulsive noise, set to $\pm 1$ at $N$ random positions of $b$, where $N = m\times n\times q$ and $q = 0.01, 0.001, 0.005$ respectively. For each test, we use $X^*$ to denote the optimal solution produced by our algorithm, and use the relative error (RelErr) to measure the quality of $X^*$ against the original matrix $M$, i.e.,
$$\mathrm{RelErr} = \frac{\|X^*-M\|_F}{\|M\|_F}.$$
We say that $M$ is recovered successfully by $X^*$ if the corresponding RelErr is less than $10^{-3}$, a criterion also used in [19], [8], [16].
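The data-generation recipe above can be reproduced, for example, as follows (a Python translation of the MATLAB recipe; the function name, the seed handling, and the choice to place the impulses on sampled positions of $b$ are our own assumptions):

```python
import numpy as np

def make_problem(m, n, r, sr, sigma=0.0, q=0.0, seed=0):
    """Generate a rank-r test matrix M, a random sampling index set, and
    noisy measurements b = A(M) + omega_G + omega_I as described above."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank r
    p = int(sr * m * n)                       # number of measurements
    idx = rng.choice(m * n, size=p, replace=False)
    b = M.ravel()[idx].copy()                 # A(M): sampled entries
    b += sigma * rng.standard_normal(p)       # Gaussian noise omega_G
    n_imp = min(int(m * n * q), p)            # impulsive noise omega_I
    pos = rng.choice(p, size=n_imp, replace=False)
    b[pos] = rng.choice([-1.0, 1.0], size=n_imp)  # set to +-1
    return M, idx, b

def rel_err(X, M):
    """RelErr = ||X - M||_F / ||M||_F."""
    return np.linalg.norm(X - M) / np.linalg.norm(M)

M, idx, b = make_problem(100, 100, 5, sr=0.5, sigma=0.01, q=0.01)
print(b.shape)  # (5000,): p = sr * m * n measurements
```

A recovery is then counted as successful when `rel_err` of the computed solution falls below $10^{-3}$.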
3.1. Test on matrix completion problems. In this subsection, we present some numerical results to illustrate the feasibility and efficiency of the proposed algorithm.

DUO WANG, ZHENG-FEN JIN AND YOULIN SHANG
For our purpose, we test the proposed method on the nuclear norm matrix completion problem with $\ell_1$-fidelity, which can be formulated as
$$\min_{X\in\mathbb{R}^{m\times n}} \|X\|_* + \frac{1}{\nu}\|\mathcal{A}(X)-b\|_1, \tag{22}$$
where $\mathcal{A}$ samples the observed entries. In order to assess the performance of PD-$\ell_1$, we test it against the state-of-the-art algorithm FPCA, which solves the nuclear norm matrix completion problem with $\ell_2$-fidelity:
$$\min_{X\in\mathbb{R}^{m\times n}} \gamma\|X\|_* + \frac{1}{2}\|\mathcal{A}(X)-b\|_2^2. \tag{23}$$
Firstly, we test the two problems (22) and (23) in three noisy settings: impulsive noise only, Gaussian plus impulsive noise, and Gaussian noise only. We set the parameter maxit = 150 to observe their performance more clearly. The test results are displayed in Figure 1. From Figure 1, it is clear that the model (22) with $\ell_1$-fidelity can be dramatically better than (23) whenever the observed data contain Gaussian noise, impulsive noise, or their mixture. In particular, PD-$\ell_1$ can recover the matrix successfully via model (22), while model (23) is inefficient for data with impulsive noise. Moreover, the solution accuracy of model (22) reaches $10^{-4}$ within 150 iterations, a little higher than that of model (23).
Secondly, we test the performance of the PD method on models (22) and (23) with different sr and r. Numerical results can be seen in Figure 2. In the first row of Figure 2, we set m = n = 500 and r = 10 with different sr. It can be seen that both models succeed on the noiseless matrix completion problem. Moreover, PD-$\ell_1$ for model (22) converges faster than for model (23) except in the case sr = 0.3, and obtains a nearly exact solution with relative error below $10^{-10}$. From the second row of Figure 2, we can see that both models perform well as r increases. Model (22) can reach an accuracy as high as $10^{-10}$, whereas model (23) attains only a moderate accuracy. Moreover, in terms of iteration counts, model (22) clearly requires far fewer iterations. Taken together, model (22) is the better choice when high solution accuracy is needed.
To end this subsection, we test the PD-$\ell_1$ method on recovering a real gray image to illustrate its superiority. Numerical results can be seen in Figure 3. In this experiment, we test a real 512 × 512 gray image (a), which has been widely used in many simulations. Firstly, we obtain a low rank image (b) by applying the matrix SVD. Then we randomly select 70% of the samples from the low rank image and add different kinds of noise to obtain the corrupted images (c) and (e). Finally, we obtain the recovered images (d) and (f) by solving the matrix completion model (22) with the proposed PD-$\ell_1$; the relative errors are RelErr = 4.760e-3 and 4.827e-3, respectively. Comparing (d) with (b), it is clear that the proposed PD-$\ell_1$ is efficient for recovering images corrupted by impulsive noise. In addition, (f) shows that PD-$\ell_1$ successfully recovers images corrupted by mixed Gaussian and impulsive noise. This test illustrates that the proposed PD-$\ell_1$ performs well on real image restoration.

Figure 1. Numerical comparisons of models (22) and (23) in three noisy settings (m=n=500, r=10, sr=0.5).
3.2. Nuclear norm minimization with $\ell_1$-fidelity term. In this subsection, we report the efficiency of PD-$\ell_1$ for solving problem (7) with different experimental settings. Firstly, we test the PD-$\ell_1$ method on problem (7) with impulsive noise. In this test, we choose m = n from 100 to 1000 and set sr = 0.5, r = 5, q = 0.01. The test results are given in Table 1. From Table 1, we can see that the solution accuracy is $10^{-4}$; in other words, the problem is solved successfully. These limited numerical results demonstrate that the proposed PD-$\ell_1$ algorithm can effectively solve the nuclear norm minimization problem with impulsive noise. Secondly, we test the performance of the PD-$\ell_1$ method on problem (7) with different sr. The numerical results can be seen in Figure 4, which shows that the higher the sampling ratio, the higher the accuracy obtained.
Finally, in order to further reveal the efficiency of the PD-$\ell_1$ method, we test it on problem (7) with different kinds of noise. The test results are given in Table 2.

Figure 2. Numerical comparisons of models (22) and (23) in the noiseless case (m=n=500). First row: r = 10, sr = 0.3, 0.5, 0.7 respectively. Second row: sr = 0.5, r = 5, 15, 30 respectively.
Table 2 shows that the PD-$\ell_1$ method solves problem (7) successfully, especially under impulsive noise and mixed noise. Based on the above analysis, we can conclude that the PD-$\ell_1$ method is efficient and promising for problem (7).

Figure 4. Numerical results of PD-$\ell_1$ for (7) with impulsive noise: m=n=500, r=10, sr=0.5, 0.7, 0.9, q=0.01.

Table 2. Numerical results of PD-$\ell_1$ for (7), m=n, sr=0.5.

4. Concluding remarks. In this paper, we proposed a PD-$\ell_1$ method for solving the nuclear norm minimization problem with an $\ell_1$ norm fidelity term. The proposed method minimizes the quadratic penalty function of problem (7) by alternately solving the resulting penalty subproblems in $X$ and $r$. One subproblem admits a closed-form solution due to its special structure, and the other, in $X$, also admits a closed-form solution once its quadratic term is linearized. Convergence analysis of the proposed method has been given. We then presented numerical experiments with different settings to illustrate the performance of the proposed algorithm. Numerical results show that the $\ell_1$-fidelity model is better than the $\ell_2$ one when the observed data contain impulsive noise or its mixture with Gaussian noise. Moreover, the recovery of corrupted images further illustrates that the PD-$\ell_1$ method is promising and practical. Here we have focused on illustrating the efficiency of the $\ell_1$-fidelity model for the low rank minimization problem with non-Gaussian noise, so all the numerical experiments use the full SVD. In addition, methods based on augmented Lagrangian schemes and ADMM can also be applied to problem (7). In the near future, we will improve the proposed method with efficient acceleration techniques and study other efficient algorithms for problem (7).