HOMOTOPY METHOD FOR MATRIX RANK MINIMIZATION BASED ON THE MATRIX HARD THRESHOLDING METHOD

Abstract. Based on the matrix hard thresholding method, a homotopy method is proposed for solving the matrix rank minimization problem. This method iteratively solves a series of regularization subproblems, whose solutions are given in closed form by the matrix hard thresholding operator. Under some mild assumptions, convergence of the proposed method is proved. The proposed method does not depend on prior knowledge of the exact rank value. Numerical experiments demonstrate that the proposed homotopy method weakens the effect of the choice of the regularization parameter, and is more efficient and effective than existing state-of-the-art methods.

1. Introduction. In this paper, we consider recovering a low rank matrix from a limited number of linear measurements. Generally, undersampling makes it impossible to recover an arbitrary matrix exactly. However, the knowledge that the matrix has a low rank structure makes recovery possible. Hence the problem of interest can be formulated as the matrix rank minimization problem

min_X rank(X)  s.t.  A(X) = y,  (1)

where X ∈ R^{m×n} is an unknown variable, A : R^{m×n} → R^q is a linear map (q ≪ mn and rank(X) ≪ min{m, n}), and y ∈ R^q is the problem data. Since this problem extracts low dimensional structures from high dimensional data, it has many applications in different fields, such as system identification [7,14,19], optimal control [6], and signal processing [21,24]. Moreover, the problem becomes especially interesting as data sets grow increasingly massive.
A special case of problem (1) is the matrix completion problem

min_X rank(X)  s.t.  X_ij = M_ij, ∀(i, j) ∈ Ω.  (2)

Since the rank function is nonconvex and discontinuous, a popular approach relaxes it to the nuclear norm ||X||_* (the sum of the singular values), leading to the nuclear norm minimization problem

min_X ||X||_*  s.t.  A(X) = y,  (3)

and its regularization version

min_X λ||X||_* + (1/2)||A(X) − y||_2^2.  (4)

In problem (4), λ > 0 is a regularization parameter which balances the sampling error and the nuclear norm, and || · ||_2 denotes the standard Euclidean norm. Several algorithms have been proposed for solving problems (3) and (4). Liu et al. [14] proposed interior-point methods to solve a semidefinite programming version of problem (3). However, their methods cannot be applied to large scale problems. Subsequently, some first-order methods were proposed, including the Fixed Point Continuation (FPC) method [8,18] and the iteratively reweighted nuclear norm algorithms [15,19,20].
A large amount of theoretical work shows that, under some assumptions, the nuclear norm minimization (3) solves the matrix rank minimization problem (1) with high probability [4,5,21]. However, the corresponding methods solve the relaxation problem and require those assumptions to guarantee effectiveness. Hence, many methods [1,3,9,11,12,13,17,22,23] have been proposed to solve the low rank problem without the nuclear norm relaxation. In fact, most of them solve the rank-constrained problem

min_{X ∈ C_r} (1/2)||A(X) − y||_2^2,  (5)

where C_r = {X ∈ R^{m×n} : rank(X) ≤ r}. That is to say, the exact rank r must be known a priori. However, in some cases we only know that the matrix is low rank, but do not know the exact rank value. Hence in this paper we consider how to solve the low rank minimization problem without knowing the rank value.
In [17], the Penalty Decomposition methods (PD) were proposed for the general matrix rank minimization problems with the rank function appearing either in the constraints or in the objective function. In the latter case, the PD method solves the low rank minimization problem (1) without knowing the rank value.
To the best of our knowledge, except for the PD method, there is no other method which solves the low rank minimization problem using the rank function directly without needing to know the rank value a priori. In this paper, we propose a homotopy method (denoted HIHT) for the low rank minimization problem (1) by solving a series of regularization problems of the form

min_X (1/2)||A(X) − y||_2^2 + λ rank(X),  (6)

where λ > 0 is a regularization parameter. This method does not require a rank value a priori. Though problem (6) is not easy to solve, we can linearize the first term and iteratively solve subproblems which have closed form solutions based on the matrix hard thresholding operator (for more details refer to Section 2). For the regularization parameter λ, we adopt the homotopy technique to avoid the difficulty of choosing a suitable value of λ. Under some mild assumptions, convergence of the proposed method is proved. Experimental results show that our homotopy method is more efficient and effective than existing state-of-the-art methods. The rest of this paper is organized as follows. Section 2 presents preliminaries, in which some notations and the IHT method for low rank matrix minimization are described. Section 3 presents the homotopy method for the matrix rank minimization problem (1), which combines the homotopy technique with the IHT method; moreover, convergence analyses are given. In Section 4, some practical issues are considered. Numerical experiments are reported in Section 5, and conclusions are drawn in Section 6.

2. Preliminaries.
2.1. Notations. In this subsection, some notations are presented to simplify the presentation. Let X ∈ R^{m×n}. r = rank(X) denotes the rank of matrix X. σ(X) denotes the vector of singular values of X in decreasing order, where σ_i(X) is the i-th largest singular value. The spectral norm ||X||_2 is the largest singular value of matrix X, i.e., ||X||_2 = max_i σ_i(X) = σ_1(X), and the Frobenius norm of matrix X is ||X||_F = (Σ_{i,j} X_{ij}^2)^{1/2} = (Σ_i σ_i(X)^2)^{1/2}. A*(·) denotes the adjoint operator of A(·). The transpose of matrix X is denoted by X^T. For a square matrix X, its trace Tr(X) is defined as the sum of its diagonal entries. The inner product of two matrices X and Y is ⟨X, Y⟩ = Tr(X^T Y) = Tr(Y^T X).

2.2. Iterative hard thresholding method. In this subsection, the iterative hard thresholding (IHT) method with singular value decomposition is presented, and the convergence of the method is proved. We consider the problem

min_X φ_λ(X) := f(X) + λ rank(X),  (7)

where f(X) = (1/2)||A(X) − y||_2^2 and its gradient is Lipschitz continuous (denote the Lipschitz constant by L_f). To solve this problem, we linearize f(X) at the current solution X^k and replace the objective function by

P_{L,λ}(X^k, X) = f(X^k) + ⟨∇f(X^k), X − X^k⟩ + (L/2)||X − X^k||_F^2 + λ rank(X),  (8)

where the quadratic term is the proximal term and L > 0 is a constant. Then problem (7) can be approximated by

min_X P_{L,λ}(X^k, X).  (9)

ZHENGSHAN DONG, JIANLI CHEN AND WENXING ZHU
The above problem has a closed form solution based on the following lemmas (refer to Corollary 2.2 in [17]).
Lemma 2.1. [17] Let α > 0 and Z ∈ R^{m×n} be given, and let t = min(m, n). Suppose UΣ(Z)V^T is the singular value decomposition of Z. Then X* = U D(x*) V^T is an optimal solution of the problem

min_X (1/2)||X − Z||_F^2 + α rank(X),  (10)

where x* ∈ R^t is an optimal solution of the problem

min_{x ∈ R^t} (1/2)||x − σ(Z)||_2^2 + α||x||_0,  (11)

and D(·) transforms a vector to a diagonal matrix with the vector as its diagonal entries.
Lemma 2.2. [2,16] The solution of problem (11) is given by the hard thresholding operator

[H_α(z)]_i = z_i if |z_i| > √(2α), and 0 otherwise,  (12)

where [·]_i denotes the i-th component of the vector.
In fact, the above two lemmas yield the matrix hard thresholding operator for problem (10):

H_α(Z) = U D(H_α(σ(Z))) V^T.  (13)

For simplicity, we use H_α(·) to denote the hard thresholding operator on either a vector or a matrix, as defined by (12) and (13), respectively. Then we adopt the iterative hard thresholding method to solve problem (7). The framework is presented as Algorithm 1.
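As an illustration, the hard thresholding operators (12) and (13) can be sketched in a few lines of Python with NumPy (a minimal sketch; the function names are ours, not from the paper):

```python
import numpy as np

def hard_threshold_vec(z, alpha):
    """Vector hard thresholding (12): keep z_i only if |z_i| > sqrt(2*alpha)."""
    out = z.copy()
    out[np.abs(out) <= np.sqrt(2.0 * alpha)] = 0.0
    return out

def hard_threshold_mat(Z, alpha):
    """Matrix hard thresholding (13): H_alpha(Z) = U D(H_alpha(sigma(Z))) V^T."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(hard_threshold_vec(s, alpha)) @ Vt
```

Thresholding the singular values at √(2α) is exactly what makes the rank of the output depend on α rather than on a prescribed target rank.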
Algorithm 1: X* ← IHT(L, λ, X^0)
Input: L > L_f, λ, X^0; Output: X*;
1: initialization k ← 0;
2: repeat
3:   X^{k+1} ← H_{λ/L}(X^k − (1/L)∇f(X^k));
4:   k ← k + 1;
5: until the termination condition is satisfied;
6: X* ← X^k.

Remark 1. The iterative hard thresholding method proposed in Algorithm 1 is different from the one proposed in [8]. In [8], the hard thresholding operator is a rank-r projection onto the set C_r. More precisely, it keeps the r largest singular values and sets the others to zero. In Algorithm 1, however, the corresponding operator depends on a thresholding value, which controls the rank value. Hence we need no information about the exact rank value.
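As a sketch, one IHT pass can be written in Python as follows (our own rendering, not the authors' code). The update is the closed-form solution of subproblem (9): a gradient step on f followed by hard thresholding of the singular values at √(2λ/L); the simple stopping rule here is an assumption for illustration.

```python
import numpy as np

def iht(grad_f, L, lam, X0, max_iter=500, tol=1e-5):
    """Iterative hard thresholding for min_X f(X) + lam*rank(X), assuming L > L_f."""
    X = X0.copy()
    for _ in range(max_iter):
        Z = X - grad_f(X) / L                        # gradient step on the smooth part
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s[s <= np.sqrt(2.0 * lam / L)] = 0.0         # hard threshold the singular values
        X_new = U @ np.diag(s) @ Vt
        if np.linalg.norm(X_new - X, 'fro') <= tol:  # illustrative stopping rule
            return X_new
        X = X_new
    return X
```

For matrix completion, `grad_f(X)` is simply the residual `mask * (X - M)` on the observed entries.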
Next, we show the convergence and some properties of Algorithm 1, which will be used later.

Theorem 2.3. Let {X^k} be the sequence generated by Algorithm 1. If L > L_f, then the following statements hold: (i) {φ_λ(X^k)} is nonincreasing and convergent; (ii) {||X^k − X^{k+1}||_F} converges to 0; (iii) rank(X^k) changes only in a finite number of steps.
Proof. (i) By the iteration process of Algorithm 1, we have

P_{L,λ}(X^k, X^{k+1}) ≤ P_{L,λ}(X^k, X^k) = φ_λ(X^k)
≤ f(X^{k−1}) + ⟨∇f(X^{k−1}), X^k − X^{k−1}⟩ + (L_f/2)||X^k − X^{k−1}||_F^2 + λ rank(X^k)
≤ f(X^{k−1}) + ⟨∇f(X^{k−1}), X^k − X^{k−1}⟩ + (L/2)||X^k − X^{k−1}||_F^2 + λ rank(X^k)
= P_{L,λ}(X^{k−1}, X^k).

In the above relations, the second inequality follows from the fact that ∇f is Lipschitz continuous with Lipschitz constant L_f; the third inequality follows from the assumption that L > L_f. Thus P_{L,λ}(X^k, X^{k+1}) is nonincreasing. Moreover, if X^k ≠ X^{k−1}, then the third inequality above is strict and P_{L,λ}(X^k, X^{k+1}) < P_{L,λ}(X^{k−1}, X^k), since L > L_f. In particular, {φ_λ(X^k)} is nonincreasing and bounded below by 0, hence convergent.

(ii) By the iteration process of Algorithm 1 and L > L_f, we also have

φ_λ(X^{k+1}) ≤ P_{L,λ}(X^k, X^{k+1}) − ((L − L_f)/2)||X^{k+1} − X^k||_F^2 ≤ φ_λ(X^k) − ((L − L_f)/2)||X^{k+1} − X^k||_F^2,  (14)

where the first inequality follows from the fact that ∇f is Lipschitz continuous, and the second from the fact that X^{k+1} minimizes P_{L,λ}(X^k, ·). Thus

((L − L_f)/2)||X^{k+1} − X^k||_F^2 ≤ φ_λ(X^k) − φ_λ(X^{k+1}),

which together with the convergence of {φ_λ(X^k)} and L > L_f implies that {||X^k − X^{k+1}||_F^2} converges to 0.

(iii) Without loss of generality, suppose that λ > 0. If rank(X^k) still changes when k is large enough, then |rank(X^k) − rank(X^{k+1})| ≥ 1 for infinitely many k. However, (ii) and the continuity of f imply that, for any ε ∈ (0, λ), |f(X^k) − f(X^{k+1})| < ε when k is large enough, so that

|φ_λ(X^k) − φ_λ(X^{k+1})| ≥ λ|rank(X^k) − rank(X^{k+1})| − |f(X^k) − f(X^{k+1})| > λ − ε > 0,

which contradicts the convergence of {φ_λ(X^k)}. Hence rank(X^k) changes only in a finite number of steps.
From Theorem 2.3(ii), we can see that if X^{k+1} = X^k, then all subsequent solutions generated by Algorithm 1 equal X^k, and Algorithm 1 terminates. Hence, by (14), the sequence {φ_λ(X^k)} is strictly decreasing before termination.
3. Homotopy method. In the previous section, we converted problem (1) to the regularization problem (7), which is tackled by iteratively solving problem (9). Since the regularization parameter λ balances the rank of X against the reconstruction error, a suitable value of λ can produce a better solution. However, finding a good value of λ remains an open problem. In this section, we adopt the homotopy technique to weaken the effect of the regularization parameter, and to efficiently compute and trace the solutions of the regularized matrix rank problem along a homotopy path.
Based on the hard thresholding operator with the singular value decomposition, the core idea of our homotopy method is as follows. Set a large initial value of the regularization parameter λ and gradually decrease it by some strategy. More precisely, λ can be decreased geometrically: for an initial value λ_0 and a parameter ρ ∈ (0, 1), set λ_{k+1} = ρλ_k for k = 0, 1, 2, · · · . For every fixed value of the regularization parameter λ, Algorithm 1 is used to find an approximate optimal solution of problem (7), which is then used as the initial solution for the next iteration.
The framework of the proposed Homotopy Iterative Hard Thresholding (HIHT) algorithm is described as Algorithm 2.
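The outer homotopy loop can be sketched in Python as follows (our own rendering of the idea, not the authors' code; the fixed inner iteration budget `step` and the inner tolerance `tol_in` are assumptions mirroring the practical choices discussed later in Sections 4 and 5):

```python
import numpy as np

def hiht(grad_f, L, lam0, X0, rho=0.3, lam_t=1e-3, step=50, tol_in=1e-5):
    """Homotopy IHT: for each lambda_k run (at most `step`) IHT iterations,
    warm-start the next stage, and shrink lambda_{k+1} = rho * lambda_k."""
    X, lam = X0.copy(), lam0
    while lam > lam_t:
        for _ in range(step):                        # inner loop (lines 5-8)
            Z = X - grad_f(X) / L
            U, s, Vt = np.linalg.svd(Z, full_matrices=False)
            s[s <= np.sqrt(2.0 * lam / L)] = 0.0
            X_prev, X = X, U @ np.diag(s) @ Vt
            if np.linalg.norm(X - X_prev, 'fro') <= tol_in:
                break
        lam *= rho                                   # homotopy step on lambda
    return X
```

Starting from a large λ_0 keeps the early iterates very low rank; as λ shrinks, the threshold √(2λ/L) drops and smaller singular values are gradually admitted.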
Remark 2. According to (iii) of Theorem 2.3, the iteration between lines 5-8 of Algorithm 2 terminates in a finite number of steps. Let n_k be the number of iterations between lines 5-8 at the k-th outer iteration. Then for each λ_k, when i ≥ n_k, rank(X^{k,i}) = rank(X^{k,n_k}) and rank(X^{k,n_k}) = rank(X^{k,n_k−1}).
However, the condition used in line 8 of Algorithm 2 is not easy to verify, since rank(X^{k,i}) = rank(X^{k,i+1}) at some i does not imply that rank(X^{k,i}) will never change afterwards. We will consider some practical termination conditions in Section 4.
The convergence analysis of Algorithm 2 is presented as the following theorem. The proof is based on some properties of IHT in Section 2.
Theorem 3.1. Let {X^k} be the sequence generated by Algorithm 2. If L > L_f, then the following statements hold: (i) {φ_{λ_k}(X^k)} is strictly decreasing and convergent; indeed,

φ_{λ_k}(X^k) − φ_{λ_{k+1}}(X^{k+1}) ≥ λ_k(1 − ρ)rank(X^{k+1}) > 0;

(ii) the interleaved sequence of all inner-iteration objective values {φ_{λ_k}(X^{k,i})} is nonincreasing and convergent; (iii) the sequence {X^k} is convergent, and its limit is a feasible solution of problem (1).

Proof. (i) Firstly, for each fixed λ_k, by (iii) of Theorem 2.3, lines 5-8 in Algorithm 2 terminate in a finite number of iterations. Furthermore, by (i) of Theorem 2.3 and the choice of X^{k,i} in Algorithm 2, we obtain that

φ_{λ_k}(X^{k,n_k}) ≤ φ_{λ_k}(X^{k,i}) ≤ φ_{λ_k}(X^{k,0}) = φ_{λ_k}(X^k), 0 ≤ i ≤ n_k,  (17)

where n_k is the number of iterations between lines 5-8 at the k-th outer iteration. On the other hand, by the definition of φ_λ, one can observe that φ_{λ_k}(X^{k+1}) − φ_{λ_{k+1}}(X^{k+1}) = (λ_k − λ_{k+1})rank(X^{k+1}), which together with (17), X^{k+1} = X^{k,n_k}, and L > L_f imply that

φ_{λ_k}(X^k) − φ_{λ_{k+1}}(X^{k+1}) ≥ φ_{λ_k}(X^{k+1}) − φ_{λ_{k+1}}(X^{k+1}) = (λ_k − λ_{k+1})rank(X^{k+1}) = λ_k(1 − ρ)rank(X^{k+1}) > 0.
(ii) By the proof of Theorem 2.3, for every fixed λ_k the sequence {φ_{λ_k}(X^{k,i})} is nonincreasing; together with the convergence of {φ_{λ_k}(X^k)} and the nonincreasing values λ_k, one can observe that the sequence

φ_{λ_0}(X^{0,0}), φ_{λ_0}(X^{0,1}), · · · , φ_{λ_0}(X^{0,n_0}), φ_{λ_1}(X^{1,0}), φ_{λ_1}(X^{1,1}), · · ·

is nonincreasing and converges.
(iii) Denote by X̄ a feasible solution of problem (1). From Theorem 3.1(i), we know that the sequence {φ_{λ_k}(X^k)} is strictly decreasing, which together with the strong convexity of f(X) implies a bound valid for all X ∈ R^{m×n} and k > 0. Substituting X = X̄ into the above inequalities and using (19), we obtain a uniform bound on ||X^k||_F for all k > 0. Hence the sequence {X^k} is bounded, which together with Theorem 3.1(ii) and the finiteness of n_k implies that the sequence {X^k} is convergent. Finally, note that problem (7) can be reformulated equivalently such that 1/λ is a penalty parameter on f(X). Then by the theory of the penalty function method, any limit point of {X^{k,i}} is a feasible solution of problem (1), i.e., if X* = lim_{k→∞} X^{k,i}, then A(X*) = y.
4. Practical issues. Since the matrix rank minimization problem is usually large scale in practice, finding an upper bound on L_f is difficult. In this section, some practical issues are considered, such as the choice of L and the termination conditions.

4.1. Upper bound on L_f. One simple choice is to set L to a large constant value that upper bounds L_f. But this strategy may cause the algorithm to converge slowly or to a bad solution. An alternative is to set L by a line search: initialize L at a suitable value and then increase it until two conditions are satisfied. The first condition, based on the definition of L_f and L > L_f, is

f(X^{k+1}) ≤ f(X^k) + ⟨∇f(X^k), X^{k+1} − X^k⟩ + (L/2)||X^{k+1} − X^k||_F^2.

The second one is based on (16) with a constant ≥ L − L_f (a similar strategy can be found in [16]).
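A sketch of such a line search in Python (under our assumptions: only the first condition, the descent inequality from the definition of L_f, is checked, and `gamma` is our name for the increase factor):

```python
import numpy as np

def iht_step_linesearch(f, grad_f, X, lam, L0=1.0, gamma=2.0, max_tries=30):
    """One IHT step; increase L until f(X+) <= f(X) + <g, X+ - X> + (L/2)||X+ - X||_F^2."""
    L, g = L0, grad_f(X)
    for _ in range(max_tries):
        U, s, Vt = np.linalg.svd(X - g / L, full_matrices=False)
        s[s <= np.sqrt(2.0 * lam / L)] = 0.0
        X_new = U @ np.diag(s) @ Vt
        D = X_new - X
        if f(X_new) <= f(X) + np.sum(g * D) + 0.5 * L * np.linalg.norm(D, 'fro')**2:
            return X_new, L                          # descent condition holds
        L *= gamma                                   # L was too small; increase it
    return X_new, L
```

The accepted L can be reused as the starting value at the next iteration, so the search typically costs only a few extra function evaluations.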
For solving the matrix completion problem, L can be taken as 1 + δ (δ ≥ 0). This is because the matrix representation A of the sampling operator A is a projection matrix and satisfies ||A^T A||_2 = 1.

4.2. Termination conditions. In the previous sections, we have proved that the sequence {X^k} generated by our method is convergent. Hence, we can terminate our method when

||X^{k+1} − X^k||_F ≤ ε

for a small number ε > 0. For Algorithm 2, we stop the algorithm when λ reaches a small given value λ_t. The inner termination condition (line 8) of Algorithm 2 may not be easily and exactly detected. Since Algorithm 1 converges for each fixed λ, we may use

||X^{k,i+1} − X^{k,i}||_F ≤ ε_in  (21)

as the inner stopping criterion for a small number ε_in > 0. In addition, from the proof of Theorem 3.1, we can see that the number of inner iterations can be fixed at some constant and Theorem 3.1 still holds; more precisely, Algorithm 2 is still convergent. Hence, in our later computational experiments we stop the inner loop between lines 5-8 of Algorithm 2 when the number of inner iterations reaches step or the criterion (21) is satisfied.

5. Computational experiments.
Since most algorithms for matrix rank minimization are tested on the matrix completion problem, in this section we also test the performance of our algorithms on the matrix completion problem. In subsection 5.1, we show the influence of some parameters on our method. In subsection 5.2, we compare our method with two state-of-the-art algorithms, i.e., the FPCA method [8,18] and the PD method [17]. All experiments are performed on a personal computer with an Intel(R) Core(TM)2 Duo CPU T6600 (2.20GHz) and 3GB memory, using MATLAB (version R2012b). We generate test data as follows. The matrix M ∈ R^{m×n} with rank r is created as the product of two random Gaussian or uniform matrices, i.e., M = M_L M_R^T, where M_L ∈ R^{m×r} and M_R ∈ R^{n×r}, and their entries follow independently the standard Gaussian or uniform distribution. The subset Ω of q entries is sampled independently and uniformly at random.
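Such random instances can be generated as follows (a Python sketch; the function and variable names are ours):

```python
import numpy as np

def make_instance(m, n, r, tau, dist="gaussian", seed=0):
    """Build M = M_L @ M_R.T of rank r and a sampling mask Omega of ratio tau."""
    rng = np.random.default_rng(seed)
    if dist == "gaussian":
        M_L = rng.standard_normal((m, r))
        M_R = rng.standard_normal((n, r))
    else:                                            # uniform entries on [0, 1)
        M_L = rng.random((m, r))
        M_R = rng.random((n, r))
    M = M_L @ M_R.T
    q = int(tau * m * n)                             # number of sampled entries
    mask = np.zeros(m * n, dtype=bool)
    mask[rng.choice(m * n, size=q, replace=False)] = True
    return M, mask.reshape(m, n)
```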
Unless otherwise stated, the parameters are set as follows. We set m = n = 500 with sampling ratio τ; the sampling number is then q = τmn. The initial solution of our algorithms is set as X^0 = 0, the initial value of the upper bound on L_f is set as L = 1, and the tolerances as ε = ε_in = 10^{−5}. The initial value λ_0 of the regularization parameter λ is set as η||A*(y)||_2 for some η > 0, and ρ = 0.3 controls its decreasing speed, i.e., λ_{k+1} = 0.3λ_k for k = 0, 1, 2, · · · , until λ reaches λ_t = 0.001. All parameters of the compared methods are set to their default values unless otherwise stated.
As in [4,18], for a given approximate recovery X̂ of M, the relative error, defined as

rel.err. := ||X̂ − M||_F / ||M||_F,

is used to estimate the reconstruction error. We say that X̂ recovers a matrix M if the relative error is less than 10^{−3} and the recovered rank is exactly r. This definition is different from that in [4,18], in which only a relative error less than 10^{−3} is required. We redefine this concept because, though some methods can reconstruct a low rank matrix with small relative error, they cannot recover a matrix with the exact rank in some cases (refer to Table 4). We denote by NS the number of matrices that are successfully recovered.
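In code, this recovery criterion reads as follows (a Python sketch; the names and the singular-value tolerance used to count the recovered rank are our assumptions):

```python
import numpy as np

def rel_err(X_hat, M):
    """Relative error ||X_hat - M||_F / ||M||_F."""
    return np.linalg.norm(X_hat - M, 'fro') / np.linalg.norm(M, 'fro')

def recovers(X_hat, M, r, tol=1e-3, sv_tol=1e-6):
    """Success: relative error below tol AND recovered rank exactly r."""
    rank_hat = int(np.sum(np.linalg.svd(X_hat, compute_uv=False) > sv_tol))
    return rel_err(X_hat, M) < tol and rank_hat == r
```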

5.1. Influence of parameters.
First of all, we test the influence of some parameters on our method, such as the regularization parameter λ and the maximum number of inner iterations for each fixed λ in Algorithm 2. In this subsection, the entries of M_L ∈ R^{m×r} and M_R ∈ R^{n×r} follow independently the standard Gaussian distribution.
In the first experiment, we investigate the influence of the regularization parameter λ on Algorithm 1 and the influence of the initial value λ_0 on Algorithm 2. All parameters are set as above except that the sampling ratio is set as τ = 0.5, the rank as r = 40, and the number of inner iterations step as 50. For each setting of M, sampling number q, rank r, and the iteration parameter step, 50 random trials are made. The averaged experimental results are reported in Table 1, including NS, the average recovered rank r, the average rel.err., and the average running time (in seconds). From Table 1, we can see that Algorithm 1 can recover the low rank matrix only when a suitable value of the regularization parameter λ is chosen. In contrast, without knowing the rank value r, the HIHT method can solve the problem whenever λ_0 is not too small. Moreover, Table 1 also shows that the proposed HIHT method can exactly recover the low rank matrix without prior knowledge of the exact matrix rank. Furthermore, considering the performance of our homotopy method, we set η = 20 for the HIHT method in the following experiments unless otherwise stated.
In the second experiment, we investigate the influence of the maximum number of inner iterations step (the loop between lines 5-8) on Algorithm 2. All parameters are set as in the first experiment except the number of inner iterations. The maximum number of inner iterations step ranges from 5 to 70. For each setting of M, sampling number q, rank r, and the number of inner iterations step, 50 random trials are made. The averaged experimental results are reported in Table 2.
From Table 2, we can see that the running time changes by only a few seconds and the relative error stays on the same order of magnitude when the number of inner iterations ranges from 10 to 70. Hence, considering the robustness of our homotopy method, we set step = 50 in the following experiments.

5.2. Comparison with other methods. In this subsection, we compare the performance of our HIHT method with the state-of-the-art methods FPCA [8,18] and PD [17] for solving the matrix completion problem on random instances. Both methods can solve the problem without prior knowledge of the rank value. More precisely, the FPCA method solves the nuclear norm minimization problem (3), while the PD method solves the low rank minimization problem (1) directly. We mainly test the compared methods on instances with different ranks.
All the methods are tested on random Gaussian and uniform matrices with problem size m = n = 500 and fixed sampling ratio τ = 0.5, but with different ranks. All parameters of our method are set as stated before. We set the parameters of the other compared methods to their default values, except that tol = 10^{−4} and µ = 10^{−4} for FPCA. 50 instances are randomly generated for each rank value. The averaged experimental results are reported in Tables 3 and 4, respectively. Table 3 shows that all compared methods reconstruct the low rank Gaussian matrices well, but with markedly different running times. Among the compared methods, the FPCA method runs fast when the rank is lower than 50 but much more slowly when the rank becomes larger. The PD method is almost the slowest on instances with different ranks. This is because many subproblems must be solved and each iteration uses the exact SVD. Our HIHT method also uses the exact SVD, but it is much faster than the PD method.
From Table 4, we can see that our method is still efficient and effective. Specifically, when the rank is larger, our method outperforms all the other methods in both running time and solution quality. The FPCA method can reconstruct the low rank matrices only when the rank is smaller than 30. The PD method performs better than the FPCA method when the rank is not less than 30, and presents almost the same performance as our HIHT method but with higher CPU time.

In the last experiment, we compare the three methods on the matrix rank minimization problem with m = n = 4000, r = 200, and sampling ratio τ = 0.3. We set the parameters as before except that η = 200. We generate 50 random instances for each case. The averaged experimental results are reported in Table 5. Table 5 shows that, though the FPCA method is the best on the Gaussian matrices, it cannot recover the uniform matrices. In contrast, the HIHT method works well in both cases. It also shows that our HIHT method performs better than the PD method in both running time and solution quality on the Gaussian matrices. For uniform matrices, our method is comparable with the PD method.

6. Conclusions. A homotopy method, which combines the advantages of the homotopy technique with the matrix hard thresholding method, is proposed for solving the matrix rank minimization problem. The objective sequence generated by the method is strictly decreasing and convergent, and the method overcomes the difficulty of choosing the regularization parameter in the iterative hard thresholding method by tracing solutions of the sparse problem along a homotopy path. In addition, the homotopy method with exact SVD does not require prior knowledge of the rank value. Numerical experiments demonstrate the effectiveness of the proposed algorithm in accurately and efficiently recovering a low rank matrix. In particular, the low computational complexity per iteration makes the proposed method efficient for large scale problems.