SIDE-INFORMATION-INDUCED REWEIGHTED SPARSE SUBSPACE CLUSTERING

Abstract. Subspace clustering segments a collection of data drawn from a union of several subspaces into clusters, with each cluster corresponding to one subspace. The geometric information of the dataset reflects its intrinsic structure and can be utilized to assist the segmentation. In this paper, we propose side-information-induced reweighted sparse subspace clustering (SRSSC) for high-dimensional data clustering. In our method, the geometric information of the high-dimensional data points, namely the angles between pairs of points, is utilized as side-information to induce the subspace clustering. We solve the model by iterating a reweighted ℓ1-norm minimization to obtain the self-representation coefficients of the data, and then segment the data within the spectral clustering framework. We compare the performance of the proposed algorithm with several state-of-the-art algorithms on synthetic data and three widely used real datasets. Although SRSSC is the simplest of these methods, the experimental results verify that it is also the most effective.

1. Introduction. In many real problems, high-dimensional data are distributed across a union of low-dimensional subspaces of the ambient space, with each subspace corresponding to one low-dimensional structure or cluster. The low-dimensional structure of the data in each class or category can often be well represented by a low-dimensional subspace of the high-dimensional ambient space, and finding such structures or subspaces can greatly help the processing of high-dimensional data. A useful tool for detecting these subspaces is subspace clustering, which separates high-dimensional data into multiple low-dimensional subspaces according to their underlying relationships or intrinsic structures. In the past few years, the subspace clustering problem has attracted widespread attention because it is a very effective means of processing high-dimensional data. It has numerous applications in computer vision, such as motion/video/image segmentation [28,29,24], pattern recognition, face clustering [16,8], and handwritten digit recognition [19]. Figure 1 gives a simple illustration of subspace clustering, and detailed definitions of subspace clustering can be found in reference [30]. Due to their simplicity and outstanding performance, spectral clustering-based methods [22] have become extremely popular among subspace clustering methods. These methods employ the spectral clustering algorithm as their framework and can be divided into two stages: in the first stage, they construct an affinity matrix whose entries measure the similarity between pairs of points; in the second stage, the segmentation of the data is obtained by applying a spectral clustering algorithm such as normalized cuts [26]. The main challenge for this kind of method is how to define an informative affinity matrix. Local Subspace Affinity (LSA) [36] and Spectral Local Best-fit Flats (SLBF) [38] use the local information around each point to build the affinity matrix.
The main difficulty of these methods is handling points near the intersection of two subspaces, because the neighbors of such a point may belong to different subspaces. To address this issue, global spectral clustering-based methods utilize global information to construct the similarity matrix; they include Sparse Subspace Clustering (SSC) [8], Low-Rank Representation (LRR) [20], Structured Sparse Subspace Clustering (StrSSC or S³C) [18], etc. The key idea of SSC is the self-expressiveness property of the data, i.e., each data point can be efficiently represented by a linear or affine combination of the other points. Among the infinitely many possible representations of a data point, a sparse representation ideally corresponds to a combination of a few points from the same subspace. The solution of a global sparse optimization program is used as the affinity matrix in the spectral clustering framework to obtain the segmentation of the data. Similar to SSC, LRR assumes that the data have the self-expressiveness property, but it seeks the lowest-rank representation; the coefficient matrix of this representation is then used to cluster the data within the spectral clustering framework. StrSSC utilizes the fact that the affinity and the segmentation depend on each other, defining a subspace-structured measure that replaces the sparsity constraint of SSC; it is an iterative procedure that alternates between structured sparse representation and data segmentation. Many methods have since been extended from those mentioned above [33]-[25]. The effectiveness of the clustering results has been demonstrated on synthetic data and benchmark datasets such as the Extended Yale B dataset [17], the COIL 20 database [23], and the USPS dataset [14].
Although these methods achieve state-of-the-art results in many applications, their computational costs are high. Moreover, they focus on the sparse or low-rank self-representation and do not explore the geometric relationships between the data points drawn from a union of subspaces. As we know, geometric information, e.g., the angle between each pair of points in the linear or affine space, has a direct effect on data segmentation: the angle between data points in the same subspace is typically much smaller than the angle between data points in different subspaces. In this paper, we utilize this geometric information to construct the affinity matrix; i.e., the sparse coefficient matrix is weighted using the angle information as a penalty. We propose a geometric-information-induced weighted ℓ1-minimization problem to construct the affinity matrix and then efficiently obtain the segmentation of the data using spectral clustering. This method fully accounts for the influence of the geometric relationships between data points on data clustering. We call it side-information-induced reweighted sparse subspace clustering (SRSSC). Experiments conducted on synthetic data and several real datasets demonstrate that our method outperforms state-of-the-art subspace clustering methods.
The side-information-induced reweighted algorithm proposed in this paper both improves the accuracy of subspace clustering and greatly reduces the computational complexity. Compared with the state-of-the-art methods, the main advantages of our work are summarized as follows.
(1) We address the effect of the geometric information of the high-dimensional data on the data segmentation. By utilizing the side-information, which reveals the intrinsic structure of the dataset, the proposed SRSSC more efficiently and accurately segments the high-dimensional dataset.
(2) The solution of the iterative reweighted ℓ1-minimization problem is a much closer approximation to the solution of the ℓ0-minimization problem, which is NP-hard. The affinity matrix constructed by the SRSSC algorithm is more accurate and sparser, and the results obtained by spectral clustering with this affinity matrix are more accurate than those of the state-of-the-art algorithms.
(3) Experiments on synthetic data and several commonly used real datasets demonstrate that our method outperforms state-of-the-art subspace clustering methods, including LSR [21], SMR [13], LRR, SSC, StrSSC, and RSSC [35]. In particular, the accuracy of SRSSC is greatly improved when the angles between pairs of data points are greater than 10 degrees. Figure 2 illustrates the distributions of the angles between pairs of data points in the three real databases.
The outline of this paper is as follows. Section 2 briefly introduces the theory of the SSC model under the ℓ0-minimization framework. Section 3 proposes the side-information-induced reweighted sparse subspace clustering (SRSSC) and its optimization algorithm. Section 4 presents experimental results on several benchmark databases to demonstrate the superior performance of the proposed method. Finally, Section 5 concludes this paper.
2. Related work. Consider a given data matrix X ∈ R^{D×N}, where each column of X represents a point x_i ∈ R^D (i = 1, 2, ..., N) drawn from an unknown union of K ≥ 1 linear or affine subspaces {S_l}_{l=1}^K. The dimensions d_l = dim(S_l), where 0 < d_l < D, of these subspaces are unknown, and the assignment of points to subspaces is also unknown. The goal of subspace clustering is to find the parameters of each subspace and to separate the high-dimensional data into multiple low-dimensional subspaces according to their underlying relationships, so that the points in the same cluster belong to the same subspace.
2.1. Sparse subspace clustering. We assume that the data points are self-expressive, i.e., each data point in the union of subspaces can be well represented by a linear combination of the other points in the dataset. More precisely, each x_i ∈ S_l can be written as

x_i = X z_i,  z_ii = 0,   (1)

where z_i = [z_i1, z_i2, ..., z_iN]^T. The constraint z_ii = 0 eliminates the trivial solution of representing a point as a linear combination of itself. In matrix form,

X = XZ,  diag(Z) = 0,   (2)

where Z = [z_1, z_2, ..., z_N] ∈ R^{N×N}. However, the representation of X is not unique: since the number of data points is larger than the ambient dimension, i.e., N > D, the system is underdetermined, and each data point in X has infinitely many representations. As in compressive sensing [1], we seek a sparse representation of each data point, i.e., one that uses as few other data points as possible. Mathematically, this amounts to solving the ℓ0 quasi-norm minimization problem

min_{z_i} ‖z_i‖_0  s.t.  x_i = X z_i,  z_ii = 0,   (3)

where ‖z_i‖_0 is the number of nonzero components of z_i. This problem is nonconvex and NP-hard. We can rewrite the ℓ0 quasi-norm minimization (3) for all data points in matrix form as

min_Z ‖Z‖_0  s.t.  X = XZ,  diag(Z) = 0.   (4)

Since the ℓ1-norm is the convex envelope of the ℓ0 quasi-norm and ℓ1-norm minimization can be solved efficiently by convex programming [3], SSC finds approximate solutions by using the ℓ1-norm minimization instead of the ℓ0 quasi-norm minimization:

min_{z_i} ‖z_i‖_1  s.t.  x_i = X z_i,  z_ii = 0,   (5)

where ‖z_i‖_1 = Σ_{j=1}^N |z_ij|. This formulation corresponds to the matrix form

min_Z ‖Z‖_1  s.t.  X = XZ,  diag(Z) = 0.   (6)

SSC uses the optimal solution Z* of problem (6) to construct the affinity matrix (similarity matrix) A = (|Z*| + |Z*|^T)/2 and then obtains the final clustering results within the spectral clustering framework.
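As an illustration, the per-point ℓ1 self-representation program can be cast as a linear program and handed to an off-the-shelf solver. The following sketch uses SciPy; the function name is ours and this is not the authors' implementation:

```python
import numpy as np
from scipy.optimize import linprog


def ssc_l1_representation(X, i):
    """Sparse self-representation of column i of X via the l1 program
    min ||z||_1  s.t.  X z = x_i,  z_i = 0, cast as a linear program
    with variables [z (N), t (N)] and objective sum(t)."""
    D, N = X.shape
    c = np.concatenate([np.zeros(N), np.ones(N)])
    # Equality constraints: X z = x_i, plus z_i = 0.
    A_eq = np.zeros((D + 1, 2 * N))
    A_eq[:D, :N] = X
    A_eq[D, i] = 1.0
    b_eq = np.concatenate([X[:, i], [0.0]])
    # Inequalities enforcing -t <= z <= t.
    A_ub = np.block([[np.eye(N), -np.eye(N)],
                     [-np.eye(N), -np.eye(N)]])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * N),
                  A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] * N + [(0, None)] * N,
                  method="highs")
    return res.x[:N]
```

On clean data from two lines in the plane, the minimizer represents a point using only points from its own subspace, which is exactly the subspace-preserving behavior SSC relies on.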
2.2. ℓ0-sparse subspace clustering. Most subspace clustering methods, such as SSC and StrSSC, require independence or disjointness assumptions on the subspaces. In practice, these assumptions are not guaranteed to hold. Yang et al. proposed ℓ0-induced sparse subspace clustering (ℓ0-SSC) [37] and proved that, under a mild i.i.d. assumption on the data, ℓ0-SSC almost surely yields a subspace-preserving similarity matrix for arbitrary distinct underlying subspaces. ℓ0-SSC directly solves the original ℓ0 quasi-norm minimization problem (3). Allowing some tolerance for inexact representation, ℓ0-SSC instead optimizes

min_{z_i} ‖z_i‖_0  s.t.  ‖x_i − X z_i‖_2 ≤ δ,  z_ii = 0,   (7)

which is equivalent to the penalized problem

min_{z_i} ‖x_i − X z_i‖_2^2 + λ‖z_i‖_0  s.t.  z_ii = 0.   (8)

ℓ0-SSC employs proximal gradient descent to solve this problem and obtains a sub-optimal solution, which is used to build a similarity matrix for clustering. In [4], it was proposed to directly use S0 and ℓ0 as constraints for low rank and sparsity, leading to (S0/ℓ0)-LRSSC and the nonconvex optimization problem

min_Z ‖X − XZ‖_F^2 + λ(α‖Z‖_{S0} + (1 − α)‖Z‖_0),   (9)

where ‖Z‖_{S0} is the number of nonzero singular values of Z. (S0/ℓ0)-LRSSC uses a proximal average method to approximate the proximal map of the joint objective. These two methods approximate the sparsity of the data representation matrix more accurately than the ℓ1-norm does, but they offer no advantage in computational complexity.
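A minimal sketch of the proximal-gradient idea behind ℓ0-SSC, applied to one column: a gradient step on the least-squares term followed by hard thresholding, which is the proximal map of the ℓ0 quasi-norm. The function names, step size, and parameter values here are illustrative assumptions, not choices from [37]:

```python
import numpy as np


def hard_threshold(v, thr):
    # Proximal map of the l0 quasi-norm: zero out small entries.
    out = v.copy()
    out[np.abs(out) <= thr] = 0.0
    return out


def l0_ssc_column(X, i, lam=0.1, n_iter=300):
    """Proximal gradient descent on (1/2)||x_i - X z||_2^2 + lam * ||z||_0
    with the constraint z_i = 0 (a sketch of the l0-SSC column subproblem)."""
    D, N = X.shape
    x = X[:, i]
    z = np.zeros(N)
    t = 1.0 / (np.linalg.norm(X, 2) ** 2)  # step size 1/L, L = ||X||_2^2
    for _ in range(n_iter):
        grad = X.T @ (X @ z - x)
        # Hard threshold at sqrt(2 * lam * t), the l0 proximal threshold.
        z = hard_threshold(z - t * grad, np.sqrt(2 * lam * t))
        z[i] = 0.0
    return z
```

As the text notes, only a sub-optimal stationary point is guaranteed, since the objective is nonconvex.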

2.3. Reweighted sparse subspace clustering. Since the ℓ1-norm minimization problem can be efficiently solved via convex programming tools [12], ℓ1 regularization has been widely used in signal analysis, data compression, image processing, and so on. However, according to the definitions of the ℓ0 quasi-norm and the ℓ1-norm, whereas the ℓ0 quasi-norm treats all nonzero coefficients equally, the ℓ1-norm penalizes larger coefficients more heavily than smaller ones. To correct this imbalance, E. J. Candès et al. proposed a weighted ℓ1 minimization that penalizes nonzero coefficients more democratically [5].
Consider the weighted ℓ1 minimization problem

min_{z_i} ‖W z_i‖_1  s.t.  x_i = X z_i,  z_ii = 0,   (10)

where W ∈ R^{N×N} is a diagonal matrix whose diagonal elements are positive weights. When the weights are inversely proportional to the magnitudes of the entries of the true sparse solution, the weighted ℓ1 minimization problem is guaranteed to find the correct solution of the ℓ0-norm minimization problem (3). How to obtain the weight matrix without knowing the correct solution Z remains a question. In [11], FOCUSS was proposed by Gorodnitsky and Rao as an iterative method for finding sparse solutions of underdetermined systems; at each iteration, FOCUSS solves a reweighted ℓ2 minimization problem. M. Fazel et al. proposed a similar idea in [9]: they locally minimized the logarithm of the determinant as a smooth surrogate for the rank of a matrix by an iterative ℓ1-norm minimization technique, and examined the vector case as a special case. E. J. Candès et al. used the same strategy and proposed an iterative algorithm that solves a sequence of weighted ℓ1-minimization problems, in which the current solution is used to compute the weight matrix for the next iteration. This iteratively reweighted ℓ1 minimization framework largely improves on the standard ℓ1 minimization framework. Xu et al. applied this algorithm to segment high-dimensional data and proposed reweighted sparse subspace clustering (RSSC) [35]. Specifically, the iteratively reweighted ℓ1 framework is used to solve the weighted ℓ1 minimization problem

min_Z ‖W ⊙ Z‖_1  s.t.  X = XZ,  diag(Z) = 0,   (11)

where ⊙ denotes the elementwise product between two matrices. The optimal solution Z* of this problem is used to construct the affinity matrix, as in SSC, and the spectral clustering framework is employed to obtain the final clustering results. Owing to the superior performance of iteratively reweighted ℓ1 minimization, RSSC is more accurate than other state-of-the-art subspace clustering algorithms such as SSC.
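The iteratively reweighted scheme of [5] can be sketched as follows, with each weighted ℓ1 subproblem cast as a linear program. The function names, the weight-update constant ε, and the iteration count are our illustrative choices:

```python
import numpy as np
from scipy.optimize import linprog


def weighted_l1(A, b, w):
    """Solve min sum_j w_j * |z_j|  s.t.  A z = b, as a linear program
    with variables [z (n), t (n)] and constraints -t <= z <= t."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), w])
    A_ub = np.block([[np.eye(n), -np.eye(n)],
                     [-np.eye(n), -np.eye(n)]])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n),
                  A_eq=np.hstack([A, np.zeros((m, n))]), b_eq=b,
                  bounds=[(None, None)] * n + [(0, None)] * n,
                  method="highs")
    return res.x[:n]


def reweighted_l1(A, b, n_iter=4, eps=1e-3):
    """Candes-style iterative reweighting: start from uniform weights,
    then set w_j = 1 / (|z_j| + eps) after each solve."""
    w = np.ones(A.shape[1])
    for _ in range(n_iter):
        z = weighted_l1(A, b, w)
        w = 1.0 / (np.abs(z) + eps)
    return z
```

The first pass (uniform weights) is standard ℓ1 minimization; subsequent passes suppress entries that are already near zero, pushing the iterate toward an ℓ0-like solution.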
3. Side-information-induced reweighted sparse subspace clustering. In the linear space or affine space, geometric information, especially the angle between each pair of points, is important structural information for subspace clustering.
In general, the angle between two points in the same subspace is less than the angle between two points in different subspaces.
Definition 1: The angle between two data points x_i and x_j, denoted θ_ij, is defined via

cos(θ_ij) = |x_i^T x_j| / (‖x_i‖_2 ‖x_j‖_2).   (12)

According to this formula, the cosine of the angle between any two points lies between 0 and 1, i.e., cos(θ_ij) ∈ [0, 1].
3.1. Side-information-induced sparse model. To make full use of the geometric information between data points, we first define the side-information matrix D ∈ R^{N×N} with elements

d_ij = ρ · e^{−|x_i^T x_j| / (‖x_i‖_2 ‖x_j‖_2)} = ρ · e^{−cos(θ_ij)},   (13)

where ρ > 0 is a scale parameter; in this work, we set ρ = 1. The matrix D serves as a penalty on the coefficient matrix Z: it heavily penalizes the representation coefficient z_ij when the angle between points x_i and x_j is larger than the others. We then propose the side-information-induced ℓ0 minimization problem

min_Z ‖D ⊙ Z‖_0  s.t.  X = XZ,  diag(Z) = 0.   (14)

According to the definition of the ℓ0 quasi-norm, the solutions of problem (14) and problem (4) are identical, since the weights d_ij are strictly positive. As we know, ℓ0 quasi-norm minimization is NP-hard, so we relax the ℓ0 quasi-norm to the ℓ1-norm, as is the usual practice. Problem (14) is then relaxed as

min_Z ‖D ⊙ Z‖_1  s.t.  X = XZ,  diag(Z) = 0.   (15)

When all the elements of D equal one, this is a standard ℓ1 minimization problem; it is convex and can be solved efficiently via convex programming [3][12]. However, the solution of (15) is generally not the solution of (14). The reason is the imbalance noted above: whereas the ℓ0 quasi-norm treats all nonzero coefficients equally, the ℓ1-norm penalizes larger coefficients more heavily than smaller ones. E. J. Candès et al. addressed this imbalance with a weighted ℓ1 minimization that penalizes nonzero coefficients more democratically [5], whose performance exceeds that of standard ℓ1 minimization in many applications, and Xu et al. applied this idea to high-dimensional data segmentation in RSSC [35]. Based on this heuristic, we propose the side-information-induced reweighted ℓ1-norm minimization

min_Z ‖W ⊙ D ⊙ Z‖_1  s.t.  X = XZ,  diag(Z) = 0,   (16)

where the weighting matrix W = [w_1, w_2, ..., w_N] ∈ R^{N×N} is designed so that the solution of (16) better approximates the solution of (14).
We term this model side-information-induced reweighted sparse subspace clustering (SRSSC).
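The side-information matrix D above, with d_ij = ρ · e^{−cos(θ_ij)} and ρ = 1, takes only a few lines to compute. This sketch and its function name are ours:

```python
import numpy as np


def side_information(X, rho=1.0):
    """Side-information matrix D with d_ij = rho * exp(-cos(theta_ij)),
    where cos(theta_ij) = |x_i^T x_j| / (||x_i||_2 ||x_j||_2).
    X holds one data point per column."""
    norms = np.linalg.norm(X, axis=0)
    G = np.abs(X.T @ X) / np.outer(norms, norms)  # cos(theta_ij) in [0, 1]
    return rho * np.exp(-G)
```

Parallel points (cosine 1) receive the smallest penalty e^{-1}, while orthogonal points (cosine 0) receive the largest penalty 1, which is exactly the intended bias toward same-subspace representations.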

3.2. Iterative reweighted algorithm for side-information-induced ℓ1-norm minimization. Over the last few decades, the ℓ1-norm has been widely used as a sparsity-promoting functional, and the sparsity-promoting nature of ℓ1-norm minimization has been empirically confirmed. Following the heuristic of [35], we employ the same strategy to solve problem (16).
The SRSSC model can be written as

min_Z Σ_{i,j} w_ij |d_ij z_ij|  s.t.  X = XZ,  diag(Z) = 0.   (17)

Problem (17) is equal to problem (16) and can be viewed as a relaxation of the weighted ℓ0 minimization problem. Intuitively, a larger element w_ij of the weighting matrix penalizes d_ij z_ij more heavily, driving it smaller or even to zero; conversely, a smaller w_ij allows the magnitude of d_ij z_ij to be larger or nonzero. Hence, as a rule of thumb, the weight w_ij should be inversely related to the true absolute value of d_ij z_ij. However, if we do not know the true values of d_ij z_ij, how can we choose the weighting matrix W? The iteratively reweighted ℓ1 minimization framework [5] proposed by E. J. Candès et al. largely improves on the standard ℓ1 minimization framework. In the same spirit, we make full use of the log-sum heuristic to solve problem (17).
Consider the following problem:

min_{z_i} Σ_{j=1}^N log(d_ij |z_ij| + ε)  s.t.  x_i = X z_i,  z_ii = 0,   (18)

where ε > 0. Equivalently, problem (18) can be written as

min_{z_i, u_i} Σ_{j=1}^N log(u_ij + ε)  s.t.  d_ij |z_ij| ≤ u_ij,  x_i = X z_i,  z_ii = 0.   (19)

Obviously, if z_i* is a solution of problem (18), then (z_i*, u_i*) with u_ij* = d_ij |z_ij*| is a solution of (19), and vice versa. The surrogate objective function Σ_j log(u_ij + ε) is concave and smooth, so we can use an iterative linearization method to find a solution of the cardinality minimization [9]. Let z_i^(k) and u_i^(k) denote the optimization variables at the kth iteration. The first-order Taylor series expansion of the concave surrogate function about u_i^(k) is

Σ_j log(u_ij + ε) ≈ Σ_j log(u_ij^(k) + ε) + Σ_j (u_ij − u_ij^(k)) / (u_ij^(k) + ε),   (20)

since the gradient of log(u_ij + ε) with respect to u_ij is 1/(u_ij + ε). Iterative linearization of the concave function therefore gives the following heuristic for the cardinality minimization:

(z_i^(k+1), u_i^(k+1)) = argmin Σ_j u_ij / (u_ij^(k) + ε)  s.t.  d_ij |z_ij| ≤ u_ij,  x_i = X z_i,  z_ii = 0,   (21)

which is equivalent to

z_i^(k+1) = argmin Σ_j d_ij |z_ij| / (d_ij |z_ij^(k)| + ε)  s.t.  x_i = X z_i,  z_ii = 0.   (22)

Each iteration is thus the solution of a convex weighted ℓ1-norm minimization problem, so we can employ this algorithm to solve problem (16). At the beginning of the iterations, the data points x_i and x_j are used to compute the d_ij according to (13), and the initial solution is that of the standard side-information-weighted ℓ1-norm minimization (15), i.e., w_ij^(1) = 1. Based on (22), the (i, j) entry of the weighting matrix at iteration k + 1 is w_ij^(k+1) = 1/(d_ij |z_ij^(k)| + ε), which reflects that the weights are inversely proportional to the current solution.
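The weight update at the end of each iteration is a one-liner; the sketch below also names the quantity it implements. The function name and the toy values in the usage note are ours:

```python
import numpy as np


def update_weights(Z, D, eps):
    """Weight update of the iterative scheme:
    w_ij = 1 / (d_ij * |z_ij| + eps), computed elementwise."""
    return 1.0 / (D * np.abs(Z) + eps)
```

Entries of Z that are currently near zero receive weights close to 1/ε and are pushed further toward zero, while large entries are penalized only lightly, mimicking the democratic penalty of the ℓ0 quasi-norm.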
3.3. ADMM algorithm for solving SRSSC. Problem (16) can be solved in the same way as the well-known lasso problem [27]. Based on the Alternating Direction Method of Multipliers (ADMM) [2] framework, we consider the following equivalent optimization problem:

min_{A,Z,E} λ_z ‖W ⊙ D ⊙ Z‖_1 + (λ_e/2) ‖E‖_F^2  s.t.  X = XA + E,  A = Z − diag(Z),  A^T 1 = 1.   (23)

Here, the auxiliary matrix A helps to obtain faster updates, and the parameters λ_e and λ_z balance the terms in the objective function. Two Lagrange multipliers, δ ∈ R^N and ∆ ∈ R^{N×N}, augment the penalty terms, giving the augmented Lagrange function of the optimization program (23):

L(A, Z, E, δ, ∆) = λ_z ‖W ⊙ D ⊙ Z‖_1 + (λ_e/2) ‖E‖_F^2 + (μ/2) ‖X − XA − E‖_F^2 + (μ/2) ‖A^T 1 − 1‖_2^2 + (μ/2) ‖A − Z + diag(Z)‖_F^2 + δ^T (A^T 1 − 1) + tr(∆^T (A − Z + diag(Z))),   (24)

where tr(·) is the trace operator. We update A, Z, E, W, δ, and ∆ alternately while keeping the other variables fixed. We use A^(k), Z^(k), and E^(k) to denote the optimization variables at the kth iteration, and δ^(k) and ∆^(k) the Lagrange multipliers at the kth iteration.
Update for A^(k+1). By fixing Z^(k), E^(k), and the multipliers and minimizing the Lagrange function (24) with respect to A, we obtain a quadratic problem whose optimality condition is the linear system

(X^T X + I + 1 1^T) A^(k+1) = X^T (X − E^(k)) + 1 1^T + Z^(k) − (1 (δ^(k))^T + ∆^(k)) / μ.

Update for Z^(k+1). By fixing A^(k+1) and minimizing (24) with respect to Z, we obtain the closed-form solution

Z^(k+1) = J − diag(J),  J = τ_{(λ_z/μ) W ⊙ D}(A^(k+1) + ∆^(k)/μ),

where τ_η(·) is the shrinkage-thresholding operator, defined elementwise by τ_η(v) = sign(v) · max(|v| − η, 0).

Update for E^(k+1). By fixing A^(k+1) and minimizing (24) with respect to E, we obtain

E^(k+1) = (μ / (λ_e + μ)) (X − X A^(k+1)).

Update for W^(k+1). The side-information matrix D is computed once from the data via (13); by fixing Z^(k+1), we update the weights as

w_ij^(k+1) = 1 / (d_ij |z_ij^(k+1)| + ε).

Update for δ^(k+1) and ∆^(k+1). By fixing A^(k+1) and Z^(k+1), we update the Lagrange multipliers with step size μ:

δ^(k+1) = δ^(k) + μ ((A^(k+1))^T 1 − 1),
∆^(k+1) = ∆^(k) + μ (A^(k+1) − Z^(k+1) + diag(Z^(k+1))).

In brief, the ADMM algorithm for solving the optimization program (23) proceeds as shown in Algorithm 1; for more details on the ADMM algorithm, readers can refer to [2]. After solving the proposed SRSSC optimization program (23), we obtain the sparse representation coefficient matrix of all the data points. Now, we use this

Algorithm 2 Side-Information-Induced Reweighted Sparse Subspace Clustering
Input: Data matrix X ∈ R^{D×N} and the number of desired clusters K.
1: Solve problem (23) via Algorithm 1 to obtain Z*.
2: Form the similarity matrix C = (|Z*| + |Z*|^T)/2.
3: Apply spectral clustering to the similarity matrix C.
Output: The clustering result of the data points X.
representation matrix to build the symmetric nonnegative similarity matrix C = (|Z*| + |Z*|^T)/2. Then, we obtain the clustering results of the data points by applying spectral clustering to the similarity matrix C.
In summary, the SRSSC algorithm can be outlined as in Algorithm 2.
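The two final steps of Algorithm 2, symmetrizing Z* and running spectral clustering, can be sketched as follows. This is an illustrative normalized-spectral-clustering implementation in Python (not the authors' released MATLAB code), and the function name is ours:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2


def cluster_from_representation(Z, K):
    """Build C = (|Z| + |Z|^T)/2 from a self-representation matrix Z,
    then run normalized spectral clustering with K clusters."""
    C = 0.5 * (np.abs(Z) + np.abs(Z).T)
    d = C.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} C D^{-1/2}.
    L_sym = np.eye(len(C)) - d_inv_sqrt[:, None] * C * d_inv_sqrt[None, :]
    # Embedding: eigenvectors of the K smallest eigenvalues, rows normalized.
    _, U = eigh(L_sym, subset_by_index=[0, K - 1])
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    _, labels = kmeans2(U, K, minit="++", seed=0)
    return labels
```

With a perfectly block-diagonal Z, the embedding collapses each block to a single point, so k-means recovers the blocks exactly; this is the ideal case the affinity construction aims for.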
3.4. Convergence analysis. Similar to RSSC, we update the weighting matrix W by approximating the ℓ0 quasi-norm with the log-sum surrogate function. Since the log-sum function is concave, a minimizer of it can be found using an iterative linearization method. As shown in [5], the solution of the reweighted ℓ1-norm minimization with the log-sum surrogate function is a local minimizer. Because Algorithm 1 may converge only to a local minimum, the global convergence of the method proposed in this paper is not guaranteed in theory. In practice, however, if the parameters are set wisely, the convergence of our method is reliably observed. The experimental results in the next section also show the convergence of our method; Figure 3 illustrates the convergence behavior of SRSSC on the Extended Yale B dataset.

4. Experiments. In this section, we use synthetic data and three well-known real datasets, the Extended Yale B, COIL 20, and USPS datasets, to test the clustering performance of our proposed SRSSC method. We compare SRSSC with several state-of-the-art subspace clustering algorithms: LSR, SMR, LRR, SSC, StrSSC, and RSSC. The codes of these algorithms were released by their authors; all the parameters mentioned in their papers are kept the same, and we obtain the same results as reported in their papers. The performance of the subspace clustering algorithms is measured by the clustering error, defined as

clustering error = (N_error / N_total) × 100%,

where N_error denotes the number of misclassified points and N_total represents the total number of data points. A smaller clustering error means better clustering performance.
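The clustering error above requires matching predicted cluster labels to ground-truth labels under the best one-to-one permutation. A small sketch using SciPy's Hungarian solver (the function name is ours; the experiments themselves were run in MATLAB):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def clustering_error(labels_pred, labels_true):
    """Clustering error in percent under the best one-to-one
    matching between predicted and true labels (Hungarian algorithm)."""
    labels_pred = np.asarray(labels_pred)
    labels_true = np.asarray(labels_true)
    K = int(max(labels_pred.max(), labels_true.max())) + 1
    cost = np.zeros((K, K))
    for p, t in zip(labels_pred, labels_true):
        cost[p, t] -= 1.0  # maximizing agreements = minimizing negated counts
    rows, cols = linear_sum_assignment(cost)
    matched = -cost[rows, cols].sum()
    return 100.0 * (1.0 - matched / len(labels_true))
```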
All the experiments are performed in MATLAB 2014a on a PC with a 2.40 GHz Intel(R) Xeon(R) CPU and 8.00 GB RAM.

4.1. Experiments with synthetic data sets. In this experiment, we generate K subspaces {S_l}_{l=1}^K embedded in a D-dimensional ambient space, with K = 5 and D = 30. The dimensions of the subspaces are 3, 5, 8, 11, and 15, respectively. Specifically, for each subspace we obtain an orthogonal basis from the Singular Value Decomposition (SVD) [15] of a random matrix with d_i rows, and generate n_i data vectors by multiplying the basis by an N(0, 1) random coefficient matrix; in our experiment, we set n_i = 300. We then sample 100 data vectors from each subspace X_i to construct the synthetic dataset X ∈ R^{D×N}, where D = 30 and N = 500. Finally, white Gaussian noise with a signal-to-noise ratio (snr) [34] of 15 is added to the data matrix (in MATLAB, using awgn(X, snr)). We compare the performances of LSR, SMR, LRR, SSC, StrSSC, RSSC, and our proposed SRSSC algorithm. All the parameters mentioned in their papers are kept the same. We set λ = 753 for both RSSC and our SRSSC algorithm, as in SSC. In RSSC, ε_1 = 9 × 10^{-3} and ε_2 = 2.7 × 10^{-4}; in SRSSC, the parameters ε_1 and ε_2 are kept the same as in RSSC. We repeated this experiment 10 times to make the results more general. The similarity matrices obtained by the different methods are visualized in Figure 4. All of these similarity matrices have obvious block-diagonal structures, and the similarity matrix in Figure 4(d), obtained by our SRSSC algorithm, is sparser than those in Figure 4(a), (b), and (c). The accuracies of the state-of-the-art subspace clustering algorithms are reported in Table 1.
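The synthetic-data construction described above (an orthonormal basis per subspace from the SVD of a Gaussian matrix, then Gaussian coefficients) can be sketched as follows. The helper name and seed are our choices, and the noise-addition step is omitted:

```python
import numpy as np


def union_of_subspaces(dims, n_per, D=30, seed=0):
    """Sample n_per points from each of several random subspaces of R^D.
    Each subspace basis comes from the SVD of a D x d Gaussian matrix;
    points are the basis times N(0, 1) coefficients."""
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for l, d in enumerate(dims):
        U, _, _ = np.linalg.svd(rng.standard_normal((D, d)),
                                full_matrices=False)  # D x d orthonormal basis
        blocks.append(U @ rng.standard_normal((d, n_per)))  # D x n_per points
        labels += [l] * n_per
    return np.hstack(blocks), np.array(labels)
```

With dims = [3, 5, 8, 11, 15] and 100 points per subspace, this reproduces the shape of the dataset used here: X ∈ R^{30×500}, before noise is added.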

4.2. Experiments using the Extended Yale B dataset. In the face clustering problem, facial images of multiple subjects are acquired under different illumination conditions, and the images are then clustered according to their subjects. As shown in [10], under the Lambertian assumption, the facial images of a subject under a wide variety of lighting conditions lie approximately in a linear subspace of dimension 9. Therefore, the face clustering problem can be regarded as a subspace clustering problem. The Extended Yale B dataset includes the facial images of 38 human subjects, each with 64 frontal face images under various illumination conditions. To reduce the computational and storage complexity, the authors of [17] downsampled every image from 192 × 168 pixels to 48 × 42 pixels and vectorized it as a 2016-dimensional data point. Figure 5 shows some sample images from the Extended Yale B database. Adopting the same protocol as in [5], we distribute the 38 subjects into 4 groups: 1-10, 11-20, 21-30, and 31-38. We perform all choices of subjects for the first three groups; for the last group, we consider all choices of n ∈ {2, 3, 4, 5, 6, 7, 8} subjects. In the experiments with our proposed method, we set ε_1 = 0.06, ε_2 = 0.03, and λ_Z = 753. The clustering results of the state-of-the-art algorithms mentioned above are presented in Table 2.
The average computation times of the different algorithms on the Extended Yale B dataset are shown in Figure 6. LSR and SMR are faster than LRR, SSC, StrSSC, RSSC, and SRSSC; in particular, the computation time of StrSSC is significantly longer than that of LRR, SSC, RSSC, and SRSSC.

4.3. Experiment using the COIL 20 dataset. The COIL 20 database [23] contains gray-scale images of 20 objects, each of which was rotated on a turntable. Therefore, each object has 72 images of 32 × 32 pixels with 256 gray levels, and each image is vectorized as a 1024-dimensional data point. In our experiment, we consider n ∈ {2, 3, 20} objects in the database. For two objects, there are 190 cases; when n = 3, the number of different cases is 1140. Therefore, this experiment is representative of the other cases that are not considered in this paper. All the parameters are the same as those in the original papers; in our proposed SRSSC algorithm, we set ε_1 = 0.06, ε_2 = 0.003, and λ_Z = 10. The clustering results are exhibited in Table 3.

4.4. Experiment using the USPS dataset. The USPS digit database is a well-known dataset for handwritten digit recognition and is often used as a benchmark for subspace clustering algorithms. It consists of 11000 images of 10 classes corresponding to the handwritten digits 0-9. Each image is 16 × 16 pixels. Some sample images from USPS are shown in Figure 8; as can be seen, some handwritten digits are scrawled, which makes them difficult to recognize. In the experiment, we consider 2, 3, and 10 classes of the USPS database. In our proposed SRSSC algorithm, we set ε_1 = 0.07, ε_2 = 0.0003, and λ_Z = 30. The results of our experiments on the USPS dataset are listed in Table 4, which shows that SRSSC achieves the lowest clustering error. It is well known that the handwritten digit pairs 1 and 7, 2 and 3, 3 and 5, 6 and 0, and 7 and 9 are very similar, which makes it difficult to distinguish them accurately when all 10 classes are considered.

5. Conclusion. In this work, we presented side-information-induced reweighted sparse subspace clustering. The proposed algorithm makes the most of the geometric information between pairs of data points; this information is directly related to the underlying structure of the data and is therefore very useful for subspace clustering. Although our method is the simplest of the compared methods, extensive experiments demonstrate that SRSSC outperforms the existing state-of-the-art algorithms. In future research, we will utilize more information about the data to design more effective models for subspace clustering and further improve its accuracy.