# American Institute of Mathematical Sciences

June  2017, 11(3): 539-551. doi: 10.3934/ipi.2017025

## Subspace clustering by (k,k)-sparse matrix factorization

 Department of Mathematics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China

* Corresponding author

Received  April 2016 Revised  February 2017 Published  April 2017

High-dimensional data often lie in low-dimensional subspaces instead of the whole space. Subspace clustering is a problem to analyze data that are from multiple low-dimensional subspaces and cluster them into the corresponding subspaces. In this work, we propose a $(k,k)$-sparse matrix factorization method for subspace clustering. In this method, data itself is considered as the "dictionary", and each data point is represented as a linear combination of the basis of its cluster in the dictionary. Thus, the coefficient matrix is low-rank and sparse. With an appropriate permutation, it is also blockwise with each block corresponding to a cluster. With an assumption that each block is no more than $k$-by-$k$ in matrix recovery, we seek a low-rank and $(k,k)$-sparse coefficient matrix, which will be used for the construction of affinity matrix in spectral clustering. The advantage of our proposed method is that we recover a coefficient matrix with $(k,k)$-sparse and low-rank simultaneously, which is better fit for subspace clustering. Numerical results illustrate the effectiveness that it is better than SSC and LRR in real-world classification problems such as face clustering and motion segmentation.

Citation: Haixia Liu, Jian-Feng Cai, Yang Wang. Subspace clustering by (k,k)-sparse matrix factorization. Inverse Problems & Imaging, 2017, 11 (3) : 539-551. doi: 10.3934/ipi.2017025
##### References:

show all references

##### References:
The plot of RSS vs. the reduced dimension $s$
The ExtendedYale data B
ADMM Algorithm to solve Model (11)
 Data: Initialize $E=0$, $U=0$, $V=0$, $\lambda_1=0$, $\lambda_2=0$. 1 while not convergence do 2 │ Update $Z$ with Equation (14); 3 │ Update $E$ with Equation (15) and $U$, $V$ with Equation (16), respectively; 4 │ Update $\lambda_1$ with Equation (17) and $\lambda_2$ with Equation (18); 5 end
 Data: Initialize $E=0$, $U=0$, $V=0$, $\lambda_1=0$, $\lambda_2=0$. 1 while not convergence do 2 │ Update $Z$ with Equation (14); 3 │ Update $E$ with Equation (15) and $U$, $V$ with Equation (16), respectively; 4 │ Update $\lambda_1$ with Equation (17) and $\lambda_2$ with Equation (18); 5 end
The algorithm for proximity operator of $\frac{1}{2\alpha}(\|\cdot\|_{sp}^{k})^2$ with input $\mathbf{h}$
 Data: $\mathbf{h}\in\mathbb{R}^n$ and the parameter $\alpha$.     Result: $\mathbf{p}=\underset{\mathbf{g}}{\arg\min}\left\{\frac{1}{2\alpha}(\|\mathbf{g}\|^{sp}_{k})^2+\frac{1}{2}\|\mathbf{g}-\mathbf{h}\|^2_2\right\}$ 1 Let $\tilde{\mathbf{h}}=[\tilde{h}_1,\cdots,\tilde{h}_n]^T$, i.e., $\tilde{h}_i$ is the $i$-th largest element of $|\mathbf{h}|$. Let $\Pi$ be the permutation matrix such that $\tilde{\mathbf{h}}=\Pi|\mathbf{h}|$. For simplicity, define $\tilde{h}_0:=+\infty$, $\tilde{h}_{n+1}:=-\infty$ and $\gamma_{r,l}:=\sum^l_{i=k-r}\tilde{h}_i$ 2 Find $r\in\{0,\cdots,k-1\}$ and $l\in\{k,\cdots,n\}$ such that             $\frac{\tilde{h}_{k-r-1}}{\alpha+1}>\frac{\gamma_{r, l}}{l-k+(\alpha+1)(r+1)}\ge\frac{\tilde{h}_{k-r}}{\alpha+1},$             $\tilde{u}_l>\frac{\gamma_{r, l}}{l-k+(\alpha+1)(r+1)}\ge \tilde{g}_{l+1}.$ 3 Define             $q_i = \left\{ \begin{array}{ll} \frac{\alpha}{\alpha+1}\tilde{h}_i & \textrm{if}\; i=1, \cdots, k-r-1\\ \tilde{h}_i-\frac{\gamma_{r, l}}{l-k+(\alpha+1)(r+1)}\\ & \textrm{if}\; i=k-r, \cdots, l\\ 0 & \textrm{if} \; i=l+1, \cdots, n \end{array} \right.$ 4 Set $\mathbf{p}=[p_1,\ldots,p_n]^T$, where $p_i=\mathrm{sign}(h_i)(\Pi^{-1}\mathbf{q})_i$.
 Data: $\mathbf{h}\in\mathbb{R}^n$ and the parameter $\alpha$.     Result: $\mathbf{p}=\underset{\mathbf{g}}{\arg\min}\left\{\frac{1}{2\alpha}(\|\mathbf{g}\|^{sp}_{k})^2+\frac{1}{2}\|\mathbf{g}-\mathbf{h}\|^2_2\right\}$ 1 Let $\tilde{\mathbf{h}}=[\tilde{h}_1,\cdots,\tilde{h}_n]^T$, i.e., $\tilde{h}_i$ is the $i$-th largest element of $|\mathbf{h}|$. Let $\Pi$ be the permutation matrix such that $\tilde{\mathbf{h}}=\Pi|\mathbf{h}|$. For simplicity, define $\tilde{h}_0:=+\infty$, $\tilde{h}_{n+1}:=-\infty$ and $\gamma_{r,l}:=\sum^l_{i=k-r}\tilde{h}_i$ 2 Find $r\in\{0,\cdots,k-1\}$ and $l\in\{k,\cdots,n\}$ such that             $\frac{\tilde{h}_{k-r-1}}{\alpha+1}>\frac{\gamma_{r, l}}{l-k+(\alpha+1)(r+1)}\ge\frac{\tilde{h}_{k-r}}{\alpha+1},$             $\tilde{u}_l>\frac{\gamma_{r, l}}{l-k+(\alpha+1)(r+1)}\ge \tilde{g}_{l+1}.$ 3 Define             $q_i = \left\{ \begin{array}{ll} \frac{\alpha}{\alpha+1}\tilde{h}_i & \textrm{if}\; i=1, \cdots, k-r-1\\ \tilde{h}_i-\frac{\gamma_{r, l}}{l-k+(\alpha+1)(r+1)}\\ & \textrm{if}\; i=k-r, \cdots, l\\ 0 & \textrm{if} \; i=l+1, \cdots, n \end{array} \right.$ 4 Set $\mathbf{p}=[p_1,\ldots,p_n]^T$, where $p_i=\mathrm{sign}(h_i)(\Pi^{-1}\mathbf{q})_i$.
The error rate (mean % and median %) for face clustering on Extended Yale dataset B
 # Classes mean/median SSC LRR (3, 3)-SMF (4, 4)-SMF error s error s 2 mean 15.83 6.37 3.38 18 3.53 18 median 15.63 6.25 2.34 2.34 3 mean 28.13 9.57 6.19 25 6.06 25 median 28.65 8.85 5.73 5.73 5 mean 37.90 14.86 11.06 35 10.04 35 median 38.44 14.38 9.38 9.06 8 mean 44.25 23.27 23.08 50 22.51 50 median 44.82 21.29 27.54 26.06 10 mean 50.78 29.38 25.36 65 23.91 65 median 49.06 32.97 27.19 27.34
 # Classes mean/median SSC LRR (3, 3)-SMF (4, 4)-SMF error s error s 2 mean 15.83 6.37 3.38 18 3.53 18 median 15.63 6.25 2.34 2.34 3 mean 28.13 9.57 6.19 25 6.06 25 median 28.65 8.85 5.73 5.73 5 mean 37.90 14.86 11.06 35 10.04 35 median 38.44 14.38 9.38 9.06 8 mean 44.25 23.27 23.08 50 22.51 50 median 44.82 21.29 27.54 26.06 10 mean 50.78 29.38 25.36 65 23.91 65 median 49.06 32.97 27.19 27.34
The error rate (mean %/median %) for motion segmentation on Hopkins155 dataset
 SSC LRR (3, 3)-SMF (4, 4)-SMF Mean 9.28 8.43 6.61 7.16 Median 0.24 1.54 1.20 1.32
 SSC LRR (3, 3)-SMF (4, 4)-SMF Mean 9.28 8.43 6.61 7.16 Median 0.24 1.54 1.20 1.32
 [1] Dan Zhu, Rosemary A. Renaut, Hongwei Li, Tianyou Liu. Fast non-convex low-rank matrix decomposition for separation of potential field data using minimal memory. Inverse Problems & Imaging, 2021, 15 (1) : 159-183. doi: 10.3934/ipi.2020076 [2] Manuel Friedrich, Martin Kružík, Jan Valdman. Numerical approximation of von Kármán viscoelastic plates. Discrete & Continuous Dynamical Systems - S, 2021, 14 (1) : 299-319. doi: 10.3934/dcdss.2020322 [3] Mingchao Zhao, You-Wei Wen, Michael Ng, Hongwei Li. A nonlocal low rank model for poisson noise removal. Inverse Problems & Imaging, , () : -. doi: 10.3934/ipi.2021003 [4] Sihem Guerarra. Maximum and minimum ranks and inertias of the Hermitian parts of the least rank solution of the matrix equation AXB = C. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 75-86. doi: 10.3934/naco.2020016 [5] Zongyuan Li, Weinan Wang. Norm inflation for the Boussinesq system. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2020353 [6] Xuemei Chen, Julia Dobrosotskaya. Inpainting via sparse recovery with directional constraints. Mathematical Foundations of Computing, 2020, 3 (4) : 229-247. doi: 10.3934/mfc.2020025 [7] Peizhao Yu, Guoshan Zhang, Yi Zhang. Decoupling of cubic polynomial matrix systems. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 13-26. doi: 10.3934/naco.2020012 [8] Shengxin Zhu, Tongxiang Gu, Xingping Liu. AIMS: Average information matrix splitting. Mathematical Foundations of Computing, 2020, 3 (4) : 301-308. doi: 10.3934/mfc.2020012 [9] Sujit Kumar Samanta, Rakesh Nandi. Analysis of $GI^{[X]}/D$-$MSP/1/\infty$ queue using $RG$-factorization. Journal of Industrial & Management Optimization, 2021, 17 (2) : 549-573. doi: 10.3934/jimo.2019123 [10] Huiying Fan, Tao Ma. Parabolic equations involving Laguerre operators and weighted mixed-norm estimates. Communications on Pure & Applied Analysis, 2020, 19 (12) : 5487-5508. doi: 10.3934/cpaa.2020249 [11] Ferenc Weisz. Dual spaces of mixed-norm martingale hardy spaces. Communications on Pure & Applied Analysis, , () : -. doi: 10.3934/cpaa.2020285 [12] Meng Ding, Ting-Zhu Huang, Xi-Le Zhao, Michael K. Ng, Tian-Hui Ma. Tensor train rank minimization with nonlocal self-similarity for tensor completion. Inverse Problems & Imaging, , () : -. doi: 10.3934/ipi.2021001 [13] Ningyu Sha, Lei Shi, Ming Yan. Fast algorithms for robust principal component analysis with an upper bound on the rank. Inverse Problems & Imaging, 2021, 15 (1) : 109-128. doi: 10.3934/ipi.2020067 [14] Sugata Gangopadhyay, Constanza Riera, Pantelimon Stănică. Gowers $U_2$ norm as a measure of nonlinearity for Boolean functions and their generalizations. Advances in Mathematics of Communications, 2021, 15 (2) : 241-256. doi: 10.3934/amc.2020056 [15] Petr Čoupek, María J. Garrido-Atienza. Bilinear equations in Hilbert space driven by paths of low regularity. Discrete & Continuous Dynamical Systems - B, 2021, 26 (1) : 121-154. doi: 10.3934/dcdsb.2020230 [16] Parikshit Upadhyaya, Elias Jarlebring, Emanuel H. Rubensson. A density matrix approach to the convergence of the self-consistent field iteration. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 99-115. doi: 10.3934/naco.2020018 [17] Russell Ricks. The unique measure of maximal entropy for a compact rank one locally CAT(0) space. Discrete & Continuous Dynamical Systems - A, 2021, 41 (2) : 507-523. doi: 10.3934/dcds.2020266 [18] Kihoon Seong. Low regularity a priori estimates for the fourth order cubic nonlinear Schrödinger equation. Communications on Pure & Applied Analysis, 2020, 19 (12) : 5437-5473. doi: 10.3934/cpaa.2020247 [19] Maoding Zhen, Binlin Zhang, Vicenţiu D. Rădulescu. Normalized solutions for nonlinear coupled fractional systems: Low and high perturbations in the attractive case. Discrete & Continuous Dynamical Systems - A, 2020  doi: 10.3934/dcds.2020379 [20] Youming Guo, Tingting Li. Optimal control strategies for an online game addiction model with low and high risk exposure. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2020347

2019 Impact Factor: 1.373