The task of predicting the missing entries of a matrix from a subset of known entries is known as matrix completion. In today's data-driven world, data completion is essential, whether it is the main goal or a pre-processing step. Structured matrix completion refers to any setting in which the data is not missing uniformly at random. In recent work, a modification of standard nuclear norm minimization (NNM) for matrix completion was developed to take into account sparsity-based structure in the missing entries. This notion of structure is motivated by many settings, including recommender systems, where the probability that an entry is observed depends on the value of the entry. We propose adjusting an Iteratively Reweighted Least Squares (IRLS) algorithm for low-rank matrix completion to take into account sparsity-based structure in the missing entries. We also present an iterative gradient-projection-based implementation of the algorithm that can handle large-scale matrices. Finally, we present an extensive set of numerical experiments on matrices of varying sizes, ranks, and levels of structure. We show that our proposed method is comparable to the adjusted NNM on small matrices, and often outperforms the standard IRLS algorithm in structured settings on matrices up to size $ 1000 \times 1000 $.
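The core idea can be conveyed with a brief, self-contained sketch: an IRLS-style completion loop in which a reweighted gradient step that promotes low rank alternates with a soft-thresholding step that promotes sparsity on the missing entries, while the observed entries are kept fixed. The Python snippet below is only an illustration of this idea, not the paper's Structured sIRLS implementation; the function name, step size, smoothing schedule, and sparsity weight are illustrative assumptions.

```python
import numpy as np

def structured_irls_sketch(M, mask, sparsity_weight=0.1, gamma=1e-2,
                           step=0.5, n_iters=200):
    """M: matrix with the observed entries filled in; mask: boolean, True = observed.
    A simplified sketch of a structured IRLS-style iteration (not the paper's code)."""
    X = np.where(mask, M, 0.0)                       # start from the observed data
    for _ in range(n_iters):
        # Low-rank-promoting reweighted gradient step:
        # grad = X (X^T X + gamma I)^(-1/2), computed via the SVD of X.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        grad = (U * (s / np.sqrt(s**2 + gamma))) @ Vt
        X = X - step * grad
        # Sparsity-promoting soft-thresholding on the missing entries only.
        Xm = X[~mask]
        X[~mask] = np.sign(Xm) * np.maximum(np.abs(Xm) - step * sparsity_weight, 0.0)
        # Data-consistency projection: observed entries stay fixed.
        X[mask] = M[mask]
        gamma = max(gamma / 2.0, 1e-12)              # shrink the smoothing parameter
    return X
```

Note that this sketch computes a full SVD at every iteration, which would be prohibitive at the $ 1000 \times 1000 $ scale; the gradient-projection-based implementation mentioned above is designed for that large-scale regime and is not reproduced here.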
Figure 1. We consider twenty $ 1000 \times 1000 $ sparse random matrices of rank 10 with average density equal to 0.66. The upper plots display (left) the average relative error for sIRLS $ \|M - \tilde X\|_F/\| M\|_F $, and (right) the average relative error for Structured sIRLS $ \|M - \hat X\|_F/\| M\|_F $. The lower plots display (left) the average ratio $ \|M - \hat X\|_F / \|M - \tilde X\|_F $, and (right) the binned average ratio, where white means the average ratio is strictly less than 1, and black otherwise.
Figure 2. We consider twenty $ 500 \times 500 $ sparse random matrices of rank 10 with average density equal to 0.66. The upper plots display (left) the average relative error for sIRLS $ \|M - \tilde X\|_F/\| M\|_F $, and (right) the average relative error for Structured sIRLS $ \|M - \hat X\|_F/\| M\|_F $. The lower plots display (left) the average ratio $ \|M - \hat X\|_F / \|M - \tilde X\|_F $, and (right) the binned average ratio, where white means the average ratio is strictly less than 1, and black otherwise.
Figure 3. We consider twenty $ 100 \times 100 $ sparse random matrices of rank 10 with average density equal to 0.66. The upper plots display (left) the average relative error for sIRLS $ \|M - \tilde X\|_F/\| M\|_F $, and (right) the average relative error for Structured sIRLS $ \|M - \hat X\|_F/\| M\|_F $. The lower plots display (left) the average ratio $ \|M - \hat X\|_F / \|M - \tilde X\|_F $, and (right) the binned average ratio, where white means the average ratio is strictly less than 1, and black otherwise.
Figure 4. We consider twenty $ 100 \times 100 $ sparse random matrices of rank 8 with density equal to 0.58, but we do not input any rank guess. The upper plots display (left) the average relative error for sIRLS $ \|M - \tilde X\|_F/\| M\|_F $, and (right) the average relative error for Structured sIRLS $ \|M - \hat X\|_F/\| M\|_F $. The lower plots display (left) the average ratio $ \|M - \hat X\|_F / \|M - \tilde X\|_F $, and (right) the average ratio when the sampling rate of non-zero entries is at least 0.70 (a zoomed-in version of part of the lower left plot).
Figure 5. We consider twenty $ 100 \times 100 $ sparse random matrices of rank 20 with average density equal to 0.88. The upper plots display (left) the average relative error for sIRLS $ \|M - \tilde X\|_F/\| M\|_F $, and (right) the average relative error for Structured sIRLS $ \|M - \hat X\|_F/\| M\|_F $. The lower plots display (left) the average ratio $ \|M - \hat X\|_F / \|M - \tilde X\|_F $, and (right) the binned average ratio, where white means the average ratio is strictly less than 1, and black otherwise. The red line separates the hard cases from those that are not: all the cases below the red line are hard.
Figure 6. We consider twenty $ 30 \times 30 $ sparse random matrices of rank 7, with average density equal to 0.53. We provide Structured sIRLS with the rank of the matrices. For each matrix and each combination of sampling rates, we optimize the regularization parameter $ \alpha \in \{ 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\} $ for Structured NNM and report the "optimal" results. The upper plots display (left) the average relative error for Structured NNM $ \|M - \bar X\|_F/\| M\|_F $, and (right) the average relative error for Structured sIRLS $ \|M - \hat X\|_F/\| M\|_F $. The lower plots display (left) the average ratio $ \|M - \hat X\|_F / \|M - \bar X\|_F $, and (right) the average ratio when the sampling rate of non-zero entries is at most 0.90 (a zoomed-in version of part of the lower left plot).
Figure 7. We consider twenty $ 100 \times 100 $ random matrices of rank 3 with noise parameter $ \epsilon = 10^{-4} $. The upper plots display (left) the average relative error for sIRLS $ \|B - \tilde X\|_F/\| B\|_F $, and (right) the average relative error for Structured sIRLS $ \|B - \hat X\|_F/\| B\|_F $. The lower plots display (left) the average ratio $ \|B - \hat X\|_F / \|B - \tilde X\|_F $, and (right) the average ratio when the sampling rate of non-zero entries is at least 0.35 (a zoomed-in version of part of the lower left plot).
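All of the figures report the same two quantities, which follow directly from the definitions in the captions. The snippet below is a minimal way to compute them; the argument names (M, X_structured, X_baseline) are placeholders for the ground-truth matrix ($ B $ in the noisy experiment of Figure 7) and the two completions being compared.

```python
import numpy as np

def relative_error(M, X):
    """Relative Frobenius error ||M - X||_F / ||M||_F (upper plots)."""
    return np.linalg.norm(M - X, 'fro') / np.linalg.norm(M, 'fro')

def error_ratio(M, X_structured, X_baseline):
    """Ratio ||M - X_structured||_F / ||M - X_baseline||_F (lower plots).
    Values strictly below 1 mean Structured sIRLS recovered the matrix more
    accurately than the baseline (sIRLS in Figures 1-5 and 7, Structured NNM
    in Figure 6)."""
    return np.linalg.norm(M - X_structured, 'fro') / np.linalg.norm(M - X_baseline, 'fro')
```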