The soft SVD is a robust matrix decomposition algorithm and a key component of matrix completion methods. However, computing the soft SVD of a large sparse matrix with conventional numerical SVD methods is often impractical due to their memory requirements. The Rank-Restricted Soft SVD (RRSS) algorithm introduced by Hastie et al. [9] addressed this issue by sequentially computing low-rank SVDs that easily fit in memory. We analyze the convergence of the standard RRSS algorithm and give examples where it does not converge. We show that convergence requires a modification of the standard algorithm and is related to the non-uniqueness of the SVD. Our modification specifies a consistent choice of sign for the left singular vectors of the low-rank SVDs in the iteration. Under these conditions, we prove linear convergence of the singular vectors using a technique motivated by alternating subspace iteration. We then derive a fixed point iteration for the evolution of the singular values and show linear convergence to the soft-thresholded singular values of the original matrix. This last step requires a perturbation result for fixed point iterations which may be of independent interest.
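To make the objects in the abstract concrete, here is a minimal numpy sketch of the soft SVD computed naively via a full SVD (the operator that RRSS approximates without forming a full SVD), together with one common deterministic sign convention for the left singular vectors; the paper's specific convention may differ, and the function and variable names are ours:

```python
import numpy as np

def soft_svd(X, lam):
    """Naive soft SVD: full SVD followed by soft thresholding of the
    singular values at level lam. Impractical for large sparse X,
    which is the problem RRSS addresses."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_thresh = np.maximum(s - lam, 0.0)  # soft thresholding
    return U @ np.diag(s_thresh) @ Vt

def fix_signs(U, Vt):
    """One common convention removing the SVD's sign ambiguity:
    flip each singular-vector pair so that the largest-magnitude
    entry in each column of U is positive. The product U S Vt is
    unchanged because the same flip is applied to both factors."""
    idx = np.argmax(np.abs(U), axis=0)
    signs = np.sign(U[idx, np.arange(U.shape[1])])
    signs[signs == 0] = 1.0
    return U * signs, Vt * signs[:, None]
```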
Figure 1. For a full-rank matrix $ X $ (left), Algorithms 1 and 3 have similar performance; however, when $ X $ is approximately low-rank (right), Algorithm 3 is significantly faster. In both examples $ X $ is $ 500\times 500 $, we set $ r = 10 $ and $ \lambda = 0.5 $, and the minimum cost is computed using $ A_{\rm opt}, B_{\rm opt} $. In the left panel the entries of $ X $ are independent standard Gaussian random variables. In the right panel $ \tilde A,\tilde B $ are $ 500\times 10 $ matrices with independent standard Gaussian entries and $ X = \tilde A\tilde B^\top + 10\tilde X $, where $ \tilde X $ is $ 500\times 500 $ with standard Gaussian entries.
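The two test matrices described in the caption can be generated in a few lines of numpy; this is a sketch of our reading of the setup (the seed is arbitrary, and the authors' random draws are of course unknown):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 500, 10

# Left panel: full-rank X with independent standard Gaussian entries.
X_full = rng.standard_normal((n, n))

# Right panel: approximately low-rank X = A~ B~^T + 10 X~.
A_tilde = rng.standard_normal((n, r))
B_tilde = rng.standard_normal((n, r))
X_tilde = rng.standard_normal((n, n))
X_lowrank = A_tilde @ B_tilde.T + 10 * X_tilde
```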
Figure 3. Comparison of the convergence of the singular vectors on the same full-rank (left) and approximately low-rank (right) examples from Figure 1. Error is measured by $ \|U_k^\top U_{(r+1:n)}\|_{\max} $, where $ U_{(r+1:n)} $ is the matrix containing the $ (r+1) $-st through $ n $-th columns of $ U $. The theoretical convergence rate $ \left(\frac{s_{r+1}}{s_r}\right)^2 $ shown is proven in Theorem 2.3. Notice that the singular vectors converge for both Algorithm 2 from [9] and our new Algorithm 3.
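The error metric in the caption can be computed directly; a sketch, where `U_k` denotes the left singular vectors at iteration $k$ and $U$ the left singular vectors of the original $X$ (function name is ours):

```python
import numpy as np

def singular_vector_error(U_k, X, r):
    """Max-norm of U_k^T U_{(r+1:n)}, where U_{(r+1:n)} holds the
    trailing singular vectors of X. Small values mean the iterates
    are nearly orthogonal to the trailing singular subspace, i.e.
    they have converged to the leading r-dimensional subspace."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U_tail = U[:, r:]  # columns r+1 through n
    return np.abs(U_k.T @ U_tail).max()
```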
[1] H. Antil, D. Chen and S. E. Field, A note on QR-based model reduction: Algorithm, software, and gravitational wave application, Computing in Science & Engineering, 20 (2018). doi: 10.1109/MCSE.2018.042781323.
[2] S. Boyd, N. Parikh and E. Chu, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Now Publishers Inc, 2011. doi: 10.1561/9781601984616.
[3] R. Bro, E. Acar and T. G. Kolda, Resolving the sign ambiguity in the singular value decomposition, Journal of Chemometrics, 22 (2008), 135-140. doi: 10.1002/cem.1122.
[4] J.-F. Cai, E. J. Candès and Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization, 20 (2010), 1956-1982. doi: 10.1137/080738970.
[5] E. J. Candès and Y. Plan, Matrix completion with noise, Proceedings of the IEEE, 98 (2010), 925-936.
[6] Z. Drmač and K. Veselić, New fast and accurate Jacobi SVD algorithm. I, SIAM Journal on Matrix Analysis and Applications, 29 (2008), 1322-1342. doi: 10.1137/050639193.
[7] Z. Drmač and K. Veselić, New fast and accurate Jacobi SVD algorithm. II, SIAM Journal on Matrix Analysis and Applications, 29 (2008), 1343-1362. doi: 10.1137/05063920X.
[8] G. H. Golub and C. F. Van Loan, Matrix Computations, 4th edition, Johns Hopkins University Press, 2013. doi: 10.56021/9781421407944.
[9] T. Hastie, R. Mazumder, J. D. Lee and R. Zadeh, Matrix completion and low-rank SVD via fast alternating least squares, The Journal of Machine Learning Research, 16 (2015), 3367-3402.
[10] M. Kurucz, A. A. Benczúr and K. Csalogány, Methods for large scale SVD with missing values, Proceedings of KDD Cup and Workshop, 12 (2007), 31-38.
[11] R. Mazumder, T. Hastie and R. Tibshirani, Spectral regularization algorithms for learning large incomplete matrices, The Journal of Machine Learning Research, 11 (2010), 2287-2322.
[12] N. Parikh and S. Boyd, Proximal algorithms, Foundations and Trends in Optimization, 1 (2014), 127-239.
[13] G. W. Stewart, Perturbation theory for the singular value decomposition, Technical Report, 1998.
Figure 2. Comparison of Algorithm 2 from [9] with our new Algorithm 3 on the same full-rank (left) and approximately low-rank (right) examples from Figure 1.