Nonconvex regularizer and homotopy-based sparse optimization: Convergent algorithms and applications

  • *Corresponding author: Wenxing Zhu
  • Non-convex optimization regularized with a sparsity function has many applications in machine learning and other fields. Non-convex relaxation of the sparsity function often leads to more effective sparse optimization algorithms. In this paper, we propose a new weighted regularizer to approximate the sparsity function, and derive a weighted thresholding operator for the sparse optimization problem with this regularizer. We then design iterative weighted thresholding algorithms and accelerate them using Nesterov's acceleration method and a non-monotone line search. Under the Kurdyka-Łojasiewicz (KL) property and appropriate smoothness and convexity assumptions, we prove that the two algorithms are convergent with rates $ O(1/k) $ and $ O(1/k^2) $ respectively, where $ k $ is the iteration counter. Moreover, we develop convergent practical homotopy algorithms that invoke the two iterative weighted thresholding algorithms as subroutines. A series of numerical experiments demonstrates that our algorithms are superior in both average recovery rate and average running time for the sparse recovery problem, and are competitive in solution quality and average running time for the logistic regression problem, compared to state-of-the-art algorithms.

    Mathematics Subject Classification: Primary: 65K05; Secondary: 90C06, 90C25, 90C26, 90C59.

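The iterative weighted thresholding algorithms summarized in the abstract follow the standard proximal-gradient template: a gradient step on the smooth loss $ f $ followed by a thresholding step induced by the regularizer. The sketch below illustrates only that generic template; plain soft-thresholding stands in for the paper's weighted thresholding operator, which is not reproduced in this excerpt, and the function names, fixed step size $ 1/L $, and stopping rule are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    """Soft-thresholding; a stand-in for the paper's weighted thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def iterative_thresholding(grad_f, L, lam, x0, max_iter=1000, tol=1e-6):
    """Generic proximal-gradient / iterative thresholding loop (illustrative only).

    grad_f : callable returning the gradient of the smooth loss f
    L      : Lipschitz constant of grad_f (the step size is 1/L)
    lam    : regularization weight
    """
    x = x0.copy()
    for _ in range(max_iter):
        # gradient step on f, then thresholding induced by the regularizer
        x_new = soft_threshold(x - grad_f(x) / L, lam / L)
        if np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x)):
            return x_new
        x = x_new
    return x
```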
  • Figure 1.  Comparisons of algorithms on compressed sensing problem with Gaussian matrix and Uniform signal

    Figure 2.  Performance metrics w.r.t. each sparsity $ K $

Figure 3.  Computational comparisons of algorithms in the noiseless case with Gaussian matrix and Gaussian signal, $ m = 512 $, $ n = 2048 $, $ K \in \{100, 110, \ldots, 240\} $

Figure 4.  Objective function values of FIWTH_BB, IWTH and IWTH_BB in a single test, where $ (m, n, K, \text{seed}) = (512, 2048, 235, 2010) $

Figure 5.  Objective function values of FIWTH_BB, IWTH and IWTH_BB in a single test, where $ (m, n, K, \text{seed}) = (512, 2048, 130, 2010) $
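Figures 1-5 test the algorithms on compressed sensing instances built from an $ m \times n $ Gaussian sensing matrix and a $ K $-sparse signal. The following is a minimal sketch of how such a noiseless test instance could be generated; the exact scaling, column normalization, signal distribution (the "Uniform" versus "Gaussian" cases), and seeding used in the experiments are not specified in this excerpt, so those details are assumptions.

```python
import numpy as np

def make_cs_instance(m=512, n=2048, K=130, seed=2010):
    """Generate a noiseless compressed-sensing instance y = A @ x_true (illustrative).

    A      : m x n Gaussian sensing matrix (column-normalized here by assumption)
    x_true : K-sparse signal with i.i.d. Gaussian nonzeros (the 'Gaussian signal' case)
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)               # column normalization is an assumption
    support = rng.choice(n, size=K, replace=False)
    x_true = np.zeros(n)
    x_true[support] = rng.standard_normal(K)
    y = A @ x_true
    return A, y, x_true
```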

Table 1.  Default parameter settings for our algorithms

    Algorithm | Parameter | Value
    Homotopy (Algorithm 5) | $ x^0 $, $ \lambda_0 $, $ s_0 $, $ L_0 $ | $ {\bf 0} $, $ \|\nabla f(x^0)\|_{\infty}\cdot\rho $, 0, 1
     | $ \varepsilon_0 $ | $ \frac{f(x^0)}{\lambda_0}\cdot 10^{-3} $
     | $ N $ | 14
     | $ \rho $ (for $ \{\lambda_j\} $) | 0.5
     | $ \gamma $ (for $ \{\varepsilon_j\} $) | 0.7
     | $ \epsilon $ | $ 10^{-6} $
    FIWT (Algorithm 4) | $ \zeta $ | 0.9
    IWT, FIWT (Algorithms 3, 4) | $ \text{mxitr} $ | 1000
     | $ \epsilon $ in (19) | 0.01
    Linesearch (Algorithm 2) | $ L_{\min} $, $ L_{\max} $, $ \gamma_{\text{inc}} $, $ \gamma_{\text{dec}} $ | $ 10^{-8} $, $ 10^{8} $, 2, 2
     | $ \eta $ | 1
    $ \text{Update}_{\varepsilon,s} $ (Algorithm 6) | $ \alpha_1 $, $ \alpha_2 $, $ \alpha_3 $ | 0.2, 0.7, 0.1
     | $ \gamma_s $ | 2
     | $ s_{\max} $ | $ \max(\lceil m/2 \rceil, \lceil 0.12\,n \rceil) $
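The defaults in Table 1 describe a geometric continuation schedule for the homotopy method: $ \lambda_0 = \|\nabla f(x^0)\|_{\infty}\cdot\rho $ with $ \rho = 0.5 $, $ \varepsilon_0 = 10^{-3} f(x^0)/\lambda_0 $, and decrease factors $ \rho $ for $ \{\lambda_j\} $ and $ \gamma = 0.7 $ for $ \{\varepsilon_j\} $ over $ N = 14 $ stages. A minimal sketch of that outer loop follows, assuming the standard geometric updates $ \lambda_{j+1} = \rho\lambda_j $ and $ \varepsilon_{j+1} = \gamma\varepsilon_j $; the inner solver is a placeholder for the IWT/FIWT subroutines (Algorithms 3 and 4), which are not reproduced here, and the warm-starting details are assumptions.

```python
import numpy as np

def homotopy(f, grad_f, x0, inner_solver, N=14, rho=0.5, gamma=0.7):
    """Outer continuation loop following the defaults in Table 1 (illustrative sketch).

    inner_solver(x, lam, eps) should return an approximate minimizer of the
    lam-regularized subproblem to tolerance eps; it stands in for Algorithms 3/4.
    """
    x = x0.copy()
    lam = rho * np.linalg.norm(grad_f(x0), ord=np.inf)   # lambda_0
    eps = 1e-3 * f(x0) / lam                              # epsilon_0
    for _ in range(N):
        x = inner_solver(x, lam, eps)   # warm-start each stage from the previous solution
        lam *= rho                      # assumed update: lambda_{j+1} = rho * lambda_j
        eps *= gamma                    # assumed update: epsilon_{j+1} = gamma * epsilon_j
    return x
```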

    Table 2.  Dimensions of real data sets

    Data set | $ m $ (number of samples) | $ n $ (number of features)
    Ionosphere | 351 | 34
    Advertisements | 2359 | 1558
    ARCENE | 300 | 20000

Table 3.  Test results on the Advertisements data

    Sparsity | Algorithm | nnz | Lavg | Error | Time (s)
    $ \leq 5 $ | GraSP | 5 | 0.32188 | 9.7075 | 0.48
    $ \leq 5 $ | GraSPd | 5 | 0.22509 | 5.9347 | 0.61
    $ \leq 5 $ | $ \text{FIWTH}^0 $ | 5 | 0.19911 | 5.5956 | 4.20
    $ \leq 5 $ | $ \text{FIWTH}^1 $ | 5 | 0.27392 | 8.5206 | 0.34
    $ \leq 20 $ | GraSP | 20 | 0.50514 | 6.5706 | 9.64
    $ \leq 20 $ | GraSPd | 20 | 0.49604 | 5.426 | 24.61
    $ \leq 20 $ | $ \text{FIWTH}^0 $ | 6 | 0.19708 | 5.4684 | 4.51
    $ \leq 20 $ | $ \text{FIWTH}^1 $ | 13 | 0.19062 | 5.426 | 0.85
    $ \leq 35 $ | GraSP | 35 | 0.54891 | 7.7151 | 46.29
    $ \leq 35 $ | GraSPd | 35 | 0.41596 | 7.6727 | 30.18
    $ \leq 35 $ | $ \text{FIWTH}^0 $ | 6 | 0.19708 | 5.4684 | 4.50
    $ \leq 35 $ | $ \text{FIWTH}^1 $ | 22 | 0.15184 | 3.9 | 0.27
    $ \leq 50 $ | GraSP | 50 | 0.44485 | 3.9423 | 120.57
    $ \leq 50 $ | GraSPd | 50 | 0.36953 | 3.8576 | 42.15
    $ \leq 50 $ | $ \text{FIWTH}^0 $ | 6 | 0.19708 | 5.4684 | 4.82
    $ \leq 50 $ | $ \text{FIWTH}^1 $ | 21 | 0.15186 | 3.9 | 0.41

    Table 4.  Test results on the Ionosphere data

    Sparsity | Algorithm | nnz | Lavg | Error | Time (s)
    $ \leq 5 $ | GraSP | 5 | 0.35677 | 11.9658 | 0.15
    $ \leq 5 $ | GraSPd | 5 | 0.34405 | 12.8205 | 0.03
    $ \leq 5 $ | $ \text{FIWTH}^0 $ | 5 | 0.30834 | 11.9658 | 0.29
    $ \leq 5 $ | $ \text{FIWTH}^1 $ | 5 | 0.32306 | 11.1111 | 0.16
    $ \leq 10 $ | GraSP | 10 | 0.29281 | 12.2507 | 0.02
    $ \leq 10 $ | GraSPd | 10 | 0.28697 | 11.9658 | 0.02
    $ \leq 10 $ | $ \text{FIWTH}^0 $ | 10 | 0.25 | 10.5413 | 0.34
    $ \leq 10 $ | $ \text{FIWTH}^1 $ | 10 | 0.26373 | 9.9715 | 0.07
    $ \leq 15 $ | GraSP | 15 | 0.32883 | 15.0997 | 0.03
    $ \leq 15 $ | GraSPd | 15 | 0.2291 | 9.4017 | 0.03
    $ \leq 15 $ | $ \text{FIWTH}^0 $ | 15 | 0.19742 | 8.547 | 0.25
    $ \leq 15 $ | $ \text{FIWTH}^1 $ | 15 | 0.19519 | 7.9772 | 0.08
    $ \leq 20 $ | GraSP | 20 | 0.26341 | 11.6809 | 0.03
    $ \leq 20 $ | GraSPd | 20 | 0.17326 | 6.2678 | 0.04
    $ \leq 20 $ | $ \text{FIWTH}^0 $ | 20 | 0.1701 | 5.9829 | 0.22
    $ \leq 20 $ | $ \text{FIWTH}^1 $ | 20 | 0.21202 | 7.6923 | 0.08

    Table 5.  Test results on the ARCENE data

    Sparsity | Algorithm | nnz | Lavg | Error | Time (s)
    $ \leq 5 $ | GraSP | 5 | 27.6539 | 44 | 0.37
    $ \leq 5 $ | GraSPd | 5 | 0.59284 | 27 | 2.19
    $ \leq 5 $ | $ \text{FIWTH}^0 $ | 5 | 0.51598 | 22 | 4.81
    $ \leq 5 $ | $ \text{FIWTH}^1 $ | 3 | 0.58747 | 25 | 1.28
    $ \leq 10 $ | GraSP | 10 | 56.9329 | 56 | 1.19
    $ \leq 10 $ | GraSPd | 10 | 0.61291 | 30 | 7.38
    $ \leq 10 $ | $ \text{FIWTH}^0 $ | 10 | 0.53049 | 26 | 5.01
    $ \leq 10 $ | $ \text{FIWTH}^1 $ | 10 | 0.52685 | 26 | 0.79
    $ \leq 15 $ | GraSP | 15 | 738.3838 | 44 | 0.79
    $ \leq 15 $ | GraSPd | 15 | 0.34262 | 16 | 9.13
    $ \leq 15 $ | $ \text{FIWTH}^0 $ | 15 | 0.12188 | 4 | 13.57
    $ \leq 15 $ | $ \text{FIWTH}^1 $ | 15 | 0.37713 | 15 | 0.99
    $ \leq 20 $ | GraSP | 20 | 586.0595 | 44 | 2.99
    $ \leq 20 $ | GraSPd | 20 | 1.263e-07 | 0 | 0.64
    $ \leq 20 $ | $ \text{FIWTH}^0 $ | 20 | 4.162e-08 | 0 | 16.12
    $ \leq 20 $ | $ \text{FIWTH}^1 $ | 19 | 0.29872 | 16 | 1.31
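For the logistic regression results in Tables 3-5, the reported columns can be computed once a sparse coefficient vector is at hand. The sketch below assumes the usual definitions: nnz counts nonzero coefficients, Lavg is taken to be the average logistic loss over the evaluation samples, and Error the misclassification rate in percent; these definitions, and any train/test split, are assumptions not stated in this excerpt.

```python
import numpy as np

def sparse_logreg_metrics(w, b, X, y):
    """Report nnz, average logistic loss, and classification error (assumed definitions).

    X : (m, n) feature matrix, y : labels in {-1, +1}, w : sparse coefficients, b : intercept.
    """
    margin = y * (X @ w + b)
    lavg = np.mean(np.logaddexp(0.0, -margin))    # average logistic loss, computed stably
    error = 100.0 * np.mean(margin <= 0)          # misclassification rate in percent
    nnz = int(np.count_nonzero(w))
    return nnz, lavg, error
```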
