\`x^2+y_1+z_12^34\`
Advanced Search
Article Contents
Article Contents

A variable metric and nesterov extrapolated proximal DCA with backtracking for a composite DC program

  • *Corresponding author: Yu You

    *Corresponding author: Yu You 

This work was supported by the National Natural Science Foundation of China (Grant 11601327).

Abstract / Introduction Full Text(HTML) Figure(2) / Table(2) Related Papers Cited by
  • In this paper, we consider a composite difference-of-convex (DC) program, whose objective function is the sum of a smooth convex function with Lipschitz continuous gradient, a proper closed and convex function, and a continuous concave function. This problem has many applications in machine learning and data science. The proximal DCA (pDCA), a special case of the classical difference-of-convex algorithm (DCA), as well as two Nesterov-type extrapolated DCA – ADCA (Phan et al. IJCAI:1369–1375, 2018) and pDCAe (Wen et al. Comput. Optim. Appl. 69:297–324, 2018) – can solve this problem. The algorithmic stepsizes of pDCA, pDCAe, and ADCA are fixed and determined by estimating a prior the smoothness parameter of the loss function. However, such an estimate may be hard to obtain or poor in some real-world applications. Motivated by this difficulty, we propose a variable metric and Nesterov extrapolated proximal DCA with backtracking (SPDCAe), which combines the backtracking line search procedure (not necessarily monotone) and the Nesterov's extrapolation for potential acceleration; moreover, the variable metric method is incorporated for better local approximation. Numerical simulations on sparse binary logistic regression and compressed sensing with Poisson noise demonstrate the effectiveness of our proposed method.

    Mathematics Subject Classification: Primary: 65K05, 90C26, 90C30.

    Citation:

    \begin{equation} \\ \end{equation}
  • 加载中
  • Figure 1.  Trend of the relative error for $\texttt{w8a}$ and $\texttt{CINA}$ datasets

    Figure 2.  Performance for the Poisson denoising problem

    Table 1.  Total number of iterations and CPU time (in seconds) averaged on 10 runs for $\texttt{w8a}$ and $\texttt{CINA}$ datasets. Bold values correspond to the best results for each dataset

    Dataset Algorithm $ tol: 10^{-2} $ $ tol: 10^{-4} $ $ tol: 10^{-6} $ $ tol: 10^{-8} $
    Time Iter. Time Iter. Time Iter. Time Iter.
    $\texttt{w8a}$ SPDCAe1 0.11 18 0.18 32 0.23 41 0.29 50
    PDCAe1 0.20 36 0.32 57 0.38 70 0.48 89
    pDCAe 0.84 148 2.93 587 5.84 1113 7.75 1571
    ADCA 1.03 153 2.12 356 3.08 486 4.24 739
    $\texttt{CINA}$ SPDCAe1 0.16 186 0.49 684 0.72 1059 1.01 1524
    PDCAe1 0.47 743 1.42 1992 3.56 3799 3.89 5964
    pDCAe 3.43 5781 Max Max Max Max Max Max
    ADCA 0.91 1082 1.29 1656 1.76 2467 2.28 3152
     | Show Table
    DownLoad: CSV

    Table 2.  Total number of iterations, CPU time (in seconds) averaged on 10 runs, and the relative reconstruction error. Bold values correspond to the best results

    Algorithm $ tol $ Time Iter. RRE
    SPDCAe1 0.22 28 0.810
    PDCAe1 $ 10^{-1} $ 0.57 78 0.937
    SPDCAe0 0.56 79 0.853
    PDCAe0 19.66 2910 0.937
    SPDCAe1 0.29 38 0.200
    PDCAe1 $ 10^{-2} $ 0.80 101 0.221
    SPDCAe0 1.86 298 0.252
    PDCAe0 Max Max Max
    SPDCAe1 0.31 42 0.035
    PDCAe1 $ 10^{-3} $ 0.81 104 0.044
    SPDCAe0 3.25 456 0.055
    PDCAe0 Max Max Max
    SPDCAe1 0.39 54 0.019
    PDCAe1 $ 10^{-4} $ 0.96 110 0.023
    SPDCAe0 6.91 953 0.023
    PDCAe0 Max Max Max
    SPDCAe1 0.40 55 0.018
    PDCAe1 $ 10^{-5} $ 0.97 112 0.018
    SPDCAe0 18.62 2508 0.018
    PDCAe0 Max Max Max
     | Show Table
    DownLoad: CSV
  • [1] A. D. Aleksandrov, On the surfaces representable as difference of convex functions, Sib. Èlektron. Mat. Izv., 9 (2012), 360-376. 
    [2] F. J. A. Artacho and P. T. Vuong, The boosted dc algorithm for nonsmooth functions, SIAM J. Optim., 30 (2020), 980-1006.  doi: 10.1137/18M123339X.
    [3] H. Attouch and J. Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Math. Program., 116 (2009), 5-16.  doi: 10.1007/s10107-007-0133-5.
    [4] H. AttouchJ. Bolte and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized gauss-seidel methods, Math. Program., 137 (2013), 91-129.  doi: 10.1007/s10107-011-0484-9.
    [5] A. Beck, First-Order Methods in Optimization, SIAM, Philadelphia, 2017. doi: 10.1137/1.9781611974997.ch1.
    [6] A. Beck and T. Marc, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (2009), 183-202.  doi: 10.1137/080716542.
    [7] A. Beck and Y. Vaisbourd, Globally solving the trust region subproblem using simple first-order methods, SIAM J. Optim., 28 (2018), 1951-1967.  doi: 10.1137/16M1150281.
    [8] D. Bertsekas, Convex Optimization Algorithms, Athena Scientific, 2015.
    [9] J. BolteA. Daniilidis and A. Lewis, The łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM J. Optim., 17 (2006), 1205-1223.  doi: 10.1137/050644641.
    [10] S. BonettiniF. Porta and V. Ruggiero, A variable metric forward-backward method with extrapolation, SIAM J. Optim., 38 (2016), 2588-2584.  doi: 10.1137/15M1025098.
    [11] S. Bonettini, F. Porta, V. Ruggiero and L. Zanni, Variable metric techniques for forward-backward methods in imaging, J. Comput. Appl. Math., 385 (2021), Paper No. 113192, 30 pp. doi: 10.1016/j.cam.2020.113192.
    [12] E. J. CandèsM. B. Wakin and S. P. Boyd, Enhancing sparsity by reweighted $\ell_1$ minimization, J. Fourier Anal. Appl., 14 (2008), 877-905.  doi: 10.1007/s00041-008-9045-x.
    [13] C. C. Chang and C. J. Lin, Libsvm: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, 2 (2011), 1-27.  doi: 10.1145/1961189.1961199.
    [14] J. DuchiE. Hazan and Y. Singer, Adaptive subgradient methods for online learing and stochastic optimization, J. Mach. Learn. Res., 12 (2011), 2121-2159. 
    [15] G. França, D. P. Robinson and R. Vidal, Gradient flows and proximal splitting methods: A unified view on accelerated and stochastic optimization, Phys. Rev. E, 103 (2021), Paper No. 053304, 19 pp. doi: 10.1103/physreve.103.053304.
    [16] P. GongC. ZhangZ. LuJ. Huang and J. Ye, A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems, PMLR, 28 (2013), 37-45. 
    [17] J. Y. GotohA. Takeda and K. Tono, DC formulations and algorithms for sparse optimization problems, Math. Program., 169 (2018), 141-176.  doi: 10.1007/s10107-017-1181-0.
    [18] Z. T. HarmanyR. F. Marcia and R. M. Willett, This is SPIRAL-TAP: Sparse Poisson intensity reconstruction algorithms–theory and practice, IEEE Trans. Image Process., 21 (2012), 1084-1096.  doi: 10.1109/TIP.2011.2168410.
    [19] J. B. Hiriart-Urruty, Generalized differentiability/duality and optimization for problems dealing with differences of convex functions, Convexity and Duality in Optimization, 256 (1985), 37-70.  doi: 10.1007/978-3-642-45610-7_3.
    [20] R. Horst and N. V. Thoai, DC programming: Overview, J. Optim. Theory Appl., 103 (1999), 1-43.  doi: 10.1023/A:1021765131316.
    [21] H. LantériM. RocheO. Cuevas and C. Aime, A general method to devise maximum-likelihood signal restoration multiplicative algorithms with non-negativity constraints, Signal Process., 81 (2001), 945-974. 
    [22] H. A. Le ThiM. T. Belghiti and D. T. Pham, A new efficient algorithm based on DC programming and DCA for clustering, J. Global Optim., 37 (2007), 593-608.  doi: 10.1007/s10898-006-9066-4.
    [23] H. A. Le ThiM. MoeiniD. T. Pham and J. Júdice, A DC programming approach for solving the symmetric eigenvalue complementarity problem, Comput. Optim. Appl., 51 (2012), 1097-1117.  doi: 10.1007/s10589-010-9388-5.
    [24] H. A. Le Thi and D. T. Pham, A continuous approach for large-scale constrained quadratic zero-one programming, Optimization, 45 (2001), 1-28. 
    [25] H. A. Le Thi and D. T. Pham, Large-scale molecular optimization from distance matrices by a dc optimization approach, SIAM J. Optim., 14 (2003), 77-114.  doi: 10.1137/S1052623498342794.
    [26] H. A. Le Thi and D. T. Pham, The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems, Ann. Oper. Res., 133 (2005), 23-46.  doi: 10.1007/s10479-004-5022-1.
    [27] H. A. Le Thi and D. T. Pham, DC programming and DCA: Thirty years of developments, Math. Program., Special Issue Dedicated to: DC Programming - Theory, Algorithm and Applications, 169 (2018), 5-68.  doi: 10.1007/s10107-018-1235-y.
    [28] H. A. Le ThiD. T. PhamH. M. Le and X. T. Vo, DC approximation approaches for sparse optimization, Eur. J. Oper. Res., 244 (2015), 26-46.  doi: 10.1016/j.ejor.2014.11.031.
    [29] Y. E. Nesterov, A method for solving the convex programming problem with convergence rate $\mathcal{O} (1/{k^2})$, Dokl. Akad. Nauk SSSR, 269 (1983), 543-547. 
    [30] Y. E. Nesterov, Gradient methods for minimizing composite functions, Math. Program., 140 (2013), 125-161.  doi: 10.1007/s10107-012-0629-5.
    [31] Y. S. Niu, Programmation DC et DCA en Optimisation Combinatoire et Optimisation Polynomiale via les Techniques de SDP, Ph.D thesis, INSA de Rouen, France, 2010.
    [32] Y. S. Niu and R. Glowinski, Discrete dynamical system approaches for boolean polynomial optimization, J. Sci. Comput., 92 (2022), Paper No. 46, 39 pp. doi: 10.1007/s10915-022-01882-z.
    [33] Y. S. Niu, J. Júdice, H. A. Le Thi and D. T. Pham, Solving the quadratic eigenvalue complementarity problem by DC programming, Proceedings of the Modelling, Computation and Optimization in Information Systems and Management Sciences, (2015), 203-214.
    [34] Y. S. NiuJ. JúdiceH. A. Le Thi and D. T. Pham, Improved dc programming approaches for solving the quadratic eigenvalue complementarity problem, Appl. Math. Comput., 353 (2019), 95-113.  doi: 10.1016/j.amc.2019.02.017.
    [35] Y. S. Niu and D. T. Pham, A DC programming approach for mixed-integer linear programs, Proceedings of the Modelling, Computation and Optimization in Information Systems and Management Sciences, (2008), 244-253.
    [36] Y. S. Niu and D. T. Pham, DC programming approaches for BMI and QMI feasibility problems, Proceedings of the Advanced Computational Methods for Knowledge Engineering, (2014), 37-63.
    [37] Y. S. NiuD. T. PhamH. A. Le Thi and J. Júdice, Efficient DC programming approaches for the asymmetric eigenvalue complementarity problem, Optim. Methods Softw., 28 (2013), 812-829.  doi: 10.1080/10556788.2011.645543.
    [38] Y. S. Niu, Y. J. Wang, H. A. Le Thi and D. T. Pham, High-order moment portfolio optimization via an accelerated difference-of-convex programming approach and sums-of-squares, preprint, 2019, arXiv: 1906.01509.
    [39] Y. S. Niu, Y. You and W. Z. Liu, Parallel dc cutting plane algorithms for mixed binary linear program, Proceedings of the World Congress on Global Optimization, France, (2019), 330-340.
    [40] Y. S. NiuY. YouW. XuW. DingJ. Hu and S. Yao, A difference-of-convex programming approach with parallel branch-and-bound for sentence compression via a hybrid extractive model, Optim. Lett., 15 (2021), 2407-2432. 
    [41] B. O'Donoghue and E. Candès, Adaptive restart for accelerated gradient schemes, Found. Comput. Math., 15 (2015), 715-732.  doi: 10.1007/s10208-013-9150-3.
    [42] W. de Oliveira, The abc of dc programming, Set-Valued Var. Anal., 28 (2020), 679-706.  doi: 10.1007/s11228-020-00566-w.
    [43] W. de Oliveira and M. P. Tcheou, An inertial algorithm for dc programming, Set-Valued Var. Anal., 27 (2019), 895-919.  doi: 10.1007/s11228-018-0497-0.
    [44] D. T. Pham and H. A. Le Thi, Convex analysis approach to d.c. programming: Theory, algorithms and applications, Acta Math. Vietnam., 22 (1997), 289-355. 
    [45] D. T. Pham and H. A. Le Thi, A d.c. optimization algorithm for solving the trust-region subproblem, SIAM J. Optim., 8 (1998), 476-505.  doi: 10.1137/S1052623494274313.
    [46] D. T. Pham and H. A. Le Thi, Recent advances in DC Programming and DCA, Proceedings of the Transactions on Computational Intelligence XIII, (2014), 1-37.
    [47] D. T. PhamH. A. Le ThiV. N. Pham and Y. S. Niu, DC programming approaches for discrete portfolio optimization under concave transaction costs, Optim. Lett., 10 (2016), 261-282.  doi: 10.1007/s11590-015-0931-2.
    [48] D. T. Pham and Y. S. Niu, An efficient DC programming approach for portfolio decision with higher moments, Comput. Optim. Appl., 50 (2011), 525-554.  doi: 10.1007/s10589-010-9383-x.
    [49] D. T. Pham and E. B. Souad, Algorithms for solving a class of nonconvex optimization problems. Methods of subgradients, North-Holland Math. Stud., 129 (1986), 249-271.  doi: 10.1016/S0304-0208(08)72402-2.
    [50] D. N. Phan, H. M. Le and H. A. Le Thi, Accelerated difference of convex functions algorithm and its application to sparse binary logistic regression, IJCAI, (2018), 1369-1375.
    [51] M. RaginskyR. M. WillettZ. T. Harmany and R. F. Marcia, Compressed sensing performance bounds under Poisson noise, IEEE Trans. Signal Process., 58 (2010), 3990-4002.  doi: 10.1109/TSP.2010.2049997.
    [52] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, N.J. 1970.
    [53] K. ScheinbergD. Goldfarb and X. Bai, Fast first-Order methods for composite convex optimization with backtracking, Found. Comput. Math., 14 (2014), 389-417.  doi: 10.1007/s10208-014-9189-9.
    [54] B. WenX. Chen and T. K. Pong, A proximal difference-of-convex algorithm with extrapolation, Comput. Optim. Appl., 69 (2018), 297-324.  doi: 10.1007/s10589-017-9954-1.
    [55] P. YinY. LouQ. He and J. Xin, Minimization of $\ell_{1-2}$ for compressed sensing, SIAM J. Sci. Comput., 37 (2015), 536-563.  doi: 10.1137/140952363.
    [56] Y. You and Y. S. Niu, A refined inertial DC algorithm for DC programming, Optim. Eng., 2022.
  • 加载中

Figures(2)

Tables(2)

SHARE

Article Metrics

HTML views(3229) PDF downloads(130) Cited by(0)

Access History

Other Articles By Authors

Catalog

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return