
Alternating direction method of multipliers with variable metric indefinite proximal terms for convex optimization

Corresponding author: Yan Gu

This paper is dedicated in honor of and to the memory of Professor Wenyu Sun.

This work was supported by JSPS KAKENHI Grant Number 17K00032.

Abstract

This paper studies a proximal alternating direction method of multipliers (ADMM) with variable metric indefinite proximal terms for linearly constrained convex optimization problems. The proximal ADMM plays an important role in many application areas, since its subproblems are easy to solve. Recently, it was reported that a proximal ADMM with a certain fixed indefinite proximal term is faster than one with a positive semidefinite term, while retaining the global convergence property. On the other hand, Gu and Yamashita studied a variable metric semidefinite proximal ADMM whose proximal term is generated by the BFGS update; they reported that a slightly indefinite matrix also makes the algorithm work well in their numerical experiments. Motivated by this observation, we consider a variable metric indefinite proximal ADMM and give sufficient conditions on the proximal terms for global convergence. Moreover, based on the BFGS update, we propose a new indefinite proximal term that satisfies these conditions. Experiments on several datasets demonstrate that the proposed variable metric indefinite proximal ADMM outperforms most of the existing proximal ADMMs used for comparison.

    Mathematics Subject Classification: Primary: 90C25, 90C53, 65K05.
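
For reference, the problem class discussed in the abstract can be written in the standard two-block form used throughout the cited ADMM literature (a schematic restatement, with notation chosen to match Table 1 below, not a quote from this paper):

$ \min_{x,\, y} \; f(x) + g(y) \quad \text{subject to} \quad Ax + By = b, $

where $ f $ and $ g $ are closed proper convex functions, with augmented Lagrangian

$ \mathcal{L}_\beta(x, y, \lambda) = f(x) + g(y) - \lambda^\top (Ax + By - b) + \frac{\beta}{2} \| Ax + By - b \|^2 . $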

Figure 1.  Evolution of the objective function values with respect to iterations on real datasets

    Figure 2.  Evolution of the objective function values with respect to iterations on synthetic datasets

Figure 3.  Comparison of the CPU time

Table 1.  Comparisons among existing methods

 method                               | $ S, T $                                 | $ \alpha $             | $ \tau $
 Classical ADMM [9,11]                | $ S = T = 0 $                            | $ (0, (1+\sqrt 5)/2) $ | -
 Semi-proximal ADMM [7]               | fixed positive semidefinite matrices     | $ (0, (1+\sqrt 5)/2) $ | $ (1, +\infty) $
 Variable semi-proximal ADMM [13,14]  | variable positive semidefinite matrices  | 1                      | $ (1, +\infty) $
 Indefinite proximal ADMM [20]        | fixed indefinite matrices                | 1                      | $ (0.75, 1) $
 Proposed method                      | variable indefinite matrices             | 1                      | $ (0.75, 1) $
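
All of the methods in Table 1 can be read as instances of one proximal ADMM template, in which proximal terms weighted by $ S $ and $ T $ are attached to the two subproblems and a step size $ \alpha $ is applied to the multiplier update. The following schematic is reconstructed from the cited references and is not quoted from this paper:

$ \begin{aligned} x^{k+1} &\in \operatorname{arg\,min}_x \; \mathcal{L}_\beta(x, y^k, \lambda^k) + \tfrac{1}{2} \| x - x^k \|_S^2, \\ y^{k+1} &\in \operatorname{arg\,min}_y \; \mathcal{L}_\beta(x^{k+1}, y, \lambda^k) + \tfrac{1}{2} \| y - y^k \|_T^2, \\ \lambda^{k+1} &= \lambda^k - \alpha \beta \, (Ax^{k+1} + By^{k+1} - b), \end{aligned} $

where $ \| z \|_Q^2 = z^\top Q z $. For the variable metric methods $ T $ changes with $ k $, and $ \tau $ parametrizes how indefinite the proximal term may be.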
Construction of $ T_k $ via the BFGS update

1. Let $ \delta \in (0, \infty) $, $ \tau \in (\frac{3}{4}, 1) $ and $ B_0 \succeq \tau M $;
2. Let $ \{c_k\} $ be a sequence such that $ c_k \in [0, 1] $ and $ \sum_{k=0}^{\infty} c_k < \infty $;
3. If $ s_k \neq 0 $, then set $ \tilde{l}_k = M^{\delta} s_k $ and update $ B_{k+1} $ via
   $ B_{k+1} = B_{k} + c_k \left( \dfrac{\tilde{l}_{k} \tilde{l}_{k}^\top}{\tilde{l}_{k}^\top s_{k}} - \dfrac{B_{k} s_{k} s_{k}^\top B_{k}^\top}{s_{k}^\top B_{k} s_{k}} \right) ; $
4. Otherwise, set $ B_{k+1} = B_k $;
5. Construct $ T_{k+1} $ as $ T_{k+1} = B_{k+1} - M $.
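
The construction above is straightforward to implement. Below is a minimal NumPy/SciPy sketch of one step; the function name construct_T_next, the zero-test tolerance tol, and the use of scipy.linalg.fractional_matrix_power to form $ M^{\delta} $ (assuming $ M $ is symmetric positive semidefinite, so the denominators below are nonzero for $ s_k $ outside its null space) are our illustrative choices, not the paper's specification.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def construct_T_next(B_k, M, s_k, c_k, delta, tol=1e-12):
    """One step of the BFGS-based construction above: returns (T_{k+1}, B_{k+1})."""
    if np.linalg.norm(s_k) <= tol:
        # Step 4: s_k = 0, so keep B_{k+1} = B_k.
        B_next = B_k
    else:
        # Step 3: l~_k = M^delta s_k, then a c_k-damped BFGS-type update of B_k.
        l = fractional_matrix_power(M, delta) @ s_k
        Bs = B_k @ s_k
        B_next = B_k + c_k * (np.outer(l, l) / (l @ s_k)
                              - np.outer(Bs, Bs) / (s_k @ Bs))
    # Step 5: the variable metric proximal term T_{k+1} = B_{k+1} - M.
    return B_next - M, B_next
```

For example, with $ B_0 = I $ (valid whenever $ I \succeq \tau M $) and a summable damping sequence such as $ c_k = 0.9^k $, the metric adapts in early iterations and then settles, consistent with the summability condition in step 2; taking $ c_k \equiv 0 $ freezes the metric at the fixed proximal term $ T = B_0 - M $.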

    Table 2.  Results for real datasets

Boston-house ($ m = 506 $, $ n = 13 $)

$ \epsilon^{\mathrm{rel}} = 10^{-2} $, $ \epsilon^{\mathrm{abs}} = 10^{-3} $:
 $ \beta $ | ADMM        | SPADM       | IPADM       | IADMB       | IADML
           | Iter.  T-A  | Iter.  T-A  | Iter.  T-A  | Iter.  T-A  | Iter.  T-A
 100       | 22.0 0.0010 | 70.0 0.0015 | 48.0 0.0013 | 26.0 0.0015 | 25.0 0.0018
 300       |  9.0 0.0010 | 37.0 0.0013 | 32.0 0.0012 | 20.0 0.0015 | 21.0 0.0016
 500       | 11.0 0.0010 | 29.0 0.0012 | 26.0 0.0011 | 20.0 0.0012 | 19.0 0.0014

$ \epsilon^{\mathrm{rel}} = 10^{-3} $, $ \epsilon^{\mathrm{abs}} = 10^{-4} $:
 $ \beta $ | ADMM        | SPADM       | IPADM       | IADMB       | IADML
           | Iter.  T-A  | Iter.  T-A  | Iter.  T-A  | Iter.  T-A  | Iter.  T-A
 300       | 16.0 0.0010 | 69.0 0.0014 | 60.0 0.0013 | 29.0 0.0014 | 34.0 0.0017
 500       | 18.0 0.0011 | 75.0 0.0015 | 65.0 0.0014 | 28.0 0.0015 | 30.0 0.0016
 700       | 25.0 0.0010 | 86.0 0.0017 | 75.0 0.0015 | 36.0 0.0017 | 36.0 0.0017

California-housing ($ m = 20640 $, $ n = 8 $)

$ \epsilon^{\mathrm{rel}} = 10^{-2} $, $ \epsilon^{\mathrm{abs}} = 10^{-3} $:
 $ \beta $ | ADMM        | SPADM       | IPADM       | IADMB       | IADML
           | Iter.  T-A  | Iter.  T-A  | Iter.  T-A  | Iter.  T-A  | Iter.  T-A
 1000      | 36.0 0.0014 | 89.0 0.0069 | 77.0 0.0064 | 38.0 0.0022 | 38.0 0.0054
 1500      | 24.0 0.0012 | 73.0 0.0057 | 64.0 0.0056 | 26.0 0.0022 | 27.0 0.0050
 1700      | 21.0 0.0013 | 67.0 0.0056 | 59.0 0.0049 | 24.0 0.0018 | 24.0 0.0041

$ \epsilon^{\mathrm{rel}} = 10^{-3} $, $ \epsilon^{\mathrm{abs}} = 10^{-4} $:
 $ \beta $ | ADMM        | SPADM       | IPADM       | IADMB       | IADML
           | Iter.  T-A  | Iter.  T-A  | Iter.  T-A  | Iter.  T-A  | Iter.  T-A
 5000      | 23.0 0.0013 | 55.0 0.0046 | 49.0 0.0044 | 28.0 0.0020 | 28.0 0.0048
 5300      | 22.0 0.0014 | 53.0 0.0045 | 47.0 0.0043 | 27.0 0.0023 | 27.0 0.0042
 5500      | 22.0 0.0012 | 52.0 0.0041 | 47.0 0.0042 | 26.0 0.0018 | 26.0 0.0036
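
The parameters $ \epsilon^{\mathrm{rel}} $ and $ \epsilon^{\mathrm{abs}} $ above are the relative and absolute tolerances of the standard ADMM stopping test of Boyd et al. [1]. Assuming that test is used here (our reading, not a quote from this paper), the iteration stops once the primal and dual residuals satisfy

$ \| Ax^k + By^k - b \|_2 \le \sqrt{p}\, \epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}} \max\{ \| Ax^k \|_2, \| By^k \|_2, \| b \|_2 \}, \qquad \| s^k \|_2 \le \sqrt{n}\, \epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}} \| A^\top \lambda^k \|_2, $

where $ s^k $ denotes the dual residual and $ p $, $ n $ are the dimensions of $ b $ and $ x $.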

Table 3.  Results for synthetic datasets ($ \epsilon^{\mathrm{rel}} = 10^{-2} $, $ \epsilon^{\mathrm{abs}} = 10^{-3} $)

$ m = 2664 $, $ n = 2778 $ (T-LU $ = 10.315 $, T-ME $ = 1.980 $):
 $ \beta $ | ADMM        | SPADM        | IPADM        | PADML       | IADML
           | Iter.  T-A  | Iter.  T-A   | Iter.  T-A   | Iter.  T-A  | Iter.  T-A
 300       | 30.7 0.5069 |  75.2 0.5898 |  66.9 0.5286 | 36.3 0.2987 | 34.1 0.2784
 500       | 17.9 0.3010 |  49.9 0.4102 |  42.5 0.3555 | 26.5 0.2260 | 24.5 0.2034
 800       | 11.5 0.1888 |  36.0 0.2854 |  31.7 0.2539 | 20.1 0.1695 | 18.3 0.1507
 1000      | 10.3 0.1716 |  31.6 0.2537 |  27.3 0.2232 | 18.4 0.1578 | 17.1 0.1437

$ m = 3580 $, $ n = 4620 $ (T-LU $ = 28.516 $, T-ME $ = 5.378 $):
 $ \beta $ | ADMM        | SPADM        | IPADM        | PADML       | IADML
           | Iter.  T-A  | Iter.  T-A   | Iter.  T-A   | Iter.  T-A  | Iter.  T-A
 300       | 44.9 1.4417 | 111.5 1.9481 |  95.9 1.6807 | 50.2 0.9149 | 47.8 0.8656
 500       | 27.9 0.9129 |  73.1 1.2882 |  62.1 1.0911 | 36.0 0.6608 | 33.7 0.6198
 800       | 17.9 0.5888 |  53.2 0.9468 |  42.5 0.7578 | 27.9 0.5178 | 25.4 0.4786
 1000      | 14.3 0.4787 |  41.1 0.7307 |  35.1 0.6266 | 24.6 0.4576 | 22.9 0.4221

$ m = 7498 $, $ n = 5633 $ (T-LU $ = 118.826 $, T-ME $ = 16.141 $):
 $ \beta $ | ADMM        | SPADM        | IPADM        | PADML       | IADML
           | Iter.  T-A  | Iter.  T-A   | Iter.  T-A   | Iter.  T-A  | Iter.  T-A
 500       | 46.8 1.7680 |  86.9 4.0866 |  67.2 3.1514 | 47.2 2.2569 | 47.0 2.2453
 1000      | 24.3 0.9105 |  51.0 2.3636 |  45.0 2.0981 | 27.6 1.3902 | 26.1 1.2590
 1500      | 16.1 0.6303 |  40.9 1.9333 |  36.0 1.6943 | 20.7 0.9949 | 19.5 0.9493
 2000      | 12.1 0.4829 |  28.9 1.3641 |  25.9 1.2279 | 18.8 0.9107 | 17.0 0.8338

$ m = 4774 $, $ n = 10368 $ (T-LU $ = 158.214 $, T-ME $ = 24.249 $):
 $ \beta $ | ADMM        | SPADM        | IPADM        | PADML       | IADML
           | Iter.  T-A  | Iter.  T-A   | Iter.  T-A   | Iter.  T-A  | Iter.  T-A
 500       | 48.0 4.7976 | 134.3 8.7370 | 116.7 7.6785 | 61.6 4.1815 | 59.7 3.9039
 1000      | 23.4 2.3597 |  80.7 5.3301 |  70.8 5.7369 | 41.6 3.6014 | 38.1 2.6846
 1500      | 17.4 1.4561 |  55.7 3.0283 |  49.0 2.6745 | 31.4 1.7645 | 29.1 1.6197
 2000      | 13.7 1.1631 |  47.0 2.6077 |  41.2 2.2898 | 29.1 1.6362 | 26.3 1.4693
References

[1] S. Boyd, N. Parikh, E. Chu, B. Peleato and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, 3 (2011), 1-122. doi: 10.1561/2200000016.
[2] A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, Journal of Mathematical Imaging and Vision, 40 (2011), 120-145. doi: 10.1007/s10851-010-0251-1.
[3] W. Deng and W. Yin, On the global and linear convergence of the generalized alternating direction method of multipliers, Journal of Scientific Computing, 66 (2016), 889-916. doi: 10.1007/s10915-015-0048-x.
[4] J. Eckstein, Some saddle-function splitting methods for convex programming, Optimization Methods and Software, 4 (1994), 75-83. doi: 10.1080/10556789408805578.
[5] J. Eckstein and D. P. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming, 55 (1992), 293-318. doi: 10.1007/BF01581204.
[6] J. Eckstein and W. Yao, Relative-error approximate versions of Douglas-Rachford splitting and special cases of the ADMM, Mathematical Programming, 170 (2018), 417-444. doi: 10.1007/s10107-017-1160-5.
[7] M. Fazel, T. K. Pong, D. Sun and P. Tseng, Hankel matrix rank minimization with applications to system identification and realization, SIAM Journal on Matrix Analysis and Applications, 34 (2013), 946-977. doi: 10.1137/110853996.
[8] M. Fortin and R. Glowinski, Chapter III: On decomposition-coordination methods using an augmented Lagrangian, in Studies in Mathematics and Its Applications, vol. 15, Elsevier, (1983), 97-146. doi: 10.1016/S0168-2024(08)70028-6.
[9] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximation, Computers and Mathematics with Applications, 2 (1976), 17-40. doi: 10.1016/0898-1221(76)90003-1.
[10] B. Gao and F. Ma, Symmetric alternating direction method with indefinite proximal regularization for linearly constrained convex optimization, Journal of Optimization Theory and Applications, 176 (2018), 178-204. doi: 10.1007/s10957-017-1207-z.
[11] R. Glowinski and A. Marroco, Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires, ESAIM: Mathematical Modelling and Numerical Analysis, 9 (1975), 41-76. doi: 10.1051/m2an/197509R200411.
[12] M. L. N. Gonçalves, M. M. Alves and J. G. Melo, Pointwise and ergodic convergence rates of a variable metric proximal alternating direction method of multipliers, Journal of Optimization Theory and Applications, 177 (2018), 448-478. doi: 10.1007/s10957-018-1232-6.
[13] Y. Gu and N. Yamashita, An alternating direction method of multipliers with the BFGS update for structured convex quadratic optimization, preprint, arXiv:1903.02270.
[14] Y. Gu and N. Yamashita, A proximal ADMM with the Broyden family for convex optimization problems, Journal of Industrial and Management Optimization, 13 (2020). doi: 10.3934/jimo.2020091.
[15] D. Harrison Jr. and D. L. Rubinfeld, Hedonic housing prices and the demand for clean air, Journal of Environmental Economics and Management, 5 (1978), 81-102. doi: 10.1016/0095-0696(78)90006-2.
[16] B. He, L.-Z. Liao, D. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities, Mathematical Programming, 92 (2002), 103-118. doi: 10.1007/s101070100280.
[17] B. He, H. Liu, Z. Wang and X. Yuan, A strictly contractive Peaceman-Rachford splitting method for convex programming, SIAM Journal on Optimization, 24 (2014), 1011-1040. doi: 10.1137/13090849X.
[18] B. He and X. Yuan, On the $O(1/n)$ convergence rate of the Douglas-Rachford alternating direction method, SIAM Journal on Numerical Analysis, 50 (2012), 700-709. doi: 10.1137/110836936.
[19] B. He and X. Yuan, Block-wise alternating direction method of multipliers for multiple-block convex programming and beyond, SMAI Journal of Computational Mathematics, 1 (2015), 145-174. doi: 10.5802/smai-jcm.6.
[20] B. He, F. Ma and X. Yuan, Optimally linearizing the alternating direction method of multipliers for convex programming, Computational Optimization and Applications, (2019), 1-28. doi: 10.1007/s10589-019-00152-3.
[21] K. Koh, S.-J. Kim and S. Boyd, An interior-point method for large-scale $l_1$-regularized logistic regression, Journal of Machine Learning Research, 8 (2007), 1519-1555.
[22] M. Li, D. Sun and K.-C. Toh, A majorized ADMM with indefinite proximal terms for linearly constrained convex composite optimization, SIAM Journal on Optimization, 26 (2016), 922-950. doi: 10.1137/140999025.
[23] P.-L. Lions and B. Mercier, Splitting algorithms for the sum of two nonlinear operators, SIAM Journal on Numerical Analysis, 16 (1979), 964-979. doi: 10.1137/0716071.
[24] P. A. Lotito, L. A. Parente and M. Solodov, A class of variable metric decomposition methods for monotone variational inclusions, Journal of Convex Analysis, 16 (2009), 857-880.
[25] Y. Nesterov, Gradient methods for minimizing composite functions, Mathematical Programming, 140 (2013), 125-161. doi: 10.1007/s10107-012-0629-5.
[26] R. K. Pace and R. Barry, Sparse spatial autoregressions, Statistics and Probability Letters, 33 (1997), 291-297. doi: 10.1016/S0167-7152(96)00140-X.
[27] L. I. Rudin, S. Osher and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60 (1992), 259-268. doi: 10.1016/0167-2789(92)90242-F.
[28] T. Hastie, R. Tibshirani and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Science and Business Media, 2009.
[29] P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Mathematical Programming, 117 (2009), 387-423. doi: 10.1007/s10107-007-0170-0.
[30] M. Xu, Proximal alternating directions method for structured variational inequalities, Journal of Optimization Theory and Applications, 134 (2007), 107-117. doi: 10.1007/s10957-007-9192-2.
[31] M. Xu and T. Wu, A class of linearized proximal alternating direction methods, Journal of Optimization Theory and Applications, 151 (2011), 321-337. doi: 10.1007/s10957-011-9876-5.
[32] W. Yin, S. Osher, D. Goldfarb and J. Darbon, Bregman iterative algorithms for $l_1$-minimization with applications to compressed sensing, SIAM Journal on Imaging Sciences, 1 (2008), 143-168. doi: 10.1137/070703983.