
A regularization interpretation of the proximal point method for weakly convex functions
Department of Mathematics and Statistics, McGill University, Montreal, Canada
Empirical evidence and theoretical results suggest that the proximal point method can be computed approximately and still converge faster than the corresponding gradient descent method, in both the stochastic and exact gradient settings. In this article we provide a perspective on this result by interpreting the method as gradient descent on a regularized function. This perspective applies in the case of weakly convex functions, where proofs of the faster rates are not available. Using this analysis, we find the optimal value of the regularization parameter in terms of the weak convexity.
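The interpretation rests on a standard identity for the Moreau envelope (see, e.g., [23,25]): with $f_\lambda(x) = \min_y \{ f(y) + \frac{1}{2\lambda}\|y-x\|^2 \}$, one proximal step satisfies $\mathrm{prox}_{\lambda f}(x) = x - \lambda \nabla f_\lambda(x)$, so the proximal point iteration is exactly gradient descent with step $\lambda$ on the regularized function $f_\lambda$. When $f$ is $\rho$-weakly convex (i.e., $f + \frac{\rho}{2}\|\cdot\|^2$ is convex), the inner problem is strongly convex, and the envelope is well defined, for any $\lambda < 1/\rho$. The following is a minimal numerical sketch of this identity in one dimension; the brute-force inner solver and all names are illustrative and not taken from the article.

import numpy as np

def f(y):
    # cos is 1-weakly convex: f''(y) = -cos(y) >= -1, so rho = 1
    return np.cos(y)

GRID = np.linspace(-10.0, 10.0, 200001)

def prox(x, lam):
    # prox_{lam f}(x) = argmin_y { f(y) + (y - x)^2 / (2 lam) },
    # computed here by brute force on a fine grid (illustration only)
    return GRID[np.argmin(f(GRID) + (GRID - x) ** 2 / (2.0 * lam))]

lam = 0.5   # requires lam < 1/rho = 1 so the inner problem is strongly convex
x = 2.0
for _ in range(20):
    # one proximal step; equivalently x - lam * grad f_lam(x), i.e. one
    # gradient descent step on the Moreau envelope f_lam
    x = prox(x, lam)
print(x, f(x))  # x approaches pi, a local minimizer of cos

The same step can also be read as one backward Euler step for the gradient flow $\dot{x} = -\nabla f(x)$, which connects the regularization viewpoint to the dynamical-systems references [19,29].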
References:
[1] Z. Allen-Zhu, Katyusha: The first direct acceleration of stochastic gradient methods, Journal of Machine Learning Research (JMLR), 18 (2017), 51 pp.
[2] L. T. H. An and P. D. Tao, Convex analysis approach to d.c. programming: Theory, algorithms and applications, Acta Mathematica Vietnamica, 22 (1997), 289-355.
[3] H. Attouch and D. Azé, Approximation and regularization of arbitrary functions in Hilbert spaces by the Lasry-Lions method, Ann. Inst. H. Poincaré Anal. Non Linéaire, 10 (1993), 289-319. doi: 10.1016/S0294-1449(16)30214-1.
[4] H. Attouch and J. Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Math. Program., 116 (2009), 5-16. doi: 10.1007/s10107-007-0133-5.
[5] J.-B. Baillon and G. Haddad, Quelques propriétés des opérateurs angle-bornés et $n$-cycliquement monotones, Israel J. Math., 26 (1977), 137-150. doi: 10.1007/BF03007664.
[6] H. H. Bauschke and P. L. Combettes, The Baillon-Haddad theorem revisited, J. Convex Anal., 17 (2010), 781-787.
[7] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, Springer, New York, 2011. doi: 10.1007/978-1-4419-9467-7.
[8] A. Beck, First-Order Methods in Optimization, MOS-SIAM Series on Optimization, 25, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2017. doi: 10.1137/1.9781611974997.ch1.
[9] L. Bottou, F. E. Curtis and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Rev., 60 (2018), 223-311. doi: 10.1137/16M1080173.
[10] S. Bubeck, Convex optimization: Algorithms and complexity, Found. Trends Mach. Learn., 8 (2015), 231-357. doi: 10.1561/2200000050.
[11] Y. Carmon, J. C. Duchi, O. Hinder and A. Sidford, Accelerated methods for nonconvex optimization, SIAM J. Optim., 28 (2018), 1751-1772. doi: 10.1137/17M1114296.
[12] P. Chaudhari, A. Oberman, S. Osher, S. Soatto and G. Carlier, Deep relaxation: Partial differential equations for optimizing deep neural networks, Res. Math. Sci., 5 (2018), 30 pp. doi: 10.1007/s40687-018-0148-y.
[13] L. C. Evans, Partial Differential Equations, 2nd edition, Graduate Studies in Mathematics, 19, American Mathematical Society, Providence, RI, 2010. ISBN: 978-0-8218-4974-3.
[14] N. Flammarion and F. Bach, From averaging to acceleration, there is only a step-size, Conference on Learning Theory, (2015), 658-695.
[15] A. Kaplan and R. Tichatschke, Proximal point methods and nonconvex optimization, J. Global Optim., 13 (1998), 389-406. doi: 10.1023/A:1008321423879.
[16] J.-M. Lasry and P.-L. Lions, A remark on regularization in Hilbert spaces, Israel J. Math., 55 (1986), 257-266. doi: 10.1007/BF02765025.
[17] L. Lessard, B. Recht and A. Packard, Analysis and design of optimization algorithms via integral quadratic constraints, SIAM J. Optim., 26 (2016), 57-95. doi: 10.1137/15M1009597.
[18] H. Lin, J. Mairal and Z. Harchaoui, A universal catalyst for first-order optimization, Advances in Neural Information Processing Systems 28, Curran Associates, Inc., (2015), 3384-3392.
[19] B. Merlet and M. Pierre, Convergence to equilibrium for the backward Euler scheme and applications, Commun. Pure Appl. Anal., 9 (2010), 685-702. doi: 10.3934/cpaa.2010.9.685.
[20] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Applied Optimization, 87, Kluwer Academic Publishers, Boston, MA, 2004. doi: 10.1007/978-1-4419-8853-9.
[21] A. Nitanda, Stochastic proximal gradient descent with acceleration techniques, Advances in Neural Information Processing Systems 27, Curran Associates, Inc., (2014), 1574-1582.
[22] C. Paquette, H. Lin, D. Drusvyatskiy, J. Mairal and Z. Harchaoui, Catalyst for gradient-based nonconvex optimization, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, 84 (2018), 613-622.
[23] N. Parikh and S. Boyd, Proximal algorithms, Found. Trends Optim., 1 (2014), 127-239. doi: 10.1561/2400000003.
[24] B. T. Polyak, Some methods of speeding up the convergence of iteration methods, Ž. Vyčisl. Mat. i Mat. Fiz., 4 (1964), 791-803.
[25] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Grundlehren der Mathematischen Wissenschaften, 317, Springer-Verlag, Berlin, 1998. doi: 10.1007/978-3-642-02431-3.
[26] L. Rosasco, S. Villa and B. C. Vũ, Convergence of stochastic proximal gradient algorithm, preprint, arXiv:1403.5074.
[27] M. Schmidt, N. Le Roux and F. Bach, Convergence rates of inexact proximal-gradient methods for convex optimization, Advances in Neural Information Processing Systems 24, Curran Associates, Inc., (2011), 1458-1466.
[28] D. Scieur, V. Roulet, F. Bach and A. d'Aspremont, Integration methods and optimization algorithms, Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates, Inc., (2017), 1109-1118.
[29] A. M. Stuart and A. R. Humphries, Dynamical Systems and Numerical Analysis, Cambridge Monographs on Applied and Computational Mathematics, 2, Cambridge University Press, Cambridge, 1996.
[30] W. J. Su, S. Boyd and E. J. Candès, A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights, J. Mach. Learn. Res., 17 (2016), 43 pp.
[31] P. H. Yin, M. Pham, A. Oberman and S. Osher, Stochastic backward Euler: An implicit gradient descent algorithm for $k$-means clustering, J. Sci. Comput., 77 (2018), 1133-1146. doi: 10.1007/s10915-018-0744-4.

