
Improved SVRG for finite sum structure optimization with application to binary classification
1. School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China
2. School of Sciences, Hangzhou Dianzi University, Hangzhou 310018, China
This paper studies the stochastic variance reduced gradient (SVRG) method for minimizing the sum of a finite number of smooth convex functions, a problem that arises widely in machine learning and data mining. Inspired by the excellent performance of the two-point stepsize gradient method in batch learning, we present an improved SVRG algorithm, named the stochastic two-point stepsize gradient method. Under some mild conditions, the proposed method achieves a linear convergence rate $ O(\rho^k) $ for smooth and strongly convex functions, where $ \rho\in(0.68, 1) $. Simulation experiments on several benchmark data sets demonstrate the performance of the proposed method.
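The paper's exact update rule is not reproduced on this abstract page, so the following Python sketch is only a hedged illustration of the general idea: an SVRG outer/inner loop in the sense of Johnson and Zhang [7], with the step size chosen by a Barzilai-Borwein (two-point) rule computed from consecutive snapshots, in the spirit of [1] and of SVRG-BB [19]. All function and parameter names are placeholders, not the authors' notation.

```python
import numpy as np

def svrg_two_point(grad_i, full_grad, w0, n, m, eta0, epochs):
    """Sketch of SVRG with a two-point (Barzilai-Borwein) step size.

    grad_i(w, i)  -- gradient of the i-th component function f_i at w
    full_grad(w)  -- full gradient (1/n) * sum_i grad_i(w, i)
    m             -- number of inner iterations per outer epoch
    eta0          -- step size for the first epoch (no two-point pair yet)
    """
    rng = np.random.default_rng(0)
    w_snap = np.asarray(w0, dtype=float).copy()   # snapshot point
    eta = eta0
    prev_snap = prev_mu = None
    for _ in range(epochs):
        mu = full_grad(w_snap)                    # full gradient at snapshot
        if prev_snap is not None:
            s = w_snap - prev_snap                # two-point differences
            y = mu - prev_mu
            # Barzilai-Borwein step size [1], scaled by 1/m as in SVRG-BB [19]
            eta = s.dot(s) / (m * s.dot(y))
        prev_snap, prev_mu = w_snap.copy(), mu.copy()
        w = w_snap.copy()
        for _ in range(m):
            i = rng.integers(n)
            # variance-reduced gradient estimate of SVRG [7]
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w = w - eta * g
        w_snap = w                                # last iterate as new snapshot
    return w_snap
```

The appeal of the two-point rule here is that it removes the hand-tuned constant step size of plain SVRG: the pair $(s, y)$ of snapshot and full-gradient differences adapts the step to the local curvature at essentially no extra cost, since the full gradient is already computed once per epoch.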
References:
[1] J. Barzilai and J. M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal., 8 (1988), 141-148. doi: 10.1093/imanum/8.1.141.
[2] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, New York, 2004. doi: 10.1017/CBO9780511804441.
[3] Y. Chen, L. Qi and Q. Wang, Computing extreme eigenvalues of large scale Hankel tensors, J. Sci. Comput., 68 (2016), 716-738. doi: 10.1007/s10915-015-0155-8.
[4] W. Cheng and Y. Dai, Gradient-based method with active set strategy for $\ell_1$ optimization, Math. Comp., 87 (2018), 1283-1305. doi: 10.1090/mcom/3238.
[5] Y. Dai and R. Fletcher, On the asymptotic behaviour of some new gradient methods, Math. Program., 103 (2005), 541-559. doi: 10.1007/s10107-004-0516-9.
[6] Y. Dai and R. Fletcher, Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming, Numer. Math., 100 (2005), 21-47. doi: 10.1007/s00211-004-0569-y.
[7] R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, in Adv. Neural Inf. Process. Syst., (2013), 315-323.
[8] H. Liu, Z. Liu and X. Dong, A new adaptive Barzilai and Borwein method for unconstrained optimization, Optim. Lett., 12 (2018), 845-873. doi: 10.1007/s11590-017-1150-9.
[9] J. Mairal, Incremental majorization-minimization optimization with application to large-scale machine learning, SIAM J. Optim., 25 (2015), 829-855. doi: 10.1137/140957639.
[10] E. Min, Y. Zhao, J. Long, C. Wu, K. Li and J. Yin, SVRG with adaptive epoch size, in Proc. IEEE Int. Joint Conf. Neural Netw., (2017), 2935-2942. doi: 10.1109/IJCNN.2017.7966219.
[11] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Springer, Boston, 2004. doi: 10.1007/978-1-4419-8853-9.
[12] A. Nitanda, Accelerated stochastic gradient descent for minimizing finite sums, in Proc. Int. Conf. Artif. Intell. Statist., (2016), 195-203.
[13] A. Nitanda, Stochastic proximal gradient descent with acceleration techniques, in Adv. Neural Inf. Process. Syst., (2014), 1574-1582.
[14] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, John Wiley and Sons, New Jersey, 2007. doi: 10.1002/9780470182963.
[15] M. Raydan, On the Barzilai and Borwein choice of steplength for the gradient method, IMA J. Numer. Anal., 13 (1993), 321-326. doi: 10.1093/imanum/13.3.321.
[16] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statist., 22 (1951), 400-407. doi: 10.1214/aoms/1177729586.
[17] N. L. Roux, M. Schmidt and F. Bach, A stochastic gradient method with an exponential convergence rate for finite training sets, in Adv. Neural Inf. Process. Syst., (2012), 2663-2671.
[18] Z. Shen, H. Qian, T. Zhou and T. Mu, Adaptive variance reducing for stochastic gradient descent, in Proc. Int. Joint Conf. Artif. Intell., (2016), 1990-1996.
[19] C. Tan, S. Ma, Y. Dai and Y. Qian, Barzilai-Borwein step size for stochastic gradient descent, in Adv. Neural Inf. Process. Syst., (2016), 685-693.
[20] Z. Wen, C. Yang, X. Liu and Y. Zhang, Trace-penalty minimization for large-scale eigenspace computation, J. Sci. Comput., 66 (2016), 1175-1203. doi: 10.1007/s10915-015-0061-0.
[21] L. Xiao and T. Zhang, A proximal stochastic gradient method with progressive variance reduction, SIAM J. Optim., 24 (2014), 2057-2075. doi: 10.1137/140961791.
[22] W. Xue, W. Zhang and J. Ren, Online learning based on stochastic spectral gradient, Comput. Sci., 43 (2016), 47-51.
[23] G. Yu and S. Niu, Multivariate spectral gradient projection method for nonlinear monotone equations with convex constraints, J. Ind. Manag. Optim., 9 (2013), 117-129. doi: 10.3934/jimo.2013.9.117.
[24] B. Zhou, L. Gao and Y. Dai, Gradient methods with adaptive step-sizes, Comput. Optim. Appl., 35 (2006), 69-86. doi: 10.1007/s10589-006-6446-0.
[25] R. Zhou, X. Shen and L. Niu, A fast algorithm for nonsmooth penalized clustering, Neurocomputing, 273 (2018), 583-592. doi: 10.1016/j.neucom.2017.08.048.
Benchmark data sets used in the experiments:
Data set | Training size | Test size | Dimension
a1a | 1605 | 30956 | 123
a9a | 32561 | 16281 | 123
w1a | 2477 | 47272 | 300
w8a | 49749 | 14951 | 300
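These four data sets are the standard LIBSVM binary classification benchmarks, and the table columns match their usual training/test splits and feature dimensions. As a hedged illustration (the file names, regularization strength, and hyperparameters below are assumptions, not taken from the paper), the snippet loads one split in LIBSVM format and sets up the $\ell_2$-regularized logistic loss, a standard smooth and strongly convex test problem for SVRG-type methods, then calls the `svrg_two_point` sketch given after the abstract.

```python
import numpy as np
from sklearn.datasets import load_svmlight_file

# File names are assumptions: "a1a" / "a1a.t" are the usual LIBSVM
# training/test files matching the table (1605 / 30956 samples, 123 features).
X, y = load_svmlight_file("a1a", n_features=123)   # X is a sparse CSR matrix
n, d = X.shape
lam = 1e-4  # assumed regularization strength, not from the paper

def grad_i(w, i):
    """Gradient of f_i(w) = log(1 + exp(-y_i x_i^T w)) + (lam/2)||w||^2."""
    xi = X.getrow(i).toarray().ravel()
    return -y[i] * xi / (1.0 + np.exp(y[i] * xi.dot(w))) + lam * w

def full_grad(w):
    """Full gradient (1/n) * sum_i grad f_i(w), vectorized over all samples."""
    coef = -y / (1.0 + np.exp(y * (X @ w)))
    return np.asarray(X.T @ coef).ravel() / n + lam * w

# Reuses svrg_two_point from the sketch after the abstract.
w = svrg_two_point(grad_i, full_grad, np.zeros(d), n,
                   m=2 * n, eta0=0.1, epochs=15)
```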