This paper considers a stochastic variance reduced gradient (SVRG) method for minimizing a finite sum of smooth convex functions, a problem that arises widely in machine learning and data mining. Motivated by the excellent performance of the two-point stepsize gradient method in batch learning, we present an improved SVRG algorithm, named the stochastic two-point stepsize gradient method. Under some mild conditions, the proposed method achieves a linear convergence rate $O(\rho^k)$ for smooth and strongly convex functions, where $\rho \in (0.68, 1)$. Numerical experiments on several benchmark data sets are reported to demonstrate the performance of the proposed method.
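The core idea combines the SVRG inner loop with a two-point (Barzilai-Borwein-type) stepsize computed once per epoch from successive snapshot points. Below is a minimal Python sketch of that combination, assuming access to per-component gradients; the function name `grad_i`, the last-iterate snapshot rule, and the `1/m` scaling of the BB stepsize are illustrative assumptions, not necessarily the paper's exact STSG update.

```python
import numpy as np

def svrg_bb(grad_i, x0, n, m, eta0, epochs):
    """SVRG with a two-point (BB) stepsize recomputed at each epoch.

    grad_i(x, i): gradient of the i-th component function at x.
    n: number of component functions; m: inner-loop length.
    eta0: stepsize for the first epoch (no BB pair is available yet).
    """
    x_tilde = x0.copy()
    prev_snapshot, prev_full_grad, eta = None, None, eta0
    for _ in range(epochs):
        # Full gradient at the current snapshot point.
        full_grad = np.mean([grad_i(x_tilde, i) for i in range(n)], axis=0)
        if prev_snapshot is not None:
            # Two-point stepsize from successive snapshots; s^T y > 0
            # holds under strong convexity, so the stepsize is positive.
            s = x_tilde - prev_snapshot
            y = full_grad - prev_full_grad
            eta = np.dot(s, s) / (m * np.dot(s, y))
        prev_snapshot, prev_full_grad = x_tilde.copy(), full_grad
        x = x_tilde.copy()
        for _ in range(m):
            i = np.random.randint(n)
            # Variance-reduced stochastic gradient.
            v = grad_i(x, i) - grad_i(x_tilde, i) + full_grad
            x -= eta * v
        x_tilde = x  # last-iterate snapshot update (one common choice)
    return x_tilde
```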
Table 1. Details of data sets

Data set | # of training samples | # of test samples | Dimension
--- | --- | --- | ---
a1a | 1605 | 30956 | 123
a9a | 32561 | 16281 | 123
w1a | 2477 | 47272 | 300
w8a | 49749 | 14951 | 300
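These four data sets are standard LIBSVM binary-classification benchmarks. A minimal sketch for loading one of them with scikit-learn, assuming the training and test files (e.g., `a9a` and `a9a.t`) have been downloaded locally; the file paths are assumptions:

```python
from sklearn.datasets import load_svmlight_file

# Assumed local file names, as distributed on the LIBSVM data page.
X_train, y_train = load_svmlight_file("a9a", n_features=123)
X_test, y_test = load_svmlight_file("a9a.t", n_features=123)

print(X_train.shape, X_test.shape)  # expected: (32561, 123) (16281, 123)
```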
Objective loss and test classification accuracy versus epochs on data set "a9a" (best viewed in color)
Objective loss and test classification accuracy versus epochs on data set "w8a"
Evolution of objective loss and test classification accuracy versus epochs for Eq. (14) on data set "a9a"
Evolution of objective loss and test classification accuracy versus epochs for Eq. (14) on data set "w8a"
Evolution of objective loss and test classification accuracy versus epochs for Eq. (14) on data set "a1a"
Evolution of objective loss and test classification accuracy versus epochs for Eq. (14) on data set "w1a"
Evolution of objective loss and test classification accuracy versus epochs on data set "a9a" with different
Evolution of objective loss and test classification accuracy versus epochs on data set "w8a" with different
Comparison of test classification accuracy between SVM and STSG on the four test data sets