September  2020, 16(5): 2253-2266. doi: 10.3934/jimo.2019052

Improved SVRG for finite sum structure optimization with application to binary classification

1. School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China
2. School of Sciences, Hangzhou Dianzi University, Hangzhou 310018, China

* Corresponding author: Wei Xue (E-mail: cswxue@ahut.edu.cn)

Received: May 2018; Revised: November 2018; Published: May 2019

This paper considers a stochastic variance reduced gradient (SVRG) method for minimizing the sum of a finite number of smooth convex functions, a problem that arises widely in machine learning and data mining. Inspired by the excellent performance of the two-point stepsize gradient method in batch learning, we present an improved SVRG algorithm, named the stochastic two-point stepsize gradient method. Under mild conditions, the proposed method achieves a linear convergence rate $ O(\rho^k) $ for smooth and strongly convex functions, where $ \rho\in(0.68, 1) $. Simulation experiments on several benchmark data sets are reported to demonstrate the performance of the proposed method.
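To make the finite-sum setting and the two-point (Barzilai-Borwein) stepsize idea concrete, the following is a minimal Python sketch, not the authors' exact STSG algorithm: an SVRG outer/inner loop whose learning rate is recomputed at each snapshot by a BB rule in the spirit of SVRG-BB [19], applied to $ \ell_2 $-regularized logistic regression for binary classification. The function names (`svrg_bb`, `logistic_grad`) and parameters (`eta0`, `lam`, the inner-loop length set to $ n $) are illustrative assumptions.

```python
# Minimal sketch (not the authors' exact STSG method): SVRG with a Barzilai-Borwein
# (two-point) step size recomputed at every snapshot, applied to L2-regularized
# logistic regression. Names such as svrg_bb, eta0 and lam are illustrative.
import numpy as np


def logistic_grad(w, X, y, lam):
    """Full gradient of (1/n) * sum_i log(1 + exp(-y_i * x_i^T w)) + (lam/2) * ||w||^2."""
    z = y * (X @ w)
    s = -y / (1.0 + np.exp(z))          # per-sample derivative factor
    return X.T @ s / X.shape[0] + lam * w


def svrg_bb(X, y, lam=1e-4, eta0=0.1, n_epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_prev = g_prev = None
    eta = eta0                           # used until the BB rule can be formed
    for _ in range(n_epochs):
        g_full = logistic_grad(w, X, y, lam)         # full gradient at the snapshot
        if w_prev is not None:
            s_k, y_k = w - w_prev, g_full - g_prev
            denom = float(s_k @ y_k)
            if denom > 1e-12:                        # BB1 step, scaled by inner-loop length
                eta = float(s_k @ s_k) / (n * denom)
        w_prev, g_prev = w.copy(), g_full.copy()
        w_in = w.copy()
        for _ in range(n):                           # inner loop: variance-reduced updates
            i = rng.integers(n)
            xi, yi = X[i], y[i]
            gi = -yi * xi / (1.0 + np.exp(yi * (xi @ w_in))) + lam * w_in
            gi_snap = -yi * xi / (1.0 + np.exp(yi * (xi @ w))) + lam * w
            w_in -= eta * (gi - gi_snap + g_full)
        w = w_in                                     # take the last inner iterate as new snapshot
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 20))
    y = np.sign(X @ rng.normal(size=20) + 0.1 * rng.normal(size=500))
    w_hat = svrg_bb(X, y)
    print("training accuracy:", np.mean(np.sign(X @ w_hat) == y))
```

Running $ n $ inner iterations per epoch follows the usual SVRG convention of one effective pass over the data between snapshots; the initial step size `eta0` plays the role of $ \eta_0 $ studied in Figures 7 and 8.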

Citation: Guangmei Shao, Wei Xue, Gaohang Yu, Xiao Zheng. Improved SVRG for finite sum structure optimization with application to binary classification. Journal of Industrial & Management Optimization, 2020, 16 (5) : 2253-2266. doi: 10.3934/jimo.2019052
References:
[1] J. Barzilai and J. M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal., 8 (1988), 141-148. doi: 10.1093/imanum/8.1.141.
[2] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, New York, 2004. doi: 10.1017/CBO9780511804441.
[3] Y. Chen, L. Qi and Q. Wang, Computing extreme eigenvalues of large scale Hankel tensors, J. Sci. Comput., 68 (2016), 716-738. doi: 10.1007/s10915-015-0155-8.
[4] W. Cheng and Y. Dai, Gradient-based method with active set strategy for $ \ell_1 $ optimization, Math. Comp., 87 (2018), 1283-1305. doi: 10.1090/mcom/3238.
[5] Y. Dai and R. Fletcher, On the asymptotic behaviour of some new gradient methods, Math. Program., 103 (2005), 541-559. doi: 10.1007/s10107-004-0516-9.
[6] Y. Dai and R. Fletcher, Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming, Numer. Math., 100 (2005), 21-47. doi: 10.1007/s00211-004-0569-y.
[7] R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, in Adv. Neural Inf. Process. Syst., (2013), 315-323.
[8] H. Liu, Z. Liu and X. Dong, A new adaptive Barzilai and Borwein method for unconstrained optimization, Optim. Lett., 12 (2018), 845-873. doi: 10.1007/s11590-017-1150-9.
[9] J. Mairal, Incremental majorization-minimization optimization with application to large-scale machine learning, SIAM J. Optim., 25 (2015), 829-855. doi: 10.1137/140957639.
[10] E. Min, Y. Zhao, J. Long, C. Wu, K. Li and J. Yin, SVRG with adaptive epoch size, in Proc. IEEE Int. Joint Conf. Neural Netw., (2017), 2935-2942. doi: 10.1109/IJCNN.2017.7966219.
[11] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Springer, Boston, 2004. doi: 10.1007/978-1-4419-8853-9.
[12] A. Nitanda, Accelerated stochastic gradient descent for minimizing finite sums, in Proc. Int. Conf. Artif. Intell. Statist., (2016), 195-203.
[13] A. Nitanda, Stochastic proximal gradient descent with acceleration techniques, in Adv. Neural Inf. Process. Syst., (2016), 1574-1582.
[14] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, John Wiley and Sons, New Jersey, 2007. doi: 10.1002/9780470182963.
[15] M. Raydan, On the Barzilai and Borwein choice of steplength for the gradient method, IMA J. Numer. Anal., 13 (1993), 321-326. doi: 10.1093/imanum/13.3.321.
[16] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statist., 22 (1951), 400-407. doi: 10.1214/aoms/1177729586.
[17] N. L. Roux, M. Schmidt and F. Bach, A stochastic gradient method with an exponential convergence rate for finite training sets, in Adv. Neural Inf. Process. Syst., (2012), 2663-2671.
[18] Z. Shen, H. Qian, T. Zhou and T. Mu, Adaptive variance reducing for stochastic gradient descent, in Proc. Int. Joint Conf. Artif. Intell., (2016), 1990-1996.
[19] C. Tan, S. Ma, Y. Dai and Y. Qian, Barzilai-Borwein step size for stochastic gradient descent, in Adv. Neural Inf. Process. Syst., (2016), 685-693.
[20] Z. Wen, C. Yang, X. Liu and Y. Zhang, Trace-penalty minimization for large-scale eigenspace computation, J. Sci. Comput., 66 (2016), 1175-1203. doi: 10.1007/s10915-015-0061-0.
[21] L. Xiao and T. Zhang, A proximal stochastic gradient method with progressive variance reduction, SIAM J. Optim., 24 (2014), 2057-2075. doi: 10.1137/140961791.
[22] W. Xue, W. Zhang and J. Ren, Online learning based on stochastic spectral gradient, Comput. Sci., 43 (2016), 47-51.
[23] G. Yu and S. Niu, Multivariate spectral gradient projection method for nonlinear monotone equations with convex constraints, J. Ind. Manag. Optim., 9 (2013), 117-129. doi: 10.3934/jimo.2013.9.117.
[24] B. Zhou, L. Gao and Y. Dai, Gradient methods with adaptive step-sizes, Comput. Optim. Appl., 35 (2006), 69-86. doi: 10.1007/s10589-006-6446-0.
[25] R. Zhou, X. Shen and L. Niu, A fast algorithm for nonsmooth penalized clustering, Neurocomputing, 273 (2018), 583-592. doi: 10.1016/j.neucom.2017.08.048.
Figure 1.  Objective loss and test classification accuracy versus epochs on data set "a9a" (best viewed in color)
Figure 2.  Objective loss and test classification accuracy versus epochs on data set "w8a"
Figure 3.  Evolution of objective loss and test classification accuracy versus epochs for Eq. (14) on data set "a9a"
Figure 4.  Evolution of objective loss and test classification accuracy versus epochs for Eq. (14) on data set "w8a"
Figure 5.  Evolution of objective loss and test classification accuracy versus epochs for Eq. (14) on data set "a1a"
Figure 6.  Evolution of objective loss and test classification accuracy versus epochs for Eq. (14) on data set "w1a"
Figure 7.  Evolution of objective loss and test classification accuracy versus epochs on data set "a9a" with different $ \eta_0 $
Figure 8.  Evolution of objective loss and test classification accuracy versus epochs on data set "w8a" with different $ \eta_0 $
Figure 9.  Comparison of test classification accuracy between SVM and STSG on four test data sets
Table 1.  Details of data sets
Data set | $\sharp$ of training samples | $\sharp$ of test samples | $\sharp$ of dimensions
a1a      | 1605  | 30956 | 123
a9a      | 32561 | 16281 | 123
w1a      | 2477  | 47272 | 300
w8a      | 49749 | 14951 | 300
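The four data sets in Table 1 are standard LIBSVM binary classification benchmarks. As an illustration only (assuming scikit-learn is installed and LIBSVM-format files such as `a9a` and `a9a.t` have been downloaded), they could be loaded as follows; the file names and the helper `load_pair` are assumptions, not part of the paper.

```python
# Illustrative loader for the LIBSVM-format benchmarks of Table 1; the local file
# names (e.g. "a9a", "a9a.t") are assumptions based on the standard distribution.
from sklearn.datasets import load_svmlight_file


def load_pair(train_path, test_path, n_features):
    X_tr, y_tr = load_svmlight_file(train_path, n_features=n_features)
    X_te, y_te = load_svmlight_file(test_path, n_features=n_features)
    return X_tr, y_tr, X_te, y_te


# Example: a9a has 123 features, 32561 training and 16281 test samples (Table 1).
# X_tr, y_tr, X_te, y_te = load_pair("a9a", "a9a.t", 123)
```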