[1]
|
J. Barzilai and J. M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal., 8 (1988), 141-148.
doi: 10.1093/imanum/8.1.141.
|
[2]
|
J. F. Bonnans, J. Ch. Gilbert, C. Lemaréchal and C. A. Sagastizábal, A family of variable metric proximal methods, Math. Programming, 68 (1995), 15-47.
doi: 10.1007/BF01585756.
|
[3]
|
L. Bottou, F. E. Curtis and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Rev., 60 (2018), 223-311.
doi: 10.1137/16M1080173.
|
[4]
|
Y.-H. Dai, M. Al-Baali and X. Yang, A positive Barzilai-Borwein-like stepsize and an extension for symmetric linear systems, Numerical Analysis and Optimization, 134 (2015), 59-75.
doi: 10.1007/978-3-319-17689-5_3.
|
[5]
|
Y.-H. Dai, Y. Huang and X.-W. Liu, A family of spectral gradient methods for optimization, Comput. Optim. Appl., 74 (2019), 43-65.
doi: 10.1007/s10589-019-00107-8.
|
[6]
|
A. Defazio, F. Bach and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, in Advances in Neural Information Processing Systems, (2014), 1646-1654.
|
[7]
|
R. Fletcher, On the Barzilai-Borwein method, in Optimization and control with applications, Springer, New York, 96 (2005), 235-256.
doi: 10.1007/0-387-24255-4_10.
|
[8]
|
S. Ghadimi and G. Lan, Stochastic first-and zeroth-order methods for nonconvex stochastic programming, SIAM J. Optim., 23 (2013), 2341-2368.
doi: 10.1137/120880811.
|
[9]
|
S. Ghadimi, G. Lan and H. Zhang, Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, Math. Program., 155 (2016), 267-305.
doi: 10.1007/s10107-014-0846-1.
|
[10]
|
T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Series in Statistics. Springer, New York, 2009.
doi: 10.1007/978-0-387-84858-7.
|
[11]
|
Y. Huang, Y.-H. Dai, X.-W. Liu and H. Zhang, Gradient methods exploiting spectral properties, Optim. Methods Softw., 35 (2020), 681-705.
doi: 10.1080/10556788.2020.1727476.
|
[12]
|
R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, in Advances in Neural Information Processing Systems, (2013), 315-323.
|
[13]
|
H. Karimi, J. Nutini and M. Schmidt, Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition, in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, (2016), 795-811.
doi: 10.1007/978-3-319-46128-1_50.
|
[14]
|
J. Konečnỳ, J. Liu, P. Richtárik and M. Takáč, Mini-batch semi-stochastic gradient descent in the proximal setting, IEEE Journal of Selected Topics in Signal Processing, 10 (2015), 242-255.
|
[15]
|
J. Konečnỳ and P. Richtárik, Semi-stochastic gradient descent methods, Frontiers in Applied Mathematics and Statistics, 3 (2017), 9.
|
[16]
|
L. Lei, C. Ju, J. Chen and M. I. Jordan, Non-convex finite-sum optimization via SCSG methods, in Advances in Neural Information Processing Systems, (2017), 2348-2358.
|
[17]
|
Z. Li and J. Li, A simple proximal stochastic gradient method for nonsmooth nonconvex optimization, in Advances in Neural Information Processing Systems, (2018), 5564-5574.
|
[18]
|
Y. Liu, X. Wang and T. Guo, A linearly convergent stochastic recursive gradient method for convex optimization, Optim. Lett., 14 (2020), 2265-2283.
doi: 10.1007/s11590-020-01550-x.
|
[19]
|
Y. Nesterov, Introductory Lectures on Convex Programming: A Basic Course, Applied Optimization, 87. Kluwer Academic Publishers, Boston, MA, 2004.
doi: 10.1007/978-1-4419-8853-9.
|
[20]
|
L. M. Nguyen, J. Liu, K. Scheinberg and M. Takáč, SARAH: A novel method for machine learning problems using stochastic recursive gradient, Proceedings of the 34th International Conference on Machine Learning, 70 (2017), 2613-2621.
|
[21]
|
L. A. Parente, P. A. Lotito and M. V. Solodov, A class of inexact variable metric proximal point algorithms, SIAM J. Optim., 19 (2008), 240-260.
doi: 10.1137/070688146.
|
[22]
|
N. Parikh and S. Boyd et al., Proximal algorithms, Foundations and Trends in Optimization, 1 (2014), 127-239.
|
[23]
|
Y. Park, S. Dhar, S. Boyd and M. Shah, Variable metric proximal gradient method with diagonal Barzilai-Borwein stepsize, (2019), arXiv: 1910.07056.
|
[24]
|
N. H. Pham, L. M. Nguyen, D. T. Phan and Q. Tran-Dinh, ProxSARAH: An efficient algorithmic framework for stochastic composite nonconvex optimization, J. Mach. Learn. Res., 21 (2020), Paper No. 110, 48 pp.
|
[25]
|
S. J. Reddi, A. Hefny, S. Sra, B. Poczos and A. Smola, Stochastic variance reduction for nonconvex optimization, in International Conference on Machine Learning, (2016), 314-323.
|
[26]
|
S. J. Reddi, S. Sra, B. Poczos and A. J. Smola, Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization, in Advances in Neural Information Processing Systems, (2016), 1145-1153.
|
[27]
|
H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statistics, 22 (1951), 400-407.
doi: 10.1214/aoms/1177729586.
|
[28]
|
M. Schmidt, N. Le Roux and F. Bach, Minimizing finite sums with the stochastic average gradient, Math. Program., 162 (2017), 83-112.
doi: 10.1007/s10107-016-1030-6.
|
[29]
|
F. Shang, K. Zhou, H. Liu, J. Cheng, I. W. Tsang, L. Zhang, D. Tao and L. Jiao, VR-SGD: A simple stochastic variance reduction method for machine learning, IEEE Transactions on Knowledge and Data Engineering, 32 (2020), 188-202.
doi: 10.1109/TKDE.2018.2878765.
|
[30]
|
C. Tan, S. Ma, Y. -H. Dai and Y. Qian, Barzilai-Borwein step size for stochastic gradient descent, in Advances in Neural Information Processing Systems, (2016), 685-693.
|
[31]
|
X. Wang, X. Wang and Y.-X. Yuan, Stochastic proximal quasi-Newton methods for non-convex composite optimization, Optim. Methods Softw., 34 (2019), 922-948.
doi: 10.1080/10556788.2018.1471141.
|
[32]
|
X. Wang, S. Wang and H. Zhang, Inexact proximal stochastic gradient method for convex composite optimization, Comput. Optim. Appl., 68 (2017), 579-618.
doi: 10.1007/s10589-017-9932-7.
|
[33]
|
X. Wang and H. Zhang, Inexact proximal stochastic second-order methods for nonconvex composite optimization, Optim. Methods Softw., 35 (2020), 808-835.
doi: 10.1080/10556788.2020.1713128.
|
[34]
|
L. Xiao and T. Zhang, A proximal stochastic gradient method with progressive variance reduction, SIAM J. Optim., 24 (2014), 2057-2075.
doi: 10.1137/140961791.
|
[35]
|
T. Yu, X. -W. Liu, Y. -H. Dai and J. Sun, A minibatch proximal stochastic recursive gradient algorithm using a trust-region-like scheme and Barzilai-Borwein stepsizes, IEEE Transactions on Neural Networks and Learning Systems, 2020.
doi: 10.1109/TNNLS. 2020.3025383.
|
[36]
|
T. Yu, X. -W. Liu, Y. -H. Dai and J. Sun, Stochastic variance reduced gradient methods using a trust-region-like scheme, J. Sci. Comput., 87 (2021), Article number: 5.
doi: 10.1007/s10915-020-01402-x.
|
[37]
|
T. Yu, X. -W. Liu, Y. -H. Dai and J. Sun, A variable metric mini-batch proximal stochastic recursive gradient algorithm with diagonal Barzilai-Borwein stepsize, 2020, arXiv: 2010.00817.
|