[1] S. Ahn, A. Korattikara and M. Welling, Bayesian posterior sampling via stochastic gradient Fisher scoring, 29th International Conference on Machine Learning (ICML 2012), (2012), 1591-1598.
[2] F. Bach, Stochastic gradient methods for machine learning, Technical report, INRIA - École Normale Supérieure, 2013. http://lear.inrialpes.fr/people/harchaoui/projects/gargantua/slides/bach_gargantua_nov2013.pdf.
[3] J. Baker, P. Fearnhead, E. Fox and C. Nemeth, Control variates for stochastic gradient MCMC, Stat. Comput., 29 (2019), 599-615.
doi: 10.1007/s11222-018-9826-2.
[4] R. Bardenet, A. Doucet and C. Holmes, On Markov chain Monte Carlo methods for tall data, J. Mach. Learn. Res., 18 (2017), 43 pp.
[5] V. S. Borkar and S. K. Mitter, A strong approximation theorem for stochastic recursive algorithms, J. Optim. Theory Appl., 100 (1999), 499-513.
doi: 10.1023/A:1022630321574.
[6] N. Bou-Rabee, A. Eberle and R. Zimmer, Coupling and convergence for Hamiltonian Monte Carlo, Ann. Appl. Probab., 30 (2020), 1209-1250.
doi: 10.1214/19-AAP1528.
[7] N. Bou-Rabee and H. Owhadi, Long-run accuracy of variational integrators in the stochastic context, SIAM J. Numer. Anal., 48 (2010), 278-297.
doi: 10.1137/090758842.
[8] N. Bou-Rabee and J. M. Sanz-Serna, Geometric integrators and the Hamiltonian Monte Carlo method, Acta Numer., 27 (2018), 113-206.
doi: 10.1017/S0962492917000101.
[9] S. Brooks, A. Gelman, G. Jones and X.-L. Meng, Handbook of Markov Chain Monte Carlo, CRC Press, 2011.
doi: 10.1201/b10905.
[10] N. Chatterji, N. Flammarion, Y. Ma, P. Bartlett and M. Jordan, On the theory of variance reduction for stochastic gradient Monte Carlo, International Conference on Machine Learning, (2018).
[11] C. Chen, N. Ding and L. Carin, On the convergence of stochastic gradient MCMC algorithms with high-order integrators, Advances in Neural Information Processing Systems, (2015), 2278-2286.
[12] T. Chen, E. B. Fox and C. Guestrin, Stochastic gradient Hamiltonian Monte Carlo, International Conference on Machine Learning, (2014), 1683-1691.
[13] X. Cheng, N. S. Chatterji, Y. Abbasi-Yadkori, P. L. Bartlett and M. I. Jordan, Sharp convergence rates for Langevin dynamics in the nonconvex setting, preprint, arXiv: 1805.01648, 2018.
[14] X. Cheng, N. S. Chatterji, P. L. Bartlett and M. I. Jordan, Underdamped Langevin MCMC: A non-asymptotic analysis, Proceedings of the 31st Conference on Learning Theory, PMLR, (2018).
[15] D. Csiba and P. Richtárik, Importance sampling for minibatches, J. Mach. Learn. Res., 19 (2018), 21 pp.
[16] A. S. Dalalyan and A. Karagulyan, User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, Stochastic Process. Appl., 129 (2019), 5278-5311.
doi: 10.1016/j.spa.2019.02.016.
[17] A. Defazio, F. Bach and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems, (2014), 1646-1654.
[18] A. Defazio and L. Bottou, On the ineffectiveness of variance reduced optimization for deep learning, Advances in Neural Information Processing Systems, (2019), 1755-1765.
[19] K. A. Dubey, S. J. Reddi, S. A. Williamson, B. Poczos, A. J. Smola and E. P. Xing, Variance reduction in stochastic gradient Langevin dynamics, Advances in Neural Information Processing Systems, (2016), 1154-1162.
[20] T. Fu and Z. Zhang, CPSG-MCMC: Clustering-based preprocessing method for stochastic gradient MCMC, Artificial Intelligence and Statistics, (2017), 841-850.
[21] K. Jarrett, K. Kavukcuoglu, M. A. Ranzato and Y. LeCun, What is the best multi-stage architecture for object recognition?, 2009 IEEE 12th International Conference on Computer Vision, (2009), 2146-2153.
doi: 10.1109/ICCV.2009.5459469.
[22] R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems, (2013), 315-323.
[23] A. Korattikara, Y. Chen and M. Welling, Austerity in MCMC land: Cutting the Metropolis-Hastings budget, International Conference on Machine Learning, (2014), 181-189.
[24] R. Kubo, The fluctuation-dissipation theorem, Reports on Progress in Physics, 29 (1966), 255-284.
doi: 10.1088/0034-4885/29/1/306.
[25] C. Li, C. Chen, D. Carlson and L. Carin, Preconditioned stochastic gradient Langevin dynamics for deep neural networks, Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[26] Q. Li, C. Tai and W. E, Stochastic modified equations and adaptive stochastic gradient algorithms, J. Mach. Learn. Res., 20 (2019), 47 pp.
[27] R. Li, H. Zha and M. Tao, Mean-square analysis with an application to optimal dimension dependence of Langevin Monte Carlo, preprint, arXiv: 2109.03839, 2021.
[28] M. Lichman, UCI Machine Learning Repository, 2013. Available from: http://archive.ics.uci.edu/ml.
[29] Y.-A. Ma, T. Chen and E. B. Fox, A complete recipe for stochastic gradient MCMC, Advances in Neural Information Processing Systems, (2015), 2917-2925.
[30] D. Maclaurin and R. P. Adams, Firefly Monte Carlo: Exact MCMC with subsets of data, Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
[31] S. Mandt, M. D. Hoffman and D. M. Blei, Stochastic gradient descent as approximate Bayesian inference, J. Mach. Learn. Res., 18 (2017), 134, 35 pp.
[32] J. C. Mattingly, A. M. Stuart and M. V. Tretyakov, Convergence of numerical time-averaging and stationary measures via Poisson equations, SIAM J. Numer. Anal., 48 (2010), 552-577.
doi: 10.1137/090770527.
[33] D. Needell, R. Ward and N. Srebro, Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm, Math. Program., 155 (2016), 549-573.
doi: 10.1007/s10107-015-0864-7.
[34] S. Patterson and Y. W. Teh, Stochastic gradient Riemannian Langevin dynamics on the probability simplex, Advances in Neural Information Processing Systems, (2013), 3102-3110.
[35] G. A. Pavliotis, Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations, Texts in Applied Mathematics, 60, Springer, New York, 2014.
doi: 10.1007/978-1-4939-1323-7.
[36] G. O. Roberts and R. L. Tweedie, Exponential convergence of Langevin distributions and their discrete approximations, Bernoulli, 2 (1996), 341-363.
doi: 10.2307/3318418.
[37] M. Schmidt, R. Babanezhad, M. Ahmed, A. Defazio, A. Clifton and A. Sarkar, Non-uniform stochastic average gradient method for training conditional random fields, Artificial Intelligence and Statistics, (2015), 819-828.
[38] M. Schmidt, N. Le Roux and F. Bach, Minimizing finite sums with the stochastic average gradient, Math. Program., 162 (2017), 83-112.
doi: 10.1007/s10107-016-1030-6.
[39] M. Tao and T. Ohsawa, Variational optimization on Lie groups, with examples of leading (generalized) eigenvalue problems, Artificial Intelligence and Statistics, (2020).
[40] Y. W. Teh, A. H. Thiery and S. J. Vollmer, Consistency and fluctuations for stochastic gradient Langevin dynamics, J. Mach. Learn. Res., 17 (2016), 7, 33 pp.
[41] S. J. Vollmer, K. C. Zygalakis and Y. W. Teh, Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics, J. Mach. Learn. Res., 17 (2016), 159, 45 pp.
[42] M. Welling and Y. W. Teh, Bayesian learning via stochastic gradient Langevin dynamics, International Conference on Machine Learning, (2011), 681-688.
[43] A. G. Wilson, The case for Bayesian deep learning, preprint, arXiv: 2001.10995, 2020.
[44] R. Zhang, A. F. Cooper and C. De Sa, Asymptotically optimal exact minibatch Metropolis-Hastings, Advances in Neural Information Processing Systems, (2020).
[45] R. Zhang and C. De Sa, Poisson-minibatching for Gibbs sampling with convergence rate guarantees, Advances in Neural Information Processing Systems, (2019).
[46] P. Zhao and T. Zhang, Stochastic optimization with importance sampling for regularized loss minimization, International Conference on Machine Learning, (2015), 1-9.
[47] R. Zhu, Gradient-based sampling: An adaptive importance sampling for least-squares, Advances in Neural Information Processing Systems, (2016), 406-414.