doi: 10.3934/dcdss.2021157
Online First


Improving sampling accuracy of stochastic gradient MCMC methods via non-uniform subsampling of gradients

Ruilin Li, Xin Wang, Hongyuan Zha and Molei Tao

1. Georgia Institute of Technology, USA
2. Google Research, USA
3. School of Data Science, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China

*Corresponding author

© 2021 The Author(s). Published by AIMS, LLC. This is an Open Access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Received: April 2021. Revised: September 2021. Early access: December 2021.

Many Markov Chain Monte Carlo (MCMC) methods leverage gradient information of the potential function of the target distribution to explore the sample space efficiently. However, computing gradients can be computationally expensive in large-scale applications, such as those in contemporary machine learning. Stochastic Gradient (SG-)MCMC methods approximate gradients by stochastic ones, commonly obtained from uniformly subsampled data points, and achieve improved computational efficiency, though at the price of introducing sampling error. We propose a non-uniform subsampling scheme to improve the sampling accuracy. The proposed exponentially weighted stochastic gradient (EWSG) method is designed so that a non-uniform-SG-MCMC method mimics the statistical behavior of a batch-gradient-MCMC method, and the inaccuracy due to the SG approximation is thereby reduced. EWSG differs from classical variance reduction (VR) techniques in that it focuses on the entire distribution rather than just the variance; nevertheless, its reduced local variance is also proved. EWSG can also be viewed as an extension of the importance sampling idea, which has been successful for stochastic-gradient-based optimization, to sampling tasks. In our practical implementation of EWSG, the non-uniform subsampling is performed efficiently via a Metropolis-Hastings chain on the data index, which is coupled to the MCMC algorithm. Numerical experiments are provided not only to demonstrate EWSG's effectiveness, but also to guide hyperparameter choices and to validate our non-asymptotic global error bound, despite the approximations made in the implementation. Notably, while statistical accuracy is improved, the convergence speed can be comparable to that of the uniform version, which renders EWSG a practical alternative to VR (though EWSG and VR can also be combined).
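To make the implementation idea above concrete, here is a minimal, self-contained NumPy sketch of the coupling structure only: an SGHMC/underdamped-Langevin-type update in which the data index of the (size-1 minibatch) stochastic gradient is refreshed by a short Metropolis-Hastings chain over indices instead of being drawn uniformly. The toy Gaussian target, the weight function `log_weight`, the running reference `g_ref`, and all numerical values are illustrative assumptions made for this sketch; they are not the exponential weights or hyperparameters of EWSG itself, which are specified in the paper.

```python
# Sketch only: a non-uniformly subsampled stochastic-gradient MCMC step, where the
# data index is selected by a short Metropolis-Hastings chain coupled to the sampler.
# The weight function below is an illustrative placeholder, not the EWSG weights.
import numpy as np

rng = np.random.default_rng(0)

# Toy target: posterior of the mean of N Gaussian observations (variance 4, flat prior).
N, sigma2 = 100, 4.0
data = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=N)

def grad_U_i(theta, i):
    """Gradient of the i-th potential term, U_i(theta) = (theta - x_i)^2 / (2 sigma2)."""
    return (theta - data[i]) / sigma2

def log_weight(theta, i, g_ref):
    """Illustrative (hypothetical) log-weight: favor indices whose rescaled per-datum
    gradient is close to a cheap running reference g_ref."""
    return -0.5 * (N * grad_U_i(theta, i) - g_ref) ** 2

h, gamma, M, n_iter = 1e-3, 1.0, 5, 20000   # illustrative hyperparameters
theta, p = 0.0, 0.0
i = rng.integers(N)                          # current state of the index chain
g_ref = N * grad_U_i(theta, i)               # running reference for the toy weights
samples = []

for k in range(n_iter):
    # Metropolis-Hastings chain on the data index, coupled to the current state.
    for _ in range(M):
        j = rng.integers(N)                  # uniform proposal over indices
        log_acc = log_weight(theta, j, g_ref) - log_weight(theta, i, g_ref)
        if np.log(rng.random()) < log_acc:
            i = j

    g = N * grad_U_i(theta, i)               # non-uniformly subsampled stochastic gradient
    g_ref = 0.9 * g_ref + 0.1 * g            # cheap running average (illustrative)

    # SGHMC / underdamped Langevin step (Euler-type discretization).
    p = p - h * gamma * p - h * g + np.sqrt(2.0 * gamma * h) * rng.normal()
    theta = theta + h * p
    samples.append(theta)

print("posterior mean estimate:", np.mean(samples[5000:]))
print("exact posterior mean   :", data.mean())
```

Swapping `log_weight` for the exponential weights derived in the paper, and the toy per-datum gradient for that of an actual model, recovers the intended usage pattern; the point of the sketch is that the index chain only adds a few extra per-datum gradient evaluations per parameter update.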

Citation: Ruilin Li, Xin Wang, Hongyuan Zha, Molei Tao. Improving sampling accuracy of stochastic gradient MCMC methods via non-uniform subsampling of gradients. Discrete & Continuous Dynamical Systems - S, doi: 10.3934/dcdss.2021157
References:
[1] S. Ahn, A. Korattikara and M. Welling, Bayesian posterior sampling via stochastic gradient Fisher scoring, In 29th International Conference on Machine Learning, ICML 2012, (2012), 1591–1598.
[2] F. Bach, Stochastic gradient methods for machine learning, Technical report, INRIA - Ecole Normale Superieure, 2013. http://lear.inrialpes.fr/people/harchaoui/projects/gargantua/slides/bach_gargantua_nov2013.pdf.
[3] J. Baker, P. Fearnhead, E. Fox and C. Nemeth, Control variates for stochastic gradient MCMC, Stat. Comput., 29 (2019), 599-615. doi: 10.1007/s11222-018-9826-2.
[4] R. Bardenet, A. Doucet and C. Holmes, On Markov chain Monte Carlo methods for tall data, J. Mach. Learn. Res., 18 (2017), 43 pp.
[5] V. S. Borkar and S. K. Mitter, A strong approximation theorem for stochastic recursive algorithms, J. Optim. Theory Appl., 100 (1999), 499-513. doi: 10.1023/A:1022630321574.
[6] N. Bou-Rabee, A. Eberle and R. Zimmer, Coupling and convergence for Hamiltonian Monte Carlo, Ann. Appl. Probab., 30 (2018), 1209-1250. doi: 10.1214/19-AAP1528.
[7] N. Bou-Rabee and H. Owhadi, Long-run accuracy of variational integrators in the stochastic context, SIAM J. Numer. Anal., 48 (2010), 278-297. doi: 10.1137/090758842.
[8] N. Bou-Rabee and J. M. Sanz-Serna, Geometric integrators and the Hamiltonian Monte Carlo method, Acta Numer., 27 (2018), 113-206. doi: 10.1017/S0962492917000101.
[9] S. Brooks, A. Gelman, G. Jones and X.-L. Meng, Handbook of Markov Chain Monte Carlo, CRC Press, 2011. doi: 10.1201/b10905.
[10] N. Chatterji, N. Flammarion, Y. Ma, P. Bartlett and M. Jordan, On the theory of variance reduction for stochastic gradient Monte Carlo, ICML, (2018).
[11] C. Chen, N. Ding and L. Carin, On the convergence of stochastic gradient MCMC algorithms with high-order integrators, Advances in Neural Information Processing Systems, (2015), 2278-2286.
[12] T. Chen, E. B. Fox and C. Guestrin, Stochastic gradient Hamiltonian Monte Carlo, International Conference on Machine Learning, (2014), 1683-1691.
[13] X. Cheng, N. S. Chatterji, Y. Abbasi-Yadkori, P. L. Bartlett and M. I. Jordan, Sharp convergence rates for Langevin dynamics in the nonconvex setting, preprint, arXiv: 1805.01648, 2018.
[14] X. Cheng, N. S. Chatterji, P. L. Bartlett and M. I. Jordan, Underdamped Langevin MCMC: A non-asymptotic analysis, Proceedings of the 31st Conference On Learning Theory, PMLR, (2018).
[15] D. Csiba and P. Richtárik, Importance sampling for minibatches, J. Mach. Learn. Res., 19 (2018), 21 pp.
[16] A. S. Dalalyan and A. Karagulyan, User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, Stochastic Process. Appl., 129 (2019), 5278-5311. doi: 10.1016/j.spa.2019.02.016.
[17] A. Defazio, F. Bach and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, In Advances in Neural Information Processing Systems, (2014), 1646–1654.
[18] A. Defazio and L. Bottou, On the ineffectiveness of variance reduced optimization for deep learning, In Advances in Neural Information Processing Systems, (2019), 1755–1765.
[19] K. A. Dubey, S. J. Reddi, S. A. Williamson, B. Poczos, A. J. Smola and E. P. Xing, Variance reduction in stochastic gradient Langevin dynamics, In Advances in Neural Information Processing Systems, (2016), 1154–1162.
[20] T. Fu and Z. Zhang, CPSG-MCMC: Clustering-based preprocessing method for stochastic gradient MCMC, In Artificial Intelligence and Statistics, (2017), 841–850.
[21] K. Jarrett, K. Kavukcuoglu, M. A. Ranzato and Y. LeCun, What is the best multi-stage architecture for object recognition?, In 2009 IEEE 12th International Conference on Computer Vision, (2009), 2146–2153. doi: 10.1109/ICCV.2009.5459469.
[22] R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems, (2013), 315-323.
[23] A. Korattikara, Y. Chen and M. Welling, Austerity in MCMC land: Cutting the Metropolis-Hastings budget, In International Conference on Machine Learning, (2014), 181–189.
[24] R. Kubo, The fluctuation-dissipation theorem, Reports on Progress in Physics, 29 (1966). doi: 10.1088/0034-4885/29/1/306.
[25] C. Li, C. Chen, D. Carlson and L. Carin, Preconditioned stochastic gradient Langevin dynamics for deep neural networks, In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[26] Q. Li, C. Tai and E. Weinan, Stochastic modified equations and adaptive stochastic gradient algorithms, J. Mach. Learn. Res., 20 (2019), 47 pp.
[27] R. Li, H. Zha and M. Tao, Mean-square analysis with an application to optimal dimension dependence of Langevin Monte Carlo, preprint, arXiv: 2109.03839, 2021.
[28] M. Lichman et al., UCI Machine Learning Repository, 2013.
[29] Y.-A. Ma, T. Chen and E. B. Fox, A complete recipe for stochastic gradient MCMC, In Advances in Neural Information Processing Systems, (2015), 2917–2925.
[30] D. Maclaurin and R. P. Adams, Firefly Monte Carlo: Exact MCMC with subsets of data, In Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
[31] S. Mandt, M. D. Hoffman and D. M. Blei, Stochastic gradient descent as approximate Bayesian inference, J. Mach. Learn. Res., 18 (2017), 134, 35 pp.
[32] J. C. Mattingly, A. M. Stuart and M. V. Tretyakov, Convergence of numerical time-averaging and stationary measures via Poisson equations, SIAM J. Numer. Anal., 48 (2010), 552-577. doi: 10.1137/090770527.
[33] D. Needell, R. Ward and N. Srebro, Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm, Math. Program., 155 (2016), 549-573. doi: 10.1007/s10107-015-0864-7.
[34] S. Patterson and Y. W. Teh, Stochastic gradient Riemannian Langevin dynamics on the probability simplex, Advances in Neural Information Processing Systems, (2013), 3102-3110.
[35] G. A. Pavliotis, Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations, Texts in Applied Mathematics, 60, Springer, New York, 2014. doi: 10.1007/978-1-4939-1323-7.
[36] G. O. Roberts and R. L. Tweedie, Exponential convergence of Langevin distributions and their discrete approximations, Bernoulli, 2 (1996), 341-363. doi: 10.2307/3318418.
[37] M. Schmidt, R. Babanezhad, M. Ahmed, A. Defazio, A. Clifton and A. Sarkar, Non-uniform stochastic average gradient method for training conditional random fields, Artificial Intelligence and Statistics, (2015), 819-828.
[38] M. Schmidt, N. L. Roux and F. Bach, Minimizing finite sums with the stochastic average gradient, Math. Program., 162 (2017), 83-112. doi: 10.1007/s10107-016-1030-6.
[39] M. Tao and T. Ohsawa, Variational optimization on Lie groups, with examples of leading (generalized) eigenvalue problems, AISTATS, (2020).
[40] Y. W. Teh, A. H. Thiery and S. J. Vollmer, Consistency and fluctuations for stochastic gradient Langevin dynamics, J. Mach. Learn. Res., 17 (2016), 7, 33 pp.
[41] S. J. Vollmer, K. C. Zygalakis and Y. W. Teh, Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics, J. Mach. Learn. Res., 17 (2016), 159, 45 pp.
[42] M. Welling and Y. W. Teh, Bayesian learning via stochastic gradient Langevin dynamics, International Conference on Machine Learning, (2011), 681-688.
[43] A. G. Wilson, The case for Bayesian deep learning, preprint, arXiv: 2001.10995, 2020.
[44] R. Zhang, A. F. Cooper and C. De Sa, Asymptotically optimal exact minibatch Metropolis-Hastings, NeurIPS, (2020).
[45] R. Zhang and C. De Sa, Poisson-minibatching for Gibbs sampling with convergence rate guarantees, NeurIPS, (2019).
[46] P. Zhao and T. Zhang, Stochastic optimization with importance sampling for regularized loss minimization, International Conference on Machine Learning, (2015), 1-9.
[47] R. Zhu, Gradient-based sampling: An adaptive importance sampling for least-squares, Advances in Neural Information Processing Systems, (2016), 406-414.


Figure 1.  Performance quantification (Gaussian target)
Figure 2.  BLR learning curve
Figure 3.  BNN learning curve. Shade: one standard deviation.
Figure 4.  KL divergence
Figure 5.  Posterior prediction of the mean (left) and standard deviation (right) of the log likelihood on the test data set, generated by SGHMC, EWSG and EWSG-VR on two Bayesian logistic regression tasks. Statistics are computed from 1000 independent simulations. Minibatch size $ b = 1 $ for all methods except FG; $ M = 1 $ for EWSG and EWSG-VR.
Figure 6.  (a) Histogram of the data used in each iteration of the FlyMC algorithm. (b) Autocorrelation plot of FlyMC, EWSG and MH. (c) Samples of EWSG. (d) Samples of FlyMC.
Table 1.  Accuracy, log likelihood and wall time of various algorithms on test data after one data pass (mean $ \pm $ std)

Method           SGLD                   pSGLD                  SGHMC                  EWSG                   FlyMC
Accuracy (%)     75.283 $ \pm $ 0.016   75.126 $ \pm $ 0.020   75.268 $ \pm $ 0.017   75.306 $ \pm $ 0.016   75.199 $ \pm $ 0.080
Log Likelihood   -0.525 $ \pm $ 0.000   -0.526 $ \pm $ 0.000   -0.525 $ \pm $ 0.000   -0.523 $ \pm $ 0.000   -0.523 $ \pm $ 0.000
Wall Time (s)    3.085 $ \pm $ 0.283    4.312 $ \pm $ 0.359    3.145 $ \pm $ 0.307    3.755 $ \pm $ 0.387    291.295 $ \pm $ 56.368
Table 2.  Test error (mean $ \pm $ standard deviation) after 200 epochs

Method      Test Error (%), MLP     Test Error (%), CNN
SGLD        1.976 $ \pm $ 0.055     0.848 $ \pm $ 0.060
pSGLD       1.821 $ \pm $ 0.061     0.860 $ \pm $ 0.052
SGHMC       1.833 $ \pm $ 0.073     0.778 $ \pm $ 0.040
CP-SGHMC    1.835 $ \pm $ 0.047     0.772 $ \pm $ 0.055
EWSG        1.793 $ \pm $ 0.100     0.753 $ \pm $ 0.035
Table 3.  Test errors of EWSG (top of each cell) and SGHMC (bottom of each cell) after 200 epochs. $ b $ is the minibatch size for EWSG, and the minibatch size of SGHMC is set to $ b\times(M+1) $ so that both algorithms use the same amount of data per parameter update. The step size is set to $ h = \frac{10}{b(M+1)} $ as suggested in [12], different from that used to produce Table 2. The smaller test error in each cell is highlighted in boldface.

$ b $      $ M+1=2 $       $ M+1=5 $       $ M+1=10 $
$ 100 $    EWSG 1.86%      EWSG 1.83%      EWSG 1.80%
           SGHMC 1.94%     SGHMC 1.92%     SGHMC 1.97%
$ 200 $    EWSG 1.90%      EWSG 1.87%      EWSG 1.80%
           SGHMC 1.87%     SGHMC 1.97%     SGHMC 2.07%
$ 500 $    EWSG 1.79%      EWSG 2.01%      EWSG 2.36%
           SGHMC 1.97%     SGHMC 2.17%     SGHMC 2.37%
