
Accelerating Metropolis-Hastings algorithms by Delayed Acceptance

  • * Corresponding author: Christian Robert
  • MCMC algorithms such as Metropolis-Hastings algorithms are slowed down by the computation of complex target distributions, as exemplified by huge datasets. We offer a useful generalisation of the Delayed Acceptance approach, devised to reduce such computational costs by a simple and universal divide-and-conquer strategy. The generic acceleration stems from breaking the acceptance step into several parts, aiming at a major gain in computing time that outranks the corresponding reduction in acceptance probability. Each component is sequentially compared with a uniform variate, the first rejection terminating the iteration (a generic sketch is given below). We develop theoretical bounds for the variance of associated estimators against the standard Metropolis-Hastings algorithm and produce results on optimal scaling and general optimisation of the procedure.

    Mathematics Subject Classification: Primary: 68U20, 65C40; Secondary: 62C10.
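The divide-and-conquer acceptance step described in the abstract lends itself to a compact generic implementation. The following is a minimal Python sketch (our illustration, not the authors' code; function names and interfaces are assumptions): `ratio_factors` is any list of functions whose product equals the usual Metropolis-Hastings acceptance ratio, ordered so that cheap factors are tested first.

```python
import numpy as np

def delayed_acceptance_step(x, proposal, ratio_factors, rng):
    """One Delayed Acceptance Metropolis-Hastings transition.

    ratio_factors: functions r_k(x, y) whose product equals the full
    MH ratio pi(y) q(y, x) / (pi(x) q(x, y)), ordered from cheapest
    to most expensive so that early rejections skip the costly ones.
    """
    y = proposal(x, rng)
    for r in ratio_factors:
        # Each factor is compared with its own uniform variate;
        # the first rejection terminates the iteration.
        if rng.uniform() >= min(1.0, r(x, y)):
            return x   # early rejection: the chain stays put
    return y           # all partial tests passed: accept the proposal
```

The resulting acceptance probability is $\prod_k \min\{1, r_k(x, y)\}$, which never exceeds the standard Metropolis-Hastings probability $\min\{1, \prod_k r_k(x, y)\}$; this reduction in acceptance rate is the price paid for being able to stop before evaluating the expensive factors.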

    Figure 1.  Fit of a two-step Metropolis-Hastings algorithm applied to a normal-normal posterior distribution $ \mu|x\sim N(x/\{1+\sigma_\mu^{-2}\}, 1/\{1+\sigma_\mu^{-2}\}) $ when $ x = 3 $ and $ \sigma_\mu = 10 $, based on $ T = 10^5 $ iterations, with a first acceptance step considering the likelihood ratio and a second acceptance step considering the prior ratio, resulting in an overall acceptance rate of 12%
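As a concrete instance of the two-step scheme in Figure 1, the sketch below tests the likelihood ratio first and the prior ratio second; the random-walk step size and starting point are illustrative assumptions of ours (the 12% acceptance rate in the caption depends on such tuning choices).

```python
import numpy as np

def two_step_normal(x=3.0, sigma_mu=10.0, T=100_000, step=3.0, seed=1):
    """Two-step DA-MH targeting mu | x ~ N(x/{1+sigma_mu^-2}, 1/{1+sigma_mu^-2})."""
    rng = np.random.default_rng(seed)
    log_lik = lambda mu: -0.5 * (x - mu) ** 2              # N(x | mu, 1), up to a constant
    log_prior = lambda mu: -0.5 * mu ** 2 / sigma_mu ** 2  # N(mu | 0, sigma_mu^2)
    chain = np.empty(T)
    mu = 0.0
    for t in range(T):
        prop = mu + step * rng.standard_normal()           # symmetric random-walk proposal
        # First acceptance step: likelihood ratio only.
        if np.log(rng.uniform()) < log_lik(prop) - log_lik(mu):
            # Second acceptance step: prior ratio.
            if np.log(rng.uniform()) < log_prior(prop) - log_prior(mu):
                mu = prop
        chain[t] = mu
    return chain
```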

    Figure 2.  (left) Fit of a multiple-step Metropolis-Hastings algorithm applied to a Beta-binomial posterior distribution $ p|x\sim Be(x+a, N-x+b) $ when $ N = 100 $, $ x = 32 $, $ a = 7.5 $ and $ b = 0.5 $. The binomial $ \mathcal{B}(N, p) $ likelihood is replaced with a product of $ 100 $ Bernoulli terms and an acceptance step is considered for the ratio of each term. The histogram is based on $ 10^5 $ iterations, with an overall acceptance rate of 9%; (centre) raw sequence of successive values of $ p $ in the Markov chain simulated in the above experiment; (right) autocorrelogram of the above sequence
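Figure 2's multiple-step factorisation plugs directly into the generic sketch given after the abstract: one ratio factor per Bernoulli term plus one for the $Be(a, b)$ prior. The independent Uniform(0, 1) proposal below is a simplifying assumption of ours (the caption does not specify the proposal); it makes the proposal ratio equal to one, so the factors need only cover the unnormalised posterior ratio.

```python
import numpy as np

def beta_bernoulli_factors(data, a=7.5, b=0.5):
    """One acceptance-ratio factor per Bernoulli observation, plus the prior ratio."""
    per_datum = [(lambda p, q, xi=xi: q / p if xi else (1 - q) / (1 - p))
                 for xi in data]
    prior = [lambda p, q: (q / p) ** (a - 1) * ((1 - q) / (1 - p)) ** (b - 1)]
    return per_datum + prior

# Synthetic Bernoulli data standing in for the caption's x = 32 successes out of N = 100.
rng = np.random.default_rng(0)
data = rng.uniform(size=100) < 0.32
factors = beta_bernoulli_factors(data)

p, chain = 0.5, []
for _ in range(100_000):
    # Reuses delayed_acceptance_step from the generic sketch above.
    p = delayed_acceptance_step(p, lambda s, g: g.uniform(), factors, rng)
    chain.append(p)
```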

    Figure 3.  Two top panels: behaviour of $\ell^*(\delta)$ and $\alpha^*(\delta)$ as the relative cost varies. Note that for $\delta \gg 1$ the optimal values converge towards the values computed for the standard Metropolis-Hastings algorithm (dashed red line). Two bottom panels: close-up of the region of interest, $0 < \delta < 1$.

    Figure 4.  Optimal acceptance rate for the DA-MALA algorithm as a function of $\delta$. The red line marks the optimal acceptance rate for MALA obtained by [27], which is recovered at $\delta = 1$.

    Figure 5.  Comparison between geometric MALA (top panels) and geometric MALA with Delayed Acceptance (bottom panels): marginal chains for two arbitrary components (left), estimated marginal posterior density for an arbitrary component (centre), one-dimensional chain trace assessing mixing (right).

    Table 1.  Comparison between MH and MH with Delayed Acceptance on a logistic model, where ESS is the effective sample size, ESJD the expected squared jumping distance, and time the computation time

    Algorithm     | rel. ESS (av.) | rel. ESJD (av.) | rel. time (av.) | rel. gain (ESS) (av.) | rel. gain (ESJD) (av.)
    DA-MH over MH | 1.1066         | 12.962          | 0.098           | 5.47                  | 56.18

    Table 2.  Comparison between standard geometric MALA and geometric MALA with Delayed Acceptance, with ESS the effective sample size, ESJD the expected squared jumping distance, time the computation time, and $a$ the observed acceptance rate

    Algorithm | ESS (av.) | (sd)   | ESJD (av.) | (sd)    | time (av.) | (sd)   | a (av.) | ESS/time (av.) | ESJD/time (av.)
    MALA      | 7504.48   | 107.21 | 5244.94    | 983.47  | 176078     | 1562.3 | 0.661   | 0.04           | 0.03
    DA-MALA   | 6081.02   | 121.42 | 5373.253   | 2148.76 | 17342.91   | 6688.3 | 0.09    | 0.35           | 0.31

    Table 3.  Comparison using different performance indicators in the example of mixture estimation, based on 100 replications of the experiment under model (9), with a sample size $ n = 500 $, $ 10^5 $ MH simulations and $ 500 $ samples for the prior estimation ("ESS" is the effective sample size, "time" the computational time). The actual averaged gain ($ \frac{ESS_{DA}/ESS_{MH}}{time_{DA}/time_{MH}} $) is $ 9.58 $, higher than the "double average" of about $ 5 $ suggested by the table; see the check below the table

    Algorithm | ESS (av.) | (sd)   | ESJD (av.) | (sd) | time (av.) | (sd)
    MH        | 1575.96   | 245.96 | 0.226      | 0.44 | 513.95     | 57.81
    MH + DA   | 628.77    | 87.86  | 0.215      | 0.45 | 42.22      | 22.95
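As a consistency check on the caption of Table 3, the "double average" can be recomputed from the averaged ESS and time entries above:

$$ \frac{ESS_{DA}/ESS_{MH}}{time_{DA}/time_{MH}} = \frac{628.77/1575.96}{42.22/513.95} \approx \frac{0.399}{0.082} \approx 4.86, $$

in line with the quoted value of about $5$. Averaging the gain replication by replication instead yields the reported $9.58$: a ratio of averages and an average of ratios can differ substantially.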
  • [1] C. Andrieu, A. Lee and M. Vihola, Uniform ergodicity of the iterated conditional SMC and geometric ergodicity of particle Gibbs samplers, Bernoulli, 24 (2018), 842-872. doi: 10.3150/15-BEJ785.
    [2] E. Angelino, E. Kohler, A. Waterland, M. Seltzer and R. Adams, Accelerating MCMC via parallel predictive prefetching, UAI'14 Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, (2014), 22-31.
    [3] R. Bardenet, A. Doucet and C. Holmes, On Markov chain Monte Carlo methods for tall data, The Journal of Machine Learning Research, 18 (2017), Paper No. 47, 43 pp.
    [4] A. Brockwell, Parallel Markov chain Monte Carlo simulation by pre-fetching, J. Comput. Graphical Stat., 15 (2006), 246-261.  doi: 10.1198/106186006X100579.
    [5] J. Christen and C. Fox, Markov chain Monte Carlo using an approximation, Journal of Computational and Graphical Statistics, 14 (2005), 795-810.  doi: 10.1198/106186005X76983.
    [6] O. F. Christensen, G. O. Roberts and J. S. Rosenthal, Scaling limits for the transient phase of local Metropolis-Hastings algorithms, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67 (2005), 253-268. doi: 10.1111/j.1467-9868.2005.00500.x.
    [7] R. Cornish, P. Vanetti, A. Bouchard-Côté, G. Deligiannidis and A. Doucet, Scalable Metropolis-Hastings for exact Bayesian inference with large datasets, arXiv preprint, arXiv: 1901.09881.
    [8] L. Devroye, Nonuniform Random Variate Generation, Springer-Verlag, New York, 1986. doi: 10.1007/978-1-4613-8643-8.
    [9] J. Diebolt and C. P. Robert, Estimation of finite mixture distributions by Bayesian sampling, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 56 (1994), 363-375.  doi: 10.1111/j.2517-6161.1994.tb01985.x.
    [10] C. Fox and G. Nicholls, Sampling conductivity images via MCMC, The Art and Science of Bayesian Image Analysis, (1997), 91-100.
    [11] A. Gelfand and S. Sahu, On Markov chain Monte Carlo acceleration, J. Comput. Graph. Statist., 3 (1994), 261-276.  doi: 10.2307/1390911.
    [12] M. Girolami and B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73 (2011), 123-214.  doi: 10.1111/j.1467-9868.2010.00765.x.
    [13] A. Golightly, D. Henderson and C. Sherlock, Delayed acceptance particle MCMC for exact inference in stochastic kinetic models, Statistics and Computing, 25 (2015), 1039-1055. doi: 10.1007/s11222-014-9469-x.
    [14] H. Jeffreys, Theory of Probability, 1st ed., The Clarendon Press, Oxford, 1939.
    [15] A. Korattikara, Y. Chen and M. Welling, Austerity in MCMC land: Cutting the Metropolis-Hastings budget, In ICML 2014, International Conference on Machine Learning, (2014), 181-189.
    [16] G. McLachlan and D. Peel, Finite Mixture Models, John Wiley, New York, 2000. doi: 10.1002/0471721182.
    [17] K. L. Mengersen and R. Tweedie, Rates of convergence of the Hastings and Metropolis algorithms, Ann. Statist., 24 (1996), 101-121. doi: 10.1214/aos/1033066201.
    [18] P. Neal and G. O. Roberts, Optimal scaling of random walk Metropolis algorithms with non-Gaussian proposals, Methodology and Computing in Applied Probability, 13 (2011), 583-601.  doi: 10.1007/s11009-010-9176-9.
    [19] R. Neal, Markov chain Monte Carlo methods based on 'slicing' the density function, Tech. rep., University of Toronto, 1997.
    [20] W. Neiswanger, C. Wang and E. Xing, Asymptotically exact, embarrassingly parallel MCMC, arXiv preprint, 2013, arXiv: 1311.4780.
    [21] P. Peskun, Optimum Monte Carlo sampling using Markov chains, Biometrika, 60 (1973), 607-612.  doi: 10.1093/biomet/60.3.607.
    [22] M. Plummer, N. Best, K. Cowles and K. Vines, CODA: Convergence diagnosis and output analysis for MCMC, R News, 6 (2006), 7-11.
    [23] C. P. Robert, The Bayesian Choice, 2nd ed. Springer-Verlag, New York, 2001.
    [24] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, 2nd ed. Springer-Verlag, New York, 2004. doi: 10.1007/978-1-4757-4145-2.
    [25] C. P. Robert and D. M. Titterington, Reparameterisation strategies for hidden Markov models and Bayesian approaches to maximum likelihood estimation, Statistics and Computing, 8 (1998), 145-158.
    [26] G. O. Roberts, A. Gelman and W. R. Gilks, Weak convergence and optimal scaling of random walk Metropolis algorithms, Ann. Appl. Probab., 7 (1997), 110-120. doi: 10.1214/aoap/1034625254.
    [27] G. O. Roberts and J. S. Rosenthal, Optimal scaling for various Metropolis-Hastings algorithms, Statist. Science, 16 (2001), 351-367.  doi: 10.1214/ss/1015346320.
    [28] G. O. Roberts and J. S. Rosenthal, Coupling and ergodicity of adaptive MCMC, J. Applied Proba., 44 (2007), 458-475. doi: 10.1239/jap/1183667414.
    [29] G. O. Roberts and O. Stramer, Langevin diffusions and Metropolis-Hastings algorithms, Methodology and Computing in Applied Probability, 4 (2002), 337-357.  doi: 10.1023/A:1023562417138.
    [30] G. O. Roberts and R. Tweedie, Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms, Biometrika, 83 (1996), 95-110.  doi: 10.1093/biomet/83.1.95.
    [31] K. Roeder and L. Wasserman, Practical Bayesian density estimation using mixtures of Normals, J. American Statist. Assoc., 92 (1997), 894-902.  doi: 10.1080/01621459.1997.10474044.
    [32] S. Scott, A. Blocker, F. Bonassi, M. Chipman, E. George and R. McCulloch, Bayes and big data: The consensus Monte Carlo algorithm, EFaBBayes 250 Conference, 16 (2013).
    [33] C. Sherlock, A. Golightly and D. A. Henderson, Adaptive, delayed-acceptance MCMC for targets with expensive likelihoods, Journal of Computational and Graphical Statistics, 26 (2017), 434-444. doi: 10.1080/10618600.2016.1231064.
    [34] C. Sherlock and G. O. Roberts, Optimal scaling of the random walk Metropolis on elliptically symmetric unimodal targets, Bernoulli, 15 (2009), 774-798.  doi: 10.3150/08-BEJ176.
    [35] C. Sherlock, A. Thiery and A. Golightly, Efficiency of delayed-acceptance random walk Metropolis algorithms, arXiv preprint, 2015, arXiv: 1506.08155.
    [36] C. Sherlock, A. Thiery, G. O. Roberts and J. S. Rosenthal, On the efficiency of pseudo-marginal random walk Metropolis algorithms, The Annals of Statistics, 43 (2015), 238-275. doi: 10.1214/14-AOS1278.
    [37] A. Y. Shestopaloff and R. M. Neal, MCMC for non-linear state space models using ensembles of latent sequences, arXiv preprint, 2013, arXiv: 1305.0320.
    [38] M. Stephens, Bayesian Methods for Mixtures of Normal Distributions, Ph.D. thesis, University of Oxford, 1997.
    [39] I. Strid, Efficient parallelisation of Metropolis-Hastings algorithms using a prefetching approach, Computational Statistics & Data Analysis, 54 (2010), 2814-2835.  doi: 10.1016/j.csda.2009.11.019.
    [40] L. Tierney, A note on Metropolis-Hastings kernels for general state spaces, Ann. Appl. Probab., 8 (1998), 1-9.  doi: 10.1214/aoap/1027961031.
    [41] L. Tierney and A. Mira, Some adaptive Monte Carlo methods for Bayesian inference, Statistics in Medicine, 18 (1999), 2507-2515.
    [42] X. Wang and D. Dunson, Parallel MCMC via Weierstrass sampler, arXiv preprint, 2013, arXiv: 1312.4605.
