# American Institute of Mathematical Sciences

June 2019, 1(2): 103-128. doi: 10.3934/fods.2019005

## Accelerating Metropolis-Hastings algorithms by Delayed Acceptance

1. Department of Medical Statistics, London School of Hygiene and Tropical Medicine, Keppel St, Bloomsbury, London WC1E 7HT, UK
2. Dipartimento di Economia, Università degli Studi "Gabriele D'Annunzio", Viale Pindaro, 42, 65127 Pescara, Italy
3. School of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK
4. Department of Statistics, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK

* Corresponding author: Christian Robert

Published April 2019

MCMC algorithms such as Metropolis-Hastings algorithms are slowed down by the computation of complex target distributions, as exemplified by huge datasets. We offer a useful generalisation of the Delayed Acceptance approach, devised to reduce such computational costs by a simple and universal divide-and-conquer strategy. The generic acceleration stems from breaking the acceptance step into several parts, aiming at a major gain in computing time that outweighs the corresponding reduction in acceptance probability. Each component is sequentially compared with a uniform variate, the first rejection terminating the iteration. We develop theoretical bounds for the variance of associated estimators against the standard Metropolis-Hastings algorithm and produce results on optimal scaling and general optimisation of the procedure.
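
The sequential acceptance step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a symmetric Gaussian random-walk proposal and a target that factorises as $\pi(\theta) \propto \prod_k \rho_k(\theta)$, and the helper names `da_mh_step`, `run_da_mh` and the two-factor toy target are hypothetical. Each factor ratio $\rho_k(\theta')/\rho_k(\theta)$ is tested against its own uniform variate, and the first rejection ends the iteration, so the remaining factors are never evaluated. The overall acceptance probability is then $\prod_k \min\{1, \rho_k(\theta')/\rho_k(\theta)\}$, which never exceeds the standard Metropolis-Hastings acceptance probability.

```python
import math
import random


def da_mh_step(theta, log_factors, scale, rng):
    """One random-walk Metropolis-Hastings step with Delayed Acceptance.

    Assumes the unnormalised log-target splits as
    log pi(theta) = sum(f(theta) for f in log_factors).
    """
    proposal = theta + rng.gauss(0.0, scale)  # symmetric Gaussian proposal
    for log_f in log_factors:
        log_ratio = log_f(proposal) - log_f(theta)
        # This stage accepts with probability min(1, exp(log_ratio)),
        # using its own uniform variate; 1 - random() lies in (0, 1].
        if log_ratio < 0.0 and math.log(1.0 - rng.random()) >= log_ratio:
            return theta, False  # first rejection: skip the remaining factors
    return proposal, True  # every stage accepted


def run_da_mh(log_factors, theta0, scale, n_iter, seed=0):
    """Run the chain; return its states and the overall acceptance rate."""
    rng = random.Random(seed)
    theta, chain, n_acc = theta0, [], 0
    for _ in range(n_iter):
        theta, accepted = da_mh_step(theta, log_factors, scale, rng)
        n_acc += accepted
        chain.append(theta)
    return chain, n_acc / n_iter


# Hypothetical toy target: two N(+1, 2) and N(-1, 2) factors whose
# product is proportional to the N(0, 1) density.
def log_f1(t):
    return -(t - 1.0) ** 2 / 4.0


def log_f2(t):
    return -(t + 1.0) ** 2 / 4.0


chain, acc_rate = run_da_mh([log_f1, log_f2], theta0=0.0, scale=2.4, n_iter=50_000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
```

Placing computationally cheap factors first is the natural ordering here, since an early rejection then avoids evaluating the expensive ones.
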

Citation: Marco Banterle, Clara Grazian, Anthony Lee, Christian P. Robert. Accelerating Metropolis-Hastings algorithms by Delayed Acceptance. Foundations of Data Science, 2019, 1 (2) : 103-128. doi: 10.3934/fods.2019005

Fit of a two-step Metropolis-Hastings algorithm applied to a normal-normal posterior distribution $\mu\mid x\sim\mathcal{N}\left(x/\{1+\sigma_\mu^{-2}\},\ 1/\{1+\sigma_\mu^{-2}\}\right)$ when $x = 3$ and $\sigma_\mu = 10$, based on $T = 10^5$ iterations, with a first acceptance step based on the likelihood ratio and a second acceptance step based on the prior ratio, resulting in an overall acceptance rate of 12%.

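
The two-step scheme in this caption can be sketched on the same model, assuming (as the stated posterior implies) $x\mid\mu\sim\mathcal{N}(\mu,1)$ and $\mu\sim\mathcal{N}(0,\sigma_\mu^2)$. The random-walk scale of 1 and the starting point are hypothetical choices, so the acceptance rate of this sketch need not match the 12% reported in the caption; the check is only that the chain targets the exact posterior mean and variance.

```python
import math
import random

x, sigma_mu = 3.0, 10.0
T = 100_000


def log_lik(mu):
    return -0.5 * (x - mu) ** 2  # assumed model: x | mu ~ N(mu, 1)


def log_prior(mu):
    return -0.5 * mu ** 2 / sigma_mu ** 2  # assumed prior: mu ~ N(0, sigma_mu^2)


rng = random.Random(1)
mu, chain, n_acc = x, [], 0  # start the chain at the observation
for _ in range(T):
    prop = mu + rng.gauss(0.0, 1.0)  # hypothetical random-walk scale
    accepted = True
    for log_f in (log_lik, log_prior):  # step 1: likelihood, step 2: prior
        log_ratio = log_f(prop) - log_f(mu)
        if log_ratio < 0.0 and math.log(1.0 - rng.random()) >= log_ratio:
            accepted = False  # first rejection ends the iteration
            break
    if accepted:
        mu, n_acc = prop, n_acc + 1
    chain.append(mu)

acc_rate = n_acc / T
post_mean = x / (1.0 + sigma_mu ** -2)  # exact posterior mean
post_var = 1.0 / (1.0 + sigma_mu ** -2)  # exact posterior variance
chain_mean = sum(chain) / T
chain_var = sum((m - chain_mean) ** 2 for m in chain) / T
```
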
(left) Fit of a multiple-step Metropolis-Hastings algorithm applied to a Beta-binomial posterior distribution $p\mid x\sim Be(x+a, N+b-x)$ when $N = 100$, $x = 32$, $a = 7.5$ and $b = 0.5$. The binomial $\mathcal{B}(N, p)$ likelihood is replaced with a product of $100$ Bernoulli terms, and an acceptance step is considered for the ratio of each term. The histogram is based on $10^5$ iterations, with an overall acceptance rate of 9%; (centre) raw sequence of successive values of $p$ in the Markov chain simulated in this experiment; (right) autocorrelogram of this sequence.

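
A sketch of this multiple-step version, under assumptions not stated in the caption: a Gaussian random walk on $p$ with a hypothetical scale of 0.02, proposals outside $(0,1)$ rejected outright, one acceptance stage per Bernoulli term (in data order) followed by one stage for the $Be(a,b)$ prior. Because every Bernoulli term here is either $p$ or $1-p$, each stage's log-ratio is one of two precomputed values; in a real application each stage would evaluate its own factor. The run is shorter than the caption's $10^5$ iterations to keep the sketch fast.

```python
import math
import random

N, x, a, b = 100, 32, 7.5, 0.5
data = [1] * x + [0] * (N - x)  # 100 Bernoulli observations, 32 successes


def log_prior(p):
    """Log Be(a, b) density, up to an additive constant."""
    return (a - 1.0) * math.log(p) + (b - 1.0) * math.log(1.0 - p)


def stage_rejects(log_ratio, rng):
    """True if a stage with ratio exp(log_ratio) rejects against a fresh uniform."""
    return log_ratio < 0.0 and math.log(1.0 - rng.random()) >= log_ratio


def multi_step_da(p, rng, scale):
    prop = p + rng.gauss(0.0, scale)
    if not 0.0 < prop < 1.0:
        return p, False  # the posterior is zero outside (0, 1)
    # Log-ratios of the two possible Bernoulli terms, p and 1 - p.
    lr_success = math.log(prop) - math.log(p)
    lr_failure = math.log(1.0 - prop) - math.log(1.0 - p)
    for y in data:  # one acceptance stage per Bernoulli term
        if stage_rejects(lr_success if y == 1 else lr_failure, rng):
            return p, False  # first rejection ends the iteration
    if stage_rejects(log_prior(prop) - log_prior(p), rng):  # final prior stage
        return p, False
    return prop, True


rng = random.Random(2)
p, chain, n_acc = 0.5, [], 0
for _ in range(20_000):
    p, accepted = multi_step_da(p, rng, scale=0.02)
    n_acc += accepted
    chain.append(p)

acc_rate = n_acc / len(chain)
post_mean = (x + a) / (N + a + b)  # exact mean of Be(x + a, N + b - x)
chain_mean = sum(chain) / len(chain)
```
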
Two top panels: behaviour of $\ell^*(\delta)$ and $\alpha^*(\delta)$ as the relative cost $\delta$ varies. Note that for $\delta \gg 1$ the optimal values converge towards the values computed for the standard Metropolis-Hastings algorithm (dashed in red). Two bottom panels: close-up of the region of interest, $0 < \delta < 1$.

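
For $\delta \gg 1$ the curves approach the standard Metropolis-Hastings values. As a hedged reference point, assuming the classical diffusion-limit setting for random-walk Metropolis (a product of $d$ i.i.d. components, proposal variance $\ell^2/d$ and Fisher information $I = 1$), the efficiency is proportional to $\ell^2\alpha(\ell)$ with $\alpha(\ell) = 2\Phi(-\ell\sqrt{I}/2)$, and a grid search should recover the well-known $\ell^* \approx 2.38$ and $\alpha^* \approx 0.234$. The DA-specific curves $\ell^*(\delta)$ and $\alpha^*(\delta)$ are not reproduced by this sketch.

```python
import math


def rwm_speed(ell):
    """Efficiency ell^2 * alpha(ell), with alpha(ell) = 2 * Phi(-ell / 2) (I = 1)."""
    alpha = math.erfc(ell / (2.0 * math.sqrt(2.0)))  # 2 * Phi(-z) = erfc(z / sqrt(2))
    return ell ** 2 * alpha, alpha


grid = [i / 10_000 for i in range(1, 60_001)]  # ell in (0, 6]
ell_star = max(grid, key=lambda ell: rwm_speed(ell)[0])
alpha_star = rwm_speed(ell_star)[1]
print(f"ell* = {ell_star:.2f}, alpha* = {alpha_star:.3f}")
```
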
Figure 4. Optimal acceptance rate for the DA-MALA algorithm as a function of $\delta$. The optimal acceptance rate for MALA obtained by [27] (in red) is met for $\delta = 1$.

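
The red reference level is the optimal MALA acceptance rate. As a sketch, assuming the classical diffusion-limit form for MALA, where the limiting acceptance rate is $a(\ell) = 2\Phi(-K\ell^3)$ and efficiency is proportional to $\ell^2 a(\ell)$ for a target-dependent constant $K > 0$, a grid search recovers the familiar value of about 0.574. The choice $K = 1$ is arbitrary; the optimal acceptance rate does not depend on it, which the check with $K = 2$ illustrates.

```python
import math


def mala_acceptance(ell, k=1.0):
    """Limiting MALA acceptance rate 2 * Phi(-k * ell^3)."""
    return math.erfc(k * ell ** 3 / math.sqrt(2.0))


def mala_efficiency(ell, k=1.0):
    """Efficiency proxy ell^2 * a(ell) in the diffusion limit."""
    return ell ** 2 * mala_acceptance(ell, k)


grid = [i / 10_000 for i in range(1, 30_001)]  # ell in (0, 3]
ell_star = max(grid, key=mala_efficiency)
a_star = mala_acceptance(ell_star)

# Same optimum acceptance rate for a different (arbitrary) constant K = 2.
ell_star_k2 = max(grid, key=lambda ell: mala_efficiency(ell, 2.0))
a_star_k2 = mala_acceptance(ell_star_k2, 2.0)
```
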
Comparison between geometric MALA (top panels) and geometric MALA with Delayed Acceptance (bottom panels): marginal chains for two arbitrary components (left), estimated marginal posterior density for an arbitrary component (middle) and one-dimensional chain trace assessing mixing (right).

Comparison between MH and MH with Delayed Acceptance on a logistic model. ESS is the effective sample size, ESJD the expected squared jumping distance and time the computation time.

| Algorithm | rel. ESS (av.) | rel. ESJD (av.) | rel. time (av.) | rel. gain, ESS (av.) | rel. gain, ESJD (av.) |
|---|---|---|---|---|---|
| DA-MH over MH | 1.1066 | 12.962 | 0.098 | 5.47 | 56.18 |

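
The tables report ESS and ESJD. As a hedged sketch of how such indicators can be estimated from a scalar chain (the estimators used in the paper are not specified here), ESJD is taken as the mean squared difference between successive states, and ESS as $n/(1 + 2\sum_k \hat\rho_k)$ with the sum truncated at the first non-positive empirical autocorrelation, one simple truncation rule among several. The functions are checked on a hypothetical AR(1) chain with $\phi = 0.5$ and unit stationary variance, for which ESS is $n(1-\phi)/(1+\phi) = n/3$ and ESJD is $2(1-\phi) = 1$.

```python
import random


def esjd(chain):
    """Expected squared jumping distance: mean of (x_{t+1} - x_t)^2."""
    return sum((b - a) ** 2 for a, b in zip(chain, chain[1:])) / (len(chain) - 1)


def ess(chain, max_lag=1000):
    """n / (1 + 2 * sum_k rho_k), truncated at the first non-positive rho_k."""
    n = len(chain)
    mean = sum(chain) / n
    centred = [c - mean for c in chain]
    var = sum(c * c for c in centred) / n
    rho_sum = 0.0
    for k in range(1, min(max_lag, n - 1) + 1):
        rho = sum(centred[t] * centred[t + k] for t in range(n - k)) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Hypothetical test chain: stationary AR(1) with phi = 0.5 and unit variance.
rng = random.Random(3)
phi, n = 0.5, 20_000
x, chain = 0.0, []
for _ in range(n):
    x = phi * x + rng.gauss(0.0, (1.0 - phi ** 2) ** 0.5)
    chain.append(x)
```
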
Comparison between standard geometric MALA and geometric MALA with Delayed Acceptance, with ESS the effective sample size, ESJD the expected squared jumping distance, time the computation time and a the observed acceptance rate.

| Algorithm | ESS (av.) | ESS (sd) | ESJD (av.) | ESJD (sd) | time (av.) | time (sd) | a (av.) | ESS/time (av.) | ESJD/time (av.) |
|---|---|---|---|---|---|---|---|---|---|
| MALA | 7504.48 | 107.21 | 5244.94 | 983.47 | 176078 | 1562.3 | 0.661 | 0.04 | 0.03 |
| DA-MALA | 6081.02 | 121.42 | 5373.253 | 2148.76 | 17342.91 | 6688.3 | 0.09 | 0.35 | 0.31 |

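
The last two columns can be reproduced, to the two decimals shown, as ratios of the averaged ESS and ESJD to the averaged time; whether they were computed this way or as averages of per-run ratios is not stated. The quick check below also gives the implied ESS-per-time gain of DA-MALA over MALA, about 8.2.

```python
# Averaged values copied from the geometric MALA table.
mala = {"ess": 7504.48, "esjd": 5244.94, "time": 176078.0}
da_mala = {"ess": 6081.02, "esjd": 5373.253, "time": 17342.91}


def per_time(row):
    """Return (ESS / time, ESJD / time) for one table row."""
    return row["ess"] / row["time"], row["esjd"] / row["time"]


mala_ess_t, mala_esjd_t = per_time(mala)
da_ess_t, da_esjd_t = per_time(da_mala)
gain_ess = da_ess_t / mala_ess_t  # ESS-per-time gain of DA-MALA over MALA
print(f"{mala_ess_t:.2f} {mala_esjd_t:.2f} {da_ess_t:.2f} {da_esjd_t:.2f} {gain_ess:.2f}")
```
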
Comparison using different performance indicators in the mixture estimation example, based on 100 replicas of the experiment according to model (9), with sample size $n = 500$, $10^5$ MH simulations and $500$ samples for the prior estimation ("ESS" is the effective sample size, "ESJD" the expected squared jumping distance and "time" the computational time). The actual averaged gain, $\frac{ESS_{DA}/ESS_{MH}}{time_{DA}/time_{MH}}$, is $9.58$, higher than the "double average" of around $5$ suggested by the table.

| Algorithm | ESS (av.) | ESS (sd) | ESJD (av.) | ESJD (sd) | time (av.) | time (sd) |
|---|---|---|---|---|---|---|
| MH | 1575.96 | 245.96 | 0.226 | 0.44 | 513.95 | 57.81 |
| MH + DA | 628.77 | 87.86 | 0.215 | 0.45 | 42.22 | 22.95 |

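
The "double average" in the caption is the gain computed from the averaged columns rather than replica by replica. With the table's values it comes to about 4.86. The reported $9.58$ is described as the actual averaged gain, which presumably averages the ratio over the 100 replicas and therefore cannot be recovered from these summaries alone.

```python
# Averaged values copied from the mixture-estimation table (100 replicas).
ess_mh, ess_da = 1575.96, 628.77
time_mh, time_da = 513.95, 42.22

# Ratio of averaged ESS divided by ratio of averaged time.
double_average_gain = (ess_da / ess_mh) / (time_da / time_mh)
print(f"double-average gain: {double_average_gain:.2f}")
```
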