# American Institute of Mathematical Sciences

June 2019, 1(2): 103-128. doi: 10.3934/fods.2019005

## Accelerating Metropolis-Hastings algorithms by Delayed Acceptance

1 Department of Medical Statistics, London School of Hygiene and Tropical Medicine, Keppel St, Bloomsbury, London WC1E 7HT, UK

2 Dipartimento di Economia, Università degli Studi "Gabriele D'Annunzio", Viale Pindaro, 42, 65127 Pescara, Italy

3 School of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK

4 Department of Statistics, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK

* Corresponding author: Christian Robert

Published April 2019

MCMC algorithms such as Metropolis-Hastings algorithms are slowed down by the computation of complex target distributions as exemplified by huge datasets. We offer a useful generalisation of the Delayed Acceptance approach, devised to reduce such computational costs by a simple and universal divide-and-conquer strategy. The generic acceleration stems from breaking the acceptance step into several parts, aiming at a major gain in computing time that out-ranks a corresponding reduction in acceptance probability. Each component is sequentially compared with a uniform variate, the first rejection terminating this iteration. We develop theoretical bounds for the variance of associated estimators against the standard Metropolis-Hastings and produce results on optimal scaling and general optimisation of the procedure.
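The sequential acceptance test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `delayed_acceptance_mh`, the Gaussian random-walk proposal, its scale of 2.0, the seed and the burn-in length are all assumptions made for illustration. The toy target is the normal-normal posterior from the first figure caption ($x = 3$, $\sigma_\mu = 10$), with the likelihood ratio tested first and the prior ratio second.

```python
import math
import random


def delayed_acceptance_mh(log_factors, theta0, n_iter, step, rng):
    """Random-walk Metropolis-Hastings with a delayed-acceptance test.

    log_factors is a list of functions whose sum is the log target
    (up to a constant). Each factor ratio is compared with its own
    uniform variate; the first rejection ends the iteration, so later
    (possibly costlier) factors are not evaluated at the proposal.
    """
    theta = theta0
    current = [f(theta) for f in log_factors]  # cached factor values
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal (assumed)
        proposed, ok = [], True
        for k, f in enumerate(log_factors):
            value = f(proposal)
            proposed.append(value)
            # Stage k is passed with probability min(1, factor ratio).
            log_u = math.log(1.0 - rng.random())  # uniform on (0, 1]
            if log_u > value - current[k]:
                ok = False
                break
        if ok:
            theta, current = proposal, proposed
            accepted += 1
        chain.append(theta)
    return chain, accepted / n_iter


# Toy setting borrowed from the first figure caption: x = 3, sigma_mu = 10.
x, sigma_mu = 3.0, 10.0


def log_lik(mu):
    return -0.5 * (x - mu) ** 2  # N(mu, 1) likelihood, first stage


def log_prior(mu):
    return -0.5 * (mu / sigma_mu) ** 2  # N(0, sigma_mu^2) prior, second stage


rng = random.Random(1)
chain, rate = delayed_acceptance_mh([log_lik, log_prior], 0.0, 20000, 2.0, rng)
kept = chain[1000:]  # discard an arbitrary burn-in
sample_mean = sum(kept) / len(kept)
post_mean = x / (1.0 + sigma_mu ** -2)  # exact posterior mean
print(f"acceptance rate {rate:.3f}, mean {sample_mean:.3f} vs {post_mean:.3f}")
```

Because the product of the stage-wise ratios equals the usual Metropolis-Hastings ratio, the chain still targets the posterior; the price is an acceptance probability no larger than that of the standard algorithm, which the abstract trades against cheaper rejections.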

Citation: Marco Banterle, Clara Grazian, Anthony Lee, Christian P. Robert. Accelerating Metropolis-Hastings algorithms by Delayed Acceptance. Foundations of Data Science, 2019, 1(2): 103-128. doi: 10.3934/fods.2019005
##### Figures and Tables:

Fit of a two-step Metropolis-Hastings algorithm applied to a normal-normal posterior distribution $\mu|x\sim N(x/\{1+\sigma_\mu^{-2}\}, 1/\{1+\sigma_\mu^{-2}\})$ when $x = 3$ and $\sigma_\mu = 10$, based on $T = 10^5$ iterations and a first acceptance step considering the likelihood ratio and a second acceptance step considering the prior ratio, resulting in an overall acceptance rate of 12%.
(left) Fit of a multiple-step Metropolis-Hastings algorithm applied to a Beta-binomial posterior distribution $p|x\sim Be(x+a, N+b-x)$ when $N = 100$, $x = 32$, $a = 7.5$ and $b = .5$. The binomial $\mathcal{B}(N, p)$ likelihood is replaced with a product of $100$ Bernoulli terms and an acceptance step is considered for the ratio of each term. The histogram is based on $10^5$ iterations, with an overall acceptance rate of 9%; (centre) raw sequence of successive values of $p$ in the Markov chain simulated in the above experiment; (right) autocorrelogram of the above sequence.
Two top panels: behaviour of $\ell^*(\delta)$ and $\alpha^*(\delta)$ as the relative cost varies. Note that for $\delta \gg 1$ the optimal values converge towards the values computed for the standard Metropolis-Hastings (dashed in red). Two bottom panels: close-up of the interesting region for $0 < \delta < 1$.
Optimal acceptance rate for the DA-MALA algorithm as a function of $\delta$. In red, the optimal acceptance rate for MALA obtained by [27] is met for $\delta = 1$.
Comparison between geometric MALA (top panels) and geometric MALA with Delayed Acceptance (bottom panels): marginal chains for two arbitrary components (left), estimated marginal posterior density for an arbitrary component (middle), 1D chain trace evaluating mixing (right).
Comparison between MH and MH with Delayed Acceptance on a logistic model. ESS is the effective sample size, ESJD the expected square jumping distance, time is the computation time.

| Algorithm | rel. ESS (av.) | rel. ESJD (av.) | rel. time (av.) | rel. gain (ESS) (av.) | rel. gain (ESJD) (av.) |
|---|---|---|---|---|---|
| DA-MH over MH | 1.1066 | 12.962 | 0.098 | 5.47 | 56.18 |

Comparison between standard geometric MALA and geometric MALA with Delayed Acceptance, with ESS the effective sample size, ESJD the expected square jumping distance, time the computation time and a the observed acceptance rate.

| Algorithm | ESS (av.) | ESS (sd) | ESJD (av.) | ESJD (sd) | time (av.) | time (sd) | a (av.) | ESS/time (av.) | ESJD/time (av.) |
|---|---|---|---|---|---|---|---|---|---|
| MALA | 7504.48 | 107.21 | 5244.94 | 983.47 | 176078 | 1562.3 | 0.661 | 0.04 | 0.03 |
| DA-MALA | 6081.02 | 121.42 | 5373.253 | 2148.76 | 17342.91 | 6688.3 | 0.09 | 0.35 | 0.31 |

Comparison using different performance indicators in the example of mixture estimation, based on 100 replicas of the experiments according to model (9) with a sample size $n = 500$, $10^5$ MH simulations and $500$ samples for the prior estimation ("ESS" is the effective sample size, "time" is the computational time). The actual averaged gain ($\frac{\mathrm{ESS}_{DA}/\mathrm{ESS}_{MH}}{\mathrm{time}_{DA}/\mathrm{time}_{MH}}$) is $9.58$, higher than the "double average" that the table suggests as being around $5$.

| Algorithm | ESS (av.) | ESS (sd) | ESJD (av.) | ESJD (sd) | time (av.) | time (sd) |
|---|---|---|---|---|---|---|
| MH | 1575.96 | 245.96 | 0.226 | 0.44 | 513.95 | 57.81 |
| MH + DA | 628.77 | 87.86 | 0.215 | 0.45 | 42.22 | 22.95 |

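The "double average" of around $5$ mentioned in the caption can be recovered from the averaged ESS and time columns of the mixture-estimation table. The short check below is only an arithmetic illustration; the variable and function names are hypothetical.

```python
# Averages read from the mixture-estimation table (MH and MH + DA rows).
ess_mh, time_mh = 1575.96, 513.95
ess_da, time_da = 628.77, 42.22


def ratio_of_averages(ess_da, ess_mh, time_da, time_mh):
    """Relative ESS divided by relative time, computed from averaged values."""
    return (ess_da / ess_mh) / (time_da / time_mh)


double_average_gain = ratio_of_averages(ess_da, ess_mh, time_da, time_mh)
print(round(double_average_gain, 2))  # about 4.86, i.e. "around 5"
```

The reported gain of $9.58$ is instead an average of per-replica ratios, and an average of ratios generally differs from a ratio of averages, which is why the two figures do not coincide.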