# American Institute of Mathematical Sciences

June 2019, 1(2): 103-128. doi: 10.3934/fods.2019005

## Accelerating Metropolis-Hastings algorithms by Delayed Acceptance

1. Department of Medical Statistics, London School of Hygiene and Tropical Medicine, Keppel St, Bloomsbury, London WC1E 7HT, UK
2. Dipartimento di Economia, Università degli Studi "Gabriele D'Annunzio", Viale Pindaro, 42, 65127 Pescara, Italy
3. School of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK
4. Department of Statistics, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK

* Corresponding author: Christian Robert

Published April 2019

MCMC algorithms such as Metropolis--Hastings are slowed down by the computation of complex target distributions, as exemplified by huge datasets. We offer a useful generalisation of the Delayed Acceptance approach, devised to reduce such computational costs by a simple and universal divide-and-conquer strategy. The generic acceleration stems from breaking the acceptance step into several parts, aiming at a gain in computing time that outranks the corresponding reduction in acceptance probability. Each component is sequentially compared with a uniform variate, the first rejection terminating the iteration. We develop theoretical bounds on the variance of the associated estimators relative to the standard Metropolis--Hastings algorithm and produce results on optimal scaling and on general optimisation of the procedure.
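As a minimal illustration of the mechanism described in the abstract, the sketch below (an assumption-laden toy, not the paper's implementation) runs Metropolis--Hastings with Delayed Acceptance: the log-target is supplied as a list of factors summing to $\log \pi$, each factor's ratio is compared with an independent uniform variate, and the first rejection ends the iteration.

```python
import numpy as np

def delayed_acceptance_mh(log_factors, proposal, x0, n_iter, rng=None):
    """Delayed-Acceptance Metropolis-Hastings with a symmetric proposal.

    log_factors: list of functions f_k with sum_k f_k(x) = log pi(x);
                 cheap factors should come first to maximise the saving.
    proposal:    function (x, rng) -> proposed state x'.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    chain = [x]
    for _ in range(n_iter):
        y = proposal(x, rng)
        accept = True
        # Compare each factor ratio with an independent uniform;
        # the first rejection terminates the iteration early, so the
        # remaining (expensive) factors are never evaluated.
        for f in log_factors:
            if np.log(rng.uniform()) >= f(y) - f(x):
                accept = False
                break
        if accept:
            x = y
        chain.append(x)
    return np.array(chain)
```

The overall acceptance probability is $\prod_k \min\{1, \pi_k(y)/\pi_k(x)\}$, which is smaller than the usual Metropolis--Hastings probability but still satisfies detailed balance, so the target is preserved while early rejections skip the costly factors.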

Citation: Marco Banterle, Clara Grazian, Anthony Lee, Christian P. Robert. Accelerating Metropolis-Hastings algorithms by Delayed Acceptance. Foundations of Data Science, 2019, 1 (2) : 103-128. doi: 10.3934/fods.2019005
Fit of a two-step Metropolis-Hastings algorithm applied to a normal-normal posterior distribution $\mu|x\sim N(x/\{1+\sigma_\mu^{-2}\}, 1/\{1+\sigma_\mu^{-2}\})$ when $x = 3$ and $\sigma_\mu = 10$, based on $T = 10^5$ iterations, with a first acceptance step considering the likelihood ratio and a second acceptance step considering the prior ratio, resulting in an overall acceptance rate of 12%
(left) Fit of a multiple-step Metropolis-Hastings algorithm applied to a Beta-binomial posterior distribution $p|x\sim Be(x+a, N-x+b)$ when $N = 100$, $x = 32$, $a = 7.5$ and $b = 0.5$. The binomial $\mathcal{B}(N, p)$ likelihood is replaced with a product of $100$ Bernoulli terms and an acceptance step is considered for the ratio of each term. The histogram is based on $10^5$ iterations, with an overall acceptance rate of 9%; (centre) raw sequence of successive values of $p$ in the Markov chain simulated in the above experiment; (right) autocorrelogram of the above sequence
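The factorised acceptance in the Beta-binomial experiment above can be sketched as follows. The data reconstruction ($32$ successes among $100$ Bernoulli draws), the random-walk scale, and the chain length are illustrative assumptions; only the values $N$, $x$, $a$, $b$ come from the caption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Caption's setup: N = 100 Bernoulli observations with x = 32 successes,
# Beta(a, b) prior with a = 7.5, b = 0.5.
N, x, a, b = 100, 32, 7.5, 0.5
data = np.array([1] * x + [0] * (N - x))

# One log-factor per Bernoulli observation, plus one for the prior.
def bern_factor(y):
    return lambda p: y * np.log(p) + (1 - y) * np.log(1 - p)

factors = [bern_factor(y) for y in data]
factors.append(lambda p: (a - 1) * np.log(p) + (b - 1) * np.log(1 - p))

p, chain = 0.5, []
for _ in range(40_000):
    q = p + rng.normal(0.0, 0.1)     # symmetric random walk on (0, 1)
    if 0.0 < q < 1.0:                # out-of-range proposals auto-reject
        # sequential acceptance: the first factor rejection (short-circuit
        # in all()) terminates the iteration before later terms are checked
        if all(np.log(rng.uniform()) < f(q) - f(p) for f in factors):
            p = q
    chain.append(p)
```

The chain targets the exact posterior $Be(x+a, N-x+b)$; with $101$ sequential factor checks the acceptance rate drops markedly (the caption reports 9%), which is the trade-off the paper's optimisation results address.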
Two top panels: behaviour of $\ell^*(\delta)$ and $\alpha^*(\delta)$ as the relative cost varies. Note that for $\delta \gg 1$ the optimal values converge towards those computed for the standard Metropolis--Hastings algorithm (dashed red line). Two bottom panels: close-up of the region of interest, $0 < \delta < 1$.
Optimal acceptance rate for the DA-MALA algorithm as a function of $\delta$. In red: the optimal acceptance rate for MALA obtained by [27], recovered at $\delta = 1$.
Comparison between geometric MALA (top panels) and geometric MALA with Delayed Acceptance (bottom panels): marginal chains for two arbitrary components (left), estimated marginal posterior density for an arbitrary component (middle), 1D chain trace evaluating mixing (right).
Comparison between MH and MH with Delayed Acceptance on a logistic model. ESS is the effective sample size, ESJD the expected square jumping distance, time is the computation time
| Algorithm | rel. ESS (av.) | rel. ESJD (av.) | rel. Time (av.) | rel. gain (ESS) (av.) | rel. gain (ESJD) (av.) |
|---|---|---|---|---|---|
| DA-MH over MH | 1.1066 | 12.962 | 0.098 | 5.47 | 56.18 |
Comparison between standard geometric MALA and geometric MALA with Delayed Acceptance, with ESS the effective sample size, ESJD the expected square jumping distance, time the computation time and a the observed acceptance rate
| Algorithm | ESS (av.) (sd) | ESJD (av.) (sd) | time (av.) (sd) | a (av.) | ESS/time (av.) | ESJD/time (av.) |
|---|---|---|---|---|---|---|
| MALA | 7504.48 (107.21) | 5244.94 (983.47) | 176078 (1562.3) | 0.661 | 0.04 | 0.03 |
| DA-MALA | 6081.02 (121.42) | 5373.253 (2148.76) | 17342.91 (6688.3) | 0.09 | 0.35 | 0.31 |
Comparison using different performance indicators in the example of mixture estimation, based on 100 replicates of the experiment according to model (9) with a sample size $n = 500$, $10^5$ MH simulations and $500$ samples for the prior estimation ("ESS" is the effective sample size, "time" is the computational time). The actual averaged gain ($\frac{ESS_{DA}/ESS_{MH}}{time_{DA}/time_{MH}}$) is $9.58$, higher than the "double average" of around $5$ suggested by the table
| Algorithm | ESS (av.) (sd) | ESJD (av.) (sd) | time (av.) (sd) |
|---|---|---|---|
| MH | 1575.96 (245.96) | 0.226 (0.44) | 513.95 (57.81) |
| MH + DA | 628.77 (87.86) | 0.215 (0.45) | 42.22 (22.95) |
