Method | General | Special cases of | |
MALA |
|
| |
SMMALA |
|
| |
MMALA |
|
| |
AM |
|
| |
Manifold Markov chain Monte Carlo algorithms have been introduced to sample more effectively from challenging target densities exhibiting multiple modes or strong correlations. Such algorithms exploit the local geometry of the parameter space, thus enabling chains to achieve a faster convergence rate when measured in number of steps. However, acquiring local geometric information can often increase computational complexity per step to the extent that sampling from high-dimensional targets becomes inefficient in terms of total computational time. This paper analyzes the computational complexity of manifold Langevin Monte Carlo and proposes a geometric adaptive Monte Carlo sampler aimed at balancing the benefits of exploiting local geometry with computational cost to achieve a high effective sample size for a given computational cost. The suggested sampler is a discrete-time stochastic process in random environment. The random environment allows to switch between local geometric and adaptive proposal kernels with the help of a schedule. An exponential schedule is put forward that enables more frequent use of geometric information in early transient phases of the chain, while saving computational time in late stationary phases. The average complexity can be manually set depending on the need for geometric exploitation posed by the underlying model.
Citation: |
Figure 1. Overlaid running means as a function of Monte Carlo iteration and overlaid linear autocorrelations of single chains corresponding to one of the twenty, six and eleven parameters of the respective t-distribution, one-planet and two-planet system. The black horizontal line in the t-distribution running mean plot represents the true mode
Figure 2. Trace plots of single chains as a function of Monte Carlo iteration corresponding to one of the twenty and eleven parameters of the respective t-distribution and two-planet system. The same chains were used for generating the trace plots of figure 2 and the associated running means and autocorrelations of figure 1. The black horizontal lines in the t-distribution trace plots represent the true mode
Table 1.
General complexity bounds per step of MALA, SMMALA, MMALA and AM samplers, and two special cases of a log-target
Method | General | Special cases of | |
MALA |
|
| |
SMMALA |
|
| |
MMALA |
|
| |
AM |
|
| |
Table 2. Comparison of sampling efficacy between MALA, AM, SMMALA and GAMC for the t-distribution, one-planet and two-planet system. AR: acceptance rate; ESS: effective sample size; t: CPU runtime in seconds; ESS/t: smaller ESS across model parameters divided by runtime; Speed: ratio of ESS/t for MALA over ESS/t for each other sampler. All tabulated numbers have been rounded to the second decimal place, apart from effective sample sizes, which have been rounded to the nearest integer. The minimum, mean, median and maximum ESS across the effective sample sizes of the twenty, six and eleven parameters (associated with the respective t-distribution, one-planet and two-planet system) are displayed
Student’s t-distribution | ||||||||
Method | AR | ESS | t | ESS/t | Speed | |||
min | mean | median | max | |||||
MALA | 0.59 | 135 | 159 | 145 | 234 | 9.33 | 14.52 | 1.00 |
AM | 0.03 | 85 | 118 | 117 | 155 | 17.01 | 5.03 | 0.35 |
SMMALA | 0.71 | 74 | 87 | 86 | 96 | 143.63 | 0.52 | 0.04 |
GAMC | 0.26 | 1471 | 1558 | 1560 | 1629 | 31.81 | 46.23 | 3.18 |
One-planet system | ||||||||
Method | AR | ESS | t | ESS/t | Speed | |||
min | mean | median | max | |||||
MALA | 0.55 | 4 | 76 | 18 | 394 | 57.03 | 0.07 | 1.00 |
AM | 0.08 | 1230 | 1397 | 1279 | 2035 | 48.84 | 25.18 | 378.50 |
SMMALA | 0.71 | 464 | 597 | 646 | 658 | 208.46 | 2.23 | 33.45 |
GAMC | 0.30 | 1260 | 2113 | 2151 | 3032 | 76.80 | 16.41 | 246.59 |
Two-planet system | ||||||||
Method | AR | ESS | t | ESS/t | Speed | |||
min | mean | median | max | |||||
MALA | 0.59 | 5 | 52 | 10 | 377 | 219.31 | 0.02 | 1.00 |
AM | 0.01 | 18 | 84 | 82 | 248 | 81.24 | 0.22 | 9.05 |
SMMALA | 0.70 | 53 | 104 | 100 | 161 | 1606.92 | 0.03 | 1.37 |
GAMC | 0.32 | 210 | 561 | 486 | 1110 | 328.08 | 0.64 | 26.39 |
[1] | C. Andrieu and É. Moulines, On the ergodicity properties of some adaptive MCMC algorithms, Ann. Appl. Probab., 16 (2006), 1462-1505. doi: 10.1214/105051606000000286. |
[2] | Y. Bai, G. O. Roberts and J. S. Rosenthal, On the containment condition for adaptive Markov chain Monte Carlo algorithms, Adv. Appl. Stat., 21 (2011), 1-54. |
[3] | M. Betancourt, A general metric for Riemannian manifold Hamiltonian Monte Carlo, in Geometric Science of Information, Lecture Notes in Comput. Sci., 8085, Springer, Heidelberg, 2013,327–334. doi: 10.1007/978-3-642-40020-9_35. |
[4] | B. Calderhead, M. Epstein, L. Sivilotti and M. Girolami, Bayesian approaches for mechanistic ion channel modeling, in In Silico Systems Biology, Methods in Molecular Biology, 1021, Humana Press, Totowa, NJ, 2013, 247-272. doi: 10.1007/978-1-62703-450-0_13. |
[5] | B. Calderhead and M. Girolami, Statistical analysis of nonlinear dynamical systems using differential geometric sampling methods, Interface Focus, 1 (2011). doi: 10.1098/rsfs.2011.0051. |
[6] | S. Chib and E. Greenberg, Understanding the Metropolis-Hastings algorithm, Amer. Statistician, 49 (1995), 327-335. doi: 10.2307/2684568. |
[7] | A. M. Davie and A. J. Stothers, Improved bound for complexity of matrix multiplication, Proc. Roy. Soc. Edinburgh Sect. A, 143 (2013), 351-369. doi: 10.1017/S0308210511001648. |
[8] | S. Duane, A. D. Kennedy, B. J. Pendleton and D. Roweth, Hybrid Monte Carlo, Phys. Lett. B, 195 (1987), 216-222. doi: 10.1016/0370-2693(87)91197-X. |
[9] | E. B. Ford, Improving the efficiency of Markov chain Monte Carlo for analyzing the orbits of extrasolar planets, Astrophysical J., 642 (2006), 505-522. doi: 10.1086/500802. |
[10] | E. B. Ford, Quantifying the uncertainty in the orbits of extrasolar planets, Astronomical J., 129 (2005), 1706-1717. doi: 10.1086/427962. |
[11] | F. L. Gall, Powers of tensors and fast matrix multiplication, in Proceedings of the 39th international symposium on symbolic and algebraic computation, Association for Computing Machinery, 2014,296–303. doi: 10.1145/2608628.2608664. |
[12] | C. J. Geyer, Practical Markov chain Monte Carlo, Statist. Sci., 7 (1992), 473-483. doi: 10.1214/ss/1177011137. |
[13] | P. E. Gill, G. H. Golub, W. Murray and M. A. Saunders, Methods for modifying matrix factorizations, Math. Comp., 28 (1974), 505-535. doi: 10.1090/S0025-5718-1974-0343558-6. |
[14] | M. Girolami and B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, J. R. Stat. Soc. Ser. B Stat. Methodol., 73 (2011), 123-214. doi: 10.1111/j.1467-9868.2010.00765.x. |
[15] | A. Griewank, On automatic differentiation and algorithmic linearization, Pesquisa Operacional, 34 (2014), 621-645. doi: 10.1590/0101-7438.2014.034.03.0621. |
[16] | A. Griewank and A. Walther, Evaluating Derivatives. Principles and Techniques of Algorithmic Differentiation, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2008. doi: 10.1137/1.9780898717761. |
[17] | J. E. Griffin and S. G. Walker, On adaptive Metropolis-Hastings methods, Stat. Comput., 23 (2013), 123-134. doi: 10.1007/s11222-011-9296-2. |
[18] | H. Haario, M. Laine, A. Mira and E. Saksman, DRAM: Efficient adaptive MCMC, Stat. Comput., 16 (2006), 339-354. doi: 10.1007/s11222-006-9438-0. |
[19] | H. Haario, E. Saksman and J. Tamminen, An adaptive metropolis algorithm, Bernoulli, 7 (2001), 223-242. doi: 10.2307/3318737. |
[20] | B. Hajek, Cooling schedules for optimal annealing, in Open Problems in Communication and Computation, Springer, New York, 1987,147–150. doi: 10.1007/978-1-4612-4808-8_42. |
[21] | N. J. Higham, Computing the nearest correlation matrix - A problem from finance, IMA J. Numer. Anal., 22 (2002), 329-343. doi: 10.1093/imanum/22.3.329. |
[22] | N. J. Higham, Computing a nearest symmetric positive semidefinite matrix, Linear Algebra Appl., 103 (1988), 103-118. doi: 10.1016/0024-3795(88)90223-6. |
[23] | N. J. Higham and N. Strabić, Anderson acceleration of the alternating projections method for computing the nearest correlation matrix, Numer. Algorithms, 72 (2016), 1021-1042. doi: 10.1007/s11075-015-0078-3. |
[24] | T. House, Hessian corrections to the Metropolis adjusted Langevin algorithm, preprint, arXiv: 1507.06336. |
[25] | O. Kallenberg, Random Measures, Theory and Applications, Probability Theory and Stochastic Modelling, 77. Springer, Cham, 2017. doi: 10.1007/978-3-319-41598-7. |
[26] | S. Kirkpatrick, C. D. Gelatt Jr. and M. P. Vecchi, Optimization by simulated annealing, Science, 220 (1983), 671-680. doi: 10.1126/science.220.4598.671. |
[27] | T. S. Kleppe, Adaptive step size selection for Hessian-based manifold Langevin samplers, Scand. J. Stat., 43 (2016), 788-805. doi: 10.1111/sjos.12204. |
[28] | S. Lan, T. Bui-Thanh, M. Christie and M. Girolami, Emulation of higher-order tensors in manifold Monte Carlo methods for Bayesian inverse problems, J. Comput. Phys., 308 (2016), 81-101. doi: 10.1016/j.jcp.2015.12.032. |
[29] | S. Livingstone and M. Girolami, Information-geometric Markov chain Monte Carlo methods using diffusions, Entropy, 16 (2014), 3074-3102. doi: 10.3390/e16063074. |
[30] | M. Locatelli, Simulated annealing algorithms for continuous global optimization: Convergence conditions, J. Optim. Theory Appl., 104 (2000), 121-133. doi: 10.1023/A:1004680806815. |
[31] | J. F. D. Martin and J. M. R. no Sierra, A comparison of cooling schedules for simulated annealing, in Encyclopedia of Artificial Intelligence, 2009,344–352. doi: 10.4018/9781599048499.ch053. |
[32] | R. M. Neal, Bayesian Learning for Neural Networks, Lecture Notes in Statistics, 118, Springer, New York, 1996. doi: 10.1007/978-1-4612-0745-0. |
[33] | J. Neveu, Mathematical Foundations of the Calculus of Probability, Holden-Day, Inc., San Francisco, Calif.-London-Amsterdam, 1965. |
[34] | Y. Nourani and B. Andresen, A comparison of simulated annealing cooling strategies, J. Phys. A: Math. General, 31 (1998), 8373-8385. doi: 10.1088/0305-4470/31/41/011. |
[35] | T. Papamarkou, A. Mira and M. Girolami, Monte Carlo methods and zero variance principle, in Current Trends in Bayesian Methodology with Applications, CRC Press, Boca Raton, FL, 2015, 457-476. |
[36] | M. Pereyra, Proximal Markov chain Monte Carlo algorithms, Stat. Comput., 26 (2016), 745-760. doi: 10.1007/s11222-015-9567-4. |
[37] | J. Revels, M. Lubin and T. Papamarkou, Forward-mode automatic differentiation in Julia, preprint, arXiv: 1607.07892. |
[38] | G. O. Roberts and J. S. Rosenthal, Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms, J. Appl. Probab., 44 (2007), 458-475. doi: 10.1239/jap/1183667414. |
[39] | G. O. Roberts and J. S. Rosenthal, Examples of adaptive MCMC, J. Comput. Graph. Statist., 18 (2009), 349-367. doi: 10.1198/jcgs.2009.06134. |
[40] | G. O. Roberts and J. S. Rosenthal, Optimal scaling of discrete approximations to Langevin diffusions, J. R. Stat. Soc. Ser. B Stat. Methodol., 60 (1998), 255-268. doi: 10.1111/1467-9868.00123. |
[41] | G. O. Roberts and O. Stramer, Langevin diffusions and Metropolis-Hastings algorithms, Methodol. Comput. Appl. Probab., 4 (2002), 337-357. doi: 10.1023/A:1023562417138. |
[42] | G. O. Roberts and R. L. Tweedie, Exponential convergence of Langevin distributions and their discrete approximations, Bernoulli, 2 (1996), 341-363. doi: 10.2307/3318418. |
[43] | E. Saksman and M. Vihola, On the ergodicity of the adaptive Metropolis algorithm on unbounded domains, Ann. Appl. Probab., 20 (2010), 2178-2203. doi: 10.1214/10-AAP682. |
[44] | R. Schwentner, T. Papamarkou, M. O. Kauer, V. Stathopoulos, F. Yang and et al., EWS-FLI1 employs an E2F switch to drive target gene expression, Nucleic Acids Research, 43 (2015), 2780-2789. doi: 10.1093/nar/gkv123. |
[45] | M. Seeger, Low Rank Updates for the Cholesky Decomposition, Technical report, University of California, Berkeley, 2004. Available from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.585.5275&rep=rep1&type=pdf. |
[46] | U. Şimşekli, R. Badeau, A. T. Cemgil and G. Richard, Stochastic quasi-Newton Langevin Monte Carlo, in Proceedings of the 33rd International Conference on Machine Learning, 2016,642–651. |
[47] | N. W. Tuchow, E. B. Ford, T. Papamarkou and A. Lindo, The efficiency of geometric samplers for exoplanet transit timing variation models, Monthly Notices Roy. Astronomical Soc., 484 (2019), 3772-3784. doi: 10.1093/mnras/stz247. |
[48] | M. Vihola, Robust adaptive Metropolis algorithm with coerced acceptance rate, Stat. Comput., 22 (2012), 997-1008. doi: 10.1007/s11222-011-9269-5. |
[49] | J. H. Wilkinson, Modern error analysis, SIAM Rev., 13 (1971), 548-568. doi: 10.1137/1013095. |
[50] | V. V. Williams, Breaking the Coppersmith-Winograd barrier, 2011. |
[51] | T. Xifara, C. Sherlock, S. Livingstone, S. Byrne and M. Girolami, Langevin diffusions and the Metropolis-adjusted Langevin algorithm, Statist. Probab. Lett., 91 (2014), 14-19. doi: 10.1016/j.spl.2014.04.002. |