
# Constrained Ensemble Langevin Monte Carlo

* Corresponding author: Qin Li

Q.L. acknowledges support from the Vilas Early Career Award. The research of Z.D. and Q.L. is supported in part by NSF via grants DMS-1750488 and DMS-2023239, and by the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison with funding from the Wisconsin Alumni Research Foundation.

The classical Langevin Monte Carlo (LMC) method draws samples from a target distribution by moving the samples along the gradient of the negative log-density of the target, with added Gaussian noise. The method enjoys a fast convergence rate. However, the numerical cost is sometimes high because each iteration requires the computation of a gradient. One approach to eliminating the gradient computation is to employ the concept of an "ensemble": a large number of particles are evolved together, so that neighboring particles provide gradient information to each other. In this article, we discuss two algorithms that integrate the ensemble feature into LMC, and their associated properties.

In particular, we find that if one directly replaces the gradient with the ensemble approximation, the resulting algorithm, termed Ensemble Langevin Monte Carlo, is unstable due to a high-variance term. If the gradients are replaced by the ensemble approximations only in a constrained manner, so as to guard against the unstable points, the resulting algorithm, termed Constrained Ensemble Langevin Monte Carlo, resembles the classical LMC up to an ensemble error but removes most of the gradient computation.
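To make the two ingredients of the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm analyzed in the paper. The standard Gaussian target, the step size `h`, the helper `surrogate_grad`, its least-squares fit over neighbors, and the parameters `radius` and `min_neighbors` are all hypothetical choices. The sketch runs classical LMC with the exact gradient, then shows one way neighboring particles could supply a gradient estimate, falling back to the exact gradient where too few neighbors are available.

```python
import numpy as np

def f(x):
    """Potential f(x) = |x|^2 / 2; the target density is proportional to exp(-f)."""
    return 0.5 * np.sum(x**2, axis=-1)

def grad_f(x):
    """Exact gradient of f, the quantity classical LMC evaluates at every iteration."""
    return x

def lmc_step(x, grad, h, rng):
    """One LMC update for all particles: x - h * grad + sqrt(2h) * Gaussian noise."""
    return x - h * grad + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)

def surrogate_grad(x, radius=0.5, min_neighbors=10):
    """Hypothetical ensemble-based gradient surrogate (an assumption of this sketch).

    For particle i, fit g in f(x_j) - f(x_i) ~ g . (x_j - x_i) by least squares
    over the neighbors j within `radius`. If fewer than `min_neighbors` exist,
    use the exact gradient instead (the safeguard in this sketch).
    Returns the gradients and a boolean mask of particles that used the fallback.
    """
    n = x.shape[0]
    fx = f(x)
    grads = np.empty_like(x)
    used_exact = np.zeros(n, dtype=bool)
    for i in range(n):
        dx = x - x[i]
        near = np.linalg.norm(dx, axis=1) < radius
        near[i] = False  # a particle is not its own neighbor
        if near.sum() < min_neighbors:
            grads[i] = grad_f(x[i])
            used_exact[i] = True
        else:
            grads[i] = np.linalg.lstsq(dx[near], fx[near] - fx[i], rcond=None)[0]
    return grads, used_exact

# Classical LMC with the exact gradient, started away from the target mean.
rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 2)) + 3.0
h = 0.05
for _ in range(400):
    x = lmc_step(x, grad_f(x), h, rng)

# Compare the neighbor-based surrogate with the exact gradient on a subsample.
sub = x[:500]
g_hat, used_exact = surrogate_grad(sub)
err = np.linalg.norm(g_hat - grad_f(sub), axis=1)
print("sample mean:", x.mean(axis=0), "sample variance:", x.var(axis=0))
print("median surrogate error:", np.median(err))
print("exact-gradient fallbacks:", int(used_exact.sum()), "of", len(sub))
```

In this sketch the fallback is triggered only by a neighbor count. The constraint used in the paper may differ, so the rule here should be read as a placeholder for whatever criterion identifies the unstable points.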

Mathematics Subject Classification: Primary: 62D05; Secondary: 82C31, 65C05.


Figure 1.  Example 1: Evolution of samples using CEnLMC. $N = 10^4$

Figure 2.  Example 1: Evolution of samples using LMC and MALA. $N = 10^4$

Figure 3.  Example 1: Evolution of $\mathcal{R}_m$ when $N = 2\times10^3, 6\times10^3$ or $10^4$

Figure 4.  Example 2: Evolution of samples using CEnLMC when $N = 10^4$

Figure 5.  Example 2: Evolution of samples using LMC and MALA when $N = 10^4$

Figure 6.  Example 2: Evolution of $\mathcal{R}_m$ with $m$ when $N = 2\times10^3, 6\times10^3, 10^4$

