We analyze the optimized adaptive importance sampler (OAIS) for performing Monte Carlo integration with general proposals. We leverage a classical result showing that the bias and the mean-squared error (MSE) of importance sampling scale with the $\chi^2$-divergence between the target and the proposal, and we develop a scheme that performs global optimization of this divergence. While this quantity is known to be convex for exponential-family proposals, the case of general proposals has remained an open problem. We close this gap by utilizing nonasymptotic bounds for stochastic gradient Langevin dynamics (SGLD) for the global optimization of the $\chi^2$-divergence, and we derive nonasymptotic bounds for the MSE by leveraging recent results from the non-convex optimization literature. The resulting AIS schemes have explicit theoretical guarantees that are uniform in time.
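To make the scaling claim concrete, the classical result alluded to above can be paraphrased, up to unspecified constants $c_1, c_2$ and for a bounded test function $\varphi$, as follows (this is a hedged rendering; the precise statements and constants are in the analyses the paper builds on):

```latex
% Hedged paraphrase of the classical self-normalized IS error bounds:
% pi is the target, q_theta the proposal, pi_theta^N(phi) the SNIS
% estimator built from N proposal samples; c_1, c_2 are unspecified.
\[
  \big|\mathbb{E}\,\pi_\theta^N(\varphi) - \pi(\varphi)\big|
     \;\le\; \frac{c_1\,\rho(\theta)}{N},
  \qquad
  \mathbb{E}\Big[\big(\pi_\theta^N(\varphi) - \pi(\varphi)\big)^2\Big]
     \;\le\; \frac{c_2\,\rho(\theta)}{N},
\]
\[
  \text{where } \rho(\theta)
     = \mathbb{E}_{q_\theta}\!\big[(\pi/q_\theta)^2\big]
     = \chi^2(\pi \,\|\, q_\theta) + 1 .
\]
```

Since $\rho(\theta)$ is the only $\theta$-dependent quantity in these bounds, driving $\rho$ (equivalently, the $\chi^2$-divergence) toward its global minimum is exactly what the SGLD-based adaptation targets.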
Figure: Implementation of SGLD for the Gaussian target with a Student's-t proposal. The results are averaged over […]
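The figure caption above names an SGLD run for a Gaussian target with a Student's-t proposal; below is a minimal, self-contained sketch of how such an adaptation could look. It is an illustration under stated assumptions, not the paper's reference implementation: the target $N(2,1)$, the fixed degrees of freedom, the step size `gamma`, the inverse temperature `beta`, the clipping level, and the sample sizes are all illustrative choices.

```python
# A minimal sketch (not the paper's reference implementation) of SGLD-driven
# adaptive importance sampling: the location `theta` of a Student's-t proposal
# is adapted to reduce a Monte Carlo estimate of rho(theta) = chi^2 + 1 for a
# Gaussian target. All hyperparameters below are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

nu = 5.0                                  # proposal degrees of freedom (fixed)
target = stats.norm(loc=2.0, scale=1.0)   # target pi = N(2, 1)

def importance_weights(x, theta):
    """w(x) = pi(x) / q_theta(x) for the Student's-t proposal."""
    return target.pdf(x) / stats.t.pdf(x, df=nu, loc=theta)

def grad_rho_estimate(theta, n):
    """Stochastic gradient of rho(theta) = E_{q_theta}[w^2] via the identity
    grad rho(theta) = -E_{q_theta}[w(x)^2 * d/dtheta log q_theta(x)]."""
    x = stats.t.rvs(df=nu, loc=theta, size=n, random_state=rng)
    w = importance_weights(x, theta)
    z = x - theta
    score = (nu + 1.0) * z / (nu + z**2)  # d/dtheta log t-pdf(x; nu, theta)
    g = -np.mean(w**2 * score)
    return np.clip(g, -10.0, 10.0)        # crude taming of heavy-tailed grads

# SGLD update: theta <- theta - gamma * grad + sqrt(2 * gamma / beta) * N(0,1)
theta, gamma, beta, n_per_iter = -3.0, 0.01, 1e4, 64
for _ in range(2000):
    theta -= gamma * grad_rho_estimate(theta, n_per_iter)
    theta += np.sqrt(2.0 * gamma / beta) * rng.standard_normal()

# Final self-normalized IS estimate of E[X] under the adapted proposal.
x = stats.t.rvs(df=nu, loc=theta, size=10_000, random_state=rng)
w = importance_weights(x, theta)
print(f"adapted proposal location: {theta:.3f} (target mean 2.0)")
print(f"SNIS estimate of E[X]: {np.sum(w * x) / np.sum(w):.3f}")
```

The injected Gaussian noise is what distinguishes SGLD from plain stochastic gradient descent and is the ingredient behind the global, rather than merely local, optimization guarantees discussed in the abstract; the gradient clipping is a pragmatic stabilizer for the heavy-tailed weights in this sketch, not part of the analyzed scheme.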