This work introduces and analyzes an algorithm to generate samples from non-trivial probability measures on $ [0, 1]^d $ based on measure transport and polynomial density surrogates. Specifically, we construct polynomial approximations of the target probability density, which may be given only in unnormalized form, using weighted least squares or interpolation. Such surrogates then allow us to construct a transport map that pushes uniformly distributed samples forward to the target measure. In contrast to related approaches that aim to find suitable transports via optimization over classes of parametrized transports, our method is based on solving a convex optimization problem. This allows us to carry out a complete error analysis and to show that the polynomial transport inherits the convergence rates of the polynomial approximation method used to construct the density surrogate. Our analysis is complemented by strategies for efficient implementation and corresponding complexity bounds. We provide several numerical experiments that confirm our theoretical results and highlight the algorithm's effectiveness in practical settings.
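The first step of this pipeline, fitting a polynomial surrogate to an unnormalized density by least squares, can be sketched in a few lines. The bimodal target, the Legendre degree, and the sample size below are our own illustrative choices, not taken from the paper; in particular, plain uniform sampling stands in for the optimally weighted least squares scheme analyzed in the text.

```python
import numpy as np

# Toy 1D illustration (the target, degree, and sample count are assumptions,
# not the paper's): least-squares polynomial surrogate of an unnormalized
# density on [0, 1], normalized afterwards by quadrature.

def target_unnormalized(x):
    # bimodal density, given only up to its normalizing constant
    return np.exp(-50 * (x - 0.3) ** 2) + np.exp(-50 * (x - 0.75) ** 2)

rng = np.random.default_rng(0)
n_samples, degree = 500, 18
x = rng.uniform(0.0, 1.0, n_samples)   # uniform draws; the paper's optimal
                                       # WLS method would reweight these
V = np.polynomial.legendre.legvander(2.0 * x - 1.0, degree)  # basis on [0, 1]
coef, *_ = np.linalg.lstsq(V, target_unnormalized(x), rcond=None)

def surrogate(t):
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.polynomial.legendre.legvander(2.0 * t - 1.0, degree) @ coef

# normalize with a midpoint rule so the surrogate integrates to one
grid = (np.arange(2000) + 0.5) / 2000.0
Z = surrogate(grid).mean()

def density(t):
    return surrogate(t) / Z
```

By construction `density` then integrates to one on $[0,1]$ up to quadrature error, and the two modes dominate the valley between them, as a probability density should.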
Figure 2. For a multimodal distribution $ \pi $ on $ [0, 1] $, its density $ f_\pi $ will vanish or be very small in certain parts of the domain. As a consequence, the inverse transport $ S = T^{-1} $ (corresponding to the antiderivative of $ f_\pi $) will be (almost) constant in those areas, and the transport $ T = S^{-1} $ can exhibit discontinuities. Even if $ f_\pi $ and $ S $ are $ C^\infty $ functions that are well approximated by polynomials, the same is in general not true for $ T $.
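In one dimension this construction is easy to sketch: $ S $ is the normalized antiderivative (CDF) of the density, and $ T = S^{-1} $ is evaluated pointwise by bisection. The following is a generic sketch under our own assumptions, not a reproduction of the paper's Algorithm 8; the test density $ f(x) = 2x $ and the quadrature resolution are placeholders.

```python
import numpy as np

# Sketch (not the paper's Algorithm 8): build S as the normalized
# antiderivative (CDF) of a density f on [0, 1] via a midpoint rule,
# then evaluate T = S^{-1} pointwise by r bisection steps.

def make_transport(f, n_quad=4000, r=30):
    nodes = (np.arange(n_quad) + 0.5) / n_quad        # midpoint quadrature nodes
    cdf = np.concatenate([[0.0], np.cumsum(f(nodes) / n_quad)])
    cdf /= cdf[-1]                                    # normalize so S(1) = 1
    xs = np.concatenate([[0.0], nodes + 0.5 / n_quad])

    def S(x):                                         # piecewise-linear CDF
        return np.interp(x, xs, cdf)

    def T(u):                                         # invert S by bisection
        lo, hi = 0.0, 1.0
        for _ in range(r):
            mid = 0.5 * (lo + hi)
            if S(mid) < u:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return T

# For f(x) = 2x the CDF is S(x) = x^2, so T(u) = sqrt(u)
T = make_transport(lambda x: 2.0 * x)
```

Pushing uniform samples $ u \sim \mu $ through `T` then yields approximate samples from the measure with density $ f $. A multimodal $ f $ makes `S` nearly flat between the modes, which is exactly where the bisection-based `T` becomes steep, matching the behavior described in the caption above.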
Figure 5. Grid, density and samples from Figure 4 as transformed by a transport $ T_1 $ constructed from a surrogate of (41) in $ \mathbb{P}_{\Lambda_1} $ with $ \Lambda_1 = \{ \boldsymbol{\nu} \in \mathbb{N}^2 \,:\, \| \boldsymbol{\nu} \|_1 \le 15 \} $.
Figure 6. Grid, density and samples from Figure 4 as transformed by a transport $ T_2 $ constructed from a surrogate of (41) in $ \mathbb{P}_{\Lambda_2} $ with $ \Lambda_2 = \{ \boldsymbol{\nu} \in \mathbb{N}^2 \,:\, \| \boldsymbol{\nu} \|_1 \le 65 \} $.
Figure 7. Monte Carlo approximation of the Hellinger distance between the target measure $ \pi $ with density $ f_\pi $ as in (41) and $ T_\sharp \mu $ as a function of the cardinality of the surrogate ansatz space $ \mathbb{P}_\Lambda $. Also shown are the WLS error, which differs only by a constant from $ \mathrm{dist}_{\mathrm{H}}(\pi, T_\sharp\mu) $, and the predicted exponential convergence rate (with the unknown constant $ \beta $ fitted to $ 0.12 $).
Figure 10. Monte Carlo approximation of the Hellinger distance between the target measure $ \pi $ with density $ f_\pi $ as in Figure 8 and $ T_\sharp \mu $ as a function of the cardinality of the density surrogate ansatz space $ \mathbb{P}_\Lambda $, as well as the WLS error.
Figure 14. Monte Carlo approximation of the Hellinger distance between the target measure $ \pi $ with density $ f_\pi $ as in (45) and $ T_\sharp \mu $ as a function of the cardinality of the density surrogate ansatz space $ \mathbb{P}_\Lambda $ for different parameter dimensions $ d \in \mathbb{N} $.
Figure 15. CPU runtime $ t $ (in seconds) of a single evaluation of $ T $ (with the bisection parameter $ r $ in Algorithm 8 set to $ r = 30 $) as a function of the cardinality of the density surrogate ansatz space $ \mathbb{P}_\Lambda $ for different parameter dimensions $ d \in \mathbb{N} $. Each data point is the average over $ 3 $ evaluations.