American Institute of Mathematical Sciences

September 2019, 1(3): 329-387. doi: 10.3934/fods.2019015

Randomized learning of the second-moment matrix of a smooth function

1 Institute of Electrical Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
2 Department of Electrical Engineering, Colorado School of Mines, Denver, CO 80401, USA
3 Departments of Statistics and Computer Science, Rutgers University, Piscataway, NJ 08854, USA
4 Department of Computer Science, University of Colorado Boulder, Boulder, CO 80309, USA

* Corresponding author: Armin Eftekhari

Published September 2019

Fund Project: AE was partially supported by the Alan Turing Institute under the EPSRC grant EP/N510129/1 and also by the Turing Seed Funding grant SF019. MBW was partially supported by NSF grant CCF-1409258 and NSF CAREER grant CCF-1149225. PGC was partially supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under award DE-SC0011077 and the Defense Advanced Research Projects Agency's Enabling Quantification of Uncertainty in Physical Systems.

Consider an open set $\mathbb{D}\subseteq\mathbb{R}^n$, equipped with a probability measure $\mu$. An important characteristic of a smooth function $f:\mathbb{D}\rightarrow\mathbb{R}$ is its second-moment matrix $\Sigma_{\mu}: = \int \nabla f(x) \nabla f(x)^* \mu(dx) \in\mathbb{R}^{n\times n}$, where $\nabla f(x)\in\mathbb{R}^n$ is the gradient of $f(\cdot)$ at $x\in\mathbb{D}$ and $*$ stands for transpose. For instance, the span of the leading $r$ eigenvectors of $\Sigma_{\mu}$ forms an active subspace of $f(\cdot)$, which contains the directions along which $f(\cdot)$ changes the most and is of particular interest in ridge approximation. In this work, we propose a simple algorithm for estimating $\Sigma_{\mu}$ from random point evaluations of $f(\cdot)$ without imposing any structural assumptions on $\Sigma_{\mu}$. Theoretical guarantees for this algorithm are established with the aid of the same technical tools that have proved valuable in the context of covariance matrix estimation from partial measurements.
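To fix ideas, the quantity $\Sigma_{\mu}$ can always be approximated by plain Monte Carlo over gradient estimates built from point evaluations of $f(\cdot)$. The sketch below shows that naive baseline, not the paper's randomized algorithm (which requires far fewer evaluations per point); the standard Gaussian choice of $\mu$, the quadratic test function, and the helper name `second_moment_estimate` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def second_moment_estimate(f, n, N, eps=1e-4, rng=rng):
    """Estimate Sigma = E[grad f(x) grad f(x)^T] for x ~ N(0, I_n), using
    forward finite differences built from point evaluations of f only."""
    sigma = np.zeros((n, n))
    for _ in range(N):
        x = rng.standard_normal(n)                 # draw a sample point x ~ mu
        fx = f(x)
        # forward-difference gradient: n extra evaluations of f near x
        g = np.array([(f(x + eps * e) - fx) / eps for e in np.eye(n)])
        sigma += np.outer(g, g)
    return sigma / N

# Sanity check on a quadratic f(x) = 0.5 x^T A x with symmetric A, where
# grad f(x) = A x and hence Sigma = A E[x x^T] A^T = A^2 for x ~ N(0, I).
n = 5
A = np.diag(np.arange(1.0, n + 1))                 # symmetric by construction
f = lambda x: 0.5 * x @ A @ x
est = second_moment_estimate(f, n, N=5000)
rel_err = np.linalg.norm(est - A @ A, "fro") / np.linalg.norm(A @ A, "fro")
```

For this quadratic, the relative Frobenius error decays at the usual Monte Carlo rate $O(N^{-1/2})$; the cost of $N(n+1)$ evaluations of $f(\cdot)$ is exactly what the paper's algorithm is designed to avoid.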

Citation: Armin Eftekhari, Michael B. Wakin, Ping Li, Paul G. Constantine. Randomized learning of the second-moment matrix of a smooth function. Foundations of Data Science, 2019, 1 (3) : 329-387. doi: 10.3934/fods.2019015

Figure 1: Visualization of the problem setup. The probability measure $\mu$ is supported on the domain $\mathbb{D}\subseteq \mathbb{R}^n$ of a smooth function $f(\cdot)$. Here, $N = 3$ and $X = \{x_i\}_{i = 1}^N$ are drawn independently from $\mu$. For sufficiently small $\epsilon$, we let $\mathbb{B}_{x_i, \epsilon}$ denote the $\epsilon$-neighborhood of each $x_i$ and set $\mathbb{B}_{X, \epsilon} = \cup_{i = 1}^N \mathbb{B}_{x_i, \epsilon}$. On $\mathbb{B}_{X, \epsilon}$, $\mu$ induces the conditional measure $\mu_{X, \epsilon}$, from which $N_{X, \epsilon}$ points are independently drawn and collected in $Y_{X, \epsilon} = \{y_{ij}\}_{i, j}$. Here, $x_1$ has $N_{x_1, \epsilon} = 2$ neighbors in $Y_{X, \epsilon}$, and we set $Y_{x_1, \epsilon} = \{y_{1, j} \}_{j = 1}^{N_{x_1, \epsilon}}$; $Y_{x_2, \epsilon}$ and $Y_{x_3, \epsilon}$ are formed similarly, so that $Y_{X, \epsilon} = \cup_{i = 1}^N Y_{x_i, \epsilon}$. Our objective is to estimate the second-moment matrix of $f(\cdot)$ (with respect to the probability measure $\mu$) given the samples $\{x_i, f(x_i)\}$ and $\{y_{ij}, f(y_{ij})\}$.
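The two-stage sampling scheme in this caption (centers from $\mu$, then neighbors from the conditional measure on the $\epsilon$-balls) can be sketched as follows. For illustration we take $\mu$ uniform on the unit cube, so that for interior centers the conditional measure $\mu_{X,\epsilon}$ is approximately uniform on each ball; the function name and all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_centers_and_neighbors(n, N, n_neighbors, eps, rng=rng):
    """Draw N centers X from mu (taken uniform on [0,1]^n in this sketch)
    and, for each center x_i, draw n_neighbors points Y[i] from the
    eps-ball around x_i, approximating the conditional measure mu_{X,eps}
    by uniform sampling on the ball."""
    X = rng.uniform(eps, 1 - eps, size=(N, n))   # keep the balls inside the cube
    Y = {}
    for i, x in enumerate(X):
        # uniform points in the eps-ball: random directions scaled by r ~ eps*U^(1/n)
        d = rng.standard_normal((n_neighbors, n))
        d /= np.linalg.norm(d, axis=1, keepdims=True)
        r = eps * rng.uniform(size=(n_neighbors, 1)) ** (1.0 / n)
        Y[i] = x + r * d
    return X, Y

# The caption's toy configuration: N = 3 centers, a few neighbors each.
X, Y = sample_centers_and_neighbors(n=3, N=3, n_neighbors=2, eps=1e-2)
```

Evaluating $f(\cdot)$ at the rows of `X` and at the points in `Y` produces exactly the data $\{x_i, f(x_i)\}$ and $\{y_{ij}, f(y_{ij})\}$ that the estimator consumes.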
Figure 2: A simulation study using a 500-dimensional ($n = 500$) quadratic function, measuring the relative error in the approximation (7) as a function of the total number $N_{X, \epsilon}$ of samples. All simulations use $\epsilon = 10^{-4}$. Each subfigure uses a different value for the minimum number $N_{X, \text{min}, \epsilon}$ of points in the $\epsilon$-neighborhood of each center point in $X$, and varies the number $N$ of centers. Each of the five groups of blue dots in each subfigure corresponds to a different value of $N$, with ten replications per group. The slope of each line is $-1/2$, which verifies the expected convergence rate from (6).
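The $N^{-1/2}$ convergence behavior described in this caption can be reproduced in miniature. The sketch below uses $n = 50$ instead of $500$ and exact gradients of a quadratic (so that the true $\Sigma_\mu$ is known in closed form), both of which are simplifications chosen here for illustration rather than the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(2)

# For a quadratic f(x) = 0.5 x^T A x with symmetric A, grad f(x) = A x exactly,
# and for x ~ N(0, I) the true second-moment matrix is Sigma = A A^T = A^2.
n = 50
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                        # symmetrize
sigma_true = A @ A

def mc_error(num_samples):
    """Relative Frobenius error of the plain Monte Carlo estimate of Sigma."""
    xs = rng.standard_normal((num_samples, n))
    grads = xs @ A                        # rows are grad f(x_i) = A x_i
    est = grads.T @ grads / num_samples
    return np.linalg.norm(est - sigma_true) / np.linalg.norm(sigma_true)

err_small, err_large = mc_error(100), mc_error(10000)
```

Increasing the sample count by a factor of 100 shrinks the error by roughly a factor of 10, matching the $-1/2$ slope reported in the figure.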