# American Institute of Mathematical Sciences

September 2019, 1(3): 329-387. doi: 10.3934/fods.2019015

## Randomized learning of the second-moment matrix of a smooth function

1. Institute of Electrical Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
2. Department of Electrical Engineering, Colorado School of Mines, Denver, CO 80401, USA
3. Departments of Statistics and Computer Science, Rutgers University, Piscataway, NJ 08854, USA
4. Department of Computer Science, University of Colorado Boulder, Boulder, CO 80309, USA

* Corresponding author: Armin Eftekhari

Published September 2019

Fund Project: AE was partially supported by the Alan Turing Institute under the EPSRC grant EP/N510129/1 and also by the Turing Seed Funding grant SF019. MBW was partially supported by NSF grant CCF-1409258 and NSF CAREER grant CCF-1149225. PGC was partially supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under award DE-SC0011077 and the Defense Advanced Research Projects Agency's Enabling Quantification of Uncertainty in Physical Systems.

Consider an open set $\mathbb{D}\subseteq\mathbb{R}^n$, equipped with a probability measure $\mu$. An important characteristic of a smooth function $f:\mathbb{D}\rightarrow\mathbb{R}$ is its second-moment matrix $\Sigma_{\mu}: = \int \nabla f(x) \nabla f(x)^* \mu(dx) \in\mathbb{R}^{n\times n}$, where $\nabla f(x)\in\mathbb{R}^n$ is the gradient of $f(\cdot)$ at $x\in\mathbb{D}$ and $*$ stands for transpose. For instance, the span of the leading $r$ eigenvectors of $\Sigma_{\mu}$ forms an active subspace of $f(\cdot)$, which contains the directions along which $f(\cdot)$ changes the most and is of particular interest in ridge approximation. In this work, we propose a simple algorithm for estimating $\Sigma_{\mu}$ from random point evaluations of $f(\cdot)$ without imposing any structural assumptions on $\Sigma_{\mu}$. Theoretical guarantees for this algorithm are established with the aid of the same technical tools that have proved valuable in the context of covariance matrix estimation from partial measurements.
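To make the estimation idea concrete, the sketch below approximates $\Sigma_{\mu}$ by Monte Carlo, replacing each gradient with coordinate finite differences built purely from point evaluations of $f(\cdot)$. This is only a rough illustration, not the paper's algorithm: the paper draws neighbors from the conditional measure $\mu_{X,\epsilon}$ rather than along coordinate axes, and the Gaussian choice of $\mu$ and the function name here are assumptions made for the example.

```python
import numpy as np

def estimate_second_moment(f, n, N, eps=1e-4, rng=None):
    """Monte Carlo estimate of Sigma = E[grad f(x) grad f(x)^T] under
    mu = N(0, I_n), using forward finite differences assembled from
    point evaluations of f only (no gradient oracle).

    Illustrative sketch: the paper's scheme instead samples random
    neighbors of each center from the conditional measure mu_{X, eps}.
    """
    rng = np.random.default_rng(rng)
    Sigma = np.zeros((n, n))
    for _ in range(N):
        x = rng.standard_normal(n)   # center x_i drawn from mu
        fx = f(x)
        # finite-difference surrogate for grad f(x): one extra
        # evaluation per coordinate direction
        g = np.array([(f(x + eps * e) - fx) / eps for e in np.eye(n)])
        Sigma += np.outer(g, g)      # accumulate grad f grad f^T
    return Sigma / N
```

For a linear function $f(x) = a^* x$ the finite differences are exact, so the estimate returns $aa^*$ regardless of the sample locations; for a general smooth $f(\cdot)$ the estimate carries both Monte Carlo error and an $O(\epsilon)$ finite-difference bias.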

Citation: Armin Eftekhari, Michael B. Wakin, Ping Li, Paul G. Constantine. Randomized learning of the second-moment matrix of a smooth function. Foundations of Data Science, 2019, 1 (3) : 329-387. doi: 10.3934/fods.2019015

**Figure 1.** Visualization of the problem setup. The probability measure $\mu$ is supported on the domain $\mathbb{D}\subseteq \mathbb{R}^n$ of a smooth function $f(\cdot)$. Here, $N = 3$ and $X = \{x_i\}_{i = 1}^N$ are drawn independently from $\mu$. For sufficiently small $\epsilon$, we let $\mathbb{B}_{x_i, \epsilon}$ denote the $\epsilon$-neighborhood of each $x_i$ and set $\mathbb{B}_{X, \epsilon} = \cup_{i = 1}^N \mathbb{B}_{x_i, \epsilon}$. On $\mathbb{B}_{X, \epsilon}$, $\mu$ induces the conditional measure $\mu_{X, \epsilon}$, from which $N_{X, \epsilon}$ points are independently drawn and collected in $Y_{X, \epsilon} = \{y_{ij}\}_{i, j}$. Here, $x_1$ has $N_{x_1, \epsilon} = 2$ neighbors in $Y_{X, \epsilon}$ and we set $Y_{x_1, \epsilon} = \{y_{1, j} \}_{j = 1}^{N_{x_1, \epsilon}}$. Similarly, $Y_{x_2, \epsilon}$ and $Y_{x_3, \epsilon}$ are formed. Note that $Y_{X, \epsilon} = \cup_{i = 1}^N Y_{x_i, \epsilon}$. Our objective is to estimate the second-moment matrix of $f(\cdot)$ (with respect to the probability measure $\mu$) given $\{x_i, f(x_i)\}$ and $\{y_{ij}, f(y_{ij})\}$.
**Figure 2.** A simulation study using a 500-dimensional ($n = 500$) quadratic function, showing the relative error in the approximation (7) as a function of the total number $N_{X, \epsilon}$ of samples. All simulations use $\epsilon = 10^{-4}$. Each subfigure uses a different value for the minimum number $N_{X, \text{min}, \epsilon}$ of points in the $\epsilon$-neighborhood of each center point in $X$, and varies the number $N$ of centers. Each of the five groups of blue dots in each subfigure corresponds to a different value of $N$; within each group, there are ten replications. The slope of each fitted line is $-1/2$, which verifies the expected convergence rate from (6).
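The $N^{-1/2}$ trend described in the caption above can be reproduced in miniature. The sketch below is a toy stand-in for the experiment, not a reproduction of it: it uses exact gradients instead of point evaluations, a small dimension, and the quadratic $f(x) = \|x\|^2/2$ with $\mu = N(0, I_n)$, for which $\nabla f(x) = x$ and $\Sigma_{\mu} = I_n$ exactly. Quadrupling the sample count should then roughly halve the relative error, matching a slope of $-1/2$ on a log-log plot.

```python
import numpy as np

def mc_relative_error(n, N, rng):
    """Relative Frobenius-norm error of the Monte Carlo second-moment
    estimate for f(x) = ||x||^2 / 2 under mu = N(0, I_n), where
    grad f(x) = x and the true second-moment matrix is I_n."""
    X = rng.standard_normal((N, n))
    Sigma_hat = X.T @ X / N          # average of grad f grad f^T = x x^T
    return np.linalg.norm(Sigma_hat - np.eye(n)) / np.linalg.norm(np.eye(n))

rng = np.random.default_rng(0)
# average the error over ten replications at each sample size,
# mirroring the ten-replication groups in the figure
errs = [np.mean([mc_relative_error(20, N, rng) for _ in range(10)])
        for N in (100, 400, 1600)]
```

Each fourfold increase in $N$ should shrink the averaged error by roughly a factor of two, consistent with the $N^{-1/2}$ Monte Carlo rate.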
