Foundations of Data Science, June 2020, 2(2): 123-154. doi: 10.3934/fods.2020008

Hierarchical approximations for data reduction and learning at multiple scales

Prashant Shekhar* and Abani Patra
Data Intensive Studies Center, Tufts University, Medford, MA, 02155, USA

* Corresponding author: Prashant Shekhar

Published July 2020

Fund Project: This work was funded by NSF grants 1821311, 1645053 and 1621853

This paper describes a hierarchical learning strategy for generating sparse representations of multivariate datasets. The hierarchy arises from approximation spaces considered at successively finer scales. A detailed analysis of the stability, convergence and behavior of error functionals associated with the approximations is presented, along with a well-chosen set of applications. Results show the performance of the approach as a data reduction mechanism on both synthetic datasets (univariate and multivariate) and a real (geo-spatial) dataset. The sparse representation generated is shown to reconstruct the data efficiently and to minimize prediction error. The approach is also shown to generalize well to unseen samples, extending its prospective application to statistical learning problems.

Citation: Prashant Shekhar, Abani Patra. Hierarchical approximations for data reduction and learning at multiple scales. Foundations of Data Science, 2020, 2 (2) : 123-154. doi: 10.3934/fods.2020008
References:
[1] N. I. Achieser, Theory of Approximation, Frederick Ungar Publishing Co., New York, 1956.
[2] W. K. Allard, G. Chen and M. Maggioni, Multi-scale geometric methods for data sets II: Geometric multi-resolution analysis, Appl. Comput. Harmon. Anal., 32 (2012), 435-462. doi: 10.1016/j.acha.2011.08.001.
[3] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc., 68 (1950), 337-404. doi: 10.1090/S0002-9947-1950-0051437-7.
[4] F. Bellocchio, N. Borghese, S. Ferrari and V. Piuri, Kernel regression in HRBF networks for surface reconstruction, 2008 IEEE International Workshop on Haptic Audio Visual Environments and Games, (2008), 160-165. doi: 10.1109/HAVE.2008.4685317.
[5] A. Bermanis, A. Averbuch and R. R. Coifman, Multiscale data sampling and function extension, Appl. Comput. Harmon. Anal., 34 (2013), 15-29. doi: 10.1016/j.acha.2012.03.002.
[6] A. Bermanis, G. Wolf and A. Averbuch, Diffusion-based kernel methods on Euclidean metric measure spaces, Appl. Comput. Harmon. Anal., 41 (2016), 190-213. doi: 10.1016/j.acha.2015.07.005.
[7] B. Bohn, J. Garcke and M. Griebel, A sparse grid based method for generative dimensionality reduction of high-dimensional data, J. Comput. Phys., 309 (2016), 1-17. doi: 10.1016/j.jcp.2015.12.033.
[8] N. A. Borghese and S. Ferrari, Hierarchical RBF networks and local parameters estimate, Neurocomputing, 19 (1998), 259-283.
[9] C. Boutsidis, M. W. Mahoney and P. Drineas, An improved approximation algorithm for the column subset selection problem, Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, (2009), 968-977.
[10] W. L. Briggs, V. E. Henson and S. F. McCormick, A Multigrid Tutorial, 2nd edition, SIAM, 2000. doi: 10.1137/1.9780898719505.
[11] M. D. Buhmann, Radial Basis Functions: Theory and Implementations, Vol. 12, Cambridge University Press, 2003. doi: 10.1017/CBO9780511543241.
[12] J. C. Carr, W. R. Fright and R. K. Beatson, Surface interpolation with radial basis functions for medical imaging, IEEE Transactions on Medical Imaging, 16 (1997), 96-107. doi: 10.1109/42.552059.
[13] G. Chen, A. V. Little and M. Maggioni, Multi-resolution geometric analysis for data in high dimensions, in Excursions in Harmonic Analysis, Springer, 1 (2013), 259-285. doi: 10.1007/978-0-8176-8376-4_13.
[14] E. W. Cheney, Introduction to Approximation Theory, McGraw-Hill Book Co., New York-Toronto, Ont.-London, 1966.
[15] W. Cheney and W. Light, A Course in Approximation Theory, Vol. 101, American Mathematical Society, 2009. doi: 10.1090/gsm/101.
[16] C. H. Chou, B. H. Kuo and F. Chang, The generalized condensed nearest neighbor rule as a data reduction method, 18th International Conference on Pattern Recognition, 2 (2006), 556-559.
[17] A. Çivril and M. Magdon-Ismail, On selecting a maximum volume sub-matrix of a matrix and related problems, Theoret. Comput. Sci., 410 (2009), 4801-4811. doi: 10.1016/j.tcs.2009.06.018.
[18] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner and S. W. Zucker, Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, Proceedings of the National Academy of Sciences, 102 (2005), 7426-7431.
[19] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner and S. W. Zucker, Geometric diffusions as a tool for harmonic analysis and structure definition of data: Multiscale methods, Proceedings of the National Academy of Sciences, 102 (2005), 7432-7437.
[20] R. R. Coifman and M. Maggioni, Diffusion wavelets, Appl. Comput. Harmon. Anal., 21 (2006), 53-94. doi: 10.1016/j.acha.2006.04.004.
[21] I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, 61, SIAM, Philadelphia, PA, 1992. doi: 10.1137/1.9781611970104.
[22] S. De Marchi and R. Schaback, Stability of kernel-based interpolation, Adv. Comput. Math., 32 (2010), 155-161. doi: 10.1007/s10444-008-9093-4.
[23] A. Deshpande and S. Vempala, Adaptive sampling and fast low-rank matrix approximation, in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, Springer, (2006), 292-303. doi: 10.1007/11830924_28.
[24] G. E. Fasshauer and J. G. Zhang, Preconditioning of radial basis function interpolation systems via accelerated iterated approximate moving least squares approximation, in Progress on Meshless Methods, Springer, (2009), 57-75. doi: 10.1007/978-1-4020-8821-6_4.
[25] S. Ferrari, F. Bellocchio, V. Piuri and N. A. Borghese, A hierarchical RBF online learning algorithm for real-time 3-D scanner, IEEE Transactions on Neural Networks, 21 (2009), 275-285.
[26] S. Ferrari, M. Maggioni and N. A. Borghese, Multiscale approximation with hierarchical radial basis functions networks, IEEE Transactions on Neural Networks, 15 (2004), 178-188. doi: 10.1109/TNN.2003.811355.
[27] M. S. Floater and A. Iske, Multistep scattered data interpolation using compactly supported radial basis functions, J. Comput. Appl. Math., 73 (1996), 65-78. doi: 10.1016/0377-0427(96)00035-0.
[28] T. E. Fricker, J. E. Oakley and N. M. Urban, Multivariate Gaussian process emulators with nonseparable covariance structures, Technometrics, 55 (2013), 47-56. doi: 10.1080/00401706.2012.715835.
[29] M. Galun, R. Basri and I. Yavneh, Review of methods inspired by algebraic-multigrid for data and image analysis applications, Numer. Math. Theory Methods Appl., 8 (2015), 283-312. doi: 10.4208/nmtma.2015.w14si.
[30] M. Gavish, B. Nadler and R. R. Coifman, Multiscale wavelets on trees, graphs and high dimensional data: Theory and applications to semi supervised learning, ICML, (2010), 367-374.
[31] M. Griebel and A. Hullmann, A sparse grid based generative topographic mapping for the dimensionality reduction of high-dimensional data, in Modeling, Simulation and Optimization of Complex Processes - HPSC, Springer, (2012), 51-62.
[32] M. Gu and J. O. Berger, Parallel partial Gaussian process emulation for computer models with massive output, Ann. Appl. Stat., 10 (2016), 1317-1347. doi: 10.1214/16-AOAS934.
[33] A. Iske, Scattered data approximation by positive definite kernel functions, Rend. Semin. Mat. Univ. Politec. Torino, 69 (2011), 217-246.
[34] P. Koumoutsakos, Multiscale flow simulations using particles, Annu. Rev. Fluid Mech., 37 (2005), 457-487. doi: 10.1146/annurev.fluid.37.061903.175753.
[35] E. Kreyszig, Introductory Functional Analysis with Applications, Vol. 1, John Wiley & Sons, New York-London-Sydney, 1978.
[36] D. Kushnir, M. Galun and A. Brandt, Efficient multilevel eigensolvers with applications to data analysis tasks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32 (2009), 1377-1391. doi: 10.1109/TPAMI.2009.147.
[37] T. Lane and C. E. Brodley, Temporal sequence learning and data reduction for anomaly detection, CCS '98: Proceedings of the 5th ACM Conference on Computer and Communications Security, (1998), 150-158. doi: 10.1145/288090.288122.
[38] M. Maggioni, J. C. Bremer Jr., R. R. Coifman and A. D. Szlam, Biorthogonal diffusion wavelets for multiscale representation on manifolds and graphs, in Wavelets XI, International Society for Optics and Photonics, 5914 (2005), 59141M. doi: 10.1117/12.616909.
[39] S. G. Mallat, A theory for multiresolution signal decomposition: The wavelet representation, IEEE Transactions on Pattern Analysis & Machine Intelligence, 11 (1989), 674-693.
[40] S. Paul, M. Magdon-Ismail and P. Drineas, Column selection via adaptive sampling, in Advances in Neural Information Processing Systems, (2015), 406-414.
[41] M. J. D. Powell, Approximation Theory and Methods, Cambridge University Press, 1981.
[42] A. Quarteroni and A. Veneziani, Analysis of a geometrical multiscale model based on the coupling of ODEs and PDEs for blood flow simulations, Multiscale Model. Simul., 1 (2003), 173-195. doi: 10.1137/S1540345902408482.
[43] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, 2006.
[44] E. Sadrfaridpour, S. Jeereddy, K. Kennedy, A. Luckow, T. Razzaghi and I. Safro, Algebraic multigrid support vector machines, preprint, (2016), arXiv: 1611.05487.
[45] L. Shih, J. D. Rennie, Y. H. Chang and D. R. Karger, Text bundling: Statistics based data-reduction, in Proceedings of the 20th International Conference on Machine Learning (ICML-03), (2003), 696-703.
[46] K. Stüben, A review of algebraic multigrid, in Numerical Analysis: Historical Developments in the 20th Century, Elsevier, (2001), 331-359.
[47] S. Surjanovic and D. Bingham, Virtual Library of Simulation Experiments: Test Functions and Datasets, retrieved January 23, 2019, from http://www.sfu.ca/~ssurjano.
[48] A. D. Szlam, M. Maggioni, R. R. Coifman and J. C. Bremer Jr., Diffusion-driven multiscale analysis on manifolds and graphs: Top-down and bottom-up constructions, in Wavelets XI, International Society for Optics and Photonics, 5914 (2005), 59141D.
[49] J. T. Vogelstein, Y. Park, T. Ohyama, R. A. Kerr, J. W. Truman, C. E. Priebe and M. Zlatic, Discovery of brainwide neural-behavioral maps via multiscale unsupervised structure learning, Science, 344 (2014), 386-392.
[50] H. Wendland, Scattered Data Approximation, Cambridge Monographs on Applied and Computational Mathematics, Vol. 17, Cambridge University Press, 2005.
[51] Q. Wu, T. Xia, C. Chen, H.-Y. S. Lin, H. Wang and Y. Yu, Hierarchical tensor approximation of multi-dimensional visual data, IEEE Transactions on Visualization and Computer Graphics, 14 (2007), 186-199.

Figure 4.  Scalewise performance of Algorithm 1 on TF2. The plots in the left column ((a), (c), (e) and (g)) show the density of the basis functions selected at each scale while identifying the sparse representation; rank here refers to the cardinality of the basis function set $ B^s $. The right column ((b), (d), (f) and (h)) shows the corresponding scalewise reconstruction of the underlying function. The small dots show the samples from the true function, while the star markers represent the sparse representation $ D^s_{sparse} $. The solid curve is the reconstruction from the sparse representation. The fraction in brackets (for example, (11/200) in (b)) shows the number of data points, out of 200, chosen as $ D^s_{sparse} $
Figure 1.  Univariate {(a): Gramacy and Lee function - Test function 1 (TF1); (b): 1-D Schwefel function - TF2} and multivariate {(c): Dropwave function - TF3; (d): 2-D Schwefel function - TF4} test functions considered for studying the performance of Algorithm 1
Figure 2.  Convergence behavior on the test functions, measured by the 2-norm prediction error on the observed data. The top row ((a) and (b)) shows the performance on the univariate functions and the bottom row ((c) and (d)) on the multivariate functions. Each plot also shows the critical scale ($ S_c $) and the convergence scale ($ S_a $), along with the $ \% $ of data sampled as the sparse representation ($ D^{s}_{sparse} $) at each scale
Figure 3.  Convergence measured as the decay of the upper bound (equation (45)) on the inner product, in the native Hilbert space, between the approximation $ A_sf $ and the approximation error $ E^s $ for the considered univariate ((a): TF1 and (b): TF2) and multivariate ((c): TF3 and (d): TF4) test functions
Figure 5.  Scalewise performance of Algorithm 1 on TF3. The left column ((a), (c), (e) and (g)) shows the distribution of the sparse representation selected at multiple scales; for ease of presentation, we show the projection of the sparse representation (star markers) onto the X-Y plane. The right column ((b), (d), (f) and (h)) shows the corresponding reconstruction of the dropwave test function from the respective $ D^s_{sparse} $. Again, the fraction in the header of each plot in the right column shows the proportion of the dataset chosen as $ D^s_{sparse} $
Figure 6.  Confidence and prediction intervals for the reconstruction from $ D^s_{sparse} $ from scale 0 to scale 5 for TF2. Along with the approximation produced at these scales (solid curve), the plots also show the selected sparse representation (star-marked points) with the $ 95\% $ t-confidence interval (narrower dark shaded region) and the $ 95\% $ t-prediction interval (broader light shaded region)
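For readers who want to reproduce bands of this kind, the following NumPy/SciPy sketch computes classical $ 95\% $ t-confidence and t-prediction intervals for a least-squares fit $ f|_X \approx B^s C_s $, assuming i.i.d. Gaussian residuals; the paper's exact construction may differ, and the function name is illustrative only.

```python
import numpy as np
from scipy import stats

def t_intervals(B, f, level=0.95):
    # Least-squares fit f ~ B @ c and classical t-intervals for the fitted values.
    n, p = B.shape
    c, *_ = np.linalg.lstsq(B, f, rcond=None)
    fit = B @ c
    dof = n - p                                        # residual degrees of freedom
    sigma2 = np.sum((f - fit) ** 2) / dof              # residual variance estimate
    BtB_inv = np.linalg.pinv(B.T @ B)
    lever = np.einsum('ij,jk,ik->i', B, BtB_inv, B)    # diagonal of the hat matrix
    tq = stats.t.ppf(0.5 + level / 2.0, dof)
    conf = tq * np.sqrt(sigma2 * lever)                # half-width of the confidence band
    pred = tq * np.sqrt(sigma2 * (1.0 + lever))        # half-width of the prediction band
    return fit, conf, pred
```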
Figure 7.  (Ⅰ): Importance ranking for the sparse representation (important points) sampled at scale 0 for TF2, based on the ordering given by the pivoted QR decomposition; (Ⅱ): Histograms of the top 3 most important points selected for TF2 (shown as a solid curve in each subplot). The columns correspond to increasing scale (abbreviated as S) and the rows show the histograms of the first, second and third most important points, respectively (abbreviated as I for importance)
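Figure 7(Ⅱ) examines how stable this ranking is under the randomized sketch. A small experiment in that spirit (the sketch size `k`, the trial count and the function name are assumptions for illustration) repeats the sketch-and-pivot step and tallies which data points come out as the top-ranked columns:

```python
import numpy as np
from scipy.linalg import qr

def top_pivot_counts(G_s, k, n_trials=100, n_top=3, seed=0):
    # Repeat the randomized sketch + column-pivoted QR and count how often each
    # data point is ranked 1st, 2nd or 3rd most important (cf. Figure 7(II)).
    rng = np.random.default_rng(seed)
    n = G_s.shape[0]
    counts = np.zeros((n_top, n), dtype=int)
    for _ in range(n_trials):
        A = rng.standard_normal((k, n))                       # Gaussian sketching matrix
        _, _, piv = qr(A @ G_s, mode='economic', pivoting=True)
        for r in range(n_top):
            counts[r, piv[r]] += 1
    return counts   # counts[r, i]: number of times point i was ranked (r+1)-th
```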
Figure 8.  Comparative behavior of the decay of the $ 2 $-norm error for the multiscale extension algorithm from [5] (denoted 'Berm') and Algorithm 1 (denoted 'Shek') for the four test functions TF1, TF2, TF3 and TF4 in (a), (b), (c) and (d), respectively
Figure 9.  Performance on unseen samples for TF1, TF2, TF3 and TF4 by Algorithm 1 ($ Shek $). Plots (a), (c), (e) and (g) ((e) and (g) show only the projection onto the X-Y plane) show the distribution of the training and testing data (50/50 % split), with the corresponding plots on the right-hand side ((b), (d), (f) and (h)) showing the reconstructions obtained using only the training data. The header on (b), (d), (f) and (h) also shows the corresponding testing error
Figure 10.  Performance of Algorithm 4 from [5] ($ Berm $) on TF1, TF2, TF3 and TF4 with the same experimental setup (same training and testing data as in (a), (c), (e) and (g) of Figure 9). Here again, the corresponding testing error is displayed in the header above the reconstructions obtained from the training data
Figure 11.  Generalization analysis with repeated (100 times) random splitting of data into training and testing (again 50/50 split). Here we compare the mean and standard deviation of the Root Mean Square Error (RMSE) for prediction on testing data ($ D_{test} $) by Algorithm 4 from [5] ($ Berm $) and our Algorithm 1 ($ Shek $)
Figure 12.  (a): Contour plot for Digital Elevation Model (DEM) data with 60m resolution; (b): Training DEM obtained by sampling the nodes at even indices (from DEM in (a)) resulting in a 120m resolution DEM
Figure 13.  Scale-dependent reconstructions for the DEM dataset in Figure 12(a). Here (a), (b) and (c) show the reconstructed contour plots for scales 0, 2 and 6, respectively. The header on each of these plots also shows the proportion of the dataset selected as the sparse representation, along with the $ \infty $-norm of the reconstruction error
Figure 14.  Comparison of the 60m DEM reconstruction by $ Shek $ (a) and $ Berm $ (b) using the 120m DEM training data shown in Figure 12(b). The header shows the RMSE for prediction of height at the testing locations (testing RMSE for $ Shek $: 2.74; testing RMSE for $ Berm $: 10.7)
Algorithm 1 Hierarchical approach
1: INPUT:
      Hyperparameters: $ [(TOL \geq 0, T>0, P > 1) \in \mathbb{R}^3] $
      Dataset: $ [(X \in \mathbb{R}^{n \times d}) \subset \Omega $, $ (f|_{X} \in \mathbb{R}^n)] $
      Prediction points: $ [(X_{*} \in \mathbb{R}^{n^{*} \times d}) \subset \Omega] $
2: OUTPUT:
      Convergence Scale: $ [S_a \in \mathbb{N}] $
      Sparse Representation ($ D_{sparse}^{S_a} $): $ [(X_{S_a} \in \mathbb{R}^{\{l_{S_a} \times d \}}),(C_{S_a} \in \mathbb{R}^{l_{S_a}})] $
      Predictions: $ [P^* \in \mathbb{R}^{n^{*}}] $
3: Initialize: $ s =0 $
4: while $ TRUE $ do
5:      Compute covariance kernel: [$ G_s $ on $ X $ with $ \epsilon_s = T/P^s $]
6:      Compute numerical rank: [$ l_s = rank(G_s) $]
7:      Remove sampling bias: [$ W = AG_{s} $ with $ A \in \mathbb{R}^{k \times n} $ and $ a_{i,j} \sim N(0,1) $]
8:      Generate permutation information: [$ WP_R = QR $]
9:      Produce bases at scale s: [$ B^{s} = (G_{s}P_R)[:,1:l_s] $, with $ B^{s} \in \mathbb{R}^{n \times l_s} $]
10:      Subset the sparse representation in $ X_s $
11:      Compute projection coordinate: [$ C_{s} = {(B^{s})}^{\dagger} f|_X $; $ {B^s}^{\dagger} = ({B^s}^TB^s)^{-1}{B^s}^T $]
12:      Generate approximation at s: [$ (A_sf)|_{X} = B^{s}C_s $]
13:      If $ (||f|_X- (A_sf)|_{X}||_2 \leq TOL) $ : $ S_a = s $; $ Break $
14:      Update scale: [$ s=s+1 $]
15: end while
16: Compute bases for prediction at $ X_{*} $: [$ G^{*}_{S_a} $ centered at $ X_{S_a} $ with $ \epsilon_{S_a} = T/P^{S_a} $]
17: Predict: $ P^* = G^{*}_{S_a} C_{S_a} $
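A minimal NumPy/SciPy sketch of the loop above is given below. The Gaussian kernel form $ g_{\epsilon_s}(x,y) = \exp(-\|x-y\|^2/\epsilon_s) $, the oversampled sketch size $ k = l_s + 10 $ and the function names are assumptions made for illustration; they are not fixed by the pseudocode.

```python
import numpy as np
from scipy.linalg import qr
from scipy.spatial.distance import cdist

def gaussian_kernel(X, Y, eps):
    # Kernel matrix with entries exp(-||x - y||^2 / eps); the scaling convention is assumed.
    return np.exp(-cdist(X, Y, 'sqeuclidean') / eps)

def hierarchical_approximation(X, f, X_star, TOL=1e-6, T=1.0, P=2.0, max_scale=25, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for s in range(max_scale + 1):
        eps_s = T / P ** s                                   # kernel width shrinks with scale
        G_s = gaussian_kernel(X, X, eps_s)                   # covariance kernel on X
        l_s = np.linalg.matrix_rank(G_s)                     # numerical rank at scale s
        k = min(n, l_s + 10)                                 # oversampled sketch size (assumption)
        A = rng.standard_normal((k, n))
        W = A @ G_s                                          # randomized sketch (removes sampling bias)
        _, _, piv = qr(W, mode='economic', pivoting=True)    # W P_R = Q R
        cols = piv[:l_s]                                     # indices of the selected centers
        B_s = G_s[:, cols]                                   # bases at scale s
        C_s, *_ = np.linalg.lstsq(B_s, f, rcond=None)        # projection coordinates
        if np.linalg.norm(f - B_s @ C_s) <= TOL:             # convergence test
            break
    X_sel = X[cols]                                          # sparse representation D^s_sparse
    G_star = gaussian_kernel(X_star, X_sel, eps_s)           # bases at the prediction points
    return G_star @ C_s, X_sel, C_s, s
```

Here `X_sel` and `C_s` together play the role of the sparse representation $ D^{S_a}_{sparse} $, and `G_star @ C_s` gives the predictions $ P^* $ at $ X_{*} $.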
Table 1.  Comparison of training and prediction times for Algorithm 1 ($ Shek $) and Algorithm 4 from [5] ($ Berm $). For the training analysis, we consider training datasets of different sizes ($ |X| $) for all test functions; correspondingly, for prediction latency, we consider prediction sets $ X_{pred} $ of different cardinalities. For each case, the approach taking the shorter time has been boldfaced
$ |X| $  $ Shek $ (training time, s)  $ Berm $ (training time, s)  $ |X_{pred}| $  $ Shek $ (prediction time, s)  $ Berm $ (prediction time, s)
$ TF1 $ 50 4.61e-03 3.25e-02 100 1.47e-02 7.22e-02
100 9.25e-03 1.40e-02 200 4.15e-02 3.09e-01
150 1.60e-02 2.93e-02 300 6.21e-02 5.82e-01
200 2.55e-02 5.63e-02 400 8.18e-02 1.08
250 3.77e-02 9.29e-02 500 1.12e-01 1.81
300 5.64e-02 1.25e-01 600 1.31e-01 2.38
350 6.51e-02 1.93e-01 700 1.48e-01 3.56
400 8.36e-02 2.21e-01 800 1.70e-01 4.15
$ TF2 $ 50 5.68e-03 9.21e-03 100 2.17e-02 8.35e-02
100 1.35e-02 1.49e-02 200 6.75e-02 3.00e-01
150 2.86e-02 2.80e-02 300 1.53e-01 6.36e-01
200 5.28e-02 5.27e-02 400 2.71e-01 1.07
250 8.97e-02 8.94e-02 500 4.09e-01 1.81
300 1.15e-01 1.23e-01 600 5.91e-01 2.29
350 1.86e-01 1.75e-01 700 8.06e-01 3.52
400 2.12e-01 2.17e-01 800 1.08 4.10
$ TF3 $ 900 8.39e-01 7.98e-01 1225 3.13 8.67
1600 4.19 6.40 2025 5.29 2.14e+01
2500 2.76e+01 2.67e+01 3025 7.91 3.47e+01
3600 7.19e+01 8.24e+01 4225 1.16e+01 4.95e+01
4900 1.72e+02 1.99e+02 5625 1.59e+01 6.87e+01
6400 3.84e+02 4.44e+02 7225 2.05e+01 8.73e+01
8100 7.71e+02 8.88e+02 9025 2.75e+01 1.08e+02
10000 1.44e+03 1.88e+03 11025 3.27e+01 2.32e+02
$ TF4 $ 900 1.42 8.70e-01 1225 4.10 1.53e+01
1600 6.94 8.68 2025 1.05e+01 3.21e+01
2500 3.51e+01 3.44e+01 3025 2.25e+01 5.44e+01
3600 1.27e+02 1.27e+02 4225 5.25e+01 1.32e+02
4900 2.94e+02 2.95e+02 5625 8.78e+01 2.02e+02
6400 7.90e+02 7.94e+02 7225 1.52e+02 3.96e+02
8100 1.49e+03 1.49e+03 9025 2.24e+02 5.37e+02
10000 2.69e+03 2.69e+03 11025 3.37e+02 6.90e+02