September 2022, 4(3): 423-440. doi: 10.3934/fods.2022012

Learning in high-dimensional feature spaces using ANOVA-based fast matrix-vector multiplication

1. Department of Mathematics, Chair of Applied Functional Analysis, TU Chemnitz, Germany
2. Department of Mathematics, Chair of Scientific Computing, TU Chemnitz, Germany

*Corresponding author: Theresa Wagner

Received: November 2021. Revised: March 2022. Early access: July 2022. Published: September 2022.

Kernel matrices are crucial in many learning tasks such as support vector machines or kernel ridge regression. The kernel matrix is typically dense and large-scale. Depending on the dimension of the feature space, even computing all of its entries in reasonable time becomes a challenging task. For such dense matrices, the cost of a matrix-vector product scales quadratically with the matrix dimension $ N $ if no customized methods are applied. We propose the use of an ANOVA kernel, where we construct several kernels based on lower-dimensional feature spaces for which we provide fast algorithms realizing the matrix-vector products. We employ the non-equispaced fast Fourier transform (NFFT), which is of linear complexity for fixed accuracy. Based on a feature-grouping approach, we then show how the fast matrix-vector products can be embedded into a learning method, choosing kernel ridge regression and the conjugate gradient solver. We illustrate the performance of our approach on several data sets.

Citation: Franziska Nestler, Martin Stoll, Theresa Wagner. Learning in high-dimensional feature spaces using ANOVA-based fast matrix-vector multiplication. Foundations of Data Science, 2022, 4 (3) : 423-440. doi: 10.3934/fods.2022012
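To make the pipeline concrete, the following Python sketch assembles the pieces the abstract describes: an ANOVA kernel built as a sum of Gaussian kernels over low-dimensional feature groups, accessed only through matrix-vector products inside a conjugate gradient solve for kernel ridge regression. This is a minimal dense reference under stated assumptions, not the authors' implementation: the names `anova_kernel_matvec`, `krr_fit` and `groups` are illustrative, and the dense $ O(N^2) $ subkernel products stand in for the NFFT-based fast summation of linear complexity used in the paper.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def anova_kernel_matvec(X, groups, sigma, v):
        # Multiply v by K = sum_l K_l, each K_l a Gaussian kernel on a
        # low-dimensional feature group (the NFFT is practical for groups
        # of up to three features). This dense version costs O(N^2) per
        # group; the paper replaces each product by an NFFT fast summation.
        result = np.zeros(len(v))
        for g in groups:
            Xg = X[:, g]                                 # restrict to the group
            sqnorm = (Xg ** 2).sum(axis=1)
            sqdist = sqnorm[:, None] + sqnorm[None, :] - 2.0 * (Xg @ Xg.T)
            result += np.exp(-sqdist / sigma ** 2) @ v   # Gaussian subkernel times v
        return result

    def krr_fit(X, y, groups, sigma=1.0, lam=1.0):
        # Kernel ridge regression: solve (K + lam*I) alpha = y by conjugate
        # gradients, touching K only through matrix-vector products.
        N = X.shape[0]
        A = LinearOperator((N, N), dtype=float,
                           matvec=lambda v: anova_kernel_matvec(X, groups, sigma, v) + lam * v)
        alpha, info = cg(A, y)
        return alpha

    # toy usage: six features split into two 3-dimensional groups
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 6))
    y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(200))
    alpha = krr_fit(X, y, groups=[[0, 1, 2], [3, 4, 5]])

Since $ K + \lambda I $ is symmetric positive definite, conjugate gradients is the natural solver here; every CG iteration costs exactly one kernel-vector product, which is the operation the NFFT accelerates.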

Figure 1.  Kernel-vector multiplication with and without the NFFT approach on balanced samples of various sizes drawn from synthetic data, with $ \sigma = 100 $ and $ n_{\text{runs}} = 100 $
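A timing study in the spirit of Figure 1 can be set up as below; this sketch times only the dense kernel-vector product, which scales as $ O(N^2) $, while the NFFT-based product benchmarked in the figure would replace the timed line and scales linearly in $ N $ for fixed accuracy. The feature dimension and sample sizes here are assumptions for illustration, not the figure's exact setup.

    import time
    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 100.0                            # kernel parameter as in Figure 1
    for N in [500, 1000, 2000, 4000]:
        X = rng.standard_normal((N, 3))      # synthetic data; d assumed
        v = rng.standard_normal(N)
        sqnorm = (X ** 2).sum(axis=1)
        K = np.exp(-(sqnorm[:, None] + sqnorm[None, :] - 2.0 * X @ X.T) / sigma ** 2)
        t0 = time.perf_counter()
        K @ v                                # dense kernel-vector product, O(N^2)
        print(f"N = {N:5d}: {time.perf_counter() - t0:.4f} s")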
Figure 2.  Embedding 2-dimensional data into a 3-dimensional space via an embedding map
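A classical embedding of this kind, given here as an illustrative assumption rather than the figure's exact map, lifts $ \mathbb{R}^2 $ into $ \mathbb{R}^3 $ so that inner products in the target space evaluate a kernel in the original space:
$$ \phi(x_1, x_2) = \bigl(x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2\bigr), \qquad \langle \phi(\mathbf{x}), \phi(\mathbf{y}) \rangle = \langle \mathbf{x}, \mathbf{y} \rangle^2, $$
i.e., the polynomial kernel of degree two is computed without ever forming the embedded vectors explicitly.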
Figure 3.  The univariate kernel function $ \kappa $, here a Gaussian, is truncated, and a polynomial $ p $ is constructed such that we end up with a periodic function $ \tilde\kappa $ that is smooth up to a certain degree. In higher-dimensional settings, the kernel function is truncated at $ \|{\mathit{\boldsymbol{ r }}}\| = L $, i.e., radially. The resulting periodic function then has period $ h $ with respect to each dimension
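Written out in one dimension, with notation assumed here following the standard NFFT-based fast summation construction, the regularization of Figure 3 reads
$$ \tilde\kappa(r) = \begin{cases} \kappa(r), & |r| \le L, \\ p(r), & L < |r| \le h/2, \end{cases} $$
extended $ h $-periodically, where the polynomial $ p $ matches the values and derivatives of $ \kappa $ at $ |r| = L $ and enforces smoothness at the period boundary. The smooth periodic $ \tilde\kappa $ then has a rapidly converging Fourier expansion $ \tilde\kappa(r) \approx \sum_{|k| \le n/2} c_k \mathrm{e}^{2\pi\mathrm{i} k r / h} $, and it is this trigonometric sum that the NFFT evaluates fast at the nonequispaced points.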
Figure 4.  Accuracy and runtime of GridSearch on balanced samples of various sizes
Table 1.  Benchmark data sets for numerical experiments
data set | $ N $ | $ d $ | data points in under-represented class | largest balanced sample possible
Telescope | 19020 | 10 | 6688 | 13376
HIGGS | 11000000 | 28 | 5170877 | 10341754
SUSY | 5000000 | 18 | 2287827 | 4575654
YearPredictionMSD | 515345 | 90 | 206945 | 413890
cod-rna | 488565 | 8 | 162855 | 209884
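The balanced samples used in the experiments below can be obtained by downsampling the over-represented class. The following Python sketch illustrates one way to do this, assuming binary labels in $ \{0, 1\} $; the helper name `balanced_sample` is hypothetical, and the paper's exact sampling procedure is not shown on this page.

    import numpy as np

    def balanced_sample(X, y, n, seed=0):
        # Draw a class-balanced sample of size n (n // 2 per class) from a
        # binary data set by subsampling each class without replacement.
        rng = np.random.default_rng(seed)
        idx0 = rng.choice(np.flatnonzero(y == 0), n // 2, replace=False)
        idx1 = rng.choice(np.flatnonzero(y == 1), n // 2, replace=False)
        idx = rng.permutation(np.concatenate([idx0, idx1]))
        return X[idx], y[idx]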
Table 2.  Parameter grids for the considered classifiers

classifier | kernel coefficient | regularization parameter
NFFTKernelRidge | $ \sigma: [10^{-3}, 10^{-2}, 10^{-1}, 1, 10^{1}, 10^{2}, 10^{3}] $ | $ \lambda: [1, 10^{1}, 10^{2}, 10^{3}] $
sklearn KRR | $ \gamma: [10^{6}, 10^{4}, 10^{2}, 1, 10^{-2}, 10^{-4}, 10^{-6}] $ | $ \alpha: [1, 10^{1}, 10^{2}, 10^{3}] $
sklearn SVC | $ \gamma: [10^{6}, 10^{4}, 10^{2}, 1, 10^{-2}, 10^{-4}, 10^{-6}] $ | $ C: [1, 10^{-1}, 10^{-2}, 10^{-3}] $
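For the two sklearn baselines, a grid search over the parameter grids of Table 2 can be reproduced along the following lines. This is a sketch with synthetic stand-in data; the cross-validation settings and scoring are assumptions, as the exact GridSearch configuration is not part of this page.

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 10))
    y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(500))

    gammas = [1e6, 1e4, 1e2, 1, 1e-2, 1e-4, 1e-6]   # kernel coefficients from Table 2
    krr = GridSearchCV(KernelRidge(kernel="rbf"),
                       {"gamma": gammas, "alpha": [1, 1e1, 1e2, 1e3]})
    svc = GridSearchCV(SVC(kernel="rbf"),
                       {"gamma": gammas, "C": [1, 1e-1, 1e-2, 1e-3]})
    for search in (krr, svc):
        search.fit(X, y)                             # 5-fold CV by default
        print(search.best_params_)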
Table 3.  Telescope Data Set - Accuracy and runtime of GridSearch on balanced samples of various sizes. Each cell lists NFFT / KRR / SVC, i.e., the proposed NFFTKernelRidge and the sklearn baselines of Table 2.

sample size | best accuracy | mean runtime fit | mean runtime predict | mean total runtime
100 | 69.6 / 74.2 / 57.0 | 0.1091 / 0.0026 / 0.0008 | 0.0121 / 0.0011 / 0.0003 | 0.1212 / 0.0036 / 0.0011
500 | 80.4 / 78.1 / 76.6 | 0.1600 / 0.0075 / 0.0036 | 0.0168 / 0.0029 / 0.0021 | 0.1768 / 0.0104 / 0.0056
1000 | 81.3 / 79.7 / 78.3 | 0.2104 / 0.0121 / 0.0125 | 0.0225 / 0.0089 / 0.0082 | 0.2328 / 0.0210 / 0.0207
5000* | 83.2 / 82.0 / 81.4 | 0.4207 / 0.2086 / 0.1694 | 0.0376 / 0.1138 / 0.1300 | 0.4583 / 0.3224 / 0.2994
10000* | 83.9 / 83.8 / 83.7 | 0.8128 / 0.8929 / 0.7027 | 0.0675 / 0.3991 / 0.5210 | 0.8803 / 1.2929 / 1.2237
13376* | 83.9 / 83.8 / 83.6 | 1.2122 / 1.7200 / 1.2932 | 0.0978 / 0.7063 / 0.8661 | 1.3100 / 2.4264 / 2.1594
Table 4.  HIGGS Data Set - Accuracy and runtime of GridSearch on balanced samples of various sizes. Each cell lists NFFT / KRR / SVC; "-" marks entries without a reported result.

sample size | best accuracy | mean runtime fit | mean runtime predict | mean total runtime
100 | 55.8 / 54.4 / 40.4 | 0.2624 / 0.0020 / 0.0011 | 0.0344 / 0.0008 / 0.0006 | 0.2968 / 0.0028 / 0.0017
500 | 58.8 / 55.7 / 54.3 | 0.3561 / 0.0048 / 0.0031 | 0.0392 / 0.0020 / 0.0020 | 0.3953 / 0.0068 / 0.0051
1000 | 62.1 / 60.5 / 59.3 | 0.4325 / 0.0133 / 0.0107 | 0.0450 / 0.0040 / 0.0080 | 0.4776 / 0.0173 / 0.0187
5000* | 63.3 / 62.9 / 62.0 | 1.0610 / 0.1914 / 0.2557 | 0.1056 / 0.1056 / 0.2084 | 1.1666 / 0.2970 / 0.4641
10000* | 66.7 / 65.5 / 65.0 | 2.1512 / 0.8663 / 1.0339 | 0.1992 / 0.3673 / 0.7972 | 2.3504 / 1.2336 / 1.8312
25000* | 64.8 / 67.0 / 66.5 | 6.3951 / 10.4802 / 9.4575 | 0.4987 / 7.5416 / 4.3869 | 6.8938 / 18.0218 / 13.8443
50000* | 65.5 / - / 67.3 | 14.4947 / - / 56.6365 | 0.9397 / - / 20.4737 | 15.4344 / - / 77.1102
100000* | 67.1 / - / 67.9 | 34.6103 / - / 197.9109 | 1.8479 / - / 84.8191 | 36.4582 / - / 282.7300
200000* | 68.6 / - / 68.5 | 105.0926 / - / 991.7101 | 4.1666 / - / 337.1600 | 109.2592 / - / 1328.8701
Table 5.  SUSY Data Set - Accuracy and runtime of GridSearch on balanced samples of various sizes. Each cell lists NFFT / KRR / SVC; "-" marks entries without a reported result.

sample size | best accuracy | mean runtime fit | mean runtime predict | mean total runtime
100 | 62.2 / 62.0 / 55.6 | 0.1218 / 0.0009 / 0.0005 | 0.0159 / 0.0004 / 0.0001 | 0.1377 / 0.0014 / 0.0006
500 | 74.3 / 74.2 / 71.8 | 0.1754 / 0.0044 / 0.0028 | 0.0216 / 0.0029 / 0.0016 | 0.1969 / 0.0073 / 0.0044
1000 | 76.8 / 75.5 / 73.6 | 0.2427 / 0.0114 / 0.0091 | 0.0307 / 0.0039 / 0.0058 | 0.2734 / 0.0153 / 0.0149
5000* | 77.6 / 77.3 / 76.9 | 0.8570 / 0.1879 / 0.2033 | 0.0775 / 0.0939 / 0.1614 | 0.9345 / 0.2817 / 0.3647
10000* | 78.4 / 78.4 / 78.0 | 1.5382 / 1.0586 / 0.8214 | 0.1356 / 0.3487 / 0.6078 | 1.6737 / 1.4073 / 1.4291
50000* | 78.9 / 79.4 / 79.2 | 9.1113 / 42.7143 / 31.1025 | 0.6305 / 9.0420 / 15.5091 | 9.7417 / 51.7563 / 46.6116
100000* | 78.7 / - / 79.2 | 23.9190 / - / 148.1214 | 1.2657 / - / 62.3419 | 25.1847 / - / 210.4633
200000* | 78.9 / - / 79.3 | 56.7895 / - / 732.4476 | 2.3603 / - / 255.1349 | 59.1497 / - / 987.5824
Table 6.  YearPredictionMSD Data Set - Accuracy and runtime of GridSearch on balanced samples of various sizes. Each cell lists NFFT / KRR / SVC; "-" marks entries without a reported result.

sample size | best accuracy | mean runtime fit | mean runtime predict | mean total runtime
100 | 46.0 / 50.8 / 42.6 | 0.6578 / 0.0011 / 0.0007 | 0.0899 / 0.0005 / 0.0006 | 0.7476 / 0.0016 / 0.0013
500 | 61.4 / 64.2 / 64.2 | 0.8981 / 0.0051 / 0.0064 | 0.1122 / 0.0022 / 0.0054 | 1.0103 / 0.0074 / 0.0118
1000 | 66.0 / 69.0 / 68.2 | 1.1817 / 0.0107 / 0.0218 | 0.1405 / 0.0066 / 0.0196 | 1.3221 / 0.0173 / 0.0413
5000* | 70.3 / 71.7 / 71.7 | 3.2502 / 0.1861 / 0.5164 | 0.3882 / 0.0966 / 0.4705 | 3.6384 / 0.2827 / 0.9869
10000* | 72.3 / 73.4 / 73.1 | 6.7636 / 0.8400 / 2.1104 | 0.7016 / 0.3625 / 1.8575 | 7.4651 / 1.2025 / 3.9679
50000* | 73.4 / 74.8 / 74.8 | 53.3774 / 43.2971 / 122.4035 | 3.9307 / 9.1279 / 50.3554 | 57.3081 / 52.4250 / 172.7589
100000* | 74.2 / - / 75.8 | 102.0414 / - / 426.8239 | 6.1424 / - / 201.7097 | 108.1838 / - / 628.5336
200000* | 74.4 / - / 76.4 | 245.1812 / - / 2006.6694 | 12.3315 / - / 785.3145 | 257.5127 / - / 2791.9839
413890* | 74.3 / - / 76.7 | 669.4028 / - / 7696.3063 | 24.4887 / - / 3335.6310 | 693.8915 / - / 11031.9373
Table 7.  cod-rna Data Set - Accuracy and runtime of GridSearch on balanced samples of various sizes. Each cell lists NFFT / KRR / SVC; "-" marks entries without a reported result.

sample size | best accuracy | mean runtime fit | mean runtime predict | mean total runtime
100 | 78.0 / 77.2 / 72.2 | 0.0459 / 0.0010 / 0.0005 | 0.0066 / 0.0006 / 0.0002 | 0.0524 / 0.0016 / 0.0007
500 | 89.2 / 92.7 / 89.6 | 0.0671 / 0.0039 / 0.0021 | 0.0074 / 0.0023 / 0.0012 | 0.0745 / 0.0063 / 0.0034
1000 | 92.5 / 94.7 / 93.7 | 0.0882 / 0.0110 / 0.0071 | 0.0100 / 0.0037 / 0.0044 | 0.0982 / 0.0147 / 0.0116
5000* | 95.4 / 95.7 / 95.6 | 0.3046 / 0.1788 / 0.1564 | 0.0274 / 0.0952 / 0.1031 | 0.3320 / 0.2740 / 0.2595
10000* | 95.7 / 95.8 / 95.9 | 0.5864 / 0.8309 / 0.6044 | 0.0472 / 0.3723 / 0.4101 | 0.6336 / 1.2032 / 1.0145
50000* | 95.7 / 95.9 / 96.0 | 3.9697 / 43.4380 / 17.9735 | 0.2376 / 9.5084 / 9.8435 | 4.2073 / 52.9464 / 27.8170
100000* | 95.9 / - / 96.2 | 10.1743 / - / 73.8995 | 0.4686 / - / 39.0615 | 10.6430 / - / 112.9610
209884* | 96.0 / - / 96.5 | 28.9638 / - / 422.5391 | 1.0058 / - / 170.0743 | 29.9696 / - / 592.6134
