Article Contents
Article Contents

# Learning in high-dimensional feature spaces using ANOVA-based fast matrix-vector multiplication

• *Corresponding author: Theresa Wagner
• Kernel matrices are crucial in many learning tasks such as support vector machines or kernel ridge regression. The kernel matrix is typically dense and large-scale. Depending on the dimension of the feature space even the computation of all of its entries in reasonable time becomes a challenging task. For such dense matrices the cost of a matrix-vector product scales quadratically with the dimensionality $N$, if no customized methods are applied. We propose the use of an ANOVA kernel, where we construct several kernels based on lower-dimensional feature spaces for which we provide fast algorithms realizing the matrix-vector products. We employ the non-equispaced fast Fourier transform (NFFT), which is of linear complexity for fixed accuracy. Based on a feature grouping approach, we then show how the fast matrix-vector products can be embedded into a learning method choosing kernel ridge regression and the conjugate gradient solver. We illustrate the performance of our approach on several data sets.

Mathematics Subject Classification: Primary: 65F45, 65T50; Secondary: 62J10.

 Citation:

• Figure 1.  Kernel-vector multiplication with and without NFFT-approach on various balanced samples of different size on synthetic data, with $\sigma = 100$ and $n_{\text{runs}} = 100$

Figure 2.  Embedding 2-dimensional data into a 3-dimensional space via an embedding map

Figure 3.  The univariate kernel function $\kappa$, here a Gaussian, is truncated and a polynomial $p$ is constructed such that we end up with a periodic function $\tilde\kappa$, which is smooth up to a certain degree. In higher-dimensional settings, the kernel function is truncated at $\|{\mathit{\boldsymbol{ r }}}\| = L$, i. e., radially. The resulting periodic function will then have the period $h$ with respect to each dimension

Figure 4.  Accuracy and runtime of GridSearch on various balanced samples of different size

Table 1.  Benchmark data sets for numerical experiments

 data set ${N}$ ${d}$ number of data points in under-represented class largest balanced sample possible Telescope2 19020 10 6688 13376 HIGGS3 11000000 28 5170877 10341754 SUSY4 5000000 18 2287827 4575654 YearPredictionMSD5 515345 90 206945 413890 cod-rna6 488565 8 162855 2098847

Table 2.  Parameter-grids for the considered classifiers

 classifier kernel coefficient regularization parameter NFFTKernelRidge $\sigma: [10^{-3}, 10^{-2}, 10^{-1}, 1, 10^{1}, 10^{2}, 10^{3}]$ $\lambda: [1, 10^{1}, 10^{2}, 10^{3}]$ sklearn KRR $\gamma: [10^{6}, 10^{4}, 10^{2}, 1, 10^{-2}, 10^{-4}, 10^{-6}]$ $\alpha: [1, 10^{1}, 10^{2}, 10^{3}]$ sklearn SVC $\gamma: [10^{6}, 10^{4}, 10^{2}, 1, 10^{-2}, 10^{-4}, 10^{-6}]$ $C: [1, 10^{-1}, 10^{-2}, 10^{-3}]$

Table 3.  Telescope Data Set - Accuracy and runtime of GridSearch on various balanced samples of different size

 sample size best accuracy mean runtime fit mean runtime predict mean total runtime NFFT KRR NFFT KRR NFFT KRR NFFT KRR 100 69.6 74.2 57.0 0.1091 0.0026 0.0008 0.0121 0.0011 0.0003 0.1212 0.0036 0.0011 500 80.4 78.1 76.6 0.1600 0.0075 0.0036 0.0168 0.0029 0.0021 0.1768 0.0104 0.0056 1000 81.3 79.7 78.3 0.2104 0.0121 0.0125 0.0225 0.0089 0.0082 0.2328 0.0210 0.0207 5000* 83.2 82.0 81.4 0.4207 0.2086 0.1694 0.0376 0.1138 0.1300 0.4583 0.3224 0.2994 10000* 83.9 83.8 83.7 0.8128 0.8929 0.7027 0.0675 0.3991 0.5210 0.8803 1.2929 1.2237 13376* 83.9 83.8 83.6 1.2122 1.7200 1.2932 0.0978 0.7063 0.8661 1.3100 2.4264 2.1594

Table 4.  HIGGS Data Set - Accuracy and runtime of GridSearch on various balanced samples of different size

 sample size best accuracy mean runtime fit mean runtime predict mean total runtime NFFT KRR NFFT KRR NFFT KRR NFFT KRR 100 55.8 54.4 40.4 0.2624 0.0020 0.0011 0.0344 0.0008 0.0006 0.2968 0.0028 0.0017 500 58.8 55.7 54.3 0.3561 0.0048 0.0031 0.0392 0.0020 0.0020 0.3953 0.0068 0.0051 1000 62.1 60.5 59.3 0.4325 0.0133 0.0107 0.0450 0.0040 0.0080 0.4776 0.0173 0.0187 5000* 63.3 62.9 62.0 1.0610 0.1914 0.2557 0.1056 0.1056 0.2084 1.1666 0.2970 0.4641 10000* 66.7 65.5 65.0 2.1512 0.8663 1.0339 0.1992 0.3673 0.7972 2.3504 1.2336 1.8312 25000* 64.8 67.0 66.5 6.3951 10.4802 9.4575 0.4987 7.5416 4.3869 6.8938 18.0218 13.8443 50000* 65.5 - 67.3 14.4947 - 56.6365 0.9397 - 20.4737 15.4344 - 77.1102 100000* 67.1 - 67.9 34.6103 - 197.9109 1.8479 - 84.8191 36.4582 - 282.7300 200000* 68.6 - 68.5 105.0926 - 991.7101 4.1666 - 337.1600 109.2592 - 1328.8701

Table 5.  SUSY Data Set - Accuracy and runtime of GridSearch on various balanced samples of different size

 sample size best accuracy mean runtime fit mean runtime predict mean total runtime NFFT KRR NFFT KRR NFFT KRR NFFT KRR 100 62.2 62.0 55.6 0.1218 0.0009 0.0005 0.0159 0.0004 0.0001 0.1377 0.0014 0.0006 500 74.3 74.2 71.8 0.1754 0.0044 0.0028 0.0216 0.0029 0.0016 0.1969 0.0073 0.0044 1000 76.8 75.5 73.6 0.2427 0.0114 0.0091 0.0307 0.0039 0.0058 0.2734 0.0153 0.0149 5000* 77.6 77.3 76.9 0.8570 0.1879 0.2033 0.0775 0.0939 0.1614 0.9345 0.2817 0.3647 10000* 78.4 78.4 78.0 1.5382 1.0586 0.8214 0.1356 0.3487 0.6078 1.6737 1.4073 1.4291 50000* 78.9 79.4 79.2 9.1113 42.7143 31.1025 0.6305 9.0420 15.5091 9.7417 51.7563 46.6116 100000* 78.7 - 79.2 23.9190 - 148.1214 1.2657 - 62.3419 25.1847 - 210.4633 200000* 78.9 - 79.3 56.7895 - 732.4476 2.3603 - 255.1349 59.1497 - 987.5824

Table 6.  YearPredictionMSD Data Set - Accuracy and runtime of GridSearch on various balanced samples of different size

 sample size best accuracy mean runtime fit mean runtime predict mean total runtime NFFT KRR NFFT KRR NFFT KRR NFFT KRR 100 46.0 50.8 42.6 0.6578 0.0011 0.0007 0.0899 0.0005 0.0006 0.7476 0.0016 0.0013 500 61.4 64.2 64.2 0.8981 0.0051 0.0064 0.1122 0.0022 0.0054 1.0103 0.0074 0.0118 1000 66.0 69.0 68.2 1.1817 0.0107 0.0218 0.1405 0.0066 0.0196 1.3221 0.0173 0.0413 5000* 70.3 71.7 71.7 3.2502 0.1861 0.5164 0.3882 0.0966 0.4705 3.6384 0.2827 0.9869 10000* 72.3 73.4 73.1 6.7636 0.8400 2.1104 0.7016 0.3625 1.8575 7.4651 1.2025 3.9679 50000* 73.4 74.8 74.8 53.3774 43.2971 122.4035 3.9307 9.1279 50.3554 57.3081 52.4250 172.7589 100000* 74.2 - 75.8 102.0414 - 426.8239 6.1424 - 201.7097 108.1838 - 628.5336 200000* 74.4 - 76.4 245.1812 - 2006.6694 12.3315 - 785.3145 257.5127 - 2791.9839 413890* 74.3 - 76.7 669.4028 - 7696.3063 24.4887 - 3335.6310 693.8915 - 11031.9373

Table 7.  cod-rna Data Set - Accuracy and runtime of GridSearch on various balanced samples of different size

 sample size best accuracy mean runtime fit mean runtime predict mean total runtime NFFT KRR NFFT KRR NFFT KRR NFFT KRR 100 78.0 77.2 72.2 0.0459 0.0010 0.0005 0.0066 0.0006 0.0002 0.0524 0.0016 0.0007 500 89.2 92.7 89.6 0.0671 0.0039 0.0021 0.0074 0.0023 0.0012 0.0745 0.0063 0.0034 1000 92.5 94.7 93.7 0.0882 0.0110 0.0071 0.0100 0.0037 0.0044 0.0982 0.0147 0.0116 5000* 95.4 95.7 95.6 0.3046 0.1788 0.1564 0.0274 0.0952 0.1031 0.3320 0.2740 0.2595 10000* 95.7 95.8 95.9 0.5864 0.8309 0.6044 0.0472 0.3723 0.4101 0.6336 1.2032 1.0145 50000* 95.7 95.9 96.0 3.9697 43.4380 17.9735 0.2376 9.5084 9.8435 4.2073 52.9464 27.8170 100000* 95.9 - 96.2 10.1743 - 73.8995 0.4686 - 39.0615 10.6430 - 112.9610 209884* 96.0 - 96.5 28.9638 - 422.5391 1.0058 - 170.0743 29.9696 - 592.6134

Figures(4)

Tables(7)