In this article, the classical support vector machine (SVM) classifiers are generalized by non-Euclidean margins. We first extend the linear models of the SVM classifiers to non-Euclidean margins, including the theorems and algorithms of the SVM classifiers with hard margins and soft margins. In particular, the SVM classifiers by the $ \infty $-norm margin can be solved by a 1-norm optimization, which yields sparsity. Next, we show that the non-linear models of the SVM classifiers by the $ q $-norm margin can be equivalently transformed into the SVM in the $ p $-norm reproducing kernel Banach spaces induced by the hinge loss, where $ 1/p+1/q = 1 $. Finally, we present numerical examples on artificial and real data to compare the different algorithms of the SVM classifiers by the $ \infty $-norm margin.
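The reduction of the $ \infty $-norm-margin classifier to a sparse 1-norm optimization can be illustrated with a small linear program. The sketch below is not the paper's formulation (9); it solves the standard soft-margin 1-norm SVM $ \min_{w,b,\xi} \|w\|_1 + C\sum_i \xi_i $ subject to $ y_i(w^\top x_i + b) \ge 1 - \xi_i $, $ \xi \ge 0 $, by splitting $ w = u - v $ with $ u, v \ge 0 $. The helper name `l1_svm`, the toy data, and the choice $ C = 10 $ are all illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm(X, y, C=1.0):
    """Soft-margin 1-norm SVM as a linear program.

    Minimize sum(u) + sum(v) + C * sum(xi) over [u, v, b, xi],
    subject to y_i * ((u - v) @ x_i + b) >= 1 - xi_i and u, v, xi >= 0.
    At the optimum w = u - v and sum(u) + sum(v) = ||w||_1.
    """
    n, d = X.shape
    # Variable order: [u (d), v (d), b (1), xi (n)]
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    # Encode -y_i * ((u - v) @ x_i + b) - xi_i <= -1 as A_ub @ z <= b_ub
    Yx = y[:, None] * X
    A_ub = np.hstack([-Yx, Yx, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    z = res.x
    return z[:d] - z[d:2 * d], z[2 * d]  # w, b

# Toy data separable by the first coordinate, shifted to give a unit margin
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
X[:, 0] += np.sign(X[:, 0])
y = np.sign(X[:, 0])
w, b = l1_svm(X, y, C=10.0)
pred = np.sign(X @ w + b)
print((pred == y).mean())  # training accuracy
```

Because the objective penalizes $ \|w\|_1 $, the solver concentrates the weight on the single informative coordinate, which is the sparsity effect the article exploits.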
Figure 6. The geometrical interpretation of the kernel trick. In the original space, the data set is not linearly separable, see Figure 6a. After the feature mapping given by the kernel function, the data set is linearly separable in the higher-dimensional space, see Figure 6b. Finally, the decision boundary in the higher-dimensional space can be projected back into the original space, where it becomes a non-linear boundary
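The geometry in the caption can be made concrete with a small sketch (the data and the feature map are illustrative, not the paper's construction): two concentric rings are not linearly separable in $ \mathbb{R}^2 $, but under the degree-2 polynomial feature map whose inner products reproduce the kernel $ k(x, z) = (x^\top z)^2 $, a hyperplane in feature space separates them, and its preimage is a circle in the original space.

```python
import numpy as np

# Two concentric rings: not linearly separable in R^2
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, size=100)
r = np.where(np.arange(100) < 50, 1.0, 3.0)       # inner ring vs outer ring
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
y = np.where(r < 2.0, -1, 1)

def phi(X):
    """Feature map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2), so that
    phi(x) @ phi(z) = (x @ z)^2, the degree-2 polynomial kernel."""
    return np.c_[X[:, 0] ** 2, np.sqrt(2) * X[:, 0] * X[:, 1], X[:, 1] ** 2]

Z = phi(X)
# In feature space the hyperplane z1 + z3 = 4 (i.e. x1^2 + x2^2 = 4)
# separates the rings; its preimage is the circle of radius 2 in R^2.
w, b = np.array([1.0, 0.0, 1.0]), -4.0
pred = np.sign(Z @ w + b)
print((pred == y).mean())  # -> 1.0
```

The inner ring maps to $ z_1 + z_3 = 1 $ and the outer ring to $ z_1 + z_3 = 9 $, so any threshold between them defines a separating hyperplane, exactly the situation depicted in Figure 6b.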
Table 1.
The results of the experiments on MNIST. The row "Linear SVM classifier by ∞-norm margin" is the result obtained by (9), where
MNIST | Training Errors | Test Errors | Sparsity |
Linear SVM classifier by ∞-norm margin | 0/9939 | 5/1967 | 654/785 |
Kernel SVM classifier by ∞-norm margin | 0/9939 | 4/1967 | 1760/9939 |
Table 2.
The results of the experiments on the Handwritten Alphabets Database. The row "Linear SVM classifier by ∞-norm margin" is the result obtained by (9), where
Handwritten Alphabets | Training Errors | Test Errors | Sparsity |
Linear SVM classifier by ∞-norm margin | 0/7000 | 69/3000 | 401/785 |
Kernel SVM classifier by ∞-norm margin | 0/7000 | 14/3000 | 1339/7000 |