# American Institute of Mathematical Sciences

November 2020, 3(4): 279-300. doi: 10.3934/mfc.2020018

## Support vector machine classifiers by non-Euclidean margins

Ying Lin and Qi Ye *

School of Mathematical Sciences, South China Normal University, Guangzhou, Guangdong 510631, China

* Corresponding author: Qi Ye

Received January 2020; Revised May 2020; Published June 2020

Fund Project: The first author is supported by the Innovation Project of Graduate School of South China Normal University. The second author is supported by the Natural Science Foundation of Guangdong Province (2019A1515011995, 2020B1515310013) and the National Natural Science Foundation of China (11931003).

In this article, the classical support vector machine (SVM) classifiers are generalized to non-Euclidean margins. We first extend the linear models of the SVM classifiers to non-Euclidean margins, including the theorems and algorithms for both hard margins and soft margins. In particular, the SVM classifiers with $\infty$-norm margins can be solved by 1-norm optimization, which yields sparsity. Next, we show that the non-linear models of the SVM classifiers with $q$-norm margins can be equivalently transformed into SVMs in the $p$-norm reproducing kernel Banach spaces given by the hinge loss, where $1/p+1/q = 1$. Finally, we present numerical examples on artificial data and real data to compare the different algorithms for the SVM classifiers with the $\infty$-norm margin.
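To make the link between the $\infty$-norm margin and 1-norm optimization concrete, the following is a minimal sketch of a soft-margin linear SVM with a 1-norm penalty written as a linear program. It assumes one common form of the objective, $\lambda\|\boldsymbol w\|_1 + \frac{1}{n}\sum_i \xi_i$; the scaling in the paper's own formulation may differ. The function name `fit_l1_svm`, the toy data, and the use of `scipy.optimize.linprog` are illustrative choices, not taken from the article.

```python
import numpy as np
from scipy.optimize import linprog


def fit_l1_svm(X, y, lam=0.1):
    """Soft-margin linear SVM with a 1-norm penalty, solved as a linear program.

    Minimizes  lam * ||w||_1 + (1/n) * sum_i xi_i
    subject to y_i * (w . x_i + b) >= 1 - xi_i  and  xi_i >= 0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # Decision variables: [u (d), v (d), b (1), xi (n)] with w = u - v and u, v >= 0,
    # so that ||w||_1 = sum(u) + sum(v) at the optimum.
    cost = np.concatenate([lam * np.ones(2 * d), [0.0], np.ones(n) / n])
    yx = y[:, None] * X
    # Rewrite y_i (x_i . (u - v) + b) + xi_i >= 1 as A_ub @ z <= b_ub.
    A_ub = np.hstack([-yx, yx, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:d] - res.x[d:2 * d]
    b = res.x[2 * d]
    return w, b


# Toy data (made up for illustration): the label depends on the first feature only.
X_toy = np.array([[2.0, 0.1], [3.0, -0.2], [2.5, 0.3],
                  [-2.0, 0.2], [-3.0, -0.1], [-2.5, -0.3]])
y_toy = np.array([1, 1, 1, -1, -1, -1])
w_toy, b_toy = fit_l1_svm(X_toy, y_toy, lam=0.1)
pred_toy = np.sign(X_toy @ w_toy + b_toy)
print("w =", w_toy, "b =", b_toy)
print("training errors:", int(np.sum(pred_toy != y_toy)))
```

Splitting $\boldsymbol w$ into nonnegative parts keeps the problem linear, and linear programs of this kind tend to have solutions with many zero coordinates, which is the sparsity effect the abstract refers to.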

Citation: Ying Lin, Qi Ye. Support vector machine classifiers by non-Euclidean margins. Mathematical Foundations of Computing, 2020, 3 (4) : 279-300. doi: 10.3934/mfc.2020018

Examples of Euclidean and non-Euclidean margins. The black line is the decision boundary, and the margin is the width of the gap between the two red dashed lines. The two margins lead to different classifiers

The difference between hard-margin and soft-margin solutions

The geometrical interpretation of the distance from a point to a plane

Maximal margin classifiers by the 2-norm and the $\infty$-norm in the case of 2 distinct points. The two norms differ in their ability to produce sparse solutions

The geometric explanation of infinitely many solutions for the SVM classifier by the $\infty$-norm margin

The geometric explanation of a unique solution for the SVM classifier by the $\infty$-norm margin

The geometrical interpretation of the kernel trick. In the original space, the data set is not linearly separable (see Figure 6a). After the feature mapping given by the kernel function, the data set is linearly separable in a higher-dimensional space (see Figure 6b). Finally, the decision boundary in the higher-dimensional space is projected back into the original space, where it becomes a non-linear boundary

Noiseless and noisy data sets. The classes overlap in the noisy data set

Three classifiers obtained by three different optimizations when $C = 0.5$, $\lambda = 2$

Two classifiers obtained by two different optimizations when $C = 0.1$, $\lambda = 10$

The comparison between handwritten digits 6 and 9. The first row shows 6 and the second row shows 9

The comparison between handwritten letters O and Q. The first row shows O and the second row shows Q
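The point-to-plane distance that underlies the margin figures can be computed with the dual norm: the $q$-norm distance from a point $\boldsymbol x$ to the hyperplane $\{\boldsymbol z : \boldsymbol w \cdot \boldsymbol z + b = 0\}$ equals $|\boldsymbol w \cdot \boldsymbol x + b| / \|\boldsymbol w\|_p$ with $1/p+1/q = 1$. The sketch below illustrates this standard identity; the function names and the example numbers are hypothetical and not taken from the article.

```python
import numpy as np


def conjugate_exponent(q):
    """Return p with 1/p + 1/q = 1, using the convention that 1 pairs with infinity."""
    if q == 1:
        return np.inf
    if np.isinf(q):
        return 1.0
    return q / (q - 1.0)


def distance_to_hyperplane(w, b, x, q):
    """q-norm distance from x to {z : w . z + b = 0}, via the dual p-norm of w."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    p = conjugate_exponent(q)
    return abs(w @ x + b) / np.linalg.norm(w, ord=p)


w_ex, b_ex, x_ex = np.array([3.0, 4.0]), -5.0, np.array([3.0, 4.0])
for q in (1, 2, np.inf):
    print(q, distance_to_hyperplane(w_ex, b_ex, x_ex, q))

# Sanity check for q = infinity: moving every coordinate by t against sign(w)
# lands on the hyperplane, and the infinity-norm length of that move is |t|.
t = (w_ex @ x_ex + b_ex) / np.linalg.norm(w_ex, ord=1)
closest = x_ex - t * np.sign(w_ex)
print(w_ex @ closest + b_ex)
```

For $q = \infty$ the denominator is $\|\boldsymbol w\|_1$, so maximizing the $\infty$-norm margin amounts to minimizing the 1-norm of $\boldsymbol w$.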
The results of experiments on MNIST. The row "Linear SVM classifier by ∞-norm margin" is the result obtained by (9) with $\lambda = 0.1$. The row "Kernel SVM classifier by ∞-norm margin" is the result obtained by (15) with $\lambda = 0.1$. The kernel function used here is the Gaussian kernel $K(\boldsymbol x, \boldsymbol x') = \exp\left(-\frac{\|\boldsymbol x- \boldsymbol x'\|_{2}^{2}}{\sigma^2}\right)$ with $\sigma = 9.6$

| MNIST | Training Errors | Test Errors | Sparsity |
|---|---|---|---|
| Linear SVM classifier by ∞-norm margin | 0/9939 | 5/1967 | 654/785 |
| Kernel SVM classifier by ∞-norm margin | 0/9939 | 4/1967 | 1760/9939 |

The results of experiments on Handwritten Alphabets Database. The row "Linear SVM classifier by ∞-norm margin" is the result obtained by (9) with $\lambda = 0.001$. The row "Kernel SVM classifier by ∞-norm margin" is the result obtained by (15) with $\lambda = 0.001$. The kernel function used here is the Gaussian kernel $K(\boldsymbol x, \boldsymbol x') = \exp\left(-\frac{\|\boldsymbol x- \boldsymbol x'\|_{2}^{2}}{\sigma^2}\right)$ with $\sigma = 2600$

| Handwritten Alphabets | Training Errors | Test Errors | Sparsity |
|---|---|---|---|
| Linear SVM classifier by ∞-norm margin | 0/7000 | 69/3000 | 401/785 |
| Kernel SVM classifier by ∞-norm margin | 0/7000 | 14/3000 | 1339/7000 |
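The Gaussian kernel stated in the table captions can be evaluated for whole data sets at once. The sketch below is one possible NumPy implementation, assuming the convention from the captions in which the squared distance is divided by $\sigma^2$ (equivalent to $\gamma = 1/\sigma^2$ in the $\exp(-\gamma\|\cdot\|^2)$ parametrization). The function name `gaussian_gram` and the two sample points are illustrative only.

```python
import numpy as np


def gaussian_gram(X, Z, sigma):
    """Gram matrix K[i, j] = exp(-||X[i] - Z[j]||_2^2 / sigma^2)."""
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    # Pairwise squared distances via ||x||^2 + ||z||^2 - 2 x . z.
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    # Clip tiny negative values caused by floating-point rounding.
    np.maximum(sq_dist, 0.0, out=sq_dist)
    return np.exp(-sq_dist / sigma ** 2)


pts = np.array([[0.0, 0.0], [3.0, 4.0]])
K = gaussian_gram(pts, pts, sigma=5.0)
print(K)
```

With the two sample points at distance 5 and $\sigma = 5$, the off-diagonal entry is $\exp(-1)$ and the diagonal entries are 1.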