Communications on Pure & Applied Analysis, August 2020, 19(8): 4069-4083. doi: 10.3934/cpaa.2020180

Quantitative convergence analysis of kernel based large-margin unified machines

1. Department of Mathematics, Hong Kong Baptist University, Kowloon, Hong Kong, China
2. Department of Mathematics, Zhejiang Normal University, Jinhua, Zhejiang 321004, China

* Corresponding author

Received August 2019; Revised September 2019; Published May 2020.

Fund Project: The work by J. Fan is partially supported by the Hong Kong RGC ECS grant 22303518, the HKBU FRG grant FRG2/17-18/091 and the National Natural Science Foundation of China (No. 11801478). The work by D. H. Xiang is supported by the National Natural Science Foundation of China under Grants 11871438 and 11771120.

High-dimensional binary classification has been studied intensively in the machine learning community over the last few decades. The support vector machine (SVM), one of the most popular classifiers, depends on only a portion of the training samples, called support vectors, which leads to suboptimal performance in the high-dimension, low-sample-size (HDLSS) setting. Large-margin unified machines (LUMs) are a family of margin-based classifiers proposed to solve the so-called "data piling" problem that is inherent to SVM under HDLSS settings. In this paper we study the binary classification algorithms associated with LUM loss functions in the framework of reproducing kernel Hilbert spaces. A quantitative convergence analysis is carried out for these algorithms by means of a novel application of projection operators that overcomes the key technical difficulty. The learning rates are derived explicitly under a priori conditions on the approximation ability and capacity of the reproducing kernel Hilbert space.
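For concreteness, the LUM family referred to in the abstract was introduced by Liu, Zhang and Wu [14]. It is a two-parameter family of convex margin-based losses that interpolates between soft classifiers and the SVM hinge loss:

```latex
% LUM loss on the functional margin u = y f(x), as in [14]:
% a > 0 controls the tail decay; c >= 0 moves from soft (c = 0)
% towards hard (c -> infinity) classification.
V(u) =
\begin{cases}
  1 - u, & u < \dfrac{c}{1+c}, \\[2mm]
  \dfrac{1}{1+c}\left(\dfrac{a}{(1+c)u - c + a}\right)^{a}, & u \ge \dfrac{c}{1+c}.
\end{cases}
```

The two branches meet with common value and slope at u = c/(1+c), so V is convex and continuously differentiable. A kernel-based LUM classifier of the kind studied here can be expected to take the usual Tikhonov form f_z = arg min_{f in H_K} (1/m) sum_{i=1}^m V(y_i f(x_i)) + lambda ||f||_K^2. The following is a minimal sketch of that scheme under these assumptions, parametrizing f = sum_i alpha_i K(., x_i) via the representer theorem; the names lum_loss, lum_grad and fit_kernel_lum are illustrative, not from the paper:

```python
import numpy as np

def lum_loss(u, a=1.0, c=1.0):
    """LUM loss V(u) of Liu, Zhang and Wu [14] on the margin u = y*f(x)."""
    thresh = c / (1.0 + c)
    # The tail branch is only selected for u >= thresh, where
    # (1+c)*u - c + a >= a > 0; clipping just avoids warnings elsewhere.
    denom = np.maximum((1.0 + c) * u - c + a, 1e-12)
    tail = (a / denom) ** a / (1.0 + c)
    return np.where(u < thresh, 1.0 - u, tail)

def lum_grad(u, a=1.0, c=1.0):
    """Derivative V'(u); both branches meet with slope -1 at the threshold."""
    thresh = c / (1.0 + c)
    denom = np.maximum((1.0 + c) * u - c + a, 1e-12)
    return np.where(u < thresh, -1.0,
                    -(a ** (a + 1.0)) * denom ** (-(a + 1.0)))

def fit_kernel_lum(K, y, lam=0.1, a=1.0, c=1.0, lr=0.01, n_iter=2000):
    """Gradient descent on representer coefficients alpha of
    f = sum_i alpha_i K(., x_i), minimizing
    (1/m) sum_i V(y_i f(x_i)) + lam * alpha^T K alpha."""
    m = K.shape[0]
    alpha = np.zeros(m)
    for _ in range(n_iter):
        margins = y * (K @ alpha)  # y_i f(x_i)
        # Objective gradient: (1/m) K (y * V'(y f)) + 2 lam K alpha (K symmetric).
        grad = K @ (y * lum_grad(margins, a, c)) / m + 2.0 * lam * (K @ alpha)
        alpha -= lr * grad
    return alpha
```

Setting a = 1, c = 1 reproduces the DWD loss (V(u) = 1 - u for u <= 1/2 and 1/(4u) otherwise), while letting c tend to infinity recovers the SVM hinge loss; this is the sense in which LUMs "unify" hard and soft classification.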

Citation: Jun Fan, Dao-Hong Xiang. Quantitative convergence analysis of kernel based large-margin unified machines. Communications on Pure & Applied Analysis, 2020, 19 (8) : 4069-4083. doi: 10.3934/cpaa.2020180
References:
[1] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc., 68 (1950), 337-404. doi: 10.2307/1990404.
[2] P. L. Bartlett, The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network, IEEE Trans. Inform. Theory, 44 (1998), 525-536. doi: 10.1109/18.661502.
[3] P. L. Bartlett, M. I. Jordan and J. D. McAuliffe, Convexity, classification, and risk bounds, J. Amer. Statist. Assoc., 101 (2006), 138-156. doi: 10.1198/016214505000000907.
[4] B. E. Boser, I. Guyon and V. Vapnik, A training algorithm for optimal margin classifiers, in Computational Learning Theory, Madison, WI, ACM, (1992), 144–152.
[5] D. R. Chen, Q. Wu, Y. Ying and D. X. Zhou, Support vector machine soft margin classifiers: error analysis, J. Mach. Learn. Res., 5 (2004), 1143-1175.
[6] C. Cortes and V. Vapnik, Support-vector networks, Mach. Learn., 20 (1995), 273-297.
[7] F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, Cambridge, 2007. doi: 10.1017/CBO9780511618796.
[8] D. E. Edmunds and H. Triebel, Function Spaces, Entropy Numbers, Differential Operators, Cambridge University Press, Cambridge, 1996. doi: 10.1017/CBO9780511662201.
[9] J. Fan and D. H. Xiang, Comparison theorems on large margin learning, Technical report, 2019.
[10] Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci., 55 (1997), 119-139. doi: 10.1006/jcss.1997.1504.
[11] J. Friedman, T. Hastie and R. Tibshirani, Additive logistic regression: a statistical view of boosting (with discussion), Ann. Statist., 28 (2000), 337-407. doi: 10.1214/aos/1016218223.
[12] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, New York, 2001. doi: 10.1007/978-0-387-21606-5.
[13] Y. W. Lei, U. Dogan, D. X. Zhou and M. Kloft, Data-dependent generalization bounds for multi-class classification, IEEE Trans. Inform. Theory, 65 (2019), 2995-3021. doi: 10.1109/TIT.2019.2893916.
[14] Y. F. Liu, H. H. Zhang and Y. C. Wu, Hard or soft classification? Large-margin unified machines, J. Amer. Statist. Assoc., 106 (2011), 166-177. doi: 10.1198/jasa.2011.tm10319.
[15] J. S. Marron, M. Todd and J. Ahn, Distance weighted discrimination, J. Amer. Statist. Assoc., 102 (2007), 1267-1271. doi: 10.1198/016214507000001120.
[16] S. Smale and D. X. Zhou, Online learning with Markov sampling, Anal. Appl., 7 (2009), 87-113. doi: 10.1142/S0219530509001293.
[17] S. Smale and D. X. Zhou, Learning theory estimates via integral operators and their approximations, Constr. Approx., 26 (2007), 153-172. doi: 10.1007/s00365-006-0659-y.
[18] I. Steinwart, On the influence of the kernel on the consistency of support vector machines, J. Mach. Learn. Res., 2 (2001), 67-93. doi: 10.1162/153244302760185252.
[19] I. Steinwart and A. Christmann, Estimating conditional quantiles with the help of the pinball loss, Bernoulli, 17 (2011), 211-225. doi: 10.3150/10-BEJ267.
[20] I. Steinwart and C. Scovel, Fast rates for support vector machines using Gaussian kernels, Ann. Statist., 35 (2007), 575-607. doi: 10.1214/009053606000001226.
[21] A. B. Tsybakov, Optimal aggregation of classifiers in statistical learning, Ann. Statist., 32 (2004), 135-166. doi: 10.1214/aos/1079120131.
[22] G. Wahba, Spline Models for Observational Data, SIAM, 1990. doi: 10.1137/1.9781611970128.
[23] C. Wang and T. Hu, Online minimum error entropy algorithm with unbounded sampling, Anal. Appl., 17 (2019), 293-322. doi: 10.1142/S0219530518500148.
[24] B. X. Wang and H. Zou, Another look at distance-weighted discrimination, J. R. Statist. Soc. B, 80 (2018), 177-198. doi: 10.1111/rssb.12244.
[25] R. Williamson, A. J. Smola and B. Schölkopf, Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators, IEEE Trans. Inform. Theory, 47 (2001), 2516-2532. doi: 10.1109/18.945262.
[26] Q. Wu, Classification and Regularization in Learning Theory, VDM Verlag, 2009.
[27] Q. Wu, Y. M. Ying and D. X. Zhou, Multi-kernel regularized classifiers, J. Complexity, 23 (2007), 108-134. doi: 10.1016/j.jco.2006.06.007.
[28] Q. Wu, Y. Ying and D. X. Zhou, Learning rates of least-square regularized regression, Found. Comput. Math., 6 (2006), 171-192. doi: 10.1007/s10208-004-0155-9.
[29] Q. Wu and D. X. Zhou, SVM soft margin classifiers: linear programming versus quadratic programming, Neural Comput., 17 (2005), 1160-1187. doi: 10.1162/0899766053491896.
[30] Q. Wu and D. X. Zhou, Analysis of support vector machine classification, J. Comput. Anal. Appl., 8 (2006), 99-119.
[31] D. H. Xiang, Classification with Gaussians and convex loss II: improving error bounds by noise conditions, Sci. China Math., 54 (2011), 165-171. doi: 10.1007/s11425-010-4043-2.
[32] D. H. Xiang, Logistic classification with varying Gaussians, Comput. Math. Appl., 61 (2011), 397-407. doi: 10.1016/j.camwa.2010.11.016.
[33] D. H. Xiang, Conditional quantiles with varying Gaussians, Adv. Comput. Math., 38 (2013), 723-735. doi: 10.1007/s10444-011-9257-5.
[34] D. H. Xiang and D. X. Zhou, Classification with Gaussians and convex loss, J. Mach. Learn. Res., 10 (2009), 1447-1468.
[35] D. H. Xiang, T. Hu and D. X. Zhou, Approximation analysis of learning algorithms for support vector regression and quantile regression, J. Appl. Math., 2012 (2012), 17 pp. doi: 10.1155/2012/902139.
[36] Y. Ying and D. X. Zhou, Learnability of Gaussians with flexible variances, J. Mach. Learn. Res., 8 (2007), 249-276.
[37] T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization, Ann. Statist., 32 (2004), 56-85. doi: 10.1214/aos/1079120130.
[38] Y. L. Zhao, J. Fan and L. Shi, Learning rates for regularized least squares ranking algorithm, Anal. Appl., 15 (2017), 815-836. doi: 10.1142/S0219530517500063.
[39] D. X. Zhou, The covering number in learning theory, J. Complexity, 18 (2002), 739-767. doi: 10.1006/jcom.2002.0635.
[40] D. X. Zhou, Capacity of reproducing kernel spaces in learning theory, IEEE Trans. Inform. Theory, 49 (2003), 1743-1752. doi: 10.1109/TIT.2003.813564.

