Communications on Pure & Applied Analysis, August 2020, 19(8): 4069-4083. doi: 10.3934/cpaa.2020180

Quantitative convergence analysis of kernel based large-margin unified machines

1. Department of Mathematics, Hong Kong Baptist University, Kowloon, Hong Kong, China
2. Department of Mathematics, Zhejiang Normal University, Jinhua, Zhejiang 321004, China

* Corresponding author

Received: August 2019. Revised: September 2019. Published: May 2020.

Fund Project: The work of J. Fan is partially supported by the Hong Kong RGC ECS grant 22303518, the HKBU FRG grant FRG2/17-18/091, and the National Natural Science Foundation of China (No. 11801478). The work of D. H. Xiang is supported by the National Natural Science Foundation of China under Grants 11871438 and 11771120.

High-dimensional binary classification has been intensively studied in the machine learning community over the last few decades. The support vector machine (SVM), one of the most popular classifiers, depends only on a portion of the training samples, called support vectors, which leads to suboptimal performance in the high-dimension, low-sample-size (HDLSS) setting. Large-margin unified machines (LUMs) are a family of margin-based classifiers proposed to resolve the "data piling" problem inherent in SVM under HDLSS settings. In this paper we study the binary classification algorithms associated with LUM loss functions in the framework of reproducing kernel Hilbert spaces. A quantitative convergence analysis is carried out for these algorithms by means of a novel application of projection operators that overcomes the main technical difficulty. The rates are derived explicitly under a priori conditions on the approximation ability and capacity of the reproducing kernel Hilbert space.
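For orientation, the objects described above can be written out explicitly; the following is a sketch in the standard notation of this literature, based on the LUM family of Liu, Zhang and Wu (2011) and the usual kernel regularization scheme, so the exact parametrization used in the paper may differ. The LUM loss with index parameters $a>0$ and $c\ge 0$ is

\[
\phi_{a,c}(u)=
\begin{cases}
1-u, & u<\dfrac{c}{1+c},\\[1.5ex]
\dfrac{1}{1+c}\left(\dfrac{a}{(1+c)u-c+a}\right)^{a}, & u\ge\dfrac{c}{1+c},
\end{cases}
\]

which, roughly speaking, interpolates between a DWD-type loss ($c=0$) and the SVM hinge loss ($c\to\infty$). Given a sample $\mathbf z=\{(x_i,y_i)\}_{i=1}^m\subset X\times\{-1,1\}$ and a reproducing kernel Hilbert space $(\mathcal H_K,\|\cdot\|_K)$ induced by a Mercer kernel $K$, the kernel-based LUM classifier is $\mathrm{sgn}(f_{\mathbf z,\lambda})$ with

\[
f_{\mathbf z,\lambda}=\arg\min_{f\in\mathcal H_K}\left\{\frac1m\sum_{i=1}^m\phi_{a,c}\bigl(y_if(x_i)\bigr)+\lambda\|f\|_K^2\right\},\qquad\lambda>0.
\]

The projection operator referred to in the abstract is, in this literature, typically the truncation

\[
\pi(f)(x)=
\begin{cases}
1, & f(x)>1,\\
f(x), & -1\le f(x)\le 1,\\
-1, & f(x)<-1,
\end{cases}
\]

which leaves the sign of $f$, and hence its misclassification risk, unchanged while keeping the loss term bounded. Convergence rates for the excess risk $\mathcal R(\mathrm{sgn}(f_{\mathbf z,\lambda}))-\mathcal R(f_c)$, with $f_c$ the Bayes rule, are then derived under a priori approximation and capacity (e.g. covering number) conditions on $\mathcal H_K$.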

Citation: Jun Fan, Dao-Hong Xiang. Quantitative convergence analysis of kernel based large-margin unified machines. Communications on Pure & Applied Analysis, 2020, 19 (8) : 4069-4083. doi: 10.3934/cpaa.2020180