August  2020, 19(8): 3973-4005. doi: 10.3934/cpaa.2020176

Learning rates for the kernel regularized regression with a differentiable strongly convex loss

Baohuai Sheng, Huanxiang Liu and Huimin Wang
Department of Applied Statistics, Shaoxing University, Shaoxing 312000, China

* Corresponding authors

Received: June 2019. Revised: March 2020. Published: May 2020.

Fund Project: This work is supported by the National Natural Science Foundation of China under Grants No. 61877039 and No. 11501375, and by the Natural Science Foundation of Zhejiang Province under Grant No. LQ14A010005.

We consider learning rates of kernel regularized regression (KRR) based on reproducing kernel Hilbert spaces (RKHSs) and differentiable strongly convex losses, and we provide some new strongly convex losses. We first establish robustness with respect to the maximum mean discrepancy (MMD) and the Hutchinson metric, respectively, and along this line bound the learning rate of KRR. We then provide a capacity-dependent learning rate and derive learning rates for four concrete strongly convex losses. In particular, we obtain the learning rates when the logarithmic complexity exponent of the hypothesis RKHS is arbitrarily small as well as sufficiently large.
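For orientation, the following is a minimal sketch of the setting in standard notation; the paper's own notation, losses, and assumptions may differ. The KRR estimator over an RKHS $\mathcal{H}_K$ with a differentiable strongly convex loss $V$ and regularization parameter $\lambda>0$, for a sample $\mathbf{z}=\{(x_i,y_i)\}_{i=1}^{m}$, is commonly written as
$$
f_{\mathbf{z},\lambda}=\arg\min_{f\in\mathcal{H}_K}\ \frac{1}{m}\sum_{i=1}^{m}V\big(y_i,f(x_i)\big)+\lambda\|f\|_K^2,
$$
and the maximum mean discrepancy between two probability measures $P$ and $Q$, used in the robustness analysis, is
$$
\mathrm{MMD}(P,Q)=\Big\|\,\mathbb{E}_{X\sim P}\big[K(\cdot,X)\big]-\mathbb{E}_{X\sim Q}\big[K(\cdot,X)\big]\,\Big\|_{\mathcal{H}_K}.
$$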

Citation: Baohuai Sheng, Huanxiang Liu, Huimin Wang. Learning rates for the kernel regularized regression with a differentiable strongly convex loss. Communications on Pure and Applied Analysis, 2020, 19 (8) : 3973-4005. doi: 10.3934/cpaa.2020176