August 2020, 19(8): 3973-4005. doi: 10.3934/cpaa.2020176

Learning rates for the kernel regularized regression with a differentiable strongly convex loss

Department of Applied Statistics, Shaoxing University, Shaoxing, 312000, China

* Corresponding authors

Received June 2019; Revised March 2020; Published May 2020

Fund Project: This work is supported by the National Natural Science Foundation of China under Grants No. 61877039 and No. 11501375, and by the Natural Science Foundation of Zhejiang Province under Grant No. LQ14A010005

We consider the learning rates of kernel regularized regression (KRR) based on reproducing kernel Hilbert spaces (RKHSs) and differentiable strongly convex losses, and provide some new strongly convex losses. We first establish robustness bounds in terms of the maximum mean discrepancy (MMD) and the Hutchinson metric, respectively, and, along this line, bound the learning rate of the KRR. We then derive a capacity-dependent learning rate and specialize it to four concrete strongly convex losses. In particular, we give the learning rates when the logarithmic complexity exponent of the hypothesis RKHS is arbitrarily small as well as sufficiently large.
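
To make the regularized scheme concrete, the following is a minimal Python sketch, not taken from the paper, of kernel regularized regression with a differentiable strongly convex loss over an RKHS. The Gaussian kernel, the squared loss as the strongly convex loss, and the plain gradient-descent solver in the representer form f(x) = sum_j alpha_j K(x, x_j) are illustrative assumptions; the paper's losses and analysis are more general. The last function is the standard biased empirical estimate of the squared MMD, included only to illustrate the discrepancy mentioned above.

```python
# Minimal illustrative sketch (assumptions: Gaussian kernel, squared loss,
# gradient descent on the representer coefficients); not the paper's method.
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def krr_gradient_descent(X, y, lam=0.1, sigma=1.0, lr=0.01, n_iter=500):
    """Minimize (1/m) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2 over
    f = sum_j alpha_j K(., x_j); returns the coefficient vector alpha."""
    m = len(y)
    K = gaussian_kernel(X, X, sigma)
    alpha = np.zeros(m)
    for _ in range(n_iter):
        residual = K @ alpha - y                       # f(x_i) - y_i
        grad = (2.0 / m) * (K @ residual) + 2.0 * lam * (K @ alpha)
        alpha -= lr * grad
    return alpha

def empirical_mmd(X, Z, sigma=1.0):
    """Biased empirical estimate of the squared maximum mean discrepancy
    between the samples X and Z under the Gaussian kernel."""
    Kxx = gaussian_kernel(X, X, sigma).mean()
    Kzz = gaussian_kernel(Z, Z, sigma).mean()
    Kxz = gaussian_kernel(X, Z, sigma).mean()
    return Kxx + Kzz - 2.0 * Kxz

# Toy usage: regress a noisy sine curve and evaluate on a few test points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
alpha = krr_gradient_descent(X, y)
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(gaussian_kernel(X_test, X) @ alpha)
```

Since the squared loss and the RKHS penalty are both convex, and the Gaussian Gram matrix is positive definite for distinct sample points, the objective in alpha has a unique minimizer, so gradient descent with a small step size converges to it.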

Citation: Baohuai Sheng, Huanxiang Liu, Huimin Wang. Learning rates for the kernel regularized regression with a differentiable strongly convex loss. Communications on Pure & Applied Analysis, 2020, 19 (8) : 3973-4005. doi: 10.3934/cpaa.2020176