Recent investigations into the error analysis of kernel regularized pairwise learning have initiated theoretical research on pairwise reproducing kernel Hilbert spaces (PRKHSs). In the present paper, we provide a method of constructing PRKHSs from the classical Jacobi orthogonal polynomials, and we investigate the performance of kernel regularized online pairwise regression learning algorithms based on a quadratic loss function. Applying convex analysis and Rademacher complexity techniques, we give explicit bounds for the generalization error, and we show that the convergence rate can be greatly improved by adjusting the scale parameters in the loss function.
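To fix ideas, the following is a minimal sketch, not the paper's exact method: it runs a kernel regularized online pairwise regression update with a plain quadratic loss (the paper's scale parameters are omitted), pairing each incoming sample with its predecessor and storing the iterate as a kernel expansion. The Gaussian product pairwise kernel pair_kernel, the step-size schedule eta0/sqrt(t), and the helper online_pairwise_regression are all illustrative assumptions, not the paper's construction.

    import numpy as np

    def pair_kernel(p, q, gamma=1.0):
        # Pairwise kernel on pairs p = (x, x'), q = (u, u'): a product of two
        # scalar Gaussian kernels (an assumed choice, for illustration only).
        (x, xp), (u, up) = p, q
        return np.exp(-gamma * ((x - u) ** 2 + (xp - up) ** 2))

    def online_pairwise_regression(stream, lam=0.1, eta0=0.5, gamma=1.0):
        # One pass of a regularized online update with quadratic loss,
        #   f_{t+1} = (1 - eta_t*lam) f_t
        #             - eta_t (f_t(x_t, x_{t-1}) - (y_t - y_{t-1})) K((x_t, x_{t-1}), .),
        # keeping f_t as a kernel expansion over the pairs seen so far.
        pairs, coefs = [], []
        prev = None
        for t, (x, y) in enumerate(stream, start=1):
            if prev is not None:
                xp, yp = prev
                p = (x, xp)
                f_val = sum(c * pair_kernel(q, p, gamma) for q, c in zip(pairs, coefs))
                eta = eta0 / np.sqrt(t)                         # assumed decaying step size
                coefs = [(1.0 - eta * lam) * c for c in coefs]  # shrinkage from the regularizer
                pairs.append(p)
                coefs.append(-eta * (f_val - (y - yp)))         # gradient step for the quadratic loss
            prev = (x, y)
        return pairs, coefs

    # Usage: with y = 2x, the target pairwise function is f(x, x') = 2(x - x').
    rng = np.random.default_rng(0)
    stream = [(x, 2.0 * x + 0.01 * rng.standard_normal()) for x in rng.uniform(-1.0, 1.0, 300)]
    pairs, coefs = online_pairwise_regression(stream)
    estimate = sum(c * pair_kernel(q, (0.5, -0.5)) for q, c in zip(pairs, coefs))
    print(estimate)  # close to 2(0.5 - (-0.5)) = 2, up to regularization bias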
[1] S. Agarwal and P. Niyogi, Generalization bounds for ranking algorithms via algorithmic stability, J. Mach. Learn. Res., 10 (2009), 441-474.
[2] R. Aktas, A. Altin and F. Tasdelen, A note on a family of two-variable polynomials, J. Comput. Appl. Math., 235 (2011), 4825-4833. doi: 10.1016/j.cam.2010.11.005.
[3] A. Altin, R. Aktas and E. Erkus-Duman, On a multivariable extension for the extended Jacobi polynomials, J. Math. Anal. Appl., 353 (2009), 121-133. doi: 10.1016/j.jmaa.2008.11.070.
[4] P. L. Bartlett, O. Bousquet and S. Mendelson, Local Rademacher complexities, Ann. Statist., 33 (2005), 1497-1537. doi: 10.1214/009053605000000282.
[5] P. L. Bartlett and S. Mendelson, Rademacher and Gaussian complexities: Risk bounds and structural results, J. Mach. Learn. Res., 3 (2002), 463-482. doi: 10.1162/153244303321897690.
[6] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer-Verlag, New York, 2010. doi: 10.1007/978-1-4419-9467-7.
[7] A. Bezubik, J. Hrivnák and S. Pošta, Two dimensional symmetric and antisymmetric exponential functions and orthogonal polynomials, J. Phys. Conf. Ser., 411 (2013), Art. 012007. doi: 10.1088/1742-6596/411/1/012007.
[8] M. Boissier, S. Lyu, Y. Ying and D. X. Zhou, Fast convergence of online pairwise learning algorithms, in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, (2016), 204-212.
[9] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, 2004. doi: 10.1017/CBO9780511804441.
[10] C. Brunner, A. Fischer, K. Luig and T. Thies, Pairwise support vector machines and their application to large scale problems, J. Mach. Learn. Res., 13 (2012), 2279-2292.
[11] Q. Cao, Z. C. Guo and Y. M. Ying, Generalization bounds for metric and similarity learning, Mach. Learn., 102 (2016), 115-132. doi: 10.1007/s10994-015-5499-7.
[12] H. Chen, Z. B. Pan and L. Q. Li, Learning performance of coefficient-based regularized ranking, Neurocomputing, 133 (2014), 54-62.
[13] H. Chen and J. T. Wu, Regularized ranking with convex losses and $\ell^{1}$ penalty, Abstr. Appl. Anal., (2013), Art. 927827, 8 pp. doi: 10.1155/2013/927827.
[14] R. H. Chen and Z. M. Wu, Solving partial differential equation by using multiquadric quasi-interpolation, Appl. Math. Comput., 186 (2007), 1502-1510. doi: 10.1016/j.amc.2006.07.160.
[15] X. M. Chen and Y. W. Lei, Refined bounds for online pairwise learning algorithms, Neurocomputing, 275 (2018), 2656-2665.
[16] C. S. Ong and A. J. Smola, Hyperkernels, conference paper, 2002. Available from: https://www.researchgate.net/publication/221618045.
[17] C. S. Ong, A. J. Smola and R. C. Williamson, Learning the kernel with hyperkernels, J. Mach. Learn. Res., 6 (2005), 1043-1071.
[18] A. Christmann and D. X. Zhou, On the robustness of regularized pairwise learning methods based on kernels, J. Complexity, 37 (2016), 1-33. doi: 10.1016/j.jco.2016.07.001.
[19] F. Cucker and S. Smale, On the mathematical foundations of learning, Bull. Amer. Math. Soc., 39 (2002), 1-49. doi: 10.1090/S0273-0979-01-00923-5.
[20] F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, New York, 2007. doi: 10.1017/CBO9780511618796.
[21] S. R. Das, Kuhoo, D. Mishra and M. Rout, An optimized feature reduction based currency forecasting model exploring the online sequential extreme learning machine and krill herd strategies, Physica A, 513 (2019), 339-370.
[22] C. F. Dunkl and Y. Xu, Orthogonal Polynomials of Several Variables, Cambridge University Press, 2014. doi: 10.1017/CBO9781107786134.
[23] Q. Fang, M. Xu and Y. M. Ying, Faster convergence of a randomized coordinate descent method for linearly constrained optimization problems, Anal. Appl., 16 (2018), 741-755. doi: 10.1142/S0219530518500082.
[24] L. Fernández, T. E. Pérez and M. A. Piñar, On Koornwinder classical orthogonal polynomials in two variables, J. Comput. Appl. Math., 236 (2012), 3817-3826. doi: 10.1016/j.cam.2011.08.017.
[25] W. W. Gao and Z. G. Wang, A meshless scheme for partial differential equations based on multiquadric trigonometric B-spline quasi-interpolation, Chin. Phys. B, 23 (2014), Art. 110207.
[26] Z. C. Guo, Y. M. Ying and D. X. Zhou, Online regularized learning with pairwise loss functions, Adv. Comput. Math., 43 (2017), 127-150. doi: 10.1007/s10444-016-9479-7.
[27] T. Hu, J. Fan and D. H. Xiang, Convergence analysis of distributed multi-penalty regularized pairwise learning, Anal. Appl., 18 (2020), 109-127. doi: 10.1142/S0219530519410045.
[28] S. Karlin and J. McGregor, Determinants of orthogonal polynomials, Bull. Amer. Math. Soc., 68 (1962), 204-209. doi: 10.1090/S0002-9904-1962-10749-6.
[29] J. Kiefer and J. Wolfowitz, Stochastic estimation of the maximum of a regression function, Ann. Math. Statist., 23 (1952), 462-466. doi: 10.1214/aoms/1177729392.
[30] A. Klimyk and J. Patera, (Anti)Symmetric multivariable exponential functions and corresponding Fourier transforms, J. Phys. A Math. Theor., 40 (2007), 10473-10489. doi: 10.1088/1751-8113/40/34/006.
[31] T. Koornwinder, Two-variable analogues of the classical orthogonal polynomials, in Theory and Application of Special Functions (ed. R. A. Askey), Academic Press, New York, (1975), 435-495.
[32] T. Koornwinder, Yet another proof of the addition formula for Jacobi polynomials, J. Math. Anal. Appl., 61 (1977), 136-141. doi: 10.1016/0022-247X(77)90149-4.
[33] T. Koornwinder, Orthogonal polynomials in two variables which are eigenfunctions of two algebraically independent partial differential operators, I, II, Indag. Math., 36 (1974), 48-66.
[34] H. L. Krall and I. M. Sheffer, Orthogonal polynomials in two variables, Ann. Mat. Pura Appl., 76 (1967), 325-376. doi: 10.1007/BF02412238.
[35] G. Kriukova, S. V. Pereverzyev and P. Tkachenko, On the convergence rate and some applications of regularized ranking algorithms, J. Complexity, 33 (2016), 14-29. doi: 10.1016/j.jco.2015.09.004.
[36] D. W. Lee, Partial differential equations for products of two classical orthogonal polynomials, Bull. Korean Math. Soc., 42 (2005), 179-188. doi: 10.4134/BKMS.2005.42.1.179.
[37] S. B. Lin, J. S. Zeng, J. Fang and Z. B. Xu, Learning rates of $l^q$ coefficient regularization learning with Gaussian kernel, Neural Comput., 26 (2014), 2359-2378. doi: 10.1162/NECO_a_00641.
[38] J. H. Lin and D. X. Zhou, Online learning algorithms can converge comparably fast as batch learning, IEEE Trans. Neural Netw. Learn. Syst., 29 (2017), 2367-2378. doi: 10.1109/TNNLS.2017.2677970.
[39] J. H. Lin, Y. W. Lei, B. Zhang and D. X. Zhou, Online pairwise learning algorithms with convex loss functions, Inform. Sci., 406-407 (2017), 57-70.
[40] H. X. Liu, B. H. Sheng and P. X. Ye, The improved learning rate for regularized regression with RKBSs, Int. J. Mach. Learn. Cybern., 8 (2017), 1235-1245.
[41] P. Malave, Multi-dimensional Orthogonal Polynomials, Ph.D. thesis, Marathwada University, Aurangabad, 1979.
[42] L. Mason, J. Baxter, P. Bartlett and M. Frean, Functional Gradient Techniques for Combining Hypotheses, MIT Press, 1999.
[43] R. Meir and T. Zhang, Generalization error bounds for Bayesian mixture algorithms, J. Mach. Learn. Res., 4 (2003), 839-860. doi: 10.1162/1532443041424300.
[44] G. V. Milovanović, G. Öztürk and R. Aktas, Properties of some of two-variable orthogonal polynomials, Bull. Malays. Math. Sci. Soc., 43 (2020), 1403-1431. doi: 10.1007/s40840-019-00750-8.
[45] T. Pahikkala, M. Viljanen, A. Airola and W. Waegeman, Spectral analysis of symmetric and antisymmetric pairwise kernels, preprint, arXiv: 1506.05950v1, (2015), 19 pp.
[46] N. Ratliff and J. Bagnell, Kernel conjugate gradient for fast kernel machines, in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), (2007), 1017-1021.
[47] W. Rejchel, On ranking and generalization bounds, J. Mach. Learn. Res., 13 (2012), 1373-1392.
[48] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statist., 22 (1951), 400-407. doi: 10.1214/aoms/1177729586.
[49] H. Rosengren, Multivariable Christoffel-Darboux kernels and characteristic polynomials of random Hermitian matrices, Symmetry Integr. Geom. Meth. Appl., 2 (2006), Art. 085, 12 pp. doi: 10.3842/SIGMA.2006.085.
[50] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, 2002.
[51] B. H. Sheng, The weighted norm for some Mercer kernel matrices, Acta Math. Sci., 33A (2013), 6-15 (in Chinese).
[52] B. H. Sheng, L. Q. Duan and P. X. Ye, Strong convex loss can increase the learning rates of online learning, J. Comput., 9 (2014), 1606-1611.
[53] B. H. Sheng and J. L. Wang, On the K-functional in learning theory, Anal. Appl., 18 (2020), 423-446. doi: 10.1142/S0219530519500192.
[54] B. H. Sheng and D. H. Xiang, The performance of semi-supervised Laplacian regularized regression with least square loss, Int. J. Wavelets Multiresolut. Inform. Process., 15 (2017), Art. 1750016, 31 pp. doi: 10.1142/S0219691317500163.
[55] B. H. Sheng and H. Z. Zhang, Performance analysis of the LapRSSLG algorithm in learning theory, Anal. Appl., 18 (2020), 79-108. doi: 10.1142/S0219530519410033.
[56] B. H. Sheng and H. C. Zhu, The convergence rate of semi-supervised regression with quadratic loss, Appl. Math. Comput., 321 (2018), 11-24. doi: 10.1016/j.amc.2017.10.033.
[57] S. Smale and D. X. Zhou, Estimating the approximation error in learning theory, Anal. Appl., 1 (2003), 1-25. doi: 10.1142/S0219530503000089.
[58] S. Smale and Y. Yao, Online learning algorithms, Found. Comput. Math., 6 (2006), 145-170. doi: 10.1007/s10208-004-0160-z.
[59] I. Sprinkhuizen-Kuyper, Orthogonal polynomials in two variables. A further analysis of the polynomials orthogonal over a region bounded by two lines and a parabola, SIAM J. Math. Anal., 7 (1976), 501-518. doi: 10.1137/0507041.
[60] I. Steinwart and C. Scovel, Mercer's theorem on general domains: On the interaction between measures, kernels, and RKHSs, Constr. Approx., 35 (2012), 363-417. doi: 10.1007/s00365-012-9153-3.
[61] H. Sun, Mercer theorem for RKHS on noncompact sets, J. Complexity, 21 (2005), 337-349. doi: 10.1016/j.jco.2004.09.002.
[62] S. H. Wang, Z. L. Chen and B. H. Sheng, The convergence rate of SVM for kernel-based robust regression, Int. J. Wavelets Multiresolut. Inform. Process., 17 (2019), Art. 1950004. doi: 10.1142/S0219691319500048.
[63] C. Wang and T. Hu, Online regularized pairwise learning with least squares loss, Anal. Appl., 18 (2020), 49-78. doi: 10.1142/S0219530519410070.
[64] Y. Wang, R. Khardon, D. Pechyony and R. Jones, Generalization bounds for online learning algorithms with pairwise loss functions, in JMLR: Workshop and Conference Proceedings, Annual Conference on Learning Theory, 23 (2012), 13.1-13.22.
[65] Q. Wu, Y. M. Ying and D. X. Zhou, Multi-kernel regularized classifiers, J. Complexity, 23 (2007), 108-134. doi: 10.1016/j.jco.2006.06.007.
[66] Z. M. Wu, Dynamical knot and shape parameter setting for simulating shock wave by using multi-quadric quasi-interpolation, Eng. Anal. Bound. Elem., 29 (2005), 354-358.
[67] Z. M. Wu and R. Schaback, Shape preserving properties and convergence of univariate multiquadric quasi-interpolation, Acta Math. Appl. Sin., 10 (1994), 441-446. doi: 10.1007/BF02016334.
[68] D. H. Xiang, Conditional quantiles with varying Gaussians, Adv. Comput. Math., 38 (2013), 723-735. doi: 10.1007/s10444-011-9257-5.
[69] G. B. Ye and D. X. Zhou, Fully online classification by regularization, Appl. Comput. Harmon. Anal., 23 (2007), 198-214. doi: 10.1016/j.acha.2006.12.001.
[70] Y. M. Ying and D. X. Zhou, Learnability of Gaussians with flexible variances, J. Mach. Learn. Res., 8 (2007), 249-276.
[71] Y. M. Ying and D. X. Zhou, Online regularized classification algorithms, IEEE Trans. Inform. Theory, 52 (2006), 4775-4788. doi: 10.1109/TIT.2006.883632.
[72] Y. M. Ying and D. X. Zhou, Online pairwise learning algorithms, Neural Comput., 28 (2016), 743-777. doi: 10.1162/neco_a_00817.
[73] Y. M. Ying and D. X. Zhou, Unregularized online learning algorithms with general loss functions, Appl. Comput. Harmon. Anal., 42 (2017), 224-244. doi: 10.1016/j.acha.2015.08.007.
[74] Y. X. Zeng and D. Klabjian, Online adaptive machine learning based algorithm for implied volatility surface modeling, Knowledge-Based Syst., 163 (2019), 376-391.
[75] L. L. Zhang and B. H. Sheng, Online regularized generalized gradient classification algorithms, Anal. Theory Appl., 26 (2010), 278-300. doi: 10.1007/s10496-010-0278-6.
[76] Y. L. Zhao, J. Fan and L. Shi, Learning rates for regularized least squares ranking algorithm, Anal. Appl., 15 (2017), 815-836. doi: 10.1142/S0219530517500063.