August 2020, 19(8): 4159-4177. doi: 10.3934/cpaa.2020186

Kernel-based maximum correntropy criterion with gradient descent method

Ting Hu

School of Mathematics and Statistics, Wuhan University, Wuhan, China

Received: September 2019. Revised: December 2019. Published: May 2020.

Fund Project: The author is supported by NSFC grants 11671307 and 11571078.

In this paper, we study the convergence of the gradient descent method for the maximum correntropy criterion (MCC) associated with reproducing kernel Hilbert spaces (RKHSs). MCC is widely used in real-world applications because of its robustness and its ability to handle non-Gaussian impulse noise. In the regression setting, we show that the gradient descent iterates of MCC can approximate the target function, and we derive a capacity-dependent convergence rate by choosing a suitable number of iterations. Our rate nearly matches the optimal convergence rate established in previous work, and it makes clear that the scaling parameter is crucial to both the approximation ability and the robustness of MCC. The novelty of our analysis lies in a sharp estimate for the norms of the gradient descent iterates and in a projection operation applied to the last iterate.
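The full text specifies the precise loss, step-size schedule, and stopping rule; as a rough, non-authoritative sketch of the method the abstract describes, the snippet below runs gradient descent in an RKHS under the commonly used Gaussian correntropy-induced loss. The kernel choice, the step size eta, the scaling parameter sigma, the iteration count T, and the toy data are all illustrative assumptions, not values from the paper.

```python
import numpy as np

def gaussian_kernel(X1, X2, width=1.0):
    """RBF kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * width^2))."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * width ** 2))

def mcc_kernel_gd(X, y, sigma=1.0, eta=0.5, T=200, kernel_width=1.0):
    """Gradient descent in the RKHS for the correntropy-induced loss
    sigma^2 * (1 - exp(-(y - f(x))^2 / sigma^2)), averaged over the sample.
    The iterate is represented as f_t(x) = sum_i alpha[i] * K(x_i, x)."""
    m = len(y)
    K = gaussian_kernel(X, X, kernel_width)
    alpha = np.zeros(m)
    for _ in range(T):
        residual = y - K @ alpha              # r_i = y_i - f_t(x_i)
        # The exponential factor down-weights large residuals, so outliers
        # barely move the iterate; constants are absorbed into eta.
        weights = np.exp(-residual ** 2 / sigma ** 2)
        alpha += (eta / m) * weights * residual
    return alpha

# Toy usage: sine regression with a few impulse-noise outliers.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 1))
y = np.sin(np.pi * X[:, 0]) + 0.05 * rng.standard_normal(60)
y[:5] += 5.0                                  # heavy non-Gaussian outliers
alpha = mcc_kernel_gd(X, y, sigma=0.5, eta=1.0, T=500, kernel_width=0.3)
```

Note how the scaling parameter sigma mediates the trade-off the abstract points to: as sigma grows, the weights approach 1 and the update tends toward ordinary least-squares gradient descent, while a small sigma suppresses outliers more aggressively at the cost of approximation power.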

Citation: Ting Hu. Kernel-based maximum correntropy criterion with gradient descent method. Communications on Pure & Applied Analysis, 2020, 19 (8) : 4159-4177. doi: 10.3934/cpaa.2020186

