August 2020, 19(8): 4159-4177. doi: 10.3934/cpaa.2020186

Kernel-based maximum correntropy criterion with gradient descent method

Ting Hu

School of Mathematics and Statistics, Wuhan University, Wuhan, China

Received September 2019; Revised December 2019; Published May 2020

Fund Project: The author is supported by NSFC grants 11671307 and 11571078.

In this paper, we study the convergence of the gradient descent method for the maximum correntropy criterion (MCC) associated with reproducing kernel Hilbert spaces (RKHSs). MCC is widely used in real-world applications because of its robustness and its ability to handle non-Gaussian impulsive noise. In the regression setting, we show that the gradient descent iterates of MCC can approximate the target function, and we derive a capacity-dependent convergence rate under a suitable choice of the number of iterations. Our result nearly matches the optimal convergence rate established in previous work and shows that the scaling parameter is crucial to both the approximation ability and the robustness of MCC. The novelty of our work lies in a sharp estimate for the norms of the gradient descent iterates and in a projection operation applied to the last iterate.
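To make the setting concrete, the following is a minimal numerical sketch (an illustration under stated assumptions, not the paper's exact algorithm or constants) of gradient descent in a Gaussian RKHS for the correntropy-induced loss l_sigma(r) = sigma^2 * (1 - exp(-r^2 / sigma^2)). Because each gradient step lies in the span of the kernel sections K(x_i, .), it suffices to update a coefficient vector; the kernel bandwidth, step size, and iteration count below are placeholder choices.

    import numpy as np

    def gaussian_kernel(X1, X2, bandwidth=1.0):
        """Gaussian kernel matrix K[i, j] = exp(-|x_i - x_j|^2 / (2 h^2))."""
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))

    def mcc_gradient_descent(X, y, sigma=1.0, step=0.5, n_iters=200, bandwidth=1.0):
        """Gradient descent for MCC regression in an RKHS (illustrative sketch).

        Returns coefficients alpha with f(x) = sum_i alpha_i K(x_i, x).
        Differentiating l_sigma at residual r gives 2 r exp(-r^2 / sigma^2),
        so the RKHS gradient step only reweights and adds the residuals.
        """
        m = len(y)
        K = gaussian_kernel(X, X, bandwidth)
        alpha = np.zeros(m)
        for _ in range(n_iters):
            r = y - K @ alpha                  # residuals y_i - f_t(x_i)
            w = np.exp(-r ** 2 / sigma ** 2)   # correntropy weights
            alpha += (2.0 * step / m) * w * r  # gradient step on coefficients
        return alpha

    # Toy usage: robust fit under impulsive (non-Gaussian) noise.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(60, 1))
    y = np.sin(np.pi * X[:, 0]) + 0.05 * rng.standard_normal(60)
    y[::15] += 5.0                             # a few large outliers
    alpha = mcc_gradient_descent(X, y, sigma=0.5, step=0.5, n_iters=300)

Shrinking sigma downweights large residuals (robustness against impulsive noise), while letting sigma grow recovers ordinary least-squares behaviour; this is one way to read the abstract's remark that the scaling parameter governs both approximation ability and robustness.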

Citation: Ting Hu. Kernel-based maximum correntropy criterion with gradient descent method. Communications on Pure & Applied Analysis, 2020, 19(8): 4159-4177. doi: 10.3934/cpaa.2020186

