August 2020, 19(8): 4159-4177. doi: 10.3934/cpaa.2020186

Kernel-based maximum correntropy criterion with gradient descent method

Ting Hu

School of Mathematics and Statistics, Wuhan University, Wuhan, China

Received: September 2019. Revised: December 2019. Published: May 2020.

Fund Project: The author is supported by NSFC grants 11671307 and 11571078.

In this paper, we study the convergence of the gradient descent method for the maximum correntropy criterion (MCC) associated with reproducing kernel Hilbert spaces (RKHSs). MCC is widely used in real-world applications because of its robustness and its ability to handle non-Gaussian impulsive noise. In the regression setting, we show that the gradient descent iterates of MCC approximate the target function, and we derive a capacity-dependent convergence rate by choosing a suitable number of iterations. Our rate nearly matches the optimal rate established in previous work, and it shows that the scaling parameter is crucial to MCC's approximation ability and robustness. The novelty of our analysis lies in a sharp estimate of the norms of the gradient descent iterates, together with a projection operation applied to the last iterate.
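To make the iteration concrete, here is a minimal Python sketch (not taken from the paper) of kernel gradient descent under MCC. It assumes the standard correntropy-induced loss l_sigma(r) = sigma^2 * (1 - exp(-r^2 / sigma^2)) with scaling parameter sigma, a Gaussian kernel, and a constant step size; all function names and parameter values are illustrative, and the projection applied to the last iterate in the paper is omitted.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mcc_gradient_descent(X, y, sigma=1.0, step=0.5, n_iter=200, bandwidth=1.0):
    """Gradient descent for the empirical MCC risk
        (1/n) * sum_i sigma^2 * (1 - exp(-(y_i - f(x_i))^2 / sigma^2))
    over the RKHS of the Gaussian kernel.  The iterate f_t is stored via
    its coefficients: f_t(x) = sum_j alpha[j] * K(x_j, x)."""
    n = len(y)
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.zeros(n)                      # f_0 = 0
    for _ in range(n_iter):
        residual = y - K @ alpha             # r_i = y_i - f_t(x_i)
        # The RKHS gradient of the risk is -(2/n) sum_i w_i K(x_i, .),
        # with robust weights w_i = r_i * exp(-r_i^2 / sigma^2).
        weights = residual * np.exp(-(residual ** 2) / sigma ** 2)
        alpha += step * (2.0 / n) * weights  # f_{t+1} = f_t - step * grad
    return alpha

# Usage on synthetic data with sparse impulsive outliers.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(100)
y[::10] += 5.0                               # heavy-tailed corruption
alpha = mcc_gradient_descent(X, y, sigma=1.0)
fitted = gaussian_kernel(X, X) @ alpha       # f_T evaluated at the inputs
```

The weights r_i * exp(-r_i^2 / sigma^2) shrink toward zero for large residuals, which is the source of MCC's robustness to impulsive noise; as sigma grows they tend to the plain residuals and the update reduces to least-squares kernel gradient descent, illustrating why the scaling parameter governs the trade-off between robustness and approximation ability.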

Citation: Ting Hu. Kernel-based maximum correntropy criterion with gradient descent method. Communications on Pure & Applied Analysis, 2020, 19(8): 4159-4177. doi: 10.3934/cpaa.2020186
References:
[1] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc., 68 (1950), 337-404. doi: 10.2307/1990404.
[2] R. J. Bessa, V. Miranda and J. Gama, Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting, IEEE Trans. Power Syst., 24 (2009), 1657-1666. doi: 10.1109/TPWRS.2009.2030291.
[3] D. R. Chen, Q. Wu, Y. Ying and D. X. Zhou, Support vector machine soft margin classifiers: Error analysis, J. Mach. Learn. Res., 5 (2004), 1143-1175.
[4] M. Debruyne, A. Christmann, M. Hubert and J. A. K. Suykens, Robustness of reweighted least squares kernel based regression, J. Multivariate Anal., 101 (2010), 447-463. doi: 10.1016/j.jmva.2009.09.007.
[5] Y. Feng, J. Fan and J. A. K. Suykens, A statistical learning approach to modal regression, J. Mach. Learn. Res., 21 (2020), 1-35.
[6] Y. Feng, X. Huang, S. Lei, Y. Yang and J. A. K. Suykens, Learning with the maximum correntropy criterion induced losses for regression, J. Mach. Learn. Res., 16 (2015), 993-1034.
[7] Y. Feng and Y. Ying, Learning with correntropy-induced losses for regression with mixture of symmetric stable noise, Appl. Comput. Harmon. Anal., 48 (2020), 795-810. doi: 10.1016/j.acha.2019.09.001.
[8] Z. C. Guo, T. Hu and L. Shi, Gradient descent for robust kernel-based regression, Inverse Probl., 34 (2018), Art. 065009. doi: 10.1088/1361-6420/aabe55.
[9] R. He, W. S. Zheng and B. G. Hu, Maximum correntropy criterion for robust face recognition, IEEE Trans. Pattern Anal. Mach. Intell., 33 (2011), 1561-1576. doi: 10.1109/TPAMI.2010.220.
[10] R. He, W. S. Zheng, B. G. Hu and X. W. Kong, A regularized correntropy framework for robust pattern recognition, Neural Comput., 23 (2011), 2074-2100. doi: 10.1162/NECO_a_00155.
[11] P. W. Holland and R. E. Welsch, Robust regression using iteratively reweighted least-squares, Commun. Statist., 6 (1977), 813-827.
[12] T. Hu, Q. Wu and D. X. Zhou, Distributed kernel gradient descent algorithm for minimum error entropy principle, Appl. Comput. Harmon. Anal., 49 (2020), 229-256. doi: 10.1016/j.acha.2019.01.002.
[13] P. J. Huber, Robust Statistics, Wiley, New York, 2004.
[14] J. Lin, L. Rosasco and D. X. Zhou, Iterative regularization for learning with convex loss functions, J. Mach. Learn. Res., 17 (2016), 2718-2755.
[15] W. Liu, P. P. Pokharel and J. C. Principe, Correntropy: Properties and applications in non-Gaussian signal processing, IEEE Trans. Signal Process., 55 (2007), 5286-5298. doi: 10.1109/TSP.2007.896065.
[16] I. Pinelis, Optimum bounds for the distributions of martingales in Banach spaces, Ann. Probab., 22 (1994), 1679-1706.
[17] K. N. Plataniotis, D. Androutsos and A. N. Venetsanopoulos, Nonlinear filtering of non-Gaussian noise, J. Intell. Robot. Syst., 19 (1997), 207-231. doi: 10.1023/A:1007974400149.
[18] J. C. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer, New York, 2010. doi: 10.1007/978-1-4419-1570-2.
[19] I. Santamaria, P. P. Pokharel and J. C. Principe, Generalized correlation function: Definition, properties, and application to blind equalization, IEEE Trans. Signal Process., 54 (2006), 2187-2197. doi: 10.1109/TSP.2006.872524.
[20] S. Smale and D. X. Zhou, Estimating the approximation error in learning theory, Anal. Appl., 1 (2003), 17-41. doi: 10.1142/S0219530503000089.
[21] S. Smale and D. X. Zhou, Learning theory estimates via integral operators and their approximations, Constr. Approx., 26 (2007), 153-172. doi: 10.1007/s00365-006-0659-y.
[22] I. Steinwart, Oracle inequalities for support vector machines that are based on random entropy numbers, J. Complexity, 25 (2009), 437-454. doi: 10.1016/j.jco.2009.06.002.
[23] I. Steinwart and A. Christmann, Support Vector Machines, Springer Science & Business Media, 2008.
[24] X. Wang, Y. Jiang, M. Huang and H. Zhang, Robust variable selection with exponential squared loss, J. Amer. Statist. Assoc., 108 (2013), 632-643. doi: 10.1080/01621459.2013.766613.
[25] B. Weng and K. E. Barner, Nonlinear system identification in impulsive environments, IEEE Trans. Signal Process., 53 (2005), 2588-2594. doi: 10.1109/TSP.2005.849213.
[26] Q. Wu, Y. Ying and D. X. Zhou, Multi-kernel regularized classifiers, J. Complexity, 23 (2007), 108-134. doi: 10.1016/j.jco.2006.06.007.
