Mathematical Foundations of Computing, November 2020, 3(4): 263-277. doi: 10.3934/mfc.2020010

Modeling interactive components by coordinate kernel polynomial models

Xin Guo 1, Lexin Li 2 and Qiang Wu 3

1. Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China
2. Division of Biostatistics, University of California, Berkeley, Berkeley, CA 94720, USA
3. Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, TN 37132, USA

* Corresponding author: Xin Guo

Received: October 2019. Early access: June 2020. Published: November 2020.

Fund Project: The work described in this paper is partially supported by the FRCAC of Middle Tennessee State University, and partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. PolyU 25301115). All three authors contributed equally to the paper.

We propose the use of coordinate kernel polynomials in kernel regression. This new approach, called coordinate kernel polynomial regression, can simultaneously identify active variables and effective interactive components. A reparametrization refinement is found to be critical for improving the modeling accuracy and predictive power. The post-training component selection allows one to identify effective interactive components. Generalization error bounds are used to explain the effectiveness of the algorithm from a learning theory perspective, and simulation studies are used to demonstrate its empirical effectiveness.
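To make the construction concrete, the sketch below illustrates one plausible form of a coordinate kernel polynomial model: coordinate-wise Gaussian kernels are combined through a polynomial with nonnegative weights, and the resulting kernel is used in kernel ridge regression. This is an illustrative assumption, not the authors' exact CKPR algorithm; the function names, the Gaussian base kernels, and the parameter values (degree, gamma, lambda, beta) are chosen for exposition only.

```python
# Illustrative sketch of a coordinate kernel polynomial model (NOT the paper's
# exact CKPR algorithm): coordinate-wise Gaussian kernels are combined through
# a polynomial, and the combined kernel is used in kernel ridge regression.
import numpy as np

def coordinate_kernels(X, Z, gamma=1.0):
    """Return an array of shape (p, n, m) with k_j(x_j, z_j) = exp(-gamma (x_j - z_j)^2)."""
    diffs = X[:, None, :] - Z[None, :, :]            # shape (n, m, p)
    return np.exp(-gamma * diffs ** 2).transpose(2, 0, 1)

def polynomial_kernel_matrix(X, Z, beta, degree=2, gamma=1.0):
    """K(x, z) = (1 + sum_j beta_j k_j(x_j, z_j))^degree."""
    K_coord = coordinate_kernels(X, Z, gamma)        # shape (p, n, m)
    return (1.0 + np.tensordot(beta, K_coord, axes=1)) ** degree

def fit_kernel_ridge(X, y, beta, lam=1e-2, degree=2, gamma=1.0):
    """Solve (K + lam * n * I) alpha = y for the representer coefficients."""
    n = X.shape[0]
    K = polynomial_kernel_matrix(X, X, beta, degree, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, X_test, alpha, beta, degree=2, gamma=1.0):
    return polynomial_kernel_matrix(X_test, X_train, beta, degree, gamma) @ alpha

# Toy usage: the response depends only on x_1 and x_2; coordinates with larger
# beta_j contribute more, so sparsity in beta corresponds to variable selection.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 5))
y = np.sin(np.pi * X[:, 0]) * X[:, 1] + 0.05 * rng.standard_normal(100)
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0])           # in CKPR, beta would be learned
alpha = fit_kernel_ridge(X, y, beta)
print(predict(X, X[:5], alpha, beta))
```

In such a model, driving a weight beta_j to zero removes coordinate j from every interaction term, which is the sense in which a sparse weight vector identifies both active variables and the interactive components built from them.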

Citation: Xin Guo, Lexin Li, Qiang Wu. Modeling interactive components by coordinate kernel polynomial models. Mathematical Foundations of Computing, 2020, 3 (4) : 263-277. doi: 10.3934/mfc.2020010
References:
[1]

F. R. Bach, Consistency of the group lasso and multiple kernel learning, J. Mach. Learn. Res., 9 (2008), 1179-1225. 

[2]

P. L. Bartlett and S. Mendelson, Rademacher and Gaussian complexities: Risk bounds and structural results, J. Mach. Learn. Res., 3 (2003), 463-482.  doi: 10.1162/153244303321897690.

[3]

M. D. Buhmann, Radial Basis Functions: Theory and Implementations, Cambridge Monographs on Applied and Computational Mathematics, 12, Cambridge University Press, Cambridge, 2003. doi: 10.1017/CBO9780511543241.

[4]

C. M. Carvalho, J. Chang, J. E. Lucas, J. R. Nevins, Q. Wang and M. West, High-dimensional sparse factor modeling: Applications in gene expression genomics, J. Amer. Statist. Assoc., 103 (2008), 1438-1456. doi: 10.1198/016214508000000869.

[5]

C. Cortes, M. Mohri and A. Rostamizadeh, Learning non-linear combinations of kernels, in Advances in Neural Information Processing Systems, Curran Associates, Inc., (2009), 396–404.

[6]

J. H. Friedman, Multivariate adaptive regression splines, Ann. Statist., 19 (1991), 1-141.  doi: 10.1214/aos/1176347963.

[7]

I. Guyon, J. Weston, S. Barnhill and V. Vapnik, Gene selection for cancer classification using support vector machines, Machine Learning, 46 (2002), 389-422. doi: 10.1023/A:1012487302797.

[8]

R. Kohavi and G. H. John, Wrappers for feature subset selection, Artificial Intelligence, 97 (1997), 273-324. 

[9]

G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui and M. I. Jordan, Learning the kernel matrix with semidefinite programming, J. Mach. Learn. Res., 5 (2003/04), 27-72.

[10]

S. L. Lauritzen, Graphical Models, Oxford Statistical Science Series, 17, The Clarendon Press, Oxford University Press, New York, 1996.

[11]

L. Li and X. Yin, Sliced inverse regression with regularizations, Biometrics, 64 (2008), 124-131.  doi: 10.1111/j.1541-0420.2007.00836.x.

[12]

F. Liang, K. Mao, M. Liao, S. Mukherjee and M. West, Nonparametric Bayesian Kernel Models, Technical report, Department of Statistical Science, Duke University, 2007.

[13]

Y. Lin and H. Zhang, Component selection and smoothing in multivariate nonparametric regression, Ann. Statist., 34 (2006), 2272-2297.  doi: 10.1214/009053606000000722.

[14]

C. McDiarmid, On the method of bounded differences, in Surveys in Combinatorics, London Math. Soc. Lecture Note Ser., 141, Cambridge Univ. Press, Cambridge, (1989), 148–188.

[15]

R. Meir and T. Zhang, Generalization error bounds for Bayesian mixture algorithms, J. Mach. Learn. Res., 4 (2004), 839-860.  doi: 10.1162/1532443041424300.

[16]

S. Mukherjee and Q. Wu, Estimation of gradients and coordinate covariation in classification, J. Mach. Learn. Res., 7 (2006), 2481-2514. 

[17]

S. Mukherjee and D.-X. Zhou, Learning coordinate covariances via gradients, J. Mach. Learn. Res., 7 (2006), 519-549. 

[18]

M. Pontil and C. Micchelli, Learning the kernel function via regularization, J. Mach. Learn. Res., 6 (2005), 1099-1125. 

[19]

H. Qin and X. Guo, Semi-supervised learning with summary statistics, Anal. Appl. (Singap.), 17 (2019), 837-851.  doi: 10.1142/S0219530519400037.

[20]

B. Schölkopf, A. Smola and K.-R. Müller, Kernel principal component analysis, in Artificial Neural Networks–ICANN'97, Lecture Notes in Computer Science, 1327, Springer, Berlin, Heidelberg, (1997), 583–588.

[21]

L. Shi, Distributed learning with indefinite kernels, Anal. Appl. (Singap.), 17 (2019), 947-975.  doi: 10.1142/S021953051850032X.

[22]

T. P. Speed and H. T. Kiiveri, Gaussian Markov distributions over finite graphs, Ann. Statist., 14 (1986), 138-150. doi: 10.1214/aos/1176349846.

[23]

R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B, 58 (1996), 267-288.  doi: 10.1111/j.2517-6161.1996.tb02080.x.

[24]

V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, Inc., New York, 1998.

[25]

G. Wahba, Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, 59, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990. doi: 10.1137/1.9781611970128.

[26]

Q. Wang and X. Yin, A nonlinear multi-dimensional variable selection method for high dimensional data: Sparse MAVE, Comput. Statist. Data Anal., 52 (2008), 4512-4520.  doi: 10.1016/j.csda.2008.03.003.

[27]

Q. Wu, Y. Ying and D.-X. Zhou, Multi-kernel regularized classifiers, J. Complexity, 23 (2007), 108-134. doi: 10.1016/j.jco.2006.06.007.

[28]

Y. Xu and H. Zhang, Refinable kernels, J. Mach. Learn. Res., 8 (2007), 2083-2120.

[29]

Y. Xu and H. Zhang, Refinement of reproducing kernels, J. Mach. Learn. Res., 10 (2009), 107-140. 

[30]

W. Yao and Q. Wang, Robust variable selection through MAVE, Comput. Statist. Data Anal., 63 (2013), 42-49.  doi: 10.1016/j.csda.2013.01.021.

[31]

Y. Ying and C. Campbell, Rademacher chaos complexities for learning the kernel problem, Neural Comput., 22 (2010), 2858-2886.  doi: 10.1162/NECO_a_00028.

[32]

H. H. Zhang, Variable selection for support vector machines via smoothing spline ANOVA, Statist. Sinica, 16 (2006), 659-674. 

[33]

N. Zhang, Z. Yu and Q. Wu, Overlapping sliced inverse regression for dimension reduction, Anal. Appl. (Singap.), 17 (2019), 715-736. doi: 10.1142/S0219530519400013.

[34]

H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B Stat. Methodol., 67 (2005), 301-320.  doi: 10.1111/j.1467-9868.2005.00503.x.

Table 1.  Variable selection accuracy (true positive rates for $ x_1 $ and $ x_2 $, and false positive rate) and average MSE (standard error in parentheses) for Example 1
Algorithm TPR($ x_1 $) TPR($ x_2 $) FPR MSE
CKPR-L 1.00 1.00 0.000 0.008 (0.000)
CKPR-G 1.00 1.00 0.011 0.109 (0.015)
LASSO 1.00 0.18 0.040 1.129 (0.015)
COSSO 0.90 0.02 0.020 10.879 (8.345)
SR-SIR (AIC) 1.00 0.89 0.460 -
SR-SIR (BIC) 1.00 0.85 0.181 -
SR-SIR (RIC) 1.00 0.75 0.053 -
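Under the usual conventions for such simulations, TPR($ x_j $) is the fraction of replicates in which the truly active variable $ x_j $ is selected, and FPR is the fraction of inactive variables selected, averaged over replicates. The snippet below is a hypothetical illustration of how these summary statistics could be computed from per-replicate selection indicators; the data and function name are invented for exposition.

```python
# Hypothetical computation of the selection metrics reported in Table 1:
# per-variable TPR and overall FPR from boolean selection indicators.
import numpy as np

def selection_metrics(selected, active):
    """selected: (replicates, p) boolean indicators; active: indices of truly active variables."""
    selected = np.asarray(selected, dtype=bool)
    p = selected.shape[1]
    inactive = [j for j in range(p) if j not in active]
    tpr = {j: selected[:, j].mean() for j in active}            # per-variable true positive rate
    fpr = selected[:, inactive].mean() if inactive else 0.0     # average false positive rate
    return tpr, fpr

# Toy example: 5 replicates, 4 candidate variables, x_0 and x_1 truly active.
sel = np.array([[1, 1, 0, 0],
                [1, 1, 0, 1],
                [1, 0, 0, 0],
                [1, 1, 0, 0],
                [1, 1, 1, 0]])
print(selection_metrics(sel, active=[0, 1]))    # ({0: 1.0, 1: 0.8}, 0.2)
```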
Table 2.  Average and standard error of MSEs for Example 2
Algorithm $ m=100 $ $ m=200 $ $ m=400 $
CKPR-G 0.119 (0.003) 0.054 (0.001) 0.025 (0.0004)
COSSO(GCV) 0.358 (0.009) 0.100 (0.003) 0.045 (0.001)
COSSO(5CV) 0.378 (0.005) 0.094 (0.004) 0.043 (0.001)
MARS 0.239 (0.008) 0.109 (0.003) 0.084 (0.001)
Table 3.  RMSE on three UCI data sets
Ionosphere Sonar MR Wisc. BC
$ n $ 351 208 683
$ p $ 33 60 9
CKPR-L $ 0.64 (0.04) $ $ 0.75 (0.06) $ $ 0.34 (0.02) $
CKPR-G $ 0.54 (0.03) $ $ 0.77 (0.06) $ $ 0.34 (0.02) $
Best in [5] $ 0.60 (0.05) $ $ 0.80 (0.04) $ $ 0.70 (0.01) $