Mathematical Foundations of Computing, November 2020, 3(4): 263-277. doi: 10.3934/mfc.2020010

Modeling interactive components by coordinate kernel polynomial models

1. Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China
2. Division of Biostatistics, University of California, Berkeley, Berkeley, CA 94720, USA
3. Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, TN 37132, USA

* Corresponding author: Xin Guo

Received: October 2019. Published: June 2020.

Fund Project: The work described in this paper is partially supported by FRCAC of Middle Tennessee State University, and partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. PolyU 25301115). All three authors contributed equally to this paper.

We propose the use of coordinate kernel polynomials in kernel regression. This new approach, called coordinate kernel polynomial regression (CKPR), can simultaneously identify active variables and effective interactive components. A reparametrization refinement proves critical for improving modeling accuracy and prediction power. Post-training component selection then allows one to identify the effective interactive components. Generalization error bounds explain the effectiveness of the algorithm from a learning-theory perspective, and simulation studies demonstrate its empirical effectiveness.
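This page gives only the abstract, not the paper's formulas, so the following Python sketch is just one plausible instantiation of the idea. It assumes a coordinate kernel polynomial of the form $K_c(x,u) = (\sum_j c_j k(x_j, u_j))^d$ built from coordinate-wise Gaussian kernels $k$, trained by alternating a kernel ridge step in the coefficients with a projected gradient step in the nonnegative coordinate weights $c_j$, after which large weights flag the active variables. The kernel form, the Gaussian choice, the optimizer, and all function names below are illustrative assumptions, not the authors' published algorithm; the reparametrization refinement and the error bounds are developed in the paper itself.

```python
import numpy as np

def coord_gram(X, U, sigma=1.0):
    # Per-coordinate Gaussian Gram tensor: G[j, i, l] = exp(-(X[i,j]-U[l,j])^2 / (2 sigma^2)).
    diff = X[:, None, :] - U[None, :, :]                     # (n, m, p)
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2)).transpose(2, 0, 1)  # (p, n, m)

def ckpr_fit(X, y, degree=2, lam=1e-2, eta=0.1, iters=100, sigma=1.0):
    # Alternate a kernel ridge step in alpha with a projected gradient
    # step in the nonnegative coordinate weights c (hypothetical scheme).
    n, p = X.shape
    G = coord_gram(X, X, sigma)                              # (p, n, n)
    c = np.full(p, 1.0 / p)
    alpha = np.zeros(n)
    for _ in range(iters):
        S = np.tensordot(c, G, axes=1)                       # sum_j c_j k(x_j, u_j), (n, n)
        K = S ** degree                                      # assumed coordinate kernel polynomial
        alpha = np.linalg.solve(K + lam * n * np.eye(n), y)  # kernel ridge step
        r = K @ alpha - y                                    # residual with alpha held fixed
        dK = degree * S ** (degree - 1)                      # elementwise dK/dS
        grad = np.array([(2.0 / n) * r @ ((dK * G[j]) @ alpha) for j in range(p)])
        c = np.maximum(c - eta * grad, 0.0)                  # projected gradient step
    return c, alpha

def ckpr_predict(Xnew, X, c, alpha, degree=2, sigma=1.0):
    Gt = coord_gram(Xnew, X, sigma)                          # (p, n_new, n)
    return np.tensordot(c, Gt, axes=1) ** degree @ alpha

# Toy check: the response depends on x1 and x2 only, so after training the
# large entries of c should flag coordinates 0 and 1 (post-training selection).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 6))
y = X[:, 0] * X[:, 1] + 0.05 * rng.standard_normal(100)
c, alpha = ckpr_fit(X, y)
print("selected variables:", np.where(c > 0.1 * c.max())[0])
```

With $d \ge 2$, expanding the polynomial produces products of coordinate kernels, which is how interactions between variables enter the model; thresholding the trained weights is a simple stand-in for the post-training component selection the abstract describes.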

Citation: Xin Guo, Lexin Li, Qiang Wu. Modeling interactive components by coordinate kernel polynomial models. Mathematical Foundations of Computing, 2020, 3 (4) : 263-277. doi: 10.3934/mfc.2020010
References:
[1] F. R. Bach, Consistency of the group lasso and multiple kernel learning, J. Mach. Learn. Res., 9 (2008), 1179-1225.
[2] P. L. Bartlett and S. Mendelson, Rademacher and Gaussian complexities: Risk bounds and structural results, J. Mach. Learn. Res., 3 (2003), 463-482. doi: 10.1162/153244303321897690.
[3] M. D. Buhmann, Radial Basis Functions: Theory and Implementations, Cambridge Monographs on Applied and Computational Mathematics, 12, Cambridge University Press, Cambridge, 2003. doi: 10.1017/CBO9780511543241.
[4] C. M. Carvalho, J. Chang, J. E. Lucas, J. R. Nevins, Q. Wang and M. West, High-dimensional sparse factor modeling: Applications in gene expression genomics, J. Amer. Statist. Assoc., 103 (2008), 1438-1456. doi: 10.1198/016214508000000869.
[5] C. Cortes, M. Mohri and A. Rostamizadeh, Learning non-linear combinations of kernels, in Advances in Neural Information Processing Systems, Curran Associates, Inc., (2009), 396-404.
[6] J. H. Friedman, Multivariate adaptive regression splines, Ann. Statist., 19 (1991), 1-141. doi: 10.1214/aos/1176347963.
[7] I. Guyon, J. Weston, S. Barnhill and V. Vapnik, Gene selection for cancer classification using support vector machines, Machine Learning, 46 (2002), 389-422. doi: 10.1023/A:1012487302797.
[8] R. Kohavi and G. H. John, Wrappers for feature subset selection, Artificial Intelligence, 97 (1997), 273-324.
[9] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui and M. I. Jordan, Learning the kernel matrix with semidefinite programming, J. Mach. Learn. Res., 5 (2003/04), 27-72.
[10] S. L. Lauritzen, Graphical Models, Oxford Statistical Science Series, 17, The Clarendon Press, Oxford University Press, New York, 1996.
[11] L. Li and X. Yin, Sliced inverse regression with regularizations, Biometrics, 64 (2008), 124-131. doi: 10.1111/j.1541-0420.2007.00836.x.
[12] F. Liang, K. Mao, M. Liao, S. Mukherjee and M. West, Nonparametric Bayesian Kernel Models, Technical report, Department of Statistical Science, Duke University, 2007.
[13] Y. Lin and H. Zhang, Component selection and smoothing in multivariate nonparametric regression, Ann. Statist., 34 (2006), 2272-2297. doi: 10.1214/009053606000000722.
[14] C. McDiarmid, On the method of bounded differences, in Surveys in Combinatorics, London Math. Soc. Lecture Note Ser., 141, Cambridge Univ. Press, Cambridge, (1989), 148-188.
[15] R. Meir and T. Zhang, Generalization error bounds for Bayesian mixture algorithms, J. Mach. Learn. Res., 4 (2004), 839-860. doi: 10.1162/1532443041424300.
[16] S. Mukherjee and Q. Wu, Estimation of gradients and coordinate covariation in classification, J. Mach. Learn. Res., 7 (2006), 2481-2514.
[17] S. Mukherjee and D.-X. Zhou, Learning coordinate covariances via gradients, J. Mach. Learn. Res., 7 (2006), 519-549.
[18] M. Pontil and C. Micchelli, Learning the kernel function via regularization, J. Mach. Learn. Res., 6 (2005), 1099-1125.
[19] H. Qin and X. Guo, Semi-supervised learning with summary statistics, Anal. Appl. (Singap.), 17 (2019), 837-851. doi: 10.1142/S0219530519400037.
[20] B. Schölkopf, A. Smola and K.-R. Müller, Kernel principal component analysis, in Artificial Neural Networks - ICANN'97, Lecture Notes in Computer Science, 1327, Springer, Berlin, Heidelberg, (1997), 583-588.
[21] L. Shi, Distributed learning with indefinite kernels, Anal. Appl. (Singap.), 17 (2019), 947-975. doi: 10.1142/S021953051850032X.
[22] T. P. Speed and H. T. Kiiveri, Gaussian Markov distributions over finite graphs, Ann. Statist., 14 (1986), 138-150. doi: 10.1214/aos/1176349846.
[23] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B, 58 (1996), 267-288. doi: 10.1111/j.2517-6161.1996.tb02080.x.
[24] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, Inc., New York, 1998.
[25] G. Wahba, Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, 59, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990. doi: 10.1137/1.9781611970128.
[26] Q. Wang and X. Yin, A nonlinear multi-dimensional variable selection method for high dimensional data: Sparse MAVE, Comput. Statist. Data Anal., 52 (2008), 4512-4520. doi: 10.1016/j.csda.2008.03.003.
[27] Q. Wu, Y. Ying and D.-X. Zhou, Multi-kernel regularized classifiers, J. Complexity, 23 (2007), 108-134. doi: 10.1016/j.jco.2006.06.007.
[28] Y. Xu and H. Zhang, Refinable kernels, J. Mach. Learn. Res., 8 (2007), 2083-2120.
[29] Y. Xu and H. Zhang, Refinement of reproducing kernels, J. Mach. Learn. Res., 10 (2009), 107-140.
[30] W. Yao and Q. Wang, Robust variable selection through MAVE, Comput. Statist. Data Anal., 63 (2013), 42-49. doi: 10.1016/j.csda.2013.01.021.
[31] Y. Ying and C. Campbell, Rademacher chaos complexities for learning the kernel problem, Neural Comput., 22 (2010), 2858-2886. doi: 10.1162/NECO_a_00028.
[32] H. H. Zhang, Variable selection for support vector machines via smoothing spline ANOVA, Statist. Sinica, 16 (2006), 659-674.
[33] N. Zhang, Z. Yu and Q. Wu, Overlapping sliced inverse regression for dimension reduction, Anal. Appl. (Singap.), 17 (2019), 715-736. doi: 10.1142/S0219530519400013.
[34] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B Stat. Methodol., 67 (2005), 301-320. doi: 10.1111/j.1467-9868.2005.00503.x.

Table 1.  Variable selection accuracy and average MSE for Example 1

Algorithm       TPR($x_1$)   TPR($x_2$)   FPR     MSE
CKPR-L          1.00         1.00         0.000   0.008 (0.000)
CKPR-G          1.00         1.00         0.011   0.109 (0.015)
LASSO           1.00         0.18         0.040   1.129 (0.015)
COSSO           0.90         0.02         0.020   10.879 (8.345)
SR-SIR (AIC)    1.00         0.89         0.460   -
SR-SIR (BIC)    1.00         0.85         0.181   -
SR-SIR (RIC)    1.00         0.75         0.053   -
Table 2.  Average and standard error of MSEs for Example 2

               $m=100$         $m=200$         $m=400$
CKPR-G         0.119 (0.003)   0.054 (0.001)   0.025 (0.0004)
COSSO (GCV)    0.358 (0.009)   0.100 (0.003)   0.045 (0.001)
COSSO (5CV)    0.378 (0.005)   0.094 (0.004)   0.043 (0.001)
MARS           0.239 (0.008)   0.109 (0.003)   0.084 (0.001)
Table 3.  RMSE on three UCI data sets

               Ionosphere    Sonar MR      Wisc. BC
$n$            351           208           683
$p$            33            60            9
CKPR-L         0.64 (0.04)   0.75 (0.06)   0.34 (0.02)
CKPR-G         0.54 (0.03)   0.77 (0.06)   0.34 (0.02)
Best in [5]    0.60 (0.05)   0.80 (0.04)   0.70 (0.01)
