July 2016, 1(2&3): 217-225. doi: 10.3934/bdia.2016005

Forward supervised discretization for multivariate with categorical responses

1. School of Mathematics and Information Science, Guangzhou University, Guangzhou, Guangdong 510006, China

Received April 2016; Revised September 2016; Published September 2016

Given a data set with one categorical response variable and multiple categorical or continuous explanatory variables, some applications require the continuous explanatory variables to be discretized. A well-chosen supervised discretization usually gives better results than an unsupervised one. Rather than discretizing each variable individually, as recently proposed by Huang, Pan and Wu in [12,13], we suggest a forward supervised discretization algorithm that captures a higher association from the multiple explanatory variables to the response variable. Experiments with the GK-tau and the GK-lambda measures are presented to support this claim.
Citation: Wenxue Huang, Qitian Qiu. Forward supervised discretization for multivariate with categorical responses. Big Data & Information Analytics, 2016, 1 (2&3) : 217-225. doi: 10.3934/bdia.2016005
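For readers unfamiliar with the two association measures named in the abstract, the following is a minimal Python sketch of the standard Goodman-Kruskal lambda and tau for predicting a categorical response from a categorical explanatory variable. It does not reproduce the forward algorithm of the paper, which the abstract does not spell out. The function names `gk_lambda`, `gk_tau` and `best_cut`, and the single-threshold search in `best_cut`, are illustrative assumptions only.

```python
from collections import Counter


def gk_lambda(xs, ys):
    """Goodman-Kruskal lambda for predicting ys from xs (both categorical)."""
    n = len(ys)
    joint = Counter(zip(xs, ys))
    best_y = max(Counter(ys).values())  # count of the modal response overall
    if best_y == n:  # constant response: measure undefined, return 0 by convention
        return 0.0
    # For each category of x, keep the count of its modal response.
    modal = {}
    for (x, _y), c in joint.items():
        modal[x] = max(modal.get(x, 0), c)
    return (sum(modal.values()) - best_y) / (n - best_y)


def gk_tau(xs, ys):
    """Goodman-Kruskal tau for predicting ys from xs (both categorical)."""
    n = len(ys)
    joint = Counter(zip(xs, ys))
    x_marg = Counter(xs)
    base = sum((c / n) ** 2 for c in Counter(ys).values())  # sum_y p(y)^2
    if base == 1.0:  # constant response: measure undefined, return 0 by convention
        return 0.0
    # sum over (x, y) of p(x, y)^2 / p(x)
    cond = sum(c * c / (n * x_marg[x]) for (x, _y), c in joint.items())
    return (cond - base) / (1.0 - base)


def best_cut(values, ys, measure=gk_tau):
    """Hypothetical helper: the single midpoint threshold that maximizes
    `measure` when one continuous variable is split into two intervals."""
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        score = measure([v <= cut for v in values], ys)
        if score > best[1]:
            best = (cut, score)
    return best


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_cut(values, labels))  # -> (6.5, 1.0)
```

Both measures are asymmetric: they quantify how much knowing the explanatory variable reduces the error in predicting the response, so the order of the arguments matters. A supervised discretization in this spirit chooses cut points that keep such a measure high, whereas an unsupervised one (equal-width bins, for example) ignores the response entirely.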
References:
[1] M. Boulle, Khiops: A statistical discretization method of continuous attributes, Machine Learning, 55 (2004), 53. doi: 10.1023/B:MACH.0000019804.29836.05.
[2] J. Catlett, On changing continuous attributes into ordered discrete attributes, In: Machine Learning - EWSL-91, 482 (1991), 164. doi: 10.1007/BFb0017012.
[3] D. Chiu, B. Cheung and A. Wong, Information synthesis based on hierarchical maximum entropy discretization, Journal of Experimental and Theoretical Artificial Intelligence, 2 (1989), 117. doi: 10.1080/09528139008953718.
[4] M. Chmielewski and J. Grzymala-Busse, Global discretization of continuous attributes as preprocessing for machine learning, International Journal of Approximate Reasoning, 15 (1996), 319. doi: 10.1016/S0888-613X(96)00074-6.
[5] J. Dougherty, R. Kohavi and M. Sahami, Supervised and unsupervised discretization of continuous features, In: Machine Learning - International Workshop, Morgan Kaufmann Publishers, 2 (1995), 194. doi: 10.1016/B978-1-55860-377-6.50032-3.
[6] U. Fayyad and K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, Proceedings of the International Joint Conference on Uncertainty in AI, 2 (1993), 1022.
[7] G. Gan, C. Ma and J. Wu, Data clustering: Theory, algorithms, and applications (ASA-SIAM series on statistics and applied probability), Society for Industrial and Applied Mathematics, 20 (2007). doi: 10.1137/1.9780898718348.
[8] L. Goodman and W. Kruskal, Measures of association for cross classifications, Journal of the American Statistical Association, 49 (1954), 732.
[9] I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Applied Physics Letters, 3 (2002), 1157.
[10] R. Holte, Very simple classification rules perform well on most commonly used datasets, Machine Learning, 11 (1993), 63.
[11] W. Huang and Y. Pan, On balancing between optimal and proportional predictions, Big Data and Information Analytics, 1 (2016), 129.
[12] W. Huang, Y. Pan and J. Wu, Supervised discretization with $GK-\tau$, Procedia Computer Science, 17 (2013), 114.
[13] W. Huang, Y. Pan and J. Wu, Supervised discretization with $GK-\lambda$, Procedia Computer Science, 30 (2014), 75.
[14] W. Huang, Y. Shi and X. Wang, A nominal association matrix with feature selection for categorical data, Communications in Statistics - Theory and Methods, ().
[15] R. Kerber, Chimerge: Discretization of numeric attributes, In: Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press, (1994), 123.
[16] S. Kotsiantis and D. Kanellopoulos, Discretization techniques: A recent survey, GESTS International Transactions on Computer Science and Engineering, 32 (2006), 47.
[17] H. Liu and R. Setiono, Chi2: Feature selection and discretization of numeric attributes, In: Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, 55 (1995), 388.
[18] C. Lloyd, Statistical Analysis with Missing Data, John Wiley & Sons, (1987).
[19] J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1 (1967), 281.
[20] D. Olson and Y. Shi, Introduction to business data mining, Knowledge and Information Systems, (2007).
[21] I. Rish, An empirical study of the naive Bayes classifier, IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, (2001), 41.
[22] S. Safavian and D. Landgrebe, A survey of decision tree classifier methodology, IEEE Transactions on Systems, 21 (1991), 660. doi: 10.1109/21.97458.
[23] STATCAN, Survey of Family Expenditures - 1996, (1996).
[24] K. Ting, Discretization of continuous-valued attributes and instance-based learning, Basser Department of Computer Science, (1994).
