July 2016, 1(2&3): 217-225. doi: 10.3934/bdia.2016005

Forward supervised discretization for multivariate with categorical responses

School of Mathematics and Information Science, Guangzhou University, Guangzhou, Guangdong 510006, China

Received April 2016; Revised September 2016; Published September 2016

Given a data set with one categorical response variable and multiple categorical or continuous explanatory variables, some applications require the continuous explanatory variables to be discretized. A well-chosen supervised discretization usually yields better results than an unsupervised one. Rather than discretizing each variable individually, as recently proposed by Huang, Pan and Wu in [12,13], we suggest a forward supervised discretization algorithm that captures a higher association from the multiple explanatory variables to the response variable. Experiments with the GK-tau and the GK-lambda are presented to support this claim.
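For readers unfamiliar with the two association measures named in the abstract, the following is a minimal sketch of the Goodman-Kruskal lambda and tau [8] computed from paired categorical observations, together with a helper that bins a continuous variable at given cut points. It is not the authors' forward algorithm; the function names `gk_lambda`, `gk_tau` and `bin_by_cuts` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter


def gk_lambda(xs, ys):
    """Goodman-Kruskal lambda for predicting ys from xs.

    Proportional reduction in error when guessing the modal
    category of y within each x, versus the overall mode of y.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    y_marg = Counter(ys)
    # Largest joint count within each category of x.
    best_per_x = {}
    for (x, _y), c in joint.items():
        best_per_x[x] = max(best_per_x.get(x, 0), c)
    max_y = max(y_marg.values())
    if max_y == n:  # y is constant: nothing left to explain
        return 0.0
    return (sum(best_per_x.values()) - max_y) / (n - max_y)


def gk_tau(xs, ys):
    """Goodman-Kruskal tau for predicting ys from xs.

    Proportional reduction in error under proportional prediction:
    (sum_xy p(x,y)^2 / p(x) - sum_y p(y)^2) / (1 - sum_y p(y)^2).
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_marg = Counter(xs)
    y_marg = Counter(ys)
    cond = sum(c * c / x_marg[x] for (x, _y), c in joint.items()) / n
    marg = sum(c * c for c in y_marg.values()) / (n * n)
    if marg == 1.0:  # y is constant: nothing left to explain
        return 0.0
    return (cond - marg) / (1.0 - marg)


def bin_by_cuts(values, cuts):
    """Map each continuous value to the index of its interval.

    `cuts` must be sorted; a value equal to a cut point falls
    into the interval on the right of that cut.
    """
    return [bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    income = [1.0, 2.0, 3.0, 4.0]      # toy continuous explanatory variable
    response = ["no", "no", "yes", "yes"]
    bins = bin_by_cuts(income, [2.5])  # one candidate cut point
    print(gk_tau(bins, response), gk_lambda(bins, response))
```

A supervised discretization in the spirit of [12,13] would compare candidate cut points by such a score and keep those that give the highest association with the response; how the forward procedure combines several explanatory variables is described in the paper itself and is not reproduced here.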
Citation: Wenxue Huang, Qitian Qiu. Forward supervised discretization for multivariate with categorical responses. Big Data & Information Analytics, 2016, 1 (2&3) : 217-225. doi: 10.3934/bdia.2016005
References:
[1]

M. Boulle, Khiops: A statistical discretization method of continuous attributes, Machine Learning, 55 (2004), 53-69. doi: 10.1023/B:MACH.0000019804.29836.05.

[2]

J. Catlett, On changing continuous attributes into ordered discrete attributes, In: Machine Learning - EWSL-91, 482 (1991), 164-178. doi: 10.1007/BFb0017012.

[3]

D. Chiu, B. Cheung and A. Wong, Information synthesis based on hierarchical maximum entropy discretization, Journal of Experimental and Theoretical Artificial Intelligence, 2 (1989), 117-129. doi: 10.1080/09528139008953718.

[4]

M. Chmielewski and J. Grzymala-Busse, Global discretization of continuous attributes as preprocessing for machine learning, International Journal of Approximate Reasoning, 15 (1996), 319-331. doi: 10.1016/S0888-613X(96)00074-6.

[5]

J. Dougherty, R. Kohavi and M. Sahami, Supervised and unsupervised discretization of continuous features, In Machine learning-International Workshop. Morgan Kaufmann Publishers, 2 (1995), 194-202. doi: 10.1016/B978-1-55860-377-6.50032-3.

[6]

U. Fayyad and K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, Proceedings of the International Joint Conference on Uncertainty in AI, 2 (1993), 1022-1027.

[7]

G. Gan, C. Ma and J. Wu, Data clustering: Theory, algorithms, and applications (ASA-SIAM series on statistics and applied probability), Society for Industrial and Applied Mathematics, 20 (2007), xxii+466 pp. doi: 10.1137/1.9780898718348.

[8]

L. Goodman and W. Kruskal, Measures of association for cross classifications, Journal of the American Statistical Association, 49 (1954), 732-764.

[9]

I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, 3 (2003), 1157-1182.

[10]

R. Holte, Very simple classification rules perform well on most commonly used datasets, Machine Learning, 11 (1993), 63-90.

[11]

W. Huang and Y. Pan, On balancing between optimal and proportional predictions, Big Data and Information Analytics, 1 (2016), 129-137.

[12]

W. Huang, Y. Pan and J. Wu, Supervised discretization with $GK-\tau$, Procedia Computer Science, 17 (2013), 114-120.

[13]

W. Huang, Y. Pan and J. Wu, Supervised discretization with $GK-\lambda$, Procedia Computer Science, 30 (2014), 75-80.

[14]

W. Huang, Y. Shi and X. Wang, A nominal association matrix with feature selection for categorical data, Communications in Statistics - Theory and Methods, ().

[15]

R. Kerber, ChiMerge: Discretization of numeric attributes, In Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press, 1994, 123-128.

[16]

S. Kotsiantis and D. Kanellopoulos, Discretization techniques: A recent survey, GESTS International Transactions on Computer Science and Engineering, 32 (2006), 47-58.

[17]

H. Liu and R. Setiono, Chi2: Feature selection and discretization of numeric attributes, In: Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, 55 (1995), 388-391.

[18]

C. Lloyd, Statistical Analysis with Missing Data, John Wiley & Sons, Inc. 1987, New York, NY, USA.

[19]

J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1 (1967), 281-297.

[20]

D. Olson and Y. Shi, Introduction to business data mining, Knowledge and information systems, 2007, McGraw-Hill/Irwin.

[21]

I. Rish, An empirical study of the naive bayes classifier, IJCAI 2001 workshop on empirical methods in artificial intelligence, 2001, 41-46.

[22]

S. Safavian and D. Landgrebe, A survey of decision tree classifier methodology, IEEE Transactions on Systems, Man and Cybernetics, 21 (1991), 660-674. doi: 10.1109/21.97458.

[23]

STATCAN, Survey of Family Expenditures - 1996.

[24]

K. Ting, Discretization of Continuous-Valued Attributes and Instance-Based Learning, Basser Department of Computer Science, University of Sydney, 1994.

