# American Institute of Mathematical Sciences

July  2016, 1(2&3): 217-225. doi: 10.3934/bdia.2016005

## Forward supervised discretization for multivariate with categorical responses

 1 School of Mathematics and Information Science, Guangzhou University, Guangzhou, Guangdong 510006, China

Received  April 2016 Revised  September 2016 Published  September 2016

Given a data set with one categorical response variable and multiple categorical or continuous explanatory variables, some applications require the continuous explanatory variables to be discretized. A well-chosen supervised discretization usually gives better results than an unsupervised one. Rather than discretizing each variable individually, as recently proposed by Huang, Pan and Wu in [12,13], we suggest a forward supervised discretization algorithm that captures a higher association from the multiple explanatory variables to the response variable. Experiments with the GK-tau and the GK-lambda association measures are presented to support this claim.
Citation: Wenxue Huang, Qitian Qiu. Forward supervised discretization for multivariate with categorical responses. Big Data & Information Analytics, 2016, 1 (2&3) : 217-225. doi: 10.3934/bdia.2016005
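
The two association measures named in the abstract can be sketched as follows. This is a minimal illustration based on the standard Goodman-Kruskal definitions of tau and lambda, not the authors' algorithm: the function names `gk_tau`, `gk_lambda` and `best_cut`, the single-threshold search, and the `given` argument (labels from variables discretized in earlier steps) are all assumptions introduced here to show how a forward step might score a candidate cut point jointly with previously discretized variables.

```python
from collections import Counter


def gk_tau(xs, ys):
    """Goodman-Kruskal tau: proportional reduction in the Gini-type
    prediction error of ys when the explanatory labels xs are known."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    base = sum((c / n) ** 2 for c in py.values())  # sum_y p(y)^2
    # sum_x sum_y p(x, y)^2 / p(x)
    cond = sum((c / n) ** 2 / (px[x] / n) for (x, _), c in joint.items())
    return 0.0 if base == 1.0 else (cond - base) / (1.0 - base)


def gk_lambda(xs, ys):
    """Goodman-Kruskal lambda: proportional reduction in the modal
    prediction error of ys when the explanatory labels xs are known."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    py = Counter(ys)
    best_marginal = max(py.values()) / n  # max_y p(y)
    best_by_x = {}
    for (x, _), c in joint.items():
        best_by_x[x] = max(best_by_x.get(x, 0), c)
    cond = sum(best_by_x.values()) / n  # sum_x max_y p(x, y)
    if best_marginal == 1.0:
        return 0.0
    return (cond - best_marginal) / (1.0 - best_marginal)


def best_cut(values, ys, given=None, measure=gk_tau):
    """Hypothetical helper: choose one threshold for a continuous variable
    that maximizes the association of (given, value <= cut) with ys.
    `given` holds labels of already discretized variables (assumption)."""
    if given is None:
        given = [()] * len(values)
    candidates = sorted(set(values))[:-1]  # cutting at the maximum is trivial
    if not candidates:
        return None
    scored = []
    for cut in candidates:
        labels = [(g, v <= cut) for g, v in zip(given, values)]
        scored.append((measure(labels, ys), cut))
    return max(scored)  # (score, cut) pair with the highest score


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]
    ys = ["p", "p", "q", "q"]
    print(best_cut(values, ys))
    print(best_cut(values, ys, measure=gk_lambda))
```

Passing the labels of previously discretized variables through `given` is what makes the step "forward" in this sketch: each new cut is scored on the joint partition rather than on the variable alone.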
##### References:
[1] M. Boulle, Khiops: A statistical discretization method of continuous attributes, Machine Learning, 55 (2004), 53. doi: 10.1023/B:MACH.0000019804.29836.05.

[2] J. Catlett, On changing continuous attributes into ordered discrete attributes, In: Machine Learning - EWSL-91, 482 (1991), 164. doi: 10.1007/BFb0017012.

[3] D. Chiu, B. Cheung and A. Wong, Information synthesis based on hierarchical maximum entropy discretization, Journal of Experimental and Theoretical Artificial Intelligence, 2 (1989), 117. doi: 10.1080/09528139008953718.

[4] M. Chmielewski and J. Grzymala-Busse, Global discretization of continuous attributes as preprocessing for machine learning, International Journal of Approximate Reasoning, 15 (1996), 319. doi: 10.1016/S0888-613X(96)00074-6.

[5] J. Dougherty, R. Kohavi and M. Sahami, Supervised and unsupervised discretization of continuous features, In: Machine Learning - International Workshop, Morgan Kaufmann Publishers, 2 (1995), 194. doi: 10.1016/B978-1-55860-377-6.50032-3.

[6] U. Fayyad and K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, Proceedings of the International Joint Conference on Uncertainty in AI, 2 (1993), 1022.

[7] G. Gan, C. Ma and J. Wu, Data clustering: Theory, algorithms, and applications (ASA-SIAM series on statistics and applied probability), Society for Industrial and Applied Mathematics, 20 (2007). doi: 10.1137/1.9780898718348.

[8] L. Goodman and W. Kruskal, Measures of association for cross classifications, Journal of the American Statistical Association, 49 (1954), 732.

[9] I. Guyon and A. Elisseeff, An Introduction to Variable and Feature Selection, Applied Physics Letters, 3 (2002), 1157.

[10] R. Holte, Very simple classification rules perform well on most commonly used datasets, Machine Learning, 11 (1993), 63.

[11] W. Huang and Y. Pan, On balancing between optimal and proportional predictions, Big Data and Information Analytics, 1 (2016), 129.

[12] W. Huang, Y. Pan and J. Wu, Supervised discretization with $GK-\tau$, Procedia Computer Science, 17 (2013), 114.

[13] W. Huang, Y. Pan and J. Wu, Supervised discretization with $GK-\lambda$, Procedia Computer Science, 30 (2014), 75.

[14] W. Huang, Y. Shi and X. Wang, A nominal association matrix with feature selection for categorical data, Communications in Statistics - Theory and Methods, ().

[15] R. Kerber, Chimerge: Discretization of numeric attributes, In: Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press, (1994), 123.

[16] S. Kotsiantis and D. Kanellopoulos, Discretization techniques: A recent survey, GESTS International Transactions on Computer Science and Engineering, 32 (2006), 47.

[17] H. Liu and R. Setiono, Chi2: Feature selection and discretization of numeric attributes, In: Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, 55 (1995), 388.

[18] C. Lloyd, Statistical Analysis with Missing Data, John Wiley & Sons, (1987).

[19] J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1 (1967), 281.

[20] D. Olson and Y. Shi, Introduction to business data mining, Knowledge and Information Systems, (2007).

[21] I. Rish, An empirical study of the naive Bayes classifier, IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, (2001), 41.

[22] S. Safavian and D. Landgrebe, A survey of decision tree classifier methodology, IEEE Transactions on Systems, 21 (1991), 660. doi: 10.1109/21.97458.

[23] STATCAN, Survey of Family Expenditures - 1996, (1996).

[24] K. Ting, Discretization of Continuous-Valued Attributes and Instance-Based Learning, Basser Department of Computer Science, (1994).
