October  2016, 1(4): 341-347. doi: 10.3934/bdia.2016014

## Increase statistical reliability without losing predictive power by merging classes and adding variables

 1 School of Mathematics and Information Sciences, Guangzhou University, Guangzhou, 510006, China

 2 Clearpier Inc., 1300-121 Richmond St. W., Toronto, Ontario M5H 2K1, Canada

* Corresponding authors: Wenxue Huang and Xiaofeng Li

Revised  April 2017 Published  April 2017

Adding explanatory variables to a probability model usually increases the degree of association but risks losing statistical reliability. In this article, we propose an approach that merges classes within the categorical explanatory variables before each addition, so that statistical reliability is maintained while predictive power increases step by step.
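To make the trade-off concrete, the following is a minimal sketch of how the association measures reported in the tables could be computed from a contingency table, and of what merging two classes of an explanatory variable does to them. It uses the standard Goodman-Kruskal definitions of $\tau$ and $\lambda$; the formula used for $E(\mbox{Gini}(X|Y))$, the helper `merge_classes`, and the toy counts are assumptions made for illustration and are not taken from the article. The article's actual merging criterion is not reproduced here.

```python
import numpy as np

def _joint(table):
    """Normalize a table of counts (rows = classes of X, columns = classes of Y)."""
    p = np.asarray(table, dtype=float)
    return p / p.sum()

def gk_tau(table):
    """Goodman-Kruskal tau of Y given X."""
    p = _joint(table)
    px, py = p.sum(axis=1), p.sum(axis=0)
    rows = px > 0                                  # skip empty X classes
    cond = (p[rows] ** 2 / px[rows, None]).sum()   # sum_x sum_y p(x,y)^2 / p(x)
    base = (py ** 2).sum()                         # sum_y p(y)^2
    return (cond - base) / (1.0 - base)

def gk_lambda(table):
    """Goodman-Kruskal lambda of Y given X."""
    p = _joint(table)
    py = p.sum(axis=0)
    return (p.max(axis=1).sum() - py.max()) / (1.0 - py.max())

def expected_gini_x_given_y(table):
    """Assumed definition: 1 - sum_y sum_x p(x,y)^2 / p(y)."""
    p = _joint(table)
    py = p.sum(axis=0)
    cols = py > 0                                  # skip empty Y classes
    return 1.0 - (p[:, cols] ** 2 / py[cols]).sum()

def merge_classes(table, i, j):
    """Hypothetical helper: merge classes i and j of X by summing their rows."""
    t = np.asarray(table, dtype=float)
    keep = [k for k in range(t.shape[0]) if k not in (i, j)]
    return np.vstack([t[keep], t[i] + t[j]])

# Toy counts (illustrative only): the first two classes of X have
# similar conditional distributions of Y, so they are merge candidates.
counts = np.array([[30, 10],
                   [28, 12],
                   [5, 35]])
merged = merge_classes(counts, 0, 1)

for name, t in (("original", counts), ("merged", merged)):
    print(name, gk_tau(t), gk_lambda(t), expected_gini_x_given_y(t))
```

Under these definitions, merging classes of $X$ can never raise $\tau^{(Y|X)}$ or $\lambda^{(Y|X)}$, and it can only lower the expected Gini of $X$ given $Y$. Merging classes with similar conditional distributions of $Y$ therefore gives up little association while reducing the number of cells that must be estimated.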

Citation: Wenxue Huang, Xiaofeng Li, Yuanyi Pan. Increase statistical reliability without losing predictive power by merging classes and adding variables. Big Data & Information Analytics, 2016, 1 (4) : 341-347. doi: 10.3934/bdia.2016014
Feature selection with merging: Occupation

| $X$ | $\tau_b^{(Y \mid X)}$ | $\lambda^{(Y \mid X)}$ | $E(\mbox{Gini}(X \mid Y))$ |
|---|---|---|---|
| Age group' + Sex | 0.1484 | 0.0375 | 0.6688 |
| (Age group' + Sex)' + Education' | 0.1542 | 0.0447 | 0.6620 |

Feature selection without merging: Occupation

| $X$ | $\tau^{(Y \mid X)}$ | $\lambda^{(Y \mid X)}$ | $E(\mbox{Gini}(X \mid Y))$ |
|---|---|---|---|
| Age group | 0.1344 | 0.0311 | 0.8773 |
| Age group + Sex | 0.1511 | 0.0476 | 0.9228 |

Comparison of different merging thresholds: Occupation

| $X$ | $\phi^{st}(Y \mid X)$ | $\lambda^{(Y \mid X)}$ | $\tau^{(Y \mid X)}$ | $E(\mbox{Gini}(X \mid Y))$ |
|---|---|---|---|---|
| Age group | - | 0.0311 | 0.1344 | 0.8773 |
| Age group' + Sex | 0.0005 | 0.0414 | 0.1493 | 0.9222 |
| Age group' + Sex | 0.0030 | 0.0375 | 0.1484 | 0.6688 |
| Age group' + Sex | 0.0100 | 0.0000 | 0.0209 | 0.2710 |

Comparison of different merging thresholds

| $X$ | $\lambda^{(Y \mid X)}$ | $\tau^{(Y \mid X)}$ | $E(\mbox{Gini}(X \mid Y))$ |
|---|---|---|---|
| Rooms | 0.3443598 | 0.3004656 | 0.8200656 |
| Rooms' + Tenure' | 0.4255117 | 0.3583277 | 0.7911177 |
| (Rooms' + Tenure')' + bedroom' | 0.4381247 | 0.3901767 | 0.7165204 |
