# American Institute of Mathematical Sciences

April 2017, 2(2): 119-125. doi: 10.3934/bdia.2017004

## Proportional association based RoI model

1. School of Mathematics and Information Sciences, Guangzhou University, Guangzhou, 510006, China
2. Clearpier Inc., 1300-121 Richmond St. W., Toronto, Ontario M5H 2K1, Canada
3. School of Mathematics and Information Sciences, Guangzhou University, Guangzhou, 510006, China

* Corresponding authors: Wenxue Huang and Lihong Zheng.

Published  April 2017

Building on the local-to-global proportional association measure proposed by Huang, Shi and Wang [9], an association measure is proposed that uses known cost and revenue information to maximize the expected return on investment (RoI). A descriptive experiment on a synthetic data set is presented.

Citation: Wenxue Huang, Yuanyi Pan, Lihong Zheng. Proportional association based RoI model. Big Data & Information Analytics, 2017, 2(2): 119-125. doi: 10.3934/bdia.2017004
##### References:

[1] C. Cornforth, What makes boards effective? An examination of the relationships between board inputs, structures, processes and effectiveness in non-profit organisations, Corporate Governance: An International Review, 9 (2011), 217-227.

[2] L. L. Fong, M. S. Squillante and R. E. Hough, Computer resource proportional utilization and response time scheduling, US Patent 6,263,359, 2001.

[3] L. A. Goodman, A single general method for the analysis of cross-classified data: Reconciliation, and synthesis of some methods of Pearson, Yule, and Fisher, and also some methods of correspondence analysis and association analysis, Journal of the American Statistical Association, 91 (1996), 408-428. doi: 10.1080/01621459.1996.10476702.

[4] L. A. Goodman and W. H. Kruskal, Measures of Association for Cross Classifications, Springer, 1979.

[5] M. F. Gregor, L. Yang, E. Fabbrini, B. S. Mohammed, J. C. Eagon, G. S. Hotamisligil and S. Klein, Endoplasmic reticulum stress is reduced in tissues of obese subjects after weight loss, Diabetes, 58 (2009), 693-700. doi: 10.2337/db08-1220.

[6] W. Huang and Y. Pan, On balancing between optimal and proportional categorical predictions, Big Data and Information Analytics, 1 (2016), 129-137. doi: 10.3934/bdia.2016.1.129.

[7] W. Huang, Y. Pan and J. Wu, Supervised discretization with GK-τ, Procedia Computer Science, 17 (2013), 114-120.

[8] W. Huang, Y. Pan and J. Wu, Performance measures of rare events targeting, International Journal of Data Analysis Techniques and Strategies, 6 (2014), 105-120. doi: 10.1504/IJDATS.2014.062450.

[9] W. Huang, Y. Shi and X. Wang, A nominal association matrix with feature selection for categorical data, Communications in Statistics - Theory and Methods, 46 (2017), 7798-7819. doi: 10.1080/03610926.2014.930911.

[10] H. Hwang, T. Jung and E. Suh, An LTV model and customer segmentation based on customer value: A case study on the wireless telecommunication industry, Expert Systems with Applications, 26 (2004), 181-188. doi: 10.1016/S0957-4174(03)00133-7.

[11] T. Lin, Y. Yang and H. T. Shiau, A work weighted state vector control method for geometrically nonlinear analysis, Computers and Structures, 46 (1993), 689-694. doi: 10.1016/0045-7949(93)90397-V.

[12] C. X. Ling and C. Li, Data mining for direct marketing: Problems and solutions, in Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), AAAI Press, (1998), 73-79.

[13] J. R. Quinlan, Induction of decision trees, Machine Learning, 1 (1986), 81-106. doi: 10.1007/BF00116251.

Contingency tables: $X_1$ vs $Y$ and $X_2$ vs $Y$

| $X_1 \mid Y$ | $y_1$ | $y_2$ | $y_3$ | $y_4$ | $X_2 \mid Y$ | $y_1$ | $y_2$ | $y_3$ | $y_4$ |
|---|---|---|---|---|---|---|---|---|---|
| $x_{1_1}$ | 1000 | 100 | 500 | 400 | $x_{2_1}$ | 500 | 300 | 200 | 1500 |
| $x_{1_2}$ | 200 | 1500 | 500 | 300 | $x_{2_2}$ | 500 | 400 | 400 | 50 |
| $x_{1_3}$ | 400 | 50 | 500 | 500 | $x_{2_3}$ | 500 | 500 | 300 | 700 |
| $x_{1_4}$ | 300 | 700 | 500 | 400 | $x_{2_4}$ | 500 | 400 | 1000 | 100 |
| $x_{1_5}$ | 200 | 500 | 400 | 200 | $x_{2_5}$ | 200 | 400 | 500 | 200 |
Association matrices: $X_1$ vs $Y$ and $X_2$ vs $Y$

| $Y \mid \hat{Y}$ | $\hat{y_1} \mid X_1$ | $\hat{y_2} \mid X_1$ | $\hat{y_3} \mid X_1$ | $\hat{y_4} \mid X_1$ | $Y \mid \hat{Y}$ | $\hat{y_1} \mid X_2$ | $\hat{y_2} \mid X_2$ | $\hat{y_3} \mid X_2$ | $\hat{y_4} \mid X_2$ |
|---|---|---|---|---|---|---|---|---|---|
| $y_1$ | 0.34 | 0.18 | 0.27 | 0.22 | $y_1$ | 0.26 | 0.22 | 0.27 | 0.25 |
| $y_2$ | 0.13 | 0.48 | 0.24 | 0.15 | $y_2$ | 0.25 | 0.24 | 0.29 | 0.23 |
| $y_3$ | 0.24 | 0.28 | 0.27 | 0.21 | $y_3$ | 0.25 | 0.24 | 0.36 | 0.15 |
| $y_4$ | 0.25 | 0.25 | 0.28 | 0.22 | $y_4$ | 0.22 | 0.18 | 0.14 | 0.46 |
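
The association matrices can be checked against the contingency tables for $X_1$ and $X_2$. The sketch below assumes that entry $(s, t)$ is the probability of predicting $y_t$ when the true class is $y_s$ under proportional prediction, that is $\sum_i p(x_i \mid y_s)\, p(y_t \mid x_i)$. This formula is not stated on this page; it is an assumption that, rounded to two decimals, agrees with the tabulated entries. The function name `association_matrix` is hypothetical.

```python
# Counts copied from the contingency tables (rows x_1..x_5, columns y_1..y_4).
X1_VS_Y = [
    [1000, 100, 500, 400],
    [200, 1500, 500, 300],
    [400, 50, 500, 500],
    [300, 700, 500, 400],
    [200, 500, 400, 200],
]
X2_VS_Y = [
    [500, 300, 200, 1500],
    [500, 400, 400, 50],
    [500, 500, 300, 700],
    [500, 400, 1000, 100],
    [200, 400, 500, 200],
]


def association_matrix(table):
    """Assumed form: entry (s, t) = sum_i p(x_i | y_s) * p(y_t | x_i)."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][s] for i in range(n_rows)) for s in range(n_cols)]
    matrix = []
    for s in range(n_cols):
        matrix.append([
            # n_is / n_.s is p(x_i | y_s); n_it / n_i. is p(y_t | x_i).
            sum(table[i][s] * table[i][t] / row_totals[i] for i in range(n_rows))
            / col_totals[s]
            for t in range(n_cols)
        ])
    return matrix


m1 = association_matrix(X1_VS_Y)
m2 = association_matrix(X2_VS_Y)
for label, matrix in (("X1", m1), ("X2", m2)):
    print(label)
    for row in matrix:
        print([round(value, 2) for value in row])
```

Each row of the resulting matrix sums to 1, since it is a conditional distribution over predicted classes given the true class.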
Contingency table for correct predictions: $W_1$ and $W_2$

| $X_1 \mid Y$ | $y_1$ | $y_2$ | $y_3$ | $y_4$ | $X_2 \mid Y$ | $y_1$ | $y_2$ | $y_3$ | $y_4$ |
|---|---|---|---|---|---|---|---|---|---|
| $x_{1_1}$ | 471 | 6 | 121 | 83 | $x_{2_1}$ | 98 | 34 | 19 | 926 |
| $x_{1_2}$ | 101 | 746 | 159 | 107 | $x_{2_2}$ | 177 | 114 | 113 | 1 |
| $x_{1_3}$ | 130 | 1 | 167 | 157 | $x_{2_3}$ | 114 | 124 | 42 | 256 |
| $x_{1_4}$ | 44 | 243 | 145 | 85 | $x_{2_4}$ | 109 | 81 | 489 | 6 |
| $x_{1_5}$ | 21 | 210 | 114 | 32 | $x_{2_5}$ | 36 | 119 | 206 | 28 |
Association measures: $\omega^{Y|X}$ and $\widehat{\omega}^{Y|X}$

| $X$ | $\omega^{Y \mid X}$ | $\widehat{\omega}^{Y \mid X}$ | total revenue | average revenue |
|---|---|---|---|---|
| $X_1$ | 0.3406 | 0.456 | 4313 | 0.4714 |
| $X_2$ | 0.3391 | 0.564 | 5178 | 0.5659 |
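
The $\omega^{Y|X}$ values can be reproduced from the contingency tables of $X_1$ vs $Y$ and $X_2$ vs $Y$. The sketch below assumes that $\omega^{Y|X}$ is the expected accuracy of proportional prediction, $\sum_i \sum_s p(x_i, y_s)^2 / p(x_i)$. This formula is not given on this page; it is an assumption that matches the tabulated 0.3406 and 0.3391. The function name `proportional_accuracy` is hypothetical, and the cost- and revenue-weighted measures are not reconstructed here because the cost and revenue vectors are not shown.

```python
# Counts copied from the contingency tables (rows x_1..x_5, columns y_1..y_4).
X1_VS_Y = [
    [1000, 100, 500, 400],
    [200, 1500, 500, 300],
    [400, 50, 500, 500],
    [300, 700, 500, 400],
    [200, 500, 400, 200],
]
X2_VS_Y = [
    [500, 300, 200, 1500],
    [500, 400, 400, 50],
    [500, 500, 300, 700],
    [500, 400, 1000, 100],
    [200, 400, 500, 200],
]


def proportional_accuracy(table):
    """Assumed form of omega^{Y|X}: sum_i sum_s n_is^2 / (n_i. * N)."""
    total = sum(sum(row) for row in table)
    score = 0.0
    for row in table:
        row_total = sum(row)
        # Within row x_i, predicting y_s with probability p(y_s | x_i)
        # is correct with probability sum_s p(y_s | x_i)^2.
        score += sum(count * count for count in row) / row_total
    return score / total


omega_x1 = proportional_accuracy(X1_VS_Y)
omega_x2 = proportional_accuracy(X2_VS_Y)
print(round(omega_x1, 4), round(omega_x2, 4))
```

Under this reading, $X_1$ scores slightly higher than $X_2$ on $\omega^{Y|X}$, while the revenue columns in the table rank the two variables the other way round.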
Association with/without cost vectors: $X_1$ and $X_2$

| $X$ | $\omega^{Y \mid X}$ | $\widehat{\omega}^{Y \mid X}$ | $\bar{\omega}^{Y \mid X}$ | total profit | average profit |
|---|---|---|---|---|---|
| $X_1$ | 0.3406 | 0.3406 | 1.3057 | 12016.17 | 1.3132 |
| $X_2$ | 0.3391 | 0.3391 | 1.8546 | 17072.17 | 1.8658 |
Association with/without new cost vectors: $X_1$ and $X_2$

| $X$ | $\omega^{Y \mid X}$ | $\widehat{\omega}^{Y \mid X}$ | $\bar{\omega}^{Y \mid X}$ | total profit | average profit |
|---|---|---|---|---|---|
| $X_1$ | 0.3406 | 0.3406 | 1.7420 | 15938.17 | 1.7419 |
| $X_2$ | 0.3391 | 0.3391 | 1.3424 | 12268.17 | 1.3408 |
Simulated feature selection: one variable

| $X$ | $\lvert Dmn(X) \rvert$ | $\omega^{Y \mid X}$ | $\bar{\omega}^{Y \mid X}$ | total profit | average profit |
|---|---|---|---|---|---|
| $V_1$ | 7 | 0.3906 | 3.5381 | 35390 | 3.5390 |
| $V_2$ | 4 | 0.3882 | 3.8433 | 38771 | 3.8771 |
| $V_3$ | 4 | 0.3250 | 4.8986 | 48678 | 4.8678 |
| $V_4$ | 8 | 0.3274 | 3.7050 | 36889 | 3.6889 |
Simulated feature selection: two variables

| $X_1, X_2$ | $\lvert Dmn(X_1, X_2) \rvert$ | $\omega^{Y \mid (X_1, X_2)}$ | $\bar{\omega}^{Y \mid (X_1, X_2)}$ | total profit | average profit |
|---|---|---|---|---|---|
| $V_1, V_2$ | 28 | 0.4367 | 1.8682 | 18971 | 1.8971 |
| $V_1, V_3$ | 28 | 0.4025 | 2.1106 | 20746 | 2.0746 |
| $V_1, V_4$ | 56 | 0.4055 | 1.8055 | 17915 | 1.7915 |
| $V_3, V_2$ | 16 | 0.4055 | 2.3585 | 24404 | 2.4404 |
| $V_3, V_4$ | 32 | 0.3385 | 2.0145 | 19903 | 1.9903 |
