August  2020, 3(3): 185-193. doi: 10.3934/mfc.2020017

Averaging versus voting: A comparative study of strategies for distributed classification

Department of Mathematical Sciences, Middle Tennessee State University, 1301 E Main Street, Murfreesboro, TN 37132, USA

* Corresponding author: Qiang Wu

Received  March 2020 Revised  May 2020 Published  June 2020

In this paper we proposed two strategies, averaging and voting, to implement distributed classification via the divide and conquer approach. When a data set is too big to be processed by one processor or is naturally stored in different locations, the method partitions the whole data into multiple subsets randomly or according to their locations. Then a base classification algorithm is applied to each subset to produce a local classification model. Finally, averaging or voting is used to couple the local models together to produce the final classification model. We performed thorough empirical studies to compare the two strategies. The results show that averaging is more effective in most scenarios.

Citation: Donglin Wang, Honglan Xu, Qiang Wu. Averaging versus voting: A comparative study of strategies for distributed classification. Mathematical Foundations of Computing, 2020, 3 (3) : 185-193. doi: 10.3934/mfc.2020017
References:
[1]

R. G. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David and C. E. Elger, Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state, Physical Review E, 64 (2001), 061907. doi: 10.1103/PhysRevE.64.061907.  Google Scholar

[2]

N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc., 68 (1950), 337-404.  doi: 10.1090/S0002-9947-1950-0051437-7.  Google Scholar

[3]

R. K. BockA. ChilingarianM. GaugF. HaklT. HengstebeckM. JiřinaJ. KlaschkaE. KotrčP. SavickỳS. TowersA. Vaiciulis and W. Wittek, Methods for multidimensional event classification: a case study using images from a cherenkov gamma-ray telescope, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 516 (2004), 511-528.  doi: 10.1016/j.nima.2003.08.157.  Google Scholar

[4]

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, 20 (1995), 273-297.  doi: 10.1007/BF00994018.  Google Scholar

[5] F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, Cambridge, 2007.  doi: 10.1017/CBO9780511618796.  Google Scholar
[6]

J. Friedman, T. Hastie and R. Tibshirani, The Elements of Statistical Learning. Data Mining, Inference, and Prediction, Springer Series in Statistics, Springer-Verlag, New York, 2001. doi: 10.1007/978-0-387-21606-5.  Google Scholar

[7] I. GoodfellowY. Bengio and A. Courville, Deep learning, MIT Press, Cambridge, MA, 2016.   Google Scholar
[8]

X. Guo, T. Hu and Q. Wu, Distributed minimum error entropy algorithms, preprint, (2020). Google Scholar

[9]

Z.-C. Guo, S.-B. Lin and D.-X. Zhou, Learning theory of distributed spectral algorithms, Inverse Problems, 33 (2017), 074009. doi: 10.1088/1361-6420/aa72b2.  Google Scholar

[10]

Z.-C. GuoL. Shi and Q. Wu, Learning theory of distributed regression with bias corrected regularization kernel network, Journal of Machine Learning Research, 18 (2017), 1-25.   Google Scholar

[11]

Z.-C. GuoD.-H. XiangX. Guo and D.-X. Zhou, Thresholded spectral algorithms for sparse approximations, Analysis and Applications, 15 (2017), 433-455.  doi: 10.1142/S0219530517500026.  Google Scholar

[12]

T. HuQ. Wu and D.-X. Zhou, Distributed kernel gradient descent algorithm for minimum error entropy principle, Applied and Computational Harmonic Analysis, 49 (2020), 229-256.  doi: 10.1016/j.acha.2019.01.002.  Google Scholar

[13]

B. A. JohnsonR. Tateishi and N. T. Hoan, A hybrid pansharpening approach and multiscale object-based image analysis for mapping diseased pine and oak trees, International Journal of Remote Sensing, 34 (2013), 6969-6982.  doi: 10.1080/01431161.2013.810825.  Google Scholar

[14]

S.-B. LinX. Guo and D.-X. Zhou, Distributed learning with regularized least squares, Journal of Machine Learning Research, 18 (2017), 1-31.   Google Scholar

[15]

E. C. Ozan, E. Riabchenko, S. Kiranyaz and M. Gabbouj, An optimized k-nn approach for classification on imbalanced datasets with missing data, in International Symposium on Intelligent Data Analysis, Springer, (2016), 387–392. doi: 10.1007/978-3-319-46349-0_34.  Google Scholar

[16]

J. Platt, Fast training of support vector machines using sequential minimal optimization, Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, MA, (1999), 185–208. Google Scholar

[17]

J. G. Rohra, B. Perumal, S. J. Narayanan, P. Thakur and R. B. Bhatt, User localization in an indoor environment using fuzzy hybrid of particle swarm optimization & gravitational search algorithm with neural networks, in Proceedings of Sixth International Conference on Soft Computing for Problem Solving, Springer, (2017), 286–295. doi: 10.1007/978-981-10-3322-3_27.  Google Scholar

[18]

J. D. Rosenblatt and B. Nadler, On the optimality of averaging in distributed statistical learning, Information and Inference: A Journal of the IMA, 5 (2016), 379-404.  doi: 10.1093/imaiai/iaw013.  Google Scholar

[19]

E. R. Sparks, A. Talwalkar, V. Smith, J. Kottalam, X. Pan, J. Gonzalez, M. J. Franklin, M. I. Jordan and T. Kraska, Mli: An api for distributed machine learning, in 2013 IEEE 13th International Conference on Data Mining, IEEE, (2013), 1187–1192. doi: 10.1109/ICDM.2013.158.  Google Scholar

[20]

I. Steinwart, Support vector machines are universally consistent, Journal of Complexity, 18 (2002), 768-791.  doi: 10.1006/jcom.2002.0642.  Google Scholar

[21]

I. Steinwart and A. Christmann, Support Vector Machines, Springer Science & Business Media, 2008.  Google Scholar

[22]

V. Vapnik, Statistical learning theory, John Wiley & Sons, Inc., New York, 1998.  Google Scholar

[23]

Q. WuY. Ying and D.-X. Zhou, Multi-kernel regularized classifiers, Journal of Complexity, 23 (2007), 108-134.   Google Scholar

[24]

Q. Wu and D.-X. Zhou, Analysis of support vector machine classification, Journal of Computational Analysis & Applications, 8 (2006), 99-119.   Google Scholar

[25]

I.-C. Yeh and C.-H. Lien, The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients, Expert Systems with Applications, 36 (2009), 2473-2480.  doi: 10.1016/j.eswa.2007.12.020.  Google Scholar

[26]

T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization, Annals of Statistics, 32 (2004), 56-85.  doi: 10.1214/aos/1079120130.  Google Scholar

[27]

Y. ZhangJ. C. Duchi and M. J. Wainwright, Divide and conquer kernel ridge regression: A distributed algorithm with minimax optimal rates, Journal of Machine Learning Research, 16 (2015), 3299-3340.   Google Scholar

show all references

References:
[1]

R. G. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David and C. E. Elger, Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state, Physical Review E, 64 (2001), 061907. doi: 10.1103/PhysRevE.64.061907.  Google Scholar

[2]

N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc., 68 (1950), 337-404.  doi: 10.1090/S0002-9947-1950-0051437-7.  Google Scholar

[3]

R. K. BockA. ChilingarianM. GaugF. HaklT. HengstebeckM. JiřinaJ. KlaschkaE. KotrčP. SavickỳS. TowersA. Vaiciulis and W. Wittek, Methods for multidimensional event classification: a case study using images from a cherenkov gamma-ray telescope, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 516 (2004), 511-528.  doi: 10.1016/j.nima.2003.08.157.  Google Scholar

[4]

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, 20 (1995), 273-297.  doi: 10.1007/BF00994018.  Google Scholar

[5] F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, Cambridge, 2007.  doi: 10.1017/CBO9780511618796.  Google Scholar
[6]

J. Friedman, T. Hastie and R. Tibshirani, The Elements of Statistical Learning. Data Mining, Inference, and Prediction, Springer Series in Statistics, Springer-Verlag, New York, 2001. doi: 10.1007/978-0-387-21606-5.  Google Scholar

[7] I. GoodfellowY. Bengio and A. Courville, Deep learning, MIT Press, Cambridge, MA, 2016.   Google Scholar
[8]

X. Guo, T. Hu and Q. Wu, Distributed minimum error entropy algorithms, preprint, (2020). Google Scholar

[9]

Z.-C. Guo, S.-B. Lin and D.-X. Zhou, Learning theory of distributed spectral algorithms, Inverse Problems, 33 (2017), 074009. doi: 10.1088/1361-6420/aa72b2.  Google Scholar

[10]

Z.-C. GuoL. Shi and Q. Wu, Learning theory of distributed regression with bias corrected regularization kernel network, Journal of Machine Learning Research, 18 (2017), 1-25.   Google Scholar

[11]

Z.-C. GuoD.-H. XiangX. Guo and D.-X. Zhou, Thresholded spectral algorithms for sparse approximations, Analysis and Applications, 15 (2017), 433-455.  doi: 10.1142/S0219530517500026.  Google Scholar

[12]

T. HuQ. Wu and D.-X. Zhou, Distributed kernel gradient descent algorithm for minimum error entropy principle, Applied and Computational Harmonic Analysis, 49 (2020), 229-256.  doi: 10.1016/j.acha.2019.01.002.  Google Scholar

[13]

B. A. JohnsonR. Tateishi and N. T. Hoan, A hybrid pansharpening approach and multiscale object-based image analysis for mapping diseased pine and oak trees, International Journal of Remote Sensing, 34 (2013), 6969-6982.  doi: 10.1080/01431161.2013.810825.  Google Scholar

[14]

S.-B. LinX. Guo and D.-X. Zhou, Distributed learning with regularized least squares, Journal of Machine Learning Research, 18 (2017), 1-31.   Google Scholar

[15]

E. C. Ozan, E. Riabchenko, S. Kiranyaz and M. Gabbouj, An optimized k-nn approach for classification on imbalanced datasets with missing data, in International Symposium on Intelligent Data Analysis, Springer, (2016), 387–392. doi: 10.1007/978-3-319-46349-0_34.  Google Scholar

[16]

J. Platt, Fast training of support vector machines using sequential minimal optimization, Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, MA, (1999), 185–208. Google Scholar

[17]

J. G. Rohra, B. Perumal, S. J. Narayanan, P. Thakur and R. B. Bhatt, User localization in an indoor environment using fuzzy hybrid of particle swarm optimization & gravitational search algorithm with neural networks, in Proceedings of Sixth International Conference on Soft Computing for Problem Solving, Springer, (2017), 286–295. doi: 10.1007/978-981-10-3322-3_27.  Google Scholar

[18]

J. D. Rosenblatt and B. Nadler, On the optimality of averaging in distributed statistical learning, Information and Inference: A Journal of the IMA, 5 (2016), 379-404.  doi: 10.1093/imaiai/iaw013.  Google Scholar

[19]

E. R. Sparks, A. Talwalkar, V. Smith, J. Kottalam, X. Pan, J. Gonzalez, M. J. Franklin, M. I. Jordan and T. Kraska, Mli: An api for distributed machine learning, in 2013 IEEE 13th International Conference on Data Mining, IEEE, (2013), 1187–1192. doi: 10.1109/ICDM.2013.158.  Google Scholar

[20]

I. Steinwart, Support vector machines are universally consistent, Journal of Complexity, 18 (2002), 768-791.  doi: 10.1006/jcom.2002.0642.  Google Scholar

[21]

I. Steinwart and A. Christmann, Support Vector Machines, Springer Science & Business Media, 2008.  Google Scholar

[22]

V. Vapnik, Statistical learning theory, John Wiley & Sons, Inc., New York, 1998.  Google Scholar

[23]

Q. WuY. Ying and D.-X. Zhou, Multi-kernel regularized classifiers, Journal of Complexity, 23 (2007), 108-134.   Google Scholar

[24]

Q. Wu and D.-X. Zhou, Analysis of support vector machine classification, Journal of Computational Analysis & Applications, 8 (2006), 99-119.   Google Scholar

[25]

I.-C. Yeh and C.-H. Lien, The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients, Expert Systems with Applications, 36 (2009), 2473-2480.  doi: 10.1016/j.eswa.2007.12.020.  Google Scholar

[26]

T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization, Annals of Statistics, 32 (2004), 56-85.  doi: 10.1214/aos/1079120130.  Google Scholar

[27]

Y. ZhangJ. C. Duchi and M. J. Wainwright, Divide and conquer kernel ridge regression: A distributed algorithm with minimax optimal rates, Journal of Machine Learning Research, 16 (2015), 3299-3340.   Google Scholar

Table 1.  Description of Data Sets and Classification Tasks
Classification Task Number of Observations Number of Features
Default of Credit Card Clients 30,000 23
Wilt Diseased Tree Detection 4,889 5
APS Failure 60,000 170
MAGIC Gamma Telescope 19,020 10
Spam Email Detection 4,601 57
Epileptic Seizures 9,200 178
Wireless Localization {1, 2} vs {3, 4} 2,000 7
Student Evaluation {1, 2} vs {3, 4, 5} 5,046 32
Handwritten Digits 5 vs 8 12,017 786
Classification Task Number of Observations Number of Features
Default of Credit Card Clients 30,000 23
Wilt Diseased Tree Detection 4,889 5
APS Failure 60,000 170
MAGIC Gamma Telescope 19,020 10
Spam Email Detection 4,601 57
Epileptic Seizures 9,200 178
Wireless Localization {1, 2} vs {3, 4} 2,000 7
Student Evaluation {1, 2} vs {3, 4, 5} 5,046 32
Handwritten Digits 5 vs 8 12,017 786
Table 2.  Classification accuracy (in percentage) of distributed logistic regression and p-values of hypothesis tests on the difference between voting and averaging strategies
Classification Task Voting Averaging p-value
Default of Credit Card Clients 73.71 80.15 <2.2e-16
Wilt Diseased Tree Detection 95.42 96.94 <2.2e-16
APS Failure 98.39 98.75 <2.2e-16
MAGIC Gamma Telescope 79.18 79.18 0.9845
Spam Email Detection 61.52 92.83 <2.2e-16
Epileptic Seizure 50.10 66.11 <2.2e-16
Wireless Localization {1, 2} vs {3, 4} 91.77 95.15 <2.2e-16
Student Evaluation {1, 2} vs {3, 4, 5} 91.81 95.17 <2.2e-16
Handwritten Digits 5 vs 8 84.46 95.84 <2.2e-16
Classification Task Voting Averaging p-value
Default of Credit Card Clients 73.71 80.15 <2.2e-16
Wilt Diseased Tree Detection 95.42 96.94 <2.2e-16
APS Failure 98.39 98.75 <2.2e-16
MAGIC Gamma Telescope 79.18 79.18 0.9845
Spam Email Detection 61.52 92.83 <2.2e-16
Epileptic Seizure 50.10 66.11 <2.2e-16
Wireless Localization {1, 2} vs {3, 4} 91.77 95.15 <2.2e-16
Student Evaluation {1, 2} vs {3, 4, 5} 91.81 95.17 <2.2e-16
Handwritten Digits 5 vs 8 84.46 95.84 <2.2e-16
Table 3.  Classification accuracy (in percentage) of distributed SVM and p-values of hypothesis tests on the difference between voting and averaging strategies
Classification Task Voting Averaging p-value
Default of Credit Card Clients 79.29 79.48 9.2e-05
Wilt Diseased Tree Detection 96.83 97.19 4.6e-08
APS Failure 98.52 98.60 <2.2e-16
MAGIC Gamma Telescope 86.59 86.64 0.2107
Spam Email Detection 93.20 93.47 0.0001
Epileptic Seizure 89.16 89.46 0.0008
Wireless Localization {1, 2} vs {3, 4} 95.42 95.47 0.3773
Student Evaluation {1, 2} vs {3, 4, 5} 95.31 95.34 0.6433
Handwritten Digits 5 vs 8 99.50 99.54 2.3e-05
Classification Task Voting Averaging p-value
Default of Credit Card Clients 79.29 79.48 9.2e-05
Wilt Diseased Tree Detection 96.83 97.19 4.6e-08
APS Failure 98.52 98.60 <2.2e-16
MAGIC Gamma Telescope 86.59 86.64 0.2107
Spam Email Detection 93.20 93.47 0.0001
Epileptic Seizure 89.16 89.46 0.0008
Wireless Localization {1, 2} vs {3, 4} 95.42 95.47 0.3773
Student Evaluation {1, 2} vs {3, 4, 5} 95.31 95.34 0.6433
Handwritten Digits 5 vs 8 99.50 99.54 2.3e-05
[1]

Ying Lin, Qi Ye. Support vector machine classifiers by non-Euclidean margins. Mathematical Foundations of Computing, 2020, 3 (4) : 279-300. doi: 10.3934/mfc.2020018

[2]

Jianhua Huang, Yanbin Tang, Ming Wang. Singular support of the global attractor for a damped BBM equation. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2020345

[3]

Mathew Gluck. Classification of solutions to a system of $ n^{\rm th} $ order equations on $ \mathbb R^n $. Communications on Pure & Applied Analysis, 2020, 19 (12) : 5413-5436. doi: 10.3934/cpaa.2020246

[4]

Wenjun Liu, Yukun Xiao, Xiaoqing Yue. Classification of finite irreducible conformal modules over Lie conformal algebra $ \mathcal{W}(a, b, r) $. Electronic Research Archive, , () : -. doi: 10.3934/era.2020123

 Impact Factor: 

Metrics

  • PDF downloads (69)
  • HTML views (212)
  • Cited by (0)

Other articles
by authors

[Back to Top]