Feature selection is a valuable tool in supervised machine learning research fields such as pattern recognition and classification. It is used to eliminate irrelevant and noisy features that adversely affect the results. Swarm algorithms are commonly applied to the feature selection problem, and they need transfer functions that map the continuous search space to a discrete (binary) one. Transfer functions are therefore the backbone of all binary swarm algorithms, yet in their current form they cannot give binary swarm algorithms a suitable balance between the exploration and exploitation stages. In this work, a feature selection approach based on the binary whale optimization algorithm (BWOA) is proposed, using different updating techniques for time-varying transfer functions. To evaluate the performance of the proposed method, three chemical and three biological binary datasets are used. The results show that the BWOA-TV2 variant selects features consistently, yields high classification accuracy, and converges more consistently. It is worth mentioning that the proposed method outperforms competitor optimization algorithms commonly used in this field, such as particle swarm optimization (PSO) and firefly optimization (FO).
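The abstract does not spell out the three updating techniques behind BWOA-TV1, BWOA-TV2 and BWOA-TV3. As a rough illustration of the general idea only, the following sketch assumes an S-shaped (sigmoid) transfer function whose slope is controlled by a time-varying parameter τ that decreases linearly over the iterations. The linear schedule, the bounds `tau_max = 4.0` and `tau_min = 0.01`, and the function names `tau_linear`, `tv_sigmoid` and `binarize` are illustrative assumptions, not the paper's method. A large τ flattens the curve so bit probabilities stay near 0.5 (exploration); a small τ makes it steep so each bit follows the sign of the continuous position almost deterministically (exploitation).

```python
import math
import random


def tau_linear(t, max_iter, tau_max=4.0, tau_min=0.01):
    """Hypothetical schedule: tau decreases linearly from tau_max (t=0) to tau_min (t=max_iter)."""
    if max_iter <= 0:
        raise ValueError("max_iter must be positive")
    return tau_max - (tau_max - tau_min) * t / max_iter


def tv_sigmoid(x, tau):
    """Time-varying S-shaped transfer function: probability that a bit becomes 1.

    Written in a numerically stable form so that large |x / tau| does not overflow.
    """
    z = x / tau
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def binarize(position, t, max_iter, rng=random):
    """Map a continuous whale position to a binary feature mask (1 = feature kept)."""
    tau = tau_linear(t, max_iter)
    return [1 if rng.random() < tv_sigmoid(x, tau) else 0 for x in position]


if __name__ == "__main__":
    rng = random.Random(0)
    position = [2.0, -1.5, 0.3, -0.2]
    # Early iterations give noisy masks; late iterations track the sign of each coordinate.
    for t in (0, 50, 100):
        print(t, round(tau_linear(t, 100), 3), binarize(position, t, 100, rng))
```

For example, at the last iteration (τ close to `tau_min`), `binarize([5.0, -5.0], t=100, max_iter=100, rng=random.Random(0))` returns `[1, 0]`, because the transfer function has become nearly a step function.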
Table 1. Some swarm intelligence algorithms with various time-varying mechanisms
Table 2. High-dimensional binary datasets
| Datasets | No. of samples | No. of features | Class (+/-) | Data type |
|---|---|---|---|---|
| anti-hepatitis C virus | 121 | 2559 | (31/90) | chemical |
| antimicrobial agents | 212 | 3657 | (108/104) | chemical |
| H1N1 | 479 | 2322 | (266/213) | chemical |
| Leukemia | 72 | 7129 | (47/25) | biological |
| Breast Cancer | 38 | 7129 | (18/20) | biological |
| Prostate Cancer | 102 | 12600 | (52/50) | biological |
Table 3. Influence of the updating techniques on the proposed method in terms of average classification accuracy (CA) with standard deviation and average number of selected features
| Datasets | Indicator | BWOA-TV1 | BWOA-TV2 | BWOA-TV3 | BWOA-Sigmoid |
|---|---|---|---|---|---|
| anti-hepatitis C virus | CA | 96.01(0.763) | 96.11(0.639) | 94.03(1.282) | 93.92(1.065) |
| | Selected features | 10.87 | 8.77 | 13.33 | 14.66 |
| antimicrobial agents | CA | 92.05(1.045) | 93.99(0.987) | 91.57(1.084) | 92.65(0.885) |
| | Selected features | 12.60 | 10.53 | 16.20 | 11.76 |
| H1N1 | CA | 98.25(0.973) | 98.34(1.002) | 97.46(1.078) | 96.89(1.541) |
| | Selected features | 9.83 | 7.20 | 11.45 | 14.25 |
| Leukemia | CA | 96.56(1.289) | 97.21(0.587) | 96.29(1.972) | 93.29(1.18) |
| | Selected features | 10.56 | 9.23 | 13.37 | 16.34 |
| Breast Cancer | CA | 93.21(0.932) | 94.84(1.021) | 92.81(2.01) | 92.17(1.49) |
| | Selected features | 17.92 | 17.03 | 21.49 | 22.31 |
| Prostate Cancer | CA | 98.22(0.581) | 97.82(0.721) | 96.21(1.143) | 96.03(0.927) |
| | Selected features | 9.29 | 10.03 | 10.21 | 10.33 |
Table 4. Comparison of the time-varying transfer function (TVTF) variants in terms of average CA with standard deviation on the testing data
| Datasets | BWOA-TV1 | BWOA-TV2 | BWOA-TV3 | BWOA-Sigmoid |
|---|---|---|---|---|
| anti-hepatitis C virus | 94.56(0.897) | 94.87(0.654) | 91.75(1.914) | 91.49(0.514) |
| antimicrobial agents | 90.32(0.986) | 90.95(1.290) | 89.12(1.590) | 89.85(1.824) |
| H1N1 | 96.07(0.904) | 97.26(1.561) | 94.87(1.721) | 94.34(2.005) |
| Leukemia | 94.02(0.836) | 94.85(1.051) | 93.79(0.652) | 90.93(0.329) |
| Breast Cancer | 90.91(0.730) | 92.32(0.928) | 89.95(0.230) | 89.27(0.296) |
| Prostate Cancer | 96.91(0.476) | 96.39(0.713) | 94.31(0.931) | 94.09(1.048) |
Table 5. Basic settings of the BPSO and BFO optimizers
Table 6. Comparison between the proposed method and rival methods in terms of average CA with standard deviation and average number of selected features
| Datasets | Indicator | BWOA-TV2 | BFO-Sigmoid | BPSO-Sigmoid |
|---|---|---|---|---|
| anti-hepatitis C virus | CA | 96.11(0.639) | 92.51(1.431) | 91.04(1.289) |
| | Selected features | 8.77 | 17.72 | 21.33 |
| antimicrobial agents | CA | 93.99(0.987) | 90.07(1.129) | 89.91(1.787) |
| | Selected features | 10.53 | 20.29 | 22.31 |
| H1N1 | CA | 98.34(1.002) | 93.98(1.236) | 92.71(1.763) |
| | Selected features | 7.20 | 17.33 | 19.83 |
| Leukemia | CA | 97.21(0.587) | 93.92(1.201) | 93.51(1.02) |
| | Selected features | 9.23 | 17.41 | 17.60 |
| Breast Cancer | CA | 94.84(1.021) | 91.92(0.921) | 91.27(0.907) |
| | Selected features | 17.03 | 23.39 | 23.91 |
| Prostate Cancer | CA | 97.82(0.721) | 97.28(0.932) | 97.65(0.829) |
| | Selected features | 10.03 | 10.92 | 10.41 |
Table 7. Comparison between the proposed method and rival methods in terms of average CA with standard deviation according to the testing data
| Datasets | BWOA-TV2 | BFO-Sigmoid | BPSO-Sigmoid |
|---|---|---|---|
| anti-hepatitis C virus | 94.87(0.654) | 90.43(1.752) | 89.31(2.801) |
| antimicrobial agents | 90.95(1.290) | 88.62(2.006) | 87.59(2.582) |
| H1N1 | 97.26(1.561) | 92.28(1.320) | 89.89(1.920) |
| Leukemia | 94.85(1.051) | 91.53(1.838) | 90.97(1.308) |
| Breast Cancer | 92.32(0.928) | 89.57(1.534) | 89.49(1.395) |
| Prostate Cancer | 96.39(0.713) | 94.89(0.872) | 95.06(0.396) |