
# A kernel-free fuzzy support vector machine with Universum

• *Corresponding author: Hongmiao Zhu

The first author is supported by the National Natural Science Foundation of China (No. 71901140) and the Humanities and Social Science Fund of the Ministry of Education of China (No. 18YJC630220). The second author is supported by the National Natural Science Foundation of China (No. 72101143).

Support vector machines with Universum are attractive for classification problems because they incorporate prior information. In this paper, a kernel-free support vector machine with Universum, based on a quadratic function, is proposed for binary classification. To deal with noise and outliers, two fuzzy membership functions that consider both information entropy and distance information are constructed for labeled and Universum data, respectively. The fuzzy membership function for Universum data is also used to further select Universum data and thereby improve robustness. The proposed model is a convex quadratic programming problem that can be solved efficiently. Moreover, because it avoids the choice of a kernel function, the proposed model requires less computational time than other Universum-based support vector machines. Finally, numerical tests on several data sets validate the classification effectiveness of the proposed method. The numerical results show competitive performance compared with some state-of-the-art support vector machines. Applications to two credit rating data sets are also conducted to demonstrate the classification performance of the proposed method.
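The abstract does not state the model's formulas, so the following is only a minimal Python sketch of two ingredients it names, under stated assumptions. It assumes the quadratic decision function has the commonly used form $f(x) = \frac{1}{2}x^{T}Wx + b^{T}x + c$ with a symmetric matrix $W$, and that the information entropy of a sample is computed from the class proportions among its $k$ nearest neighbors. The names `quadratic_decision`, `neighborhood_entropy`, `W`, `b`, `c` and `k` are hypothetical, and the paper's actual fuzzy membership functions, which also use distance information, are not reproduced here.

```python
import math


def quadratic_decision(x, W, b, c):
    """Evaluate f(x) = 0.5 * x^T W x + b^T x + c (assumed form, W symmetric)."""
    n = len(x)
    quad = sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    return 0.5 * quad + lin + c


def classify(x, W, b, c):
    """Assign +1 or -1 by the sign of the quadratic function; no kernel is involved."""
    return 1 if quadratic_decision(x, W, b, c) >= 0 else -1


def neighborhood_entropy(point, points, labels, k):
    """Entropy of the +1/-1 labels among the k points nearest to `point`.

    Assumption: `points` holds the other labeled samples (not `point` itself).
    A value near 0 means a pure neighborhood; ln(2) means an evenly mixed one.
    """
    by_distance = sorted((math.dist(point, q), y) for q, y in zip(points, labels))
    nearest = [y for _, y in by_distance[:k]]
    p_pos = sum(1 for y in nearest if y == 1) / len(nearest)
    p_neg = 1.0 - p_pos
    return -sum(p * math.log(p) for p in (p_pos, p_neg) if p > 0)
```

With `W = [[2, 0], [0, 2]]`, `b = [0, 0]` and `c = -1`, the function reduces to $x_1^2 + x_2^2 - 1$, so the separating surface is the unit circle. A linear classifier could only draw such a boundary with the help of a kernel.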

Mathematics Subject Classification: Primary: 62H30; Secondary: 90C90.


Figure 1.  Distribution of four artificial data sets. Triangles and circles represent the two classes of data points, and squares represent Universum data points. The solid points in panels (b) and (d) are artificially mislabeled points.

Table 1.  Accuracy rate comparison of KFFUSVM with QSSVM, EFSVM, USVM and FUSVM on artificial data sets (%)

| Dataset | KFFUSVM | QSSVM | EFSVM | USVM | FUSVM |
| --- | --- | --- | --- | --- | --- |
| Data1a | $\mathbf{98.75\pm2.64}$ | $95.63\pm5.15$ | $96.25\pm4.37$ | $98.13\pm3.02$ | $96.88\pm3.29$ |
| Data1b | $\mathbf{95.00\pm5.74}$ | $92.50\pm6.45$ | $93.75\pm7.80$ | $93.75\pm7.22$ | $93.75\pm7.22$ |
| Data2a | $\mathbf{96.88\pm4.42}$ | $95.63\pm5.93$ | $88.75\pm7.68$ | $93.75\pm5.89$ | $91.88\pm5.93$ |
| Data2b | $\mathbf{93.75\pm4.17}$ | $91.88\pm4.22$ | $86.88\pm4.61$ | $92.50\pm6.45$ | $91.88\pm4.22$ |

Table 2.  Mean of computational time on artificial data sets (seconds)

|  | KFFUSVM | QSSVM | EFSVM | USVM | FUSVM |
| --- | --- | --- | --- | --- | --- |
| Time | 1.83 | 0.12 | 4.85 | 4.24 | 4.90 |

Table 3.  Detailed information on benchmark data sets

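| Dataset | Number of samples | Dimension | Positive samples | Negative samples |
| --- | --- | --- | --- | --- |
| Ecoli0146Vs5 | 280 | 6 | 260 | 20 |
| Ecoli034Vs5 | 200 | 7 | 180 | 20 |
| Glass016Vs5 | 184 | 9 | 175 | 9 |
| Glass0 | 214 | 9 | 144 | 70 |
| Ecoli01Vs5 | 240 | 7 | 220 | 20 |
| Absenteeism | 740 | 20 | 420 | 320 |
| Ecoli067Vs5 | 220 | 7 | 200 | 20 |
| CMC | 1473 | 9 | 844 | 629 |
| Glass4 | 214 | 9 | 201 | 13 |
| BupaLiver | 345 | 6 | 200 | 145 |
| Transfusion | 748 | 4 | 570 | 178 |
| Ecoli0147Vs56 | 332 | 6 | 307 | 25 |
| Yeast2Vs8 | 483 | 8 | 463 | 20 |
| Ecoli4 | 336 | 7 | 316 | 20 |
| Vehicle0 | 846 | 18 | 647 | 199 |
| Haberman | 306 | 3 | 225 | 81 |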

Table 4.  ACC comparison of KFFUSVM with QSSVM, EFSVM, USVM and FUSVM on benchmark data sets (%)

| Dataset | KFFUSVM | QSSVM | EFSVM | USVM | FUSVM |
| --- | --- | --- | --- | --- | --- |
| Ecoli0146Vs5 | $\mathbf{98.21\pm1.19}$ | $96.79\pm1.64$ | $95.18\pm3.16$ | $96.96\pm2.07$ | $97.14\pm2.10$ |
| Ecoli034Vs5 | $\mathbf{99.50\pm1.05}$ | $96.25\pm1.77$ | $97.25\pm2.99$ | $98.75\pm1.32$ | $98.50\pm1.75$ |
| Glass016Vs5 | $94.05\pm2.13$ | $94.32\pm1.53$ | $94.05\pm3.78$ | $\mathbf{95.95\pm2.92}$ | $\mathbf{95.95\pm2.63}$ |
| Glass0 | $72.09\pm5.26$ | $73.02\pm5.17$ | $75.35\pm5.05$ | $\mathbf{76.98\pm7.47}$ | $76.51\pm8.80$ |
| Ecoli01Vs5 | $\mathbf{99.58\pm0.88}$ | $96.25\pm2.15$ | $96.88\pm3.71$ | $97.50\pm2.15$ | $97.92\pm2.60$ |
| Absenteeism | $\mathbf{99.53\pm0.64}$ | $\mathbf{99.53\pm0.64}$ | $78.31\pm4.21$ | $77.84\pm3.56$ | $77.84\pm3.56$ |
| Ecoli067Vs5 | $96.14\pm1.87$ | $93.86\pm1.87$ | $93.64\pm4.65$ | $94.55\pm2.67$ | $\mathbf{96.36\pm3.25}$ |
| CMC | $72.07\pm2.20$ | $\mathbf{72.20\pm1.79}$ | $68.27\pm4.53$ | $69.63\pm3.00$ | $67.90\pm4.69$ |
| Glass4 | $92.95\pm2.26$ | $92.50\pm1.87$ | $92.73\pm2.99$ | $95.23\pm2.92$ | $\mathbf{95.45\pm4.67}$ |
| BupaLiver | $69.86\pm5.06$ | $\mathbf{71.45\pm3.75}$ | $62.46\pm3.24$ | $64.64\pm6.56$ | $66.38\pm6.83$ |
| Transfusion | $\mathbf{76.80\pm1.29}$ | $76.53\pm1.57$ | $76.13\pm1.66$ | $75.40\pm2.40$ | $75.73\pm2.14$ |
| Ecoli0147Vs56 | $\mathbf{98.36\pm1.31}$ | $95.97\pm1.42$ | $92.84\pm4.15$ | $97.46\pm1.73$ | $96.72\pm2.09$ |
| Yeast2Vs8 | $97.32\pm0.87$ | $97.32\pm0.87$ | $\mathbf{97.73\pm1.06}$ | $97.42\pm0.73$ | $97.42\pm0.73$ |
| Ecoli4 | $\mathbf{98.68\pm1.29}$ | $95.74\pm1.46$ | $92.94\pm4.32$ | $96.76\pm1.67$ | $97.21\pm1.62$ |
| Vehicle0 | $\mathbf{98.29\pm0.85}$ | $97.76\pm1.10$ | $77.24\pm1.82$ | $76.65\pm0.28$ | $76.65\pm0.28$ |
| Haberman | $\mathbf{73.55\pm4.25}$ | $73.39\pm2.87$ | $71.77\pm3.06$ | $72.74\pm3.68$ | $72.10\pm4.44$ |

Table 5.  Average rank comparison of proposed KFFUSVM with QSSVM, EFSVM, USVM and FUSVM for ACC

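| Dataset | KFFUSVM | QSSVM | EFSVM | USVM | FUSVM |
| --- | --- | --- | --- | --- | --- |
| Ecoli0146Vs5 | 1 | 4 | 5 | 3 | 2 |
| Ecoli034Vs5 | 1 | 5 | 4 | 2 | 3 |
| Glass016Vs5 | 4.5 | 3 | 4.5 | 1.5 | 1.5 |
| Glass0 | 5 | 4 | 3 | 1 | 2 |
| Ecoli01Vs5 | 1 | 5 | 4 | 3 | 2 |
| Absenteeism | 1.5 | 1.5 | 3 | 4.5 | 4.5 |
| Ecoli067Vs5 | 2 | 4 | 5 | 3 | 1 |
| CMC | 2 | 1 | 4 | 3 | 5 |
| Glass4 | 3 | 5 | 4 | 2 | 1 |
| BupaLiver | 2 | 1 | 5 | 4 | 3 |
| Transfusion | 1 | 2 | 3 | 5 | 4 |
| Ecoli0147Vs56 | 1 | 4 | 5 | 2 | 3 |
| Yeast2Vs8 | 4.5 | 4.5 | 1 | 2.5 | 2.5 |
| Ecoli4 | 1 | 4 | 5 | 3 | 2 |
| Vehicle0 | 1 | 2 | 3 | 4.5 | 4.5 |
| Haberman | 1 | 2 | 5 | 3 | 4 |
| Average rank | 2.03 | 3.25 | 3.97 | 2.94 | 2.81 |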
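The ranks in Table 5 are consistent with ranking the mean accuracies of Table 4 within each data set: the best method receives rank 1, tied methods share the mean of the ranks they span, and the last row averages each column over the 16 data sets. The Python sketch below reproduces that bookkeeping. The function names `average_ranks` and `mean_ranks` are hypothetical and not taken from the paper.

```python
def average_ranks(scores):
    """Rank scores so the highest gets rank 1; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of entries tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Positions pos..end (0-based) correspond to ranks pos+1..end+1.
        shared = (pos + 1 + end + 1) / 2
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def mean_ranks(score_rows):
    """Average the per-data-set ranks column by column (one column per method)."""
    rank_rows = [average_ranks(row) for row in score_rows]
    return [sum(col) / len(rank_rows) for col in zip(*rank_rows)]


# Mean accuracies from the Glass016Vs5 row of Table 4, in the order
# KFFUSVM, QSSVM, EFSVM, USVM, FUSVM.
glass016vs5 = [94.05, 94.32, 94.05, 95.95, 95.95]
print(average_ranks(glass016vs5))
```

For the Glass016Vs5 row this yields `[4.5, 3.0, 4.5, 1.5, 1.5]`, matching the corresponding row of Table 5. Ties are detected by exact equality of the reported means, which is adequate for values rounded to two decimals.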

Table 6.  AUC comparison of KFFUSVM with QSSVM, EFSVM, USVM and FUSVM on benchmark data sets (%)

| Dataset | KFFUSVM | QSSVM | EFSVM | USVM | FUSVM |
| --- | --- | --- | --- | --- | --- |
| Ecoli0146Vs5 | $92.12\pm6.20$ | $77.50\pm11.49$ | $84.71\pm19.73$ | $91.44\pm7.80$ | $\mathbf{93.85\pm6.19}$ |
| Ecoli034Vs5 | $98.61\pm3.93$ | $81.25\pm8.84$ | $92.92\pm15.53$ | $\mathbf{99.31\pm0.73}$ | $99.17\pm0.97$ |
| Glass016Vs5 | $66.21\pm19.71$ | $64.00\pm12.68$ | $63.86\pm22.40$ | $\mathbf{81.36\pm23.47}$ | $\mathbf{81.36\pm20.49}$ |
| Glass0 | $68.23\pm6.05$ | $67.81\pm7.11$ | $74.89\pm7.66$ | $77.76\pm7.13$ | $\mathbf{78.15\pm8.70}$ |
| Ecoli01Vs5 | $\mathbf{97.50\pm5.27}$ | $77.50\pm12.91$ | $81.25\pm22.24$ | $88.41\pm13.46$ | $93.18\pm15.64$ |
| Absenteeism | $\mathbf{99.55\pm0.62}$ | $\mathbf{99.55\pm0.62}$ | $74.92\pm4.86$ | $74.38\pm4.11$ | $74.38\pm4.11$ |
| Ecoli067Vs5 | $\mathbf{86.63\pm9.90}$ | $66.25\pm10.92$ | $78.50\pm24.63$ | $77.88\pm18.57$ | $85.63\pm17.72$ |
| CMC | $\mathbf{69.45\pm2.41}$ | $69.29\pm1.81$ | $66.59\pm4.76$ | $68.59\pm2.79$ | $67.18\pm4.02$ |
| Glass4 | $63.78\pm14.32$ | $57.36\pm11.25$ | $65.20\pm17.67$ | $83.54\pm14.52$ | $\mathbf{86.75\pm11.84}$ |
| BupaLiver | $\mathbf{70.02\pm4.71}$ | $\mathbf{70.02\pm3.90}$ | $56.39\pm5.28$ | $60.06\pm8.40$ | $62.56\pm8.63$ |
| Transfusion | $54.71\pm3.71$ | $52.92\pm4.06$ | $54.08\pm2.84$ | $53.41\pm4.08$ | $\mathbf{55.34\pm4.38}$ |
| Ecoli0147Vs56 | $\mathbf{94.52\pm8.20}$ | $74.84\pm8.41$ | $73.15\pm22.63$ | $88.52\pm12.43$ | $89.03\pm10.09$ |
| Yeast2Vs8 | $67.50\pm10.54$ | $67.50\pm10.54$ | $\mathbf{72.50\pm12.91}$ | $68.75\pm8.84$ | $68.75\pm8.84$ |
| Ecoli4 | $\mathbf{89.92\pm9.89}$ | $63.75\pm12.43$ | $71.64\pm22.98$ | $73.67\pm15.94$ | $77.42\pm14.13$ |
| Vehicle0 | $\mathbf{97.85\pm1.42}$ | $97.33\pm1.25$ | $51.71\pm4.14$ | $50.38\pm0.60$ | $50.38\pm0.60$ |
| Haberman | $\mathbf{60.37\pm5.85}$ | $55.50\pm4.67$ | $54.57\pm4.53$ | $56.15\pm6.01$ | $54.06\pm6.16$ |

Table 7.  Average rank comparison of proposed KFFUSVM with QSSVM, EFSVM, USVM and FUSVM for AUC

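| Dataset | KFFUSVM | QSSVM | EFSVM | USVM | FUSVM |
| --- | --- | --- | --- | --- | --- |
| Ecoli0146Vs5 | 2 | 5 | 4 | 3 | 1 |
| Ecoli034Vs5 | 3 | 5 | 4 | 1 | 2 |
| Glass016Vs5 | 3 | 4 | 5 | 1.5 | 1.5 |
| Glass0 | 4 | 5 | 3 | 2 | 1 |
| Ecoli01Vs5 | 1 | 5 | 4 | 3 | 2 |
| Absenteeism | 1.5 | 1.5 | 3 | 4.5 | 4.5 |
| Ecoli067Vs5 | 1 | 5 | 3 | 4 | 2 |
| CMC | 1 | 2 | 5 | 3 | 4 |
| Glass4 | 4 | 5 | 3 | 2 | 1 |
| BupaLiver | 1.5 | 1.5 | 5 | 4 | 3 |
| Transfusion | 2 | 5 | 3 | 4 | 1 |
| Ecoli0147Vs56 | 1 | 4 | 5 | 3 | 2 |
| Yeast2Vs8 | 4.5 | 4.5 | 1 | 2.5 | 2.5 |
| Ecoli4 | 1 | 5 | 4 | 3 | 2 |
| Vehicle0 | 1 | 2 | 3 | 4.5 | 4.5 |
| Haberman | 1 | 3 | 4 | 2 | 5 |
| Average rank | 2.03 | 3.91 | 3.69 | 2.94 | 2.44 |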

Table 8.  Classification performance on Automobile data sets (%)

| Dataset | KFFUSVM | QSSVM | EFSVM | USVM | FUSVM |
| --- | --- | --- | --- | --- | --- |
| Automobile(ACC) | $\mathbf{82.94\pm5.68}$ | $79.41\pm5.19$ | $80.59\pm5.22$(l) | $78.53\pm5.01$(l) | $77.65\pm4.21$(l) |
|  |  |  | $80.88\pm3.73$(p) | $81.18\pm5.22$(p) | $81.47\pm7.08$(p) |
|  |  |  | $80.29\pm5.20$(g) | $78.24\pm5.41$(g) | $77.06\pm5.15$(g) |
| Automobile'(ACC) | $\mathbf{80.88\pm6.08}$ | $77.94\pm7.50$ | $78.82\pm10.17$(l) | $76.47\pm6.50$(l) | $78.24\pm8.11$(l) |
|  |  |  | $77.65\pm9.22$(p) | $78.82\pm8.52$(p) | $76.47\pm7.84$(p) |
|  |  |  | $78.82\pm6.01$(g) | $76.76\pm5.79$(g) | $77.06\pm5.85$(g) |
| Automobile(AUC) | $\mathbf{82.94\pm5.68}$ | $79.41\pm5.19$ | $80.59\pm5.22$(l) | $78.53\pm5.01$(l) | $77.65\pm4.21$(l) |
|  |  |  | $80.88\pm3.73$(p) | $81.18\pm5.22$(p) | $81.47\pm7.08$(p) |
|  |  |  | $80.29\pm5.20$(g) | $78.24\pm5.41$(g) | $77.06\pm5.15$(g) |
| Automobile'(AUC) | $\mathbf{80.88\pm6.08}$ | $77.94\pm7.50$ | $78.82\pm10.17$(l) | $76.47\pm6.50$(l) | $78.24\pm8.11$(l) |
|  |  |  | $77.65\pm9.22$(p) | $78.82\pm8.52$(p) | $76.47\pm7.84$(p) |
|  |  |  | $78.82\pm6.01$(g) | $76.76\pm5.79$(g) | $77.06\pm5.85$(g) |

Table 9.  Classification performance on Australian data sets (%)

| Dataset | KFFUSVM | QSSVM | EFSVM | USVM | FUSVM |
| --- | --- | --- | --- | --- | --- |
| Australian(ACC) | $\mathbf{85.30\pm2.82}$ | $84.28\pm2.18$ | $84.61\pm2.86$(l) | $83.40\pm3.19$(l) | $83.46\pm3.01$(l) |
|  |  |  | $83.65\pm2.54$(p) | $84.47\pm1.92$(p) | $84.05\pm2.06$(p) |
|  |  |  | $81.44\pm7.89$(g) | $83.77\pm1.81$(g) | $84.37\pm3.68$(g) |
| Australian'(ACC) | $\mathbf{83.87\pm1.83}$ | $82.11\pm1.99$ | $82.03\pm4.16$(l) | $81.59\pm4.27$(l) | $81.36\pm4.04$(l) |
|  |  |  | $79.93\pm4.02$(p) | $81.35\pm3.35$(p) | $81.07\pm3.16$(p) |
|  |  |  | $81.89\pm7.36$(g) | $80.06\pm6.62$(g) | $81.27\pm6.90$(g) |
| Australian(AUC) | $84.57\pm3.39$ | $83.02\pm3.28$ | $\mathbf{85.33\pm2.51}$(l) | $84.83\pm3.04$(l) | $84.83\pm2.82$(l) |
|  |  |  | $83.62\pm2.81$(p) | $84.03\pm2.38$(p) | $83.73\pm2.39$(p) |
|  |  |  | $80.29\pm11.41$(g) | $83.88\pm3.09$(g) | $83.95\pm4.04$(g) |
| Australian'(AUC) | $\mathbf{83.27\pm1.79}$ | $80.34\pm2.05$ | $82.17\pm4.29$(l) | $82.19\pm4.76$(l) | $81.95\pm4.53$(l) |
|  |  |  | $79.89\pm4.87$(p) | $81.13\pm3.50$(p) | $80.90\pm3.45$(p) |
|  |  |  | $79.78\pm10.75$(g) | $78.55\pm9.91$(g) | $80.18\pm10.40$(g) |

