This paper presents a robust binary classification method that extends the Modified Polyhedral Conic Functions (M-PCF) algorithm developed earlier by Gasimov and Ozturk. The version presented here differs from the original algorithm in two ways. First, the underlying mathematical model is relaxed by allowing some misclassifications in an optimal way, with the aim of reducing overfitting and improving generalization: while in the original version the sublevel set of the separating function generated at each iteration contains no element of the other set, the new algorithm allows these sublevel sets to contain some elements of the other set. Second, the new algorithm uses a tolerance parameter that prevents the generation of "less productive" separating functions. The original version iterates until every point of the "first" set is separated from the second, generating a new separating function whenever unseparated elements remain, regardless of how few there are; the new version uses the tolerance parameter to terminate the iterations once only a few unseparated elements are left. This, too, is intended to improve the generalization property of the algorithm, and the new version is therefore called the Parameterized Polyhedral Conic Functions (P-PCF) method. The performance and efficiency of the proposed algorithm are demonstrated on well-known datasets from the literature and on noisy data.
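The core object in this family of methods is the polyhedral conic function of [11], g(x) = ⟨w, x − c⟩ + ξ‖x − c‖₁ − γ, whose 0-sublevel set is a polyhedron centered at the vertex c. The Python sketch below is a minimal illustration, not the paper's actual model: it shows how a collection of such functions classifies points and how a tolerance parameter can terminate the training loop early. The per-iteration choice of (w, ξ, γ) is replaced by a naive heuristic, whereas M-PCF and P-PCF obtain these parameters by solving an optimization subproblem at each iteration; all names and data below are illustrative.

```python
import numpy as np

def pcf(x, w, xi, gamma, c):
    """Polyhedral conic function of [11]: g(x) = <w, x-c> + xi*||x-c||_1 - gamma."""
    d = x - c
    return w @ d + xi * np.abs(d).sum() - gamma

def classify(x, pcfs):
    """Assign x to the 'first' class if it lies in the 0-sublevel set
    of at least one generated separating function."""
    return any(pcf(x, *p) <= 0 for p in pcfs)

def train(A, B, tol=0.05):
    """Illustrative outer loop (not the paper's model): each iteration
    generates a separating function centered at a still-unseparated
    point of A.  With tol=0 the loop runs until every point of A is
    separated (the M-PCF stopping rule); with tol > 0 it stops once the
    fraction of unseparated points drops below tol, which is the P-PCF
    termination idea described in the abstract."""
    pcfs, unsep = [], list(A)
    while len(unsep) > tol * len(A):
        c = unsep[0]                 # vertex at an unseparated A-point
        # Naive stand-in for the per-iteration optimization subproblem:
        # w = 0, xi = 1 gives an l1-ball; gamma keeps all of B outside it.
        gamma = 0.9 * min(np.abs(b - c).sum() for b in B)
        pcfs.append((np.zeros_like(c), 1.0, gamma, c))
        unsep = [a for a in unsep if not classify(a, pcfs)]
    return pcfs

# Toy run on a few points taken from Table 2:
A = [np.array(p, float) for p in [(2, 4), (2, 6), (4, 4)]]
B = [np.array(p, float) for p in [(16, 7), (18, 7), (16, 5)]]
model = train(A, B, tol=0.0)
print([classify(p, model) for p in A + B])  # [True, True, True, False, False, False]
```

Note that this heuristic keeps every point of B strictly outside each sublevel set; the relaxation described in the abstract would additionally allow a controlled number of B points inside, which the sketch omits.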
Table 1. Data illustrating different noise types
No. | Attribute 1 | Attribute 2 | Class | Status |
1 | 0.26 | small | A | |
2 | 0.25 | small | A | |
3 | 0.29 | small | B | class noise |
4 | 1.02 | large | B | |
5 | 1.05 | large | B | |
6 | 0.30 | large | B | attribute noise |
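Table 1 distinguishes the two corruption types studied in the noise literature [28]: record 3 is class noise (a point lying among class-A instances but labeled B), and record 6 is attribute noise (a class-B point whose Attribute 1 value looks like a class-A value). The noise experiments reported later (Tables 8-12) require corrupted copies of the data at given ratios; the sketch below shows one common injection convention, label flipping for class noise and uniform resampling for attribute noise, which is not necessarily the exact procedure used in the paper. The function names and the resampling choice are illustrative assumptions.

```python
import random

def inject_class_noise(labels, ratio, seed=0):
    """Class noise (record 3 of Table 1): flip the labels of a random
    `ratio` fraction of the records between the two classes A and B."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i in rng.sample(range(len(noisy)), round(ratio * len(noisy))):
        noisy[i] = "B" if noisy[i] == "A" else "A"
    return noisy

def inject_attribute_noise(values, ratio, seed=0):
    """Attribute noise (record 6 of Table 1): overwrite a random
    `ratio` fraction of an attribute's values with uniform draws
    from the attribute's observed min-max range."""
    rng = random.Random(seed)
    noisy = list(values)
    lo, hi = min(values), max(values)
    for i in rng.sample(range(len(noisy)), round(ratio * len(noisy))):
        noisy[i] = round(rng.uniform(lo, hi), 2)
    return noisy

# Example on the Table 1 data: a 1/6 ratio corrupts one of the six records.
attr1  = [0.26, 0.25, 0.29, 1.02, 1.05, 0.30]
labels = ["A", "A", "B", "B", "B", "B"]
print(inject_class_noise(labels, 1/6))
print(inject_attribute_noise(attr1, 1/6))
```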
Table 2. Data for Example 1
Training set | Test set | Training set | Test set |
(x, y) | (x, y) | (x, y) | (x, y) |
(2, 4) | (3, 5) | (4, 19) | (5, 20) |
(2, 6) | (3, 7) | (6, 19) | (7, 20) |
(2, 8) | (3, 9) | (8, 19) | (11, 20) |
(2, 10) | (3, 11) | (10, 19) | (13, 20) |
(2, 12) | (3, 13) | (12, 19) | (15, 20) |
(2, 14) | (5, 5) | (14, 19) | (17, 20) |
(4, 4) | (5, 7) | (16, 19) | (17, 18) |
(4, 8) | (5, 9) | (18, 19) | (17, 16) |
(4, 10) | (5, 11) | (16, 17) | (17, 14) |
(4, 14) | (5, 13) | (18, 17) | (17, 12) |
(6, 4) | (7, 5) | (16, 15) | (17, 10) |
(6, 6) | (7, 7) | (18, 15) | - |
(6, 8) | (7, 9) | (16, 13) | - |
(6, 10) | (7, 11) | (18, 13) | - |
(6, 12) | (7, 13) | (16, 11) | - |
(6, 14) | (9, 5) | (16, 9) | - |
(8, 4) | (9, 7) | (18, 9) | - |
(8, 6) | (9, 9) | (17, 8) | - |
(8, 8) | (9, 11) | (17, 6) | - |
(8, 10) | (9, 13) | (17, 4) | - |
(8, 12) | (11, 5) | (14, 20.5) | - |
(8, 14) | (16, 7) | (4, 6) | - |
(10, 4) | (18, 7) | (10, 6) | - |
(10, 8) | (16, 5) | (4, 12) | - |
(10, 10) | (16, 3) | (10, 12) | - |
(10, 14) | (11, 7) | (6, 21) | - |
(12, 4) | (11, 9) | (10, 21) | - |
(12, 6) | (11, 11) | (12, 21) | - |
(12, 8) | (11, 13) | (16, 21) | - |
(12, 10) | - | (18, 21) | - |
(12, 12) | - | - | - |
(12, 14) | - | - | - |
(8, 21) | - | - | - |
(14, 21) | - | - | - |
(18, 11) | - | - | - |
(18, 5) | - | - | - |
Table 3. Original data for Example 2 (from [11])
Training set | Test set | Training set | Test set |
(x, y) | (x, y) | (x, y) | (x, y) |
(14.5, -2) | (-0.5, 2) | (1, 6) | (20, -6) |
(0.5, 2) | (14.5, 2) | (-6, -6) | (-6, -1) |
(2, -0.5) | (13.5, -2) | (8, -1) | (-1, -6) |
(-2, 2) | - | (15, -6) | (8, 6) |
(16, -0.5) | - | (-6, 6) | (8, 1) |
(16, 0.5) | - | (8, -6) | (15, 6) |
(12, -2) | - | (6, 6) | (-6, 1) |
(0.5, -2) | - | (6, 1) | (1, -6) |
(12, 0.5) | - | (20, 1) | (6, 1) |
(16, -2) | - | (20, -1) | - |
(2, 2) | - | (20, 6) | - |
(2, 0.5) | - | (13, 6) | - |
(-2, -2) | - | (-1, 6) | - |
(12, 2) | - | (6, -6) | - |
(13.5, 2) | - | (13, -6) | - |
(-0.5, -2) | - | - | - |
(2, -0.5) | - | - | - |
(16, 2) | - | - | - |
(-2, 0.5) | - | - | - |
(-2, -0.5) | - | - | - |
(12, -0.5) | - | - | - |
Table 4. Modified data with a noise ratio of 60%, for Example 2
Training set | Test set | Training set | Test set |
(x, y) | (x, y) | (x, y) | (x, y) |
(16, 0.5) | (12, -2) | (-6, -6) | (6, 6) |
(-0.5, -2) | (0.5, -2) | (8, -1) | (6, 1) |
(16, 2) | (12, 0.5) | (15, -6) | (20, 1) |
(-2, 0.5) | (16, -2) | (-6, 6) | (20, -1) |
(12, -0.5) | (2, 2) | (6, -6) | (20, 6) |
(1, 6) | (2, 0.5) | (-6, -1) | (13, 6) |
(8, -6) | - | (8, 6) | - |
(-1, 6) | - | (8, 1) | - |
(13, -6) | - | (15, 6) | - |
(20, -6) | - | (6, -1) | - |
(-1, -6) | - | (14.5, -2) | - |
(-6, 1) | - | (0.5, 2) | - |
(1, -6) | - | (2, -2) | - |
- | - | (-2, 2) | - |
- | - | (16, -0.5) | - |
- | - | (-2, -2) | - |
- | - | (12, 2) | - |
- | - | (13.5, 2) | - |
- | - | (2, -0.5) | - |
- | - | (-2, -0.5) | - |
- | - | (-0.5, 2) | - |
- | - | (14.5, 2) | - |
- | - | (13.5, -2) | - |
Table 5. Classification accuracies obtained for Example 2
Data | P-PCF Training | P-PCF Test | M-PCF Training | M-PCF Test |
Original Data | 88.89 | 85.41 | 100 | 83.33 |
Noisy Data | 61.80 | 56.25 | 100 | 52.08 |
Table 6. Properties of datasets
Dataset | Short Name | # Instances | # Class 1 Instances | # Class 2 Instances | # Attributes |
Wisconsin Breast Cancer | Wis | 683 | 444 | 239 | 10 |
German-Credit | Ger | 1000 | 700 | 300 | 21 |
Haberman | Hab | 306 | 225 | 81 | 4 |
Heart-statlog | Hea | 270 | 137 | 160 | 14 |
Ionosphere | Ion | 351 | 126 | 225 | 35 |
Liver-disorders | Liv | 345 | 145 | 200 | 7 |
Sonar | Son | 208 | 111 | 107 | 61 |
Australian credit | Aus | 690 | 383 | 307 | 14 |
Monk | Monk | 432 | 228 | 204 | 6 |
Table 7. Training and test accuracies obtained by applying the M-PCF and P-PCF methods to the original data
Dataset | M-PCF Training | M-PCF Test | P-PCF Training | P-PCF Test |
Wis | 100 | 98.50 | 98.59 | 96.13 |
Ger | 100 | 72.41 | 82.56 | 73.80 |
Hab | 100 | 74.27 | 86.97 | 74.25 |
Hea | 100 | 84.41 | 93.67 | 84.76 |
Ion | 100 | 88.42 | 94.87 | 88.96 |
Liv | 100 | 68.87 | 78.43 | 69.40 |
Son | 100 | 70.24 | 80.47 | 71.09 |
Aus | 100 | 85.42 | 87.2 | 86.23 |
Monk | 100 | 99.82 | 100 | 99.02 |
Table 8. Test accuracies obtained for datasets with 0% noise
Datasets | M-PCF | P-PCF | SVM | 1-NN | 3-NN | C4.5 |
Wis | 98.50 | 96.13 | 95.91 | 91.21 | 95.61 | 92.39 |
Ger | 72.41 | 73.80 | 70.35 | 68.50 | 67.70 | 74.5 |
Hab | 74.27 | 74.25 | 73.82 | 68.48 | 68.28 | 69.42 |
Hea | 84.41 | 84.76 | 78.88 | 69.99 | 68.47 | 70.73 |
Ion | 88.42 | 88.96 | 90.48 | 90.22 | 89.98 | 89.87 |
Liv | 68.87 | 69.40 | 61.12 | 59.17 | 58.87 | 58.96 |
Son | 70.24 | 71.09 | 78.21 | 89.75 | 82.52 | 71.18 |
Aus | 85.42 | 86.23 | 85.51 | 80.73 | 85.8 | 84.35 |
Monk | 99.82 | 92.02 | 80.56 | 75.69 | 97.92 | 99.5 |
Table 9. Test accuracies obtained for datasets with 5% noise
Datasets | M-PCF | P-PCF | SVM | 1-NN | 3-NN | C4.5 |
Wis | 86.84 | 96.14 | 96.34 | 89.16 | 94.29 | 92.80 |
Ger | 70.8 | 72.40 | 73.37 | 68.44 | 65.98 | 63.01 |
Hab | 63.9 | 74.44 | 72.17 | 67.29 | 66.25 | 68.47 |
Hea | 68.89 | 78.44 | 77.84 | 62.58 | 67.03 | 69.99 |
Ion | 67.85 | 85.84 | 89.18 | 88.02 | 89.10 | 88.28 |
Liv | 61.46 | 67.76 | 55.29 | 59.52 | 53.85 | 59.45 |
Son | 67.14 | 70.52 | 74.37 | 86.53 | 83.31 | 68.64 |
Aus | 78.21 | 85.37 | 81.74 | 72.75 | 80.15 | 81.16 |
Monk | 88.42 | 98.21 | 77.55 | 73.84 | 90.05 | 95.14 |
Table 10. Test accuracies obtained for datasets with 10% noise
Datasets | M-PCF | P-PCF | SVM | 1-NN | 3-NN | C4.5 |
Wis | 84.27 | 95.85 | 96.05 | 86.52 | 91.95 | 93.26 |
Ger | 68.4 | 73.8 | 70.64 | 65.28 | 64.97 | 60.24 |
Hab | 60.54 | 75.65 | 70.28 | 64.23 | 65.42 | 67.93 |
Hea | 68.52 | 79.26 | 76.03 | 58.14 | 62.95 | 65.18 |
Ion | 65.45 | 83.17 | 81.57 | 86.25 | 88.47 | 85.23 |
Liv | 60.29 | 67.49 | 55.50 | 54.07 | 59.20 | 51.36 |
Son | 62.64 | 68.56 | 73.97 | 81.18 | 81.11 | 52.36 |
Aus | 72.01 | 83.48 | 77.54 | 69.42 | 74.93 | 74.89 |
Monk | 78.56 | 96.98 | 74.77 | 69.91 | 81.48 | 89.35 |
Table 11. Test accuracies obtained for datasets with 20% noise
Datasets | M-PCF | P-PCF | SVM | 1-NN | 3-NN | C4.5 |
Wis | 79.56 | 95.43 | 95.32 | 81.24 | 84.91 | 90.19 |
Ger | 65.8 | 71.56 | 65.14 | 60.48 | 62.11 | 61.40 |
Hab | 54.28 | 74.54 | 67.25 | 63.28 | 62.47 | 65.37 |
Hea | 67.04 | 74.82 | 69.46 | 56.91 | 58.14 | 60.60 |
Ion | 62.03 | 80.34 | 81.43 | 82.27 | 86.07 | 79.45 |
Liv | 59.72 | 66.70 | 54.28 | 55.01 | 58.76 | 51.69 |
Son | 59.64 | 65.98 | 70.84 | 73.91 | 78.21 | 63.9 |
Aus | 64.34 | 82.03 | 70.73 | 63.04 | 65.36 | 68.41 |
Monk | 71.78 | 92.34 | 66.9 | 62.73 | 70.83 | 77.55 |
Table 12. Test accuracies obtained for datasets with 30% noise
Datasets | M-PCF | P-PCF | SVM | 1-NN | 3-NN | C4.5 |
Wis | 69.95 | 92.85 | 92.74 | 75.41 | 77.31 | 87.56 |
Ger | 62.4 | 69.21 | 63.78 | 57.62 | 56.70 | 53.17 |
Hab | 51.83 | 68.66 | 62.87 | 61.82 | 60.44 | 63.53 |
Hea | 60.37 | 68.15 | 65.28 | 51.47 | 53.69 | 46.60 |
Ion | 61.48 | 80.71 | 76.98 | 78.51 | 84.27 | 77.81 |
Liv | 56.58 | 63.37 | 54.12 | 48.66 | 55.40 | 43.45 |
Son | 52.64 | 60.01 | 68.64 | 65.23 | 72.80 | 65.75 |
Aus | 52.78 | 73.77 | 62.46 | 55.65 | 55.8 | 58.84 |
Monk | 51.47 | 90.61 | 60.19 | 53.7 | 60.42 | 66.44 |
[1] A. Astorino and M. Gaudioso, Polyhedral separability through successive LP, Journal of Optimization Theory and Applications, 112 (2002), 265-293. doi: 10.1023/A:1013649822153.
[2] K. Bache and M. Lichman, UCI Machine Learning Repository, University of California, School of Information and Computer Science, 2013. http://archive.ics.uci.edu/ml
[3] A. M. Bagirov, Max-min separability, Optimization Methods and Software, 20 (2005), 277-296. doi: 10.1080/10556780512331318263.
[4] A. M. Bagirov, G. Ozturk and R. Kasimbeyli, A sharp augmented Lagrangian-based method in constrained non-convex optimization, Optimization Methods and Software, 34 (2019), 462-488. doi: 10.1080/10556788.2018.1496431.
[5] A. M. Bagirov, J. Ugon, D. Webb, G. Ozturk and R. Kasimbeyli, A novel piecewise linear classifier based on polyhedral conic and max-min separabilities, TOP, 21 (2013), 3-24. doi: 10.1007/s11750-011-0241-5.
[6] K. P. Bennett and O. L. Mangasarian, Robust linear programming discrimination of two linearly inseparable sets, Optimization Methods and Software, 1 (1992), 23-34.
[7] C. E. Brodley and M. A. Friedl, Identifying mislabeled training data, Journal of Artificial Intelligence Research, 11 (1999), 131-167.
[8] E. Cimen and G. Ozturk, O-PCF algorithm for one-class classification, Optimization Methods and Software, (2019), 1-15.
[9] W. W. Cohen, Fast effective rule induction, in Proceedings of the Twelfth International Conference on Machine Learning (ML95), San Francisco, CA, 1995, 115-123.
[10] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, 20 (1995), 273-297. doi: 10.1007/BF00994018.
[11] R. N. Gasimov and G. Ozturk, Separation via polyhedral conic functions, Optimization Methods and Software, 21 (2006), 527-540. doi: 10.1080/10556780600723252.
[12] R. N. Gasimov and O. Ustun, Solving the quadratic assignment problem using F-MSG algorithm, Journal of Industrial and Management Optimization, 3 (2007), 173-191. doi: 10.3934/jimo.2007.3.173.
[13] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann and I. H. Witten, The WEKA data mining software: An update, SIGKDD Explorations, 11 (2009), 10-18.
[14] N. Kasimbeyli and R. Kasimbeyli, A representation theorem for Bishop-Phelps cones, Pacific Journal of Optimization, 13 (2017), 55-74.
[15] R. Kasimbeyli, A nonlinear cone separation theorem and scalarization in nonconvex vector optimization, SIAM Journal on Optimization, 20 (2010), 1591-1619. doi: 10.1137/070694089.
[16] R. Kasimbeyli, Radial epiderivatives and set-valued optimization, Optimization, 58 (2009), 521-534. doi: 10.1080/02331930902928310.
[17] R. Kasimbeyli and M. Karimi, Separation theorems for nonconvex sets and application in optimization, Operations Research Letters, 47 (2019), 569-573. doi: 10.1016/j.orl.2019.09.011.
[18] R. Kasimbeyli and M. Mammadov, Optimality conditions in nonconvex optimization via weak subdifferentials, Nonlinear Analysis: Theory, Methods and Applications, 74 (2011), 2534-2547. doi: 10.1016/j.na.2010.12.008.
[19] R. Kasimbeyli, O. Ustun and A. Rubinov, The modified subgradient algorithm based on feasible values, Optimization, 58 (2009), 535-560. doi: 10.1080/02331930902928419.
[20] D. T. Larose and C. D. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, Hoboken, NJ, 2005.
[21] C. J. Mantas and J. Abellán, Credal-C4.5: Decision tree based on imprecise probabilities to classify noisy data, Expert Systems with Applications, 41 (2014), 4625-4637.
[22] G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, John Wiley & Sons, Inc., New York, 1992. doi: 10.1002/0471725293.
[23] G. Ozturk, A. M. Bagirov and R. Kasimbeyli, An incremental piecewise linear classifier based on polyhedral conic separation, Machine Learning, 101 (2015), 397-413. doi: 10.1007/s10994-014-5449-9.
[24] G. Ozturk and M. T. Ciftci, Clustering based polyhedral conic functions algorithm in classification, Journal of Industrial and Management Optimization, 11 (2015), 921-932. doi: 10.3934/jimo.2015.11.921.
[25] J. R. Quinlan, The effect of noise on concept learning, Machine Learning, (1986), 149-166.
[26] A. M. Rubinov and R. N. Gasimov, Strictly increasing positively homogeneous functions with applications to exact penalization, Optimization, 52 (2003), 1-28. doi: 10.1080/0233193021000058931.
[27] J. A. Sáez, M. Galar, J. Luengo and F. Herrera, Tackling the problem of classification with noisy data using multiple classifier systems: Analysis of the performance and robustness, Information Sciences, 247 (2013), 1-20.
[28] X. Zhu and X. Wu, Class noise vs. attribute noise: A quantitative study, Artificial Intelligence Review, 22 (2004), 177-210.