A polyhedral conic functions based classification method for noisy data

  • * Corresponding author; The author published under the name Rafail N. Gasimov until 2007

  • This paper presents a robust binary classification method that extends the Modified Polyhedral Conic Functions (M-PCF) algorithm developed earlier by Gasimov and Ozturk. The new version differs from the original algorithm in two respects. First, the mathematical model is relaxed by allowing some inaccuracies in an optimal way, with the aim of reducing overfitting and improving generalization. In the original version, the sublevel set of the separating function generated at each iteration does not contain any element of the other set; in the new version, the sublevel sets of the separating functions are allowed to contain some elements of the other set. Second, the new algorithm uses a tolerance parameter that prevents the generation of "less productive separating functions". The original algorithm continues until all points of the "first" set are separated from the second one, generating a new separating function whenever unseparated elements remain, regardless of how many there are. In the new version, the tolerance parameter terminates the iterations when only a few unseparated elements remain. This is intended to improve the generalization property of the algorithm, and the new version is therefore called the Parameterized Polyhedral Conic Functions (P-PCF) method. The performance and efficiency of the proposed algorithm are demonstrated on well-known datasets from the literature and on noisy data.

    Mathematics Subject Classification: Primary: 62H30, 65K05, 90C25, 90C90; Secondary: 68Q32.
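
    The two relaxations described in the abstract can be illustrated with a small sketch. It assumes the polyhedral conic function form g(x) = w·(x − a) + ξ‖x − a‖₁ − γ from reference [11], which the abstract itself does not restate. The names `pcf_value`, `toy_fit`, `train` and `classify`, and the parameters `tolerance` and `allowed_errors`, are hypothetical. In particular, `toy_fit` is only a stand-in for the linear programming subproblem solved at each iteration: it fixes w = 0 and ξ = 1 and is not the paper's model.

```python
def pcf_value(g, x):
    """Evaluate g(x) = w.(x - a) + xi * ||x - a||_1 - gamma for g = (w, xi, gamma, a)."""
    w, xi, gamma, a = g
    diff = [xj - aj for xj, aj in zip(x, a)]
    linear = sum(wj * dj for wj, dj in zip(w, diff))
    return linear + xi * sum(abs(d) for d in diff) - gamma


def toy_fit(center, B, allowed_errors):
    """Stand-in for the LP subproblem (an assumption, not the paper's model).

    Uses w = 0 and xi = 1, so the sublevel set {g <= 0} is an L1 ball around
    `center`. gamma is chosen so that at most `allowed_errors` points of B
    fall inside that ball.
    """
    dists = sorted(sum(abs(bj - cj) for bj, cj in zip(b, center)) for b in B)
    k = min(allowed_errors, len(dists) - 1)
    gamma = dists[k] - 1e-9
    return ([0.0] * len(center), 1.0, gamma, list(center))


def train(A, B, fit, tolerance=0, allowed_errors=0):
    """Generate separating functions until at most `tolerance` points of A remain.

    tolerance=0 and allowed_errors=0 mimic the original behaviour (separate
    every point of A, no point of B inside a sublevel set); positive values
    mimic the relaxed behaviour described in the abstract.
    """
    functions = []
    remaining = list(A)
    while len(remaining) > tolerance:
        center = remaining[0]
        g = fit(center, B, allowed_errors)
        still = [a for a in remaining if pcf_value(g, a) > 0]
        if len(still) == len(remaining):
            # No point was separated: drop the center to guarantee termination.
            remaining = remaining[1:]
            continue
        functions.append(g)
        remaining = still
    return functions


def classify(functions, x):
    """Assign x to A if it lies in the sublevel set of any separating function."""
    if not functions:
        return "B"
    return "A" if min(pcf_value(g, x) for g in functions) <= 0 else "B"


# Illustrative data: three clean points of A, one A-labelled point placed among B.
A = [(2, 4), (2, 6), (4, 4), (18, 5)]
B = [(16, 17), (18, 17), (17, 6), (17, 4)]

strict = train(A, B, toy_fit, tolerance=0, allowed_errors=0)
relaxed = train(A, B, toy_fit, tolerance=1, allowed_errors=0)

print(len(strict), classify(strict, (17, 5)))
print(len(relaxed), classify(relaxed, (17, 5)))
```

    With `tolerance=0` the loop generates an extra function just to cover the isolated point (18, 5), and that function then pulls the nearby point (17, 5) into class A. With `tolerance=1` the loop stops once a single unseparated point is left, so no such function is generated.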

  • Figure 1.  The illustration of M-PCF algorithm on the training set for Example 1

    Figure 2.  The illustration of M-PCF algorithm on the test set for Example 1

    Figure 3.  The illustration of P-PCF algorithm on the training set for Example 1

    Figure 4.  The illustration of P-PCF algorithm on the test set for Example 1

    Table 1.  Data illustrating different noise types

    No. Attribute 1 Attribute 2 Class Status
    1 0.26 small A
    2 0.25 small A
    3 0.29 small B class noise
    4 1.02 large B
    5 1.05 large B
    6 0.30 large B attribute noise
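
    Table 1 distinguishes class noise (a wrong label, as in row 3) from attribute noise (a wrong attribute value, as in row 6). The following is a minimal sketch of how class noise at a given ratio could be injected into a label list. The source does not state how its noisy datasets were generated, so the function name `add_class_noise`, the uniform random choice of instances and the fixed seed are all assumptions made for illustration.

```python
import random


def add_class_noise(labels, ratio, classes=("A", "B"), seed=0):
    """Return a copy of `labels` with round(ratio * len(labels)) labels flipped.

    Assumes a binary problem with the two class names given in `classes`.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(ratio * len(noisy))
    # Choose distinct positions so exactly n_flip labels change.
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = classes[1] if noisy[i] == classes[0] else classes[0]
    return noisy


labels = ["A"] * 10 + ["B"] * 10
noisy = add_class_noise(labels, 0.2)
changed = sum(1 for old, new in zip(labels, noisy) if old != new)
print(changed, "of", len(labels), "labels flipped")
```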

    Table 2.  Data for Example 1

    Training set $ A $ Test set $ A $ Training set $ B $ Test set $ B $
    (x, y) (x, y) (x, y) (x, y)
    (2, 4) (3, 5) (4, 19) (5, 20)
    (2, 6) (3, 7) (6, 19) (7, 20)
    (2, 8) (3, 9) (8, 19) (11, 20)
    (2, 10) (3, 11) (10, 19) (13, 20)
    (2, 12) (3, 13) (12, 19) (15, 20)
    (2, 14) (5, 5) (14, 19) (17, 20)
    (4, 4) (5, 7) (16, 19) (17, 18)
    (4, 8) (5, 9) (18, 19) (17, 16)
    (4, 10) (5, 11) (16, 17) (17, 14)
    (4, 14) (5, 13) (18, 17) (17, 12)
    (6, 4) (7, 5) (16, 15) (17, 10)
    (6, 6) (7, 7) (18, 15) -
    (6, 8) (7, 9) (16, 13) -
    (6, 10) (7, 11) (18, 13) -
    (6, 12) (7, 13) (16, 11) -
    (6, 14) (9, 5) (16, 9) -
    (8, 4) (9, 7) (18, 9) -
    (8, 6) (9, 9) (17, 8) -
    (8, 8) (9, 11) (17, 6) -
    (8, 10) (9, 13) (17, 4) -
    (8, 12) (11, 5) (14, 20.5) -
    (8, 14) (16, 7) (4, 6) -
    (10, 4) (18, 7) (10, 6) -
    (10, 8) (16, 5) (4, 12) -
    (10, 10) (16, 3) (10, 12) -
    (10, 14) (11, 7) (6, 21) -
    (12, 4) (11, 9) (10, 21) -
    (12, 6) (11, 11) (12, 21) -
    (12, 8) (11, 13) (16, 21) -
    (12, 10) - (18, 21) -
    (12, 12) - - -
    (12, 14) - - -
    (8, 21) - - -
    (14, 21) - - -
    (18, 11) - - -
    (18, 5) - - -

    Table 3.  Original data for Example 2 (from [11])

    Training set $ A $ Test set $ A $ Training set $ B $ Test set $ B $
    (x, y) (x, y) (x, y) (x, y)
    (14.5, -2) (-0.5, 2) (1, 6) (20, -6)
    (0.5, 2) (14.5, 2) (-6, -6) (-6, -1)
    (2, -0.5) (13.5, -2) (8, -1) (-1, -6)
    (-2, 2) - (15, -6) (8, 6)
    (16, -0.5) - (-6, 6) (8, 1)
    (16, 0.5) - (8, -6) (15, 6)
    (12, -2) - (6, 6) (-6, 1)
    (0.5, -2) - (6, 1) (1, -6)
    (12, 0.5) - (20, 1) (6, 1)
    (16, -2) - (20, -1) -
    (2, 2) - (20, 6) -
    (2, 0.5) - (13, 6) -
    (-2, -2) - (-1, 6) -
    (12, 2) - (6, -6) -
    (13.5, 2) - (13, -6) -
    (-0.5, -2) - - -
    (2, -0.5) - - -
    (16, 2) - - -
    (-2, 0.5) - - -
    (-2, -0.5) - - -
    (12, -0.5) - - -

    Table 4.  Modified data with a noise ratio of 60%, for Example 2

    Training set $ A $ Test set $ A $ Training set $ B $ Test set $ B $
    (x, y) (x, y) (x, y) (x, y)
    (16, 0.5) (12, -2) (-6, -6) (6, 6)
    (-0.5, -2) (0.5, -2) (8, -1) (6, 1)
    (16, 2) (12, 0.5) (15, -6) (20, 1)
    (-2, 0.5) (16, -2) (-6, 6) (20, -1)
    (12, -0.5) (2, 2) (6, -6) (20, 6)
    (1, 6) (2, 0.5) (-6, -1) (13, 6)
    (8, -6) - (8, 6) -
    (-1, 6) - (8, 1) -
    (13, -6) - (15, 6) -
    (20, -6) - (6, -1) -
    (-1, -6) - (14.5, -2) -
    (-6, 1) - (0.5, 2) -
    (1, -6) - (2, -2) -
    - - (-2, 2) -
    - - (16, -0.5) -
    - - (-2, -2) -
    - - (12, 2) -
    - - (13.5, 2) -
    - - (2, -0.5) -
    - - (-2, -0.5) -
    - - (-0.5, 2) -
    - - (14.5, 2) -
    - - (13.5, -2) -

    Table 5.  Classification accuracies obtained for Example 2

    P-PCF Algorithm M-PCF Algorithm
    Training Test Training Test
    Original Data 88.89 85.41 100 83.33
    Noisy Data 61.80 56.25 100 52.08

    Table 6.  Properties of datasets. Dataset description: $ N $ is the number of instances in the dataset, $ m $ is the number of instances in the first class, $ p $ is the number of instances in the second class, and $ n $ is the number of attributes

    Dataset Short Name $ N $ $ m $ $ p $ $ n $
    Wisconsin Breast Cancer Wis 683 444 239 10
    German-Credit Ger 1000 700 300 21
    Haberman Hab 306 225 81 4
    Heart-statlog Hea 270 137 160 14
    Ionosphere Ion 351 126 225 35
    Liver-disorders Liv 345 145 200 7
    Sonar Son 208 111 107 61
    Australian credit Aus 690 383 307 14
    Monk Monk 432 228 204 6

    Table 7.  Training and test accuracies obtained by applying M-PCF and P-PCF methods for the original data

    M-PCF Algorithm P-PCF Algorithm
    Training Test Training Test
    Wis 100 98.50 98.59 96.13
    Ger 100 72.41 82.56 73.80
    Hab 100 74.27 86.97 74.25
    Hea 100 84.41 93.67 84.76
    Ion 100 88.42 94.87 88.96
    Liv 100 68.87 78.43 69.40
    Son 100 70.24 80.47 71.09
    Aus 100 85.42 87.2 86.23
    Monk 100 99.82 100 99.02

    Table 8.  Test accuracies obtained for datasets with 0% noise

    Datasets M-PCF P-PCF SVM 1-NN 3-NN C4.5
    Wis 98.50 96.13 95.91 91.21 95.61 92.39
    Ger 72.41 73.80 70.35 68.50 67.70 74.5
    Hab 74.27 74.25 73.82 68.48 68.28 69.42
    Hea 84.41 84.76 78.88 69.99 68.47 70.73
    Ion 88.42 88.96 90.48 90.22 89.98 89.87
    Liv 68.87 69.40 61.12 59.17 58.87 58.96
    Son 70.24 71.09 78.21 89.75 82.52 71.18
    Aus 85.42 86.23 85.51 80.73 85.8 84.35
    Monk2 99.82 92.02 80.56 75.69 97.92 99.5

    Table 9.  Test accuracies obtained for datasets with 5% noise

    Datasets M-PCF P-PCF SVM 1-NN 3-NN C4.5
    Wis 86.84 96.14 96.34 89.16 94.29 92.80
    Ger 70.8 72.40 73.37 68.44 65.98 63.01
    Hab 63.9 74.44 72.17 67.29 66.25 68.47
    Hea 68.89 78.44 77.84 62.58 67.03 69.99
    Ion 67.85 85.84 89.18 88.02 89.10 88.28
    Liv 61.46 67.76 55.29 59.52 53.85 59.45
    Son 67.14 70.52 74.37 86.53 83.31 68.64
    Aus 78.21 85.37 81.74 72.75 80.15 81.16
    Monk 88.42 98.21 77.55 73.84 90.05 95.14

    Table 10.  Test accuracies obtained for datasets with 10% noise

    Datasets M-PCF P-PCF SVM 1-NN 3-NN C4.5
    Wis 84.27 95.85 96.05 86.52 91.95 93.26
    Ger 68.4 73.8 70.64 65.28 64.97 60.24
    Hab 60.54 75.65 70.28 64.23 65.42 67.93
    Hea 68.52 79.26 76.03 58.14 62.95 65.18
    Ion 65.45 83.17 81.57 86.25 88.47 85.23
    Liv 60.29 67.49 55.50 54.07 59.20 51.36
    Son 62.64 68.56 73.97 81.18 81.11 52.36
    Aus 72.01 83.48 77.54 69.42 74.93 74.89
    Monk 78.56 96.98 74.77 69.91 81.48 89.35

    Table 11.  Test accuracies obtained for datasets with 20% noise

    Datasets M-PCF P-PCF SVM 1-NN 3-NN C4.5
    Wis 79.56 95.43 95.32 81.24 84.91 90.19
    Ger 65.8 71.56 65.14 60.48 62.11 61.40
    Hab 54.28 74.54 67.25 63.28 62.47 65.37
    Hea 67.04 74.82 69.46 56.91 58.14 60.60
    Ion 62.03 80.34 81.43 82.27 86.07 79.45
    Liv 59.72 66.70 54.28 55.01 58.76 51.69
    Son 59.64 65.98 70.84 73.91 78.21 63.9
    Aus 64.34 82.03 70.73 63.04 65.36 68.41
    Monk 71.78 92.34 66.9 62.73 70.83 77.55

    Table 12.  Test accuracies obtained for datasets with 30% noise

    Datasets M-PCF P-PCF SVM 1-NN 3-NN C4.5
    Wis 69.95 92.85 92.74 75.41 77.31 87.56
    Ger 62.4 69.21 63.78 57.62 56.70 53.17
    Hab 51.83 68.66 62.87 61.82 60.44 63.53
    Hea 60.37 68.15 65.28 51.47 53.69 46.60
    Ion 61.48 80.71 76.98 78.51 84.27 77.81
    Liv 56.58 63.37 54.12 48.66 55.40 43.45
    Son 52.64 60.01 68.64 65.23 72.80 65.75
    Aus 52.78 73.77 62.46 55.65 55.8 58.84
    Monk 51.47 90.61 60.19 53.7 60.42 66.44
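
    As a reading aid for Table 12, the short script below summarizes the M-PCF and P-PCF columns at 30% noise. The numbers are copied from the table; the unweighted mean over datasets is an illustrative summary chosen here and is not a statistic reported by the source.

```python
from statistics import mean

# Test accuracies (%) copied from Table 12 (30% noise), in dataset order.
datasets = ["Wis", "Ger", "Hab", "Hea", "Ion", "Liv", "Son", "Aus", "Monk"]
m_pcf = [69.95, 62.4, 51.83, 60.37, 61.48, 56.58, 52.64, 52.78, 51.47]
p_pcf = [92.85, 69.21, 68.66, 68.15, 80.71, 63.37, 60.01, 73.77, 90.61]

mean_m_pcf = mean(m_pcf)
mean_p_pcf = mean(p_pcf)

# Per-dataset difference in percentage points (P-PCF minus M-PCF).
gains = {name: round(p - m, 2) for name, m, p in zip(datasets, m_pcf, p_pcf)}

print(f"mean M-PCF: {mean_m_pcf:.2f}, mean P-PCF: {mean_p_pcf:.2f}")
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
```

    On these values P-PCF is ahead of M-PCF on every dataset, with an unweighted mean of about 74.1 against about 57.7.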
  • [1] A. Astorino and M. Gaudioso, Polyhedral separability through successive LP, Journal of Optimization Theory and Applications, 112 (2002), 265-293.  doi: 10.1023/A:1013649822153.
    [2] K. Bache and M. Lichman, UCI Machine Learning Repository, University of California, School of Information and Computer Science, 2013. http://archive.ics.uci.edu/ml
    [3] A. M. Bagirov, Max–min separability, Optimization Methods and Software, 20 (2005), 277-296.  doi: 10.1080/10556780512331318263.
    [4] A. M. Bagirov, G. Ozturk and R. Kasimbeyli, A sharp augmented Lagrangian-based method in constrained non-convex optimization, Optimization Methods and Software, 34 (2019), 462-488.  doi: 10.1080/10556788.2018.1496431.
    [5] A. M. Bagirov, J. Ugon, D. Webb, G. Ozturk and R. Kasimbeyli, A novel piecewise linear classifier based on polyhedral conic and max–min separabilities, TOP, 21 (2013), 3-24.  doi: 10.1007/s11750-011-0241-5.
    [6] K. P. Bennett and O. L. Mangasarian, Robust linear programming discrimination of two linearly inseparable sets, Optimization Methods and Software, 1 (1992), 23-34. 
    [7] C. E. Brodley and M. A. Friedl, Identifying mislabeled training data, Journal of Artificial Intelligence Research, 11 (1999), 131-167. 
    [8] E. Cimen and G. Ozturk, O-PCF algorithm for one-class classification, Optimization Methods and Software, (2019), 1–15.
    [9] W. W. Cohen, Fast effective rule induction, Proceedings of the Twelfth International Conference on Machine Learning, ML95, San Francisco, CA, 115–123.
    [10] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, 20 (1995), 273-297.  doi: 10.1007/BF00994018.
    [11] R. N. Gasimov and G. Ozturk, Separation via polyhedral conic functions, Optimization Methods and Software, 21 (2006), 527-540.  doi: 10.1080/10556780600723252.
    [12] R. N. Gasimov and O. Ustun, Solving the quadratic assignment problem using F-MSG algorithm, Journal of Industrial and Management Optimization, 3 (2007), 173-191.  doi: 10.3934/jimo.2007.3.173.
    [13] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann and I. H. Witten, The WEKA data mining software: An update, SIGKDD Explorations, 11 (2003), 10-18. 
    [14] N. Kasimbeyli and R. Kasimbeyli, A representation theorem for Bishop-Phelps cones, Pacific Journal of Optimization, 13 (2017), 55-74. 
    [15] R. Kasimbeyli, A nonlinear cone separation theorem and scalarization in nonconvex vector optimization, SIAM Journal on Optimization, 20 (2010), 1591-1619.  doi: 10.1137/070694089.
    [16] R. Kasimbeyli, Radial epiderivatives and set-valued optimization, Optimization, 58 (2009), 521-534.  doi: 10.1080/02331930902928310.
    [17] R. Kasimbeyli and M. Karimi, Separation theorems for nonconvex sets and application in optimization, Operations Research Letters, 47 (2019), 569-573.  doi: 10.1016/j.orl.2019.09.011.
    [18] R. Kasimbeyli and M. Mammadov, Optimality conditions in nonconvex optimization via weak subdifferentials, Nonlinear Analysis: Theory, Methods and Applications, 74 (2011), 2534-2547.  doi: 10.1016/j.na.2010.12.008.
    [19] R. Kasimbeyli, O. Ustun and A. Rubinov, The modified subgradient algorithm based on feasible values, Optimization, 58 (2009), 535-560.  doi: 10.1080/02331930902928419.
    [20] D. T. Larose and C. D. Larose, Discovering knowledge in data: An introduction to data mining, John Wiley & Sons, Hoboken, NJ, 2005.
    [21] C. J. Mantas and J. Abellán, Credal-C4.5 decision tree based on imprecise probabilities to classify noisy data, Expert Systems with Applications, 41 (10) (2014), 4625-4637.
    [22] G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, John Wiley & Sons, Inc., New York, 1992. doi: 10.1002/0471725293.
    [23] G. Ozturk, A. M. Bagirov and R. Kasimbeyli, An incremental piecewise linear classifier based on polyhedral conic separation, Machine Learning, 101 (2015), 397-413.  doi: 10.1007/s10994-014-5449-9.
    [24] G. Ozturk and M. T. Ciftci, Clustering based polyhedral conic functions algorithm in classification, Journal of Industrial and Management Optimization, 11 (3) (2015), 921-932.  doi: 10.3934/jimo.2015.11.921.
    [25] J. R. Quinlan, The effect of noise on concept learning, Machine Learning, (1986), 149–166.
    [26] A. M. Rubinov and R. N. Gasimov, Strictly increasing positively homogeneous functions with applications to exact penalization, Optimization, 52 (2003), 1-28.  doi: 10.1080/0233193021000058931.
    [27] J. A. Sáez, M. Galar, J. Luengo and F. Herrera, Tackling the problem of classification with noisy data using multiple classifier systems: Analysis of the performance and robustness, Information Sciences, 247 (2013), 1-20. 
    [28] X. Zhu and X. Wu, Class noise vs. attribute noise: A quantitative study, Artificial Intelligence Review, 22 (2004), 177-210. 