A classification algorithm with Linear Discriminant Analysis and Axiomatic Fuzzy Sets

In exploratory data mining, most classifiers emphasize the accuracy and speed of the learned models but lack interpretability. In this paper, an interpretable and comprehensible classifier is proposed based on Linear Discriminant Analysis (LDA) and Axiomatic Fuzzy Sets (AFS). The algorithm utilizes LDA to extract the features with the largest inter-class variance. The proposed approach then constructs a transformation from the selected feature space to a semantic space in which samples of the same class are made as close as possible to one another, whereas samples of different classes are as far apart as possible. Moreover, a semantic description of each class can be obtained with the proposed approach. When compared with well-known classifiers such as LogisticR, C4.5Tree, SVM and KNN, the proposed method not only achieves better performance in terms of accuracy but is also interpretable and comprehensible.


1. Introduction. In knowledge discovery and machine learning, classification is one of the most important tasks [21]. Numerous classifiers exploit various fundamentals, such as Decision Trees [2], Support Vector Machines (SVM) [3], K-Nearest Neighbors (KNN) [5] and Linear Discriminant Analysis (LDA) [22]. Among these, LDA identifies the attributes that account for the most variance between classes. In essence, LDA is a supervised method using known class labels. The ultimate goal of a supervised classification method is to identify the class to which a given instance belongs, based on a given set of correctly labeled instances. The basic idea of LDA is to project high-dimensional pattern samples onto the vector space that best discriminates the classes, thereby extracting the classification information and compressing the dimensionality of the feature space. In the projected subspace, the within-class scatter of the samples is minimized while the between-class scatter is maximized, so the patterns have the best separability in that space. However, LDA lacks semantic interpretability.
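As a minimal sketch of the projection described above, the following uses scikit-learn's LinearDiscriminantAnalysis on the iris data (the data set and parameter choices here are illustrative, not the paper's experimental setup):

```python
# Sketch: LDA projects samples onto the directions that maximize
# between-class scatter relative to within-class scatter.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)            # 150 samples, 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)                  # project 4-D samples to 2-D

print(Z.shape)                               # (150, 2): at most c - 1 = 2 components
```

Note that LDA can produce at most c − 1 discriminant directions for c classes, which is why the 4 iris features reduce to 2 components here.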
Axiomatic fuzzy set (AFS) theory was proposed in [15]; it builds a relationship between data and semantic concepts. Unlike traditional fuzzy set approaches, the membership functions in AFS are derived from the data distribution instead of being crafted by hand under human guidance. AFS has been used in various fields [16,8,9,20,17], such as business intelligence [1] and computer vision [10,18], among others. Owing to these characteristics, AFS theory can be applied to machine learning and other research areas. Inspired by LDA and AFS theory, a new approach is proposed that not only helps users learn and understand a model, but is also endowed with sound logic operations and membership functions determined by the data. In sum, the AFS structure is built from the data and the features selected by LDA. In addition, global and absolute differences are utilized to explore the best separability among patterns in the space.
The remainder of this paper is organized as follows: Section 2 gives a brief introduction to AFS theory. Section 3 introduces the proposed classifier in detail. Experimental results are covered in Section 4. Finally, conclusions are given in Section 5.
2. Axiomatic fuzzy set theory. To make this paper self-contained, this section serves as a brief introduction to axiomatic fuzzy set (AFS) theory [11].

2.1. AFS concepts.
Here is a strict mathematical definition [12] that explains these concepts in detail. Let X be a set and let R be a binary relation on X. For any x, y, z ∈ X with x ≠ y, R is called a sub-order relation on X if R satisfies the following conditions: (1) if (x, y) ∈ R, then (x, x) ∈ R; (2) if (x, x) ∈ R and (y, y) ∉ R, then (x, y) ∈ R; (3) if (x, y), (y, z) ∈ R, then (x, z) ∈ R; (4) if (x, x) ∈ R and (y, y) ∈ R, then (x, y) ∈ R or (y, x) ∈ R. Now let ζ be a concept on X and let R_ζ be the binary relation it induces on X. If R_ζ is a sub-order relation on X, then ζ is called a simple concept; otherwise, ζ is called a complex concept.
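The four conditions above can be checked mechanically on a finite relation. The following is a toy verification sketch (the example relations are invented for illustration and are not part of the paper's algorithm):

```python
# Sketch: check whether a binary relation R on a finite set X satisfies the
# four sub-order conditions from the definition above.
def is_sub_order(X, R):
    R = set(R)
    for x in X:
        for y in X:
            if x != y:
                if (x, y) in R and (x, x) not in R:
                    return False   # condition (1)
                if (x, x) in R and (y, y) not in R and (x, y) not in R:
                    return False   # condition (2)
                if (x, x) in R and (y, y) in R \
                        and (x, y) not in R and (y, x) not in R:
                    return False   # condition (4)
            for z in X:
                if (x, y) in R and (y, z) in R and (x, z) not in R:
                    return False   # condition (3): transitivity
    return True

X = {1, 2, 3}
R_total = {(x, y) for x in X for y in X if x >= y}   # "x is at least as large as y"
R_bad = {(1, 2), (2, 3)}                             # violates (1) and (3)
print(is_sub_order(X, R_total), is_sub_order(X, R_bad))   # True False
```

A numeric "greater or equal" comparison like R_total is exactly the kind of relation a simple concept such as "x is large" induces on ordered data.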
2.2. AFS algebras. AFS algebras provide a way to study the logical relationships between simple concepts and complex concepts. More specifically, they form a family of completely distributive lattices EI, EII, ..., EIⁿ, E#I, E#II, ..., E#Iⁿ. Let M be a set. EM* is defined as follows:

EM* = { Σ_{i∈I} (Π_{m∈A_i} m) | A_i ⊆ M, i ∈ I },

where I is an arbitrary non-empty indexing set, Π_{m∈A_i} m denotes the conjunction (operation "∧") of the simple concepts coming from A_i ⊆ M, and Σ_{i∈I} (Π_{m∈A_i} m) denotes the disjunction (operation "∨") of all the terms Π_{m∈A_i} m. For α = Σ_{i∈I} (Π_{m∈A_i} m) and β = Σ_{j∈J} (Π_{m∈B_j} m), the AFS logic operations [13] can be written as follows:

α ∨ β = Σ_{k∈I⊎J} (Π_{m∈C_k} m), where C_k = A_k for k ∈ I and C_k = B_k for k ∈ J (I and J taken disjoint);
α ∧ β = Σ_{i∈I, j∈J} (Π_{m∈A_i∪B_j} m).

2.3. AFS structure. The AFS structure captures the key relationship between the data distribution and the semantic concepts [15], and is defined as follows. Let X and M be sets, let 2^M be the power set of M, and let τ : X × X → 2^M. (M, τ, X) is treated as an AFS structure if the following two axioms hold:

AX1: for any (x, y) ∈ X × X, τ(x, y) ⊆ τ(x, x);
AX2: for any (x, y), (y, z) ∈ X × X, τ(x, y) ∩ τ(y, z) ⊆ τ(x, z).

Here X is the domain, M is a set of simple concepts on X, and τ can be treated as a kind of structure on the data.
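The two logic operations have a direct computational reading if a concept Σ_i Π_{m∈A_i} m is stored as a collection of conjunctions {A_1, A_2, ...}. The sketch below uses this representation (chosen for illustration; it omits the reduction to minimal elements that the full EM lattice performs):

```python
# Sketch: EM lattice operations on concepts stored as sets of frozensets,
# where each frozenset A_i is one conjunction of simple concepts.
def em_or(alpha, beta):
    # alpha ∨ beta: pool all the conjunctions A_i and B_j
    return alpha | beta

def em_and(alpha, beta):
    # alpha ∧ beta: pairwise unions A_i ∪ B_j
    return {a | b for a in alpha for b in beta}

# Simple concepts m1 ("f1 is small") and m4 ("f2 is small"):
alpha = {frozenset({"m1"})}
beta = {frozenset({"m4"})}
print(em_and(alpha, beta))   # one term: "f1 is small AND f2 is small"
print(em_or(alpha, beta))    # two terms: "f1 is small OR f2 is small"
```

In the full theory, a term A_i that is a superset of another term A_j is redundant and is dropped; the sketch keeps all terms for brevity.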
2.4. AFS membership function. Let X be a set, M a set of simple concepts on X, and (M, τ, X) an AFS structure based on a data set. For x ∈ X and A ⊆ M, A^τ(x) is defined as follows:

A^τ(x) = { y ∈ X | τ(x, y) ⊇ A }.

The membership function is defined as follows [14]: for a concept ξ = Σ_{i∈I} (Π_{m∈A_i} m) ∈ EM and x ∈ X,

μ_ξ(x) = sup_{i∈I} ( Σ_{u∈A_i^τ(x)} N_u ρ_{A_i}(u) / Σ_{u∈X} N_u ρ_{A_i}(u) ),   (5)

and, for ∀x ∈ Ω,

μ_ξ(x) = sup_{i∈I} ( ∫_{A_i^τ(x)} ρ_{A_i} dP / ∫_Ω ρ_{A_i} dP ),   (6)

where N_u represents how many times sample u ∈ X is observed, M is a set of simple concepts on X, and ρ_γ(x) is the weight function of the simple concept γ. If ρ_γ(x) is continuous on Ω for every simple concept γ ∈ M and X is a data set randomly drawn from (Ω, F, P), then as |X| tends to infinity the membership function defined by Eq. (5) converges to that defined by Eq. (6).
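For a single simple concept such as "f is large", with all weights and observation counts equal to 1, the empirical membership of x reduces to the fraction of observed samples whose value does not exceed f(x). A minimal sketch with made-up data values:

```python
# Sketch: empirical AFS membership of "f is large" with ρ ≡ 1 and N_u ≡ 1,
# i.e. μ(x) = |A^τ(x)| / |X| = fraction of samples y with f(y) <= f(x).
def membership_large(values):
    n = len(values)
    return [sum(1 for v in values if v <= x) / n for x in values]

vals = [0.1, 0.4, 0.4, 0.9]
mu = membership_large(vals)
print(mu)   # [0.25, 0.75, 0.75, 1.0] -- larger values get higher membership
```

This illustrates how the membership degrees come from the data distribution itself rather than from a hand-designed curve.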
3. The proposed classifier. LDA is a supervised learning approach that chooses a projection in k-dimensional space such that the projected samples belonging to the same class are as close as possible, while the distances between different classes are as large as possible. Owing to these characteristics, LDA is applied to transform the data into a new abstract space. All the steps of the whole process are shown in Fig. 1.
1: for x_i ∈ X do
2:     Choose the simple concept set B^ε_{x_i} by a threshold ε
3:     ...
4:     Form the description ζ_{x_i} for x_i
5: end for
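The per-sample descriptions produced by the loop above are ultimately aggregated into one description per class, and a sample is labeled by the class whose description it satisfies most. The following toy sketch shows that maximum-membership decision rule; the three membership functions are invented stand-ins, not the paper's learned ones:

```python
# Sketch: classification by maximum membership over per-class descriptions.
def classify(sample, class_memberships):
    # class_memberships: {label: function mapping a sample to a degree in [0, 1]}
    return max(class_memberships, key=lambda c: class_memberships[c](sample))

# Toy memberships in the spirit of "f1 is small / medium / large":
mems = {
    1: lambda s: max(0.0, 1.0 - 2.0 * s),               # f1 is small
    2: lambda s: max(0.0, 1.0 - abs(2.0 * s - 1.0)),    # f1 is medium
    3: lambda s: max(0.0, 2.0 * s - 1.0),               # f1 is large
}
print(classify(0.1, mems))    # 1
print(classify(0.5, mems))    # 2
print(classify(0.95, mems))   # 3
```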
The class description is selected by G_{C_i} = arg max ..., A_{C_i} = arg max .... Finally, the descriptions of each class are used to classify the samples. Similar to Eq. (4), for a test sample s and A ⊆ M, one has A^τ(s) = { y ∈ X | τ(s, y) ⊇ A }. The membership degree μ_{Ξ_{C_i}}(s) of the sample s belonging to Ξ_{C_i} = Σ_{i∈I} (Π_{m∈A_i} m), i = 1, 2, ..., c, can be obtained by Eq. (5). Then, the class label l of the sample s is given by l = arg max_{i=1,...,c} μ_{Ξ_{C_i}}(s).

4. Experimental results. In order to demonstrate the superiority of the proposed method, several data sets and several comparison methods were used in this experiment. To eliminate random error, 10 times 10-fold cross-validation was used in this paper. The data sets were obtained from the UCI repository [4]: heart.statlog (heart), Breast Cancer Coimbra (breast c), seeds, wine, iris, heat, Data User Modeling Dataset Hamdi Tolga KAHRAMAN (USD), vertebral column (column 2c), caesarian, immunotherapy [6,7], and SomervilleHappinessSurvey2015 (SHS2015).

4.1. Case study (iris data set). The features obtained by the LDA algorithm are called abstract features. After normalizing these features, we define the simple concepts "small", "medium" and "large" on each of them. To better illustrate the approach, the iris data set is used as an example. The iris data have 4 features. Using LDA, the 4 features are transformed into 2 abstract features f_1 and f_2. The maximum, average and minimum of each abstract feature are shown in Table 1. The simple concepts are then defined as follows: m_1: f_1 is small; m_2: f_1 is medium; m_3: f_1 is large; m_4: f_2 is small; m_5: f_2 is medium; m_6: f_2 is large. Fig. 2 is a scatter plot that shows the samples in the abstract feature space. The membership of every simple concept and of all their combinations on all samples is determined with the aid of Eq. (5), where ρ = 1. The next step is to form the descriptions of the samples. The proposed method has three threshold values to filter simple concepts and their combinations. Fig. 3 shows the membership degree of the best description m_1 of the sample S_1 on all data. Step by step, the description of each sample can be obtained. Using the proposed approach, the class descriptions for the iris data are: Class 1: m_1 (f_1 is small); Class 2: m_2 (f_1 is medium); Class 3: m_3 (f_1 is large). The memberships of the class descriptions on all samples are shown in Fig. 4, Fig. 5 and Fig. 6, which illustrate that the proposed method can distinguish the three classes of the iris data set. Fig. 7 shows the membership functions of the three classes. The semantic descriptions of the three classes read: Class 1: if abstract feature f_1 is small, then the sample belongs to Class 1. Class 2: if abstract feature f_1 is medium, then the sample belongs to Class 2. Class 3: if abstract feature f_1 is large, then the sample belongs to Class 3.
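The front end of this case study can be sketched as follows: project iris onto the 2 abstract features with LDA, normalize them to [0, 1], and inspect the per-class mean of f_1 (the paper's concept thresholds and the subsequent description-building step are not reproduced here):

```python
# Sketch: abstract-feature extraction and normalization for the iris case study.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
F = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
F = (F - F.min(axis=0)) / (F.max(axis=0) - F.min(axis=0))   # rescale to [0, 1]

for c in np.unique(y):
    # per-class mean of the first abstract feature f1
    print(c, round(float(F[y == c, 0].mean()), 2))
```

The three class means of f_1 land in clearly separated regions of [0, 1], which is what lets the single-feature concepts "f_1 is small/medium/large" describe the three classes.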

4.2. Comparative analysis. For comparative analysis, several classical classification methods are considered in this paper: logistic regression (LogisticR), C4.5Tree, support vector machine (SVM) and k-nearest neighbor (KNN). These algorithms are implemented with scikit-learn in Python and all parameters are left at their defaults [19], except that for SVM the regularization coefficient C is chosen from {0.01, 0.1, 0.5, 1, 2, 10} and the RBF kernel width σ from {0.001, 0.01, 0.1, 0.5, 1, 5, 10}. The experimental accuracy rates are shown in Table 2, where the bold numbers mark the highest accuracy rates. As Table 2 shows, the proposed method achieves better performance than the other methods. Besides, the proposed method provides semantic interpretation in the abstract feature space, a characteristic the other methods lack. Thus, the proposed method is a novel classifier with good performance and attractive features.
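A minimal sketch of such a comparison with scikit-learn defaults is given below (a single 10-fold cross-validation on iris; the paper's full protocol repeats 10-fold CV ten times over several UCI data sets, and scikit-learn offers CART rather than an exact C4.5):

```python
# Sketch: cross-validated accuracy comparison of the baseline classifiers.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
models = {
    "LogisticR": LogisticRegression(max_iter=1000),
    "C4.5Tree": DecisionTreeClassifier(),   # CART stand-in for C4.5
    "SVM": SVC(kernel="rbf"),               # sklearn uses gamma for the RBF width
    "KNN": KNeighborsClassifier(),
}
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f}")
```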

5. Conclusions. In this paper, a new classifier has been proposed that combines linear discriminant analysis (LDA) and axiomatic fuzzy set (AFS) theory. It provides an innovative way to transform the data into another feature space. The proposed method transforms the raw features into an abstract feature space and then converts the abstract features into a semantic space, so the resulting classifier has a semantic interpretation that helps users understand it easily. In conclusion, the proposed classifier with semantic interpretation based on axiomatic fuzzy set theory performs well. In the future, two issues remain to be studied: how to obtain the parameter values automatically from the data distribution, and how to develop semantic interpretations in terms of the features of the raw data.