CORPORATE AND PERSONAL CREDIT SCORING VIA FUZZY NON-KERNEL SVM WITH FUZZY WITHIN-CLASS SCATTER

Abstract. Nowadays, effective credit scoring has become a crucial factor for gaining competitive advantages in the credit market for both customers and corporations. In this paper, we propose a credit scoring method which combines the non-kernel fuzzy 2-norm quadratic surface SVM model, a T-test feature weighting strategy and the fuzzy within-class scatter. It is worth pointing out that this new method not only saves computational time by avoiding the choice of a kernel and the corresponding parameters required by classical SVM models, but also addresses the "curse of dimensionality" issue and improves robustness. Besides, we develop an efficient way to calculate the fuzzy membership of each training point by solving a linear programming problem. Finally, we conduct several numerical tests on two benchmark data sets of personal credit and one real-world data set of corporation credit. The numerical results strongly demonstrate that the proposed method outperforms eight state-of-the-art and commonly-used credit scoring methods in terms of accuracy and robustness.

1. Introduction. Many financial institutions have suffered heavy losses from a steady growth in customers' loan defaults during the past several financial crises. However, in order to gain high profits, those credit granting institutions cannot simply refuse the growing number of credit applications from both customers and corporations. To control credit risk, most financial institutions use credit scoring models to make credit granting decisions. Therefore, an effective credit risk assessment model plays a significant role in gaining competitive advantages in the growing credit market nowadays.
In the past few decades, many quantitative methods have been introduced to improve credit scoring accuracy. One primary stream of credit scoring models uses statistical methods. These methods are relatively easy to implement and capable of generating straightforward results. In 1936, Fisher discriminant analysis (FDA) was proposed as the first statistical method for credit scoring [7]. Then the logistic regression method was introduced for "bank crisis early warning classification" in 1977 [19]. Compared to Fisher discriminant analysis, the logistic regression model achieves great improvements in classification accuracy [19]. Besides, since logistic regression is simple and easy to interpret, it became a primary approach for real-world credit scoring [23]. The main statistical methods for credit scoring are summarized in [9]. However, these statistical methods have a few limitations. The first fatal issue is the "curse of dimensionality", which indicates that the results may become misleading due to the multi-collinearity between variables. Therefore, substantial data preprocessing effort is needed to choose the variables. Additionally, all statistical models reported in [9] rely on one or more hypotheses. In practical applications, those hypotheses, such as "the dependent variable should follow a log-normal distribution", are difficult to satisfy. Moreover, it is difficult for these statistical models to automate the modeling process and remain robust.
To improve credit scoring accuracy, researchers found that the support vector machine (SVM) is quite an effective tool for credit scoring. A benchmarking study of seventeen different classification methods on eight real-world credit data sets was presented in [8]. Note that least squares SVMs with linear and Gaussian kernels were utilized in this study. From the numerical results, the least squares SVM model overall outperforms all other state-of-the-art methods. Combined with the resampling techniques proposed in [18], the SVM model performs even better for credit scoring. Then the clustered SVM model was proposed in [11] to greatly improve efficiency while achieving good classification accuracy. Moreover, for corporate credit ratings, [26] showed that SVM beats neural networks and other state-of-the-art learning algorithms.
Moreover, since irrelevant input features may obscure the underlying structure of a data set and reduce the classification accuracy, many novel feature selection and weighting strategies have been proposed to handle this problem. Note that the feature selection and weighting process can also address the "curse of dimensionality" issue for some large-scale data sets at the same time. A hybrid strategy which combines the genetic algorithm and SVM models, simultaneously performing feature selection and tuning model parameters, was proposed in [12]. As a classical method, orthogonal dimension reduction was proposed for selecting features in [10]. However, the computational costs of feature selection can be significant. To fully utilize the information in all input features and reduce the costs of feature selection, a weighted 2-norm SVM (W2NSVM) model incorporating a feature weighting strategy was proposed for credit scoring in [27].
Notice that the classification performance of general SVM classifiers can be greatly damaged by outliers and noise points. To reduce the impact of erroneous points and better handle unbalanced data, the fuzzy (or weighted) SVM (FSVM) was introduced for credit scoring [27] and bankruptcy prediction [22] by assigning each point a fuzzy membership. Many fuzzy membership functions have been proposed for enhancing noise resistance [13]. Inspired by Fisher discriminant analysis [7], minimizing the within-class scatter of points has proved very useful in handling data sets with many outliers and noise points [4]. Hence, [1] followed this idea and proposed an FSVM model with within-class scatter. Note that this model can produce more accurate classifications than other well-known FSVM models.
To deal with linearly inseparable data sets, a general way is to use kernel functions in SVM models [8,12,15,27]. Note that kernel functions map all points from the original space to a higher dimensional space. Then SVM models separate these mapped points into two classes by a hyperplane in the new space. It is worth pointing out that the kernel and its corresponding parameters greatly influence the credit scoring results. However, there is no universal rule to automatically choose a proper kernel for a given data set, and it may take a significant amount of computational time and effort to select a proper kernel and its parameters. To overcome these drawbacks of kernel-SVM models, a non-kernel quadratic surface SVM (QSSVM) model which directly uses a quadratic surface in the original space was proposed in [17]. The experimental results show that the non-kernel QSSVM model outperforms the SVM model with Gaussian or quadratic kernel in terms of classification accuracy and efficiency. A least squares QSSVM model was proposed in [3] for target disease classifications and achieved good performance. Moreover, to handle outliers and noise points well, a fuzzy QSSVM model with within-class scatter was proposed in [16] to improve the classification performance of the QSSVM model. Besides, a new approach based on the QSSVM model and fuzzy sets was proposed in [21] to handle classification problems with mislabeled information. Because of its good performance and applicability, we choose the non-kernel QSSVM model as the fundamental tool for credit scoring in this paper. This paper mainly contributes to the field in the following four aspects:
(1) We propose a new non-kernel nonlinear SVM model for direct credit scoring. Compared with kernel-based SVM models, the non-kernel SVM model saves much computational time and improves classification accuracy;
(2) A fuzzy non-kernel QSSVM model is proposed by incorporating the 2-norm measure of misclassification errors and the "fuzzy within-class scatter" to handle outliers and noise points, and an efficient way is developed to calculate the fuzzy memberships of training points by solving an LP problem;
(3) To enhance credit scoring performance, the T-test feature weighting strategy is also included in the proposed model to address the "curse of dimensionality" issue and extract the most useful information from input features;
(4) Comprehensive numerical experiments, conducted on two benchmark personal credit data sets and one real-world corporation credit data set, validate the superior performance of the proposed method over eight well-known methods in terms of accuracy and robustness.
The rest of the paper is arranged as follows. Section 2 reviews the soft QSSVM model and develops an efficient way to calculate the fuzzy memberships. In Section 3, for credit scoring, we propose one fuzzy non-kernel QSSVM model by incorporating "fuzzy within-class scatter" of training points, the T-test feature weights and 2-norm of misclassification errors. Furthermore, the numerical results of the proposed model on two benchmark data sets of personal credit and one real-world data set of corporation credit are shown in Section 4. Finally, Section 5 concludes the paper.
2. Fuzzy membership function. In this section, we first review the soft QSSVM model and then develop an efficient way to calculate the fuzzy membership for each training point.

2.1. Review of soft QSSVM model. Given the training data set {(x_i, y_i), i = 1, ..., n}, where x_i ∈ R^m and y_i = +1 or −1 indicates the class label of point x_i for Class C_1 = {x_j | y_j = +1} or Class C_2 = {x_j | y_j = −1}. The numbers of elements in C_1 and C_2 are denoted as n_1 and n_2, respectively; then n_1 + n_2 = n. The soft QSSVM model determines the parameter set (Q, f, c) of a quadratic surface

q(x) := (1/2) x^T Q x + f^T x + c = 0,  with symmetric Q ∈ R^{m×m}, f ∈ R^m, c ∈ R,

that separates the n training points into two classes with maximum margin.
Since there is no special restriction on (Q, f, c), the quadratic classifier can approximate many types of data very well. Above all, the soft QSSVM model [17] is formulated as:

(SQSSVM)  min_{Q,f,c,ξ}  Σ_{i=1}^n ||Q x_i + f||_2^2 + η̄ Σ_{i=1}^n ξ_i
          s.t.  y_i ((1/2) x_i^T Q x_i + f^T x_i + c) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, ..., n,

where ξ_i denotes the misclassification error for point x_i and η̄ > 0 is a penalty constant. For simplicity, the model (SQSSVM) can also be reformulated as follows [17]. First, the vector q̄ is formed by the (m² + m)/2 elements of the upper triangular part of the matrix Q, i.e., q̄ = [q_11, q_12, ..., q_1m, q_22, q_23, ..., q_2m, ..., q_mm]^T ∈ R^{(m²+m)/2}.
Then, for each training point x_i, the matrix M_i ∈ R^{m×(m²+m)/2} is constructed such that M_i q̄ = Q x_i, as follows. We first check each element of q̄ in turn, then assign x_ik as the p-th element of the j-th row of M_i if the p-th element of q̄ is q_jk or q_kj for some k = 1, 2, ..., m; otherwise assign 0. Afterwards, let H_i = [M_i, I] ∈ R^{m×((m²+m)/2+m)}, i = 1, ..., n, and W = Σ_{i=1}^n H_i^T H_i, where I denotes the identity matrix. Note that W is positive semidefinite. Finally, let z = [q̄^T, f^T]^T and, for each i, define the vector s_i such that s_i^T z = (1/2) x_i^T Q x_i + f^T x_i. Hence the model (SQSSVM) can be equivalently reformulated as

(SQSSVM')  min_{z,c,ξ}  z^T W z + η̄ Σ_{i=1}^n ξ_i
           s.t.  y_i (s_i^T z + c) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, ..., n.

It should be noted that the (SQSSVM') model is a convex problem which can be solved efficiently.
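To make this linearization concrete, the following Python sketch (our own illustrative code, not the authors') builds the vector s_i and the matrix W from a small data matrix; the helper names `s_vector` and `scatter_matrix_W` are hypothetical.

```python
import numpy as np

def s_vector(x):
    """Map a point x in R^m to the vector s with s^T z = 0.5*x^T Q x + f^T x,
    where z stacks the upper-triangular entries of Q (row by row) and f."""
    m = len(x)
    r = []
    for j in range(m):
        for k in range(j, m):
            # diagonal terms q_jj carry the 1/2 factor; an off-diagonal q_jk
            # (j < k) appears once in q-bar but twice in x^T Q x
            r.append(0.5 * x[j] * x[j] if j == k else x[j] * x[k])
    return np.concatenate([np.array(r), x])

def scatter_matrix_W(X):
    """W = sum_i H_i^T H_i with H_i = [M_i, I], so z^T W z = sum_i ||Q x_i + f||^2."""
    n, m = X.shape
    d = m * (m + 1) // 2 + m
    W = np.zeros((d, d))
    for x in X:
        M = np.zeros((m, m * (m + 1) // 2))
        col = 0
        for j in range(m):
            for k in range(j, m):
                # the column for q_jk contributes x_k to row j (and x_j to row k)
                M[j, col] += x[k]
                if j != k:
                    M[k, col] += x[j]
                col += 1
        H = np.hstack([M, np.eye(m)])
        W += H.T @ H
    return W
```

A quick sanity check is that, for any symmetric Q and vector f stacked into z, s_vector(x) @ z reproduces q(x) − c and z @ W @ z reproduces the objective term Σ ||Q x_i + f||².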

2.2. An LP model for approximating the soft QSSVM model. If W is not positive definite, a perturbation can be appended such that the matrix W + εI (ε > 0) becomes positive definite. Without loss of generality, suppose that W is positive definite [17]. Then we can utilize Lagrangian duality theory to formulate the dual problem of model (SQSSVM') as follows:

(DSQSSVM)  max_β  Σ_{i=1}^n β_i − (1/4) (Σ_{i=1}^n β_i y_i s_i)^T W^{-1} (Σ_{j=1}^n β_j y_j s_j)
           s.t.  Σ_{i=1}^n β_i y_i = 0,  0 ≤ β_i ≤ η̄,  i = 1, ..., n.
Suppose that the primal optimal solution of problem (SQSSVM') is (z*, c*, ξ*) and the dual optimal solution of problem (DSQSSVM) is β* = (β*_1, β*_2, ..., β*_n). Then the following relationship between the primal and dual optimal solutions can be obtained by Lagrangian duality theory [5]:

z* = (1/2) W^{-1} Σ_{i=1}^n β*_i y_i s_i.    (1)

The optimal values of the primal and dual problems are also equal to each other:

(z*)^T W z* + η̄ Σ_{i=1}^n ξ*_i = Σ_{i=1}^n β*_i − (1/4) (Σ_{i=1}^n β*_i y_i s_i)^T W^{-1} (Σ_{j=1}^n β*_j y_j s_j).    (2)
By replacing the right-hand side of (2) using (1), we have

(z*)^T W z* + η̄ Σ_{i=1}^n ξ*_i = Σ_{i=1}^n β*_i − (z*)^T W z*.    (3)

Hence, the quadratic programming problem (SQSSVM') can be heuristically transformed into one LP problem for simplicity [20]. The objective function in problem (SQSSVM') is first replaced by Σ_{i=1}^n β_i + η̄ Σ_{i=1}^n ξ_i (similar to expression (3)); then z is substituted by (1/2) W^{-1} Σ_{i=1}^n β_i y_i s_i in the constraints of (SQSSVM') (similar to expression (1)); finally the constraints η̄ ≥ β_i ≥ 0, i = 1, ..., n, are added to obtain the following LP problem:

(LPM)  min_{β,c,ξ}  Σ_{i=1}^n β_i + η̄ Σ_{i=1}^n ξ_i
       s.t.  y_i ((1/2) (Σ_{j=1}^n β_j y_j s_j)^T W^{-1} s_i + c) ≥ 1 − ξ_i,
             0 ≤ β_i ≤ η̄,  ξ_i ≥ 0,  i = 1, ..., n.

Generally, the CPU time of solving problem (LPM) is much less than that of solving problem (SQSSVM'). After solving problem (LPM) to get the optimal solution (β̄, c̄, ξ̄), z̄ can be calculated by z̄ = (1/2) W^{-1} Σ_{i=1}^n β̄_i y_i s_i. Then the parameter set (z̄, c̄) of a separating quadratic surface is obtained.
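A problem of this (LPM) form can be sketched with SciPy's `linprog` as follows; this is our own illustrative code, not the authors' implementation. Here `S` stacks the vectors s_i as rows, `W` is the positive definite matrix from the reformulation (a small ridge guards invertibility), and `solve_lpm` is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import linprog

def solve_lpm(S, y, W, eta, ridge=1e-8):
    """Heuristic LP approximation: decision variables are (beta, c, xi).
    minimize  sum(beta) + eta * sum(xi)
    s.t.      y_i * (0.5 * s_i^T W^{-1} sum_j beta_j y_j s_j + c) + xi_i >= 1,
              0 <= beta_i <= eta,  xi_i >= 0,  c free."""
    n = len(y)
    Winv = np.linalg.inv(W + ridge * np.eye(W.shape[0]))
    # K[i, j] = 0.5 * y_i * y_j * s_i^T W^{-1} s_j (coefficient of beta_j in row i)
    K = 0.5 * (y[:, None] * y[None, :]) * (S @ Winv @ S.T)
    # linprog standard form A_ub @ x <= b_ub with x = [beta, c, xi]
    A_ub = np.hstack([-K, -y[:, None].astype(float), -np.eye(n)])
    b_ub = -np.ones(n)
    cost = np.concatenate([np.ones(n), [0.0], eta * np.ones(n)])
    bounds = [(0, eta)] * n + [(None, None)] + [(0, None)] * n
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    beta, c = res.x[:n], res.x[n]
    # recover z from the KKT-style relation z = 0.5 * W^{-1} * sum_i beta_i y_i s_i
    z = 0.5 * Winv @ (S.T @ (beta * y))
    return z, c
```

On a separable toy data set the recovered surface parameters (z, c) classify all training points correctly, mirroring how (LPM) supplies an effective separating surface for the membership computation below.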

2.3. Review of quadratic center surface and quadratic-margin distance.
In FSVM models [14,25], the authors utilize the Euclidean distance to measure the distance between two training points, and calculate the central point of each class as the mean of all training points in that class. However, in many general cases, if the training data set is not separable by a hyperplane but is separable by a quadratic surface, then the mean of all training points in one class may fall within the other class. Thus the mean of all training points in one class is not an appropriate center of the class. Therefore, it is necessary to introduce the quadratic center surface of a class and the quadratic-margin distance from [16] as follows.
Definition 2.1. In the soft QSSVM settings, q(x) = (1/n_1) Σ_{j:y_j=+1} q(x_j) (or q(x) = (1/n_2) Σ_{j:y_j=−1} q(x_j)) is called the quadratic center surface of Class C_1 (or C_2) with respect to q(x) = 0.

Definition 2.2. In the soft QSSVM settings, for y_i being +1 (or −1), |q(x_i) − (1/n_1) Σ_{j:y_j=+1} q(x_j)| (or |q(x_i) − (1/n_2) Σ_{j:y_j=−1} q(x_j)|) is called the quadratic-margin distance between point x_i and the quadratic center surface of its related class with respect to q(x) = 0.

2.4. A way to calculate fuzzy memberships. In this paper, we calculate the fuzzy membership of each training point to obtain its relative importance, based on the quadratic-margin distance between each point and the quadratic center surface of its related class, as follows. Problem (LPM) is first solved to obtain the parameters (z̄, c̄) of an effective separating surface. Then, with regard to this separating surface, we calculate the mean functional margins of all points in Classes C_1 and C_2 as

α_1 = (1/n_1) Σ_{j:y_j=+1} ((s_j)^T z̄ + c̄) ≥ 0  and  α_2 = −(1/n_2) Σ_{j:y_j=−1} ((s_j)^T z̄ + c̄) ≥ 0,

respectively. Also, as in Definition 2.2, the quadratic-margin distance d(x_i) between the training point x_i and its related quadratic center surface is calculated by d(x_i) = |(s_i)^T z̄ + c̄ − α_1| if y_i = +1, and d(x_i) = |(s_i)^T z̄ + c̄ + α_2| if y_i = −1. Then, similar to the fuzzy membership function in [16], for each training point x_i, the fuzzy membership h_i is computed as a decreasing function of d(x_i) that also considers the affinity among training points, where a constant 0.5 < p̄ < 1 is given in advance. Obviously, 0 ≤ h_i ≤ 1, i = 1, ..., n. From this fuzzy membership function, taking training points with label +1 (Class C_1) as examples: if they are farther away from the related quadratic center surface q(x) = α_1, they are more likely to be noise (or outliers) and less important, hence they have smaller fuzzy memberships. Training points with functional margins larger than 2α_1 or smaller than 0 are supposed to be outliers, so their fuzzy memberships are much smaller than those of points with functional margins between 0 and 2α_1.
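Since the exact piecewise formula of [16] is not reproduced here, the following sketch only illustrates the qualitative behavior just described: memberships near p̄ for points with functional margins inside [0, 2α], and much smaller memberships outside. The specific decay rules in the two branches are our own assumptions, not the paper's; `margins` holds the raw values (s_i)^T z̄ + c̄.

```python
import numpy as np

def fuzzy_memberships(margins, y, p_bar=0.8):
    """Illustrative fuzzy memberships (NOT the exact function of [16]):
    points near the quadratic center surface of their class get memberships
    close to p_bar; points with functional margins outside [0, 2*alpha]
    are treated as outliers and get much smaller memberships."""
    g = y * margins                              # functional margin of each point
    h = np.empty_like(g, dtype=float)
    for cls in (+1, -1):
        mask = (y == cls)
        alpha = max(g[mask].mean(), 1e-12)       # mean functional margin of the class
        d = np.abs(g[mask] - alpha)              # quadratic-margin distance
        inside = (g[mask] >= 0) & (g[mask] <= 2 * alpha)
        h[mask] = np.where(
            inside,
            p_bar * (1.0 - d / (2.0 * alpha)),   # values in [p_bar/2, p_bar]
            (1.0 - p_bar) * alpha / (alpha + d)) # values below 1 - p_bar < 0.5
    return h
```

By construction all memberships fall in [0, 1], and a point whose functional margin is negative (i.e., it sits on the wrong side of the surface) receives a much smaller membership than a well-placed point of the same class.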
3. Fuzzy non-kernel QSSVM with fuzzy within-class scatter for credit scoring. In this section, for credit scoring, we propose a fuzzy non-kernel QSSVM model which incorporates the fuzzy membership of each training point, "fuzzy within-class scatter" of training points, the 2-norm measure of misclassification errors and the T-test feature weights.

3.1. Review of fuzzy QSSVM model. By incorporating the fuzzy membership h_i, i = 1, ..., n, of each training point into the soft QSSVM model, a fuzzy QSSVM model is given below:

(FQS1)  min_{Q,f,c,ξ}  Σ_{i=1}^n h_i ||Q x_i + f||_2^2 + η̄_1 Σ_{i=1}^n h_i ξ_i
        s.t.  y_i ((1/2) x_i^T Q x_i + f^T x_i + c) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, ..., n,

where the penalty constant η̄_1 ≥ 0 needs to be chosen beforehand as a tradeoff between the two terms in the objective function. The slack variable ξ_i is a measure of misclassification error and the fuzzy membership h_i is the relative importance of point x_i toward its corresponding class. The terms h_i ξ_i and h_i ||Q x_i + f||_2^2 can be deemed measures of ξ_i and ||Q x_i + f||_2^2 weighted by h_i, respectively. If the training point x_i is more likely to be an outlier or noise, which indicates that x_i is less important, then the corresponding h_i is expected to be smaller, to reduce the effect of the variable ξ_i and the expression ||Q x_i + f||_2^2 in the model (FQS1).

3.2. Fuzzy non-kernel QSSVM with fuzzy within-class scatter for credit scoring. In this subsection, we first develop the fuzzy within-class scatter of training points. Then, for credit scoring, we propose a fuzzy non-kernel QSSVM model with fuzzy within-class scatter by incorporating the fuzzy within-class scatter, T-test feature weights and the 2-norm of misclassification errors.
Motivated by the within-class scatter in [7], the fuzzy within-class scatter with respect to a quadratic surface is introduced by considering the relative importance of each training point as follows.
Definition 3.1. In the soft QSSVM settings, we define

S(Q, f, c) = Σ_{i:y_i=+1} h_i (q(x_i) − (1/n_1) Σ_{j:y_j=+1} q(x_j))² + Σ_{i:y_i=−1} h_i (q(x_i) − (1/n_2) Σ_{j:y_j=−1} q(x_j))² ∈ R

as the fuzzy within-class scatter of training points with respect to q(x) = 0.
From this definition, we can see that the quadratic-margin distance between point x i and the quadratic center surface of its related class (see Definition 2.2) is scaled by the fuzzy membership h i of point x i . To increase the class separability, we need to minimize S(Q, f , c) for the following proposed model.
For better credit scoring performance, we first generate the T-test based feature weights (ω_j, j = 1, ..., m) as follows. For j = 1, 2, ..., m, define υ_j^+ and υ_j^− as the variances of the sets {x_ij : y_i = +1} and {x_ij : y_i = −1}, respectively; and let ρ_j^+ and ρ_j^− be the means of the sets {x_ij : y_i = +1} and {x_ij : y_i = −1}, respectively. Then the weight for feature j based on the T-test of the training data set is defined as:

ω_j = |ρ_j^+ − ρ_j^−| / sqrt(υ_j^+ / n_1 + υ_j^− / n_2),  j = 1, 2, ..., m.
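This t-statistic style weight can be sketched in a few lines; `ttest_weights` is a hypothetical helper name, and the rows of `X` are the training points with labels in `y`.

```python
import numpy as np

def ttest_weights(X, y):
    """T-test based feature weights: a larger gap between class means,
    relative to the per-class variances, gives a larger weight."""
    pos, neg = X[y == +1], X[y == -1]
    n1, n2 = len(pos), len(neg)
    rho_p, rho_n = pos.mean(axis=0), neg.mean(axis=0)       # class means per feature
    v_p, v_n = pos.var(axis=0, ddof=1), neg.var(axis=0, ddof=1)  # sample variances
    return np.abs(rho_p - rho_n) / np.sqrt(v_p / n1 + v_n / n2)
```

As expected, a feature whose values clearly separate the two classes receives a much larger weight than a feature distributed similarly in both classes.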
Notice that we adopt this method because it is among the most effective feature weighting strategies in the literature [27]. Then, to propose the fuzzy non-kernel QSSVM model with fuzzy within-class scatter for credit scoring, an LP model is first solved to find the parameters of an effective separating surface by incorporating the T-test based feature weights (ω_j, j = 1, ..., m). Similar to the proposed model (LPM) for approximating the SQSSVM model in Section 2.2, the following LP model is considered:

min_{β,c,ξ}  Σ_{i=1}^n β_i + η̄ Σ_{i=1}^n ξ_i
s.t.  y_i ((1/2) (Σ_{j=1}^n β_j y_j s̃_j)^T W̃^{-1} s̃_i + c) ≥ 1 − ξ_i,
      0 ≤ β_i ≤ η̄,  ξ_i ≥ 0,  i = 1, ..., n,

where s̃_i and W̃ are generated, respectively, by substituting ω_j x_ij for x_ij in s_i and W (shown in model (SQSSVM') of Section 2.1) for i = 1, ..., n and j = 1, ..., m. Similarly, the fuzzy memberships (h_i, i = 1, ..., n) of all training points can be obtained by substituting ω_j x_ij for x_ij in the fuzzy membership function in Section 2.4. For better credit scoring accuracy [27], we first replace the 1-norm of the slack variable vector ξ = (ξ_1, ξ_2, ..., ξ_n) with the square of its 2-norm in the objective function of the fuzzy QSSVM model (FQS1) in Section 3.1, then incorporate the T-test based feature weights (ω_j, j = 1, ..., m) and the fuzzy within-class scatter S̃(Q, f, c) (which is generated by replacing x_ij with ω_j x_ij in S(Q, f, c)) into model (FQS1). Similar to the reformulation of model (SQSSVM) in Section 2.1, writing the model in terms of z = [q̄^T, f^T]^T gives the following proposed model:

(FNKSVM-FWS)  min_{z,c,ξ}  z^T W̃ z + η̄_2 z^T S̃ z + η̄_1 Σ_{i=1}^n h_i ξ_i²
              s.t.  y_i ((s̃_i)^T z + c) ≥ 1 − ξ_i,  i = 1, ..., n,

where W̃ and S̃ are obtained by substituting ω_j x_ij for x_ij, for all i, j, in W and S, respectively. Notice that this model is a convex problem, whose global optimal solution can be generated efficiently. Moreover, the model (FNKSVM-FWS) is a fuzzy QSSVM model which uses a quadratic surface q(x) = 0 to measure the fuzzy within-class scatter. The tradeoff between the classification margin term z^T W̃ z and the fuzzy within-class scatter z^T S̃ z is controlled by the nonnegative constant η̄_2. In general, a large η̄_2 will reduce the fuzzy within-class scatter and narrow the classification margin.
4. Numerical tests on credit data. In this section, we investigate the credit scoring performance of the proposed fuzzy non-kernel QSSVM model on some real-world credit data sets. For comparison, we mainly test three groups of credit scoring methods. The first group includes commonly-used non-SVM methods such as logistic regression (denoted by "LOG REG") and feed-forward back-propagation (FFBP) neural networks (denoted by "FFBP NN"); the second group includes commonly-used kernel-based SVM methods such as the soft SVM model with Gaussian kernel (denoted by "SVM GausKer"), the weighted 2-norm SVM model with Gaussian or quadratic kernel [27] (denoted by "W2NSVM GausKer" or "W2NSVM QuadKer"), the well-known FSVM model with within-class scatter and Gaussian kernel [1] (denoted by "FSVMWCS GausKer") and the clustered SVM method [11] (denoted by "Clu SVM"); the third group includes two non-kernel SVM models, namely Dagher's QSVM model [6] and the soft QSSVM model [17]. For all SVM models, the grid method is utilized to search for the best penalty parameter η̄ as follows: log_2 η̄ ∈ {2, 3, ..., 19, 20}.
All numerical tests in this paper are conducted via MATLAB (R2014a) on a personal laptop equipped with an Intel Core i5 2.40 GHz CPU, 2.5 GB usable RAM and Microsoft Windows 7 Professional. The logistic regression and neural networks are implemented using the MATLAB modules "glmfit" and "newff", respectively. The group of non-kernel SVM models is implemented using the interior point algorithm in the "quadprog" module of MATLAB, while the group of kernel-based SVM models is implemented using the SMO algorithm in MATLAB code.

4.1. Credit data.
For the numerical experiments, we utilize two types of credit data sets: one concerns personal credit information, namely the German and Australian applicants from the UCI machine learning repository [2]; the other concerns corporation credit information, namely the suppliers of the State Grid Corporation of China. It is worth pointing out that the information on these suppliers was gathered from Nanping City in Fujian Province. Moreover, to protect the privacy of these suppliers, we deleted their names and invited 21 experienced experts to classify them as good or bad suppliers based on their features. For more details, the profiles of the three credit data sets are shown in Table 1.
For the German and Australian data sets of personal credit information, the common features include personal sex, marriage, age, salary, credit history, saving account, present residence, present employment and so forth. For the Chinese real-world data set, there are features about the credit information of corporations, such as the number of cooperations within three years, number of good credit records, number of black-list entries, number of bad deliveries, and so forth. For these three credit data sets, the information in all ordinal and continuous variables is not changed. To reflect the real world, all nominal variables without logical relationships are transformed into one or several categories accordingly. Then, for each category, one dummy variable is used to indicate whether the related applicant is in the specific state (i.e., 1 indicates yes while 0 indicates no). Moreover, for the preprocessed data points {x_i, i = 1, ..., n} with all numerical variables, where x_i = [x_i1, x_i2, ..., x_im]^T ∈ R^m, we linearly scale all input features to [0, 1] to avoid the dominance of features with greater values over those with smaller values, i.e.,

x_ij ← (x_ij − min_{1≤i≤n} x_ij) / (max_{1≤i≤n} x_ij − min_{1≤i≤n} x_ij),  i = 1, ..., n,  j = 1, ..., m.
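The per-feature scaling step can be sketched as follows; the guard for constant columns is our own addition, not something the paper discusses.

```python
import numpy as np

def minmax_scale(X):
    """Linearly scale each input feature (column) of X to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid 0/0 on constant columns
    return (X - lo) / span
```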

4.2. Numerical experiments on three credit data sets. After preprocessing the three credit data sets, the commonly-used 10-fold cross validation method is utilized to validate the performance of all models. For the German credit data set, the 10-fold cross validation procedure is repeated 100 times by randomly partitioning the data set into 10 equal-sized parts. For all models, the mean and standard deviation of the calculated misclassification rates and the average CPU time over the 100 independent tests are reported in Table 2. Moreover, the reported CPU time of solving the proposed fuzzy non-kernel QSSVM model or the weighted 2-norm SVM model includes the time of computing the fuzzy memberships of all training points. However, the reported CPU time of all methods does not include the time of tuning the parameters in the neural network, the penalty parameter η̄, or the kernel parameters in SVM. Furthermore, using the same experimental procedures as those designed for the German credit data set, we conduct experiments for all models on the Australian and Chinese credit data sets, and record the computational results in Tables 3 and 4, respectively.
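The repeated 10-fold cross-validation protocol above can be sketched as follows; `fit_predict` stands for any classifier's train-and-predict routine and `repeated_10fold` is our own abstraction, not the paper's code.

```python
import numpy as np

def repeated_10fold(X, y, fit_predict, repeats=100, folds=10, seed=0):
    """Repeat k-fold cross validation `repeats` times with fresh random
    partitions; report mean and std of the misclassification rate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    rates = []
    for _ in range(repeats):
        idx = rng.permutation(n)                 # a fresh random partition
        parts = np.array_split(idx, folds)
        for k in range(folds):
            test = parts[k]
            train = np.concatenate([parts[j] for j in range(folds) if j != k])
            pred = fit_predict(X[train], y[train], X[test])
            rates.append(np.mean(pred != y[test]))
    return float(np.mean(rates)), float(np.std(rates))
```

Any classifier with a train-and-predict interface can be plugged in, e.g. a simple nearest-centroid rule on a toy two-cluster data set.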
From Tables 2-4, we have the following observations and conclusions:
• For the two credit data sets of personal information, the proposed FNKSVM-FWS model produces much more accurate credit scorings than all compared well-known methods.
• For the Chinese credit data set of corporate information, the proposed FNKSVM-FWS model produces more accurate credit scorings than all other compared well-known methods besides logistic regression. Logistic regression slightly outperforms the fuzzy 2-norm QSSVM in terms of accuracy, but the CPU time of logistic regression is longer than that of the fuzzy 2-norm QSSVM.
In addition, the robustness of all models is further investigated by utilizing the 10-fold cross validation method on the Australian credit data with outliers, as follows. First, the Australian credit data set is randomly partitioned into 10 equal-sized parts after the same preprocessing procedure as in Section 4.1. Then, for each training data set consisting of nine parts of the Australian data, outliers are generated by randomly selecting 35% of all points and flipping the labels of these training points. Finally, using the generated training data sets with 35% outliers, the 10-fold cross validation procedure is repeated 100 times. For all models, the mean of the calculated misclassification rates over the 100 independent tests is recorded in Table 5. It should be noted that there are supposed to be no outliers (or noise) in the original Australian data, so the numerical results of experiments with and without outliers (from Table 3) are also compared in Table 5 to investigate the robustness of all models. From Table 5, we can see that the proposed FNKSVM-FWS model is the most robust one among all tested models. Moreover, Dagher's QSVM model produces much less accurate credit scores than the other tested models, mainly due to its lack of slack variables.
The proposed model is also more robust than the SQSSVM model, which indicates that the fuzzy memberships and fuzzy within-class scatter can increase the robustness.

5. Conclusions and discussions. In this paper, we first develop a new way to efficiently calculate the fuzzy memberships of training points. Then the fuzzy within-class scatter is developed to handle outliers and noise points well. Most importantly, one fuzzy non-kernel QSSVM model with fuzzy within-class scatter is proposed for credit scoring by incorporating the fuzzy within-class scatter, T-test based feature weights and the 2-norm of the slack vector. Finally, the proposed fuzzy 2-norm QSSVM credit scoring method is tested on personal and corporation credit data sets to investigate its performance. Our major findings are summarized below.
• For credit scoring, without increasing much computational time, the proposed fuzzy non-kernel QSSVM model is much more accurate and robust than other state-of-the-art credit scoring methods, including logistic regression, neural networks, and non-kernel and kernel-based SVM methods.
• Unlike well-known kernel-based SVM models, the proposed model does not need to use any kernel function or to tune the parameters of a kernel function for credit scoring. Therefore, compared to kernel-based SVMs, the new method saves much effort and time.
For future research, we are interested in developing fuzzy semi-supervised non-kernel QSSVM methods [24] for credit scoring and other issues of risk management.
Funding. Jian's research has been supported by the National Natural Science Foundation of China Grant # 71701035 and # 71831003. Tian's research has been supported by the Fundamental Research Funds for the Central Universities # JBK1805005.