A NEW SEMI-SUPERVISED CLASSIFIER BASED ON MAXIMUM VECTOR-ANGULAR MARGIN

Abstract. Semi-supervised learning is an attractive method for classification problems when insufficient training information is available. In this investigation, a new semi-supervised classifier is proposed based on the concept of maximum vector-angular margin (called S³MAMC), the main goal of which is to find an optimal vector c as close as possible to the center of the dataset consisting of both labelled and unlabelled samples. This gives S³MAMC better generalization with a smaller VC (Vapnik-Chervonenkis) dimension. However, the S³MAMC formulation is a non-convex model and therefore difficult to solve. We therefore present two optimization algorithms, a mixed integer quadratic program (MIQP) and a DC (difference of convex functions) program algorithm, to solve the S³MAMC. Numerical experiments on real and synthetic databases demonstrate that the S³MAMC improves generalization over supervised learning methods when the labelled samples are relatively few. In addition, the S³MAMC achieves competitive generalization compared with traditional semi-supervised classification methods.

1. Introduction. In many classification problems, labelled data are scarce because manual labelling is often expensive, whereas unlabelled data are abundant and easy to collect. Moreover, when there are relatively few labelled data, a frequent drawback in classification is overfitting the training data with a consequent loss of generality. Using both labelled and unlabelled data for learning is called semi-supervised learning, the main goal of which is to improve generalization by incorporating unlabelled data in training when insufficient training information is available. Recently, semi-supervised learning has become an important topic in both theory and practice. The semi-supervised support vector machine (S³VM) [4,6,8,14] is a popular semi-supervised learning framework and has demonstrated its effectiveness in machine learning. However, the parameters of the S³VM balance only the model complexity and the misclassification errors, and thus do not permit any other theoretical interpretation.
For different datasets, the number of data points, the data dimension and the shape of the data distribution can differ considerably, and therefore different models are used for different datasets. Recently, a maximum vector-angular margin classifier (MAMC) [5] has been proposed based on a new concept of vector-angular margin. The main idea of MAMC is to find a vector c as close as possible to the data center, in the sense of a smaller VC dimension [17]. Unlike the standard support vector machine (SVM) [16,17], where the number of support vectors cannot be controlled by certain parameters, MAMC can effectively control the number of support vectors through the model parameter ν. However, this supervised MAMC requires a large number of labelled data in order to construct an accurate classifier.
In this work, we extend the MAMC to the semi-supervised learning framework and propose a new semi-supervised classifier based on the maximum vector-angular margin (called S³MAMC). The advantages of doing so are twofold: (1) S³MAMC finds an optimal vector c close to the center of the whole dataset, including both labelled and unlabelled data. Thus the S³MAMC has better generalization with a smaller VC dimension than the supervised MAMC, since the upper bound of the VC dimension depends theoretically on the minimal radius of the smallest sphere enclosing all samples, both labelled and unlabelled.
(2) S³MAMC inherits the advantages of MAMC and the ν-support vector classifier (ν-SVC) [12]. It can effectively control the number of support vectors through the model parameter ν, which gives the parameter ν in S³MAMC a better theoretical interpretation than the penalty factors in the SVM and S³VM models.
However, the S³MAMC model is nonconvex and nonsmooth, which makes the problem difficult to solve. In this work, we first give a mixed integer quadratic programming (MIQP) algorithm to obtain an exact solution of S³MAMC. Following that, we reformulate S³MAMC as a DC (difference of convex functions) program [1,2,5,11,13]. The resulting DCA (DC algorithm) converges finitely and only requires solving one quadratic program at each iteration.
2.1. ν-SVC. The standard SVM is a supervised learning method. Given labelled samples $(x_i, y_i)$, $i = 1, \dots, m$, with $y_i \in \{-1, +1\}$, it classifies two-class datasets by constructing a separating hyperplane, and the corresponding primal optimization problem is

$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2+\lambda\sum_{i=1}^{m}\xi_i \quad \text{s.t. } y_i(w^Tx_i+b)\ge 1-\xi_i,\ \xi_i\ge 0,\ i=1,\dots,m, \tag{1}$$

where λ > 0 does not permit any theoretical interpretation other than the trade-off between model complexity and empirical risk. Schölkopf et al. [12] proposed a modification of the SVM model, called ν-SVC, by introducing a new parameter ν and an additional variable ρ:

$$\min_{w,b,\xi,\rho}\ \frac{1}{2}\|w\|^2-\nu\rho+\frac{1}{m}\sum_{i=1}^{m}\xi_i \quad \text{s.t. } y_i(w^Tx_i+b)\ge \rho-\xi_i,\ \xi_i\ge 0,\ \rho\ge 0, \tag{2}$$

where the parameter ν controls the number of support vectors and the bound on the classification errors. The corresponding decision function still takes the form $f(x)=\mathrm{sgn}(w^Tx+b)$, but the separating margin becomes $2\rho/\|w\|$, which differs from the traditional SVM. In practice, the dual of problem (2) is solved.
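For concreteness, the ν-SVC primal (2) can be prototyped directly with an off-the-shelf convex solver. The sketch below uses Python with cvxpy (our illustrative choice; the paper's own experiments use Matlab), and the names nu_svc, X, y are assumptions, not from the paper.

```python
import cvxpy as cp

def nu_svc(X, y, nu=0.5):
    """Solve the nu-SVC primal (2); X is (m, n), y is a +/-1 vector."""
    m, n = X.shape
    w, b, rho = cp.Variable(n), cp.Variable(), cp.Variable()
    xi = cp.Variable(m)
    objective = cp.Minimize(0.5 * cp.sum_squares(w) - nu * rho + cp.sum(xi) / m)
    constraints = [cp.multiply(y, X @ w + b) >= rho - xi, xi >= 0, rho >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value, rho.value
```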

2.2. Maximum vector-angular margin classifier (MAMC). The MAMC is a supervised classification method. By finding an optimal vector c from the labelled samples, MAMC separates two-class samples in terms of the maximum vector-angular margin between the vector c and the labelled samples. Hu et al. define MAMC as the following optimization [10]:

$$\min_{c,\rho,\xi}\ \frac{1}{2}\|c\|^2-\nu\rho+\lambda\sum_{i=1}^{m}\xi_i \quad \text{s.t. } y_i\,x_i^Tc\ge \rho-\xi_i,\ \xi_i\ge 0,\ i=1,\dots,m, \tag{3}$$

where $\xi_i$ is a slack variable and the second term in the objective function makes the vector c as close as possible to the center of the labelled data. Here ρ is the margin in the sense of vector-angular distance, which is not a Euclidean distance. By Euclidean geometry, $x_i^Tc=\cos(\theta_i)\,\|x_i\|\,\|c\|$, where $\theta_i$ is the angle between $x_i$ and c. Therefore, $x_i^Tc$ reflects information about both the angular and the Euclidean distance between $x_i$ and c. The corresponding decision function is $f(x)=\mathrm{sgn}(x^Tc)$.
The dual problem of MAMC (3) has the form

$$\max_{\alpha}\ -\frac{1}{2}\Big\|\sum_{i=1}^{m}\alpha_i y_i x_i\Big\|^2 \quad \text{s.t. } \sum_{i=1}^{m}\alpha_i=\nu,\ 0\le\alpha_i\le\lambda,\ i=1,\dots,m. \tag{4}$$

We see that the parameter ν > 0 exactly controls the sum of the dual variables $\alpha_i$, which implies that MAMC inherits the main advantage of ν-SVC. Furthermore, the parameter ν has the following theoretical interpretations [12]. Suppose ρ > 0; then the following two statements hold: (1) $\nu/(m\lambda)$ is a lower bound on the fraction of support vectors;
(2) $\nu/(m\lambda)$ is an upper bound on the fraction of misclassified samples.
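The dual (4) is a box-constrained QP and is straightforward to prototype. Below is a minimal sketch in Python with cvxpy; the function name mamc_dual and its default parameters are illustrative, and feasibility of (4) requires ν ≤ mλ.

```python
import cvxpy as cp

def mamc_dual(X, y, nu=1.0, lam=1.0):
    """Solve the MAMC dual (4) and recover c = sum_i alpha_i * y_i * x_i."""
    m = X.shape[0]
    Z = y[:, None] * X                      # rows are y_i * x_i
    alpha = cp.Variable(m)
    objective = cp.Minimize(0.5 * cp.sum_squares(Z.T @ alpha))
    constraints = [cp.sum(alpha) == nu, alpha >= 0, alpha <= lam]
    cp.Problem(objective, constraints).solve()
    c = Z.T @ alpha.value                   # optimal vector c
    return c, alpha.value
```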
2.3. DC programming. We outline the main algorithmic results for DC programming [1,2,5,11,13]. DC programming and DCA, introduced by Pham Dinh Tao in 1985, constitute the backbone of nonconvex continuous programming. The key idea is to decompose the objective function into a difference of two convex (DC) functions, from which a sequence of convex approximations of the objective yields a sequence of solutions converging to a stationary point, possibly an optimal solution. Generally speaking, a so-called DC program $(P_{dc})$ minimizes a DC function:

$$(P_{dc})\qquad \inf\{f(x)=g(x)-h(x):x\in\mathbb{R}^n\},$$

where g(x) and h(x) are both convex functions. Let $g^*(y)=\sup\{x^Ty-g(x):x\in\mathbb{R}^n\}$ denote the conjugate function of g. The Fenchel-Rockafellar dual of $(P_{dc})$ is defined as

$$(D_{dc})\qquad \inf\{h^*(y)-g^*(y):y\in\mathbb{R}^n\}.$$

A DC program is called a polyhedral DC program when either g(x) or h(x) is a polyhedral convex function (i.e., the pointwise supremum of a finite collection of affine functions). The DCA is an iterative algorithm based on local optimality conditions and duality. The idea of DCA is simple (for simplicity, we omit the dual part): at each iteration, one replaces in the primal DC problem $(P_{dc})$ the second component h by its affine minorization $h(x^k)+(x-x^k)^Ty^k$, which generates the convex program

$$x^{k+1}\in\arg\min\{g(x)-[h(x^k)+(x-x^k)^Ty^k]:x\in\mathbb{R}^n\},\qquad y^k\in\partial h(x^k), \tag{7}$$

where $\partial h$ is the subdifferential of the convex function h. In practice, a simplified form of the DCA is used: two sequences $\{x^k\}$ and $\{y^k\}$ satisfying $y^k\in\partial h(x^k)$ are constructed, and $x^{k+1}$ is a solution to the convex program (7). DCA is a descent method without line search, and it converges linearly for general DC programs. In particular, for polyhedral DC programs, the sequence $\{x^k\}$ contains finitely many elements, and the algorithm converges in a finite number of iterations to a stationary point satisfying the necessary local optimality conditions.
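The simplified DCA scheme just described fits in a few lines. Below is a generic sketch in Python, where argmin_convex (the solver of the convex subproblem (7)) and subgrad_h are user-supplied placeholders, not objects defined in the paper.

```python
import numpy as np

def dca(x0, argmin_convex, subgrad_h, tol=1e-6, max_iter=200):
    """Generic DCA for f = g - h: iterate x^{k+1} in argmin_x g(x) - x^T y^k."""
    x = x0
    for _ in range(max_iter):
        y = subgrad_h(x)           # y^k in the subdifferential of h at x^k
        x_new = argmin_convex(y)   # solve the convex subproblem (7)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new           # stationary point reached
        x = x_new
    return x
```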
3. Semi-supervised maximum vector-angular margin classifier (S³MAMC). When the dataset lacks labelled samples, the supervised MAMC is difficult to apply. Therefore, motivated by the supervised MAMC and the traditional S³VM [4], we propose a new semi-supervised classification method based on the concept of the maximum vector-angular margin, called S³MAMC, the main idea of which is to find an optimal vector c as close as possible to the center of the whole dataset consisting of labelled and unlabelled data. This implies that the S³MAMC has better generalization than MAMC and ν-SVC from the perspective of the VC dimension, since the upper bound of the VC dimension depends theoretically on the minimal radius of the smallest sphere enclosing all samples, both labelled and unlabelled.
Specifically, for each unlabelled sample $x_j$, we add two constraints and introduce two variables $r_j$ and $s_j$ to represent the two possible misclassification errors. One constraint measures the misclassification error $r_j$ as if the sample $x_j$ were in the positive class, and the other measures the misclassification error $s_j$ as if $x_j$ were in the negative class. The objective function charges the minimum of the two possible misclassification errors, $\min\{r_j,s_j\}$ $(j=m+1,\dots,m+p)$, and the final class of the unlabelled sample $x_j$ corresponds to the one with the smaller error. This motivation can be formulated as the following optimization problem:

$$\begin{aligned}\min_{c,\rho,\xi,r,s}\quad & \frac{1}{2}\|c\|^2-\nu\rho+\lambda\sum_{i=1}^{m}\xi_i+\mu\sum_{j=m+1}^{m+p}\min\{r_j,s_j\}\\ \text{s.t.}\quad & y_i\,x_i^Tc\ge\rho-\xi_i,\quad \xi_i\ge0,\quad i=1,\dots,m,\\ & x_j^Tc\ge\rho-r_j,\quad r_j\ge0,\quad j=m+1,\dots,m+p,\\ & -x_j^Tc\ge\rho-s_j,\quad s_j\ge0,\quad j=m+1,\dots,m+p,\end{aligned} \tag{8}$$

where λ, µ > 0 are penalty parameters for misclassification of the labelled and unlabelled samples, respectively, and ρ is the vector-angular margin between the two classes. The first three terms of the objective function together with the first two constraints correspond to a supervised MAMC. The last term of the objective function together with the last four constraints classifies each unlabelled sample into the category with the smaller misclassification error. The decision function is $f(x)=\mathrm{sgn}(x^Tc)$.
However, S³MAMC (8) is a nonconvex and nonsmooth problem owing to the last term in the objective function, which precludes the direct use of convex and smooth optimization methods and makes the problem difficult to solve.
The following should be noted: (1) S³MAMC extends the supervised MAMC to the semi-supervised learning setting. When the unlabelled sample set is empty, S³MAMC reduces to the supervised MAMC; thus S³MAMC includes and extends the MAMC.
(2) The goal of S³MAMC is to find a vector c as close as possible to the center of the whole dataset, including labelled and unlabelled data, and therefore it has better generalization with a smaller VC dimension than ν-SVC and the supervised MAMC.
(3) Similar to the MAMC and ν-SVC, S³MAMC can effectively control the number of support vectors and the classification errors through the additional variable ρ and the new parameter ν.
(4) Compared with the popular semi-supervised methods, the S³VMs, the main benefit of S³MAMC is that its parameter ν has a better theoretical interpretation than the penalty factors in S³VMs.
3.1. Solving S³MAMC via mixed integer quadratic programming (MIQP-S³MAMC). In this section, we present a combinatorial optimization technique to globally solve the S³MAMC. Specifically, by introducing an integer variable $d\in\mathbb{R}^p$ with components $d_j\in\{0,1\}$ for each unlabelled sample $x_j$ $(j=m+1,\dots,m+p)$, the S³MAMC (8) can be posed as the following mixed integer quadratic program, called MIQP-S³MAMC:

$$\begin{aligned}\min_{c,\rho,\xi,r,s,d}\quad & \frac{1}{2}\|c\|^2-\nu\rho+\lambda\sum_{i=1}^{m}\xi_i+\mu\sum_{j=m+1}^{m+p}(r_j+s_j)\\ \text{s.t.}\quad & y_i\,x_i^Tc\ge\rho-\xi_i,\quad \xi_i\ge0,\quad i=1,\dots,m,\\ & x_j^Tc\ge\rho-r_j-M(1-d_j),\quad r_j\ge0,\\ & -x_j^Tc\ge\rho-s_j-Md_j,\quad s_j\ge0,\\ & d_j\in\{0,1\},\quad j=m+1,\dots,m+p,\end{aligned} \tag{9}$$

where M > 0 is a sufficiently large constant such that if $d_j=0$ then $r_j=0$ is feasible for any optimal c, which attempts to classify the unlabelled sample $x_j$ into the negative class; likewise, if $d_j=1$ then $s_j=0$ is feasible, which attempts to classify $x_j$ into the positive class. Solving problem (9), e.g. with the popular YALMIP Toolbox [15], yields an exact solution of S³MAMC (8). In general, this global optimization can be computationally very demanding.
According to the above analysis, the algorithm for solving MIQP-S³MAMC (9) is described as follows.
Algorithm 1 (MIQP-S³MAMC)
(1) Choose a sufficiently large constant M > 0 and suitable parameters λ, µ, ν > 0.
(2) Solve the mixed integer quadratic program (9) to obtain an optimal solution $(c^*,\rho^*,\xi^*,r^*,s^*,d^*)$.
(3) Construct the decision function $f(x)=\mathrm{sgn}(x^Tc^*)$.
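A possible prototype of Algorithm 1 in Python with cvxpy is sketched below (the paper itself uses the YALMIP Toolbox in Matlab). A MIQP-capable solver such as GUROBI, MOSEK or SCIP must be installed; all names and default parameter values are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

def miqp_s3mamc(Xl, yl, Xu, nu=10.0, lam=100.0, mu=100.0, M=1e4):
    """Solve MIQP-S3MAMC (9): Xl/yl are labelled data, Xu unlabelled."""
    m, n = Xl.shape
    p = Xu.shape[0]
    c, rho = cp.Variable(n), cp.Variable()
    xi, r, s = cp.Variable(m), cp.Variable(p), cp.Variable(p)
    d = cp.Variable(p, boolean=True)        # d_j = 1 -> positive class
    objective = cp.Minimize(0.5 * cp.sum_squares(c) - nu * rho
                            + lam * cp.sum(xi) + mu * cp.sum(r + s))
    constraints = [cp.multiply(yl, Xl @ c) >= rho - xi, xi >= 0,
                   Xu @ c >= rho - r - M * (1 - d), r >= 0,
                   -(Xu @ c) >= rho - s - M * d, s >= 0]
    cp.Problem(objective, constraints).solve()
    return c.value, np.sign(Xu @ c.value)   # vector c and unlabelled predictions
```

The returned predictions follow the decision rule $f(x)=\mathrm{sgn}(x^Tc)$.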
3.2. Solving S³MAMC via DC programming (DCA-S³MAMC). In this section, we discuss an approximation algorithm for solving the S³MAMC (8). By a suitable decomposition of the objective function, the S³MAMC (8) can be transformed into a DC program.
Let $x=(\rho,c,\xi,r,s)$ and let Ω denote the feasible set defined by the constraints of problem (8). The S³MAMC (8) can be expressed as the following unconstrained optimization problem, called DCA-S³MAMC:

$$\min\{F(x)=G(x)-H(x)\}, \tag{14}$$

with

$$G(x)=\chi_\Omega(x)+\frac{1}{2}\|c\|^2-\nu\rho+\lambda\sum_{i=1}^{m}\xi_i+\mu\sum_{j=m+1}^{m+p}(r_j+s_j),\qquad H(x)=\mu\sum_{j=m+1}^{m+p}\max\{r_j,s_j\},$$

where we use $\min\{r_j,s_j\}=(r_j+s_j)-\max\{r_j,s_j\}$, and $\chi_\Omega(x)$ denotes the indicator function of the set Ω:

$$\chi_\Omega(x)=\begin{cases}0,& x\in\Omega,\\ +\infty,& x\notin\Omega.\end{cases}$$

Here G(x) is a convex function and H(x) is a polyhedral convex function. Thus DCA-S³MAMC (14) is a polyhedral DC program.
According to the analysis in Section 2, performing DCA for problem (14) amounts to computing two sequences $\{x^{k+1}\}$ and $\{y^k\}$ with $y^k\in\partial H(x^k)$, where $x^{k+1}$ solves the convex program

$$x^{k+1}\in\arg\min_{x\in\Omega}\ \frac{1}{2}\|c\|^2-\nu\rho+\lambda\sum_{i=1}^{m}\xi_i+\mu\sum_{j=m+1}^{m+p}(r_j+s_j)-x^Ty^k.$$

A subgradient of the convex function H(x) can be computed componentwise:

$$y\in\partial H(x):\qquad y_{r_j}=\mu,\ y_{s_j}=0\ \text{if } r_j\ge s_j;\qquad y_{r_j}=0,\ y_{s_j}=\mu\ \text{otherwise}, \tag{19}$$

with all remaining components equal to zero.
Algorithm 2 (DCA-S³MAMC)
(1) Choose an initial point $x^0$, a tolerance ε > 0, and set k = 0.
(2) Calculate $y^k\in\partial H(x^k)$ according to (19) and solve the quadratic program

$$\min_{(c,\rho,\xi,r,s)\in\Omega}\ \frac{1}{2}\|c\|^2-\nu\rho+\lambda\sum_{i=1}^{m}\xi_i+\mu\sum_{j=m+1}^{m+p}(r_j+s_j)-x^Ty^k$$

to obtain $x^{k+1}$.
(3) If $\|x^{k+1}-x^k\|\le\varepsilon$, the iteration stops and $x^{k+1}$ is the computed solution; otherwise, set k = k + 1 and go to step (2).
Theorem 3.2. (1) DCA generates two sequences $\{x^k\}$ and $\{y^k\}$ such that $G(x^k)-H(x^k)$ and $H^*(y^k)-G^*(y^k)$ decrease monotonically.
(2) After a finite number of iterations, the sequences $\{x^k\}$ and $\{y^k\}$ converge to $x^*$ and $y^*$, respectively, and $x^*$ is a stationary point of the DC program (14).
Proof. The first conclusion is a direct consequence of the convergence properties of general DC programs. Since the DC program (14) is polyhedral, Algorithm 2 converges finitely, and its limit point is a stationary point of (14). The proof is complete.
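Algorithm 2 can be prototyped as follows in Python with cvxpy: each pass computes a subgradient of H via (19) and solves one convex QP. The zero initialization, the stopping rule on c, and all names are illustrative assumptions, not details fixed by the paper.

```python
import numpy as np
import cvxpy as cp

def dca_s3mamc(Xl, yl, Xu, nu=10.0, lam=100.0, mu=100.0, tol=1e-6, max_iter=50):
    """DCA for S3MAMC: one convex QP per iteration (Algorithm 2 sketch)."""
    m, n = Xl.shape
    p = Xu.shape[0]
    r_k, s_k = np.zeros(p), np.zeros(p)    # initial (r^0, s^0)
    c_prev = np.zeros(n)
    for _ in range(max_iter):
        u = np.where(r_k >= s_k, mu, 0.0)  # y^k components for r_j, see (19)
        v = mu - u                          # y^k components for s_j
        c, rho = cp.Variable(n), cp.Variable()
        xi, r, s = cp.Variable(m), cp.Variable(p), cp.Variable(p)
        objective = cp.Minimize(0.5 * cp.sum_squares(c) - nu * rho
                                + lam * cp.sum(xi) + mu * cp.sum(r + s)
                                - u @ r - v @ s)
        constraints = [cp.multiply(yl, Xl @ c) >= rho - xi, xi >= 0,
                       Xu @ c >= rho - r, r >= 0,
                       -(Xu @ c) >= rho - s, s >= 0]
        cp.Problem(objective, constraints).solve()
        if np.linalg.norm(c.value - c_prev) <= tol:
            return c.value                  # finite convergence for polyhedral DC
        c_prev, r_k, s_k = c.value, r.value, s.value
    return c_prev
```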

4. Numerical experiments. To evaluate the proposed framework, we run the proposed S³MAMC on a synthetic dataset and several real-world datasets from the University of California Irvine (UCI) Machine Learning Repository [3]. All experiments are implemented in Matlab R2012a.
The quality of classification is measured by the accuracy (ACC), the geometric mean accuracy (G-ACC), the Matthews correlation coefficient (MCC) and the $F_1$-measure:

$$ACC=\frac{TP+TN}{TP+TN+FP+FN},\quad G\text{-}ACC=\sqrt{a_+a_-},\quad MCC=\frac{TP\cdot TN-FP\cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}},\quad F_1=\frac{2TP}{2TP+FP+FN},$$

where TP and TN denote true positives and true negatives, FN and FP denote false negatives and false positives, respectively, and $a_+=TP/(TP+FN)$, $a_-=TN/(TN+FP)$. The G-ACC, MCC and $F_1$-measure are comprehensive measures of the quality of classification models; the higher these values, the better the model.
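These are the standard confusion-matrix formulas; a short sketch of their computation (assumed to match the paper's definitions) is given below.

```python
import math

def metrics(tp, tn, fp, fn):
    """ACC, G-ACC, MCC and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    a_pos, a_neg = tp / (tp + fn), tn / (tn + fp)
    g_acc = math.sqrt(a_pos * a_neg)               # geometric mean of a+ and a-
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    f1 = 2 * tp / (2 * tp + fp + fn)
    return acc, g_acc, mcc, f1
```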
For comprehensive evaluation, our experiments consist of three parts:
(1) we compare the performance of S³MAMC with the supervised learning methods;
(2) we compare the two proposed S³MAMC algorithms with each other;
(3) we compare the two S³MAMC algorithms with other semi-supervised learning models.
In addition, we conduct 10 trials for each dataset. In each trial, the dataset is randomly divided into two parts: 20% of the samples for training and the remaining 80% for testing. We remove the labels of the test set and run the proposed semi-supervised algorithms to reclassify the test set using the learned results. The accuracy of the proposed method usually depends on the parameter values. In this work, the parameters are selected from the set $\{10^i\mid i=-2,\dots,3\}$ by cross-validation. We also present a plot (FIGURE 1) of the relationship between the ACC and the parameter µ for ν = 1 and ν = 10, where the x-axis denotes the value of µ and the y-axis denotes the accuracy (ACC). For fixed ν, the accuracy of the S³MAMC increases as µ ranges from 1 to 100, while it decreases as µ ranges from 100 to 1000. At the same time, FIGURE 1 shows that the ACC is not very sensitive to the choice of the parameter ν. These findings guide the choice of the parameters µ and ν in the following benchmark experiments.
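The grid search itself is simple; a sketch is given below, where cv_accuracy is a placeholder for a cross-validated scoring routine, not a function defined in the paper.

```python
import itertools

def tune_parameters(cv_accuracy):
    """Pick (lam, mu) maximizing cross-validated accuracy over the paper's grid."""
    grid = [10.0 ** i for i in range(-2, 4)]   # {10^i : i = -2, ..., 3}
    return max(itertools.product(grid, grid),
               key=lambda lm: cv_accuracy(lam=lm[0], mu=lm[1]))
```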
According to the above analysis, the S³MAMC parameters ν = 10 and λ = µ = 100 are chosen in our experiments. The parameters in the other models are set to be the same as in S³MAMC.
Moreover, a synthetic banana-shaped dataset is shown in FIGURE 2; it is a linearly separable two-dimensional dataset designed to illustrate the advantage of S³MAMC more intuitively, with the positive and negative samples plotted with different markers (the negative samples by "*"). We find from TABLE 1 that S³MAMC outperforms MAMC and ν-SVM on five datasets (Wine, Cancer, Thyroid, Sonar and the synthetic dataset), especially the synthetic dataset; on the other four datasets, the generalization of S³MAMC is competitive with or slightly better than that of MAMC and ν-SVM. These results show that S³MAMC either improves generalization or shows no significant difference compared to the supervised approaches when adequate labelled samples are lacking.
To further examine the performance of the proposed S³MAMC with a large number of unlabelled samples, we present another numerical experiment. Specifically, each dataset is randomly split into 10% labelled samples and 90% unlabelled samples. We run DCA-S³MAMC, MAMC and ν-SVM on eight datasets. With this 1:9 ratio of labelled to unlabelled samples, the experimental results are averaged over 10 trials, and the generalization of S³MAMC and the supervised learning methods is reported in TABLE 2.
As expected, TABLE 2 shows that DCA-S³MAMC outperforms MAMC and ν-SVM on all eight considered datasets. This suggests that incorporating unlabelled samples in training improves generalization when the proportion of labelled samples in the dataset is relatively low.

4.3. Comparisons of the two proposed semi-supervised MAMC algorithms. We now compare the two proposed semi-supervised MAMC models, MIQP-S³MAMC and DCA-S³MAMC. The accuracy (ACC) and running time (CPU time, in seconds) of the two models are illustrated in FIGURE 3 and FIGURE 4, respectively.
FIGURE 4 shows that MIQP-S³MAMC requires more running time than DCA-S³MAMC, while FIGURE 3 illustrates that the accuracies of the two algorithms show no significant difference on the considered datasets. Compared with the other semi-supervised methods, MIP-S³VM and VS³VM, the proposed algorithms obtain better generalization on the Sonar and Ionosphere datasets, while on the other two datasets the performances of the four semi-supervised learning methods show no significant difference. For the synthetic banana-shaped dataset, however, the S³MAMC is superior to MIP-S³VM and VS³VM, obtaining better generalization.

5. Conclusions. In this paper, we extend the supervised MAMC to the semi-supervised learning framework. A novel semi-supervised classifier (called S³MAMC) is proposed based on the concept of vector-angular margin, which gives S³MAMC better generalization with a smaller VC dimension; moreover, S³MAMC can control the number of support vectors and the classification errors through the model parameter ν. We show how the S³MAMC can be converted to a mixed-integer quadratic program and solved exactly, and how, by a suitable decomposition of the objective function, it can be posed as a DC program whose DCA converges finitely and only requires solving one quadratic program at each iteration. Computational results of S³MAMC are compared with supervised methods on real and synthetic databases; they show that S³MAMC improves generalization by incorporating unlabelled samples in training when labelled data are relatively few. Moreover, compared with other semi-supervised methods, S³MAMC obtains competitive results on the UCI datasets, and for the synthetic banana-shaped dataset it significantly outperforms the supervised methods (MAMC and ν-SVM) and the traditional semi-supervised methods (MIP-S³VM and VS³VM) in generalization.
The proposed methods can also be extended to multi-class and nonlinear cases. In addition, it is worth noting that the proposed DCA depends greatly on the two components g and h; the question of "good" DC decompositions for the S³MAMC will be studied in future work.