# American Institute of Mathematical Sciences

## Density-based semi-supervised classification for intrusion detection

1. College of Economics and Management, Shangluo University, Shangluo 726000, China
2. College of Mathematics and Computer Application, Shangluo University, Shangluo 726000, China

*Corresponding author: Ning Liu

Received: April 2019 | Revised: May 2019 | Published: February 2020

To improve classification performance on intrusion detection problems where only a small number of labeled samples are available, semi-supervised learning is applied to network intrusion detection. A semi-supervised classification method based on data density (SSC-density) is proposed to perform intrusion detection with few labeled samples. First, the intrusion detection data are numerically encoded and normalized. Second, the density of each sample is calculated, and the samples are divided into security points, boundary points, and noise points according to their density, which determines the spatial structure of the data. Third, different semi-supervised learning strategies are applied to the different sample types to mine the implicit information in unlabeled samples and enlarge the labeled set: after deleting the noise points, semi-supervised learning is performed first on the set of security points and then on the set of boundary points. Finally, a classifier is trained on the enlarged labeled set to perform intrusion detection on network data. Experiments on the KDD CUP 99 data set show that the proposed algorithm achieves good classification performance.
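The preprocessing and density-partition steps described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the neighbourhood radius `eps` and the density thresholds `high` and `low` are assumed values, since the abstract does not state the actual parameters.

```python
import numpy as np

def partition_by_density(X, eps=0.3, high=10, low=2):
    """Classify each sample as a security, boundary, or noise point by
    counting neighbours within radius eps (thresholds are illustrative)."""
    # Min-max normalise each feature to [0, 1], as in the preprocessing step.
    mins, maxs = X.min(axis=0), X.max(axis=0)
    X = (X - mins) / np.where(maxs > mins, maxs - mins, 1)

    # Density of a sample = number of neighbours inside its eps-ball.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    density = (dists < eps).sum(axis=1) - 1  # exclude the sample itself

    # High density -> security point; moderate -> boundary; sparse -> noise.
    return np.where(density >= high, "security",
           np.where(density >= low, "boundary", "noise"))

X = np.random.default_rng(0).random((200, 5))
labels = partition_by_density(X)
```

The pairwise-distance matrix makes this O(n²) in memory; for a data set the size of KDD CUP 99 a spatial index (e.g. a k-d tree) would be needed in practice.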

Citation: Ning Liu, Jianhua Zhao. Density-based semi-supervised classification for intrusion detection. Discrete & Continuous Dynamical Systems - S, doi: 10.3934/dcdss.2020263
Figure: The flow chart of SSC-density
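The staged self-training loop in the flow chart can be sketched as below, assuming noise points have already been deleted. A 1-nearest-neighbour labeller stands in for the paper's base learner, and the function names are hypothetical:

```python
import numpy as np

def ssc_density_pipeline(X_l, y_l, X_sec, X_bnd):
    """Enlarge the labeled set in two stages: first label the high-density
    security points, then the harder boundary points (1-NN is a stand-in
    for the actual base classifier)."""
    def nn_label(X_lab, y_lab, X_new):
        # Assign each new sample the label of its nearest labeled sample.
        d = np.linalg.norm(X_new[:, None, :] - X_lab[None, :, :], axis=2)
        return y_lab[d.argmin(axis=1)]

    # Stage 1: propagate labels to security points, add them to the pool.
    y_sec = nn_label(X_l, y_l, X_sec)
    X_l = np.vstack([X_l, X_sec]); y_l = np.concatenate([y_l, y_sec])

    # Stage 2: propagate labels to boundary points using the enlarged pool.
    y_bnd = nn_label(X_l, y_l, X_bnd)
    X_l = np.vstack([X_l, X_bnd]); y_l = np.concatenate([y_l, y_bnd])
    return X_l, y_l  # enlarged labeled set for training the final classifier

rng = np.random.default_rng(1)
X_l, y_l = rng.random((20, 4)), rng.integers(0, 2, 20)
X_sec, X_bnd = rng.random((30, 4)), rng.random((10, 4))
X_all, y_all = ssc_density_pipeline(X_l, y_l, X_sec, X_bnd)
```

The final classifier (an SVM in the paper's experiments) would then be trained on `X_all, y_all`.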
Experimental data set

| Attack category | Type of attack | Training set | Test set |
| --- | --- | --- | --- |
| Normal | normal | 8000 | 4000 |
| DOS | back | 900 | 400 |
| DOS | neptune | 3500 | 2000 |
| DOS | smurf | 2100 | 1000 |
| DOS | Total | 6500 | 3400 |
| R2L | guess_passwd | 53 | 40 |
| R2L | Total | 73 | 40 |
| U2R | buffer_overflow | 30 | 22 |
| U2R | Total | 30 | 22 |
| Probe | ipsweep | 500 | 180 |
| Probe | portsweep | 500 | 200 |
| Probe | satan | 417 | 158 |
| Probe | Total | 1397 | 538 |
Confusion matrix

| Category | Actual positive class | Actual negative class |
| --- | --- | --- |
| Experimental positive class | TP | FN |
| Experimental negative class | FP | TN |
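From the confusion-matrix counts, the usual detection metrics follow directly. The definitions below are the standard ones, and the counts in the usage example are purely illustrative, not taken from the paper's experiments:

```python
def detection_metrics(tp, fn, fp, tn):
    """Accuracy, precision, and recall (detection rate) from
    confusion-matrix counts (standard definitions)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Illustrative counts only.
acc, prec, rec = detection_metrics(tp=90, fn=10, fp=5, tn=95)
```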
Experimental result (SVM, N = 5%)
| Data set | method1 | method2 | method3 | Our algorithm |
| --- | --- | --- | --- | --- |
| normal | 0.7554 | 0.8282 | 0.9057 | 0.9282 |
| abnormal | 0.6587 | 0.7640 | 0.8874 | 0.8934 |
Experimental result (SVM, N = 10%)
| Data set | method1 | method2 | method3 | Our algorithm |
| --- | --- | --- | --- | --- |
| normal | 0.7845 | 0.8572 | 0.9274 | 0.9517 |
| abnormal | 0.7012 | 0.7774 | 0.9045 | 0.9354 |
Experimental result (SVM, N = 15%)
| Data set | method1 | method2 | method3 | Our algorithm |
| --- | --- | --- | --- | --- |
| normal | 0.8117 | 0.8819 | 0.9556 | 0.9720 |
| abnormal | 0.7157 | 0.8447 | 0.9384 | 0.9402 |
Experimental result (RBF, N = 5%)
| Data set | method1 | method2 | method3 | Our algorithm |
| --- | --- | --- | --- | --- |
| normal | 0.7014 | 0.8041 | 0.8734 | 0.9047 |
| abnormal | 0.6347 | 0.7513 | 0.8524 | 0.8786 |
Experimental result (RBF, N = 10%)
| Data set | method1 | method2 | method3 | Our algorithm |
| --- | --- | --- | --- | --- |
| normal | 0.7328 | 0.8312 | 0.9324 | 0.9437 |
| abnormal | 0.6847 | 0.8090 | 0.8878 | 0.9275 |
Experimental result (RBF, N = 15%)
| Data set | method1 | method2 | method3 | Our algorithm |
| --- | --- | --- | --- | --- |
| normal | 0.7925 | 0.8675 | 0.9425 | 0.9734 |
| abnormal | 0.7089 | 0.8287 | 0.9074 | 0.9355 |
