November 2018, 1(4): 331-348. doi: 10.3934/mfc.2018016

Privacy preserving feature selection and Multiclass Classification for horizontally distributed data

1. Department of Computer Science, 33 Gilmer Street SE, Atlanta, GA, USA

2. University of North Georgia, Dahlonega, GA, USA

3. Data-driven Intelligence Research Laboratory, College of Computing and Software Engineering, Kennesaw State University, 1100 South Marietta Pkwy, Marietta, GA, USA

* Corresponding author: Meng Han

Received August 2018; Revised October 2018; Published December 2018

In the last two decades, many scientific fields have experienced enormous growth in data volume and data complexity, which brings data miners many opportunities as well as many challenges. With the advent of the era of big data, applying data mining techniques to data assembled from multiple parties (or sources) has become a leading trend. However, such data mining tasks may divulge individuals' private information, which has raised growing concerns about privacy preservation. In this work, a privacy preserving feature selection method (PPFS-IFW) and a privacy preserving multiclass classification method (PPM2C) are proposed. Experiments were conducted to validate the performance of the proposed approaches; both PPFS-IFW and PPM2C were tested on six benchmark datasets. The results demonstrate that PPFS-IFW improves classification accuracy by selecting informative features. PPFS-IFW not only preserves private information but also outperforms some other state-of-the-art feature selection approaches. The experimental results also show that the proposed PPM2C method is workable and stable. In particular, it reduces the risk of over-fitting compared with the regular Support Vector Machine. Meanwhile, by employing the Secure Sum Protocol to encrypt data at the bottom layer, users' privacy is preserved.
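The abstract attributes privacy to the Secure Sum Protocol but does not spell out its steps. The following is a minimal Python sketch of the classic ring-based secure sum, assuming semi-honest, non-colluding sites and horizontally partitioned data (every site holds different samples described by the same features). The names `secure_sum` and `secure_global_mean`, the fixed-point `scale`, and the `modulus` are illustrative assumptions, not the paper's implementation, whose exact variant may differ.

```python
import secrets


def secure_sum(local_values, modulus):
    """Ring-based secure sum of integers held by different sites.

    Assumptions (illustrative, not from the paper): sites are semi-honest,
    do not collude, and the true total lies in [-modulus/2, modulus/2).
    """
    if not local_values:
        raise ValueError("at least one site is required")
    # Site 1 draws a uniform random mask R in [0, modulus).
    mask = secrets.randbelow(modulus)
    # Site 1 sends (R + v_1) mod m to site 2.
    running = (mask + local_values[0]) % modulus
    # Each later site adds its private value and forwards the result.
    # Every forwarded message is uniform on [0, m), so on its own it
    # reveals nothing about the values added so far.
    for value in local_values[1:]:
        running = (running + value) % modulus
    # The last site returns the total to site 1, which removes the mask.
    total = (running - mask) % modulus
    # Map back to a signed integer so negative totals survive.
    return total - modulus if total >= modulus // 2 else total


def secure_global_mean(site_columns, scale=10**6, modulus=2**64):
    """Hypothetical helper: global mean of one feature over rows that are
    horizontally partitioned across sites. Local sums are fixed-point
    encoded (multiplied by `scale`) so they can be summed as integers."""
    encoded_sums = [round(sum(column) * scale) for column in site_columns]
    counts = [len(column) for column in site_columns]
    total = secure_sum(encoded_sums, modulus) / scale
    n = secure_sum(counts, modulus)
    return total / n


# Example: three sites, each holding a different subset of samples.
print(secure_global_mean([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]))  # prints 3.5
```

Only the masked running total travels between sites, so no site sees another site's local sum; the final total itself is revealed to site 1, which is inherent to computing a sum.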

Citation: Yunmei Lu, Mingyuan Yan, Meng Han, Qingliang Yang, Yanqing Zhang. Privacy preserving feature selection and Multiclass Classification for horizontally distributed data. Mathematical Foundations of Computing, 2018, 1 (4) : 331-348. doi: 10.3934/mfc.2018016


Figure 1.  Workflow of PPM2C
Figure 2.  Classification accuracy improvement from PPFS-IFW under the CV1 scenario
Figure 3.  Classification accuracy improvement from PPFS-IFW under the CV2 scenario
Figure 4.  Classification accuracy comparison before and after feature selection (PPFS-IFW)
Figure 5.  Comparison of classification accuracy for PPM2C when using PAN-SVM and LIBSVM
Figure 6.  Classification accuracy of PrivacySVM under CV1 and CV2
Figure 7.  Classification accuracy of LIBSVM under CV1 and CV2
Figure 8.  Classification accuracy of PrivacySVM under CV1
Figure 9.  Classification accuracy of PrivacySVM under CV2
Table 1.  Details of Datasets used in Evaluation of PPFS-IFW
| Dataset | num. samples | num. features | C | $\gamma$ |
|---|---|---|---|---|
| Diabetes (DIA) | 768 | 8 | 512.0 | 0.0078125 |
| Ionosphere | 351 | 34 | 8.0 | 0.5 |
| Colon | 62 | 2000 | 32.0 | 0.0078125 |
| Leukemia | 72 | 7129 | 128.0 | 0.0001221 |
| Lymphoma (DLBCL) | 47 | 4026 | 2.0 | 0.0078125 |
| Breast Cancer (WBC) | 569 | 30 | 128.0 | 8.0 |
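Table 1 does not say how C and $\gamma$ were chosen. Assuming they are the penalty parameter and RBF-kernel width of a LIBSVM-style SVM, the sketch below shows the kernel they parameterize and checks that every pair is a power of two lying on the default search grid of LIBSVM's `grid.py` tool (log2 C from -5 to 15 and log2 $\gamma$ from 3 to -15, both in steps of 2). A match is consistent with, but does not prove, a default grid search; Leukemia's $\gamma$ = 0.0001221 is read here as a rounded $2^{-13}$.

```python
import math

# (C, gamma) pairs copied from Table 1.
TABLE1 = {
    "Diabetes (DIA)": (512.0, 0.0078125),
    "Ionosphere": (8.0, 0.5),
    "Colon": (32.0, 0.0078125),
    "Leukemia": (128.0, 0.0001221),
    "Lymphoma (DLBCL)": (2.0, 0.0078125),
    "Breast Cancer (WBC)": (128.0, 8.0),
}


def rbf_kernel(x, y, gamma):
    """RBF kernel as LIBSVM defines it: K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def on_default_grid(c, gamma, tol=1e-3):
    """True if (C, gamma) sits on grid.py's default grid:
    log2 C in {-5, -3, ..., 15} and log2 gamma in {-15, -13, ..., 3}."""
    log_c, log_g = math.log2(c), math.log2(gamma)
    exp_c, exp_g = round(log_c), round(log_g)
    if abs(log_c - exp_c) > tol or abs(log_g - exp_g) > tol:
        return False  # not a (rounded) power of two
    return exp_c in range(-5, 16, 2) and exp_g in range(-15, 4, 2)


for name, (c, gamma) in TABLE1.items():
    print(f"{name}: C = 2^{round(math.log2(c))}, "
          f"gamma ~ 2^{round(math.log2(gamma))}, "
          f"on default grid: {on_default_grid(c, gamma)}")
```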
Table 2.  Accuracy improved under CV1 and CV2
| Dataset | Accuracy improvement, CV2 | Accuracy improvement, CV1 | Num. of features, CV1 | Num. of features, CV2 |
|---|---|---|---|---|
| DIA | 3.39% | 2.10% | 4 | 4 |
| Ionosphere | 0.35% | 3.42% | 2 | 8 |
| Colon | 3.08% | 8.00% | 34 | 157 |
| WBC | 2.47% | 1.12% | 10 | 4 |
| DLBCL | 5.57% | 10.95% | 394 | 444 |
| Leukemia | 8.57% | 3.45% | 537 | 631 |
| Sum | 23.43% | 29.04% | 981 | 1248 |
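The Sum row of Table 2 can be re-derived from the per-dataset rows. Because the improvements are percentages measured on different datasets, a per-dataset mean (about 3.9% for CV2 and 4.84% for CV1) may be easier to interpret than their sum. The dictionary below simply transcribes the table; it adds no new results.

```python
# Table 2 rows: accuracy improvement under CV2 and CV1 (in %, as printed),
# followed by the number of selected features under CV1 and CV2.
TABLE2 = {
    "DIA":        (3.39, 2.10, 4, 4),
    "Ionosphere": (0.35, 3.42, 2, 8),
    "Colon":      (3.08, 8.00, 34, 157),
    "WBC":        (2.47, 1.12, 10, 4),
    "DLBCL":      (5.57, 10.95, 394, 444),
    "Leukemia":   (8.57, 3.45, 537, 631),
}
PRINTED_SUM = (23.43, 29.04, 981, 1248)

column_sums = tuple(sum(row[i] for row in TABLE2.values()) for i in range(4))
# Round the float columns before comparing to avoid binary rounding noise.
recomputed = (round(column_sums[0], 2), round(column_sums[1], 2),
              column_sums[2], column_sums[3])
print("Sum row matches:", recomputed == PRINTED_SUM)

n = len(TABLE2)
print(f"Mean improvement: CV2 {column_sums[0] / n:.2f}%, "
      f"CV1 {column_sums[1] / n:.2f}%")
```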
Table 3.  Accuracy comparison with other methods
| Dataset | Fisher SVM | FSV | RFE SVM | KP SVM | Ours (CV2) | Ours (CV1) |
|---|---|---|---|---|---|---|
| DIA | 76.42 | 76.58 | 76.56 | 76.74 | 79.87 | 78.86 |
| WBC | 94.7 | 95.23 | 95.25 | 97.55 | 99.11 | 97.81 |
| Colon | 87.46 | 92.03 | 92.52 | 96.57 | 85.00 | 90.00 |
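A quick way to read Table 3 is to compare each proposed setting with the strongest baseline on the same dataset. The sketch below transcribes the table (values are accuracies in %) and computes those margins; the helper name `margins_over_best_baseline` is illustrative, and no new results are introduced.

```python
# Table 3 accuracies (%): the four baselines, then the two proposed settings.
BASELINES = ("Fisher SVM", "FSV", "RFE SVM", "KP SVM")
TABLE3 = {
    "DIA":   ((76.42, 76.58, 76.56, 76.74), 79.87, 78.86),
    "WBC":   ((94.7, 95.23, 95.25, 97.55), 99.11, 97.81),
    "Colon": ((87.46, 92.03, 92.52, 96.57), 85.00, 90.00),
}


def margins_over_best_baseline(table):
    """Return {dataset: (best baseline name, CV2 margin, CV1 margin)}."""
    result = {}
    for dataset, (baseline_acc, ours_cv2, ours_cv1) in table.items():
        best = max(baseline_acc)
        best_name = BASELINES[baseline_acc.index(best)]
        result[dataset] = (best_name, ours_cv2 - best, ours_cv1 - best)
    return result


for dataset, (name, cv2, cv1) in margins_over_best_baseline(TABLE3).items():
    print(f"{dataset}: best baseline {name}; margin CV2 {cv2:+.2f}, CV1 {cv1:+.2f}")
```

On these numbers, both settings exceed the strongest baseline (KP SVM) on DIA and WBC but trail it on Colon, by 11.57 points for CV2 and 6.57 points for CV1.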
Table 4.  Details of Datasets
| Dataset | num. of samples | num. of features | num. of classes |
|---|---|---|---|
| Leukemia$_{3c}$ | 72 | 7129 | 3 |
| Leukemia$_{4a}$ | 72 | 7129 | 4 |
| DNA | 2000 | 180 | 3 |
| Vowel | 528 | 10 | 11 |
| Lung | 32 | 56 | 3 |
| Letter | 15000 | 16 | 26 |
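Table 4 spans problems with 3 to 26 classes. How PPM2C decomposes a multiclass task into binary SVMs is not stated in this section; as a point of reference, LIBSVM (the comparison in Figure 5) uses a one-vs-one decomposition, which trains k(k-1)/2 binary classifiers for k classes, whereas one-vs-rest trains k. The sketch below only counts classifiers for each dataset under both schemes; the helper names are illustrative.

```python
# Table 4: (samples, features, classes).
TABLE4 = {
    "Leukemia_3c": (72, 7129, 3),
    "Leukemia_4a": (72, 7129, 4),
    "DNA": (2000, 180, 3),
    "Vowel": (528, 10, 11),
    "Lung": (32, 56, 3),
    "Letter": (15000, 16, 26),
}


def one_vs_one(k):
    """Binary classifiers needed by a one-vs-one (pairwise) decomposition."""
    return k * (k - 1) // 2


def one_vs_rest(k):
    """Binary classifiers needed by a one-vs-rest decomposition."""
    return k


for name, (n_samples, _, k) in TABLE4.items():
    print(f"{name}: {k} classes -> {one_vs_one(k)} pairwise vs "
          f"{one_vs_rest(k)} one-vs-rest classifiers "
          f"(~{n_samples / k:.0f} samples per class on average)")
```

For Letter, the pairwise scheme needs 325 binary models but each one sees only two of the 26 classes, so every individual training problem is much smaller than the full dataset.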