July  2016, 1(2&3): 247-259. doi: 10.3934/bdia.2016007

Scalable clustering by truncated fuzzy $c$-means

1. 

Department of Mathematics, University of Connecticut, 341 Mans eld Road, Storrs, CT 06269-1009, United States

2. 

Business School, Hunan University, Changsha, Hunan 410082, China

3. 

Columbian College of Arts & Sciences, George Washington University, Washington, D.C., 20052, United States

Received  July 2016 Revised  August 2016 Published  September 2016

Most existing clustering algorithms are slow for dividing a large dataset into a large number of clusters. In this paper, we propose a truncated FCM algorithm to address this problem. The main idea behind our proposed algorithm is to keep only a small number of cluster centers during the iterative process of the FCM algorithm. Our numerical experiments on both synthetic and real datasets show that the proposed algorithm is much faster than the original FCM algorithm and the accuracy is comparable to that of the original FCM algorithm.
Citation: Guojun Gan, Qiujun Lan, Shiyang Sima. Scalable clustering by truncated fuzzy $c$-means. Big Data & Information Analytics, 2016, 1 (2&3) : 247-259. doi: 10.3934/bdia.2016007
References:
[1]

C. C. Aggarwal and C. K. Reddy (eds.), Data Clustering: Algorithms and Applications,, CRC Press, (2014).   Google Scholar

[2]

D. Arthur and S. Vassilvitskii, $k$-means++: The advantages of careful seeding,, in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, (2007), 1027.   Google Scholar

[3]

J. C. Bezdek, R. Ehrlich and W. Full, FCM: The fuzzy c-means clustering algorithm,, Computers & Geosciences, 10 (1984), 191.  doi: 10.1016/0098-3004(84)90020-7.  Google Scholar

[4]

J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms,, Kluwer Academic Publishers, (1981).   Google Scholar

[5]

A. Broder, L. Garcia-Pueyo, V. Josifovski, S. Vassilvitskii and S. Venkatesan, Scalable k-means by ranked retrieval,, in Proceedings of the 7th ACM International Conference on Web Search and Data Mining, (2014), 233.  doi: 10.1145/2556195.2556260.  Google Scholar

[6]

R. L. Cannon, J. V. Dave and J. Bezdek, Efficient implementation of the fuzzy c-means clustering algorithms,, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8 (1986), 248.  doi: 10.1109/TPAMI.1986.4767778.  Google Scholar

[7]

T. W. Cheng, D. B. Goldgof and L. O. Hall, Fast fuzzy clustering,, Fuzzy Sets and Systems, 93 (1998), 49.  doi: 10.1016/S0165-0114(96)00232-1.  Google Scholar

[8]

M. de Souto, I. Costa, D. de Araujo, T. Ludermir and A. Schliep, Clustering cancer gene expression data: A comparative study,, BMC Bioinformatics, 9 (2008).  doi: 10.1186/1471-2105-9-497.  Google Scholar

[9]

J. C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters,, Journal of Cybernetics, 3 (1973), 32.  doi: 10.1080/01969727308546046.  Google Scholar

[10]

G. Gan, Data Clustering in C++: An Object-Oriented Approach,, Data Mining and Knowledge Discovery Series, (2011).  doi: 10.1201/b10814.  Google Scholar

[11]

G. Gan, Application of data clustering and machine learning in variable annuity valuation,, Insurance: Mathematics and Economics, 53 (2013), 795.  doi: 10.1016/j.insmatheco.2013.09.021.  Google Scholar

[12]

G. Gan, A multi-asset Monte Carlo simulation model for the valuation of variable annuities,, in Proceedings of the Winter Simulation Conference, (2015), 3162.  doi: 10.1109/WSC.2015.7408450.  Google Scholar

[13]

G. Gan and S. Lin, Valuation of large variable annuity portfolios under nested simulation: A functional data approach,, Insurance: Mathematics and Economics, 62 (2015), 138.  doi: 10.1016/j.insmatheco.2015.02.007.  Google Scholar

[14]

G. Gan and M. K.-P. Ng, Subspace clustering using affinity propagation,, Pattern Recognition, 48 (2015), 1455.  doi: 10.1016/j.patcog.2014.11.003.  Google Scholar

[15]

G. Gan and M. K.-P. Ng, Subspace clustering with automatic feature grouping,, Pattern Recognition, 48 (2015), 3703.  doi: 10.1016/j.patcog.2015.05.016.  Google Scholar

[16]

G. Gan, Y. Zhang and D. K. Dey, Clustering by propagating probabilities between data points,, Applied Soft Computing, 41 (2016), 390.  doi: 10.1016/j.asoc.2016.01.034.  Google Scholar

[17]

R. J. Hathaway and J. C. Bezdek, Extending fuzzy and probabilistic clustering to very large data sets,, Computational Statistics & Data Analysis, 51 (2006), 215.  doi: 10.1016/j.csda.2006.02.008.  Google Scholar

[18]

T. Havens, J. Bezdek, C. Leckie, L. Hall and M. Palaniswami, Fuzzy c-means algorithms for very large data,, IEEE Transactions on Fuzzy Systems, 20 (2012), 1130.  doi: 10.1109/TFUZZ.2012.2201485.  Google Scholar

[19]

M.-C. Hung and D.-L. Yang, An efficient fuzzy c-means clustering algorithm,, in Proceedings IEEE International Conference on Data Mining, (2001), 225.   Google Scholar

[20]

Z.-X. Ji, Q.-S. Sun and D.-S. Xia, A modified possibilistic fuzzy c-means clustering algorithm for bias field estimation and segmentation of brain MR image,, Computerized Medical Imaging and Graphics, 35 (2011), 383.  doi: 10.1016/j.compmedimag.2010.12.001.  Google Scholar

[21]

D. Jiang, C. Tang and A. Zhang, Cluster analysis for gene expression data: A survey,, IEEE Transactions on Knowledge and Data Engineering, 16 (2004), 1370.   Google Scholar

[22]

F. Klawonn, Fuzzy clustering: Insights and a new approach,, Mathware & Soft Computing, 11 (2004), 125.   Google Scholar

[23]

J. F. Kolen and T. Hutcheson, Reducing the time complexity of the fuzzy c-means algorithm,, IEEE Transactions on Fuzzy Systems, 10 (2002), 263.  doi: 10.1109/91.995126.  Google Scholar

[24]

T. Kwok, K. Smith, S. Lozano and D. Taniar, Parallel fuzzy c- means clustering for large data sets,, in Euro-Par 2002 Parallel Processing (eds. B. Monien and R. Feldmann), (2002), 365.  doi: 10.1007/3-540-45706-2_48.  Google Scholar

[25]

H. Liu, F. Zhao and L. Jiao, Fuzzy spectral clustering with robust spatial information for image segmentation,, Applied Soft Computing, 12 (2012), 3636.  doi: 10.1016/j.asoc.2012.05.026.  Google Scholar

[26]

J. D. MacCuish and N. E. MacCuish, Clustering in Bioinformatics and Drug Discovery,, CRC Press, (2010).  doi: 10.1201/b10331.  Google Scholar

[27]

J. Macqueen, Some methods for classification and analysis of multivariate observations,, in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics andProbability (eds. L. LeCam and J. Neyman), 1 (1967), 281.   Google Scholar

[28]

S. A. A. Shalom, M. Dash and M. Tue, Graphics hardware based efficient and scalable fuzzy c-means clustering,, in Proceedings of the 7th Australasian Data Mining Conference, 87 (2008), 179.   Google Scholar

[29]

A. Stetco, X.-J. Zeng and J. Keane, Fuzzy c-means++: Fuzzy c-means with effective seeding initalization,, Expert Systems with Applications, 42 (2015).  doi: 10.1016/j.eswa.2015.05.014.  Google Scholar

show all references

References:
[1]

C. C. Aggarwal and C. K. Reddy (eds.), Data Clustering: Algorithms and Applications,, CRC Press, (2014).   Google Scholar

[2]

D. Arthur and S. Vassilvitskii, $k$-means++: The advantages of careful seeding,, in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, (2007), 1027.   Google Scholar

[3]

J. C. Bezdek, R. Ehrlich and W. Full, FCM: The fuzzy c-means clustering algorithm,, Computers & Geosciences, 10 (1984), 191.  doi: 10.1016/0098-3004(84)90020-7.  Google Scholar

[4]

J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms,, Kluwer Academic Publishers, (1981).   Google Scholar

[5]

A. Broder, L. Garcia-Pueyo, V. Josifovski, S. Vassilvitskii and S. Venkatesan, Scalable k-means by ranked retrieval,, in Proceedings of the 7th ACM International Conference on Web Search and Data Mining, (2014), 233.  doi: 10.1145/2556195.2556260.  Google Scholar

[6]

R. L. Cannon, J. V. Dave and J. Bezdek, Efficient implementation of the fuzzy c-means clustering algorithms,, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8 (1986), 248.  doi: 10.1109/TPAMI.1986.4767778.  Google Scholar

[7]

T. W. Cheng, D. B. Goldgof and L. O. Hall, Fast fuzzy clustering,, Fuzzy Sets and Systems, 93 (1998), 49.  doi: 10.1016/S0165-0114(96)00232-1.  Google Scholar

[8]

M. de Souto, I. Costa, D. de Araujo, T. Ludermir and A. Schliep, Clustering cancer gene expression data: A comparative study,, BMC Bioinformatics, 9 (2008).  doi: 10.1186/1471-2105-9-497.  Google Scholar

[9]

J. C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters,, Journal of Cybernetics, 3 (1973), 32.  doi: 10.1080/01969727308546046.  Google Scholar

[10]

G. Gan, Data Clustering in C++: An Object-Oriented Approach,, Data Mining and Knowledge Discovery Series, (2011).  doi: 10.1201/b10814.  Google Scholar

[11]

G. Gan, Application of data clustering and machine learning in variable annuity valuation,, Insurance: Mathematics and Economics, 53 (2013), 795.  doi: 10.1016/j.insmatheco.2013.09.021.  Google Scholar

[12]

G. Gan, A multi-asset Monte Carlo simulation model for the valuation of variable annuities,, in Proceedings of the Winter Simulation Conference, (2015), 3162.  doi: 10.1109/WSC.2015.7408450.  Google Scholar

[13]

G. Gan and S. Lin, Valuation of large variable annuity portfolios under nested simulation: A functional data approach,, Insurance: Mathematics and Economics, 62 (2015), 138.  doi: 10.1016/j.insmatheco.2015.02.007.  Google Scholar

[14]

G. Gan and M. K.-P. Ng, Subspace clustering using affinity propagation,, Pattern Recognition, 48 (2015), 1455.  doi: 10.1016/j.patcog.2014.11.003.  Google Scholar

[15]

G. Gan and M. K.-P. Ng, Subspace clustering with automatic feature grouping,, Pattern Recognition, 48 (2015), 3703.  doi: 10.1016/j.patcog.2015.05.016.  Google Scholar

[16]

G. Gan, Y. Zhang and D. K. Dey, Clustering by propagating probabilities between data points,, Applied Soft Computing, 41 (2016), 390.  doi: 10.1016/j.asoc.2016.01.034.  Google Scholar

[17]

R. J. Hathaway and J. C. Bezdek, Extending fuzzy and probabilistic clustering to very large data sets,, Computational Statistics & Data Analysis, 51 (2006), 215.  doi: 10.1016/j.csda.2006.02.008.  Google Scholar

[18]

T. Havens, J. Bezdek, C. Leckie, L. Hall and M. Palaniswami, Fuzzy c-means algorithms for very large data,, IEEE Transactions on Fuzzy Systems, 20 (2012), 1130.  doi: 10.1109/TFUZZ.2012.2201485.  Google Scholar

[19]

M.-C. Hung and D.-L. Yang, An efficient fuzzy c-means clustering algorithm,, in Proceedings IEEE International Conference on Data Mining, (2001), 225.   Google Scholar

[20]

Z.-X. Ji, Q.-S. Sun and D.-S. Xia, A modified possibilistic fuzzy c-means clustering algorithm for bias field estimation and segmentation of brain MR image,, Computerized Medical Imaging and Graphics, 35 (2011), 383.  doi: 10.1016/j.compmedimag.2010.12.001.  Google Scholar

[21]

D. Jiang, C. Tang and A. Zhang, Cluster analysis for gene expression data: A survey,, IEEE Transactions on Knowledge and Data Engineering, 16 (2004), 1370.   Google Scholar

[22]

F. Klawonn, Fuzzy clustering: Insights and a new approach,, Mathware & Soft Computing, 11 (2004), 125.   Google Scholar

[23]

J. F. Kolen and T. Hutcheson, Reducing the time complexity of the fuzzy c-means algorithm,, IEEE Transactions on Fuzzy Systems, 10 (2002), 263.  doi: 10.1109/91.995126.  Google Scholar

[24]

T. Kwok, K. Smith, S. Lozano and D. Taniar, Parallel fuzzy c- means clustering for large data sets,, in Euro-Par 2002 Parallel Processing (eds. B. Monien and R. Feldmann), (2002), 365.  doi: 10.1007/3-540-45706-2_48.  Google Scholar

[25]

H. Liu, F. Zhao and L. Jiao, Fuzzy spectral clustering with robust spatial information for image segmentation,, Applied Soft Computing, 12 (2012), 3636.  doi: 10.1016/j.asoc.2012.05.026.  Google Scholar

[26]

J. D. MacCuish and N. E. MacCuish, Clustering in Bioinformatics and Drug Discovery,, CRC Press, (2010).  doi: 10.1201/b10331.  Google Scholar

[27]

J. Macqueen, Some methods for classification and analysis of multivariate observations,, in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics andProbability (eds. L. LeCam and J. Neyman), 1 (1967), 281.   Google Scholar

[28]

S. A. A. Shalom, M. Dash and M. Tue, Graphics hardware based efficient and scalable fuzzy c-means clustering,, in Proceedings of the 7th Australasian Data Mining Conference, 87 (2008), 179.   Google Scholar

[29]

A. Stetco, X.-J. Zeng and J. Keane, Fuzzy c-means++: Fuzzy c-means with effective seeding initalization,, Expert Systems with Applications, 42 (2015).  doi: 10.1016/j.eswa.2015.05.014.  Google Scholar

[1]

Baolan Yuan, Wanjun Zhang, Yubo Yuan. A Max-Min clustering method for $k$-means algorithm of data clustering. Journal of Industrial & Management Optimization, 2012, 8 (3) : 565-575. doi: 10.3934/jimo.2012.8.565

[2]

Daniel Mckenzie, Steven Damelin. Power weighted shortest paths for clustering Euclidean data. Foundations of Data Science, 2019, 1 (3) : 307-327. doi: 10.3934/fods.2019014

[3]

Pawan Lingras, Farhana Haider, Matt Triff. Fuzzy temporal meta-clustering of financial trading volatility patterns. Big Data & Information Analytics, 2017, 2 (5) : 1-20. doi: 10.3934/bdia.2017018

[4]

Baoli Shi, Zhi-Feng Pang, Jing Xu. Image segmentation based on the hybrid total variation model and the K-means clustering strategy. Inverse Problems & Imaging, 2016, 10 (3) : 807-828. doi: 10.3934/ipi.2016022

[5]

Yongbin Ou, Cun-Quan Zhang. A new multimembership clustering method. Journal of Industrial & Management Optimization, 2007, 3 (4) : 619-624. doi: 10.3934/jimo.2007.3.619

[6]

Elissar Nasreddine. Two-dimensional individual clustering model. Discrete & Continuous Dynamical Systems - S, 2014, 7 (2) : 307-316. doi: 10.3934/dcdss.2014.7.307

[7]

Elissar Nasreddine. Well-posedness for a model of individual clustering. Discrete & Continuous Dynamical Systems - B, 2013, 18 (10) : 2647-2668. doi: 10.3934/dcdsb.2013.18.2647

[8]

Haixia Liu, Jian-Feng Cai, Yang Wang. Subspace clustering by (k,k)-sparse matrix factorization. Inverse Problems & Imaging, 2017, 11 (3) : 539-551. doi: 10.3934/ipi.2017025

[9]

Jinyuan Zhang, Aimin Zhou, Guixu Zhang, Hu Zhang. A clustering based mate selection for evolutionary optimization. Big Data & Information Analytics, 2017, 2 (1) : 77-85. doi: 10.3934/bdia.2017010

[10]

Gurkan Ozturk, Mehmet Tahir Ciftci. Clustering based polyhedral conic functions algorithm in classification. Journal of Industrial & Management Optimization, 2015, 11 (3) : 921-932. doi: 10.3934/jimo.2015.11.921

[11]

Guojun Gan, Kun Chen. A soft subspace clustering algorithm with log-transformed distances. Big Data & Information Analytics, 2016, 1 (1) : 93-109. doi: 10.3934/bdia.2016.1.93

[12]

Richard L Buckalew. Cell cycle clustering and quorum sensing in a response / signaling mediated feedback model. Discrete & Continuous Dynamical Systems - B, 2014, 19 (4) : 867-881. doi: 10.3934/dcdsb.2014.19.867

[13]

Ruiqi Yang, Dachuan Xu, Yicheng Xu, Dongmei Zhang. An adaptive probabilistic algorithm for online k-center clustering. Journal of Industrial & Management Optimization, 2019, 15 (2) : 565-576. doi: 10.3934/jimo.2018057

[14]

Jiangchuan Fan, Xinyu Guo, Jianjun Du, Weiliang Wen, Xianju Lu, Brahmani Louiza. Analysis of the clustering fusion algorithm for multi-band color image. Discrete & Continuous Dynamical Systems - S, 2019, 12 (4&5) : 1233-1249. doi: 10.3934/dcdss.2019085

[15]

Daniel Roggen, Martin Wirz, Gerhard Tröster, Dirk Helbing. Recognition of crowd behavior from mobile sensors with pattern analysis and graph clustering methods. Networks & Heterogeneous Media, 2011, 6 (3) : 521-544. doi: 10.3934/nhm.2011.6.521

[16]

Willem Mélange, Herwig Bruneel, Bart Steyaert, Dieter Claeys, Joris Walraevens. A continuous-time queueing model with class clustering and global FCFS service discipline. Journal of Industrial & Management Optimization, 2014, 10 (1) : 193-206. doi: 10.3934/jimo.2014.10.193

[17]

Juncheng Wei, Jun Yang. Toda system and interior clustering line concentration for a singularly perturbed Neumann problem in two dimensional domain. Discrete & Continuous Dynamical Systems - A, 2008, 22 (3) : 465-508. doi: 10.3934/dcds.2008.22.465

[18]

Chao Mi, Jun Wang, Weijian Mi, Youfang Huang, Zhiwei Zhang, Yongsheng Yang, Jun Jiang, Postolache Octavian. Research on regional clustering and two-stage SVM method for container truck recognition. Discrete & Continuous Dynamical Systems - S, 2019, 12 (4&5) : 1117-1133. doi: 10.3934/dcdss.2019077

[19]

Cheng-Kai Hu, Fung-Bao Liu, Cheng-Feng Hu. Efficiency measures in fuzzy data envelopment analysis with common weights. Journal of Industrial & Management Optimization, 2017, 13 (1) : 237-249. doi: 10.3934/jimo.2016014

[20]

Feyza Gürbüz, Panos M. Pardalos. A decision making process application for the slurry production in ceramics via fuzzy cluster and data mining. Journal of Industrial & Management Optimization, 2012, 8 (2) : 285-297. doi: 10.3934/jimo.2012.8.285

 Impact Factor: 

Metrics

  • PDF downloads (7)
  • HTML views (0)
  • Cited by (0)

Other articles
by authors

[Back to Top]