# American Institute of Mathematical Sciences

April  2019, 15(2): 565-576. doi: 10.3934/jimo.2018057

## An adaptive probabilistic algorithm for online k-center clustering

 1 Department of Information and Operations Research, College of Applied Sciences, Beijing University of Technology, Beijing 100124, China 2 School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China

* Corresponding author: Dongmei Zhang

Received  August 2017 Revised  December 2017 Published  April 2018

The $k$-center clustering is one of the well-studied clustering problems in computer science. We are given a set of data points $P\subseteq R^d$, where $R^d$ is $d$ dimensional Euclidean space. We need to select $k≤ |P|$ points as centers and partition the set $P$ into $k$ clusters with each point connecting to its nearest center. The goal is to minimize the maximum radius. We consider the so-called online $k$-center clustering model where the data points in $R^d$ arrive over time. We present the bi-criteria $(\frac{n}{k}, (\log\frac{U^*}{L^*})^2)$-competitive algorithm and $(\frac{n}{k}, \logγ\log\frac{nγ}{k})$-competitive algorithm for semi-online and fully-online $k$-center clustering respectively, where $U^*$ is the maximum cluster radius of optimal solution, $L^*$ is the minimum distance of two distinct points of $P$, $γ$ is the ratio of the maximum distance of two distinct points and the minimum distance of two distinct points of $P$ and $n$ is the number of points that will arrive in total.

Citation: Ruiqi Yang, Dachuan Xu, Yicheng Xu, Dongmei Zhang. An adaptive probabilistic algorithm for online k-center clustering. Journal of Industrial & Management Optimization, 2019, 15 (2) : 565-576. doi: 10.3934/jimo.2018057
##### References:
 [1] D. Achlioptas, M. Chrobak and J. Noga, Competitive analysis of randomized paging algorithms, Theoretical Computer Science, 234 (2000), 203-218. doi: 10.1016/S0304-3975(98)00116-9. Google Scholar [2] J. Bar-Ilan, G. Kortsarz and D. Peleg, How to allocate network centers, Journal of Algorithms, 15 (1993), 385-415. doi: 10.1006/jagm.1993.1047. Google Scholar [3] M. Charikar, C. Chekuri, T. Feder and R. Motwani, Incremental clustering and dynamic information retrieval, SIAM Journal on Computing, 33 (2004), 1417-1440. doi: 10.1137/S0097539702418498. Google Scholar [4] S. Chaudhuri, N. Garg and R. Ravi, The p-neighbor k-center problem, Information Processing Letters, 65 (1998), 131-134. doi: 10.1016/S0020-0190(97)00224-X. Google Scholar [5] S. Chechik and D. Peleg, The fault-tolerant capacitated k-center problem, Theoretical Computer Science, 566 (2015), 12-25. doi: 10.1016/j.tcs.2014.11.017. Google Scholar [6] M. Cygan, M. T. Hajiaghayi and S. Khuller, LP rounding for k-centers with non-uniform hard capacities, Proceedings of the fifity-third Annual Symposium on Foundations of Computer Science (FOCS). IEEE, (2012), 273–282. Google Scholar [7] S. B. David, A. Borodin, R. Karp, G. Tardos and A. Wigderson, On the power of randomization in on-line algorithms, Algorithmica, 11 (1994), 2-14. doi: 10.1007/BF01294260. Google Scholar [8] T. Feder and D. Greene, Optimal algorithms for approximate clustering, Proceedings of the twentieth Annual ACM Symposium on Theory of Computing, ACM, (1988), 434–444. doi: 10.1145/62212.62255. Google Scholar [9] C. G. Fernandes, S. P. de Paula and L. L. C. Pedrosa, Improved approximation algorithms for capacitated fault-tolerant k-center, Algorithmica, 80 (2018), 1041-1072. Google Scholar [10] T. F. Gonzalez, Clustering to minimize the maximum intercluster distance, Theoretical Computer Science, 38 (1985), 293-306. doi: 10.1016/0304-3975(85)90224-5. Google Scholar [11] D. S. Hochbaum and D. B. Shmoys, A best possible heuristic for the k-center problem, Mathematics of Operations Research, 10 (1985), 180-184. doi: 10.1287/moor.10.2.180. Google Scholar [12] W. L. Hsu and G. L. Nemhauser, Easy and hard bottleneck location problems, Discrete Applied Mathematics, 1 (1979), 209-215. doi: 10.1016/0166-218X(79)90044-1. Google Scholar [13] S. Khuller, R. Pless and Y. J. Sussmann, Fault tolerant k-center problems, Theoretical Computer Science, 242 (2000), 237-245. doi: 10.1016/S0304-3975(98)00222-9. Google Scholar [14] S. Khuller and Y. J. Sussmann, The capacitated k-center problem, SIAM Journal on Discrete Mathematics, 13 (2000), 403-418. doi: 10.1137/S0895480197329776. Google Scholar [15] S. O. Krumke, On a generalization of the p-center problem, Information Processing Letters, 56 (1995), 67-71. doi: 10.1016/0020-0190(95)00141-X. Google Scholar [16] E. Liberty, R. Sriharsha and M. Sviridenko, An algorithm for online k-means clustering, Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX), Society for Industrial and Applied Mathematics, (2016), 81–89. doi: 10.1137/1.9781611974317.7. Google Scholar [17] R. G. Michael and S. J. David, Computers and intractability: A guide to the theory of NP-completeness, W. H. Freeman, San Francisco, (1979), 90-91. Google Scholar

show all references

##### References:
 [1] D. Achlioptas, M. Chrobak and J. Noga, Competitive analysis of randomized paging algorithms, Theoretical Computer Science, 234 (2000), 203-218. doi: 10.1016/S0304-3975(98)00116-9. Google Scholar [2] J. Bar-Ilan, G. Kortsarz and D. Peleg, How to allocate network centers, Journal of Algorithms, 15 (1993), 385-415. doi: 10.1006/jagm.1993.1047. Google Scholar [3] M. Charikar, C. Chekuri, T. Feder and R. Motwani, Incremental clustering and dynamic information retrieval, SIAM Journal on Computing, 33 (2004), 1417-1440. doi: 10.1137/S0097539702418498. Google Scholar [4] S. Chaudhuri, N. Garg and R. Ravi, The p-neighbor k-center problem, Information Processing Letters, 65 (1998), 131-134. doi: 10.1016/S0020-0190(97)00224-X. Google Scholar [5] S. Chechik and D. Peleg, The fault-tolerant capacitated k-center problem, Theoretical Computer Science, 566 (2015), 12-25. doi: 10.1016/j.tcs.2014.11.017. Google Scholar [6] M. Cygan, M. T. Hajiaghayi and S. Khuller, LP rounding for k-centers with non-uniform hard capacities, Proceedings of the fifity-third Annual Symposium on Foundations of Computer Science (FOCS). IEEE, (2012), 273–282. Google Scholar [7] S. B. David, A. Borodin, R. Karp, G. Tardos and A. Wigderson, On the power of randomization in on-line algorithms, Algorithmica, 11 (1994), 2-14. doi: 10.1007/BF01294260. Google Scholar [8] T. Feder and D. Greene, Optimal algorithms for approximate clustering, Proceedings of the twentieth Annual ACM Symposium on Theory of Computing, ACM, (1988), 434–444. doi: 10.1145/62212.62255. Google Scholar [9] C. G. Fernandes, S. P. de Paula and L. L. C. Pedrosa, Improved approximation algorithms for capacitated fault-tolerant k-center, Algorithmica, 80 (2018), 1041-1072. Google Scholar [10] T. F. Gonzalez, Clustering to minimize the maximum intercluster distance, Theoretical Computer Science, 38 (1985), 293-306. doi: 10.1016/0304-3975(85)90224-5. Google Scholar [11] D. S. Hochbaum and D. B. Shmoys, A best possible heuristic for the k-center problem, Mathematics of Operations Research, 10 (1985), 180-184. doi: 10.1287/moor.10.2.180. Google Scholar [12] W. L. Hsu and G. L. Nemhauser, Easy and hard bottleneck location problems, Discrete Applied Mathematics, 1 (1979), 209-215. doi: 10.1016/0166-218X(79)90044-1. Google Scholar [13] S. Khuller, R. Pless and Y. J. Sussmann, Fault tolerant k-center problems, Theoretical Computer Science, 242 (2000), 237-245. doi: 10.1016/S0304-3975(98)00222-9. Google Scholar [14] S. Khuller and Y. J. Sussmann, The capacitated k-center problem, SIAM Journal on Discrete Mathematics, 13 (2000), 403-418. doi: 10.1137/S0895480197329776. Google Scholar [15] S. O. Krumke, On a generalization of the p-center problem, Information Processing Letters, 56 (1995), 67-71. doi: 10.1016/0020-0190(95)00141-X. Google Scholar [16] E. Liberty, R. Sriharsha and M. Sviridenko, An algorithm for online k-means clustering, Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX), Society for Industrial and Applied Mathematics, (2016), 81–89. doi: 10.1137/1.9781611974317.7. Google Scholar [17] R. G. Michael and S. J. David, Computers and intractability: A guide to the theory of NP-completeness, W. H. Freeman, San Francisco, (1979), 90-91. Google Scholar
An example of a sequence of points $1, x^2, ..., x^{2t}$ coming over time. The circles with dotted lines are the clusters produced by an online algorithm. Solid ones are those produced by the optimal off-line algorithm
An illustration of partition of $S^*_i$ for any given $i\in[k]$
 [1] Baolan Yuan, Wanjun Zhang, Yubo Yuan. A Max-Min clustering method for $k$-means algorithm of data clustering. Journal of Industrial & Management Optimization, 2012, 8 (3) : 565-575. doi: 10.3934/jimo.2012.8.565 [2] Haixia Liu, Jian-Feng Cai, Yang Wang. Subspace clustering by (k,k)-sparse matrix factorization. Inverse Problems & Imaging, 2017, 11 (3) : 539-551. doi: 10.3934/ipi.2017025 [3] Gurkan Ozturk, Mehmet Tahir Ciftci. Clustering based polyhedral conic functions algorithm in classification. Journal of Industrial & Management Optimization, 2015, 11 (3) : 921-932. doi: 10.3934/jimo.2015.11.921 [4] Jiping Tao, Ronghuan Huang, Tundong Liu. A $2.28$-competitive algorithm for online scheduling on identical machines. Journal of Industrial & Management Optimization, 2015, 11 (1) : 185-198. doi: 10.3934/jimo.2015.11.185 [5] Guojun Gan, Kun Chen. A soft subspace clustering algorithm with log-transformed distances. Big Data & Information Analytics, 2016, 1 (1) : 93-109. doi: 10.3934/bdia.2016.1.93 [6] Jiangchuan Fan, Xinyu Guo, Jianjun Du, Weiliang Wen, Xianju Lu, Brahmani Louiza. Analysis of the clustering fusion algorithm for multi-band color image. Discrete & Continuous Dynamical Systems - S, 2019, 12 (4&5) : 1233-1249. doi: 10.3934/dcdss.2019085 [7] Aude Hofleitner, Tarek Rabbani, Mohammad Rafiee, Laurent El Ghaoui, Alex Bayen. Learning and estimation applications of an online homotopy algorithm for a generalization of the LASSO. Discrete & Continuous Dynamical Systems - S, 2014, 7 (3) : 503-523. doi: 10.3934/dcdss.2014.7.503 [8] Baoli Shi, Zhi-Feng Pang, Jing Xu. Image segmentation based on the hybrid total variation model and the K-means clustering strategy. Inverse Problems & Imaging, 2016, 10 (3) : 807-828. doi: 10.3934/ipi.2016022 [9] Gaidi Li, Zhen Wang, Dachuan Xu. An approximation algorithm for the $k$-level facility location problem with submodular penalties. Journal of Industrial & Management Optimization, 2012, 8 (3) : 521-529. doi: 10.3934/jimo.2012.8.521 [10] Jiping Tao, Zhijun Chao, Yugeng Xi. A semi-online algorithm and its competitive analysis for a single machine scheduling problem with bounded processing times. Journal of Industrial & Management Optimization, 2010, 6 (2) : 269-282. doi: 10.3934/jimo.2010.6.269 [11] Ran Ma, Jiping Tao. An improved 2.11-competitive algorithm for online scheduling on parallel machines to minimize total weighted completion time. Journal of Industrial & Management Optimization, 2018, 14 (2) : 497-510. doi: 10.3934/jimo.2017057 [12] Jie Shen, Jian Lv, Fang-Fang Guo, Ya-Li Gao, Rui Zhao. A new proximal chebychev center cutting plane algorithm for nonsmooth optimization and its convergence. Journal of Industrial & Management Optimization, 2018, 14 (3) : 1143-1155. doi: 10.3934/jimo.2018003 [13] Yongbin Ou, Cun-Quan Zhang. A new multimembership clustering method. Journal of Industrial & Management Optimization, 2007, 3 (4) : 619-624. doi: 10.3934/jimo.2007.3.619 [14] Petteri Piiroinen, Martin Simon. Probabilistic interpretation of the Calderón problem. Inverse Problems & Imaging, 2017, 11 (3) : 553-575. doi: 10.3934/ipi.2017026 [15] Ning Zhang, Qiang Wu. Online learning for supervised dimension reduction. Mathematical Foundations of Computing, 2019, 2 (2) : 95-106. doi: 10.3934/mfc.2019008 [16] Vitali Kapovitch, Anton Petrunin, Wilderich Tuschmann. On the torsion in the center conjecture. Electronic Research Announcements, 2018, 25: 27-35. doi: 10.3934/era.2018.25.004 [17] Keith Burns, Amie Wilkinson. Dynamical coherence and center bunching. Discrete & Continuous Dynamical Systems - A, 2008, 22 (1&2) : 89-100. doi: 10.3934/dcds.2008.22.89 [18] Camillo De Lellis, Emanuele Spadaro. Center manifold: A case study. Discrete & Continuous Dynamical Systems - A, 2011, 31 (4) : 1249-1272. doi: 10.3934/dcds.2011.31.1249 [19] Sarah Day; William D. Kalies; Konstantin Mischaikow and Thomas Wanner. Probabilistic and numerical validation of homology computations for nodal domains. Electronic Research Announcements, 2007, 13: 60-73. [20] Jianguo Dai, Wenxue Huang, Yuanyi Pan. A category-based probabilistic approach to feature selection. Big Data & Information Analytics, 2017, 2 (5) : 1-8. doi: 10.3934/bdia.2017020

2018 Impact Factor: 1.025