-
Previous Article
Multistage optimal control for microbial fed-batch fermentation process
- JIMO Home
- This Issue
-
Next Article
Multiobjective mathematical models and solution approaches for heterogeneous fixed fleet vehicle routing problems
The approximation algorithm based on seeding method for functional $ k $-means problem†
1. | School of Mathematics and Statistics, Shandong Normal University, Jinan 250014, China |
2. | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518055, China |
3. | Department of Operations Research and Information Engineering, Beijing University of Technology, Beijing 100124, China |
4. | School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China |
Different from the classical $ k $-means problem, the functional $ k $-means problem involves a kind of dynamic data, which is generated by continuous processes. In this paper, we mainly design an $ O(\ln\; k) $-approximation algorithm based on the seeding method for functional $ k $-means problem. Moreover, the numerical experiment presented shows that this algorithm is more efficient than the functional $ k $-means clustering algorithm.
References:
[1] |
C. Abraham, P. A. Cornillon, E. Matzner-Løber and N. Molinari,
Unsupervised curve clustering using B-splines, Scandinavian Journal of Statistics, 30 (2003), 581-595.
doi: 10.1111/1467-9469.00350. |
[2] |
S. Ahmadian, A. Norouzi-Fard, O. Svensson and J. Ward, Better guarantees for $k$-means and Euclidean $k$-median by primal-dual algorithms, SIAM Journal on Computing, (2019), FOCS17-97–FOCS17-156.
doi: 10.1137/18M1171321. |
[3] |
D. Aloise, A. Deshpande, P. Hansen and P. Popat,
NP-hardness of Euclidean sum-of-squares clustering, Machine Learning, 75 (2009), 245-248.
doi: 10.1007/s10994-009-5103-0. |
[4] |
D. Arthur and S. Vassilvitskii, $K$-means++: The advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, (2007), 1027–1035. |
[5] |
M. Boullé, Functional data clustering via piecewise constant nonparametric density estimation, Pattern Recognition, 45 (2012), 4389-4401. Google Scholar |
[6] |
C. Bouveyron and C. Brunet-Saumard,
Model-based clustering of high-dimensional data: A review, Computational Statistics & Data Analysis, 71 (2014), 52-78.
doi: 10.1016/j.csda.2012.12.008. |
[7] |
R. Gamasaee and M. Zarandi,
A new dirichlet process for mining dynamic patterns in functional data, Information Sciences, 405 (2017), 55-80.
doi: 10.1016/j.ins.2017.04.008. |
[8] |
S. Har-Peled and B. Sadri,
How fast is the $k$-means method?, Algorithmica, 41 (2005), 185-202.
doi: 10.1007/s00453-004-1127-9. |
[9] |
J. Jacques and C. Preda,
Functional data clustering: A survey, Advances in Data Analysis and Classification, 8 (2014), 231-255.
doi: 10.1007/s11634-013-0158-y. |
[10] |
S. Ji, D. Xu, L. Guo, M. Li and D. Zhang, The seeding algorithm for spherical $k$-means clustering with penalties, Journal of Combinatorial Optimization, 2020.
doi: 10.1007/s10878-020-00569-1. |
[11] |
M. Kayano, K. Dozono and S. Konishi,
Functional cluster analysis via orthonormalized Gaussian basis expansions and its application, Journal of Classification, 27 (2010), 211-230.
doi: 10.1007/s00357-010-9054-8. |
[12] |
M. Li, The bi-criteria seeding algorithms for two variants of $k$-means problem, Journal of Combinatorial Optimization, 2020.
doi: 10.1007/s10878-020-00537-9. |
[13] |
M. Li, D. Xu, J. Yue, D. Zhang and P. Zhang,
The seeding algorithm for $k$-means problem with penalties, Journal of Combinatorial Optimization, 39 (2020), 15-32.
doi: 10.1007/s10878-019-00450-w. |
[14] |
S. Lloyd,
Least squares quantization in PCM, IEEE Transactions on Information Theory, 28 (1982), 129-137.
doi: 10.1109/TIT.1982.1056489. |
[15] |
Y. Meng, J. Liang, F. Cao and Y. He,
A new distance with derivative information for functional $k$-means clustering algorithm, Information Sciences, 463/464 (2018), 166-185.
doi: 10.1016/j.ins.2018.06.035. |
[16] |
R. Ostrovsky, Y. Rabani, L. Schulman and C. Swamy, The effectiveness of Lloyd-type methods for the $k$-means problem, Journal of the ACM, 59 (2012), 28: 1–28: 22.
doi: 10.1145/2395116.2395117. |
[17] |
G. Ozturk and M. Ciftci,
Clustering based polyhedral conic functions algorithm in classification, Journal of Industrial & Management Optimization, 11 (2015), 921-932.
doi: 10.3934/jimo.2015.11.921. |
[18] |
J. Park and J. Ahn,
Clustering multivariate functional data with phase variation, Biometrics, 73 (2017), 324-333.
doi: 10.1111/biom.12546. |
[19] |
J. Peng and H. G. Müller,
Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions, The Annals of Applied Statistics, 2 (2008), 1056-1077.
doi: 10.1214/08-AOAS172. |
[20] |
C. Preda, G. Saporta and C. Lévéder,
PLS classification of functional data, Computational Statistics, 22 (2007), 223-235.
doi: 10.1007/s00180-007-0041-4. |
[21] |
T. Tarpey and K. K. Kinateder,
Clustering functional data, Journal of Classification, 20 (2003), 93-114.
doi: 10.1007/s00357-003-0007-3. |
[22] |
D. Wei, A constant-factor bi-criteria approximation guarantee for $k$-means++, Proceedings of the Thirtieth International Conference on Neural Information Processing Systems, (2016), 604–612. Google Scholar |
[23] |
X. Wu, V. Kumar, J. Quinlan, J. Ross Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P.S. Yu, Z. H. Zhou, M. Steinbach, D. J. Hand and D. Steinberg,
Top 10 algorithms in data mining, Knowledge and Information Systems, 14 (2008), 1-37.
doi: 10.1007/s10115-007-0114-2. |
show all references
References:
[1] |
C. Abraham, P. A. Cornillon, E. Matzner-Løber and N. Molinari,
Unsupervised curve clustering using B-splines, Scandinavian Journal of Statistics, 30 (2003), 581-595.
doi: 10.1111/1467-9469.00350. |
[2] |
S. Ahmadian, A. Norouzi-Fard, O. Svensson and J. Ward, Better guarantees for $k$-means and Euclidean $k$-median by primal-dual algorithms, SIAM Journal on Computing, (2019), FOCS17-97–FOCS17-156.
doi: 10.1137/18M1171321. |
[3] |
D. Aloise, A. Deshpande, P. Hansen and P. Popat,
NP-hardness of Euclidean sum-of-squares clustering, Machine Learning, 75 (2009), 245-248.
doi: 10.1007/s10994-009-5103-0. |
[4] |
D. Arthur and S. Vassilvitskii, $K$-means++: The advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, (2007), 1027–1035. |
[5] |
M. Boullé, Functional data clustering via piecewise constant nonparametric density estimation, Pattern Recognition, 45 (2012), 4389-4401. Google Scholar |
[6] |
C. Bouveyron and C. Brunet-Saumard,
Model-based clustering of high-dimensional data: A review, Computational Statistics & Data Analysis, 71 (2014), 52-78.
doi: 10.1016/j.csda.2012.12.008. |
[7] |
R. Gamasaee and M. Zarandi,
A new dirichlet process for mining dynamic patterns in functional data, Information Sciences, 405 (2017), 55-80.
doi: 10.1016/j.ins.2017.04.008. |
[8] |
S. Har-Peled and B. Sadri,
How fast is the $k$-means method?, Algorithmica, 41 (2005), 185-202.
doi: 10.1007/s00453-004-1127-9. |
[9] |
J. Jacques and C. Preda,
Functional data clustering: A survey, Advances in Data Analysis and Classification, 8 (2014), 231-255.
doi: 10.1007/s11634-013-0158-y. |
[10] |
S. Ji, D. Xu, L. Guo, M. Li and D. Zhang, The seeding algorithm for spherical $k$-means clustering with penalties, Journal of Combinatorial Optimization, 2020.
doi: 10.1007/s10878-020-00569-1. |
[11] |
M. Kayano, K. Dozono and S. Konishi,
Functional cluster analysis via orthonormalized Gaussian basis expansions and its application, Journal of Classification, 27 (2010), 211-230.
doi: 10.1007/s00357-010-9054-8. |
[12] |
M. Li, The bi-criteria seeding algorithms for two variants of $k$-means problem, Journal of Combinatorial Optimization, 2020.
doi: 10.1007/s10878-020-00537-9. |
[13] |
M. Li, D. Xu, J. Yue, D. Zhang and P. Zhang,
The seeding algorithm for $k$-means problem with penalties, Journal of Combinatorial Optimization, 39 (2020), 15-32.
doi: 10.1007/s10878-019-00450-w. |
[14] |
S. Lloyd,
Least squares quantization in PCM, IEEE Transactions on Information Theory, 28 (1982), 129-137.
doi: 10.1109/TIT.1982.1056489. |
[15] |
Y. Meng, J. Liang, F. Cao and Y. He,
A new distance with derivative information for functional $k$-means clustering algorithm, Information Sciences, 463/464 (2018), 166-185.
doi: 10.1016/j.ins.2018.06.035. |
[16] |
R. Ostrovsky, Y. Rabani, L. Schulman and C. Swamy, The effectiveness of Lloyd-type methods for the $k$-means problem, Journal of the ACM, 59 (2012), 28: 1–28: 22.
doi: 10.1145/2395116.2395117. |
[17] |
G. Ozturk and M. Ciftci,
Clustering based polyhedral conic functions algorithm in classification, Journal of Industrial & Management Optimization, 11 (2015), 921-932.
doi: 10.3934/jimo.2015.11.921. |
[18] |
J. Park and J. Ahn,
Clustering multivariate functional data with phase variation, Biometrics, 73 (2017), 324-333.
doi: 10.1111/biom.12546. |
[19] |
J. Peng and H. G. Müller,
Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions, The Annals of Applied Statistics, 2 (2008), 1056-1077.
doi: 10.1214/08-AOAS172. |
[20] |
C. Preda, G. Saporta and C. Lévéder,
PLS classification of functional data, Computational Statistics, 22 (2007), 223-235.
doi: 10.1007/s00180-007-0041-4. |
[21] |
T. Tarpey and K. K. Kinateder,
Clustering functional data, Journal of Classification, 20 (2003), 93-114.
doi: 10.1007/s00357-003-0007-3. |
[22] |
D. Wei, A constant-factor bi-criteria approximation guarantee for $k$-means++, Proceedings of the Thirtieth International Conference on Neural Information Processing Systems, (2016), 604–612. Google Scholar |
[23] |
X. Wu, V. Kumar, J. Quinlan, J. Ross Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P.S. Yu, Z. H. Zhou, M. Steinbach, D. J. Hand and D. Steinberg,
Top 10 algorithms in data mining, Knowledge and Information Systems, 14 (2008), 1-37.
doi: 10.1007/s10115-007-0114-2. |
Notations | Meaning of symbols |
$x_i(t)$ or $x_i^j(t)$ | functional curve |
$X(t)$ or $X^i(t)$ or $C^i(t)$ | $d$-dimensional functional sample with functional curves as its components |
$\mathfrak{F}^d(t)$ | set of $d$-dimensional functional samples |
$\Gamma(t)$ or $\Delta(t)$ or $C(t)$ | subset of $\mathfrak{F}^d(t)$ |
$\mu(\Gamma(t))$ | center of mass of functional samples of $\Gamma(t)$ |
$d(X^i(t), X^j(t))$ | distance between the functional samples $X^i(t)$ and $X^j(t)$ |
$d(X^i(t), \Gamma(t))$ | distance from the functional sample $X^i(t)$ to the subset of functional samples $\Gamma(t)$ |
$X(t)_{\Gamma(t)}$ | the closest functional sample in $\Gamma(t)$ to the functional sample $X(t)$ |
$\Phi(\Gamma(t), C(t))$ | potential function of $\Gamma(t)$ over $C(t)$ |
$\Gamma_{C(t)}^i(t)$ | the $i$-th cluster of $\Gamma(t)$ with respect to $C(t)$ |
Notations | Meaning of symbols |
$x_i(t)$ or $x_i^j(t)$ | functional curve |
$X(t)$ or $X^i(t)$ or $C^i(t)$ | $d$-dimensional functional sample with functional curves as its components |
$\mathfrak{F}^d(t)$ | set of $d$-dimensional functional samples |
$\Gamma(t)$ or $\Delta(t)$ or $C(t)$ | subset of $\mathfrak{F}^d(t)$ |
$\mu(\Gamma(t))$ | center of mass of functional samples of $\Gamma(t)$ |
$d(X^i(t), X^j(t))$ | distance between the functional samples $X^i(t)$ and $X^j(t)$ |
$d(X^i(t), \Gamma(t))$ | distance from the functional sample $X^i(t)$ to the subset of functional samples $\Gamma(t)$ |
$X(t)_{\Gamma(t)}$ | the closest functional sample in $\Gamma(t)$ to the functional sample $X(t)$ |
$\Phi(\Gamma(t), C(t))$ | potential function of $\Gamma(t)$ over $C(t)$ |
$\Gamma_{C(t)}^i(t)$ | the $i$-th cluster of $\Gamma(t)$ with respect to $C(t)$ |
Data Set | Method | ARI | DBI | Initial Cost | Returned Cost | Time (s) |
Simudata | SeedAlg | 0.7919 | 0.8652 | 3338 | 1689 | 58 |
FuncAlg | 0.7975 | 0.8656 | 5465 | 1689 | 61 | |
Sdata | SeedAlg | 0.6651 | 1.1385 | 1667 | 712 | 192 |
FuncAlg | 0.6561 | 1.1384 | 2485 | 720 | 217 |
Data Set | Method | ARI | DBI | Initial Cost | Returned Cost | Time (s) |
Simudata | SeedAlg | 0.7919 | 0.8652 | 3338 | 1689 | 58 |
FuncAlg | 0.7975 | 0.8656 | 5465 | 1689 | 61 | |
Sdata | SeedAlg | 0.6651 | 1.1385 | 1667 | 712 | 192 |
FuncAlg | 0.6561 | 1.1384 | 2485 | 720 | 217 |
[1] |
J. Frédéric Bonnans, Justina Gianatti, Francisco J. Silva. On the convergence of the Sakawa-Shindo algorithm in stochastic control. Mathematical Control & Related Fields, 2016, 6 (3) : 391-406. doi: 10.3934/mcrf.2016008 |
[2] |
Demetres D. Kouvatsos, Jumma S. Alanazi, Kevin Smith. A unified ME algorithm for arbitrary open QNMs with mixed blocking mechanisms. Numerical Algebra, Control & Optimization, 2011, 1 (4) : 781-816. doi: 10.3934/naco.2011.1.781 |
[3] |
Ralf Hielscher, Michael Quellmalz. Reconstructing a function on the sphere from its means along vertical slices. Inverse Problems & Imaging, 2016, 10 (3) : 711-739. doi: 10.3934/ipi.2016018 |
[4] |
Davide La Torre, Simone Marsiglio, Franklin Mendivil, Fabio Privileggi. Public debt dynamics under ambiguity by means of iterated function systems on density functions. Discrete & Continuous Dynamical Systems - B, 2021 doi: 10.3934/dcdsb.2021070 |
[5] |
Ka Luen Cheung, Man Chun Leung. Asymptotic behavior of positive solutions of the equation $ \Delta u + K u^{\frac{n+2}{n-2}} = 0$ in $IR^n$ and positive scalar curvature. Conference Publications, 2001, 2001 (Special) : 109-120. doi: 10.3934/proc.2001.2001.109 |
[6] |
Wen-Bin Yang, Yan-Ling Li, Jianhua Wu, Hai-Xia Li. Dynamics of a food chain model with ratio-dependent and modified Leslie-Gower functional responses. Discrete & Continuous Dynamical Systems - B, 2015, 20 (7) : 2269-2290. doi: 10.3934/dcdsb.2015.20.2269 |
[7] |
Zhihua Zhang, Naoki Saito. PHLST with adaptive tiling and its application to antarctic remote sensing image approximation. Inverse Problems & Imaging, 2014, 8 (1) : 321-337. doi: 10.3934/ipi.2014.8.321 |
[8] |
Luke Finlay, Vladimir Gaitsgory, Ivan Lebedev. Linear programming solutions of periodic optimization problems: approximation of the optimal control. Journal of Industrial & Management Optimization, 2007, 3 (2) : 399-413. doi: 10.3934/jimo.2007.3.399 |
[9] |
Xianming Liu, Guangyue Han. A Wong-Zakai approximation of stochastic differential equations driven by a general semimartingale. Discrete & Continuous Dynamical Systems - B, 2021, 26 (5) : 2499-2508. doi: 10.3934/dcdsb.2020192 |
2019 Impact Factor: 1.366
Tools
Metrics
Other articles
by authors
[Back to Top]