
-
Previous Article
Randomized learning of the second-moment matrix of a smooth function
- FoDS Home
- This Issue
-
Next Article
Modelling dynamic network evolution as a Pitman-Yor process
Power weighted shortest paths for clustering Euclidean data
1. | Department of Mathematics, University of California, Los Angeles, Los Angeles CA 90095, USA |
2. | Department of Mathematics, University of Michigan, Ann Arbor MI 48109, USA |
We study the use of power weighted shortest path metrics for clustering high dimensional Euclidean data, under the assumption that the data is drawn from a collection of disjoint low dimensional manifolds. We argue, theoretically and experimentally, that this leads to higher clustering accuracy. We also present a fast algorithm for computing these distances.
References:
[1] |
M. Alamgir and U. Von Luxburg, Shortest path distance in random k-nearest neighbor graphs, arXiv preprint, arXiv: 1206.6381. Google Scholar |
[2] |
A. Aldroubi, K. Hamm, A. Koku and A. Sekmen, Cur decompositions, similarity matrices, and subspace clustering, Front. Appl. Math. Stat., 4 (2019), p65.
doi: 10.3389/fams.2018.00065. |
[3] |
E. Arias-Castro,
Clustering based on pairwise distances when the data is of mixed dimensions, IEEE Transactions on Information Theory, 57 (2011), 1692-1706.
doi: 10.1109/TIT.2011.2104630. |
[4] |
R. Basri and D. Jacobs, Lambertian reflectance and linear subspaces, IEEE Transactions on Pattern Analysis & Machine Intelligence, 2002,218–233.
doi: 10.1109/ICCV.2001.937651. |
[5] |
J. L. Bentley,
Multidimensional binary search trees used for associative searching, Communications of the ACM, 18 (1975), 509-517.
doi: 10.1145/361002.361007. |
[6] |
A. Beygelzimer, S. Kakade and J. Langford, Cover trees for nearest neighbor, in Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006, 97–104.
doi: 10.1145/1143844.1143857. |
[7] |
A. Bijral, N. Ratliff and N. Srebro, Semi-supervised learning with density based distances, in Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, AUAI Press, 2011, 43–50. Google Scholar |
[8] |
H. Chang and D.-Y. Yeung,
Robust path-based spectral clustering, Pattern Recognition, 41 (2008), 191-203.
doi: 10.1016/j.patcog.2007.04.010. |
[9] |
T. Chu, G. Miller and D. Sheehy, Exploration of a graph-based density sensitive metric, arXiv preprint, arXiv: 1709.07797. Google Scholar |
[10] |
R. Coifman and S. Lafon,
Diffusion maps, Applied and Computational Harmonic Analysis, 21 (2006), 5-30.
doi: 10.1016/j.acha.2006.04.006. |
[11] |
T. Cormen, C. Leiserson, R. Rivest and C. Stein, Introduction to Algorithms, MIT press, 2009.
![]() |
[12] |
J. Costeira and T. Kanade, A multibody factorization method for independently moving objects, International Journal of Computer Vision, 29 (1998), 159-179. Google Scholar |
[13] |
K. Diaz-Chito, A. Hernández-Sabaté and A. López,
A reduced feature set for driver head pose estimation, Applied Soft Computing, 45 (2016), 98-107.
doi: 10.1016/j.asoc.2016.04.027. |
[14] |
D. Dua and C. Graff, UCImachine learning repository, 2017, http://archive.ics.uci.edu/ml. Google Scholar |
[15] |
C. Fefferman, S. Mitter and H. Narayanan,
Testing the manifold hypothesis, Journal of the American Mathematical Society, 29 (2016), 983-1049.
doi: 10.1090/jams/852. |
[16] |
B. Fischer and J. Buhmann,
Path-based clustering for grouping of smooth curves and texture segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25 (2003), 513-518.
doi: 10.1109/TPAMI.2003.1190577. |
[17] |
S. Har-Peled, Computing the k nearest-neighbors for all vertices via dijkstra, arXiv preprint, arXiv: 1607.07818. Google Scholar |
[18] |
J. Ho, M.-H. Yang, J. Lim, K.-C. Lee and D. Kriegman, Clustering appearances of objects under varying illumination conditions, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003, 11–18.
doi: 10.1109/CVPR.2003.1211332. |
[19] |
C. D. Howard and C. Newman,
Geodesics and spanning trees for euclidean first-passage percolation, Annals of Probability, 29 (2001), 577-623.
doi: 10.1214/aop/1008956685. |
[20] |
S. Hwang, S. Damelin and A. Hero Ⅲ,
Shortest path through random points, The Annals of Applied Probability, 26 (2016), 2791-2823.
doi: 10.1214/15-AAP1162. |
[21] |
M. Jacobs, E. Merkurjev and S. Esedoḡlu,
Auction dynamics: A volume constrained mbo scheme, Journal of Computational Physics, 354 (2018), 288-310.
doi: 10.1016/j.jcp.2017.10.036. |
[22] |
A. Little, M. Maggioni and J. Murphy, Path-based spectral clustering: Guarantees, robustness to outliers, and fast algorithms, arXiv preprint, arXiv: 1712.06206. Google Scholar |
[23] |
A. Moscovich, A. Jaffe and B. Nadler, Minimax-optimal semi-supervised regression on unknown manifolds, in Artificial Intelligence and Statistics, 2017,933–942. Google Scholar |
[24] |
S. Nene, S. Nayar, H. Murase et al., Columbia object image library (coil-20). Google Scholar |
[25] |
A. Ng, M. Jordan and Y. Weiss, On spectral clustering: Analysis and an algorithm, in Advances in Neural Information Processing Systems, 2002,849–856. Google Scholar |
[26] |
A. Orlitsky and Sajama, Estimating and computing density based distance metrics, in Proceedings of the 22nd International Conference on Machine Learning, ACM, 2005,760–767. Google Scholar |
[27] |
J. Tenenbaum, V. De Silva and J. Langford,
A global geometric framework for nonlinear dimensionality reduction, Science, 290 (2000), 2319-2323.
doi: 10.1126/science.290.5500.2319. |
[28] |
P. Vincent and Y. Bengio, Density-sensitive Metrics and Kernels, Snowbird Learning Workshop, 2003. Google Scholar |
[29] |
K. Yin and X.-C. Tai,
An effective region force for some variational models for learning and clustering, Journal of Scientific Computing, 74 (2018), 175-196.
doi: 10.1007/s10915-017-0429-4. |
[30] |
L. Zelnik-Manor and P. Perona, Self-tuning spectral clustering, in Advances in Neural Information Processing Systems, 2005, 1601–1608. Google Scholar |
show all references
References:
[1] |
M. Alamgir and U. Von Luxburg, Shortest path distance in random k-nearest neighbor graphs, arXiv preprint, arXiv: 1206.6381. Google Scholar |
[2] |
A. Aldroubi, K. Hamm, A. Koku and A. Sekmen, Cur decompositions, similarity matrices, and subspace clustering, Front. Appl. Math. Stat., 4 (2019), p65.
doi: 10.3389/fams.2018.00065. |
[3] |
E. Arias-Castro,
Clustering based on pairwise distances when the data is of mixed dimensions, IEEE Transactions on Information Theory, 57 (2011), 1692-1706.
doi: 10.1109/TIT.2011.2104630. |
[4] |
R. Basri and D. Jacobs, Lambertian reflectance and linear subspaces, IEEE Transactions on Pattern Analysis & Machine Intelligence, 2002,218–233.
doi: 10.1109/ICCV.2001.937651. |
[5] |
J. L. Bentley,
Multidimensional binary search trees used for associative searching, Communications of the ACM, 18 (1975), 509-517.
doi: 10.1145/361002.361007. |
[6] |
A. Beygelzimer, S. Kakade and J. Langford, Cover trees for nearest neighbor, in Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006, 97–104.
doi: 10.1145/1143844.1143857. |
[7] |
A. Bijral, N. Ratliff and N. Srebro, Semi-supervised learning with density based distances, in Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, AUAI Press, 2011, 43–50. Google Scholar |
[8] |
H. Chang and D.-Y. Yeung,
Robust path-based spectral clustering, Pattern Recognition, 41 (2008), 191-203.
doi: 10.1016/j.patcog.2007.04.010. |
[9] |
T. Chu, G. Miller and D. Sheehy, Exploration of a graph-based density sensitive metric, arXiv preprint, arXiv: 1709.07797. Google Scholar |
[10] |
R. Coifman and S. Lafon,
Diffusion maps, Applied and Computational Harmonic Analysis, 21 (2006), 5-30.
doi: 10.1016/j.acha.2006.04.006. |
[11] |
T. Cormen, C. Leiserson, R. Rivest and C. Stein, Introduction to Algorithms, MIT press, 2009.
![]() |
[12] |
J. Costeira and T. Kanade, A multibody factorization method for independently moving objects, International Journal of Computer Vision, 29 (1998), 159-179. Google Scholar |
[13] |
K. Diaz-Chito, A. Hernández-Sabaté and A. López,
A reduced feature set for driver head pose estimation, Applied Soft Computing, 45 (2016), 98-107.
doi: 10.1016/j.asoc.2016.04.027. |
[14] |
D. Dua and C. Graff, UCImachine learning repository, 2017, http://archive.ics.uci.edu/ml. Google Scholar |
[15] |
C. Fefferman, S. Mitter and H. Narayanan,
Testing the manifold hypothesis, Journal of the American Mathematical Society, 29 (2016), 983-1049.
doi: 10.1090/jams/852. |
[16] |
B. Fischer and J. Buhmann,
Path-based clustering for grouping of smooth curves and texture segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25 (2003), 513-518.
doi: 10.1109/TPAMI.2003.1190577. |
[17] |
S. Har-Peled, Computing the k nearest-neighbors for all vertices via dijkstra, arXiv preprint, arXiv: 1607.07818. Google Scholar |
[18] |
J. Ho, M.-H. Yang, J. Lim, K.-C. Lee and D. Kriegman, Clustering appearances of objects under varying illumination conditions, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003, 11–18.
doi: 10.1109/CVPR.2003.1211332. |
[19] |
C. D. Howard and C. Newman,
Geodesics and spanning trees for euclidean first-passage percolation, Annals of Probability, 29 (2001), 577-623.
doi: 10.1214/aop/1008956685. |
[20] |
S. Hwang, S. Damelin and A. Hero Ⅲ,
Shortest path through random points, The Annals of Applied Probability, 26 (2016), 2791-2823.
doi: 10.1214/15-AAP1162. |
[21] |
M. Jacobs, E. Merkurjev and S. Esedoḡlu,
Auction dynamics: A volume constrained mbo scheme, Journal of Computational Physics, 354 (2018), 288-310.
doi: 10.1016/j.jcp.2017.10.036. |
[22] |
A. Little, M. Maggioni and J. Murphy, Path-based spectral clustering: Guarantees, robustness to outliers, and fast algorithms, arXiv preprint, arXiv: 1712.06206. Google Scholar |
[23] |
A. Moscovich, A. Jaffe and B. Nadler, Minimax-optimal semi-supervised regression on unknown manifolds, in Artificial Intelligence and Statistics, 2017,933–942. Google Scholar |
[24] |
S. Nene, S. Nayar, H. Murase et al., Columbia object image library (coil-20). Google Scholar |
[25] |
A. Ng, M. Jordan and Y. Weiss, On spectral clustering: Analysis and an algorithm, in Advances in Neural Information Processing Systems, 2002,849–856. Google Scholar |
[26] |
A. Orlitsky and Sajama, Estimating and computing density based distance metrics, in Proceedings of the 22nd International Conference on Machine Learning, ACM, 2005,760–767. Google Scholar |
[27] |
J. Tenenbaum, V. De Silva and J. Langford,
A global geometric framework for nonlinear dimensionality reduction, Science, 290 (2000), 2319-2323.
doi: 10.1126/science.290.5500.2319. |
[28] |
P. Vincent and Y. Bengio, Density-sensitive Metrics and Kernels, Snowbird Learning Workshop, 2003. Google Scholar |
[29] |
K. Yin and X.-C. Tai,
An effective region force for some variational models for learning and clustering, Journal of Scientific Computing, 74 (2018), 175-196.
doi: 10.1007/s10915-017-0429-4. |
[30] |
L. Zelnik-Manor and P. Perona, Self-tuning spectral clustering, in Advances in Neural Information Processing Systems, 2005, 1601–1608. Google Scholar |



3 Lines | |||||
3 Moons | |||||
3 Circles | |||||
- |
3 Lines | |||||
3 Moons | |||||
3 Circles | |||||
- |
3 Lines | |||||
3 Moons | |||||
3 Circles | |||||
- |
3 Lines | |||||
3 Moons | |||||
3 Circles | |||||
- |
[1] |
Nicholas Geneva, Nicholas Zabaras. Multi-fidelity generative deep learning turbulent flows. Foundations of Data Science, 2020, 2 (4) : 391-428. doi: 10.3934/fods.2020019 |
[2] |
San Ling, Buket Özkaya. New bounds on the minimum distance of cyclic codes. Advances in Mathematics of Communications, 2021, 15 (1) : 1-8. doi: 10.3934/amc.2020038 |
[3] |
Liping Tang, Ying Gao. Some properties of nonconvex oriented distance function and applications to vector optimization problems. Journal of Industrial & Management Optimization, 2021, 17 (1) : 485-500. doi: 10.3934/jimo.2020117 |
[4] |
Shahede Omidi, Jafar Fathali. Inverse single facility location problem on a tree with balancing on the distance of server to clients. Journal of Industrial & Management Optimization, 2020 doi: 10.3934/jimo.2021017 |
[5] |
Kengo Nakai, Yoshitaka Saiki. Machine-learning construction of a model for a macroscopic fluid variable using the delay-coordinate of a scalar observable. Discrete & Continuous Dynamical Systems - S, 2021, 14 (3) : 1079-1092. doi: 10.3934/dcdss.2020352 |
[6] |
Cheng Peng, Zhaohui Tang, Weihua Gui, Qing Chen, Jing He. A bidirectional weighted boundary distance algorithm for time series similarity computation based on optimized sliding window size. Journal of Industrial & Management Optimization, 2021, 17 (1) : 205-220. doi: 10.3934/jimo.2019107 |
[7] |
Kateřina Škardová, Tomáš Oberhuber, Jaroslav Tintěra, Radomír Chabiniok. Signed-distance function based non-rigid registration of image series with varying image intensity. Discrete & Continuous Dynamical Systems - S, 2021, 14 (3) : 1145-1160. doi: 10.3934/dcdss.2020386 |
Impact Factor:
Tools
Metrics
Other articles
by authors
[Back to Top]