
-
Previous Article
Randomized learning of the second-moment matrix of a smooth function
- FoDS Home
- This Issue
-
Next Article
Modelling dynamic network evolution as a Pitman-Yor process
Power weighted shortest paths for clustering Euclidean data
1. | Department of Mathematics, University of California, Los Angeles, Los Angeles CA 90095, USA |
2. | Department of Mathematics, University of Michigan, Ann Arbor MI 48109, USA |
We study the use of power weighted shortest path metrics for clustering high dimensional Euclidean data, under the assumption that the data is drawn from a collection of disjoint low dimensional manifolds. We argue, theoretically and experimentally, that this leads to higher clustering accuracy. We also present a fast algorithm for computing these distances.
References:
[1] |
M. Alamgir and U. Von Luxburg, Shortest path distance in random k-nearest neighbor graphs, arXiv preprint, arXiv: 1206.6381. Google Scholar |
[2] |
A. Aldroubi, K. Hamm, A. Koku and A. Sekmen, Cur decompositions, similarity matrices, and subspace clustering, Front. Appl. Math. Stat., 4 (2019), p65.
doi: 10.3389/fams.2018.00065. |
[3] |
E. Arias-Castro,
Clustering based on pairwise distances when the data is of mixed dimensions, IEEE Transactions on Information Theory, 57 (2011), 1692-1706.
doi: 10.1109/TIT.2011.2104630. |
[4] |
R. Basri and D. Jacobs, Lambertian reflectance and linear subspaces, IEEE Transactions on Pattern Analysis & Machine Intelligence, 2002,218–233.
doi: 10.1109/ICCV.2001.937651. |
[5] |
J. L. Bentley,
Multidimensional binary search trees used for associative searching, Communications of the ACM, 18 (1975), 509-517.
doi: 10.1145/361002.361007. |
[6] |
A. Beygelzimer, S. Kakade and J. Langford, Cover trees for nearest neighbor, in Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006, 97–104.
doi: 10.1145/1143844.1143857. |
[7] |
A. Bijral, N. Ratliff and N. Srebro, Semi-supervised learning with density based distances, in Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, AUAI Press, 2011, 43–50. Google Scholar |
[8] |
H. Chang and D.-Y. Yeung,
Robust path-based spectral clustering, Pattern Recognition, 41 (2008), 191-203.
doi: 10.1016/j.patcog.2007.04.010. |
[9] |
T. Chu, G. Miller and D. Sheehy, Exploration of a graph-based density sensitive metric, arXiv preprint, arXiv: 1709.07797. Google Scholar |
[10] |
R. Coifman and S. Lafon,
Diffusion maps, Applied and Computational Harmonic Analysis, 21 (2006), 5-30.
doi: 10.1016/j.acha.2006.04.006. |
[11] |
T. Cormen, C. Leiserson, R. Rivest and C. Stein, Introduction to Algorithms, MIT press, 2009.
![]() |
[12] |
J. Costeira and T. Kanade, A multibody factorization method for independently moving objects, International Journal of Computer Vision, 29 (1998), 159-179. Google Scholar |
[13] |
K. Diaz-Chito, A. Hernández-Sabaté and A. López,
A reduced feature set for driver head pose estimation, Applied Soft Computing, 45 (2016), 98-107.
doi: 10.1016/j.asoc.2016.04.027. |
[14] |
D. Dua and C. Graff, UCImachine learning repository, 2017, http://archive.ics.uci.edu/ml. Google Scholar |
[15] |
C. Fefferman, S. Mitter and H. Narayanan,
Testing the manifold hypothesis, Journal of the American Mathematical Society, 29 (2016), 983-1049.
doi: 10.1090/jams/852. |
[16] |
B. Fischer and J. Buhmann,
Path-based clustering for grouping of smooth curves and texture segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25 (2003), 513-518.
doi: 10.1109/TPAMI.2003.1190577. |
[17] |
S. Har-Peled, Computing the k nearest-neighbors for all vertices via dijkstra, arXiv preprint, arXiv: 1607.07818. Google Scholar |
[18] |
J. Ho, M.-H. Yang, J. Lim, K.-C. Lee and D. Kriegman, Clustering appearances of objects under varying illumination conditions, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003, 11–18.
doi: 10.1109/CVPR.2003.1211332. |
[19] |
C. D. Howard and C. Newman,
Geodesics and spanning trees for euclidean first-passage percolation, Annals of Probability, 29 (2001), 577-623.
doi: 10.1214/aop/1008956685. |
[20] |
S. Hwang, S. Damelin and A. Hero Ⅲ,
Shortest path through random points, The Annals of Applied Probability, 26 (2016), 2791-2823.
doi: 10.1214/15-AAP1162. |
[21] |
M. Jacobs, E. Merkurjev and S. Esedoḡlu,
Auction dynamics: A volume constrained mbo scheme, Journal of Computational Physics, 354 (2018), 288-310.
doi: 10.1016/j.jcp.2017.10.036. |
[22] |
A. Little, M. Maggioni and J. Murphy, Path-based spectral clustering: Guarantees, robustness to outliers, and fast algorithms, arXiv preprint, arXiv: 1712.06206. Google Scholar |
[23] |
A. Moscovich, A. Jaffe and B. Nadler, Minimax-optimal semi-supervised regression on unknown manifolds, in Artificial Intelligence and Statistics, 2017,933–942. Google Scholar |
[24] |
S. Nene, S. Nayar, H. Murase et al., Columbia object image library (coil-20). Google Scholar |
[25] |
A. Ng, M. Jordan and Y. Weiss, On spectral clustering: Analysis and an algorithm, in Advances in Neural Information Processing Systems, 2002,849–856. Google Scholar |
[26] |
A. Orlitsky and Sajama, Estimating and computing density based distance metrics, in Proceedings of the 22nd International Conference on Machine Learning, ACM, 2005,760–767. Google Scholar |
[27] |
J. Tenenbaum, V. De Silva and J. Langford,
A global geometric framework for nonlinear dimensionality reduction, Science, 290 (2000), 2319-2323.
doi: 10.1126/science.290.5500.2319. |
[28] |
P. Vincent and Y. Bengio, Density-sensitive Metrics and Kernels, Snowbird Learning Workshop, 2003. Google Scholar |
[29] |
K. Yin and X.-C. Tai,
An effective region force for some variational models for learning and clustering, Journal of Scientific Computing, 74 (2018), 175-196.
doi: 10.1007/s10915-017-0429-4. |
[30] |
L. Zelnik-Manor and P. Perona, Self-tuning spectral clustering, in Advances in Neural Information Processing Systems, 2005, 1601–1608. Google Scholar |
show all references
References:
[1] |
M. Alamgir and U. Von Luxburg, Shortest path distance in random k-nearest neighbor graphs, arXiv preprint, arXiv: 1206.6381. Google Scholar |
[2] |
A. Aldroubi, K. Hamm, A. Koku and A. Sekmen, Cur decompositions, similarity matrices, and subspace clustering, Front. Appl. Math. Stat., 4 (2019), p65.
doi: 10.3389/fams.2018.00065. |
[3] |
E. Arias-Castro,
Clustering based on pairwise distances when the data is of mixed dimensions, IEEE Transactions on Information Theory, 57 (2011), 1692-1706.
doi: 10.1109/TIT.2011.2104630. |
[4] |
R. Basri and D. Jacobs, Lambertian reflectance and linear subspaces, IEEE Transactions on Pattern Analysis & Machine Intelligence, 2002,218–233.
doi: 10.1109/ICCV.2001.937651. |
[5] |
J. L. Bentley,
Multidimensional binary search trees used for associative searching, Communications of the ACM, 18 (1975), 509-517.
doi: 10.1145/361002.361007. |
[6] |
A. Beygelzimer, S. Kakade and J. Langford, Cover trees for nearest neighbor, in Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006, 97–104.
doi: 10.1145/1143844.1143857. |
[7] |
A. Bijral, N. Ratliff and N. Srebro, Semi-supervised learning with density based distances, in Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, AUAI Press, 2011, 43–50. Google Scholar |
[8] |
H. Chang and D.-Y. Yeung,
Robust path-based spectral clustering, Pattern Recognition, 41 (2008), 191-203.
doi: 10.1016/j.patcog.2007.04.010. |
[9] |
T. Chu, G. Miller and D. Sheehy, Exploration of a graph-based density sensitive metric, arXiv preprint, arXiv: 1709.07797. Google Scholar |
[10] |
R. Coifman and S. Lafon,
Diffusion maps, Applied and Computational Harmonic Analysis, 21 (2006), 5-30.
doi: 10.1016/j.acha.2006.04.006. |
[11] |
T. Cormen, C. Leiserson, R. Rivest and C. Stein, Introduction to Algorithms, MIT press, 2009.
![]() |
[12] |
J. Costeira and T. Kanade, A multibody factorization method for independently moving objects, International Journal of Computer Vision, 29 (1998), 159-179. Google Scholar |
[13] |
K. Diaz-Chito, A. Hernández-Sabaté and A. López,
A reduced feature set for driver head pose estimation, Applied Soft Computing, 45 (2016), 98-107.
doi: 10.1016/j.asoc.2016.04.027. |
[14] |
D. Dua and C. Graff, UCImachine learning repository, 2017, http://archive.ics.uci.edu/ml. Google Scholar |
[15] |
C. Fefferman, S. Mitter and H. Narayanan,
Testing the manifold hypothesis, Journal of the American Mathematical Society, 29 (2016), 983-1049.
doi: 10.1090/jams/852. |
[16] |
B. Fischer and J. Buhmann,
Path-based clustering for grouping of smooth curves and texture segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25 (2003), 513-518.
doi: 10.1109/TPAMI.2003.1190577. |
[17] |
S. Har-Peled, Computing the k nearest-neighbors for all vertices via dijkstra, arXiv preprint, arXiv: 1607.07818. Google Scholar |
[18] |
J. Ho, M.-H. Yang, J. Lim, K.-C. Lee and D. Kriegman, Clustering appearances of objects under varying illumination conditions, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003, 11–18.
doi: 10.1109/CVPR.2003.1211332. |
[19] |
C. D. Howard and C. Newman,
Geodesics and spanning trees for euclidean first-passage percolation, Annals of Probability, 29 (2001), 577-623.
doi: 10.1214/aop/1008956685. |
[20] |
S. Hwang, S. Damelin and A. Hero Ⅲ,
Shortest path through random points, The Annals of Applied Probability, 26 (2016), 2791-2823.
doi: 10.1214/15-AAP1162. |
[21] |
M. Jacobs, E. Merkurjev and S. Esedoḡlu,
Auction dynamics: A volume constrained mbo scheme, Journal of Computational Physics, 354 (2018), 288-310.
doi: 10.1016/j.jcp.2017.10.036. |
[22] |
A. Little, M. Maggioni and J. Murphy, Path-based spectral clustering: Guarantees, robustness to outliers, and fast algorithms, arXiv preprint, arXiv: 1712.06206. Google Scholar |
[23] |
A. Moscovich, A. Jaffe and B. Nadler, Minimax-optimal semi-supervised regression on unknown manifolds, in Artificial Intelligence and Statistics, 2017,933–942. Google Scholar |
[24] |
S. Nene, S. Nayar, H. Murase et al., Columbia object image library (coil-20). Google Scholar |
[25] |
A. Ng, M. Jordan and Y. Weiss, On spectral clustering: Analysis and an algorithm, in Advances in Neural Information Processing Systems, 2002,849–856. Google Scholar |
[26] |
A. Orlitsky and Sajama, Estimating and computing density based distance metrics, in Proceedings of the 22nd International Conference on Machine Learning, ACM, 2005,760–767. Google Scholar |
[27] |
J. Tenenbaum, V. De Silva and J. Langford,
A global geometric framework for nonlinear dimensionality reduction, Science, 290 (2000), 2319-2323.
doi: 10.1126/science.290.5500.2319. |
[28] |
P. Vincent and Y. Bengio, Density-sensitive Metrics and Kernels, Snowbird Learning Workshop, 2003. Google Scholar |
[29] |
K. Yin and X.-C. Tai,
An effective region force for some variational models for learning and clustering, Journal of Scientific Computing, 74 (2018), 175-196.
doi: 10.1007/s10915-017-0429-4. |
[30] |
L. Zelnik-Manor and P. Perona, Self-tuning spectral clustering, in Advances in Neural Information Processing Systems, 2005, 1601–1608. Google Scholar |



3 Lines | |||||
3 Moons | |||||
3 Circles | |||||
- |
3 Lines | |||||
3 Moons | |||||
3 Circles | |||||
- |
3 Lines | |||||
3 Moons | |||||
3 Circles | |||||
- |
3 Lines | |||||
3 Moons | |||||
3 Circles | |||||
- |
[1] |
Mikhail Langovoy, Akhilesh Gotmare, Martin Jaggi. Unsupervised robust nonparametric learning of hidden community properties. Mathematical Foundations of Computing, 2019, 2 (2) : 127-147. doi: 10.3934/mfc.2019010 |
[2] |
Leonor Cruzeiro. The VES hypothesis and protein misfolding. Discrete & Continuous Dynamical Systems - S, 2011, 4 (5) : 1033-1046. doi: 10.3934/dcdss.2011.4.1033 |
[3] |
Yongbin Ou, Cun-Quan Zhang. A new multimembership clustering method. Journal of Industrial & Management Optimization, 2007, 3 (4) : 619-624. doi: 10.3934/jimo.2007.3.619 |
[4] |
Martin Frank, Armin Fügenschuh, Michael Herty, Lars Schewe. The coolest path problem. Networks & Heterogeneous Media, 2010, 5 (1) : 143-162. doi: 10.3934/nhm.2010.5.143 |
[5] |
José M. Arrieta, Esperanza Santamaría. Estimates on the distance of inertial manifolds. Discrete & Continuous Dynamical Systems - A, 2014, 34 (10) : 3921-3944. doi: 10.3934/dcds.2014.34.3921 |
[6] |
Liliana Trejo-Valencia, Edgardo Ugalde. Projective distance and $g$-measures. Discrete & Continuous Dynamical Systems - B, 2015, 20 (10) : 3565-3579. doi: 10.3934/dcdsb.2015.20.3565 |
[7] |
Baolan Yuan, Wanjun Zhang, Yubo Yuan. A Max-Min clustering method for $k$-means algorithm of data clustering. Journal of Industrial & Management Optimization, 2012, 8 (3) : 565-575. doi: 10.3934/jimo.2012.8.565 |
[8] |
Dominique Duncan, Thomas Strohmer. Classification of Alzheimer's disease using unsupervised diffusion component analysis. Mathematical Biosciences & Engineering, 2016, 13 (6) : 1119-1130. doi: 10.3934/mbe.2016033 |
[9] |
Maria Gabriella Mosquera, Vlado Keselj. Identifying electronic gaming machine gambling personae through unsupervised session classification. Big Data & Information Analytics, 2017, 2 (2) : 141-175. doi: 10.3934/bdia.2017015 |
[10] |
Hong-Kun Zhang. Free path of billiards with flat points. Discrete & Continuous Dynamical Systems - A, 2012, 32 (12) : 4445-4466. doi: 10.3934/dcds.2012.32.4445 |
[11] |
Matthias Gerdts, René Henrion, Dietmar Hömberg, Chantal Landry. Path planning and collision avoidance for robots. Numerical Algebra, Control & Optimization, 2012, 2 (3) : 437-463. doi: 10.3934/naco.2012.2.437 |
[12] |
Alan Beggs. Learning in monotone bayesian games. Journal of Dynamics & Games, 2015, 2 (2) : 117-140. doi: 10.3934/jdg.2015.2.117 |
[13] |
Yangyang Xu, Wotao Yin, Stanley Osher. Learning circulant sensing kernels. Inverse Problems & Imaging, 2014, 8 (3) : 901-923. doi: 10.3934/ipi.2014.8.901 |
[14] |
Nicolás M. Crisosto, Christopher M. Kribs-Zaleta, Carlos Castillo-Chávez, Stephen Wirkus. Community resilience in collaborative learning. Discrete & Continuous Dynamical Systems - B, 2010, 14 (1) : 17-40. doi: 10.3934/dcdsb.2010.14.17 |
[15] |
Mauro Maggioni, James M. Murphy. Learning by active nonlinear diffusion. Foundations of Data Science, 2019, 1 (3) : 271-291. doi: 10.3934/fods.2019012 |
[16] |
E. Camouzis, H. Kollias, I. Leventides. Stable manifold market sequences. Journal of Dynamics & Games, 2018, 5 (2) : 165-185. doi: 10.3934/jdg.2018010 |
[17] |
Camillo De Lellis, Emanuele Spadaro. Center manifold: A case study. Discrete & Continuous Dynamical Systems - A, 2011, 31 (4) : 1249-1272. doi: 10.3934/dcds.2011.31.1249 |
[18] |
Zhiguo Feng, Ka-Fai Cedric Yiu. Manifold relaxations for integer programming. Journal of Industrial & Management Optimization, 2014, 10 (2) : 557-566. doi: 10.3934/jimo.2014.10.557 |
[19] |
Changbing Hu, Roger Temam, Mohammed Ziane. The primitive equations on the large scale ocean under the small depth hypothesis. Discrete & Continuous Dynamical Systems - A, 2003, 9 (1) : 97-131. doi: 10.3934/dcds.2003.9.97 |
[20] |
Shinsuke Koyama, Lubomir Kostal. The effect of interspike interval statistics on the information gain under the rate coding hypothesis. Mathematical Biosciences & Engineering, 2014, 11 (1) : 63-80. doi: 10.3934/mbe.2014.11.63 |
Impact Factor:
Tools
Metrics
Other articles
by authors
[Back to Top]