# American Institute of Mathematical Sciences

September  2019, 1(3): 307-327. doi: 10.3934/fods.2019014

## Power weighted shortest paths for clustering Euclidean data

 1 Department of Mathematics, University of California, Los Angeles, Los Angeles CA 90095, USA 2 Department of Mathematics, University of Michigan, Ann Arbor MI 48109, USA

* Corresponding author: mckenzie@math.ucla.edu

Published  September 2019

Fund Project: The first author gratefully acknowledges the support of the Department of Mathematics, The University of Georgia, where the first author was a graduate student while this work was completed. The second author thanks the Department of Mathematics, The University of Michigan for their support. Both authors thank the anonymous reviewer for many useful suggestions.

We study the use of power weighted shortest path metrics for clustering high dimensional Euclidean data, under the assumption that the data is drawn from a collection of disjoint low dimensional manifolds. We argue, theoretically and experimentally, that this leads to higher clustering accuracy. We also present a fast algorithm for computing these distances.

Citation: Daniel Mckenzie, Steven Damelin. Power weighted shortest paths for clustering Euclidean data. Foundations of Data Science, 2019, 1 (3) : 307-327. doi: 10.3934/fods.2019014
##### References:

show all references

##### References:
Three sample geodesics in the power weighted shortest path metric with $p=2$, for the data set "Three Lines" (see §6). Observe how the geodesics consist of many small hops, instead of several large hops. The total lengths of the red and green paths are significantly smaller than the length of the blue path
All three synthetic data sets, projected into $\mathbb{R}^{2}$. From left to right: Three Lines, Three Moons and Three Circles
Varying $p$ and recording the accuracy of spectral clustering on the Three Lines data set, for three different values of the ambient dimension
Classification accuracy of spectral clustering. Note that $A^{(1)}$ represents using the Euclidean metric
 $A^{(f, 1)}$ $A^{(1)}$ $A^{(2)}$ $A^{(10)}$ $A^{(\infty)}$ 3 Lines $66.11\pm 0.94\%$ $66.35 \pm 3.73\%$ $66.87 \pm 3.37\%$ $95.38\pm 9.22\%$ $\bf{95.38 \pm 9.1\%}$ 3 Moons $85.90 \pm 1.13\%$ $94.40 \pm 1.48\%$ $94.40 \pm 1.48\%$ $\bf{96.20 \pm 1.76\%}$ $94.35 \pm 3.34\%$ 3 Circles $51.87 \pm 0.00\%$ $51.93 \pm 0.32\%$ $51.94 \pm 0.36\%$ $71.22 \pm 9.50\%$ $\bf{73.61 \pm 10.47\%}$ $\mathtt{ DrivFace}$ $78.88\%$ $71.62\%$ $71.62\%$ $74.71\%$ $\bf{85.38\%}$ $\mathtt{ COIL-20}$ $63.24\%$ $75.28\%$ $\bf{78.61\%}$ $77.45\%$ $60.92\%$ $\mathtt{ OptDigits}$ $77.73\%$ $91.49\%$ $\bf{91.54\%}$ $88.39\%$ $83.17\%$ $\mathtt{ USPS}$ $48.65\%$ $65.05\%$ $65.02\%$ $76.20\%$ $\bf{77.92\%}$ $\mathtt{ MNIST}$ - $76.11\%$ $75.63\%$ $84.54\%$ $\bf{86.77\%}$
 $A^{(f, 1)}$ $A^{(1)}$ $A^{(2)}$ $A^{(10)}$ $A^{(\infty)}$ 3 Lines $66.11\pm 0.94\%$ $66.35 \pm 3.73\%$ $66.87 \pm 3.37\%$ $95.38\pm 9.22\%$ $\bf{95.38 \pm 9.1\%}$ 3 Moons $85.90 \pm 1.13\%$ $94.40 \pm 1.48\%$ $94.40 \pm 1.48\%$ $\bf{96.20 \pm 1.76\%}$ $94.35 \pm 3.34\%$ 3 Circles $51.87 \pm 0.00\%$ $51.93 \pm 0.32\%$ $51.94 \pm 0.36\%$ $71.22 \pm 9.50\%$ $\bf{73.61 \pm 10.47\%}$ $\mathtt{ DrivFace}$ $78.88\%$ $71.62\%$ $71.62\%$ $74.71\%$ $\bf{85.38\%}$ $\mathtt{ COIL-20}$ $63.24\%$ $75.28\%$ $\bf{78.61\%}$ $77.45\%$ $60.92\%$ $\mathtt{ OptDigits}$ $77.73\%$ $91.49\%$ $\bf{91.54\%}$ $88.39\%$ $83.17\%$ $\mathtt{ USPS}$ $48.65\%$ $65.05\%$ $65.02\%$ $76.20\%$ $\bf{77.92\%}$ $\mathtt{ MNIST}$ - $76.11\%$ $75.63\%$ $84.54\%$ $\bf{86.77\%}$
Run time of spectral clustering, in seconds. Note that this includes the time rquired to construct the similarity matrix. $A^{(1)}$ represents using the Euclidean metric
 $A^{(f, 1)}$ $A^{(1)}$ $A^{(2)}$ $A^{(10)}$ $A^{(\infty)}$ 3 Lines $0.32$ $0.16$ $1.20$ $1.22$ $1.22$ 3 Moons $0.33$ $0.17$ $1.31$ $1.30$ $1.36$ 3 Circles $0.35$ $0.16$ $1.00$ $1.06$ $1.07$ $\mathtt{ DrivFace}$ $0.37$ $1.24$ $1.55$ $1.64$ $1.64$ $\mathtt{ COIL-20}$ $0.57$ $0.72$ $1.57$ $1.82$ $1.78$ $\mathtt{ OptDigits}$ $5.40$ $1.41$ $5.28$ $5.58$ $5.67$ $\mathtt{ USPS}$ $27.40$ $17.12$ $26.75$ $22.78$ $23.79$ $\mathtt{ MNIST}$ - $2060.23$ $2031.38$ $1554.15$ $1613.41$
 $A^{(f, 1)}$ $A^{(1)}$ $A^{(2)}$ $A^{(10)}$ $A^{(\infty)}$ 3 Lines $0.32$ $0.16$ $1.20$ $1.22$ $1.22$ 3 Moons $0.33$ $0.17$ $1.31$ $1.30$ $1.36$ 3 Circles $0.35$ $0.16$ $1.00$ $1.06$ $1.07$ $\mathtt{ DrivFace}$ $0.37$ $1.24$ $1.55$ $1.64$ $1.64$ $\mathtt{ COIL-20}$ $0.57$ $0.72$ $1.57$ $1.82$ $1.78$ $\mathtt{ OptDigits}$ $5.40$ $1.41$ $5.28$ $5.58$ $5.67$ $\mathtt{ USPS}$ $27.40$ $17.12$ $26.75$ $22.78$ $23.79$ $\mathtt{ MNIST}$ - $2060.23$ $2031.38$ $1554.15$ $1613.41$
 [1] Mikhail Langovoy, Akhilesh Gotmare, Martin Jaggi. Unsupervised robust nonparametric learning of hidden community properties. Mathematical Foundations of Computing, 2019, 2 (2) : 127-147. doi: 10.3934/mfc.2019010 [2] Mohsen Abdolhosseinzadeh, Mir Mohammad Alipour. Design of experiment for tuning parameters of an ant colony optimization method for the constrained shortest Hamiltonian path problem in the grid networks. Numerical Algebra, Control & Optimization, 2020  doi: 10.3934/naco.2020028 [3] Dmitry Pozharskiy, Noah J. Wichrowski, Andrew B. Duncan, Grigorios A. Pavliotis, Ioannis G. Kevrekidis. Manifold learning for accelerating coarse-grained optimization. Journal of Computational Dynamics, 2020, 7 (2) : 511-536. doi: 10.3934/jcd.2020021 [4] Leonor Cruzeiro. The VES hypothesis and protein misfolding. Discrete & Continuous Dynamical Systems - S, 2011, 4 (5) : 1033-1046. doi: 10.3934/dcdss.2011.4.1033 [5] Yongbin Ou, Cun-Quan Zhang. A new multimembership clustering method. Journal of Industrial & Management Optimization, 2007, 3 (4) : 619-624. doi: 10.3934/jimo.2007.3.619 [6] Martin Frank, Armin Fügenschuh, Michael Herty, Lars Schewe. The coolest path problem. Networks & Heterogeneous Media, 2010, 5 (1) : 143-162. doi: 10.3934/nhm.2010.5.143 [7] José M. Arrieta, Esperanza Santamaría. Estimates on the distance of inertial manifolds. Discrete & Continuous Dynamical Systems - A, 2014, 34 (10) : 3921-3944. doi: 10.3934/dcds.2014.34.3921 [8] Liliana Trejo-Valencia, Edgardo Ugalde. Projective distance and $g$-measures. Discrete & Continuous Dynamical Systems - B, 2015, 20 (10) : 3565-3579. doi: 10.3934/dcdsb.2015.20.3565 [9] Dominique Duncan, Thomas Strohmer. Classification of Alzheimer's disease using unsupervised diffusion component analysis. Mathematical Biosciences & Engineering, 2016, 13 (6) : 1119-1130. doi: 10.3934/mbe.2016033 [10] Maria Gabriella Mosquera, Vlado Keselj. Identifying electronic gaming machine gambling personae through unsupervised session classification. Big Data & Information Analytics, 2017, 2 (2) : 141-175. doi: 10.3934/bdia.2017015 [11] Baolan Yuan, Wanjun Zhang, Yubo Yuan. A Max-Min clustering method for $k$-means algorithm of data clustering. Journal of Industrial & Management Optimization, 2012, 8 (3) : 565-575. doi: 10.3934/jimo.2012.8.565 [12] Hong-Kun Zhang. Free path of billiards with flat points. Discrete & Continuous Dynamical Systems - A, 2012, 32 (12) : 4445-4466. doi: 10.3934/dcds.2012.32.4445 [13] Matthias Gerdts, René Henrion, Dietmar Hömberg, Chantal Landry. Path planning and collision avoidance for robots. Numerical Algebra, Control & Optimization, 2012, 2 (3) : 437-463. doi: 10.3934/naco.2012.2.437 [14] Changbing Hu, Roger Temam, Mohammed Ziane. The primitive equations on the large scale ocean under the small depth hypothesis. Discrete & Continuous Dynamical Systems - A, 2003, 9 (1) : 97-131. doi: 10.3934/dcds.2003.9.97 [15] Shinsuke Koyama, Lubomir Kostal. The effect of interspike interval statistics on the information gain under the rate coding hypothesis. Mathematical Biosciences & Engineering, 2014, 11 (1) : 63-80. doi: 10.3934/mbe.2014.11.63 [16] Christian Bonatti, Sylvain Crovisier, Katsutoshi Shinohara. The $C^{1+\alpha }$ hypothesis in Pesin Theory revisited. Journal of Modern Dynamics, 2013, 7 (4) : 605-618. doi: 10.3934/jmd.2013.7.605 [17] J.-M. Deshouillers, G. Effinger, H. te Riele and D. Zinoviev. A complete Vinogradov 3-primes theorem under the Riemann hypothesis. Electronic Research Announcements, 1997, 3: 99-104. [18] E. Camouzis, H. Kollias, I. Leventides. Stable manifold market sequences. Journal of Dynamics & Games, 2018, 5 (2) : 165-185. doi: 10.3934/jdg.2018010 [19] Camillo De Lellis, Emanuele Spadaro. Center manifold: A case study. Discrete & Continuous Dynamical Systems - A, 2011, 31 (4) : 1249-1272. doi: 10.3934/dcds.2011.31.1249 [20] Zhiguo Feng, Ka-Fai Cedric Yiu. Manifold relaxations for integer programming. Journal of Industrial & Management Optimization, 2014, 10 (2) : 557-566. doi: 10.3934/jimo.2014.10.557

Impact Factor: