# American Institute of Mathematical Sciences

September  2019, 1(3): 307-327. doi: 10.3934/fods.2019014

## Power weighted shortest paths for clustering Euclidean data

 1 Department of Mathematics, University of California, Los Angeles, Los Angeles CA 90095, USA 2 Department of Mathematics, University of Michigan, Ann Arbor MI 48109, USA

* Corresponding author: mckenzie@math.ucla.edu

Published  September 2019

Fund Project: The first author gratefully acknowledges the support of the Department of Mathematics, The University of Georgia, where the first author was a graduate student while this work was completed. The second author thanks the Department of Mathematics, The University of Michigan for their support. Both authors thank the anonymous reviewer for many useful suggestions.

We study the use of power weighted shortest path metrics for clustering high dimensional Euclidean data, under the assumption that the data is drawn from a collection of disjoint low dimensional manifolds. We argue, theoretically and experimentally, that this leads to higher clustering accuracy. We also present a fast algorithm for computing these distances.

Citation: Daniel Mckenzie, Steven Damelin. Power weighted shortest paths for clustering Euclidean data. Foundations of Data Science, 2019, 1 (3) : 307-327. doi: 10.3934/fods.2019014
##### References:

show all references

##### References:
Three sample geodesics in the power weighted shortest path metric with $p=2$, for the data set "Three Lines" (see §6). Observe how the geodesics consist of many small hops, instead of several large hops. The total lengths of the red and green paths are significantly smaller than the length of the blue path
All three synthetic data sets, projected into $\mathbb{R}^{2}$. From left to right: Three Lines, Three Moons and Three Circles
Varying $p$ and recording the accuracy of spectral clustering on the Three Lines data set, for three different values of the ambient dimension
Classification accuracy of spectral clustering. Note that $A^{(1)}$ represents using the Euclidean metric
 $A^{(f, 1)}$ $A^{(1)}$ $A^{(2)}$ $A^{(10)}$ $A^{(\infty)}$ 3 Lines $66.11\pm 0.94\%$ $66.35 \pm 3.73\%$ $66.87 \pm 3.37\%$ $95.38\pm 9.22\%$ $\bf{95.38 \pm 9.1\%}$ 3 Moons $85.90 \pm 1.13\%$ $94.40 \pm 1.48\%$ $94.40 \pm 1.48\%$ $\bf{96.20 \pm 1.76\%}$ $94.35 \pm 3.34\%$ 3 Circles $51.87 \pm 0.00\%$ $51.93 \pm 0.32\%$ $51.94 \pm 0.36\%$ $71.22 \pm 9.50\%$ $\bf{73.61 \pm 10.47\%}$ $\mathtt{ DrivFace}$ $78.88\%$ $71.62\%$ $71.62\%$ $74.71\%$ $\bf{85.38\%}$ $\mathtt{ COIL-20}$ $63.24\%$ $75.28\%$ $\bf{78.61\%}$ $77.45\%$ $60.92\%$ $\mathtt{ OptDigits}$ $77.73\%$ $91.49\%$ $\bf{91.54\%}$ $88.39\%$ $83.17\%$ $\mathtt{ USPS}$ $48.65\%$ $65.05\%$ $65.02\%$ $76.20\%$ $\bf{77.92\%}$ $\mathtt{ MNIST}$ - $76.11\%$ $75.63\%$ $84.54\%$ $\bf{86.77\%}$
 $A^{(f, 1)}$ $A^{(1)}$ $A^{(2)}$ $A^{(10)}$ $A^{(\infty)}$ 3 Lines $66.11\pm 0.94\%$ $66.35 \pm 3.73\%$ $66.87 \pm 3.37\%$ $95.38\pm 9.22\%$ $\bf{95.38 \pm 9.1\%}$ 3 Moons $85.90 \pm 1.13\%$ $94.40 \pm 1.48\%$ $94.40 \pm 1.48\%$ $\bf{96.20 \pm 1.76\%}$ $94.35 \pm 3.34\%$ 3 Circles $51.87 \pm 0.00\%$ $51.93 \pm 0.32\%$ $51.94 \pm 0.36\%$ $71.22 \pm 9.50\%$ $\bf{73.61 \pm 10.47\%}$ $\mathtt{ DrivFace}$ $78.88\%$ $71.62\%$ $71.62\%$ $74.71\%$ $\bf{85.38\%}$ $\mathtt{ COIL-20}$ $63.24\%$ $75.28\%$ $\bf{78.61\%}$ $77.45\%$ $60.92\%$ $\mathtt{ OptDigits}$ $77.73\%$ $91.49\%$ $\bf{91.54\%}$ $88.39\%$ $83.17\%$ $\mathtt{ USPS}$ $48.65\%$ $65.05\%$ $65.02\%$ $76.20\%$ $\bf{77.92\%}$ $\mathtt{ MNIST}$ - $76.11\%$ $75.63\%$ $84.54\%$ $\bf{86.77\%}$
Run time of spectral clustering, in seconds. Note that this includes the time rquired to construct the similarity matrix. $A^{(1)}$ represents using the Euclidean metric
 $A^{(f, 1)}$ $A^{(1)}$ $A^{(2)}$ $A^{(10)}$ $A^{(\infty)}$ 3 Lines $0.32$ $0.16$ $1.20$ $1.22$ $1.22$ 3 Moons $0.33$ $0.17$ $1.31$ $1.30$ $1.36$ 3 Circles $0.35$ $0.16$ $1.00$ $1.06$ $1.07$ $\mathtt{ DrivFace}$ $0.37$ $1.24$ $1.55$ $1.64$ $1.64$ $\mathtt{ COIL-20}$ $0.57$ $0.72$ $1.57$ $1.82$ $1.78$ $\mathtt{ OptDigits}$ $5.40$ $1.41$ $5.28$ $5.58$ $5.67$ $\mathtt{ USPS}$ $27.40$ $17.12$ $26.75$ $22.78$ $23.79$ $\mathtt{ MNIST}$ - $2060.23$ $2031.38$ $1554.15$ $1613.41$
 $A^{(f, 1)}$ $A^{(1)}$ $A^{(2)}$ $A^{(10)}$ $A^{(\infty)}$ 3 Lines $0.32$ $0.16$ $1.20$ $1.22$ $1.22$ 3 Moons $0.33$ $0.17$ $1.31$ $1.30$ $1.36$ 3 Circles $0.35$ $0.16$ $1.00$ $1.06$ $1.07$ $\mathtt{ DrivFace}$ $0.37$ $1.24$ $1.55$ $1.64$ $1.64$ $\mathtt{ COIL-20}$ $0.57$ $0.72$ $1.57$ $1.82$ $1.78$ $\mathtt{ OptDigits}$ $5.40$ $1.41$ $5.28$ $5.58$ $5.67$ $\mathtt{ USPS}$ $27.40$ $17.12$ $26.75$ $22.78$ $23.79$ $\mathtt{ MNIST}$ - $2060.23$ $2031.38$ $1554.15$ $1613.41$
 [1] Mohsen Abdolhosseinzadeh, Mir Mohammad Alipour. Design of experiment for tuning parameters of an ant colony optimization method for the constrained shortest Hamiltonian path problem in the grid networks. Numerical Algebra, Control & Optimization, 2021, 11 (2) : 321-332. doi: 10.3934/naco.2020028 [2] Antonio Rieser. A topological approach to spectral clustering. Foundations of Data Science, 2021, 3 (1) : 49-66. doi: 10.3934/fods.2021005 [3] M. Phani Sudheer, Ravi S. Nanjundiah, A. S. Vasudeva Murthy. Revisiting the slow manifold of the Lorenz-Krishnamurthy quintet. Discrete & Continuous Dynamical Systems - B, 2006, 6 (6) : 1403-1416. doi: 10.3934/dcdsb.2006.6.1403 [4] Ana Rita Nogueira, João Gama, Carlos Abreu Ferreira. Causal discovery in machine learning: Theories and applications. Journal of Dynamics & Games, 2021  doi: 10.3934/jdg.2021008 [5] Yi Gao, Rui Li, Yingjing Shi, Li Xiao. Design of path planning and tracking control of quadrotor. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021063 [6] Mario Bukal. Well-posedness and convergence of a numerical scheme for the corrected Derrida-Lebowitz-Speer-Spohn equation using the Hellinger distance. Discrete & Continuous Dynamical Systems, 2021, 41 (7) : 3389-3414. doi: 10.3934/dcds.2021001 [7] Shan-Shan Lin. Due-window assignment scheduling with learning and deterioration effects. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021081 [8] Sheng-I Chen, Yen-Che Tseng. A partitioning column approach for solving LED sorter manipulator path planning problems. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021055 [9] Yongkun Wang, Fengshou He, Xiaobo Deng. Multi-aircraft cooperative path planning for maneuvering target detection. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021050 [10] Marian Gidea, Rafael de la Llave, Tere M. Seara. A general mechanism of instability in Hamiltonian systems: Skipping along a normally hyperbolic invariant manifold. Discrete & Continuous Dynamical Systems, 2020, 40 (12) : 6795-6813. doi: 10.3934/dcds.2020166 [11] Dingheng Pi. Periodic orbits for double regularization of piecewise smooth systems with a switching manifold of codimension two. Discrete & Continuous Dynamical Systems - B, 2021  doi: 10.3934/dcdsb.2021080 [12] Wei Wang, Yang Shen, Linyi Qian, Zhixin Yang. Hedging strategy for unit-linked life insurance contracts with self-exciting jump clustering. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021072 [13] Mehmet Duran Toksari, Emel Kizilkaya Aydogan, Berrin Atalay, Saziye Sari. Some scheduling problems with sum of logarithm processing times based learning effect and exponential past sequence dependent delivery times. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021044 [14] Hao Li, Honglin Chen, Matt Haberland, Andrea L. Bertozzi, P. Jeffrey Brantingham. PDEs on graphs for semi-supervised learning applied to first-person activity recognition in body-worn video. Discrete & Continuous Dynamical Systems, 2021  doi: 10.3934/dcds.2021039 [15] Shanjian Tang, Fu Zhang. Path-dependent optimal stochastic control and viscosity solution of associated Bellman equations. Discrete & Continuous Dynamical Systems, 2015, 35 (11) : 5521-5553. doi: 10.3934/dcds.2015.35.5521

Impact Factor: