
# Generalized penalty for circular coordinate representation

* Corresponding author: Hengrui Luo

Topological Data Analysis (TDA) provides novel approaches that allow us to analyze the geometrical shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which allows us to perform dimension reduction and visualization for high-dimensional datasets on a torus using persistent cohomology. In this paper, we propose a method to adapt the circular coordinate framework to take into account the roughness of circular coordinates in change-point and high-dimensional applications. To do that, we use a generalized penalty function instead of an $L_{2}$ penalty in the traditional circular coordinate algorithm. We provide simulation experiments and real data analyses to support our claim that circular coordinates with generalized penalty will detect the change in high-dimensional datasets under different sampling schemes while preserving the topological structures.
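As a rough illustration of the penalty swap described above, the sketch below evaluates an interpolated penalty of the form $(1-\lambda)\|v\|_{1}+\lambda\|v\|_{2}$ on a vector of edge values. This form is an assumption inferred from the figure captions, where $\lambda = 0$ is labelled $L_{1}$ and $\lambda = 1$ is labelled $L_{2}$; the paper's exact definition may differ (for example, by squaring the $L_{2}$ term). The name `generalized_penalty` and the sample vector are hypothetical.

```python
import numpy as np


def generalized_penalty(v, lam):
    """Hypothetical elastic-style penalty: (1 - lam) * ||v||_1 + lam * ||v||_2.

    Assumed form only; not taken verbatim from the paper.
    """
    v = np.asarray(v, dtype=float)
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    l1 = np.sum(np.abs(v))        # L1 norm: sum of absolute values
    l2 = np.sqrt(np.sum(v ** 2))  # L2 norm: Euclidean length
    return (1.0 - lam) * l1 + lam * l2


# Illustrative edge values (made up for this sketch).
edge_values = np.array([3.0, 0.0, -4.0])
for lam in (0.0, 0.5, 1.0):
    print(lam, generalized_penalty(edge_values, lam))
```

Under this assumed form, moving $\lambda$ from 1 towards 0 puts more weight on the $L_{1}$ term, which tends to favour sparse solutions when the penalty is minimized.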

Mathematics Subject Classification: 55N31, 62R40, 68T09.

Figure 1.1.  Example in Section 2.3 of  with four points $a = (-1,0.5),b = (1,0.5),c = (1,-0.5),d = (-1,-0.5)$

Figure 1.2.  The scatter plot, barcode, coordinate plot, and the colormap for the dataset $X\subset\mathbb{R}^{2}$, which is a dataset of $50$ points equidistantly sampled on a figure-$8$ shape

Figure 2.1.  The dataset $X\subset\mathbb{R}^{3}$, which is a dataset of $150$ samples on a figure-$8$ shape $S^{1}\times\{0\}\bigcup\{0\}\times(S^{1}(-1,-1))$, where $S^{1}(-1,-1)$ denotes a unit circle centered at $(-1,-1)$

Figure 2.2.  The dimension reduced data $X^{cc}$ obtained from circular coordinates based on the Vietoris-Rips complex constructed from $X$

Figure 2.3.  The PCA representation $X^{pca}$ from choosing $2$ principal components

Figure 3.1.  Example 1: The $L_{2}$ smoothed and generalized penalized circular coordinates of the uniformly sampled dataset ($n = 300$) from a ring of inner radius $R = 1.5$ and width $d = 1.5$. The first, second, and the third row correspond to $\lambda = 0,0.5$, and 1, respectively

Figure 3.2.  Example 1: The $L_{1}$ smoothed (first column) and $L_{2}$ smoothed (second column) circular coordinates of the uniformly sampled dataset from a ring with the same radius $R = 1.5$ but different widths $d = 1,2,7.5$, corresponding to each row. The first and second columns correspond to $\lambda = 0$ and 1, respectively

Figure 3.3.  Functional norms of varying $\lambda$ coefficient on Example 1: The $L_1$ (first row), $L_2$ (second row), and mixed norm (third row) for smoothed circular coordinates functions optimized with different choices of $\lambda$ as in (2.3). The coordinates are computed for the uniformly sampled dataset from a ring with the same radius $R = 1.5$ but different widths $d = 1,2,7.5$, as in Figure 3.2, corresponding to each column. We also use black vertical dashed lines to delineate the $\lambda = 0,0.5,1$ on the log scale

Figure 3.4.  Example 2: The $L_{2}$ smoothed and generalized penalized circular coordinates (displayed in different rows) of the uniformly sampled dataset ($n = 100$) from double rings, both with inner radius $R = 1.5$ and width $d = 0.5$. The first, second, and the third row correspond to $\lambda = 0,0.5$, and 1, respectively

Figure 3.5.  Example 3: The $L_{2}$ smoothed and generalized penalized circular coordinates (displayed in different rows) of the uniformly sampled dataset ($n = 300$) from Dupin cyclides (a.k.a. pinched torus). The first, second, and the third row correspond to $\lambda = 0,0.5$, and 1, respectively

Figure 4.1.  The $S^{1}$ representation obtained from the circular coordinate representation under different penalty functions. The first, second, and the third row correspond to $\lambda = 0$, $0.5$, and $1$, respectively

Figure 4.2.  The $L_{2}$ smoothed and generalized penalized circular coordinates (displayed in different rows) of the three collections of fan frequency dataset ($n = 175$) from  plotted against indices (equivalent to the distances of distance-bins). The first, second, and the third row correspond to $\lambda = 0$, $0.5$, and 1, respectively. The circular coordinates with generalized penalty function are much sparser compared to the coordinates associated with the $L_2$ penalty function, which means that our method captures the periodic pattern better

Figure 4.3.  The $L_{2}$ smoothed and generalized penalized (mod 1) combined circular coordinates among congressman/woman across party-lines. Each point represents a congressman/woman and the color represents party-lines. The circular coordinates are computed from congress voting datasets from years 1990, 1998, and 2006 (displayed in different rows). The first and the second column correspond to $\lambda = 0$ and $1$, respectively. We compute the cluster scores by mapping the combined circular coordinates (summed up by all 1-cocycles with persistence greater than 1) to $\mathbb{R}^2$ with the mapping $x\mapsto(\cos(2\pi x),\sin(2\pi x))$ to accommodate the circularity.
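The mapping $x\mapsto(\cos(2\pi x),\sin(2\pi x))$ used in the caption above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `embed_on_circle` and the sample coordinates are hypothetical, and the cluster-score computation itself is not reproduced here.

```python
import numpy as np


def embed_on_circle(x):
    """Map circular coordinates (taken mod 1) to points on the unit circle in R^2."""
    x = np.mod(np.atleast_1d(np.asarray(x, dtype=float)), 1.0)
    angle = 2.0 * np.pi * x
    # One row per input coordinate: (cos(2*pi*x), sin(2*pi*x)).
    return np.column_stack((np.cos(angle), np.sin(angle)))


# Illustrative coordinates (made up for this sketch).
coords = np.array([0.0, 0.25, 0.5, 1.25])
points = embed_on_circle(coords)
print(points)
```

Coordinates that differ by an integer, such as 0.25 and 1.25, land on the same point, so Euclidean distances computed in $\mathbb{R}^2$ respect the wrap-around of the circular coordinate.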

Figure B.1.  The GPCA representation $X^{gpca,2}$ and $X^{gpca,3}$ of the embeddings from the first $2$ principal components of the homogeneous polynomials of degree $2$ and $3$, respectively

Figure C.1.  Evaluation of dimension reduction results obtained from different NLDR methods with the congress voting dataset of year 1990. We display the coranking matrices of PCA and t-SNE in the first row, and the coranking matrices of UMAP and Laplacian eigenmap in the second row. We display the coranking matrices of circular coordinates with penalty functions $L_{1}$, elastic norm, and $L_{2}$ in the third row

Figure C.2.  Evaluation of dimension reduction results obtained from different NLDR methods with the congress voting dataset of year 1998. We display the coranking matrices of PCA and t-SNE in the first row, and the coranking matrices of UMAP and Laplacian eigenmap in the second row. We display the coranking matrices of circular coordinates with penalty functions $L_{1}$, elastic norm, and $L_{2}$ in the third row

Figure C.3.  Evaluation of dimension reduction results obtained from different NLDR methods with the congress voting dataset of year 2006. We display the coranking matrices of PCA and t-SNE in the first row, and the coranking matrices of UMAP and Laplacian eigenmap in the second row. We display the coranking matrices of circular coordinates with penalty functions $L_{1}$, elastic norm, and $L_{2}$ in the third row

Figure D.1.  Example 5: The $L_{2}$ smoothed and generalized penalized circular coordinates of the dataset ($n = 300$) drawn by Jacobian rejection sampling from a ring with fixed width. The first, second, and the third row correspond to $\lambda = 0$, $0.5$, and $1$, respectively

Figure D.2.  Example 6: The $L_{2}$ smoothed and generalized penalized circular coordinates of the Jacobian rejection sampled dataset ($n = 300$) from a Dupin cyclide with $r = 2$, $R = 1.5$ as in Section 3.4. The first, second, and the third row correspond to $\lambda = 0$, $0.5$, and 1, respectively

Figure E.1.  (top) Barcode for a simulated example of 150 uniformly sampled points from an annulus. (bottom) Resulting circular coordinates computed using different thresholds along the filtration for longest persisting cocycle represented as color of the points

Figure E.2.  (top) Barcode for a simulated example of 150 uniformly sampled points from an annulus. (center) Resulting circular coordinates computed using different thresholds along the filtration for longest persisting cocycle represented as color of the points. (bottom) Resulting circular coordinates plotted against the angle theta between the respective point and the $x = 0$ axis with values colored the same way as the center row

Figure E.3.  Comparison of 100 circular coordinates computed with the threshold varying between the birth and death of the longest cocycle in the example of Figure E.2. The blue bars are box plots of the circular coordinate values, shown for each point against its angle theta
