Principal symmetric space analysis

We develop a novel analogue of Euclidean PCA (principal component analysis) for data taking values on a Riemannian symmetric space, using totally geodesic submanifolds as approximating lower dimnsional submanifolds. We illustrate the technique on n-spheres, Grassmannians, n-tori and polyspheres.


Introduction
Principal Components Analysis (PCA [10]), traditionally applied for data on a Euclidean space E n , has many notable features that have made it one of the most widely used of all statistical techniques. We single out the following: 1. The approximating subspaces (affine subspaces of E n ) have zero extrinsic curvature; 2. any two affine subspaces of the same dimension are related by a Euclidean transformation; 3. the best approximations of each dimension are nested (that is, the best approximation by a k-dimensional subspace lies in the best approximation by a k + 1-dimensional subspace); and 4. the best approximations of each dimension from 0 to n − 1 can be computed easily using linear algebra.
The underlying idea of PCA has been extended to deal with data on non-Euclidean manifolds. One such method is that of Principal Geodesic Analysis (PGA [5,6,8]). For data on a Riemannian manifold M , the Karcher mean x is computed and the data pulled back to the tangent space T x M by the logarithm of the Riemannian exponential map at x (see Figure 1). PCA can now be applied to the data on this Euclidean vector space. However, this and related methods suffer from a fundamental flaw in that they fail to deal properly with the curvature of the manifold. Two geodesics with common base points and distant tangent vectors may pass close to each other or intersect (see Figure 1). In this situation, nearby data points would become far apart in their linear approximation.
In seeking a method that avoids this flaw we have focussed on property (1) above. A submanifold N of a Riemannian manifold M has zero extrinsic curvature if and only if it is totally geodesic (i.e., any geodesic of N is also a geodesic of M ). Such submanifolds provide excellent approximating spaces, being in a sense the flattest or simplest possible lowerdimensional representations of the data. One-dimensional totally geodesic submanifolds Figure 1: In Principal Geodesic Analysis, the data (here 20 points on a sphere) is pulled back to the tangent space of the Karcher mean (shown here as a disk) using geodesics. Data points near the mean are well represented, but data points far from the mean (here, near the south pole) become far apart in the linear approximation.
Generic manifolds have no totally geodesic submanifolds of dimension higher than 1, but Riemannian symmetric spaces have many. Examples of Riemannian symmetric spaces are compact Lie groups, Euclidean spaces, spheres, projective spaces, Grassmannians, and products of these; these are all important examples of nonlinear domains for data. We will see that the structure of totally geodesic submanifolds offers rich possibilities for data reduction and for the discovery of hidden structure in data sets. Totally geodesic submanifolds of Riemannian symmetric spaces are themselves Riemannian symmetric, which offers the possibility of a nested structure as in point (3) above.
Although some form of nesting is desirable, we will see that, given two best approximating totally geodesic submanifolds, one is not necessarily contained in the other. To overcome this, in this paper we define the symmetric space approximations of a dataset in a Riemannian symmetric space. This is a set whose elements are best approximating totally geodesic submanifolds. Applying this construction recursively gives the principal symmetric space approximation which is structured as a rooted tree. In this sense the nesting structure is retained, although it may be complicated in specific instances.
In Section 2 we review the relevant elements of symmetric spaces. In particular, the determination of totally geodesic submanifolds can be reduced to a purely algebraic equation in a vector space (the Lie algebra of the symmetry group of the symmetric space). Solv-ing this equation may be difficult, however; it has been solved completely only for spaces of rank 1 (such as spheres and projective spaces) and rank 2 (such as 2-Grassmannians and products of two spheres). In the remainder of the paper, therefore, we proceed by example. Section 3 considers data on the n-dimensional sphere S n . Section 4 considers data on the Grassmannian G(k, n) of k-planes in R n ; even here we need to restrict to the simple submanifolds G(k, m) of k-planes in R m . In both of these cases, we will show that the approximation problem can be linearised so that a PCA-like nested sequences of approximating submanifolds can be determined using linear algebra.
More complicated cases are handled in Section 5 on products of spheres. The two subcases that we consider are tori (S 1 ) n and polyspheres (S 2 ) n . Each of these has an infinite number of distinct types of totally geodesic submanifolds and each reveals new features of the general situation.
We now introduce the central ideas of principal symmetric space approximation. Let M = G/H be a Riemannian symmetric space. Let T G(M ) be the set of connected totally geodesic submanifolds of M . G acts on T G(M ) and partitions it into group orbits. We regard the submanifolds in each orbit as being of equivalent structure and complexity, so that if there is a unique best approximation within an orbit, we choose it; but the submanifolds from different orbits, even if of the same dimension, are different geometrically and are best regarded as representing different models.
Let the data set be X : where local minima are taken.
Thus, each element of SSA(X, M ) is a totally geodesic submanifold N of M , which best approximates the data in the sense that the approximation cannot be improved by passing to gN where g is close to the identity.
As each N ∈ SSA(X, M ) is a Riemannian symmetric space, it typically has many totally geodesic submanifolds itself. These are already contained in T G(M ). We can now calculate the symmetric space approximations of X with respect to each such N . Repeating this construction gives a tree of submanifolds. Each branch contains a nested sequence of approximations of decreasing dimensions, with each branch terminating in a submanifold of dimension 0, that is, a point.
The principal symmetric space approximation PSSA(X, M ) of X with respect to M is the rooted tree in which 1. each node is a totally geodesic submanifold of M ; 2. the root node is M ; and 3. the children of a node N are the symmetric space approximations of X with respect to N .
Examples are the unbranched tree E n ⊃ E n−1 ⊃ · · · ⊃ E 0 found in Euclidean PCA, and the 2-node tree M ⊃ {x} for any Riemannian manifold M , where x is the Karcher mean of X.

Symmetric spaces
We give a brief account of symmetric spaces relevant to the sequel. The material presented here is standard, see for instance [14, Chapter XI].

Definition 3.
A symmetric space is a triple (G, H, σ) where G is a connected Lie group, σ is an involutive automorphism of G, and H is a closed subgroup of G such that H lies between the isotropy subgroup G σ and its identity component G o σ . In particular, the manifold M = G/H is a canonically reductive homogeneous space and hence comes equipped with a canonical linear connection. Let s o be the automorphism of G/H induced by σ. For any point x = g.o where o is the origin, the mapping s x = g.s o .g −1 is independent of the choice of g. Moreover, s x is a symmetry of the canonical connection for all x, i.e. a diffeomorphism of a neighbourhood of x onto itself sending exp X → exp −X for any tangent vector X. We now present the infinitesimal picture. Definition 4. A symmetric Lie algebra is a triple (g, h, σ) where g is a Lie algebra, σ is an involutive automorphism of g, and h ⊂ g is the Lie subalgebra of elements fixed by σ.
There is a one-to-one correspondence between effective symmetric Lie algebras and almost effective (i.e., the only normal subgroups of G are discrete) symmetric spaces with G simply connected and H connected.
The involution σ induces a decomposition g = h + m of g into the ±1 eigenspaces of σ, called the canonical decomposition. The following relations the hold, which suffice to characterize symmetric Lie algebras: Examples of symmetric spaces include the oriented Grassmannian G + (k, n) of oriented kplanes in R n . The symmetric space structure is described by G + (k, n) SO(n)/(SO(k) × SO(n − k)), with automorphism σ(A) = SAS −1 , where I p is the p × p identity matrix. The case k = 1 gives the symmetric space structure of the sphere S n . The unoriented case G(k, n) O(n)/(O(k) × O(n − k)) is similar, and specialization to k = 1 then gives projective spaces. We also note that there is a natural direct product of symmetric spaces: A submanifold N ⊂ M is said to be totally geodesic if for all points x ∈ N and tangent vectors X ∈ T x (N ), the geodesic exp(tX) is contained in N for sufficiently small t. Where M is a Riemannian manifold, this is equivalent to requiring that the induced metric on N coincides with the metric on M . A Lie triple system m is a subspace of a Lie algebra for which [[m, m], m] ⊂ m. The following result underlies our interest in totally geodesic submanifolds of symmetric spaces: There is a one-to-one correspondence between complete totally geodesic submanifolds M containing the origin and Lie triple systems m ⊂ m. Moreover (G , H , σ ) is a symmetric subspace, where G is the largest connected Lie subgroup of G leaving M invariant, H = G ∩ H, and σ = σ| G .
Note that the proof constructs the symmetric subalgebra (g , h , σ ). Indeed, given such an m , we take h = [m , m ], then set g = h + m .
The problem of classifying totally geodesic submanifolds of symmetric spaces is thus reduced to an algebraic one. It remains a difficult task [2,13,19]. Moreover, there may exist complicated totally geodesic submanifolds which are of little physical relevance, so in some cases we restrict our attention to subfamilies of symmetric subspaces.
The notion of a symmetric space approximation requires a distance function on the manifold. It is most natural to specify this through a Riemannian metric. This makes most sense where our notion of totally geodesic submanifold coincides with the Riemannian geodesics, as summarized by the following definition.
Definition 5. A Riemannian symmetric space is a symmetric space for which the canonical connection coincides with the Riemannian (Levi-Civita) connection.
This implies that the symmetries s x are isometries. A symmetric space equipped with a metric is Riemannian symmetric if the metric is G-invariant. Given a symmetric space (G, H, σ) for which ad g (H) is compact, a G-invariant Riemannian metric may be constructed in a canonical manner.
All of the symmetric spaces we consider are canonically Riemannian symmetric spaces. Nonetheless, for practical purposes we will minimize distances which differ from the Riemannian distance, typically to obtain a linearization of the minimization problem. We will say that two metrics d 1 , d 2 are compatible if they agree up to first order for nearby points, that is, if d 2 (x, y) = d 1 (x, y) + O(d 1 (x, y) 2 ). In a non-Riemannian metric space, the length of a curve is defined by a Riemann sum, and thus one still has the concept of geodesic and of totally geodesic submanifolds in this case. Moreover, the geodesics and totally geodesic submanifolds of a given smooth manifold equipped with two compatible metrics coincide. Thus, although perturbing the Riemannian metric of a Riemannian symmetric space changes the specific principal symmetric space approximation corresponding to a given set of data, it does not change the system of totally geodesic submanifolds itself.

Spheres
Datasets on high-dimensional spheres arise naturally whenever we have a set of measurements in a Euclidean space for which the magnitude is irrelevant. One important instance concerns directional data, see [11] and the references therein for more examples.
The connected totally geodesic submanifolds of S n are the spheres S k , realized as the image of a standard sphere x 2 1 + · · · + x 2 k+1 = 1 in R n+1 under an element of SO(n + 1) [19, Thm 1].
We consider first the case of S 2 , represented as the set of unit vectors in R 3 . Geodesics on S 2 are precisely the great circles, which may be described as the set of points in S 2 orthogonal to a given unit vector v. We call this great circle S v := {w ∈ R 3 : w · v = 0}. In this case the Riemannian distance from a point x to S v is the angle between x and v, that is, Note that the great circle with axis v consists of the intersection of S 2 and the plane with normal vector v. More generally, the totally geodesic submanifolds of S n , viewed as submanifolds of R n+1 , are precisely the intersections S N of S n with a given linear subspace N of R n+1 . Lemma 1. Let N be a subspace of R n+1 and let {v 1 , . . . , v m } be an orthogonal basis for N ⊥ . Then the distance between x ∈ S n and S N : Proof. The angle θ (= d(x, S N )) between x and N , and the angleθ between x and N ⊥ , are complementary angles. Likewise, the angle θ between x and S N and the angleθ between x and S n ∩ N ⊥ are complementary angles. Letx be the orthogonal projection of x to N ⊥ , that is, We now make the obvious linearization of this distance so that best approximations may be determined using linear algebra. We call the projection distance d p (x, v) between x and v ∈ S n the shortest Euclidean distance from x to span(v) in R n+1 . (Equivalently, from v to span(x).) Lemma 2. Let N be a subspace of R n+1 and let {v 1 , . . . , v m } be an orthogonal basis for N ⊥ . Then the projection distance of between x ∈ S n and S N is Proof. Let θ be the angle between x and v ∈ S n , that is, cos θ = x·v. Then d p (x, v) = sin θ. The same construction as in Lemma 1, except measuring distances as sin θ instead of θ, gives the result.
Note that the projection distance between two points is nonlinear; its use is favoured here because it becomes linear when calculating distances to subspheres S N . The projection distance is compatible with the Riemannian distance. Proposition 1. Let X be the matrix whose columns consists of the data points x i ∈ S n , where S n is identified with the unit sphere in R n+1 . Then for any m with 0 < m < n, the best approximating (n − m)-sphere in the projection distance is given by S N , where N is the the span of the singular vectors corresponding to the smallest m singular values of X T .
We seek to minimize d p (X, S N ) subject to the constraint that V is orthogonal. Introducing a Lagrange multiplier Λ ∈ R m×m for the constraint, where Λ T = Λ, we need to make At any solution to these equations, the objective function is d p (X, . Given any solution (V, Λ) to these equations, orthogonally diagonalize Λ = ZΩZ T where Z T Z = I and Ω is diagonal. Then (W Z, Ω) is also a solution. The value of the objective function, tr Λ = tr Ω, is the same for both solutions. Therefore, we can take Λ to be diagonal. Therefore, the stationary points are those for which the columns of V are eigenvectors of XX T (that is, singular vectors of X T ) and the diagonal entries of Λ are the associated eigenvalues of XX T (that is, squares of the singular values of X T ). The minimum value of the objective function is obtained by taking the m smallest singular values.
Note that, in the sense of Euclidean PCA, if we regard the data as a set of points in R n+1 , the best approximating (n + 1 − m)-subspace is just the span of the singular vectors associated with the n+1−m largest singular values of X T . That subspace is the orthogonal complement of the span of the singular vectors associated with the m smallest singular values, found in the proposition. Thus in this case, the two approximations coincide (after intersecting with S n ).
The linearization of the distance function, considered here, reduces the calculation to linear algebra, produces unique best approximations, and also provides the nesting property shared by Euclidean PCA: the best S p lies inside the best S m for p < m: Corollary 1. The principal symmetric space approximation of X with respect to S n is the unbranched tree S n ⊃ S n−1 ⊃ · · · ⊃ S 1 , where each S m is as determined in Proposition 1.
As stated, we have restricted the dimension of the subspheres in Proposition 1 and Corollary 1 to be positive. If they are applied with m = n to yield 0-dimensional approximations, they yield the pair of antipodal points that best approximates the data in the projective metric, because S N is then disconnected. Depending on the application, this may be what is wanted. Even if the best single point is wanted, the best such S N may still be a usefully good approximation if the data is, in fact, strongly clustered around a single point. If the data is not strongly clustered, and the best single point is wanted, then it may be necessary to switch to another metric (e.g. the Riemannian metric) and calculate the best point within each S m in Cor. 1, creating a branched tree of approximations. Example 1. We present a sequence of 3 examples of synthetic datasets on S 3 . Each contains 20 data points. In the first dataset, each x i is the projection of a point in N (0, diag(1, 1, 0.1, 0.05)) to S 3 . The data lies close to the 2-sphere x 4 = 0 and even closer to the great circle x 3 = x 4 = 0. The singular values of X T were found to be (0.3486, 0.4095, 3.0571, 3.2195). Thus the error of the best S 2 approximation to the data is 0.3486, and the error of the best S 1 approximation (shown in Fig. 2, left) is 0.5378. The error of the best S 0 approximation is 3.1040 and is clearly found to be not relevant. Likewise, the Karcher mean is not relevant for this dataset.
In the second dataset, each x i is the projection of a point in N (0, diag(1, 0.3, 0.1, 0.05)) to S 3 . Thus the data are more strongly clustered around the 0-sphere x 1 = ±1, x 2 = x 3 = Figure 2: Results for datasets 1-3 of Example 1. In each case, the best subspheres that approximate a set of 20 points on S 3 is shown. Data points further from the best S 2 are shown smaller. The best S 1 is shown in blue, lying on the best S 2 in teal. In datasets 2 and 3, the axis of the best S 0 (which consists of two antipodal points) is shown in black. In dataset 3, this also coincides with a standard mean of the data.

Grassmannians
The Grassmannian G(k, n) of k-dimensional subspaces (or k-planes) of R n is a symmetric space (see Sec. 2). Data comprising subspaces may arise if we wish to track the eigenspace decomposition of symmetric matrices such as diffusion tensors, or if we collect a sequence of low-dimensional approximating subspaces to Euclidean data using Euclidean PCA as some parameter (e.g. time) evolves. Related applications occur in computer vision and signal processing [18].
The classification of geodesic submanifolds of Grassmannians is surprisingly complicated [13]. Here we restrict our attention to a specific type of geodesic submanifold, namely the space of k-planes orthogonal to a given subspace W of R n . Lemma 3. The space of k-planes in R n orthogonal to a given (n−m)-dimensional subspace W of R n is a totally geodesic submanifold of G(k, n) and is diffeomorphic to G(k, m).

Proof. The symmetric algebra has canonical decomposition
A geodesic connecting two points on a Grassmannian may be characterized as a linear interpolation of each principal angle.
Fixing an orthogonal basis of W, extending this to an orthogonal basis of R n , and expressing subspaces orthogonal to W in terms of this basis gives the required diffeomorphism.
Let the columns of the matrix W ∈ R n×(n−m) be an orthonormal basis for the subspace W. Let X, Y ∈ R n×k be orthonormal bases for two elements X, Y of G(k, n). The relationship between X and Y is measured by their principal angles θ = (θ 1 , . . . , θ k ), defined by cos θ k = σ k (X T Y ). The geodesic distance between X and Y in the Riemannian symmetric space G(k, n) is θ 2 . Another popular measure of distance between subspaces is max k θ k [9, p. 584]. However, like Conway et al. [3], we find that it is far easier and more natural to use the "chordal distance" sin θ 2 (so named because when equipped with this metric, G(k, n) isometrically embeds in a Euclidean sphere). The chordal and geodesic metrics are compatible and thus have the same totally geodesic subspaces.
In the present context, the advantage of the chordal distance is that it linearizes the calculation of the distance from a k-plane to a totally geodesic submanifold.
Lemma 4. The chordal distance between two subspaces X, Y ∈ G(k, n) is given by An immediate consequence is the following Proposition 2. The chordal distance from a subspace X ∈ G(k, n) to the set G(k, m) of k-planes orthogonal to W ∈ R n,n−m is X T W 2 F .
Proof. We consider the cases k = m and k < m separately. If k = m then G(k, m) is a single point, Y ⊥ = W , and we are done. If k < m, then an orthogonal basis for the orthogonal complement Y ⊥ of any Y orthogonal to W may be written as [W |U ] for some U ∈ R n,m−k that satisfies U T W = 0. We have This is to be minimized over all choices of orthogonal U that are also orthogonal to W . Any U that is orthogonal to both W and X achieves X T U F = 0, giving the result. As we have we can choose such a U .
As in the case of spheres, the best approximating Grassmannians can now be read off from the SVD of a matrix representing from the dataset. Proposition 3. Let X 1 , . . . , X d be a set of d k-planes with orthogonal bases X 1 , . . . , X d . Then the matrix W minimizing the sum of squared chordal distances of the X i to W is precisely the matrix of singular vectors of X T corresponding to its p smallest singular values, where X = [X 1 , . . . , X d ] ∈ R n×kd is the matrix obtained by concatenating the X i s. The chordal distance of the X i to W is the 2-norm of the p smallest singular values of X T . The principal symmetric space approximations are nested, in that the best G(k, p) lies in the best G(k, q) for p ≤ q.
Proof. The sum of the squared chordal distances is This expression is formally identical to that studied in Proposition 1, hence the result follows as in Proposition 1.

Products of spheres
Given two symmetric spaces (G, H, σ) and (G , H , σ ), the direct product (G × G , H × H , σ × σ ) is also a symmetric space [14, p. 228]. The simplest case to consider is that of products of spheres, S a 1 × S a 2 × . . . . Here we focus on products of n circles giving the n-torus (Sec. 5.1) and products of n 2-spheres (Sec. 5.2). In the previous examples, of spheres and Grassmannians, the action of the symmetry group was transitive. Our task was limited to selecting, from the single group orbit available, the best point (or points). On products of spheres, the action of the symmetry group is not transitive; there are many (even infinitely many) distinct orbits. In fitting models with both continuous and discrete parameters, one common approach is to consider each value of the discrete parameters as specifying a different model; which value is chosen then corresponds to a model selection problem. This is the approach adopted here. We note that in the Bayesian paradigm model selection arises naturally through the choice of prior; however we will not pursue this further here.
Two examples illustrate the complexity of the situation. First, consider data consisting of n angles, i.e. x ∈ T n . Totally geodesic submanifolds are subtori described by resonance relations of the form a · x = c, a ∈ Z n , c ∈ R. Each fixed a specifies a different resonance relation, while the continuous parameter c selects the best model for a given discrete parameter a.
Second, consider data consisting of n points on S 2 , i.e. spherical polygons. Totally geodesic manifolds are products of copies of S 1 and S 2 . An example is x 1 lying on a great circle; x 2 lying on a second great circle, and obeying a resonance relation with x 1 ; x 3 arbitrary; and x 4 , . . . , x n being rotations of a fixed spherical polygon.

Classification of totally geodesic submanifolds of tori
The product (S 1 ) n is a symmetric space that we identify with the flat torus T n := (R/Z) n with standard coordinates x ∈ [0, 1) n . The connected totally geodesic submanifolds of R n are the affine subspaces; taking their translations by Z n and passing to the quotient gives the connected totally geodesic submanifolds of T n . Amongst these, we wish to select those that are regular submanifolds. We will describe them by the resonance relations that they satisfy. The group T n acts by translations on T n and leaves the resonance relations invariant; we regard the submanifolds that satisfy different resonance relations as belonging to different models. Thus, the problem of finding the principal symmetric space approximations to given data involves first fixing the resonance relation and then determining the best fitting submanifold that obeys that resonance relation.
We will show in Proposition 4 that the regular connected totally geodesic submanifolds of T n are all tori. Up to translations, they are parameterized by unimodular matrices A ∈ Z k×n , i.e., matrices with integer entries all of whose k × k minors do not have a common factor (their greatest common divisor is equal to 1). Specifically, they have the form T := {x ∈ T n : Ax = c} for some c ∈ [0, 1) n .
Example 2. The case of geodesics in T 2 gives a feel for the requirement that the submanifold be represented by an unimodular matrix A. (i) The subset √ 2], is totally geodesic, but it is an irregular submanifold. It is useless for data fitting as it passing arbitrarily close to every point of the torus. (ii) The subset 2x 1 = 0 of T 2 , associated with A = [2, 0], consists of the two vertical lines (0, y) and ( 1 2 , y) for 0 ≤ y < 1. This set is a regular totally geodesic submanifold, but it is not connected-and A is not unimodular. (iii) The subset 2x 1 + 5x 2 = c, associated with the unimodular matrix A = [2,5], is a regular, connected, totally geodesic submanifold of T 2 . We will give an example of fitting such a geodesic below. Proposition 4. [15] Every regular connected codimension-k totally geodesic submanifold of T n is a subtorus given by Eq. (1) for some c ∈ [0, 1) k and unimodular A ∈ Z k×n .
Proof. First let A be unimodular. We will show that T in Eq. (1) is a regular connected codimension-k totally geodesic submanifold and is a subtorus. Rows can be added to A to create a matrix, C, of determinant 1 [17]. The linear map φ : R n → R n ,x → Cx is invertible, therefore the mapx → Ax is surjective. Let x ∈ T and letx be any point in R n such thatx mod 1 = x. We are given that Ax = c + m for some m ∈ Z n . From the surjectivity ofx → Ax, there is a p ∈ Z n such that Ap = m. Therefore A(x − p) = c. That is, some integer translation ofx lies on the affine subspace {x ∈ R n : Ax = c}, which is the cover of a connected totally geodesic submanifold of T n . Hence T is a totally geodesic submanifold of T n .
The map φ descends to an automorphism of T n ; it provides a change of coordinates on T n . In coordinates y = Cx, the submanifold is given by y 1 = c 1 , . . . , y k = c k , with y k+1 , . . . , y n ∈ [0, 1) n−k . This submanifold is a connected regular submanifold of T n and is a subtorus.
To show the converse, let T be any regular connected totally geodesic submanifold of T n . Its translation U to the origin is a subgroup, hence a subtorus of T n . The kernel of the exponential map of the Lie algebra of U is a lattice in Z n . Form a matrix whose rows are a basis of this lattice. The null space of this matrix has a unimodular integer basis whose entries are the resonance relations satisfied by elements of U and T . These form the rows of A.
The matrix A describes the resonance relations satisfied by the subtorus. If a i · x mod 1 = c i for all i, then for any m i ∈ Z we have ( k i=1 m i a i ) · x mod 1 = c i as well. That is, the set of resonance relations forms a k-dimensional lattice L in Z n , with the rows of A as a basis. Two matrices A, A describe the lattice, and the same family of subtori, if there is a matrix Z ∈ GL(k, Z) such that A = ZA.
Recall that the dual lattice L * is defined by L * := {y ∈ R n : y ∈ span(L), Ay ∈ Z} For any y ∈ L * and x ∈ T , we have A(x + y) = c, thus x + y ∈ T . Since span(L) is orthogonal to the tangent space of T , the lifted subtorus is the product of an affine space and the dual lattice L * .  The lattice generated by the columns of B shows the intersection between the subtorus and a lifted plane orthogonal to it (see Figure 3). Example 4. Let n = 2, k = 1, c = 0, and A = [2,5]. A basis for the dual lattice is A T (AA T ) −1 = [ 2 29 ; 5 29 ]; this vector is orthogonal to the tangent space of the geodesic and gives the spacing between its successive winds, which are spaced a distance 1/ √ 29 ≈ 0.19 apart (see Figure 4). The fractional part of 2x 1 + 5x 2 measures the angular distance from a point x to the geodesic.

Finding the best subtorus with given resonance relation
We now consider the problem of computing the distance from a datapoint to a subtorus. Consider the example shown in Figure 3. To compute the Euclidean distance, it is necessary to (i) lift the datapoint to R 3 ; (ii) project to a plane orthogonal to the tangent space of the subtorus; and (iii) find the nearest point in the dual lattice L * ; and (iv) compute the distance to this point. The difficult step is (iii), an instance of the Closest Vector Problem (CVP) in the dual lattice L * . However, this is a difficult problem in high dimensions and the degree of complexity it entails seems unnecessary here.
This step can be avoided by modifying the metric suitably.
As we are working with angular distances, we replace the standard angular distance d(x, y) = 2π|x − y| ≤ π, x, y ∈ T, by the chordal distance d c (x, y) = 1 2 sin π|x − y| ≤ 1 2 . The Karcher mean of angles x i ∈ [0, 1) is easily calculated as the circular mean We define the circular meanx of x i ∈ T n componentwise. We now introduce a further modification of the metric that is adapted to the chosen family of subtori.

Proposition 5. Let
A be the first k rows of C ∈ GL(n, Z). Amongst the subtori with resonance relation A, the subtorus of best fit in the metric d C to the data x 1 , . . . , x d ∈ T n is {x ∈ T n : Ax = c}, where c is the circular mean of Ax 1 , . . . , Ax d .
Proof. In coordinates y = Cx, the subtorus is given bŷ and the distance from y to the subtorus T = {x : Ax = c} is determined by the angular displacementŷ − c ∈ T k whereŷ = (y 1 , . . . , y k ).
Note that although d C depends on the whole matrix C, the best subtorus only depends on its first k rows, A.
If the rows of A are pairwise orthogonal and all have the same length, then d C (x, T ) = d c (x, T ), but in general the two metrics are not the same. Different bases A of L lead to different distance measures and different best tori. Most lattices have no orthogonal bases. However, it does point to the necessity of choosing a good basis for the resonance relations, one in which the relations are as nearly orthogonal as possible. This is another standard problem in lattice theory, one that can be solved exactly in low dimensions, and approximately (by the LLL algorithm) in high dimensions.
Example 3 (ctd.) The angle between the two basis vectors in Example 1 is 32 • . A more nearly orthogonal basis is in which the angle between the basis vectors is 75 • .
The best fit of a 1-torus with fixed resonance relation to data in T 2 is shown in Figure  4, and the best fits of 1-tori and 2-tori with fixed resonance relations is shown in Figure 5.  Figure 4: Fitting data on a torus. Here the closed geodesic of best fit is computed to a set of 50 data points on S 1 × S 1 . The data set is synthetic and has been chosen to lie near the geodesic with resonance relation 2x 1 + 5x 2 = const.; each data point has normal random noise of standard deviation 0.1/(2π) in each angle.

Model selection for tori
In finding the best subtorus amongst those with fixed resonance relations, the overall scaling of the metric is irrelevant. It becomes relevant during the model selection phase, when fitted subtori with different resonance relations are compared. Here we illustrate one possible approach to this issue using (i) the unscaled circular means, as given above; and (ii) the 'leave one out' model selection method. Item (i) means that the maximum distance of any point to a subtorus, in each coordinate, is 0.5, regardless of the resonance relations or winding density of the subtorus. While the metric could be scaled down, to make it more closely approximate the original Riemannian distance in T n , doing so would strongly favour models with very dense windings, as they pass close to every point in T n . Therefore we stick with the unscaled metric d C defined above. Item (ii) means that for each data point i, the subtorus of best fit to the data set omitting point i is calculated, from which the prediction error of this fit to data point i can be calculated. In the scaled chordal metric we are using, this is e i := The method is illustrated on a synthetic data set of 50 points that lie near the geodesic 2x 1 + 5x 2 = const.; see Figure 4. All resonance relations with A ∞ < 10 are tested. The leave-one-out method selects the 'correct' A = [2,5] for this dataset.

Nested approximations
So far we have presented a method for finding the best subtorus of a given dimension. However, note that the same method naturally produces a nested sequence of approximations of subtori of different dimensions. Proposition 6. Let A ∈ GL(n, Z), let A k be the first k rows of A, and let x 1 , . . . , x d be data in T n . For each k = 1, 2, . . . , n − 2, the (n − k)-dimensional subtorus with resonance relations A k of best fit contains the (n − k − 1)-dimensional subtorus of resonance relations A k+1 of best fit.
Proof. The subtori are A k x = c, where A k is the first k rows of A and the entries in c ∈ R k are the circular means of Ax i . Adding another resonance relation, i.e. increasing k by 1, does not change the first k entries of c.
Thus to each A ∈ GL(n, Z) we get a nested sequence of subtori of dimension 1 to n − 1 and an approximation error associated to each subtorus. If the rows of A are nearly orthogonal, this is a close analogue of standard PCA.
We take a synthetic data set of 50 points on T 3 . (See Figure 5.) When k = 1 we are seeking the best 2-torus of the form −x − y − z = const.; it has mean error 0.049. When k = 2 we are seeking the best 1-torus of the form −x − y − z = const., −2x + y = const., i.e., the best geodesic parallel to [1,2,3]. It has mean error 0.049 orthogonal to the previously found 2-torus, i.e. in the direction [−1, −1, 1], and mean error 0.169 in the direction [−2, 1, 0]; its mean error is √ 0.049 2 + 0.169 2 = 0.176. These errors are scaled so that the distance between winds is 1, i.e., the distance between winds of the blue 2-torus is 1 and the distance, measured within the blue 2-torus, between winds of the red 1-torus is 1.

Polyspheres
The polyspheres S 2 × · · · × S 2 arise frequently in practical applications, for example in joint data. We begin by considering the case S 2 × S 2 , as the arguments are analogous in higher dimensions. We must classify the geodesic submanifolds of S 2 × S 2 , for which purpose we recall that the symmetric algebra of , which will be used frequently in the coming calculations. The symmetric algebra of S 2 ×S 2 is (h+h)+(m+m), and the totally geodesic submanifolds (and hence symmetric subspaces) take the form where x ∈ S 2 × S 2 is an arbitrary point, and m ⊂ m + m is a Lie triple system. We now state the results we obtain, the proofs of which are to be found in the appendix. We first classify the geodesic submanifolds of S 2 × S 2 : Theorem 2. The 1-dimensional connected geodesic submanifolds of S 2 ×S 2 are the submanifolds {(exp(r 1 t)x, exp(r 2 t)y : t ∈ R}, where r i ∈ o(3) and x, y ∈ S 2 . The 2-dimensional connected geodesic submanifolds of S 2 × S 2 are of the following types: x, y ∈ S 2 are fixed.
The principal symmetric space decompositions of S 2 × S 2 can by summarized by the following diagram: Theorem 3. The totally geodesic submanifolds of (S 2 ) n are isomorphic to a product of copies of S 2 and S 1 . The rooted tree structure of the principal symmetric space approximations PSSA(X, (S 2 ) n ) is characterized as follows: every edge of the tree corresponds to either 1. A reduction to an m-torus inside an n-torus (m < n) 2. A 2-dimensional reduction arising from a coupling of two spheres after situation 1 of Lemma 2.
3. One of the following one-dimensional reductions: the restriction to a great circle S 1 ⊂ S 2 , or to a trivial submanifold x 0 ⊂ S 1 .
The complexity of rooted tree arising from principal symmetric space approximations of polyspheres shows that a model selection problem cannot be avoided; see the remarks in the introduction and §5.1.3.
Example 5. We illustrate the behaviour with two synthetic datasets on S 2 × S 2 . The data of 6 illustrates the middle branch of the tree, whilst the rightmost branch of the tree is shown in Figure 7.   Figure 7: Fitting data on a polysphere (see Example ??). The best approximating torus S 1 × S 1 ⊂ S 2 × S 2 is shown with two great circles inside the two spheres. Due to the difficulty of plotting the nested approximation S 1 ⊂ S 2 × S 2 we have plotted (right) the projection of the points in S 2 × S 2 to the approximating torus S 1 × S 1 and shown the best approximating S 1 as a subset of this. Now suppose the subspace V is generated by (u 1 , v 1 ), (u 2 , v 2 ), (u 3 , v 3 ). Then either the (u i ) or (v i ) span R 2 , assume that (u i ) do. For any fixed t, if (x, y) ∈ V then x = au 1 + bu 2 + tu 3 in a unique manner, indeed Then and we see that V takes the form (x, Ax + tv) for some fixed A and v.
We will also make use of the following elementary results.
Lemma 6. Let A ∈ so(2) be non-zero, and let B be a 2 × 2 matrix. Suppose that BAB T Bz = BAz for all z ∈ R 2 . Then either B ∈ O(2) or B = 0.
Proof. Firstly note that the condition BAB T Bz = BAz for all z ∈ R 2 is equivalent to BAB T B = BA, by for instance the bijective correspondence between linear transformations and matrices. Let C = B T B; multiplying both sides by B T results in the relation CAC = CA. Note that C is symmetric, whilst A is antisymmetric, and hence CAC is also antisymmetric; as so(n) is onedimensional we have CAC = kA for some k ∈ R. This gives also CA = kA, and taking transposes −AC = −kA. Then CAC = C(AC) = kCA = k 2 A. It follows that k 2 A = kA, and hence k = 0 or 1. Moreover, as A is invertible, we conclude that C = kI and the result follows.
Proof. Without loss of generality we consider a basis such that

Proof of Theorem 2
Proof. We must search for and exponentiate Lie triple systems m ⊂ m + m; these take the form m(V ), where V ⊂ R 2 ⊕ R 2 take the forms described in Lemma 5. Amongst the two-dimensional cases, we begin by those of the form V = {(ξ, Bξ) | ξ ∈ R 2 }, B ∈ Mat ( (2), Lemma 6 shows that we obtain a Lie triple system if either B T B = I, or B = 0. We see by Lemma 7 that taking an orthogonal B results in a subspace of the first kind listed, whilst taking B = 0 trivially results in the second kind. It remains to check subspaces {(t 1 ζ 1 , t 2 ζ 2 ) | t i ∈ R}, ζ i ∈ R 2 , but these are trivially totally geodesic, as then [m , m ] = 0. Exponentiating the resulting subspace gives the third case of the lemma. We then consider the three-dimensional submanifolds, where V = (ξ, Aξ + tζ) | ξ ∈ R 2 , t ∈ R}, ζ ∈ R 2 , A ∈ Mat(2 × 2). We first compute  (2). We are left with two lines in R 2 (as we vary r, p), it follows that we require v to be an eigenvector of BAB T , however as BAB T ∈ o(2) it has no real eigenvectors unless B = 0. The results then follows immediately upon exponentiating m(V ).

Proof of Theorem 3
Proof. We begin by noting that the case S 2 × S 2 follows this pattern: the edge 1 is the twodimensional reduction resulting from a coupling of spheres, the edge 2 comes from the inclusion x 0 ⊂ S 1 , and the edge 3 is an inclusion S 1 ⊂ S 2 . The remaining edges are clearly also of this pattern.
We now sketch a proof that the reductions of Lemma 2 are the only possible ones, even for higher polyspheres. The main point is that result and proof of Lemma 5 is generic, indeed the vector subspaces of R 2 ⊕ · · · ⊕ R 2 take the form {(ξ 1 , . . . , ξ m , . . . , A d 1 ξ 1 + . . . + A d m ξ m + t d v d ) | ξ 1 , . . . , ξ m ∈ R 2 , t 1 , . . . , t d ∈ R} The argument is essentially the same as before, roughly we proceed by letting {u 1 , . . . , u k } be a basis for the subspace, where we can write each vector u i = (u i 1 , . . . u i n ), each u i j ∈ R 2 being a 2d column vector. Form the block matrix U = [u] j i , a rank k matrix of size 2n × k. Form a k × k submatrix of full rank by discarding rows. Consider a vector (x 1 , . . . , x n ); the form of x i depends on the discards in the rows of block i: Where no rows are discarded, we have a free ξ i , if both rows are discarded we A i j ξ j where we discarded both rows, we have in addition the t j v j where we discarded only one.
Consider then the possible Lie triple systems m(V ), where V takes the form above. Pick two terms from V ; these must reduce to one of the cases described in Lemma 2 if we set the other free terms ξ j , t j to zero. This proves that the A i j must be orthogonal or zero, and must be zero if paired with a tv term; it remains to show that for any given i only one A i j can be non-zero. For this purpose we compute