Modelling Anisotropic Covariance using Stochastic Development and Sub-Riemannian Frame Bundle Geometry

We discuss the geometric foundation behind the use of stochastic processes in the frame bundle of a smooth manifold to build stochastic models with applications in statistical analysis of non-linear data. The transition densities for the projection to the manifold of Brownian motions developed in the frame bundle lead to a family of probability distributions on the manifold. We explain how data mean and covariance can be interpreted as points in the frame bundle or, more precisely, in the bundle of symmetric positive definite 2-tensors analogously to the parameters describing Euclidean normal distributions. We discuss a factorization of the frame bundle projection map through this bundle, the natural sub-Riemannian structure of the frame bundle, the effect of holonomy, and the existence of subbundles where the Hormander condition is satisfied such that the Brownian motions have smooth transition densities. We identify the most probable paths for the underlying Euclidean Brownian motion and discuss small time asymptotics of the transition densities on the manifold. The geometric setup yields an intrinsic approach to the estimation of mean and covariance in non-linear spaces.


1.
Introduction. Let M be a smooth, connected, compact manifold of finite dimension. This paper is concerned with stochastic modelling of data on M , a setting that arises in a range of applications, for example in medical imaging and computational anatomy [8] where anatomical shapes reside in non-linear shape spaces.
In [18], a class of probability distributions was introduced that generalizes the Euclidean normal distribution to the non-linear geometry on M . The aim is to allow statistical analysis of non-linear data by fitting the parameters of the distributions to data by a maximum likelihood approach. The method is intrinsic using development of stochastic processes in the frame bundle. In particular, it avoids the linearization of the manifold around a mean point that most non-linear statistical tools rely on.

STEFAN SOMMER AND ANNE MARIE SVANE
In this paper, we will set forth the geometric foundation of the class of Brownian motions on M and their transition distributions: we describe (1) the sub-Riemannian structure in the frame bundle of M and the relation between the Fréchet mean on M and the sub-Riemannian distance on F M ; (2) the Eels-Elworthy-Malliavin construction of stochastic processes via the frame bundle; (3) a factorization through the bundle of symmetric positive definite matrices that represent infinitesimal covariance; (4) reduction to subbundles where the Hörmander condition is satisfied and the density therefore smooth; (5) most probable paths for the generated stochastic processes; (6) small time asymptotics of the transition densities; and (7) applications in statistics.
1.1. Statistical Background. In Euclidean space, it is natural to use the mean of a stochastic variable to describe the typical sample point, while the covariance captures the variation around this point. However, the analogue for manifold valued stochastic variables is not clear. Many techniques use the Fréchet mean [6] as the natural definition of the mean point in M , see also the discussion [16]. Suppose M is equipped with the Riemannian metric g, and let d g denote the associated geodesic distance function on M . Then the Fréchet mean of a stochastic variable X on M is a point x 0 ∈ M that minimizes Ed g (x 0 , X) 2 . If Ed g (x, X) 2 is finite in one point x ∈ M , then such a minimizer exists, but it may not be unique. Given a set of data points x 1 , . . . , x N on M , the Fréchet mean is usually estimated by a point x 0 ∈ M that realizes argmin x0∈M There are various attempts in the literature to define a notion of covariance or principal component analysis (PCA), see e.g. [5,16]. Typically, such approaches are based on a linearization of the manifold around the Fréchet mean, e.g. by using normal coordinates. However, if the covariance is not isotropic, it is arguably natural to replace d g in (1) by a distance measure that takes the anisotropy into account. Mimicking the Euclidean case, the covariance is usually represented by a symmetric positive definite matrix on T x0 M corresponding to an inner product. If the covariance is anisotropic, it is natural to measure distances in a metric that extends the inner product on T x0 M to all of M , rather than using d g .
To enable this, we suggest lifting the problem to the frame bundle. This requires that M has a connection, typically the Levi-Civita connection associated with a Riemannian structure, and a fixed volume measure µ. A point u 0 in the frame bundle naturally defines an inner product σ on T x0 M , and given a connection in the frame bundle, there is a natural way of extending it to a (sub-Riemannian) metric on the whole frame bundle. This induces a (sub-Riemannian) distance function d F M on the frame bundle. We thus suggest replacing (1) by where the infimum is taken over all points u i in the frame bundle projecting to x i . The last term assumes a Riemannian metric on M and is inspired by the normalization factor in a Euclidean normal distribution. The advantage of this approach is that the minimizer encodes both the best descriptor x 0 ∈ M of the data and the covariance of the data represented by the inner product σ on T x0 M that corresponds to the precision matrix, the inverse covariance. Informally, if σ were kept fixed in (2), a minimizing x 0 would correspond to the Fréchet mean with anisotropically weighted distance. However, while σ can be parallel transported along curves in M , the curvature of the manifold generally prevents the notion of a globally fixed σ. We therefore need to minimize over x 0 and σ simultaneously in (2). In [17,18], it was suggested to describe manifold valued data by a parametric model that generalizes the normal distribution in Euclidean space using the fact that normal distributions arise as transition distributions of Euclidean Brownian motions. The family of Euclidean Brownian motions can be naturally generalized to Brownian motions on M through stochastic development in the frame bundle. Evaluating at a fixed time yields a distribution on M , which we will interpret as a normal distribution. The initial point and infinitesimal covariance of the process may be interpreted as the mean and covariance, respectively.
There is a close, though far from trivial or thoroughly understood, relation between distributions on M obtained in this way and the sub-Riemannian distance function in the frame bundle. The non-Euclidean normal distribution will have a density under reasonable assumptions. As we show in this paper, on small time scales, maximizing the log-density corresponds to minimizing the squared sub-Riemannian distance. However, this relationship is weakened as the Brownian motion evolves in time.
One of the main ideas of [18] was that, if we consider our data points as realizations of some distribution on the manifold, then, rather than minimizing the squared length of paths as in (1), one should maximize the probability of a path. While it is not a priori clear how this should be understood for a general stochastic variable on M , there is a way of defining it for a point that results from a stochastic process. The most probable paths are again closely connected to the sub-Riemannian distance.
We aim at linking these ideas in this paper. We start in Section 2 by discussing the geometry of the frame bundle, in particular the sub-Riemannian metric and the factorization of the frame bundle through the bundle of inner products. We then discuss holonomy, reachable sets, and existence of subbundles where the Hörmander condition is satisfied. 1 In Section 3, we recall the Eels-Elworthy-Malliavin construction of Brownian motions in the frame bundle and the associated transition probabilities. In Section 4 we identify the most probable paths for the processes. The small time asymptotics of the transition densities in F M and M are described in Section 5. We conclude the paper by outlining how the theory can be applied in statistics.
2. Sub-Riemannian Geometry in the Frame Bundle. In this section, we recall the definition of the frame bundle. We factor the projection of the frame bundle to the manifold through the bundle of symmetric positive definite covariant 2-tensors, which can be used to represent covariance at points in M in a natural way. Moreover, we define a natural sub-Riemannian structure on the frame bundle and describe its geometry, in particular its holonomy.
2.1. The Frame Bundle. The frame bundle F M of a smooth n-dimensional manifold M is a smooth vector bundle F M π − → M whose points consist of a point x ∈ M and a frame (ordered basis) (u 1 , . . . , u n ) for the tangent space T x M . Alternatively, letting e 1 , . . . , e n denote the standard basis in R n , we may think of a point in F M as an isomorphism u : R n → T x M where u(e i ) = u i . The frame bundle thus has the structure of a principal bundle with fiber GL(n).
A connection on F M is a smooth splitting of its tangent bundle as T F M ∼ = V ⊕H, where V is the kernel of π * : T F M → T M , referred to as the vertical bundle, and H is any complement, referred to as the horizontal bundle. The restriction of π * to H u yields a bijection onto T π(u) M whose inverse is called horizontal lifting and will be denoted h u .
At any point u = (u 1 , . . . , u n ) ∈ F M , there is a basis H 1 , . . . , H n for H u , where H i = h u (u i ). The H i define horizontal vector fields. There is a natural correspondance between Euclidean paths γ : [0, 1] → R n starting at the origin and horizontal paths η : [0, 1] → F M with fixed initial pont u 0 ∈ F M : Ifγ(t) = (α 1 (t), . . . , α n (t)), then η is the solution to the differential equatioṅ η(t) = H i (η(t)) α i (t) with η(0) = u 0 . The map φ u0 taking paths in R n to paths on F M is known as the development map.
If M is Riemannian, the Levi-Civita connection naturally defines a horizontal subbundle of F M as follows. Take any curve through x ∈ M with initial velocity v. Parallel transport of the frame vectors u 1 , . . . , u n ∈ T x M along the curve yields a path in F M starting at u. The derivative of this curve at zero is the horizontal lift h u (v) of v to T u F M . The set of all horizontal lifts forms the vector bundle H. Since parallel transport preserves inner products, the horizontal distribution is tangent to the orthonormal bundle OM consisting of isometries u : R n → T x M .
If we choose coordinates x = (x 1 , . . . , x n ) on M , then there is a set of coordinates on F M given by x 1 , . . . , x n , α j i where u i = α j i ∂ j (here ∂ j = ∂ ∂x j and we have used the Einstein summation notation). We denote the matrix with jith entry α j i by α. Moreover, there are coordinates on Sym + M given at σ ∈ Sym + M by the point By the polar decomposition theorem, there is a diffeomorphism between Sym + M and the quotient F M/O(n) where O(n) acts on F M from the right by (x, α)ρ = (x, αρ). The coset (x, α)O(n) ∈ F M/O(n) corresponds to (x, (αα T ) −1 ) ∈ Sym + M . The horizontal distribution on F M reduces to a distribution H S , which we also call horizontal, on Sym + M . Proposition 2.1. There is a smooth splitting T Sym + M ∼ = H S ⊕ ker q * such that Σ * H = H S . In particular, any vector v ∈ T q(σ) M has a unique horizontal lift to H S σ which we denote h S σ (v). Proof. We must show that for any vector v ∈ T x M there is a unique vector h S σ (v) ∈ T σ Sym + M such that Σ * h u (v) = h S σ (v) for all u with Σ(u) = σ. But this is true because if u,ũ ∈ F M with u i = α j i ∂ j andũ i = (α) j i ∂ j and Σ(u) = Σ(ũ) = σ, then there is an orthogonal map ρ ∈ O(n) with α · ρ =α. Let γ : [0, 1] → M be a path withγ(0) = v and let u(t) = α j i (t)∂ j (γ(t)) be a parallel lift with u(0) = u.
By standard principal bundle arguments, the distribution H S is smooth.
There exists a natural sub-Riemannian metric on the horizontal distribution on the frame bundle. Indeed, for v, w ∈ H u , the inner product is given by Alternatively, we may think of this sub-Riemannian structure as a mapg F M : and they therefore constitute a global orthonormal frame for H. Note that g u is different from the induced metric of bundle type [15] that makes π * an isometry on H because π * g F M u depends on u. The sub-Riemannian length of an absolutely continuous path γ : [0, 1] → F M whose derivative is a.e. horizontal is defined by Ifγ is not a.e. horizontal, we set l(γ) = ∞. The sub-Riemannian distance between u 1 and u 2 in F M is then As in the Riemannian case, a length minimizing curve is called a geodesic. Note that the sub-Riemannian distance between two points may be infinite, even if M is connected, because there may be points that cannot be joined by a horizontal path. Similarly, there is a sub-Riemannian metric g S on H S given on the horizontal vectors v, w ∈ T σ Sym + M by This again leads to a notion of sub-Riemannian length and distance d Sym , the map Σ takes a horizontal curve on F M to a horizontal curve on Sym + M of the same sub-Riemannian length. This implies that where x ∈ M and the distances are the minimal distances to the fibers of F M and Sym + M , respectively.

The Reachable Set and
Holonomy. Not all points in F M can be joined by a horizontal path. We therefore consider the reachable set Q(u) consisting of points that can be reached from u by a horizontal path. For u, p ∈ F M we write u ∼ p if u and p can be joined by a horizontal curve in F M . Then The reachable set Q(u) is a smooth immersed submanifold [21]. We let π Q(u) denote the restriction of π to Q(u). Proof. Theorem 3.2.8 of [10] asserts that the holonomy subgroup is closed because M is Riemannian. The result then follows from Theorem 2.3.6 of [10].

The holonomy group Hol
When M is Riemannian, the holonomy group corresponds to the set of frames reachable by parallel transport along loops starting and ending at π(u). Parallel transport preserves the Riemannian inner product, and Hol u is isomorphic to a subgroup of O(T x M ). Thus, if u is orthonormal, then Q(u) is a subbundle of the orthonormal frame bundle OM . If M is also orientable, then Hol u is a subgroup of SO(T x M ) because parallel transport preserves orientation. Recall the decomposition T u F M ∼ = H u ⊕ V u . The Lie algebra h u of Hol u must be contained in V u . Moreover, the vertical part of Lie(H) u must be contained in h u . Under the condition of Proposition 2.2, Lie(H) u ⊆ h u ⊕ H u = T u Q(u), so any connected submanifold of F M on which the Hörmander condition holds must be contained in Q(u). There are several reasons why we are interested in whether the Hörmander condition satisfied on Q(u 0 ). Some are listed in the next proposition. Below, we give conditions under which the Hörmander condition is indeed satisfied on Q(u). The discussion is based on [3,10,15,21].
Suppose M is Riemannian. Injectivity of the curvature tensor R x : For this, it is enough that the span of H and its first bracket equals When R is not injective, it is still possible that Q(u) satisfies Hörmander's condition in some non-degenerate situations: Theorem 2.5. If Lie(H) u has constant rank for all u ∈ F M , then Q(u) satisfies the Hörmander condition.
The constant rank condition is for instance satisfied for analytic manifolds [15, Appendix C] and homogeneous spaces.
Proof. The distribution Lie(H) is involutive by definition. The constant rank ensures that the Frobenius theorem (see [12] Theorem 3.20) applies. Thus for any u ∈ F M there exists a maximal connected immersed submanifold Q Lie(H) (u) containing u of dimension dim Lie(H) with tangent space Lie(H). By construction, the Hörmander condition is satisfied on Q Lie(H) (u).
Chow's theorem [15, Theorem 2.2] yields that Q Lie(H) (u) ⊆ Q(u). On the other hand, any two points in Q(u) can be joined by a horizontal curve. By the construction of Q Lie(H) (u), this curve must lie in Q Lie(H) (u). We deduce that Q(u) and Q Lie(H) (u) are equal as sets. Moreover, it follows from [15,Exercise C.4], see also [21], that dim Lie(H) u = dim Hol u , so Q(u) = Q Lie(H) (u) as differentiable manifolds.
In general, however, Lie(H) may not have constant dimension. In this case, it is not possible to find a submanifold of F M where H is bracket generating. For instance, if M is flat in a neighborhood of π(u) then dim Lie(H) u = n, while the dimension of Lie(H) will be larger in curved parts of M .
2.6. The Hamilton-Jacobi Equations. A class of geodesics for the sub-Riemannian metric on F M , called the normal geodesics can be computed by solving an ordinary differential equation on T * Q(u 0 ). Recall that the sub-Riemannian metric can be written as a cometricg F M : T * Q(u 0 ) → H ⊂ T Q(u 0 ). For q ∈ Q(u 0 ) and p ∈ T * q Q(u 0 ), we define the Hamiltonian H(q, p) = 1 2 p(g F M (p)).
The Hamiltonian differential equations in canonical coordinates (q i , p i ) on T * Q(u 0 ) look as followsq If the Hörmander condition is satisfied on Q(u 0 ), then the projection of any solution is a (locally) length minimizing geodesic for the sub-Riemannian structure, called a normal geodesic. However, contrary to ordinary Riemannian geometry, there exist length minimizing geodesics that do not arise in this way. See e.g. [15] for details. There is a sub-Riemannian version of the exponential map exp u0 : T * u0 Q(u 0 ) ∩ H −1 1 2 → Q(u 0 ) that takes an initial covector p 0 ∈ T * u0 Q(u 0 ) to the endpoint of the geodesic that results from projecting the solution to the Hamiltonian equations with initial condition (u 0 , p 0 ). However, the exponential map is not a local diffeomorphism around 0 and may not be surjective.
In a similar way, if we choose to work with geodesics on Sym + M , there are Hamiltonian equations for the normal geodesics. This has the practical advantage of reducing the number of equations. Again, there is no guarantee that all geodesics are normal. An expression in local coordinates of the sub-Riemannian Hamiltonian equations on F M for computational purposes can be found in [19].
3. Brownian Motion and Stochastic Development. The Brownian motion W t on Euclidean space starting from µ ∈ R n and having covariance matrix Σ is an almost surely continuous stochastic process with stationary independent increments whose distribution at time t satisfies W t ∼ N n (µ, tΣ) .
The Brownian motion can be generalized to any manifold M with connection through the process of stochastic development that allows any Euclidean semimartingale to be mapped uniquely to a corresponding process in F M given a starting point u 0 ∈ F M [9]. Developing the standard Brownian motion in F M and projecting to M yields a stochastic process on M . Evaluated at a fixed time t, its distribution can be regarded a generalization of the normal distribution to M . If u 0 is orthonormal, the resulting process is what is normally called the Brownian motion on M . If u 0 is non-orthonormal, the resulting distribution can be viewed as a Brownian motion on M with anisotropic covariance matrix [18].
Below, we make this construction concrete, we discuss the reduction to Sym + M , where the distribution on M depends uniquely on the initial value, and the restriction to the subbundles Q(u 0 ) where the transition densities are smooth.
3.1. Brownian Motion in the Frame Bundle. Given a manifold M with connection, there is a stochastic version of the development map from Section 2.1. Given a stochastic process W t in R n starting at 0, the stochastic development of W t in F M starting at u 0 ∈ F M is a stochastic process U t on F M solving the Stratonovich stochastic differential equation with initial condition U 0 = u 0 ∈ F M , see e.g. [9]. The process U t is a diffusion on F M , and the projection X t = πU t onto M yields a stochastic process on M . We denote the stochastic development map W t → X t byφ u0 .
Let M be equipped with the Riemannian metric g. Since parallel transport preserves the Riemannian inner product, the horizontal vector fields are tangent to the submanifolds of F M where the coordinates g(u i , u j ) of the metric g in the frame u are constantly equal those of u 0 . It follows that any solution to dU t = H i (U t )•dW i t with initial condition u 0 will stay inside these submanifolds according to [4, 7.23].

ANISOTROPIC COVARIANCE AND SUB-RIEMANNIAN GEOMETRY 9
In particular, if u 0 is orthonormal, then g u0 (u i , u j ) = δ ij and hence U t will stay inside the orthonormal bundle OM .
If u 0 is orthonormal and W t is a standard Brownian motion, then X t is the usual Brownian motion on M [9]. If W t is a standard Brownian motion but u 0 is not orthonormal, then we may think of X t as a generalized Brownian motion on M . Hence, the distribution of X t at time t = 1 can be thought of as the analogue of a normal distribution on M . The starting point x 0 = π(u 0 ) can naturally be interpreted as the mean of the distribution, while the symmetric bilinear positive definite map Σ(u 0 ) on T x0 M corresponds to the precision matrix, i.e. the inverse of the covariance matrix. The covariance will be anisotropic if the starting frame u 0 is not orthonormal. As we shall see below, the distribution of X t is uniquely determined by Σ(u 0 ).

3.2.
Reduction of Initial Conditions. The n horizontal vector fields H i on F M define a second order differential operator on F M by The Brownian motion U t on F M is an L-diffusion in the sense of [9], meaning that for any smooth f : is a local martingale. The restriction of L to OM descends to the Laplace-Beltrami operator ∆ g on M , i.e. if f : M → R is a smooth function then L(f • π) = ∆ g (f ). Hence the standard Brownian motion can be intrinsically defined on M as a ∆ g -diffusion. In general, L does not reduce to an operator on M , so the generalized Brownian motions are not intrinsically defined. We can, however, reduce it to an operator on Sym + M . The map Σ : F M → Sym + M maps U t to a stochastic process Y t on Sym + M . We will show that there exists a (degenerate) elliptic operatorL on Sym + M such that Y t is anL-diffusion.
Here σ jk = σ −1 (∂ j , ∂ k ). Note that the function η k (u) = i α j i (u) h u (∂ j ) α k i (u) depends only on Σ(u) since if ρ is an orthogonal map, then Letting V i (σ) = (σ 1/2 ) jk h S σ (∂ k ), we thus find that the theorem holds with The last statement follows from the definition of an L-diffusion (5) and the corresponding definition of anL-diffusion.
As a consequence, we find that the distribution of X t is overparametrized by the initial conditions in F M . Proof. According to [9, Thm. 1.3.6] anL-diffusion is uniquely determined by its initial distribution.
We now show that the distribution of X t depends only on Σ(U 0 ). To see this, let a ∈ GL(n) be a linear map. The action u → u · a on F M defines a diffeomorphism F M → F M . Let V t = U t · a. This is a new process with πV t = πU t = X t . Using [9, Prop. 1.

2.4] and that a
with initial condition V 0 = u 0 · a. But a −1 W t is a Brownian motion on R n with covariance matrix (a T a) −1 . Choosing a to be orthogonal yields another proof of Corollary 3.2.
Proposition 3.3. The distribution of the process X t = π(U t ), where U t is a solution of (3) with initial condition U 0 = u 0 , is uniquely determined by Σ(u 0 ).
Proof. Suppose that we are given two processes U 1 t and U 2 t each solving dU ν t = H i (U ν t ) • dW i t , ν = 1, 2, and with U 1 0 · a 1 = U 2 0 · a 2 for some a 1 , a 2 ∈ GL(n). Define V ν t = U ν t · a ν . Then both V ν t solve (6) and satisfy V 1 0 = V 2 0 . The antidevelopment theorem [9,Theorem 2.3.4 + 2.3.5] shows that the process (a ν ) −1 W t can be recovered uniquely by solving two stochastic differential equations involving only πV ν t = πU ν t = X ν t . Since the distribution of a −1 W t is uniquely determined by (a T a) −1 , X 1 t and X 2 t can only have the same distribution if ((a 1 ) T a 1 ) −1 = ((a 2 ) T a 2 ) −1 .

Transition Probabilities.
We first recall that a finite set of vector fields X i on a manifold define a sub-Laplacian with respect to a given volume form µ on the manifold by the formula where the divergence is computed with respect to µ.
Proposition 3.4. Let M be Riemannian and suppose that H satisfies the Hörmander condition on Q(u 0 ). Then there exists a volume form µ Q on Q(u 0 ) such that the horizontal vector fields H i have divergence 0. In particular, L is a sub-Laplacian with respect to this volume form.
Proof. According to Proposition 2.2, Q(u 0 ) is a subbundle of F M . In [14] a Riemannian metric G is defined on F M such that the H i are divergence free with respect to G (see [14,Thm 8.4]). This metric makes the horizontal and vertical distributions orthonormal. LetG be the restriction of G to Q(u 0 ). Moreover, let ∇ and∇ be the Levi-Civita connections with respect to G andG, respectively. Locally choose orthonormal vector fields E 1 , . . . , E n spanning H and extend by V 1 , . . . , V d to an orthonormal basis of T Q(u 0 ) and further extend by W 1 , . . . , W n 2 −d to an orthonormal basis for T F M . The divergence with respect to the volume form associated to G is given by the trace of the connection, so Since∇ is given by projecting ∇ onto T Q(u 0 ) and since ∇ is compatible with the metric, According to [14,Thm 4.3], the fibers of F M are autoparallel, hence ∇ Wi W i is vertical and Still assuming M to be Riemannian, let U 0 be a Brownian motion on F M with initial value u 0 and assume that the horizontal vector fields satisfy Hörmander's condition on Q(u 0 ). Since L is a sub-Laplacian by Proposition 3.4, there exists a unique solution p Q(u0) t (u 1 , u 2 ) to the sub-Riemannian heat equation such that lim t→0 p Q(u0) t (u 1 , u 2 ) = δ u1 (u 2 ) in the sense of distributions and p Q(u0) t is smooth on (0, ∞) × Q(u 0 ) × Q(u 0 ) [20]. The function p Q(u0) t is called the heat kernel and provides a density with respect to µ Q for the distribution of U t at time t (see [9] for an argument in the Riemannian case).
Let π Q(u0) denote the restriction of π to Q(u 0 ). The distribution of X t satisfies for any Borel set C ⊆ M . Here we have used that the volume form µ Q defined in the proof of Proposition 3.4 satisfies dµ Q = dµ π −1 Q(u 0 ) (y) dµ g where µ g is the Riemannian volume form on M associated with g and µ π −1 Q(u 0 ) (y) is the volume form on π −1 Q(u0) (y) coming from regarding π −1 Q(u0) (y) as a subgroup of GL(n) and restricting the Euclidean metric. This follows from the expression of the metric G in [14,Thm 4.1]. Equation (7) shows that X t has a density, and since this only depends on σ = Σ(u 0 ) ∈ Sym + M , we denote it by p M t (σ, y). We see that p M t is given by The density p M t can be taken as a starting point for a statistical estimation procedure, such as maximum likelihood estimation, in order to estimate the initial conditions σ ∈ Sym + M . There is no closed formula for p M t , not even in the well-studied isotropic case (see [9] for asymptotic results). In Section 5, we derive asymptotic results that can be used for heuristic estimation procedures, see Section 6.
4. Most Probable Paths. Instead of using the geodesic distance to measure the similarity between points, it was argued in Section 1.1 that when the data points are considered outcomes of a stochastic process on M , it is natural to consider the maximal probability of a path connecting two points. We are going to describe the notion of most probable paths and their characterization via the Onsager-Machlup functional [7]. Afterwards, we discuss how most probable paths for the driving Euclidean process relate to the sub-Riemannian metric. This does not require M to be Riemannian.
If X t is a stochastic process on a manifold M with initial point X 0 = x 0 , the most probable path from x 0 ∈ M to another point y ∈ M can be defined as the path that maximizes the probability that X t sojourns around the path [7]. More formally, let γ : [0, 1] → M be a smooth path with γ(0) = x 0 . We denote by µ M (γ) the probability that X t stays within distance from γ, i.e. µ M (γ) = P (d g (X t , γ(t)) < , ∀t ∈ [0, 1]) .
The most probable path is the path that maximizes µ M (γ) when → 0 4.1. The Isotropic Case. The most probable paths of a Brownian motion have only been determined in the case where the Brownian motion is isotropic, i.e. the starting frame u 0 is orthonormal. The reason why this case is easier to handle is that the isotropic Brownian motion is intrinsically defined on M as the diffusion generated by the Laplace-Beltrami operator.
Theorem 4.1 (see e.g. [7]). Let X t be an isotropic Brownian motion on M and let γ : [0, 1] → M be a smooth path starting at x 0 . Then as → 0 where c 1 , c 2 are constants independent of γ, L M is the Onsager-Machlup functional and S is the scalar curvature on M .
The most probable paths are thus the paths that maximize the Onsager-Machlup functional 1 0 L M (γ(t),γ(t))dt = −E(γ) + 1 12 involving the energy E(γ) = 1 2 1 0 γ(t) 2 g dt of the path plus a curvature correction term. In comparison, geodesics are the paths that simply minimize the energy.
Hence, the most probable paths are generally not geodesics. The Onsager-Machlup functional thus provides a way of measuring similarity between points different from the geodesic distance. However, in spaces of constant scalar curvature, for instance homogeneous spaces, the most probable paths coincide with geodesics. In particular, this holds in Euclidean space.

Anisotropic
Case. There seems to be no analogue of the Onsager-Machlup theorem available in the literature for Brownian motions with anisotropic covariance. With the construction of anisotropic processes that is the focus of this paper, an Onsager-Machlup theorem would apply to a process in the horizontally connected component Q(u 0 ) in the frame bundle. Several problems occur, many of which are due to the sub-Riemannian structure. For instance, it is not clear whether to use a fiber bundle metric or the sub-Riemannian metric to define tubes. Another problem is the lack of normal coordinates that are essential in all proofs of the Onsager-Machlup theorem. Instead, we model the most probable paths of the driving process, i.e. the underlying Euclidean Brownian motion W t . Another advantage of this is that it does not require a Riemannian structure on M but only a connection. Recall that the stochastic development mapφ u0 takes the paths of a Euclidean Brownian motion W t with W 0 = 0 to the paths of a Brownian motion X t on M by projecting a path U t in the frame bundle with U 0 = u 0 . However, it doesn't directly provide a link between the most probable paths in R n and the most probable paths in M that maximize the Onsager-Machlup function because the development map does not preserve tubes: if the Euclidean Brownian motion stays close to a path γ, the stochastic development of the path does not necessarily stay close to the development of γ in M . Definition 4.3. Let W t be a standard Brownian motion and let X t = φ u0 (W t ) be a Brownian motion on M . A most probable path for the driving process from x 0 = π(u 0 ) ∈ M to y ∈ M is a smooth path γ : [0, 1] → M with γ(0) = x 0 and γ(1) = y such that its anti-development φ −1 u0 (γ) is the most probable such path for W t . That is, γ is given by From the Onsager-Machlup theorem in Euclidean space, we obtain a characterization of the most probable paths for the driving process. The theorem shows that the most probable paths for the driving process coincide with the most probable paths introduced formally in [18].
Theorem 4.4. Let u 0 be a frame in T x0 M , let W t be a standard Brownian motion, and let X t =φ u0 (W t ). Suppose the Hörmander condition is satisfied on Q(u 0 ) and that Q(u 0 ) has compact fibers. Then most probable paths from x 0 to y ∈ M for the driving process of X t exist, and they are projections of sub-Riemannian geodesics in F M minimizing the sub-Riemannian distance from u 0 to π −1 (y).
Proof. Let ζ be a smooth path in R n and consider the development η of ζ in F M such thatη Since the H i are orthonormal in the sub-Riemannian metric g F M , In the following,γ denotes the horizontal lift of a path γ on M withγ(0) = u 0 . The most probable path for the driving process are then given as By compactness of π −1 (y) and continuity of d F M on Q(u 0 ) (see Proposition 2.3), there exists a horizontal pathγ in F M that minimizes the latter expression. This proves the claim.
In the case where M is Riemannian and u 0 is orthonormal, the most probable paths for the driving process are precisely the geodesics. The above theorem thus shows that, in general, the most probable paths for the driving process are not the same as the most probable paths for X t , since the scalar curvature term in (10) does not appear in Theorem 4.4. See [18] for visual illustrations of most probable paths for the driving process on S 2 with varying degrees of anisotropy in u 0 . 5. Small Time Asymptotics. We assume throughout this section that M is Riemannian and that the Hörmander condition is satisfied on Q(u 0 ). Let m denote the dimension of Q(u 0 ). As argued in Section 3.3, the distribution of U t has a smooth density p Q(u0) t . Since the sub-Riemannan structure on Q(u 0 ) is complete by Proposition 2.3, the density exhibits the small time limit [2,23] lim t→0 2t log p Q(u0) t (y) = −d Q(u0) (u 0 , y) 2 .
A point u ∈ Q(u 0 ) is called a smooth point if there exists ξ ∈ T * u0 Q(u 0 ) such that exp u0 (ξ) = u, ξ is a regular point of exp u0 , and the resulting geodesic connecting u 0 and u is unique and length minimizing. The set of smooth points is open and dense [1] but it may not be of measure zero.
For the standard Brownian motion on M , the refined statement holds. In this case, we can interpret the small time limit as the transition density approaching the density of an isotropic diffusion in the linear tangent space T x M because d g (u 0 , y) 2 = Log u0 y 2 g . In the sub-Riemannian case, making the same analogy is complicated by the fact that the exponential map is not a local diffeomorphism in a small neighborhood of u 0 . However, a smooth point u is the image of ξ under exp u0 where ξ is a regular point, so the Log-map is defined in a neighborhood of u. Thus we locally have an interpretation similar to the Riemannian case.
When y is a smooth point, there is a more explicit expansion where v(u 0 , y, t) is smooth for (y, t) ∈ M × (0, ∞) and v(u 0 , y, t) = v(u 0 , y, 0) + O(t) for small t uniformly on compact sets. The function y → v(u 0 , y, 0) is smooth and strictly positive. Another importance of smooth points in our context is the fact that the sub-Riemannian distance d Q(u0) (u 0 , u) is smooth in a neighborhood of (u 0 , u) in {u 0 } × Q(u 0 ) when u is a smooth point [1]. 5.1. Small Time asymptotics on M . We let d = m − n denote the dimension of the fibers of π Q(u0) : Q(u 0 ) → M . For y ∈ M , the distance d Q(u0) u 0 , π −1 Q(u0) (y) below is the minimal sub-Riemannian distance from u 0 to a point in the fibre π −1 Q(u0) (y) in Q(u 0 ). For fixed y, we let Γ ⊂ π −1 Q(u0) (y) be the set of minimizers of d Q(u0) (u 0 , π −1 Q(u0) (y)), and we write Figure 1. Sampled data on an ellipsoid realized as endpoints of sample paths of the process X t (black lines and points). Mean x 0 = π(u 0 ) (green point) and covariance σ −1 (ellipsis over mean) are estimated by minimizing (15). The most probable paths for the driving process connects x 0 and the sample paths (gray lines) and minimize the distances d Sym + M σ, q −1 (x i ) .
with w(y, t) as in Theorem 5.1, and 6. Applications in Statistics. In the classical Fréchet mean estimation, the mean is estimated by minimizing squared geodesic distances as explained in the introduction. However, we can allow anisotropic covariance structures and simultaneously encode mean and covariance in the distance measure by using the frame bundle and replacing the geodesic distance by the sub-Riemannian distance. In this way, the distances are weighted by the covariance in the sense that directions with high variance will contribute less to the distance than directions with small variance. We can then define the mean and covariance of a stochastic variable X on M to be the point σ ∈ Sym + M that minimizes where F is a real valued map that prevents σ from tending to 0. Recall here that σ encodes the precision matrix, and the covariance is found as the inverse σ −1 . As in the case of the Fréchet mean, there is no guarantee that such a minimizer exists or is unique.
Given data points x 1 , . . . , x N on M , we suggest to estimate the mean and covariance by when M is equipped with the Riemannian metric g. The term − N 2 log(det g σ) is chosen such that (15) is exactly the maximum likelihood estimator of the mean and covariance in the Euclidean case, see e.g. [22]. Figure 1 shows estimated mean and covariance for sampled data on an ellipsoid. The most probable paths for the driving process W t that realize the distances d Sym + M σ, q −1 (x i ) are shown along with sample paths from the process X t and samples drawn from X 1 .
From a different viewpoint, the results of this paper allow a maximum-likelihood formulation that resembles the estimator (15). Recall from Section 2.5 that when M is Riemannian and Q(u 0 ) satisfies the Hörmander condition, the process X t on M has a density p M t (Σ(u 0 ), x). We can then consider the estimation of Σ(u 0 ) given the data points x 1 , . . . , x N on M . The maximum likelihood approach suggests that we estimate our parameter σ ∈ Sym + M by No explicit formula for the density p M t is known in general, not even in the isotropic case, but heuristics or numerical approximations can be derived. As an example, we showed in Corollary 5.2 that, in non-degenerate situations, lim t→0 2t log p M t (y) = −d Q(u0) u 0 , π −1 Q(u0) (y) 2 , and, from Theorem 5.1, the log-likelihood has the form suggesting the estimator with t fixed. The terms log(w(x i , t)) are in general not a priori known. They can be numerically estimated or heuristically set to log det(u 0 ) g /2 which is the normalization term of a Brownian motion with covariance Σ(u 0 ) −1 in the linear tangent space T π(u0) M . The latter approach results in the same estimator as (15). Note that both methods involve the length of the most probable paths for the driving process. The error in the approximation of the likelihood will in general be smaller for data that concentrates around π(u 0 ). The approach has been applied for mean/template and covariance estimation on concrete data in [18]. The probability of a path as defined in Section 4 yields yet another way of measuring the distance between points. Maximizing this probability corresponds to maximizing the Onsager-Machlup functional (10). Alternatively, the limit of (9) when → 0 can be viewed informally as a 'density' for the paths of the Brownian motion. The maximum-likelihood method would then suggest to maximize the Onsager-Machlup functional. Note that this density is only heuristic since it is concentrated on smooth paths while the Brownian motion is almost surely nowhere smooth. Moreover, it is not clear with respect to which measure it should be a density. In this setting, there are only results available when the Brownian motion is isotropic. In the isotropic case, the most probable paths are not quite geodesics. The Onsager-Machlup functional also tries to minimize the energy of paths but tends to favour paths through areas with high scalar curvature. It would be interesting to see how this affects the Fréchet mean in practical computations. To obtain results for general Brownian motions, we could work with the most probable paths of the driving process instead. This approach supports the estimator (15).
There are some challenges caused by the complex behaviour of the global structure of a sub-Riemannian geometry. Most geodesics with respect to d F M can be computed by the Hamiltonian equations, which makes practical computations possible. However, there may be geodesics that can not be computed this way. Moreover, the sub-Riemannian exponential map cannot be used to define normal coordiantes on Q(u 0 ), which is often useful in ordinary Riemannian geometry to simplify computations. Finally, there is very little known about how the sub-Riemannian geometry changes when we vary the parameter σ.