PROBABILITY MEASURES ON INFINITE-DIMENSIONAL STIEFEL MANIFOLDS

Abstract. An interest in infinite-dimensional manifolds has recently appeared in Shape Theory. An example is the Stiefel manifold, which has been proposed as a model for the space of immersed curves in the plane. It may be useful to define probabilities on such manifolds. Suppose that H is an infinite-dimensional separable Hilbert space. Let S ⊂ H be the sphere, p ∈ S. Let µ be the push-forward of a Gaussian measure γ from T_pS onto S using the exponential map. Let v ∈ T_pS be a Cameron–Martin vector for γ; let R be a rotation of S in the direction v, and ν = R_#µ be the rotated measure. Then µ, ν are mutually singular. This is counterintuitive, since the translation of a Gaussian measure in a Cameron–Martin direction produces equivalent measures. Let γ be a Gaussian measure on H; then there exists a smooth closed manifold M ⊂ H such that the projection of H to the nearest point on M is not well defined for points in a set of positive γ measure. Instead it is possible to project a Gaussian measure to a Stiefel manifold to define a probability. These results hold for non-degenerate Gaussian measures; they hold also when Gaussian measures are not concentrated on finite-dimensional subspaces.

1. Introduction. Probability theory has been widely studied for almost four centuries, and a large corpus of work has been written on its theoretical aspects. A commonly studied subject is the theory of probability distributions defined on a countable set or on (an open subset of) a finite-dimensional vector space. This setting, though, is insufficient for some important applications.
In 1935 Kolmogoroff [15] provided the first general definition of a Gaussian measure on an infinite-dimensional space. This setting was derived from, and formalized part of, the theory of stochastic processes. Subsequently the theory was expanded and refined in many works.
Another interesting branch of probability theory is the case of probabilities on finite-dimensional manifolds. This has many important applications in Shape Theory. One example is the Kendall space [13,14,17].
The Stiefel manifold St(n, H) is the set of n-tuples of orthonormal vectors in the Hilbert space H, St(n, H) = { (v_1, ..., v_n) ∈ H^n : ⟨v_i, v_j⟩_H = δ_{ij} }. The Stiefel manifold is a smooth embedded submanifold of H^n, hence it inherits its Riemannian structure.
We will use the above definition also in the case when H is finite-dimensional; in that case we will always tacitly assume that dim(H) ≥ n (otherwise St(n, H) is empty).
In [21] Younes studied the space M of smooth immersed closed planar curves. Those ideas were then revisited in Younes et al [22], where the authors proved that the quotient of M with respect to translations and scalings, when endowed with a particular Sobolev-type Riemannian metric, is isometric to a subset of the Stiefel manifold St(2, L²), where L² = L²([0, 1]) is the usual Hilbert space of real square-integrable functions. Similarly the quotient of M with respect to rotations, translations and scalings is isometric to a subset of the Grassmann manifold of two-dimensional planes in L².
Sundaramoorthi et al [20] studied these Shape Spaces as well; they noted that there is a closed-form formula for the geodesic starting from a given point with a given velocity in St(n, L²) (adapting a method described in [9]); they proposed a novel method for tracking shapes bounded by curves, based on a simple first-order dynamical system on St(2, L²).
Moreover Harms and Mennucci [12] proved that any two points in the Stiefel manifold (respectively in the Grassmann manifold) are connected by a minimal length geodesic.
Since the manifold St(2, L²) enjoys all the above useful properties, and it can be identified with a Shape Space of curves, it is a natural choice for Computer Vision tasks. Many such tasks require that a probability be defined on the Shape Space; unfortunately, little is known in this respect.
In this paper we will present some negative and some positive results.

Probabilities by wrapping.
The first method is to choose a point p ∈ M, define a Gaussian measure µ on the tangent space T_pM, and push it forward to M using the exponential map ("wrapping"). In the flat case M = H, a Hilbert space, we have exp_p(v) = p + v; so, for a centered Gaussian measure µ on H and p_1 = 0, the wrapping ν_1 = exp_{p_1#}µ of µ is µ itself. At the same time the wrapping ν_2 = exp_{p_2#}µ of µ is the translation of µ, translated by the vector p_2. It is well known that ν_1 and ν_2 are equivalent if and only if p_2 lies in the Cameron–Martin space of µ, otherwise they are mutually singular. See the next Section 2 for detailed definitions and further results.
The matter becomes even more intricate in the case of an infinite-dimensional manifold. Let S be the unit sphere in an infinite-dimensional separable Hilbert space. We will prove the following result in Theorem 3.7: if we wrap a non-degenerate Gaussian measure around the sphere S, and then rotate it to obtain a second measure on S, then the two measures on the sphere are mutually singular. We can prove this fact for a class of rotations (i.e., unitary operators) that are intuitively analogous to the Cameron–Martin translations described in Prop. 2.8.
It is currently unknown to us if there exists any non-trivial rotation such that the two measures are equivalent. See also Remark 3.10.

Probabilities by projection.
The second method can be used when M is a smooth embedded closed submanifold of a larger Hilbert space H. In this case we may define a probability on H, and then try to "project" it to M . This will be discussed in detail in Sec. 4.
For any such M consider the set U_M ⊂ H of points p ∈ H such that there is a unique point z ∈ M of minimum distance from p; we then define the "projection" as the map π_M : U_M → M sending each such p to its nearest point z. Again, in the finite-dimensional case this works fine. We will see in Prop. 4.2 that the set H \ U_M has zero Lebesgue measure. So any probability on H that is defined by a density wrt the Lebesgue measure can be projected to M.
Instead, in the infinite-dimensional case this fails. We will show in Theorem 4.6 that for any Gaussian measure defined on ℓ² there exists a submanifold M ⊆ ℓ² such that the "projection" fails to be defined almost everywhere, that is, ℓ² \ U_M has positive measure.
We will though show in Section 4.3.4 that the projection method works fine in the case of the Stiefel manifold St(n, H). Indeed, for any non-degenerate Gaussian measure η on H^n, the projection from H^n to the nearest point in St(n, H) is defined for η-almost all points. So we can project η onto St(n, H) to define a "Gaussian-like" probability on it. This is another point in favor of using the Stiefel manifold as a model in Shape Theory.

Notations and main definitions.
In the following any Hilbert space H will be assumed to be a real separable Hilbert space, with norm ‖·‖_H and scalar product ⟨·, ·⟩_H.
Given v ∈ H, we will denote by v* the continuous linear functional v*(x) = ⟨v, x⟩_H.
By "manifold" we will mean a "smooth connected second countable boundaryless Hausdorff differentiable manifold modeled on a Hilbert space".
If M is a Hilbert space, or a manifold modeled on a Hilbert space, we will associate to it the Borel sigma-algebra B(M ).
By "measure" µ on M we will mean a countably additive map µ : B(M) → [0, +∞]. If N is another such set and ψ : M → N is a Borel-measurable transformation, then the push-forward is the measure ψ_#µ on N that is defined by (ψ_#µ)(A) = µ(ψ⁻¹(A)) for all A ∈ B(N).
By "probability measure" µ on M (or more simply "probability") we will mean a measure µ such that µ(M ) = 1.
When µ is a probability the push-forward ψ_#µ is a probability on N, and is usually called the "distribution" or the "law" of ψ on N.
• The measure ν is called "absolutely continuous" with respect to µ if ν(A) = 0 for every set A ∈ B(M) with µ(A) = 0. We will write ν ≪ µ in this case.
• The measures µ, ν are "equivalent" if they are mutually absolutely continuous.
2. Gaussian measures. The following is a short presentation of the theory of Gaussian Measures; more details may be found e.g. in [4] and [7].

Gaussian measures.
We recall a few facts about Gaussian measures in Hilbert spaces.
Definition 2.1. A probability measure γ on R is said to be Gaussian if it is either a Dirac measure, or has density
x ↦ (2πσ²)^{−1/2} exp( −(x − m)² / (2σ²) )
with respect to the Lebesgue measure, for some parameters m ∈ R, σ > 0. In the first case the measure is called degenerate.
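As a minimal numerical sketch (not from the paper), the non-degenerate density in Definition 2.1 can be checked to integrate to 1; the parameters m and sigma below are arbitrary illustrative choices.

```python
import math

# Density of the non-degenerate Gaussian N(m, sigma^2) on R.
def gaussian_density(x, m=0.0, sigma=1.0):
    return math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Riemann-sum check that the density integrates to 1 over a wide interval.
step = 0.001
total = sum(gaussian_density(-10.0 + i * step, m=1.0, sigma=2.0) * step
            for i in range(20000))
print(abs(total - 1.0) < 1e-3)  # True
```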
Vice versa, given any m and K as above, there exists a unique Gaussian measure γ satisfying (2).
We recognize that m is the mean and K is the covariance operator of γ, in the following sense: given v, w ∈ H, we have that v*, w* ∈ L²(H, γ) and that the mean and covariance are
∫_H v*(x) dγ(x) = ⟨v, m⟩_H ,  ∫_H (v*(x) − v*(m))(w*(x) − w*(m)) dγ(x) = ⟨Kv, w⟩_H .

ELEONORA BARDELLI AND ANDREA CARLO GIUSEPPE MENNUCCI
For this reason we will indicate γ with the usual notation N (m, K). When m is zero, we say that γ is centered. When the kernel of K is {0}, we say that γ is non-degenerate.
The following proposition is an intermediate step in the proof of the above theorem.
By choosing an appropriate Hilbertian base, a Gaussian measure can be seen as a process of independent real Gaussian random variables. In particular, if m_n = ⟨m, e_n⟩_H ∈ R and σ_n ≥ 0 is the eigenvalue such that Ke_n = σ_n e_n, then e*_{n#}γ ∼ N(m_n, σ_n), that is, m_n is the mean and σ_n the variance of the real Gaussian random variable e*_n. Moreover Σ_{n=0}^∞ σ_n = Tr(K) < ∞. Obviously γ is non-degenerate iff σ_n > 0 for all n.
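The coordinate description above can be simulated directly; here is a sketch (with the illustrative assumption of a centered measure on ℓ² whose covariance eigenvalues are sigma_n = 1/(n+1)², truncated to finitely many coordinates).

```python
import random

# Simulate a sample of a centered Gaussian measure on l^2 as a process of
# independent coordinates e_n* ~ N(0, sigma_n), with summable eigenvalues.
random.seed(0)
N = 2000
sigmas = [1.0 / (n + 1) ** 2 for n in range(N)]

trace = sum(sigmas)                                # partial trace Tr(K)
x = [random.gauss(0.0, s ** 0.5) for s in sigmas]  # one sample path
norm_sq = sum(c * c for c in x)                    # |x|^2; E[|x|^2] = Tr(K)

print(round(trace, 3))  # close to pi^2/6 ~ 1.645, so Tr(K) is finite
```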

Cameron-Martin theory.
We now introduce the Cameron–Martin theory, using a simplified approach, as in Chap. 2 in [7].

Definition 2.6 (Cameron–Martin space). Let γ be a centered Gaussian measure on a Hilbert space H, and let K be its covariance. The Cameron–Martin space CM(γ) of γ is the range (i.e., the image) of K^{1/2}; in symbols, CM(γ) = K^{1/2}(H).

The above may be expressed as follows. Assume for simplicity that the measure is non-degenerate. Let (v_n)_n be the orthonormal basis of eigenvectors of K, so that the coordinate functions v*_n are independent (as by Prop. 2.5). Let a_n > 0 be the variance of v*_n, i.e., the eigenvalue associated to the eigenvector v_n. In this case we have that
‖K^{−1/2}x‖²_H = Σ_n ⟨x, v_n⟩²_H / a_n ;
moreover the left-hand side is finite if and only if x ∈ CM(γ).
Note that CM(γ) = H if and only if H is finite-dimensional. In the infinite-dimensional case, CM(γ) is dense in H, but its γ-measure is zero.
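The fact that γ(CM(γ)) = 0 can be seen numerically: a sample of γ almost surely has infinite Cameron–Martin norm. A sketch (assuming, for illustration, eigenvalues a_n = 1/(n+1)² as the coordinate variances):

```python
import random

# Each term <x, v_n>^2 / a_n is chi-squared(1) with mean 1, so the partial sums
# of the Cameron-Martin norm grow like N: a sample is a.s. outside CM(gamma).
random.seed(1)
variances = [1.0 / (n + 1) ** 2 for n in range(5000)]
x = [random.gauss(0.0, a ** 0.5) for a in variances]

cm_partial = sum(c * c / a for c, a in zip(x, variances))
print(cm_partial / len(variances))  # close to 1: the series diverges linearly in N
```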

Definition 2.7 (White noise mapping). Consider the mapping h ↦ (K^{−1/2}h)*, defined for h ∈ CM(γ). This mapping is an isometry from CM(γ) (with the norm of H) into L²(H, γ), so it extends to a unique mapping W : H → L²(H, γ), that is called the white noise mapping.

Proposition 2.8 (Cameron–Martin theorem). Let h ∈ H, and let µ = T_{h#}γ be the translate of γ along h (where T_h(x) = x + h).
• If h ∈ CM(γ) then µ and γ are equivalent, and the Radon–Nikodym derivative is given by the Cameron–Martin formula
dµ/dγ (x) = exp( W(K^{−1/2}h)(x) − ½ ‖K^{−1/2}h‖²_H ) .
• If h ∉ CM(γ) then µ and γ are mutually singular.
The above is a combination of results in Chap. 1 and 2 in [7], and in Chap. 2 Sect. 4 in [4].
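The dichotomy can be illustrated numerically by a Kakutani-type computation (a sketch under illustrative assumptions: diagonal covariance with variances sigma_n = 1/(n+1)², and two hypothetical shift vectors h_in, h_out): the Hellinger affinity between the centered measure and its translate by h is the product over coordinates of exp(−h_n²/(8σ_n)), which stays bounded away from 0 iff Σ h_n²/σ_n < ∞, i.e. iff h ∈ CM(γ).

```python
import math

sigmas = [1.0 / (n + 1) ** 2 for n in range(100000)]

def affinity(h):
    # Hellinger affinity of the product measures N(0, sigma_n) and N(h_n, sigma_n).
    return math.exp(-sum(hn * hn / (8 * sn) for hn, sn in zip(h, sigmas)))

h_in = list(sigmas)                                            # sum h_n^2/sigma_n = sum sigma_n < inf
h_out = [(sn / (n + 2)) ** 0.5 for n, sn in enumerate(sigmas)]  # sum h_n^2/sigma_n = sum 1/(n+2) = inf

print(round(affinity(h_in), 2))          # stabilizes near 0.81: equivalent measures
print(affinity(h_out) < affinity(h_in))  # True; affinity(h_out) keeps decreasing to 0
```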
3. Image of a probability measure under the exponential map. A possible way to define a probability measure on a Riemannian manifold M is to choose a point p ∈ M , define a probability measure γ on the tangent space T p M in p and then push forward γ under the exponential map to define the desired probability on M .
We recall briefly the definition of the exponential map; more details may be found in [16]. The exponential map exp_p : T_pM → M is defined by exp_p(v) = σ_v(1), where σ_v is the geodesic starting from σ_v(0) = p with tangent vector σ̇_v(0) = v.
If M is a finite-dimensional complete Riemannian manifold, then the exponential map from any point is surjective; this result is part of the Hopf-Rinow theorem (see Theorem 2.8, Chapter 7 of [8]).
If M is an infinite-dimensional complete Riemannian manifold, then the exponential map may fail to be surjective [2]. This is a first problem in applying the above idea.
Moreover the resulting measure exp_{p#}γ on M depends also on the point p and, since there is no natural way to compare the tangent spaces, it could be difficult to compare measures obtained starting from different points.
If moreover γ is equivalent to the Lebesgue measure, then µ is equivalent to the Hausdorff measure H^n.

Proof. Suppose that f : T_pM → M is a C¹ map; let C_f be the set of critical points of f, that is, the set of x ∈ T_pM such that the differential Df is not invertible at x. We will use the "change of variable" Lemma 5.5.3 in [1]. Theorem 4.1 of [19] proves that each of the above sets Γ_i is locally contained in a Lipschitz hypersurface.

Suppose now moreover that γ is equivalent to the Lebesgue measure; we want to prove that µ is equivalent to the Hausdorff measure H^n. By the previous point, it is enough to prove that H^n is absolutely continuous wrt µ, i.e., H^n ≪ µ. We will use some facts that are explained in [18]. Let K_p be the cut locus of the point p, and let Ω_p = M \ K_p, which is an open set. It was proven in [18] that H^n(K_p) = 0, so we will ignore K_p in the following. Let E ⊂ Ω_p be a Borel set such that µ(E) = 0.
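The finite-dimensional situation can be made concrete in the simplest case; here is a sketch (not from the paper) for M = S¹: wrapping γ = N(0, 1) from T_pS¹ ≅ R around the circle gives the density f(θ) = Σ_k φ(θ + 2πk) with respect to arc length, which is strictly positive, so the wrapped measure is equivalent to H¹ on S¹.

```python
import math

def phi(t):
    # standard Gaussian density on R
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def wrapped_density(theta, terms=50):
    # density of the wrapped measure wrt arc length on S^1
    return sum(phi(theta + 2.0 * math.pi * k) for k in range(-terms, terms + 1))

step = 0.001
n_steps = int(2.0 * math.pi / step)
total = sum(wrapped_density(-math.pi + i * step) * step for i in range(n_steps))
low = min(wrapped_density(-math.pi + i * 0.01) for i in range(629))

print(abs(total - 1.0) < 1e-3)  # integrates to 1 over one period -> True
print(low > 0.0)                # the density never vanishes -> True
```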

Infinite-dimensional manifolds.
If the manifold M is infinite-dimensional, one can wonder if there could be a similar result. In the finite-dimensional case, we compare measures on different tangent spaces by relating them with the Lebesgue measure, that can be defined in a standard way on all tangent spaces. The first question to be answered when trying to discuss Prop. 3.1 in the infinite-dimensional setting, is how to compare measures on different tangent spaces.
One tool to address the problem is to connect points using a geodesic, and push forward the measure on the tangent space using the parallel transport. This was the method proposed in [20] when devising a discrete stochastic process on the Stiefel manifold St(2, L²), to be used as a model for tracking shapes enclosed by curves. In that case, the geodesic was provided by the model itself. In general, though, this method has two drawbacks. One is that there may be no geodesic connecting two points (even if the manifold is metrically complete [2]). The opposite drawback is that there may be multiple geodesics connecting a pair of points, so that there may be no canonical choice.
Another possible tool to address this problem is a group of transformations that acts transitively on M , if one is available. Again, a drawback is that there may be multiple transformations moving a point to another. (Unless the manifold is also a Lie group, of course).
To drastically simplify the matter, we will study the case M = S, where S is the unit sphere in an infinite-dimensional Hilbert space. We associate to S the group of unitary transformations, which we call "rotations" for simplicity. In this case the parallel transport coincides with the tangent map of a suitable rotation.
We will in the following show in Theorem 3.7 that, if we wrap a Gaussian measure around the sphere S, and then we rotate it, then the two measures on the sphere are mutually singular. We can prove this fact for a class of rotations, that are described in the statement of Theorem 3.7.
We will prove the following results assuming that Gaussian measures are nondegenerate. These results hold also when Gaussian measures are not concentrated on finite-dimensional spaces. Indeed if a Gaussian measure is supported on an infinite-dimensional closed subspace, then we may restrict the following analysis to that subspace, and the restriction of the measure would be a non-degenerate Gaussian measure on an infinite-dimensional Hilbert space.
We first state a few results and observations that will be useful in the following proofs.

Lemma 3.3 (Law of large numbers).
Let H be a Hilbert space, and let γ be a non-degenerate Gaussian measure on it. Let v_n be the eigenvectors of the covariance operator K, and σ_n² the corresponding eigenvalues. Let f_n = v*_n/σ_n, that is, f_n(x) = ⟨v_n, x⟩_H/σ_n, so that the random variable f_n has standard Gaussian distribution N(0, 1). Since the joint distribution of (f_1, . . ., f_n) is centered Gaussian, orthogonality implies independence. So the squares f_n² are a sequence of independent, identically distributed random variables, each with chi-squared distribution (with 1 degree of freedom), having mean 1 and variance 2. By the law of large numbers (Theorem 3.27 in [5]), γ is concentrated on the Borel set
C = { x ∈ H : lim_{N→∞} (1/N) Σ_{n=1}^N f_n(x)² = 1 }
(a point x such that the above limit does not exist is not in C). This set C has some peculiar properties.
• For every vector x ∈ H there exist either two or no values λ ∈ R such that λx ∈ C; if there are two values, they have opposite sign. So this set is quite "thin" in the radial directions.
• At the same time, for any r in the Cameron–Martin space CM(γ) of γ and for any v ∈ C, we have v + r ∈ C; in symbols, C + CM(γ) = C. So the set C is quite "large" in many linear directions.
Proof. We prove the second point. Suppose for simplicity that H = ℓ², and that K is diagonal, so that when x = (x_n)_{n∈N} we identify f_n(x) = x_n/σ_n. Let x̃ = x + r; then for all n
f_n(x̃)² = f_n(x)² + 2 x_n r_n / σ_n² + r_n² / σ_n² .
We have to deal with the three terms on the right-hand side. Since r is in the Cameron–Martin space, by definition Σ_{k=0}^∞ r_k²/σ_k² < ∞, so the third term, averaged over n ≤ N, vanishes as N → ∞. We know that the variables x_i are independent wrt γ. Note that, by the Cauchy–Schwarz inequality, |Σ_{n≤N} x_n r_n/σ_n²| ≤ (Σ_{n≤N} x_n²/σ_n²)^{1/2} (Σ_{n≤N} r_n²/σ_n²)^{1/2}, so the second term, averaged over n ≤ N, also vanishes γ-almost surely. Summing up we obtain the desired result.

Lemma 3.4. Let γ be a non-degenerate Gaussian measure on H and let r > 0; then the sphere S_r = { x ∈ H : ‖x‖_H = r } is negligible for γ.

Proof. Choose a base (e_n)_n of eigenvectors of the covariance; write H = Span(e_1) ⊕ H′, where H′ = Span(e_2, e_3, . . .), and let π′ be the orthogonal projection onto H′. By the independence of the coordinate functions, γ can be decomposed as the product measure e*_{1#}γ ⊗ π′_#γ. We compute the measure of the sphere using Fubini's theorem for this product measure. For every x′ ∈ H′, there are at most two values x_1 such that (x_1, x′) ∈ S_r. Since e*_{1#}γ is a non-degenerate Gaussian measure on R, finite sets are negligible with respect to it. It follows that S_r is negligible for γ, since every slice at fixed x′ ∈ H′ is negligible with respect to e*_{1#}γ.

It is worth noting the following fact: the exponential map of the sphere maps Borel subsets of the tangent space to Borel subsets of the sphere. The proof is based on the very simple structure of the exponential map of the sphere (see Equation (4)); we omit it.
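The concentration property of Lemma 3.3 is easy to observe by Monte Carlo; a sketch (assuming, for illustration, σ_n = 1/(n+1)):

```python
import random

# For x sampled from gamma (independent coordinates x_n ~ N(0, sigma_n^2)),
# the averages (1/N) sum_{n<N} (x_n / sigma_n)^2 approach 1: gamma concentrates
# on the set C.
random.seed(2)
sigmas = [1.0 / (n + 1) for n in range(20000)]

averages = []
for _ in range(3):
    x = [random.gauss(0.0, s) for s in sigmas]
    averages.append(sum((c / s) ** 2 for c, s in zip(x, sigmas)) / len(sigmas))

print(all(abs(a - 1.0) < 0.05 for a in averages))  # True in every trial
```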
We now provide a simpler case of the following Theorem 3.7; this case can help in understanding the spirit of the proof of the theorem. Reasoning as in Lemma 3.3, we obtain that γ is concentrated on a Borel set C ⊂ T. We also know that, for every direction x ∈ T, there exist either two or no values λ ∈ R such that λx ∈ C and, if there are two, they have opposite sign. Call µ_1, µ_2 the push-forwards of γ under exp_p and exp_{−p}; call C_1, C_2 ⊆ S the images of C under exp_p and exp_{−p}. By the previous Lemma the sets C_1, C_2 are Borel sets. Clearly, µ_1 is concentrated on C_1 and µ_2 is concentrated on C_2.
To prove that µ 1 and µ 2 are mutually singular it is sufficient to show that C 1 ∩C 2 is negligible for one of them.
The exponential maps from the points p and −p, defined on T → S, can be written as
exp_{±p}(v) = ±p cos(‖v‖_H) + (v/‖v‖_H) sin(‖v‖_H)
and are symmetric with respect to the reflection through T. From this symmetry, and from the fact that for each line through the origin in T, if there is one point of C on that line then there are exactly two, opposite in sign, it follows that C_1 ∩ C_2 is contained in T ∩ S (see also Figure 1). The equator T ∩ S is negligible for µ_1 (and also for µ_2): letting S_r be the sphere of radius r in T, we have exp_p^{−1}(T ∩ S) = ∪_{k∈N} S_{π/2+kπ}, and all those spheres are negligible by Lemma 3.4.
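The closed form of the sphere's exponential map can be checked numerically; a sketch in the finite-dimensional slice R³ (assuming the standard formula exp_p(v) = p cos‖v‖ + (v/‖v‖) sin‖v‖ for v ≠ 0):

```python
import math

def exp_point(p, v):
    # exponential map of the unit sphere at p, applied to a tangent vector v != 0
    t = math.sqrt(sum(c * c for c in v))
    return [pc * math.cos(t) + vc / t * math.sin(t) for pc, vc in zip(p, v)]

p = [0.0, 0.0, 1.0]
v = [0.3, -1.1, 0.0]   # tangent vector at p (and at -p)

y1 = exp_point(p, v)
y2 = exp_point([0.0, 0.0, -1.0], v)

on_sphere = abs(sum(c * c for c in y1) - 1.0) < 1e-12
# exp_p and exp_{-p} are mirror images through the equatorial plane {z = 0}
mirrored = (y1[0] == y2[0]) and (y1[1] == y2[1]) and abs(y1[2] + y2[2]) < 1e-12

print(on_sphere, mirrored)  # True True
```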
We now come to the general result. The rotation R rotates q in the direction r, so we may think of R as a "Cameron–Martin rotation". This could mislead us into thinking that ν and µ are equivalent; instead, they are mutually singular. We remark the following fact.
Remark 3.8. Let p = Rq, and suppose for simplicity that q ≠ −p. Let ξ be the unique minimal geodesic connecting q to p. Define the tangent map R̃ = D_qR : T_qS → T_pS; then R̃ coincides with the parallel transport along ξ. Let γ̃ = R̃_#γ, and let µ̃ = exp_{p#}γ̃ be the wrapping of γ̃ on S. Then µ̃ = ν. So the probability ν is also obtained by identifying T_qS → T_pS using parallel transport, and then wrapping.
We will need the following Lemma.
In this setting there is a family ν_z, for z ∈ V⊥, with the following properties: each ν_z is a non-degenerate Gaussian measure on V; the family ν_z is the conditional distribution of y knowing z, that is, for any continuous bounded f,
∫_H f dγ = ∫_{V⊥} ( ∫_V f(y + z) dν_z(y) ) d(π_{V⊥#}γ)(z) ,
where each x ∈ H is decomposed as x = y + z with y ∈ V and z ∈ V⊥. The above results are proved in Section 3.10 in [4]. It is interesting to note this fact: if V is not contained in the Cameron–Martin space of γ, then the conditional measures ν_z still exist, but they are concentrated on single points.
We now provide the proof of Theorem 3.7. By using an appropriate choice of Hilbertian base for the space H, we can rewrite the hypotheses of the theorem as follows. Let H = ℓ² for simplicity. We will denote by e_n the canonical coordinate vectors. Let S be the unit sphere in H. We assume that ‖r‖_H = 1 for simplicity. Let θ ∈ (0, π/2) be fixed.
Let p, q ∈ S be given by p = e 1 cos θ + r sin θ , q = e 1 cos θ − r sin θ .
These are the endpoints of the geodesic exp_{e_1}(tr) = e_1 cos t + r sin t for times t = ±θ. We also define
p̃ = −e_1 sin θ + r cos θ ,  q̃ = −e_1 sin θ − r cos θ ;
these are the speeds of the above geodesic at t = ±θ. Note that the plane spanned by p, q is also the plane spanned by e_1, r; we call this plane V. Moreover the spaces V ∩ T_pS, V ∩ T_{e_1}S and V ∩ T_qS are one-dimensional, and are spanned by the vectors p̃, r and q̃ respectively. We define the rotation R by stating that R is the identity on V⊥, whereas it rotates vectors in the plane V by the angle θ (so that Re_1 = p and Rq = e_1).
Note that the rotation defined in the statement of this theorem (and used in the following Remark 3.8) is the square of the rotation here defined in the proof: indeed R 2 q = p.
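The algebra of this rotation can be verified numerically; a sketch in the truncation R⁴, with the illustrative choices e_1 = (1,0,0,0) and r = (0,1,0,0) (any unit r orthogonal to e_1 works):

```python
import math

# R rotates the plane V = span{e1, r} by theta and fixes V-perp,
# so that R e1 = p and R^2 q = p.
theta = 0.7
c, s = math.cos(theta), math.sin(theta)

def R(x):
    return [c * x[0] - s * x[1], s * x[0] + c * x[1], x[2], x[3]]

p = [c, s, 0.0, 0.0]   # p = e1 cos(theta) + r sin(theta)
q = [c, -s, 0.0, 0.0]  # q = e1 cos(theta) - r sin(theta)

def close(u, w):
    return all(abs(a - b) < 1e-12 for a, b in zip(u, w))

print(close(R([1.0, 0.0, 0.0, 0.0]), p))                      # R e1 = p -> True
print(close(R(R(q)), p))                                      # R^2 q = p -> True
print(close(R([0.0, 0.0, 1.0, 0.0]), [0.0, 0.0, 1.0, 0.0]))   # R fixes V-perp -> True
```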
We have excluded the case θ = 0, when R is the identity map; we also exclude for simplicity the case θ = π/2, in which R² is the antipodal map R²x = −x; this case is equivalent to the case discussed in Proposition 3.6 (note anyway that the result may be proved using the following analysis, paying attention to some details).
Let R̃_p : T_{e_1}S → T_pS and R̃_q : T_qS → T_{e_1}S be the tangent maps of R. We assume that γ is a probability measure on T_{e_1}S, and that the covariance operator K is diagonal in the standard base {e_2, e_3, . . .} of T_{e_1}S. Let σ_k² be the eigenvalue of K in direction e_k.
We assume that r is in the Cameron–Martin space of γ. We push forward γ to γ_p using R̃_p, and pull it back to γ_q using the inverse of R̃_q. Note that R̃_p r = p̃ while R̃_q q̃ = r, so that p̃ is in the Cameron–Martin space of γ_p and q̃ is in the Cameron–Martin space of γ_q.
This setting mimics the hypotheses of the theorem, only in a more symmetric fashion. Indeed if µ = exp_{q#}γ_q is the wrapping of γ_q and ν = exp_{p#}γ_p is the wrapping of γ_p, then (R²)_#µ = ν.
In this setting we have a very powerful situation. Indeed V ⊥ ⊂ T p S, V ⊥ ⊂ T q S and V ⊥ ⊂ T e1 S; moreover the projections of the three measures γ q , γ, γ p on V ⊥ are identical.
We now consider two generic vectors v ∈ T_pS and w ∈ T_qS. We decompose them (in a unique way) as v = a p̃ + ṽ, w = b q̃ + w̃ with a, b ∈ R and ṽ, w̃ ∈ V⊥. (Obviously a = ⟨p̃, v⟩_H and b = ⟨q̃, w⟩_H.) The joint distribution of (a, ṽ) according to γ_p is the same as the joint distribution of (b, w̃) according to γ_q. In particular, by what we said above, ṽ, w̃ are identically distributed. Similarly a, b are real-valued marginals and have the same non-degenerate centered Gaussian distribution on R.
For simplicity we will abbreviate t = ‖v‖_H, s = ‖w‖_H. The above quantities are related by a ∈ [−t, t], b ∈ [−s, s] and
t² = a² + ‖ṽ‖²_H ,  s² = b² + ‖w̃‖²_H .
We can assume t > 0, s > 0, a ∉ {−t, 0, t}, b ∉ {−s, 0, s} (the complementary choices correspond to negligible sets in the following reasoning).

We know that ṽ = v − a p̃ and that p̃ is in the Cameron–Martin space of γ_p. By applying Lemma 3.3 we obtain a corresponding concentration identity for γ_p-almost any v; the same holds for w as well, mutatis mutandis. By Lemma 3.4 we can assume that sinc(s) ≠ 0 and sinc(t) ≠ 0. So for γ_p-almost all v and γ_q-almost all w we obtain sinc(t) = ± sinc(s), and so ṽ = ±w̃ by equation (7).
We will now make the best use of this fact. Foremost, we elaborate on equation (6). We know that the frame (p, p̃) is obtained from (q, q̃) by a rotation of angle 2θ. Let E = E_0 ∪ E_1 ⊂ (0, ∞): the points in E_0 are all the positive zeros of sinc, while the points in E_1 are all the positive zeros of its derivative sinc′. Let (I_n)_{n∈N} be an enumeration of the open intervals that constitute the complement of E in (0, ∞). We have that I_0 = (0, π); when n ≥ 1, I_n has one endpoint in the set E_0 while the other endpoint is in the set E_1. On these intervals sinc(s) is either always positive or always negative, and is monotonic. We fix n, k ∈ N and restrict our attention to the case t ∈ I_n and s ∈ I_k. Then there is a function ϕ = ϕ_{n,k} with these characteristics: ϕ is a homeomorphism between maximal sub-intervals of I_n, I_k; each one of these sub-intervals has a zero of sinc as one of its endpoints; ϕ and its inverse are analytic; when t ∈ I_n and s ∈ I_k, the relation sinc(t) = ± sinc(s) holds if and only if s = ϕ(t).
Recall that v is distributed according to γ_p. By Lemma 3.9, for almost any ṽ, the conditional distribution of a is Gaussian and non-degenerate. Let us fix such a ṽ.
From (9) we extract the identity
cos(2θ) cos(t) + a sin(2θ) sinc(t) − cos(ϕ(t)) = 0 ,  where t = √(a² + ‖ṽ‖²_H) .
The left-hand side is an analytic function of a. If we move a so that t converges to a zero of sinc(t), then s = ϕ(t) has to converge to a zero of sinc(s), so both converge to an integer multiple of π: hence the above left-hand side converges to ± cos(2θ) ± 1, which is never zero. So that function is not identically zero, hence it has at most countably many zeros; then the probability of this event is null. This ends the proof.
Remark 3.10. Suppose in the above proof that r is not in the Cameron-Martin space. We remarked, after Lemma 3.9, that in this case the conditional measures ν z exist, but they may be concentrated on single points. So the above proof cannot be easily adapted to the case when r is not in the Cameron-Martin space.
4. Push-forward of a probability measure under a projection. A simple way to define a probability measure on a manifold M is to choose a probability space (X, F_X, P), a measurable map f : X → M, and endow M with the push-forward measure f_#P.
Example 4.1. Let S^n ⊆ R^{n+1} be the n-dimensional unit sphere and γ a Gaussian measure on R^{n+1} with mean 0 and covariance operator the identity. Consider the projection π(x) = x/‖x‖_H, which is defined γ-almost everywhere. Then the measure π_#γ on S^n coincides with the Hausdorff measure H^n restricted to the sphere and normalized.
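This can be checked by Monte Carlo in R³ (a sketch, not from the paper): for the uniform measure on S² the z-coordinate is uniform on [−1, 1], so it has mean 0 and second moment 1/3.

```python
import random

# Push the standard Gaussian on R^3 forward under x -> x/|x| and test the
# z-coordinate statistics of the resulting measure on S^2.
random.seed(3)
n = 100000
zs = []
for _ in range(n):
    x = [random.gauss(0.0, 1.0) for _ in range(3)]
    norm = sum(c * c for c in x) ** 0.5
    zs.append(x[2] / norm)

mean = sum(zs) / n
second = sum(z * z for z in zs) / n
print(abs(mean) < 0.01, abs(second - 1.0 / 3.0) < 0.01)  # True True
```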

4.1.
Finite-dimensional manifolds. The above example can be properly generalized, provided that we define a "projection". One easy way to define the projection is by looking at a point of minimum distance. To this end, in this section we consider a closed subset M of a complete finite-dimensional Riemannian manifold N. Let d be the Riemannian distance on N and d_M(x) = inf_{y∈M} d(x, y) the distance from M. Since M is closed and N is locally compact, the infimum is a minimum, and then for all x ∈ N there exists a point y ∈ M such that d(x, y) = d_M(x); however there may be more than one such point. For those points x such that the closest point y in M is unique, we denote this point by π(x) = y, so that d(x, π(x)) = d_M(x). So, given a measure γ which is locally absolutely continuous with respect to the Lebesgue measure, the measure π_#γ is well defined on M.
Proof. Here is a sketch of the proof; the detailed arguments may be found in [18] and references therein. The distance function d_M is Lipschitz. At all points where d_M is differentiable, the projection point is unique. Let Σ be the set where d_M is not differentiable; by Rademacher's Theorem, Σ is negligible.
In the case when M is a smooth submanifold, moreover, Σ and its closure both have Hausdorff dimension at most m − 1; see [18]. So the projection is well defined (and smooth) on an open set with negligible complement.

4.2.
Infinite-dimensional manifolds. In the following we will only consider the case when M is embedded in an infinite-dimensional Hilbert space H, for simplicity.
As in the finite-dimensional case, the minimum point is almost surely unique when it exists.

Proposition 4.3. Let M ⊂ H be a closed subset. Let d_M be defined as in equation (10) (by setting d(x, y) = ‖x − y‖_H, as is usual). Let γ be a Gaussian measure on H. Then for γ-almost any x there is at most one point y ∈ M at minimum distance from x.

Proof. By Theorem 5.11.1 in [4], the set Σ where d_M is not Gâteaux differentiable has measure γ(Σ) = 0. The rest of the proof works as in the finite-dimensional case.
If we now consider an infinite-dimensional manifold M embedded in a Hilbert space H, the projection onto the manifold does not necessarily exist. An infinite-dimensional Hilbert space is not locally compact, so there could be many points x ∈ H for which there is no point on the manifold at minimal distance.
We first discuss a counterexample; in the next sections we will show some cases in which the projection can be defined.
Let H be a separable Hilbert space. Up to the choice of an orthonormal basis of H, we suppose (without loss of generality) that H = ℓ².
Given a submanifold M of H, we will denote by d_M : H → R the distance from the manifold, defined as in the finite-dimensional case by d_M(x) = inf_{y∈M} ‖x − y‖_H.

Lemma 4.4. Consider in H = ℓ² the ellipsoid S defined by
S = { x ∈ ℓ² : Σ_{i∈N} a_i² x_i² = c² } ,
where c > 0 and (a_i)_{i∈N} is a sequence of positive numbers increasing to 1 (so that 0 < a_i < 1 for all i). Then:
1. the set S is a closed submanifold of H;
2. the distance of the origin from S is d_S(0) = c;
3. there is no point on the ellipsoid at distance c from the origin.
Proof. Define the continuous linear function T : H → H by Tx = (a_i x_i)_{i∈N}, and let f(x) = ‖Tx‖²_H = Σ_{i∈N} a_i² x_i². The function f is continuous and differentiable, with gradient ∇f(x) = 2(a_i² x_i)_{i∈N}. Note that the set S is the inverse image of c² under the function f and so, since f is continuous, S is closed. To see that S is a submanifold of H, we can use the implicit function theorem; see [16] for a proof of the theorem in infinite dimensions. Indeed, the gradient of f is null only at the origin, and the origin does not belong to the ellipsoid S, since c ≠ 0.
For every nonzero point x ∈ H, using that a_i < 1, we get f(x) < ‖x‖²_H, and so for all x ∈ S, ‖x‖_H > c.
This says that there are no points on S at distance c from the origin, and gives the bound d_S(0) ≥ c. To get the other inequality, consider the points c a_n^{−1} e_n for n ∈ N: they belong to S, and their norms c/a_n decrease to c; hence d_S(0) = c.

Lemma 4.4 shows that, in a separable Hilbert space H, there exists a submanifold for which the distance from the origin does not have a minimum on the manifold. However this is not yet the desired counterexample, because a single point will usually be negligible for a measure, and so the projection could still exist almost everywhere.
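A quick numerical sketch of Lemma 4.4 (with the illustrative parameters a_i = 1 − 1/(i+2), increasing to 1, and c = 1): the points (c/a_n) e_n lie on S and their norms decrease to c, yet every point of S has norm strictly greater than c, so the infimum is not attained.

```python
c = 1.0
a = [1.0 - 1.0 / (i + 2) for i in range(10000)]

norms = [c / a_n for a_n in a]   # norms of the points (c/a_n) e_n, all on S
# each such point satisfies the ellipsoid equation a_n^2 * (c/a_n)^2 = c^2
on_S = all(abs((a[n] * norms[n]) ** 2 - c * c) < 1e-12 for n in range(0, 10000, 97))

print(norms[0], on_S)        # 2.0 True ; the norms decrease toward c = 1
print(norms[-1] - c < 2e-4)  # True: already within about 1e-4 of c
```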
We now show that there are "many" other points for which there is no point on the manifold at minimal distance.

Lemma 4.5. Let E_S = { x ∈ H : Σ_{i∈N} a_i² x_i² / (1 − a_i²)² < c² }. Then for every x ∈ E_S there is no point on S at minimal distance.
The idea of the proof is the following. Consider a point on one of the ellipsoid's axes, i.e., of the form λe_n. Then there is only one reasonable point that could be at minimal distance from it, the point c a_n^{−1} e_n (or −c a_n^{−1} e_n, if λ is negative). If λ is small, that point would be too far, and it would be convenient to "go to infinity". A similar argument works for points that are linear combinations of e_1, . . ., e_n for some n ∈ N, by reasoning that the point at minimum distance, if it exists, should be a linear combination of e_1, . . ., e_n as well. For the other points, we show that there are no "reasonable" minima, meaning that the function to minimize has no stationary points on the ellipsoid.
Proof. First of all, observe that E_S is inside S, i.e., f(x) < c² for every x ∈ E_S. By symmetry, it is sufficient to prove the lemma when x is such that x_i ≥ 0 for all i ∈ N. Fix one such x. It is enough to consider only points y ∈ S such that y_i ≥ 0 for all i ∈ N.
Let f : H → R be the function f(y) = Σ_{i∈N} a_i² y_i². As noted in Lemma 4.4, S = f^{−1}({c²}), f is differentiable and ∇f(y) = 2(a_i² y_i)_{i∈N}. Let also g : H → R be the square of the function we want to minimize on S, i.e., g(y) = ‖y − x‖²_H. The function g is differentiable as well, ∇g(y) = 2(y_i − x_i)_{i∈N}, and the distance from x attains a minimum on S if and only if g has a minimum on S.
From differential calculus we know that, if z is a minimum for g on S, then ∇f(z) and ∇g(z) must be linearly dependent, namely there exists λ ∈ R such that
z_i − x_i = λ a_i² z_i  for all i ∈ N .  (11)
This equation gives us some information about λ: since the x_i and z_i are non-negative, whenever x_i > 0 we must have
λ < a_i^{−2} .  (12)
Suppose that the point x has infinitely many coordinates different from 0. Then, passing to the limit in Equation (12), λ ≤ 1. Compute f(z) using Equation (11) to substitute the coordinates z_i = x_i/(1 − λa_i²):
f(z) = Σ_{i∈N} a_i² x_i² / (1 − λa_i²)² ≤ Σ_{i∈N} a_i² x_i² / (1 − a_i²)² < c² ,
since x ∈ E_S. On the other side, z is on the ellipsoid, and so it should hold that f(z) = c²; but this is not possible, and we can conclude that z does not exist.
It remains to consider the case where the coordinates of x are eventually null. Let n ∈ N be such that x_m = 0 for all m > n and decompose every point y ∈ H as y = ȳ + ŷ, where ȳ ∈ Span(e_1, . . . , e_n) and ŷ ∈ Span(e_1, . . . , e_n)^⊥. A point y belongs to S if and only if f(ȳ) ≤ c^2 and Σ_{i>n} a_i^2 ŷ_i^2 = c^2 − f(ȳ), i.e. ŷ ∈ S(ȳ), where S(ȳ) is an ellipsoid defined by the parameters {a_{n+1}, a_{n+2}, . . . } and c^2 − f(ȳ). To simplify notation, call c_ȳ the non-negative number such that c_ȳ^2 = c^2 − f(ȳ).
Compute the infimum of g on S by minimizing first in ŷ and then in ȳ:

(13) inf_{y ∈ S} g(y) = inf_{ȳ : f(ȳ) ≤ c^2} ( ‖ȳ − x‖^2 + inf_{ŷ ∈ S(ȳ)} ‖ŷ‖^2 ).

The innermost infimum is the minimization of the squared distance from the origin on an ellipsoid if c_ȳ > 0, and is 0 if c_ȳ = 0. By Lemma 4.4 this infimum is equal to c_ȳ^2/A^2 = (c^2 − f(ȳ))/A^2, and it is not attained when c_ȳ > 0. Equation (13) thus reduces to minimizing the function h(ȳ) = ‖ȳ − x‖^2 + (c^2 − f(ȳ))/A^2, which is strictly convex because its second derivatives 2(1 − a_i^2/A^2) are positive; it has a global minimum at the point z̄ of coordinates z̄_i = x_i A^2/(A^2 − a_i^2), for i = 1, . . . , n. Since x ∈ E_S, the equation defining E_S gives that z̄ is such that f(z̄) < c^2, so z̄ is admissible and realizes the infimum in Equation (13). In particular c_z̄ > 0, the innermost infimum at ȳ = z̄ is not attained, and therefore the distance from x does not attain a minimum on S.

It remains to observe that E_S is not negligible. The set E_S is defined by the inequality f̃(x) = Σ_i ( a_i^2 A^4 / (A^2 − a_i^2)^2 ) x_i^2 < c^2. The function f̃ is positive and so its integral is ∫_H f̃ dγ = Σ_i a_i^2 A^4 σ_i^2 / (A^2 − a_i^2)^2, where σ_i^2 = ∫_H x_i^2 dγ(x). Since Σ_i σ_i^2 is convergent, it is possible to choose the a_i so that the above integral is finite. For this choice of a_i, the function f̃ is finite γ-almost everywhere and, choosing c large enough, the set E_S is not negligible for γ.
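The reduced, outer minimization admits a quick numerical sanity check. The sketch below uses assumed toy values (A = 2, c = 1, two active coordinates, a small point x); it verifies that the stationary point with coordinates x_i A²/(A² − a_i²) minimizes ȳ ↦ ‖ȳ − x‖² + (c² − f(ȳ))/A² and stays strictly inside the ellipsoid.

```python
# Toy check with ASSUMED parameters of the reduced functional
#   h(ybar) = |ybar - x|^2 + (c^2 - f(ybar)) / A^2,  f(ybar) = sum a_i^2 ybar_i^2,
# whose stationarity condition gives ybar_i = x_i A^2 / (A^2 - a_i^2).
A, c = 2.0, 1.0
a = [1.0, 1.5]                 # a_i < A, as in the ellipsoid example
x = [0.05, 0.04]               # small coordinates: a point well inside S

def f(y):
    return sum(ai ** 2 * yi ** 2 for ai, yi in zip(a, y))

def h(y):
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x)) + (c ** 2 - f(y)) / A ** 2

zbar = [xi * A ** 2 / (A ** 2 - ai ** 2) for xi, ai in zip(x, a)]

# h at the claimed minimizer is below h at perturbations of it (and at x itself),
# and f(zbar) < c^2, so the constraint f <= c^2 is inactive.
eps = 1e-3
competitors = [h([zbar[0] + eps, zbar[1]]), h([zbar[0], zbar[1] - eps]), h(x)]
print(f(zbar) < c ** 2)                          # True
print(all(h(zbar) < v for v in competitors))     # True
```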

Remark 4.7.
One may wonder whether the provided example is "complete". There are several meanings attached to the word "complete". When a Riemannian manifold is finite-dimensional, they are equivalent, due to the Hopf-Rinow theorem. In the infinite-dimensional case, they are not. The example ellipsoid S presented in Lemma 4.4 is complete in the following senses.
1. It is metrically complete (i.e. any Cauchy sequence converges), since it is a closed subset of a Hilbert space.
2. It is geodesically complete, that is, any geodesic segment can be prolonged indefinitely; this is due to the fact that the geodesic spray is bounded on T S.
Since S is quite similar to Grossman's example [11], there are points in S that cannot be connected by a minimal-length geodesic.

Stiefel manifolds.
We have seen that, in general, we cannot "project" a Gaussian probability measure onto a submanifold of an infinite-dimensional Hilbert space. In this section, though, we will show that the projection onto Stiefel manifolds is almost everywhere well defined, for every choice of non-degenerate Gaussian measure. So the "projection method" of pushing a Gaussian measure from the ambient space to the manifold of interest is well defined when the manifold is an infinite-dimensional Stiefel manifold. The simplest case of a Stiefel manifold is the unit sphere. Given x = (x_1, . . . , x_h) ∈ H^h, the distance from x attains a minimum on St(h, H); the minimum point is unique when x_1, . . . , x_h are linearly independent (case 1), while it is not unique otherwise (case 2). This result holds also when H is finite-dimensional and dim(H) ≥ h, or when H is not separable.
Proof. The proof is divided into three steps. In the first step we prove that the minimum exists. In the second step we prove uniqueness in case 1. In the third step we prove that the minimum is not unique in case 2.
Step 1 - The minimum exists. Let v = (v_1, . . . , v_h) be a generic element of St(h, H). Since x is fixed and ‖v_i‖_H = 1, we have ‖v − x‖_{H^h}^2 = h + ‖x‖_{H^h}^2 − 2 Σ_{i=1}^h ⟨x_i, v_i⟩_H, so proving that the distance has a minimum is the same as proving that the function g : St(h, H) → R, g(v) = Σ_{i=1}^h ⟨x_i, v_i⟩_H, has a maximum.
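The identity behind this reduction, namely that for tuples with unit-norm components ‖v − x‖² equals h + ‖x‖² − 2 Σ_i ⟨x_i, v_i⟩, can be verified numerically; the dimensions and random data below are illustrative assumptions.

```python
import random

# Check, with ASSUMED random data, that for unit-norm components v_i:
#   |v - x|^2 = h + |x|^2 - 2 * sum_i <x_i, v_i>.
random.seed(0)
h, dim = 3, 5                  # h vectors in a truncation of H to R^5

def dot(u, w):
    return sum(p * q for p, q in zip(u, w))

x = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(h)]
v = []
for _ in range(h):             # random unit-norm components
    u = [random.gauss(0, 1) for _ in range(dim)]
    nrm = dot(u, u) ** 0.5
    v.append([p / nrm for p in u])

lhs = sum((p - q) ** 2 for vi, xi in zip(v, x) for p, q in zip(vi, xi))
rhs = h + sum(dot(xi, xi) for xi in x) - 2 * sum(dot(xi, vi) for xi, vi in zip(x, v))
print(abs(lhs - rhs) < 1e-9)   # True
```

Minimizing the distance is thus equivalent to maximizing g, since the other two terms do not depend on v.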
If H is finite-dimensional then the Stiefel manifold is compact and g is continuous so it readily follows that g has a maximum.
In the case where H is infinite-dimensional, let X = Span(x_1, . . . , x_h) ⊂ H and let q be the dimension of X. Without loss of generality we can suppose that x_1, . . . , x_q form a basis of X. We now consider the (h + q)-dimensional subspaces of H containing X and call them "nice" subspaces. Let Y be a "nice" subspace and y_1, . . . , y_h an orthonormal basis of the orthogonal complement of X in Y. The vectors x_1, . . . , x_q, y_1, . . . , y_h form a basis of Y.
Consider the function g restricted to Y^h ∩ St(h, H). Using the above basis, this intersection can be written in coordinates as a subset S ⊂ (R^{h+q})^h, determined only by the (fixed) scalar products of the basis vectors. Note that S does not depend on Y: it is the same for all "nice" subspaces.
In the above basis, the supremum of g on Y^h ∩ St(h, H) becomes the supremum over S of a function determined by the coordinates of x_1, . . . , x_h alone. The right-hand side does not depend on Y. This means that the supremum is the same in all finite-dimensional subspaces of the form Y^h for some "nice" subspace Y. Moreover, for each v in St(h, H) there exists a "nice" subspace Y ⊆ H such that v ∈ Y^h, and so the global supremum on St(h, H) is equal to the supremum attained in any subspace of the form Y^h for some "nice" subspace Y. But subspaces of that form are finite-dimensional, and there the supremum is clearly achieved, since S is compact and g is continuous.
Step 2 - The minimum is unique in case 1. To show that the minimum is unique when x_1, . . . , x_h are linearly independent, we explicitly compute it, choosing a suitable basis of H^h. The explicit computation also shows that a point at minimal distance exists; but since Step 1 is in any case needed for case 2, we will not stress this fact.
First of all, note that if the components of x are orthogonal (and non-null), then the minimum is unique and is given by the formula v_min = ( x_1/‖x_1‖_H , . . . , x_h/‖x_h‖_H ). This point minimizes the distance from x among all vectors whose components have unit norm. Thanks to the fact that the components of x are orthogonal, v_min belongs to St(h, H), and hence it is the minimum point on the Stiefel manifold as well. We would now like to find an isometry of H^h that preserves the Stiefel manifold and maps x to a vector whose components are orthogonal.
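A small numerical example (the vectors below are our own assumption) confirms that normalizing orthogonal components yields a point of the Stiefel manifold that is closer to x than other orthonormal frames:

```python
import math

# ASSUMED example: x has orthogonal components in H = R^3, with h = 2.
x = [[3.0, 0.0, 0.0], [0.0, 0.0, 2.0]]

def dot(u, w):
    return sum(p * q for p, q in zip(u, w))

def dist2(v):                  # squared distance from x in H^h
    return sum((p - q) ** 2 for vi, xi in zip(v, x) for p, q in zip(vi, xi))

v_min = [[p / math.sqrt(dot(xi, xi)) for p in xi] for xi in x]

# v_min is an orthonormal 2-frame, i.e. a point of St(2, R^3) ...
gram_ok = (abs(dot(v_min[0], v_min[1])) < 1e-12
           and abs(dot(v_min[0], v_min[0]) - 1.0) < 1e-12
           and abs(dot(v_min[1], v_min[1]) - 1.0) < 1e-12)

# ... and it beats a few other orthonormal 2-frames.
others = [[[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
          [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
          [[2 ** -0.5, 0.0, 2 ** -0.5], [2 ** -0.5, 0.0, -2 ** -0.5]]]
print(gram_ok, dist2(v_min))                         # True 5.0
print(all(dist2(v) > dist2(v_min) for v in others))  # True
```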
We recall the following notation. For A ∈ R^{h×h} and y ∈ H^h, the product Ay ∈ H^h is defined by (Ay)_i = Σ_{j=1}^h A_{i,j} y_j. As usual, we denote by A also the function A : H^h → H^h, y ↦ Ay. We denote by y y^T ∈ R^{h×h} the symmetric positive semi-definite matrix whose entries are the scalar products between the components of y, (y y^T)_{i,j} = ⟨y_i, y_j⟩_H.
Consider the matrix x x^T and let A be an orthogonal matrix that diagonalizes it, A (x x^T) A^T = diag(d_1, . . . , d_h).
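The role of this diagonalization can be seen on a toy example (assumed data, h = 2, H = R³): an orthogonal A with A (x x^T) A^T diagonal makes the components of Ax pairwise orthogonal, while Σ_i ‖y_i‖² is preserved.

```python
import math

# ASSUMED example with h = 2 vectors in H = R^3.
x = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

def dot(u, w):
    return sum(p * q for p, q in zip(u, w))

# Gram matrix x x^T = [[p, b], [b, q]].
p, q, b = dot(x[0], x[0]), dot(x[1], x[1]), dot(x[0], x[1])

# Orthogonal (rotation) matrix diagonalizing the Gram matrix: the Jacobi
# angle satisfies tan(2t) = 2b / (p - q).
t = 0.5 * math.atan2(2.0 * b, p - q)
A = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]

# (Ax)_i = sum_j A_ij x_j : the action of A on H^h.
Ax = [[sum(A[i][j] * x[j][k] for j in range(2)) for k in range(3)]
      for i in range(2)]

off_diag = dot(Ax[0], Ax[1])           # = (A x x^T A^T)_{1,2}, ~ 0
norm2 = sum(dot(v, v) for v in Ax)     # = |x_1|^2 + |x_2|^2 : isometry
print(abs(off_diag) < 1e-9, norm2)
```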
Then the function A : H^h → H^h is an isometry of H^h, since A is orthogonal. Indeed, using matrix notation,