Characterization of minimizable Lagrangian action functionals and a dual Mather theorem

We show that a necessary and sufficient condition for a smooth function on the tangent bundle of a manifold to be a Lagrangian density whose action can be minimized is, roughly speaking, that it be the sum of a constant, a nonnegative function vanishing on the support of the minimizers, and an exact form. We show that this exact form corresponds to the differential of a Lipschitz function on the manifold that is differentiable on the projection of the support of the minimizers, and its derivative there is Lipschitz. This function generalizes the notion of subsolution of the Hamilton-Jacobi equation that appears in weak KAM theory, and the Lipschitzity result allows for the recovery of Mather's celebrated 1991 result as a special case. We also show that our result is sharp with several examples. Finally, we apply the same type of reasoning to an example of a finite horizon Legendre problem in optimal control, and together with the Lipschitzity result we obtain the Hamilton-Jacobi-Bellman equation and the Maximum Principle. This version contains errata correcting an issue in the published version.

This is a corrected version of the published manuscript; for details, please refer to the list of changes at the end. This revision was done in November 2021.
1. Introduction. In this section, we recall three areas to which our analysis techniques can be applied, and for each of them we sketch the result we obtain and its significance.
Our main result, Theorem 2.12, reduces to the following statement. (2) is actually an equality throughout the image of $(\gamma, \gamma')$, in other words, and 5. the momenta of γ coincide with the differential df, that is, $\frac{\partial L}{\partial v}(\gamma(t), \gamma'(t)) = df_{\gamma(t)}$, where ∂/∂v denotes differentiation in the fiberwise direction of $T_{\gamma(t)}M$.
In Theorem 2.12, the minimization is considered in a more general context: we allow measures on TM (rather than just curves) to be the candidates for minimization, and we allow them to have nontrivial boundary.
But the crux of the matter is already present in the toy version, Theorem 1.1. Those familiar with Mather's theory [24] will recognize item 3 as an analogue of Mather's Lipschitz regularity result, except that here the theorem gives regularity of the momenta ∂L/∂v rather than of the velocities; compare with item 5. Those familiar with Fathi-Siconolfi's weak KAM theory [15,17,18] will recognize items 1 and 4 as meaning that f is a critical subsolution of the Hamilton-Jacobi equation. Those familiar with Mañé's theory [10] will recognize the real number $-A_L(\gamma)$ as Mañé's critical value, since in this case it coincides with $-\inf_\eta A_L(\eta)$, where the infimum is taken over all absolutely continuous closed curves η.
The twist here is that we do not require the function L to be convex, superlinear, bounded from below, Tonelli, quasi-convex, or coercive; our key assumption instead is that its action is minimizable. Nor do we assume that the action functional is semicontinuous or sequentially continuous. Furthermore, we do not assume that the minimizers are invariant under the Euler-Lagrange flow, which may in fact not be well defined. The minimizers need not even enjoy a graph property, that is, have a single velocity vector v for each point x in their projected support π(supp µ). In particular, the minimizers need not define a foliation or a lamination on M.
Similarly, the Hamiltonian flow and the Hamiltonian itself (that is, the Legendre-Fenchel convex conjugate of L) may not be well defined. However, we do find an energy conservation principle (Corollary 2.3).
Since we restrict the set of possible minimizers very little, the regularity part of our result does not concern the minimizers themselves (see Example 1); instead, it concerns the momenta ∂L/∂v. This motivates the term "dual" in the title of the paper: Mather's original result concerned the velocities of the minimizers, while the result presented here concerns the dual objects, namely, their momenta ∂L/∂v.
A version of Theorem 2.12 for time-dependent Lagrangians is given in Corollary 2.22. Before that, we establish a preliminary result that requires less regularity on the Lagrangian L and gives a weaker characterization result; this is Theorem 2.2, whose version in the time-dependent setting is Corollary 2.10; their proof is the goal of Section 2.1.
In the literature, results stating that an object in the tangent bundle TM is contained in the graph of a section of the bundle, or equivalently, that it intersects each fiber in at most one point, are known as "graph theorems"; see for example [24,13,14,12,10]. In contrast, our result gives a sort of "dual graph theorem" for all minimizers, in the sense that, although the support of a minimizer µ need not be contained in the graph of a section of TM, the momenta ∂L/∂v on the support of µ must be contained in the graph of a section of the cotangent bundle $T^*M$.
The literature on sufficient conditions for the existence of minimizers is extensive, and these conditions typically come in the form of coercivity, boundedness, or superlinearity of the function L; see for example [11,19]. In this direction there is also the line of research initiated by Morrey [25] on quasiconvexity, a condition that was found to be necessary and sufficient for the weak sequential lower semicontinuity of the action. In this paper, however, we assume the existence of the minimizers, and we do not examine the question of the continuity of the action.
We give examples of the application and sharpness of Theorem 2.12 in Section 3. In particular, we show that, at the level of generality needed to obtain a full characterization, it is impossible to prove any regularity of the minimizers (Example 1); instead, only the regularity of the momenta df = ∂L/∂v can be established (which is done in Section 2.2), and the regularity we prove is sharp (Examples 3, 4, and 5). In particular, Bernard's result [5] on the existence of $C^{1,1}$ subsolutions of the Hamilton-Jacobi equation for Tonelli Lagrangians cannot be recovered in this setting. However, we are able to recover Mather's original Lipschitz regularity result (Example 2). We also show how to use our theorem to prove some regularity of the distance function for non-strictly convex Finsler metrics (Example 7).
A question that remains open is that of the regularity of the form df at the boundary ∂µ of the minimizers.

1.2. Optimal mass transport. The optimal mass transport problem can be formulated as follows [1, §7.2]: given two probability measures $\nu_1$ and $\nu_2$ on a manifold M, the problem is to find a measure µ on TM whose boundary (understood as the boundary of the current on M induced by µ) is $\partial\mu = \nu_2 - \nu_1$ and which minimizes a Lagrangian action. The measure µ is understood as encoding a family of curves that dictate how the mass of $\nu_1$ should be moved to where the mass of $\nu_2$ is, and the integrand is usually the arclength of those curves, so that the overall interpretation is that one is finding the way to move $\nu_1$ into $\nu_2$ with the least possible effort.
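For intuition, consider the simplest instance of this formulation (an illustrative assumption, not a setting treated in the paper): M = R, L(x, v) = |v| (arclength), and $\nu_1$, $\nu_2$ finitely supported. The least possible effort is then the Wasserstein-1 distance, which on the line equals $\int |F_1 - F_2|\,dx$, where $F_1$, $F_2$ are the cumulative distribution functions. A minimal sketch, with a hypothetical helper `w1_on_line`:

```python
import numpy as np

def w1_on_line(x1, w1, x2, w2):
    """Hypothetical helper: minimal total transport distance between two
    finitely supported probability measures on R (Wasserstein-1 distance),
    computed as the integral of |F1 - F2|, with F1, F2 the CDFs."""
    pts = np.sort(np.concatenate([x1, x2]))
    # On each interval [pts[k], pts[k+1]) both CDFs are constant.
    F1 = np.array([w1[x1 <= p].sum() for p in pts[:-1]])
    F2 = np.array([w2[x2 <= p].sum() for p in pts[:-1]])
    return float(np.sum(np.abs(F1 - F2) * np.diff(pts)))

# Move (δ_0 + δ_1)/2 to (δ_2 + δ_3)/2: each half-unit of mass travels distance 2.
cost = w1_on_line(np.array([0.0, 1.0]), np.array([0.5, 0.5]),
                  np.array([2.0, 3.0]), np.array([0.5, 0.5]))
print(cost)  # 2.0
```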

RODOLFO RÍOS-ZERTUCHE
The Young Superposition Principle (Example 6) is the statement that a measure µ with boundary ∂µ = ν 2 − ν 1 is a convex combination of generalized curves delineating trajectories that join ν 1 and ν 2 , so that the description above is meaningful.
There are a few formulations of the optimal mass transport problem, according to the properties of the Lagrangian function L involved in the action.
For the case in which the Lagrangian L is a $C^2$ function throughout the tangent bundle, our results in Theorem 2.12 and Corollary 2.22 characterize the Lagrangians for which this problem can be considered meaningfully, relate it to the corresponding Hamilton-Jacobi equation, and give a priori $C^{1,1}$-regularity on the projected support of µ for the subsolutions of that equation that govern the properties of µ; cf. [1, Theorem 6.2.7]. Our results also generalize the Lipschitz regularity result obtained by Bernard-Buffoni [8], which relies on assumptions of convexity, superlinearity, and completeness of the geodesic flow that are not necessary in our version.
Another formulation of the optimal transport problem involves Lagrangian densities L that fail to be $C^2$ at the origin but are instead positively homogeneous of degree 1, namely, L(x, λv) = λL(x, v) for all λ > 0 and (x, v) ∈ TM, as is the case for Finsler metrics. In this case, Lipschitz regularity results were also obtained by Bernard-Buffoni [7] and other authors mentioned in their work, and our results in Example 8 generalize those, as we manage to avoid the convexity assumption. Also in that example, in the case in which L is the norm associated with a Riemannian metric, we recover the result [1] that the transport is done along a gradient flow.
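As a small sanity check of positive homogeneity of degree 1 (a hypothetical example, not taken from the paper), a Randers-type Lagrangian $L(x, v) = |v| + \langle b, v\rangle$ with a constant covector |b| < 1 is positively homogeneous, is not reversible, $L(x, -v) \ne L(x, v)$, and is not differentiable at v = 0:

```python
import numpy as np

# Hypothetical constant covector b with |b| < 1, so that L(v) > 0 for v != 0.
B = np.array([0.3, -0.2])

def randers(v):
    """Randers-type Lagrangian L(v) = |v| + <b, v> (taken x-independent for simplicity)."""
    v = np.asarray(v, dtype=float)
    return float(np.linalg.norm(v) + B @ v)

v = np.array([1.0, 2.0])
for lam in (0.1, 2.0, 7.5):
    # positive homogeneity of degree 1: L(lam v) = lam L(v) for lam > 0
    assert abs(randers(lam * v) - lam * randers(v)) < 1e-12
print(randers(v), randers(-v))   # different values: L is not reversible
```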
1.3. Optimal control. We also apply the same line of reasoning developed for Theorem 2.12 to analyze an optimal control problem in Section 4, and we are able to give a coarse characterization of minimizable integrands in Theorem 4.2 akin to the one developed in Section 2.1, as well as a result, Theorem 4.3, that gives sufficient conditions to obtain Lipschitz regularity of the momenta. We explain in Remark 4.4 the close connection with the Hamilton-Jacobi-Bellman equation satisfied by the value function and the Maximum Principle.
1.4. Other remarks. In the author's view, this paper is about the importance of the holonomy constraint ∂µ = c (see Definition 2.1), from whose exploitation stem most of the results that we obtain here. A recurrent motif is that whenever we minimize within a set of measures that vanish on a certain class of functions (exact forms in our case), the minimizable functionals correspond to functions that are nonnegative up to the addition of a function in that class. What is interesting, then, is how the seemingly innocent assumption of minimizability within a set satisfying ∂µ = c already implies some regularity, as well as familiar concepts like energy conservation, the existence of calibrations, the maximum principle in optimal control, and the ubiquity of the Hamilton-Jacobi equation, among others.
The results of this paper have been applied to the characterization of deformations of closed measures in [27].
2. Characterization of minimizable action functionals. On a manifold X, let $C^\infty(X)$ denote the set of smooth functions on X with the topology induced by the seminorms $|\cdot|_{K,k}$ associated to each compact subset K of X and to each positive integer k, and given, in local coordinates, by
$$|\phi|_{K,k} = \max_{|\beta| \le k}\, \sup_{x \in K} |\partial^\beta \phi(x)|. \qquad (4)$$

Definition 2.1. A (compactly-supported) normal 0-current is a continuous real-valued functional $C^\infty(M) \to \mathbb{R}$ given by a signed, Radon, compactly-supported measure on M. Given a compactly-supported Radon measure µ on TM, the boundary ∂µ is the functional on $C^\infty(M)$ defined by
$$\langle \partial\mu, f\rangle = \int_{TM} df\, d\mu, \qquad f \in C^\infty(M).$$
A measure µ is closed if ∂µ = 0. It can be checked that, if M is connected, a normal 0-current c is a boundary if, and only if, ⟨c, 1⟩ = 0. For a fixed normal 0-current c, let H(c) be the set of compactly-supported, positive, Radon measures µ with ∂µ = c; in the special case c = 0, we additionally require the elements of H(0) to be probability measures.
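A discretized sanity check, assuming the convention $\langle \partial\mu, f\rangle = \int_{TM} df\,d\mu$ and taking M = S^1 = R/Z (illustrative choices; the helper `boundary_pairing` is hypothetical): the measure induced by a closed curve pairs to zero with df, while a point mass with nonzero velocity does not, so it is not closed.

```python
import numpy as np

TWO_PI = 2.0 * np.pi

def boundary_pairing(xs, vs, weights, fprime):
    """<∂µ, f> = ∫ df dµ for a discrete measure µ = Σ_k w_k δ_(x_k, v_k) on T(S^1),
    where df_x(v) = f'(x) v."""
    return float(np.sum(weights * fprime(xs) * vs))

# Test function f(x) = sin(2πx) on S^1 = R/Z, with derivative f'(x) = 2π cos(2πx).
fprime = lambda x: TWO_PI * np.cos(TWO_PI * x)

# µ_γ for the closed curve γ(t) = t + 0.3 sin(2πt) (mod 1), t in [0, 1),
# discretized as equal point masses at (γ(t_k), γ'(t_k)).
N = 2000
t = np.arange(N) / N
xs = (t + 0.3 * np.sin(TWO_PI * t)) % 1.0
vs = 1.0 + 0.3 * TWO_PI * np.cos(TWO_PI * t)
weights = np.full(N, 1.0 / N)

closed = boundary_pairing(xs, vs, weights, fprime)             # ≈ 0: µ_γ is closed
point_mass = boundary_pairing(np.array([0.0]), np.array([1.0]),
                              np.array([1.0]), fprime)          # = 2π: not closed
```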
Let E be a complete, sequential, locally-convex topological vector space of Borel measurable functions on T M that contains C ∞ (T M ) as a subset. We will assume that H (c) ⊂ E * and that the topology of C ∞ (T M ) described above is finer than the one this space inherits from E, so that every open set in the inherited topology is an open set in the topology described above. This assumption implies that every continuous linear functional ϑ ∈ E * defines a compactly-supported distribution when restricted to C ∞ (T M ).
To give some examples, E could be a space of $C^\ell$ functions on TM with ℓ ∈ [0, ∞] and the topology induced by the seminorms (4) for k ≤ ℓ. To verify that such a space satisfies the assumptions above, it may be useful to recall that a topological vector space is sequential as soon as it is normed, metrizable, or first-countable.

Theorem 2.2. Let c be a normal 0-current on M that is a boundary.
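If L is an element of E such that the action functional $\nu \mapsto \int_{TM} L\,d\nu$ reaches its minimum within H(c) at some point µ, then there exist functions $f_1, f_2, \ldots \in C^\infty(M)$ and $g_1, g_2, \ldots \in E$ with $g_i \ge 0$ such that
$$L = \int_{TM} L\,d\mu + \lim_{i\to\infty} (df_i + g_i), \qquad \lim_{i\to\infty} \langle c, f_i\rangle = 0, \qquad \lim_{i\to\infty} \int_{TM} g_i\,d\mu = 0,$$
where the limits are taken in E. In particular, if c = 0,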

Conversely, if L has this structure, its action reaches its minimum within H (c) at µ.
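As a hedged illustration of the structure in Theorem 2.2 (the example is ours, not taken from the paper), take c = 0, M = S^1 = R/Z and $L(x, v) = \tfrac12 v^2 - \cos(2\pi x)$. Since L ≥ −1, with equality only at (0, 0), the closed probability measure $\mu = \delta_{(0,0)}$ minimizes the action within H(0), and L = −1 + df + g with f ≡ 0 and g = L + 1 ≥ 0 vanishing exactly on supp µ. Other closed measures, such as uniform rotations, have larger action:

```python
import numpy as np

# Hypothetical toy example on T(S^1) = S^1 x R with S^1 = R/Z (not from the paper).
def L(x, v):
    return 0.5 * v**2 - np.cos(2.0 * np.pi * x)

MIN_ACTION = -1.0   # L >= -1, with equality only at (x, v) = (0, 0)

def g(x, v):
    """Nonnegative remainder in L = MIN_ACTION + df + g, here with f = 0."""
    return L(x, v) - MIN_ACTION

# Closed probability measure δ_(0,0): a rest point at the top of the potential.
action_rest = float(L(0.0, 0.0))

def action_rotation(w, n=1000):
    """Action of the closed measure 'x uniform on S^1, velocity constantly w'."""
    x = np.arange(n) / n
    return float(np.mean(L(x, w)))

xg, vg = np.meshgrid(np.linspace(0.0, 1.0, 101), np.linspace(-3.0, 3.0, 61))
g_min = float(g(xg, vg).min())   # >= 0, and g vanishes at (0, 0) = supp δ_(0,0)
```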
We immediately have the following consequence of the fact that $\int g_i\,d\mu \to 0$.
Corollary 2.3 (Energy conservation). Let L be an element of E and assume that the action functional of L reaches its minimum within H (c) at a minimizer µ. Define the Hamiltonian associated to L to be the function on T * M given by Then the value of H is constant throughout µ-almost all the support of µ and equals
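With the Hamiltonian understood as the Legendre-Fenchel conjugate $H(x, p) = \sup_v\,[p(v) - L(x, v)]$, as in the introduction, one can approximate it on a velocity grid. A minimal sketch for the hypothetical toy Lagrangian $L(x, v) = \tfrac12 v^2 - \cos(2\pi x)$ on S^1, whose conjugate is $\tfrac12 p^2 + \cos(2\pi x)$; at the rest point (0, 0), where the momentum ∂L/∂v vanishes, the value is 1, which equals $-\int L\,d\mu$ for $\mu = \delta_{(0,0)}$:

```python
import numpy as np

def legendre_fenchel(L, x, p, v_grid):
    """Grid approximation of H(x, p) = sup_v [p v - L(x, v)] (L superlinear in v)."""
    return float(np.max(p * v_grid - L(x, v_grid)))

def L(x, v):                       # hypothetical toy Lagrangian on T(S^1)
    return 0.5 * v**2 - np.cos(2.0 * np.pi * x)

v_grid = np.linspace(-10.0, 10.0, 20001)   # velocity grid with step 1e-3

H_rest = legendre_fenchel(L, 0.0, 0.0, v_grid)       # energy at (x, p) = (0, 0)
H_generic = legendre_fenchel(L, 0.25, 0.3, v_grid)   # compare with 0.3**2/2 + cos(π/2)
```

The grid must be wide enough to contain the maximizing velocity (here v = p); for superlinear L this is guaranteed for |p| well inside the grid range.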

Remark 2.4 (Terminology). The statement of Corollary 2.3 is known as an Energy Conservation Principle for historical reasons, which we now sketch. The quantity H is known as the Hamiltonian energy. If we add a time coordinate (on which L does not depend) and place ourselves in the time-dependent setting (see Section 2.1.2), then the minimizing measure µ satisfies [6, Lemma 16], in the sense of distributions, a continuity equation of the form which allows one to interpret µ as flowing along the vector field V. A particular consequence of the corollary is that the quantity H remains constant for almost all times t, and this gives meaning to the word "conservation" in the present context. (Theorem 2.12 will show that, under slightly stronger assumptions on L, H is in fact constant for all t.)

For the proof of Theorem 2.2, we will need the following lemma.
Lemma 2.5. In the setting of Theorem 2.2, let Q = {ℓ : $E^*$ → R | ℓ is affine and continuous, its linear part is induced by evaluation at an element of E, and ℓ(µ) ≥ 0 for all µ ∈ H(c)}, Then

For the proof of Lemma 2.5, we recall the following fact.

Remark 2.7. It is an easy consequence of the Hahn-Banach Separation Theorem that if X and Y are two convex cones in a locally-convex topological vector space V and X′ = Y′, then their closures coincide, $\overline{X} = \overline{Y}$.
Proof of Lemma 2.5. We will show that Q′ = R′ in $E^* \oplus \mathbb{R}$; in fact, these dual sets coincide with $\mathbb{R}_{\ge 0} H(0)$ when c = 0 and with H(c) when c ≠ 0. To see why, note first that the set of linear functionals $\ell_g(\xi) = \xi(g)$ induced by nonnegative elements $g \in C^\infty(TM)$, g ≥ 0, is a subset of both Q and R. So if ξ ∈ Q′ or ξ ∈ R′, necessarily $\ell_g(\xi) \ge 0$ for all such g, and by [23, §6.22] ξ can be represented as integration against a compactly supported, nonnegative Radon measure.
Also, if $f \in C^\infty(M)$, then the affine functional This shows that ξ is contained in and consider the affine functional on $E^*$ given by Then ℓ satisfies Thus ℓ belongs to the set Q in the statement of Lemma 2.5. It follows that ℓ also belongs to $\overline{R}$. Since E is sequential, the topological closure equals the sequential closure, so there exists a sequence of functionals $\ell_i \in R$ of the form Comparing the linear and constant parts of ℓ and $\ell_i$, we conclude that We also have that

Definition 2.8. When talking about the time-dependent setting, we will refer to the situation in which M is of the form N × P, where N is a smooth (d − 1)-dimensional manifold and P is a connected, 1-dimensional, smooth manifold that plays the role of time. We fix a parameterization t : I ⊆ R → P, and we use it to distinguish, at each point p ∈ P, the vector $1 \in T_pP$ such that $dt_p(1) = 1$. We will denote by $H_1(c) \subset H(c)$ the set of elements of H(c) that are supported within the set TN × P × {1} ⊂ TM, which we will identify with TN × P.
Remark 2.9. We interpret measures µ in $H_1(c)$ as advancing in the time direction with "velocity" $1 \in T_pP \cong \mathbb{R}$, p ∈ P, which roughly means that time itself always moves at the same speed.

Let E be a complete, sequential, locally-convex topological vector space of Borel measurable functions on TN × P that contains $C^\infty(TN \times P)$, such that $H_1(c) \subset E^*$, and such that the topology induced on $C^\infty(TN \times P)$ by the seminorms (4) is finer than the topology it inherits from E. For example, E could be $C^\ell(TN \times P)$ for ℓ ∈ [0, ∞] with the topology induced by the seminorms (4) for k ≤ ℓ.
It is straightforward to adapt the reasoning that gives Theorem 2.2 in order to obtain the following.

Corollary 2.10. Let N × P be a manifold that is the product of $C^\infty$ manifolds N and P, with P playing the role of time, dim P = 1, so that we are in the time-dependent setting. Let c be a normal 0-current on N × P such that $H_1(c)$ is not empty. Assume that L is an element of E such that $\nu \mapsto \int_{TN\times P} L\,d\nu$ reaches its minimum within $H_1(c)$ at some point µ. Then there exist functions where the limit is taken in E, and Conversely, if L has this structure, its action reaches its minimum within $H_1(c)$ at µ.
This follows from Theorem 2.2 taking M = N × P . Note that ∂f i /∂t is constant on each fiber T x N .
Although a version of the energy conservation result, Corollary 2.3, holds in the time-dependent setting, it is not very transparent because it involves the less tangible $\partial f_i/\partial t$.
Denote by π : T M → M the fiberwise projection.
Theorem 2.12. Let c be a normal 0-current on M that is a boundary, so that the space H (c) from Definition 2.1 is nonempty. Let L be an element of C 2 (T M ) such that the action functional ν → T M L dν reaches its minimum within H (c) at some point µ. Let U be an open subset of M with compact closure and such that π(supp µ) ⊆ U . Then there exist a Lipschitz function f : U → R, a nonnegative function g : T U → R ≥0 , and a bounded (possibly discontinuous) section α : U → T * U such that: which in particular means that Conversely, if L has this structure, its action reaches its minimum within H (c) at µ.
Remark 2.13. In items 3 and 4 we work away from the support of c for simplicity. Our proof shows, however, that the Lipschitzity should hold at all points x ∈ π(supp µ) for which there exist absolutely continuous γ :

Proof. By replacing L with $L - \int L\,d\mu$, we may assume that $\int L\,d\mu = 0$.
By Theorem 2.2, there are functions $f_1, f_2, \ldots$ in $C^\infty(M)$ and $g_1, g_2, \ldots$ in $C^2(TM)$, with the topology induced by the seminorms (4) with k ≤ 2, such that $g_i \ge 0$, $\lim_{i\to\infty} \langle c, f_i\rangle = 0$, and $\lim_{i\to\infty} \int g_i\,d\mu = 0$. We may additionally assume that $f_1, f_2, \ldots$ are uniformly bounded on U. Since both $f_i$ and $df_i$ are uniformly bounded on the compact set $\overline{U}$, by an application of Arzelà-Ascoli, perhaps passing to a subsequence, we may assume that the sequence $f_i$ converges uniformly on U to a Lipschitz function f. We let α be a Clarke differential of f on U; since df ≤ L wherever it is defined, by continuity of L we know that it is possible to choose α so that α ≤ L. We set g = L − α ≥ 0 on TU. With these definitions, items 1 and 2 hold.
Similarly, since $\int g\,d\mu = 0$ and g ≥ 0, it follows that g = 0 µ-almost everywhere. It will follow from the continuity of df on π(supp µ) (a consequence of item 4) that g actually vanishes throughout supp µ, as stated in item 5. It remains to show that items 3 and 4 are true. We will prove these items for each open subset V of U that does not intersect supp c and is diffeomorphic to the open ball in $\mathbb{R}^d$, and the result will follow from the compactness of supp µ. We pass through the chart of V to the unit ball in $\mathbb{R}^d$, but for simplicity we keep all notation the same.
Let K be a subset of TV of the form $\{(x, v) \in TV : |v| < C_0\}$ for some $C_0 \gg 0$ such that supp µ ∩ TV ⊂ K. Let $\phi : TV \to \mathbb{R}_{\ge 0}$ be a smooth function that vanishes on K and grows larger than a positive multiple of $|v|^2$ outside a neighborhood of K. Note that replacing L with L + φ changes neither f, nor df, nor the statements of items 3 and 4; thus, we may assume, without loss of generality, that the Lagrangian L grows super-quadratically in v.
The convexification $\tilde L$ of L is defined by
$$\tilde L(x, v) = \inf\Big\{ \sum_{i=1}^{d+1} \lambda_i L(x, v_i) : v_i \in T_xV,\ \lambda_i \ge 0,\ \sum_{i=1}^{d+1} \lambda_i = 1,\ \sum_{i=1}^{d+1} \lambda_i v_i = v \Big\}.$$
It is locally Lipschitz throughout V. Also, it follows from [20, Theorem 4.2] that $\tilde L|_{T_xV}$ is in $C^{1,1}_{loc}(T_xV)$ for each x ∈ V, that is, it is differentiable and its derivative is locally Lipschitz. We show in Lemma 2.14 that $(x, v) \mapsto \frac{\partial \tilde L}{\partial v}(x, v)$ is locally Lipschitz. The Lagrangian $\tilde L$ can also be decomposed as $\tilde L = \alpha + \tilde g$ for some function $0 \le \tilde g \le g$ that is convex and $C^{1,1}_{loc}$ in the fibers of TV. The rest of the proof is inspired by [15, Section 4.11]. It follows from Lemma 2.16 that for $\pi_*\mu$-almost every x in π(supp µ) ∩ V there is a curve $\gamma_x$ as in the hypothesis of Lemma 2.21; let us call A the set of such points x. This means that the hypotheses of the criterion for f having a Lipschitz derivative, Lemma 2.19, hold for all points in A ∩ V, so that f is differentiable throughout A and its derivative $x \mapsto df_x$ is locally Lipschitz in the set $A_{1/3} = A \cap \tfrac13 V$. While $x \mapsto \dot\gamma_x(0)$ may not be continuous, the family of linear functionals $x \mapsto \frac{\partial \tilde L}{\partial v}(x, \dot\gamma_x(0))$, $x \in A_{1/3}$, is Lipschitz, so it can be extended in a unique way to the closure $\overline{A_{1/3}}$. Additionally, since $\tilde L$ and $\partial \tilde L/\partial v$ are locally Lipschitz, it follows that at each point x in $\overline{A_{1/3}}$ there is a vector $v_x$ such that the Lipschitz extension can be written as $\frac{\partial \tilde L}{\partial v}(x, v_x)$. The functions f and $x \mapsto \frac{\partial \tilde L}{\partial v}(x, v_x)$ being locally Lipschitz (by Lemma 2.14), the condition must hold throughout the closure $\overline{A_{1/3}}$ (replacing $\dot\gamma(0)$ with $v_x$). By another application of Lemma 2.19, it follows that f is actually differentiable throughout $\overline{A} \cap \tfrac19 V \supseteq \pi(\operatorname{supp}\mu) \cap \tfrac19 V$ and that the map $x \mapsto df_x$ is Lipschitz there, as we wanted to prove.
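In a single fiber, the convexification $\tilde L(x, \cdot)$ is the largest convex function lying below $L(x, \cdot)$. As a one-dimensional illustration (d = 1; the double-well $L(v) = (v^2 - 1)^2$ and the helper name are hypothetical), the envelope can be computed on a grid as the lower convex hull of the sampled graph, here with Andrew's monotone-chain algorithm, a computational choice not taken from the paper. For the double well, the envelope vanishes on [−1, 1] and agrees with L outside that interval:

```python
import numpy as np

def lower_convex_envelope(v, y):
    """Largest convex minorant of the samples y = L(v) on a sorted grid v,
    computed as the lower convex hull (monotone chain) and interpolated back."""
    hull = []  # indices of the vertices of the lower hull, left to right
    for i in range(len(v)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            # cross <= 0: point i1 lies on or above the segment from i0 to i, so drop it
            cross = (v[i1] - v[i0]) * (y[i] - y[i0]) - (y[i1] - y[i0]) * (v[i] - v[i0])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(v, v[hull], y[hull])

v = np.linspace(-2.0, 2.0, 401)
y = (v**2 - 1.0) ** 2                 # hypothetical nonconvex fiber Lagrangian
env = lower_convex_envelope(v, y)     # ≈ 0 on [-1, 1], equal to y for |v| >= 1
```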
Lemma 2.14. Let A and B be two open, convex subsets of $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively, for some m, n > 0. Assume that the function c : A × B → R is Lipschitz, and for each u ∈ A the map $v \mapsto c(u, v)$ is convex and $C^{1,1}_{loc}$. Then ∂c/∂v is locally Lipschitz.
Proof. Since c is known to be $C^{1,1}_{loc}$ in the B-direction, we need only show that, for each fixed vector $v_0$, the mapping $u \mapsto \partial c/\partial v(u, v_0)$ is locally Lipschitz. For where the intersection is taken over all neighborhoods W ⊆ B of $v_0$. Since c is Lipschitz, the maps $u \mapsto f_{v_1,\ldots,v_{n+1}}(u, v_i)$ are Lipschitz too, and so also the maps u → for all x in a neighborhood of $x_0$. In particular, $\tilde L \circ \sigma$ is everywhere upper Dini differentiable in V.
Proof. Let $C_D$ be an upper bound for the norm of the Hessian of L in D.
For each (d + 1)-tuple of vectors $v_1, v_2, \ldots, v_{d+1} \in T_{x_0}V \cong \mathbb{R}^d$ and for each x ∈ V such that σ(x) is in the interior of their convex hull, let Since the second derivatives of all the functions $\varphi(v_1, \ldots, v_{d+1}; \cdot)$ are bounded by $C_D$, the statement of the lemma follows.
Lemma 2.16. For $\pi_*\mu$-almost every x ∈ π(supp µ) ∩ V there is some $t_0 > 0$ such that for almost every $0 < t \le t_0$, there is an absolutely continuous curve γ :

Proof. This follows immediately from the decomposition result of [28]; see also the exposition in [3, §3].
where the infimum is taken over all absolutely-continuous curves γ : For π * µ-almost every x ∈ π(supp µ) ∩ V there is t 0 > 0 such that for all 0 < t ≤ t 0 , the function f satisfies where the supremum is taken over all absolutely-continuous curves γ : [0, t] → V with x = γ(0). Moreover, the infimum (6) and the supremum (7) are both realized by absolutely-continuous curves whose images are contained in π(supp µ).
Lemma 2.18. Let γ : [−t, t] → V be an absolutely continuous curve passing through γ(0) = x ∈ π(supp µ) and such that γ minimizes the action of $\tilde L$ among all absolutely continuous curves with the same endpoints γ(−t) and γ(t). Then for almost every s ∈ [−t, t], dγ(s) is defined in the sense that s is a Lebesgue point of $d\gamma \in L^1([-t, t])$, and we have that $\tilde L$ is differentiable at dγ(s).
Proof. The fact that dγ is defined almost everywhere on [−t, t] follows from the Lebesgue Differentiation Theorem. We know that $\tilde L$ is $C^{1,1}_{loc}$ in the fibers, so in order to prove the first statement we need only show that the derivative exists in the horizontal direction, that is, we need to show the existence of $\tilde L_x(d\gamma(s))$ for almost every s ∈ [−t, t].
Recall that, by virtue of Lemma 2.15, we know that for each point ( as δ ↘ 0. Let $h : [-t, t] \to \mathbb{R}^d$ be a smooth map such that h(−t) = 0 = h(t). Then we have, by virtue of (9) and of the minimality of γ, and by a degree 1 Taylor expansion in the fiberwise direction, As δ ↘ 0, this sandwiches the expression whose limit would correspond to the integral of the derivative in the horizontal direction (whose existence we want to prove), namely, $\tilde L_x$, between two expressions that are linear in h. These expressions must hence coincide, thus implying the existence of $\tilde L_x$ for almost every s ∈ [−t, t]. Equation (8)

Then the map h has a derivative at each point $x \in A_{K,h}$, and $d_xh = \varphi_x$. Moreover, the restriction of the map $x \mapsto d_xh$ to $\{x \in A_{K,h} : \|x\| < 1/3\}$ is Lipschitzian with Lipschitz constant ≤ 6K.
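The Lipschitz-derivative criteria of Lemmas 2.19 and 2.20 revolve around quadratic bounds on the first-order Taylor remainder. For a differentiable h whose derivative is K-Lipschitz, the standard estimate is $|h(y) - h(x) - dh_x(y - x)| \le \tfrac{K}{2}\|y - x\|^2$. A numerical sketch (functions chosen for illustration only): for h = sin (K = 1) the ratio below never exceeds 1/2, while for $h(x) = |x|^{3/2}$, which is $C^1$ but whose derivative is not Lipschitz at 0, the ratio at x = 0 blows up like $|y|^{-1/2}$:

```python
import numpy as np

def taylor_ratio(h, dh, x, y):
    """|h(y) - h(x) - dh(x) (y - x)| / |y - x|^2, bounded by K/2 when dh is K-Lipschitz."""
    return np.abs(h(y) - h(x) - dh(x) * (y - x)) / (y - x) ** 2

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 10_000)
y = rng.uniform(-1.0, 1.0, 10_000)
ratios_sin = taylor_ratio(np.sin, np.cos, x, y)      # cos is 1-Lipschitz, so K = 1

# h(x) = |x|^{3/2}: C^1, but its derivative is not Lipschitz at 0.
h32 = lambda s: np.abs(s) ** 1.5
dh32 = lambda s: 1.5 * np.sign(s) * np.abs(s) ** 0.5
ratios_32 = taylor_ratio(h32, dh32, 0.0, np.array([1e-2, 1e-4, 1e-6]))  # ≈ 10, 100, 1000
```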
As a partial converse, we also have the following.

Lemma 2.20. Let B be the open unit ball in the normed space E. If h : B → R is differentiable and the map $x \mapsto dh_x$ is K-Lipschitz for some K > 0, then for all x, y ∈ B, we have
$$|h(y) - h(x) - dh_x(y - x)| \le \frac{K}{2}\|y - x\|^2.$$

Lemma 2.21. Let γ be a curve as in Lemma 2.18, and additionally assume that γ is differentiable at 0 in the sense that 0 is a Lebesgue point for $d\gamma \in L^1([-t, t]; \mathbb{R}^d)$. Let x = γ(0). Then there is some K > 0 such that, for y ∈ V,

Proof. Let 0 < ε ≪ t. Let D be an open subset of TV that contains supp µ ∩ TV as well as all vectors of size ≤ 1/ε, and has compact closure $\overline{D}$, and let $K_0$ be the Lipschitz constant of $(x, v) \mapsto dL(x, v)$ on $\overline{D}$. For $q \in \mathbb{R}^d$, ‖q‖ ≤ 1, let $\gamma_{q,\varepsilon} : [-\varepsilon, 0] \to V$ be the curve such that Since $\tilde L = \alpha + \tilde g$ with $\tilde g \ge 0$, we have and also, since f satisfies (6), Thus, where the last inequality follows from (11). Note that it follows from Lemma 2.18 that $\tilde L_x(d\gamma(s))$ is well defined for almost every s. Now we will show that 0 −εL We let ψ : R → R be a smooth, nonnegative function, vanishing in a neighborhood of 0, and equal to 1 outside a slightly larger neighborhood of 0. We write, for 0 < r ≪ 1, (15) The term (14) vanishes because $h(s) = \psi(s/r)\,\frac{\varepsilon + s}{\varepsilon}\,q$ satisfies the condition for (8) to hold (it vanishes at s = −ε and at s = 0). Now, as r → 0, we see that the first term in (15) vanishes asymptotically because 1 − ψ(s/r) tends to 0. So we are left with the second term in (15), which we expand to get Now, the second term in (16) vanishes as r → 0 because, again, 1 − ψ(s/r) tends to 0. The first term, on the other hand, contains $-\frac{1}{r}\psi'(s/r)$, which approximates the Dirac delta function at s = 0 as r → 0, so the integral converges to $\tilde L_v(d\gamma(0))q$, which proves (13).
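The last step uses that $-\frac{1}{r}\psi'(s/r)$ acts on [−ε, 0] like a Dirac delta at s = 0. A numerical sketch, with a hypothetical concrete choice of ψ (equal to 0 on [−1, 1] and to 1 outside [−2, 2], built from the standard $C^\infty$ step based on $e^{-1/t}$); the integral $\int_{-\varepsilon}^{0} -\frac{1}{r}\psi'(s/r)\,\phi(s)\,ds$ is evaluated as a Riemann-Stieltjes sum against $d[\psi(s/r)]$, which avoids differentiating ψ explicitly:

```python
import numpy as np

def smooth_step(t):
    """C^∞ step: 0 for t <= 0, 1 for t >= 1, built from exp(-1/t)."""
    t = np.asarray(t, dtype=float)
    a = np.where(t > 0, np.exp(-1.0 / np.where(t > 0, t, 1.0)), 0.0)
    b = np.where(t < 1, np.exp(-1.0 / np.where(t < 1, 1.0 - t, 1.0)), 0.0)
    return a / (a + b)

def psi(u):
    """Smooth, nonnegative, 0 for |u| <= 1 and 1 for |u| >= 2."""
    return smooth_step(np.abs(u) - 1.0)

def delta_pairing(phi, r, eps=0.5, n=200_001):
    """Approximate ∫_{-eps}^0 -(1/r) ψ'(s/r) φ(s) ds as -Σ_k [ψ(s_{k+1}/r) - ψ(s_k/r)] φ(mid_k)."""
    s = np.linspace(-eps, 0.0, n)
    dpsi = np.diff(psi(s / r))
    mid = 0.5 * (s[:-1] + s[1:])
    return float(-np.sum(dpsi * phi(mid)))

phi = lambda s: np.cos(s) + s      # any continuous test function, φ(0) = 1
approx = [delta_pairing(phi, r) for r in (1e-2, 1e-3, 1e-4)]   # tends to φ(0) = 1
```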

2.2.2. Time-dependent setting. Recall that the time-dependent setting was defined in Section 2.1.2. The following is a corollary of Theorem 2.12, with M = N × P.
Corollary 2.22. Let N × P be a manifold that is the product of two $C^\infty$ manifolds N and P, with P playing the role of time, dim P = 1, so that we are in the time-dependent setting. Let c be a normal 0-current on N × P such that $H_1(c)$ is not empty. Assume that L is an element of $C^2(TN \times P)$ such that $\nu \mapsto \int_{TN\times P} L\,d\nu$ reaches its minimum within $H_1(c)$ at some point µ. Let U be an open subset of N × P with compact closure such that π(supp µ) ⊆ U. Then there exist a Lipschitz function f : U ⊆ N × P → R, a nonnegative function $g : TU \cap (TN \times P \times \{1\}) \to \mathbb{R}_{\ge 0}$, and a bounded (possibly discontinuous) section $\alpha : U \subseteq N \times P \to T^*N \times \mathbb{R}$ such that: 1. α is a Clarke differential of f (coinciding with α = df wherever f is differentiable); which in particular means that $L = \int_{TN\times P} L\,d\mu + df + g$, c = 0, df + g, c = 0, wherever f is differentiable; 3. for every open set V ⊂ N × P not intersecting the support of c, f is differentiable on π(supp µ) ∩ V and, for $(x_0, t_0, v_0, 1) \in \operatorname{supp}\mu \cap TV$, we have where $v \in T_{x_0}N$, τ ∈ TP, and ∂L/∂v denotes the derivative of the restriction Conversely, if L has this structure, its action reaches its minimum within $H_1(c)$ at µ.
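Item 1 asks for α to be a Clarke differential of the Lipschitz function f. For a locally Lipschitz f : R → R, Clarke's generalized gradient at x is the convex hull of all limits of $f'(x_k)$ along sequences $x_k \to x$ of differentiability points; it reduces to {f′(x)} where f is $C^1$. A rough numerical sketch (the helper name, sampling radius, and finite-difference step are illustrative assumptions, and the method is only a heuristic): for f(x) = |x| at 0 it returns approximately [−1, 1].

```python
import numpy as np

def clarke_interval_1d(f, x, radius=1e-4, n=2001, h=1e-8):
    """Heuristic approximation of Clarke's generalized gradient of a locally
    Lipschitz f: R -> R at x, as [min, max] of central-difference slopes
    sampled at points near x (but not too close to x itself)."""
    pts = x + np.linspace(-radius, radius, n)
    pts = pts[np.abs(pts - x) > 10 * h]          # avoid straddling a kink at x
    slopes = (f(pts + h) - f(pts - h)) / (2 * h)
    return float(slopes.min()), float(slopes.max())

print(clarke_interval_1d(np.abs, 0.0))           # approximately (-1.0, 1.0)
print(clarke_interval_1d(lambda s: s**2, 1.0))   # approximately (2.0, 2.0)
```

Any choice of α(x) inside this interval is admissible as a Clarke differential at x; the proof of Theorem 2.12 picks one with α ≤ L.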

3. Examples.
Example 1 (Exact form). If L is itself an exact form, that is, if L = df for some f ∈ $C^\infty(M)$, then every closed measure µ ∈ H(0) minimizes its action, since $\int df\,d\mu = 0$ for all such µ. Hence every measure µ ∈ H(0) is a minimizer of infinitely many Lagrangians, and it would be hopeless to try to prove any regularity of the minimizers without stronger hypotheses on L.
Example 2 (Tonelli Lagrangians). In the time-dependent setting on N × P with P = S 1 = R/Z, if L is strictly convex and super-linear in the fibers of T N , the existence of minimizers was proved by Tonelli; see for example [15]. From Corollary 2.22 we recover Mather's theory [24] in slightly greater generality because we do not require the minimizers to be invariant under the Euler-Lagrange flow (and we are not the first ones to achieve this greater degree of generality; see [15]): for minimizers in H 1 (0), the Lipschitzity of df implies in this case that supp µ defines a Lipschitz fibration.
This is a context that has been studied very extensively. Among other results, we mention that it has been proved that f can be chosen to be C 1,1 throughout N × P [5] or as a so-called viscosity solution of the Hamilton-Jacobi equation [15], a property that implies much stronger regularity than we prove in Theorem 2.12 and has interesting consequences regarding the associated dynamical system. A good summary can be found in [16].
Example 3 (Irregularity outside π(supp µ)). Let M = S^1 = R/Z, and let f : S^1 → R be a Lipschitz function, differentiable at 0, with f′(0) = 0. We let $\mu = \delta_{(0,0)}$. Let L : TS^1 → R be a smooth function with L ≥ df, L(0, 0) = 0, and such that $\int_{S^1} \big(L(x, r) - df_x(r)\big)\,dx \to 0$ as r → ±∞. Then this f is the only possible such function in the statement of Theorem 2.12. In the theorem, f is shown to be $C^{1,1}$ on π(supp µ), and this example shows that no better result can be obtained outside π(supp µ).
Example 4 (Sharpness of the Lipschitzity of df ). This example was communicated to the author by Stefan Suhr, who learned it from Victor Bangert.
In the Beltrami-Klein model of 2-dimensional hyperbolic space, the geodesics correspond to the straight lines in the unit disc D. Let g be the corresponding Riemannian metric on D. Take the family Γ of straight rays that emanate from $\mathbb{R}_{\le 0} \subset \mathbb{C}$ and are vertical within {Re z ≤ 0} and radial from the origin within {Re z ≥ 0}. The family Γ foliates $D \setminus \mathbb{R}_{\le 0}$. Note that the derivatives of the geodesics in Γ vary only in a Lipschitz fashion, as the rate of change of these derivatives is discontinuous across {Re z = 0}.
We consider the case in which $L(x, v) = g_x(v, v) = |v|^2$. Take any closed subset A of $D \setminus \mathbb{R}_{\le 0}$ that is bounded in the metric g (in particular, it is bounded away from the circle ∂D). The geodesics in Γ can be indexed by R, so that $\Gamma = \{\gamma_r\}_{r \in \mathbb{R}}$. Take a measure ν on R. Assuming that we have a unit-speed parameterization of the geodesics γ ∈ Γ, $g(\dot\gamma, \dot\gamma) = 1$, the measure µ defined by is a minimizer of the action of L within the set of measures that share its boundary, that is, within H(∂µ). The function f in this case corresponds to the distance to $\mathbb{R}_{\le 0}$.
Since g is smooth, and since necessarily $df_x = g_x\big(\dot\gamma_r(t), \cdot\,\big)$ for $x \in \pi(\operatorname{supp}\mu)$ and for r and t such that $x = \gamma_r(t)$, it follows that df has only Lipschitz regularity in this example. This shows that the version of Theorem 2.12 for measures with boundary ∂µ cannot be improved, and suggests that the same is true for closed measures.
Example 5 (More irregularity in the wild). This example was suggested to the author by Marie-Claude Arnaud.
In [9, §4.2], a Riemannian metric is constructed on the 2-dimensional torus $T^2 = \mathbb{R}^2/\mathbb{Z}^2$ that is hyperbolic except inside a small disc $D \subset T^2$. A method is then described to find a measure µ that minimizes the action and does not correspond to a closed orbit because it has irrational homology. The measure µ is invariant under the geodesic flow of $T^2$.
The theory developed in [2] can be adapted to analyze the regularity of supp µ. That theory concerns maps on the annulus $S^1 \times \mathbb{R}$. To adapt it, take a smooth circle β ⊂ M transversal to the restriction of the geodesic flow to supp µ, and consider the map φ : Tβ → Tβ determined by the first-return map of that flow.
What the theory of [2] tells us is that, in this case, $x \mapsto df_x$ cannot be the restriction of a $C^1$ section of $T^*T^2$. Already from Mather's theory [24] (or from Theorem 2.12) we know that it must be Lipschitz, and the question remains whether its regularity lies somewhere in between.

Example 6 (Young Superposition Principle). Let G(c) be the set of generalized curves connecting $\operatorname{supp} c^-$ and $\operatorname{supp} c^+$, namely, the set of compactly supported, positive Radon measures µ on TM for which there are some T > 0 and a Lipschitz curve $\gamma : [0,T] \to M$ with $\gamma(0) \in \operatorname{supp} c^-$ and $\gamma(T) \in \operatorname{supp} c^+$, such that for all $x \in \pi(\operatorname{supp}\mu)$ there is $t \in [0,T]$ with $\gamma(t) = x$, and for almost every $t \in [0,T]$,
$$\int_{T_{\gamma(t)}M} v\, d\mu_{\gamma(t)}(v) = \gamma'(t),$$
where $\mu = \int_M \mu_x\, d(\pi_*\mu)(x)$ is the canonical fiberwise disintegration of µ. We take G(0) to be the set of measures of the form just described with γ(0) = γ(T).
Let C(c) be the set of measures $\mu_\gamma$ induced by $C^\infty$ curves $\gamma : [0,T] \to M$ with $\gamma(0) \in \operatorname{supp} c^-$ and $\gamma(T) \in \operatorname{supp} c^+$, obtained by pushing forward the Lebesgue measure $\mathrm{Leb}_{[0,T]}$ into TM with the derivative of γ, that is, $\mu_\gamma = (\gamma, \gamma')_*\mathrm{Leb}_{[0,T]}$. Again, we take C(0) to be the set of measures of the form just described with γ(0) = γ(T).
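To make the pushforward $\mu_\gamma = (\gamma, \gamma')_*\mathrm{Leb}_{[0,T]}$ concrete, here is a minimal Python sketch under simplifying assumptions. It takes M = R² (so TM = R² × R²) and approximates the measure by equally weighted atoms at the midpoints of a uniform partition of [0, T]. The helper names `pushforward_atoms` and `integrate`, the curve, and the test function are hypothetical choices. Integrating the 1-form df against $\mu_\gamma$ recovers f(γ(T)) − f(γ(0)), which is the sense in which $\partial\mu_\gamma = \delta_{\gamma(T)} - \delta_{\gamma(0)}$.

```python
import math

def pushforward_atoms(gamma, dgamma, T, n=10_000):
    """Approximate mu_gamma = (gamma, gamma')_* Leb_[0,T] by n atoms of mass T/n.

    The atoms sit at (gamma(t_k), gamma'(t_k)), where t_k are the midpoints of a
    uniform partition of [0, T] (midpoint rule).
    """
    dt = T / n
    return [((gamma(t), dgamma(t)), dt) for t in ((k + 0.5) * dt for k in range(n))]

def integrate(atoms, phi):
    """Integral of phi(x, v) with respect to the discrete measure `atoms`."""
    return sum(w * phi(x, v) for (x, v), w in atoms)

# Illustrative curve in M = R^2 and test function f.
gamma = lambda t: (math.cos(t), t * t)
dgamma = lambda t: (-math.sin(t), 2 * t)
f = lambda p: p[0] ** 2 + math.sin(p[1])
df = lambda x, v: 2 * x[0] * v[0] + math.cos(x[1]) * v[1]  # df_x(v)

atoms = pushforward_atoms(gamma, dgamma, 1.5)
print(integrate(atoms, df), f(gamma(1.5)) - f(gamma(0.0)))  # nearly equal
```

The total mass of $\mu_\gamma$ is T, and the midpoint rule makes the error in the boundary identity of order $1/n^2$.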
Here we show how to use Theorem 2.2 to show This is known as the Young Superposition Principle [6, §5] since it first appeared in Young's book [29].
Proof. To prove these statements, observe that the set G(c) is closed and that C(c) is dense in it, so the second statement implies the first. To prove the second statement, we use the fact that a closed convex cone K in the space E′ of compactly supported distributions is completely determined by the set K′ of continuous linear functionals that are nonnegative on it; see Remark 2.7. Let A be the set of positive, compactly supported Radon measures µ on TM whose boundary is a signed measure $\nu = \nu^+ - \nu^-$ on M with $\operatorname{supp}\nu^\pm \subset \operatorname{supp} c^\pm$. By Theorem 2.2, A is the subset of E′ consisting of the distributions µ with $\langle \mu, L\rangle \ge 0$ for all $L \in C^\infty(TM)$ of the form supp c − f i ≥ 0 and g i ≥ 0. Similarly, Theorem 2.2 shows that the functionals that are nonnegative on the set $\operatorname{conv}_{\mathbb{R}_{\ge 0}}(C(c) \cup C(0))$ are exactly the same ones. Observe that the set H(c) is the (closed and convex) subset of A consisting of the measures µ with ∂µ = c. Taking the intersection in the right-hand side of (17), we obtain the equality.

For each m of this kind, there exists a Lagrangian L : TM → R that is $C^2$, convex and superquadratic on the fibers, with L ≥ m and L(x, av) = m(x, av) for exactly one a ≠ 0 for each (x, v) with v ≠ 0. Applying Theorem 2.12 to this function L together with the minimizers µ, we obtain that the function $f(x) = \operatorname{dist}_m(X, x)$ is in fact of class $C^{1,1}_{\mathrm{loc}}$ throughout A. This extends some of the results of [21,22] to the non-strictly convex case.
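The duality fact used in the proof of the Young Superposition Principle, namely that a closed convex cone is determined by the continuous linear functionals that are nonnegative on it, has a transparent finite-dimensional analog. The Python sketch below is only a toy illustration in R², whereas the proof works in the infinite-dimensional space E′. For a cone K generated by two independent vectors, it checks that membership via nonnegative coefficients agrees with membership via the generators of the dual cone K′. All function names are hypothetical.

```python
from fractions import Fraction

def in_cone(a, b, x):
    """Primal test: is x = s*a + t*b with s, t >= 0? (a, b independent integer vectors)."""
    det = a[0] * b[1] - a[1] * b[0]
    s = Fraction(x[0] * b[1] - x[1] * b[0], det)
    t = Fraction(a[0] * x[1] - a[1] * x[0], det)
    return s >= 0 and t >= 0

def dual_generators(a, b):
    """Generators of K' = {y : <y, k> >= 0 for all k in K}, where K = cone{a, b}."""
    sign = 1 if a[0] * b[1] - a[1] * b[0] > 0 else -1
    y1 = (sign * b[1], -sign * b[0])  # orthogonal to b, positive on a
    y2 = (-sign * a[1], sign * a[0])  # orthogonal to a, positive on b
    return y1, y2

def in_cone_dual(a, b, x):
    """Dual test: x is nonnegative against the generators of K' (hence against all of K')."""
    return all(y[0] * x[0] + y[1] * x[1] >= 0 for y in dual_generators(a, b))

print(in_cone((1, 0), (1, 2), (2, 1)), in_cone_dual((1, 0), (1, 2), (2, 1)))  # True True
```

Exact rational arithmetic avoids sign errors on the boundary rays. For these generators the two tests coincide identically, since pairing x with the dual generators returns the primal coefficients multiplied by |det|.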
Example 8 (Finsler-like optimal mass transport). Let m : TM → R be a function that is homogeneous of degree 1 in the fibers, meaning that m(x, av) = a m(x, v) for all a > 0, and assume that m is $C^2$ away from the zero section. Let $\nu_1$ and $\nu_2$ be two positive Radon measures on M, and consider the following optimal mass transport problem: minimize the cost $\int_{TM} m(x,v)\, d\mu$ among all measures µ on TM whose induced current has boundary $\nu_2 - \nu_1$. It follows from Theorem 2.12 that, if a measure µ on TM solves this problem, then m must be of the form m = df + g for some Lipschitz function f : M → R that is $C^{1,1}_{\mathrm{loc}}$ on $\pi(\operatorname{supp}\mu) \setminus (\operatorname{supp}\nu_1 \cup \operatorname{supp}\nu_2)$ and some measurable, nonnegative function $g : TM \to \mathbb{R}_{\ge 0}$ that vanishes identically on $\operatorname{supp}\mu \setminus \pi^{-1}(\operatorname{supp}\nu_1 \cup \operatorname{supp}\nu_2)$.
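A one-dimensional toy case shows how the decomposition m = df + g certifies a minimizer. The Python sketch below assumes M = R, m(x, v) = |v|, $\nu_1 = \delta_0$, $\nu_2 = \delta_1$, f(x) = x and g = m − df = |v| − v, and the helper `path_integral` is hypothetical. Under these assumptions the cost of any path from 0 to 1 splits as f(1) − f(0) = 1 plus the integral of g ≥ 0. Paths along which g vanishes, that is, monotone ones, are therefore minimizers, while a path that backtracks pays exactly the extra integral of g.

```python
def path_integral(xs, phi):
    """Integral of phi(x, v) against the measure induced by the polygonal path xs.

    Each segment is traversed in unit time, so it contributes phi(midpoint, increment);
    the midpoint only matters for x-dependent phi. For phi that is 1-homogeneous in v,
    the result does not depend on the time parametrization.
    """
    return sum(phi((p + q) / 2, q - p) for p, q in zip(xs, xs[1:]))

m = lambda x, v: abs(v)          # cost, 1-homogeneous in v
df = lambda x, v: v              # df_x(v) for f(x) = x
g = lambda x, v: abs(v) - v      # g = m - df >= 0, vanishing when v >= 0

monotone = [0.0, 0.25, 0.5, 1.0]  # moves mass from 0 to 1 without backtracking
overshoot = [0.0, 1.5, 1.0]       # also ends at 1, but backtracks

for path in (monotone, overshoot):
    # cost = (f(end) - f(start)) + integral of g
    print(path_integral(path, m), path_integral(path, df), path_integral(path, g))
# prints 1.0 1.0 0.0, then 2.0 1.0 1.0
```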
To apply Theorem 2.12 in this case, first note that, owing to the 1-homogeneity of m, we may assume that the minimizers are supported on the unit spheres of the fibers of TM. Then take the function L : TM → R to be $C^2$, equal to m outside a neighborhood U of the zero section that stays away from these unit spheres, and slightly greater than m in U.
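One concrete way to carry out this smoothing is sketched in Python for the simplest case M = R with m(x, v) = |v|. This is an illustrative assumption, and it reads U as the neighborhood {|v| < δ} of the zero section with δ < 1, so that U misses the unit sphere. Inside U the sketch replaces |v| by an even quartic matched to |v| up to second order at |v| = δ. The name `smoothed_abs` is hypothetical.

```python
def smoothed_abs(v, delta=0.5):
    """C^2 convex function equal to |v| for |v| >= delta and > |v| for |v| < delta.

    On (-delta, delta) it is p(v) = 3*delta/8 + 3*v**2/(4*delta) - v**4/(8*delta**3),
    whose coefficients give p(delta) = delta, p'(delta) = 1 and p''(delta) = 0.
    """
    if abs(v) >= delta:
        return abs(v)
    return 3 * delta / 8 + 3 * v * v / (4 * delta) - v ** 4 / (8 * delta ** 3)

grid = [k / 1000 for k in range(-2000, 2001)]
print(min(smoothed_abs(v) - abs(v) for v in grid))  # >= 0: never below m
print(smoothed_abs(1.0))  # 1.0: unchanged on the unit sphere
```

Since p'' = 3/(2δ) − 3v²/(2δ³) ≥ 0 on [−δ, δ], the glued function is convex. Minimizers supported on {|v| = 1} see no change, so they still minimize the action of the smoothed Lagrangian.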
The argument sketched above only gives information about what happens away from $\operatorname{supp}\nu_1 \cup \operatorname{supp}\nu_2$. To overcome this issue, one can use the Young Superposition Principle (Example 6) to conclude that the result given by Theorem 2.12 must be valid for every subset of the generalized curves that compose the minimizer µ, with the same functions f and g, and note that (the support of) the boundary of each generalized curve consists of just two points of M.
This generalizes some of the results of [7]. We also remark that the result of [1] that, when $m^2$ is a Riemannian metric, the flow encoded by µ actually follows a gradient vector field can also be derived from the statements sketched above: that vector field is precisely the dual of df, namely ∇f.

4. Optimal control and the maximum principle.

To illustrate how the methods developed in Section 2 can also be used to understand problems of optimal control, we apply them to a slight relaxation of a problem discussed in [4, §III.3], which is known as a finite horizon Legendre problem. Although the theory could be developed in greater generality, we refrain from doing so here in the interest of simplicity.
The main result of this section is Theorem 4.3, which is analogous to Theorem 2.12. In Remark 4.4 its content is linked to the previously existing theory of optimal control, and we recover the Maximum Principle and the Hamilton-Jacobi-Bellman equation. Also, Theorem 4.2 is a coarse characterization of the minimizable functionals, analogous to Theorem 2.2.
The relaxation we will consider replaces curves with measures, in a way analogous to our treatment in Section 2. This is almost equivalent to the relaxation described in [4, §III.2.5 (pp. 113-116)], which entails the introduction of so-called relaxed or chattering controls. As in the statement of [4, Corollary III.2.21], this relaxation of the problem is not very significant, because the resulting value function should coincide with the one corresponding to the original finite horizon Legendre problem.
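The role of chattering controls can be seen in a toy example, sketched in Python under the assumptions y′ = a with control set A = {−1, +1}; neither choice is taken from [4]. No single control produces the velocity 0. Alternating between the two controls on n equal subintervals of $[0, t_0]$, however, yields trajectories whose distance from y ≡ 0 is $t_0/n$. In the limit they are described by the relaxed control that assigns mass 1/2 to each of ±1. The function name is hypothetical.

```python
def chattering_sup(n_switches, t0=1.0):
    """Sup of |y| on [0, t0] for y' = a(t), y(0) = 0, where a(t) alternates between
    +1 and -1 on n_switches equal subintervals.

    y is piecewise linear, so its sup is attained at a switching time and it
    suffices to track y at the breakpoints.
    """
    h = t0 / n_switches
    y, sup = 0.0, 0.0
    for k in range(n_switches):
        a = 1.0 if k % 2 == 0 else -1.0
        y += a * h  # exact: y is linear on each subinterval
        sup = max(sup, abs(y))
    return sup

print(chattering_sup(10), chattering_sup(1000))  # 0.1 0.001
```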
We will try to keep the same notation as in the book, although we will immediately translate to the setting that interests us. Equations (19) and (20) below draw the connection between the approach taken here and the optimal transport theory described in [6,8]. Thus, in the present context it is also possible to think of the minimizer as describing a sort of transport plan, except that the velocities must be obtained through the controls. Still, a version of the Young Superposition Principle (Example 6) holds in this context, so the measure µ can be understood as encoding a set of generalized curves describing how the component of ∂µ at t = 0 is transported to its component at $t = t_0$. Also, the Lipschitz regularity results of Theorem 4.3 generalize those obtained in [8].
4.1. Setting. We fix a time interval $I = [0, t_0]$, $t_0 > 0$. We let $t : I \to \mathbb{R}$ be a chart, and we denote by $\mathbf{1}$ the vector field tangent to I such that $dt(\mathbf{1}) = 1$.
We let A be a topological space serving as the set of controls, and we let N > 0 denote the dimension of the space $\mathbb{R}^N$ of states.
We assume that we are given a continuous function $f : \mathbb{R}^N \times A \to \mathbb{R}^N$ that gives the dynamics; thus, if we had a curve $y : I \to \mathbb{R}^N$ in the space of states corresponding to a control $\alpha : I \to A$, then y would satisfy $y'(t) = f(y(t), \alpha(t))$, $t \in I$.
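However, instead of considering such curves, we will consider the set $\mathcal{I}$ of compactly supported Radon probability measures ν on $\mathbb{R}^N \times I \times A$ satisfying the condition
$$\int du_{(x,t)}\big(f(x,a), \mathbf{1}\big)\, d\nu(x,t,a) = 0 \quad \text{for all } u \in C^\infty(\mathbb{R}^N \times I) \text{ with } u \equiv 0 \text{ on } \mathbb{R}^N \times \partial I.$$
This amounts to requiring ∂ν to be supported at times 0 and $t_0$. We remark that in the case of the curve y above, this condition would take the form where we assume that the running cost $\ell : \mathbb{R}^N \times I \times A \to \mathbb{R}$ is smooth in the $\mathbb{R}^N$ and I variables x and t.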
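Along a genuine trajectory, pairing the induced measure (the pushforward of Lebesgue measure under $t \mapsto (y(t), t, \alpha(t))$) with du(f, 1) telescopes to $u(y(t_0), t_0) - u(y(0), 0)$. This vanishes whenever u vanishes at t = 0 and $t = t_0$, which is what it means for the boundary to be supported at those two times. The Python sketch below checks this numerically under illustrative assumptions, none of which come from the text:

* a scalar state (N = 1);
* the dynamics f(y, a) = −y + a with control α(t) = sin t;
* the test function $u(y, t) = t(t_0 - t)y^2$;
* an unnormalized measure (no division by $t_0$);
* forward Euler integration.

The function name is hypothetical.

```python
import math

def pairing_along_trajectory(f, alpha, y0, t0, du, n=100_000):
    """Follow y' = f(y, alpha(t)), y(0) = y0 (scalar state) by forward Euler.

    Returns (S, y_end), where S = sum_k h * du_(y_k, t_k)(f(y_k, alpha(t_k)), 1)
    approximates the pairing of the trajectory's (unnormalized) measure with du(f, 1).
    du(y, t) must return the pair (du/dy, du/dt).
    """
    h = t0 / n
    y, total = y0, 0.0
    for k in range(n):
        t = k * h
        v = f(y, alpha(t))
        uy, ut = du(y, t)
        total += h * (uy * v + ut)  # du applied to the vector (f, 1)
        y += h * v                  # Euler step
    return total, y

t0 = 2.0
dyn = lambda y, a: -y + a                                         # sample dynamics
u_du = lambda y, t: (2 * t * (t0 - t) * y, (t0 - 2 * t) * y * y)  # u = t(t0 - t) y^2
S, y_end = pairing_along_trajectory(dyn, math.sin, 1.0, t0, u_du)
print(S)  # close to 0, since u vanishes at t = 0 and t = t0
```

With a test function that does not vanish at the endpoints, such as u(y, t) = y, the same sum instead returns y(t₀) − y(0), which is generally nonzero.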

Coarse characterization.
Theorem 4.2. In the setting just described, assume additionally that the cost J reaches its minimum within $\mathcal{I}$ at the probability measure µ. Then there exist sequences of functions $u_i \in C^\infty(\mathbb{R}^N \times I)$ such that $u_i \equiv 0$ on $\mathbb{R}^N \times \partial I$, and where the limit is taken in the space of continuous functions $C^0(\mathbb{R}^N \times I \times A)$ with the topology of uniform convergence on compact sets, and Conversely, if ℓ has this structure, J reaches its minimum within $\mathcal{I}$ at µ.
Sketch of proof of Theorem 4.2. One can prove the theorem by following essentially the same ideas as for Theorem 2.2. A lemma analogous to Lemma 2.5 holds in this setting, with the definitions
$$Q = \{\, l : C^0(\mathbb{R}^N \times I \times A) \to \mathbb{R} \mid l \text{ is affine, its linear part is induced by evaluation at an element of } C^0(\mathbb{R}^N \times I \times A), \text{ and } l(\nu) \ge 0 \text{ for all } \nu \in \mathcal{I} \,\},$$
$$R = \{\, l : C^0(\mathbb{R}^N \times I \times A) \to \mathbb{R} \mid l(\nu) \ge \langle \nu, du \circ (f, \mathbf{1}) \rangle \text{ for some } u \in C^\infty(\mathbb{R}^N \times I) \text{ vanishing on } \mathbb{R}^N \times \partial I \text{ and for all } \nu \in \mathcal{I} \,\},$$
and with the same conclusion that R = Q; the rest of the argument can then be adapted easily.

4.4. Lipschitzity. The partial equivalence between the minimization of J and the minimization of the action of L, together with the results we have obtained for the latter and their proofs, suggests that, in order to obtain results on the regularity of the value function (defined in (22) and discussed in more depth below), assumptions must be made that first ensure the regularity of the fiberwise convexification of L (defined in (20)).
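For intuition about the fiberwise convexification, here is a minimal Python sketch. It assumes that the convexification is the convex envelope in the velocity variable for each fixed base point; the precise definition is the one in (20). The sketch computes the lower convex envelope of a function sampled on a one-dimensional fiber with a monotone-chain lower hull. The double-well sample L(v) = (v² − 1)² and the function name are illustrative assumptions. The convex envelope of this sample vanishes on [−1, 1] and agrees with L for |v| ≥ 1. It is $C^{1,1}$ but not $C^2$ at v = ±1, which is one way in which convexification can cost regularity.

```python
def lower_convex_envelope(vs, ls):
    """Values at vs of the largest convex function lying below the points (vs[i], ls[i]).

    vs must be strictly increasing. The lower hull is built by a monotone-chain scan,
    and the envelope is linear between consecutive hull vertices.
    """
    hull = []
    for i in range(len(vs)):
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            # Cross product of (k - j) and (i - j); <= 0 means k is on or above chord j-i.
            cross = (vs[k] - vs[j]) * (ls[i] - ls[j]) - (ls[k] - ls[j]) * (vs[i] - vs[j])
            if cross > 0:
                break
            hull.pop()
        hull.append(i)
    env = list(ls)
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            w = (vs[i] - vs[a]) / (vs[b] - vs[a])
            env[i] = (1 - w) * ls[a] + w * ls[b]
    return env

vs = [k / 100 for k in range(-200, 201)]
ls = [(v * v - 1) ** 2 for v in vs]  # double well, nonconvex near v = 0
env = lower_convex_envelope(vs, ls)
print(env[200], env[0] == ls[0])     # 0.0 True: flattened at v = 0, unchanged at v = -2
```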
In this direction, we present the following result, whose technical-looking conditions are relatively mild; see Example 9.
Then there exist a Lipschitz function $u : U \times I \to \mathbb{R}$, a nonnegative function $w : U \times I \times A \to \mathbb{R}_{\ge 0}$, and a (possibly discontinuous) bounded section $\beta : U \times I \to T^*U$ such that

his patience in listening to sometimes very confused presentations of these results, and for his help in clarifying my ideas with numerous questions and suggestions. I am deeply grateful to Marie-Claude Arnaud, Victor Bangert, Jaime Bustillo, Albert Fathi, Uwe Helmke, and Stefan Suhr for their numerous suggestions and discussions. I am very grateful to the École Normale Supérieure de Paris and the Université de Paris-Dauphine for their hospitality and support for the development of this research. Finally, I am very grateful to the anonymous referee for suggesting interesting connections to parts of the theory that I was unaware of, and for providing other helpful comments.
List of changes.
• The main issue addressed by this correction was an error in the proof of Theorem 2.2. The proof is now corrected; to this end, the statement and proof of Lemma 2.5 have been adapted considerably, and the proof of Theorem 2.2 has also changed. The main idea, however, remains the same.