A convergent Lagrangian discretization for $p$-Wasserstein and flux-limited diffusion equations

We study a Lagrangian numerical scheme for solution of a nonlinear drift diffusion equation of the form $\partial_t u = \partial_x(u \cdot c[\partial_x(h^\prime(u)+v)])$ on an interval. This scheme will consist of a spatio-temporal discretization founded in the formulation of the equation in terms of inverse distribution functions. It is based on the gradient flow structure of the equation with respect to optimal transport distances for a family of costs that are in some sense $p$-Wasserstein like. Additionally we will show that, under a regularity assumption on the initial data, this also includes a family of discontinuous, flux-limiting cost. We show that this discretization inherits various properties from the continuous flow, like entropy monotonicity, mass preservation, a minimum/maximum principle and flux-limitation in the case of the corresponding cost. Convergence in the limit of vanishing mesh size will be proven as the main result. Finally we will present numerical experiments including a numerical convergence analysis.


Introduction
In this paper, we want to study a spatio-temporal discretization of a family of parabolic equations, We will consider external potentials v ∈ C 2 ([a, b]) which will be convex and internal energy potentials h m : [0, ∞) → R which will either be the Boltzmann entropy h 1 (s) = s log(s) + s (where we take 0 log(0) = 0) or the Rényi entropy h m (s) = 1 m−1 s m . c * will denotes the Legendre-transform of c i.e. c * (s) := sup r∈R (sr − c(r)). We consider a family of functions for c, that can be called "p-Wasserstein-like" cost functions. A cost function is called a p-Wasserstein cost function (or Monge-Kantorovich cost of order p) if it has the form s → c(s) = 1 p |s| p (c.f. [26]).
Let p ∈ (1, ∞). To be a "p-Wasserstein-like" cost function, c has to have the following properties. (1, ∞), which induces the p-Wasserstein distance and lends its name to this family of cost functions. We present two PDEs arising from certain choices of parameters.
• Let q ∈ N with q > 1. Then choose p ∈ (1, 2) such that 2−p p−1 = q and pick m = 3 − p > 1. For any constant external potential, the equation becomes the q-Laplace equation The dynamic of such a q-Laplace equation can be seen in Fig. 5. • Let p = 2. Then the equation has the form of a Fokker-Plank equation . If we chose v = const., then the above properties of c will suffice to show the claims of this paper. If v is not a constant, however, we require c to be 2-Wasserstein like. For example, if p ≥ 2, c (0) = 0 is implied which leads to (c * ) (r) = (c ) −1 (r) not being Lipschitz any more at r = 0. Therefore we cannot guarantee well posedness of our equation.
Assuming some regularity of the initial data, another interesting family of cost functions is included in our calculations. Let ∂ x u 0 (x), the weak spatial derivative of our initial data, be bounded. We will consider a family of cost functions, called "flux-limiting", satisfying (A.1) and the following properties: (Af.1) c −1 ([0, ∞)) = [−γ, γ] for some γ > 0, so it has to have a symmetric compact proper domain. (Af.2) c ∈ C 2 ((−γ, γ)). where γ > 0 (c.f. [20]). In these cases and with m = 1 our equation becomes Roseanu's tempered diffusion equation introduced in [22]: 1.1. Gradient flow structure. The equation (1.1) can be written as a transport equation where the velocity V itself depends on the solution To arrive at the in-time discretization of our problem we will make use of the following fact (c.f. [1]). Solutions to our equation form a gradient flow in the energy landscape of with respect to the metric induced by the optimal transport distance with cost c. Note that we will write L for the Lebesgue measure.

Inverse Distribution Functions.
We will carry out our analysis not on the densities u but much rather on the inverse distribution functions X of u.
Recall that for a probability measure ρ ∈ P(I) with density u such that u + 1 u ∈ L ∞ (I), the (cummulative) distribution function is given as U u (x) := x a u(ζ) dζ and the corresponding inverse distribution function is given by X u = U −1 u . Furthermore note that in this case, X is a.e. differentiable with ∂ ξ X = 1 u•X . This shift from u to X u allows us to pass from the set P(I) equipped with the optimal transport metric W c (u, w) to the space L p ([0, 1]) equipped with W c (X u , X w ) (defined in the Lemma below) which is equivalent to the L p -distance of X u to X w as we can see by (A.3) which tells us that This shift is rigorously justified by the following Lemma that is an adaption of [25, Thrm. 2.17.]: . Then the optimal transport distance W c,τ (u, w) can be expressed in terms of IDFs X u , X w as follows So instead of gradient flows in the energy landscape of (1.7) w.r.t. the optimal transport distance, we will consider the corresponding gradient flows of the inverse distribution functions in the energy landscape of where h m (s) = sh m (1/s). This corresponds to (1.7) in the sense that H m (u) = H m (X u ) which can be checked easily when using that X u pushes the measure ρ forward to the Lebesgue measure on [0, 1]. Instead of equation (1.1), we will consider which is the corresponding IDF-version of the (1.1). The formal correspondence between (1.10) and (1.1) can be seen easily, when one uses the push-forward connection between u and X u which translates, since X u is differentiable a.e., to u(x) = (∂ ξ X u (U u (x))) −1 .
1.3. Discretization. We will take care of the in-time discretization first. To that end, we will make use of the celebrated JKO-scheme, also known as the minimizing movement scheme (c.f. [14]). Expressed in our case, we will approximate a solution of the equation with X : by the piecewise constant in time interpolation The sequence X (n) itself is given by the recursion where the minimization takes place on the set of inverse distribution functions and X (0) is an approximation to the IDF of the initial data.
The discretization in space will be a simple restriction to piecewise constant IDFs on a uniform grid (0, 1 k , 2 k , . . . , 1) on [0, 1] and the values x i of such an X on the grid will be encoded in a vector x = (a = x 0 , x 1 , . . . , x k = b). This way we receive by Since an interpretation of the IDF is, that for ξ 1 < ξ 2 the value ξ 2 − ξ 1 prescribes how much mass u has on the interval [X(ξ 1 ), X(ξ 2 )], we recover our piecewise constant u from the vector x as Finally we define a replacement for the spatial derivative of X, which will be the piecewise constant function of finite forward differences given by the abbreviation δx i := xi+1−xi A short explanation why we are dealing with piecewise constant IDF instead of the picewise affine linear ones, which would allow for a easier way to deal with the "∂ ξ X" appearing in H m , is in order. The main reason lies in the technical difficulties arising in the calculations to follow. The Euler-Lagrange equation will relate the approximate IDF X, which would be piecewise affine linear, and its derivatives, possibly of higher order, which would be piecewise constant at best. To bring these different forms together would require an unnecessary amount of technicalities that would all but divert from the results we want to discuss in this paper.
With this spatial discretization, we arrive at our fully discrete functionals that will be subject to minimization in each time-step.
The distance, the integral incorporating the external potential and the internal energy functional will take on the discrete form is the approximate energy functional applied to our vector x.
1.4. Approximate solution and main theorem. This leaves us with a recursive algorithm to receive a sequence x (n) from which we will recover an approximate solution to our problem (1.1)-(1.3).
Algorithm. Let k ∈ N and T, τ k > 0 such that T /τ k ∈ N. Let x (0) k ∈ X k (I) be some initial data. Then solve for n = 1, 2, . . . , T /τ k the minimization problem (1.13) We will abbreviate the functional that is minimized in each step as Φ(τ k ; x (n−1) , x). The algorithm is well defined, as will be shown in the succeeding section.
This way we receive a sequence x (n) k of vectors inducing IDFs by (1.11) and densities by (1.12). Indeed we will not show, that these approximate densities converge to a solution of our PDE (1.1)-(1.3) directly, but show that the approximate IDFs define a sequence of functions converging to a solution of the PDE in terms of IDFs.
Let us introduce in short the approximate IDFs first. Let T = N k τ k . Then the piecewise constant in time and space functions X k are defined as (1.14) A more detailed introduction of the IDFs and other auxiliary functions is given in Definition 15.
To simplify the expressions appearing in the following claims and proofs, we have already abbreviate the finite forward difference quotient of our vectors x (n) as δx Furthermore we now define the second symmetric difference quotient for a vector x ∈ X k ([a, b]) as This way δ[δx i−1 ] = δ 2 x i holds. Note that we will denote with p the Hölder conjugate p = 1 1−1/p . k are bound from above and below by δx (0) and δx (0) respectively. Then the sequence X k , defined by (1.14), has the following properties. There is an unrelabled subsequence such that and additionally ∂ ξ X * ∈ L 1 (0, T ; W 1,p ((0, 1))). (2) The limit X * solves the following weak formulation of (1.10): The initial data are assumed continuously lim t 0 X * (t) = X (0) in L 1 ([0, 1]).
As already announced, if we assume more regularity of the initial data, the above claim also holds for the flux-limiting cost.
Theorem 3. Let the prerequisites of Theorem 2 hold, v = const. and c is not p-Wassersteinlike but flux-limiting. Additionally we assume finite bounds δ 2 x (0) , δ 2 x (0) , such that the symmetric second difference quotient δ 2 x (0) k are bound from above and below by δ 2 x (0) and δ 2 x (0) respectively. Then convergence to a weak solution of (1.10) holds in the sense of Theorem 2 with p = 2 and the initial data are again assumed continuously.
1.5. Related results from the literature. The idea to use a Lagrangian scheme for a problem of the form (1.1) can be traced back to papers by MacCamy and Socolovski [17], where m = 2 and v = 0 and Russo [24], where h m (s) = s and again v = 0. The former paper presents a discretization for the densities u that is very similar to ours, the latter comparing Lagrangian discretizations and discussing generalizations to the two-dimensional case. In [4] a moving mesh is used to capture numerically self similar solutions of the porous medium equation. In [7] aggregation equations are studied in terms of a Lagrangian scheme basing on evolving diffeomorphisms.
The gradient flow structure of our equation was investigated by Agueh for p-Wasserstein cost [1] and by McCann and Puel for flux-limiting cost [20] both showing convergence of the minimizing movement scheme to a solution of the equation (1.1).
The connection between Lagrangian schemes and the gradient ow structure was investigated by Kinderlehrer and Walkington [15] and in a series of unpublished theses [21], [16]. Westdickenberg and Wilkening obtain in [27] a similar scheme as a by-product in the process of designing a structure preserving discretization for the Euler equations. Burger et al [5] devise a numerical scheme in dimension two on basis of the gradient ow structure, using the dynamic formulation of the Wasserstein distance [3] instead of the Lagrangian approach. The Lagrangian approach was adapted to fourth order equations, namely by Cavalli and Naldi [8] for the Hele-Shaw ow, and by Dring et al [10] for the DLSS equation.
The spatial discretization we will be analyzing was proposed by Laurent Gosse and Giuseppe Toscani and analysed in [13]. It is worth noting that therein the approach taken in our paper, to solve the equation in terms of IDFs, was taken as well, as was in [18] to analyse a aggregationadvection-diffusion equation. In [19] convergence for a family of energies including a potential energy and more general h m in the special case quadratic Wasserstein distance.
We want to mention another numerical scheme close to the one examined in this paper, but with the possibility for generalization to higher dimensions. Instead of decomposing the transport map G, we could have discretized the diffeomorphism G itself. This approach was recently thoroughly investigated by Carillo et al [6].
1.6. Outline. Proving Theorem 2 will basically consist of three steps.
First, we will show some properties about the sequence of minimizers x (n) k : it is well defined, it satisfies a maximum/minimum principle, we will give a discrete Euler-Lagrange equation corresponding to the minimization problem and finally we will establish a priori estimates which gives us control over the right hand side of the Euler-Lagrange equation.
Then we will prove the main convergence results, especially the strong convergence of δ ξ X k , the weak compactness of the sequence of functions V k corresponding to the r.h.s. of the Euler-Lagrange equation and consequently of δ Another convergence result that we have to established is the limit of the Euler-Lagrange equation in terms of functions. However, since (c * ) is not linear, we cannot identify the limit of the r.h.s. V k right away to receive the equation that is satisfied by X * .
This final step of identifying the limit will be done by using a monotonicity argument, the so called Browder Minty trick. After the identification of the non-linear limit, we have shown the convergence of our sequence X k to a limit X * that satisfies the weak formulation of the PDE in terms of IDF. We will combine this with the in-time regularity of X * , which is Hölder-continuous on [0, T ] for α = 1/p, to receive continuity at t = 0.
Concerning Theorem 3, the main reason we can apply the arguments for the p-Wasserstein cost to the flux-limiting cost is a second maximum-/minimum principle, this time for δ 2 x (n) , which shows that if we start with initial data that are regular enough, then our minimization problem in some sense does not see the discontinuity of our cost and the cost can be treated as 2-Wasserstein like. We will note the key parts of the proofs where a difference has to be made if one is working with flux-limiting cost.
In the last part of the paper, we give some numerical examples to illustrate the dynamics of the p-Wasserstein diffusion and flux-limiting diffusion. Additionally there will be an approximate solution to a parabolic q-Laplace equation and some numerical convergence analysis.

Properties of the minimization problem
In this section we want to lay the foundation for the succeeding section, which will show the crucial convergences that are needed for the main theorem. First of all, we will show that the sequences x (n) are indeed well defined. This is followed by the Euler-Lagrange equation for the minimization. Additionally we will show two estimates.

Existence and uniqueness.
Lemma 4. The minimization problem (1.13) has a unique minimizer Additionally, the functional is strictly convex, since both summands are and τ > 0, so the minimizer is unique. Note that h m is monotonously decreasing. This yields the following inequalities with Now that we know that the minimization is well defined, we can establish the following result.

Proof.
Plugging in the feasible x = x (n−1) in the minimization problem combining it with x (n) being a minimizer and with W c,τ k (x (n−1) , x (n−1) ) = 0 yields the result after rearranging.

The Euler-Lagrange equation.
Formulating the Euler-Lagrange equation n the case of p-Wasserstein cost is straight forward. In the case of flux-limiting cost, however, we have to make sure the minimization problem in some sense does not see the discontinuity.
Lemma 6. In the case of flux-limiting cost, the minimizer x (n) lies in Define a partial convex combination of x (n) and x (n−1) w.r.t. P as We will now show that if P = ∅, then for a suitable ε the partial convex combination x ε is a feasible candidate with Φ(τ, ) contradicting x (n) being a minimizer in the first place. We note that Recall c, h m and v are strictly convex C 1 -functions, which allows us to calculate x ε ) will be positive, which is our sought for contradiction.
. Let x (n) be the corresponding minimizer in (1.13). Then it satisfies the system of equations for each i = 1, . . . , k − 1. Note that the above equation reduces to Remark 8. We will abbreviate the argument of (c * ) above: We receive (2.3) as the first order optimality conditions, using (c * ) = (c ) −1 .
2.3. The discrete minimum/maximum principle. We will prove the first discrete minimum/maximum principle next. It bounds the forward difference quotient of x (n) k uniformly from above and away from zero, if we have initial data as described in Theorem 2 (4). This initial condition corresponds to u 0 being bound from above and away from zero.
, v and c and is given by Remark 10. In the second case, there is still a uniform lower bound on δx Proof. The proof will consist in both cases of considering an index i such that δx for all j ∈ {0, . . . , k − 1} (or ≤ for the bound from below) and then showing that the r.h.s. of the Euler-Lagrange equation is non-positive for that i, implying that δx For the bound from below in the second claim the calculations will be a little more involved and we can only show (1 + C)δx which then results in the special form of the lower bound in that case.
In any of the cases, the first step will consist of the use of Taylor's theorem on (c * ) in the r.h.s. of the Euler-Lagrange equation, after applying the forward difference δ to the equation, to receive (1) We will show representatively the bound from above. The bound from below can be obtained by nearly the same calculations. So let i ∈ {0, . . . , k − 1} be an index such that δx Since h m is monotonously increasing, h m (δx With this inequality at hand we can conclude Additionally, since (c * ) is monotone and since (A.V.) holds we can furthermore bound Indeed by c ≥ c > 0 the factor (c * ) (ζ) can be bound from above, independently of ζ as follows.
and we have shown the claim. Let now, on the other hand, δx Then, by an analogous argument as above,

This inequality implies by
where we abbreviated κ := v c . This proves the lower bound. We will show the second minimum/maximum principle next. Its claim, expressed in terms of the probability density u, is as follows. Assume we start with initial data u 0 that are bounded from above and away from zero and the weak derivative of u 0 is bounded. Then the weak spatial derivative of the solution u of the PDE (1.1) will be bounded as well.
Note that with initial data x (0) k bounded as in Theorem 3, this implies the bounds min{δ 2 i.e. for all i, n, k.
Proof. We will, again, exemplarily derive the upper bound and the lower one can be found by very similar calculations.
This proof is a little bit more elaborate than the proof of Lemma 9. We will again start by assuming for all j ∈ {1, . . . , k − 1} but this time we do not a priori know that δ 2 x (n) i is positive, which is the reason for the structure of the bounds. Additionally it does not appear explicitly on the right hand side of the Euler-Lagrange equation but is hidden in for all {1, . . . , k − 1} and lets furthermore assume δ 2 x (n) i > 0 since otherwise there is nothing left to show. Then we receive as the second symmetric difference quotient of the Euler-Lagrange equation next. To that end, note that our assumptions imply δx i−1 and consider the following two cases.
With a similar argument one can prove that which then implies the claim.
Proof. Recall δx . The equation (2.5) has to be equal to zero at a minimizer implying for i = 1, . . . , k − 1

Multiplying these equations by (x
) and summing them up gives us, after rearranging and an index-shift as well as using convexity of h m This shows the first claim. The second one follows directly when summing up the first one from n = m 1 + 1 to m 2 and minding the telescopic sum on the r.h.s. .
Corollary 13. The estimates (2.6) imply Hölder-type estimates. Let t 1 = m 1 τ k and t 2 = m 2 τ k . Then holds for some C depending only on m, H m (x (0) ) and c.

Now with
Note that these bounds do not depend on k.
Proof. The starting point of this proof will be the second inequality in (2.6). We will show an equation for the r.h.s. and consequently the upper bound will apply to a (n) i,k as well. Let T = N τ k . Then we receive when applying the Euler-Lagrange equation in the last step The expression above on the l.h.s. invites us to use the estimate from (A.3) α |s| p ≤ sc (s) ≤ β |s| p .

Convergences of the approximate solution
In this section we prove the convergence of the Euler-Lagrange equation in terms of functions and obtain a convergence result needed in the next section when it comes to identifying the limit of the Euler-Lagrange equation.
Before we can talk about convergences in terms of functions, we have to define our approximate IDF and other approximate functions.
If i is of a smaller index set, as with a (n) i,k for example, then the a (n) i,k is taken to be zero at the missing indices.
With this auxiliary mapping P at hand, we can define the sequences of functions we will deal with.
Finally the r.h.s. of the discrete Euler-Lagrange equation in Lemma 7 will receive a corresponding sequence of functions called discrete velocity, in reference to the role its corresponding part plays in the PDE (1.1). It will be denoted as V k : [0, T ] × [0, 1] → R and it is defined as t, ξ)) .
Remarks 17. Before we proceed some remarks are in order.
(1) X k is uniformly bounded in L q ([0, T ] × [0, 1]) for every q ∈ [1, ∞]. Indeed |X k (t, ξ)| ≤ max{|a| , |b|} by construction. The Hlder-type estimate from Lemma 13 implies a similar estimate for our X k : Furthermore, for each t X k (t, ·) it is a sequence of monotonously increasing functions and therefore converges, up to a subsequence pointwise L-a.e. to some monotonously increasing function (c.f. [2, Lemma 3.3.1]). By dominated convergence theorem, that subsequence also converges in L q (J) for every q ∈ [1, ∞) to a limit X * (t, ·). Lacking the pointwise convergence, it will be the aim of the following subsection to establish convergence of δ ξ X k in measure to allow us to apply the dominated convergence theorem to prove strong convergence of this sequence. Proof.
The above lemma makes sure that F is suitable. Now we show that δ ξ X k is tight w.r.t. F.
Proof. We begin with Var(δ ξ X k (nτ k , ·), (0, 1)) . Now note that for every t ∈ [0, T ] we have that X k (t, ·) : [0, 1] → R is a piecewise constant function and therefore its total variation can be calculated as the sum over the modulus of the jumps. This gives us

Now our goal is to utilize the estimates concerning a (n)
i,k from Proposition 14. To that end we see that by maximum-/minimum principle, positivity and monotony of h m we receive a D > 0 such that we have the lower bound Incorporating this estimate we arrive at −−−−−→Ŷ whereŶ ∈ W 1,p ((0, 1)) with ∂ ξŶ = Y . Now consider two sequences Y k , Z k converging in L p ([0, 1]) to Y and Z respectively. Then Finally we show the equiintegrability w.r.t. g.
Lemma 21. The sequence δ ξ X k is equiintegrable w.r.t. g, that is to say Proof. First we establish an estimate concerning I[δ ξ X k ] and X k . Let t ∈ [0, T − r] and j(ξ) = ξ/d k Together with the Hölder-estimate from Corollary 13, this yields, for some constant C Proof. This will be an application of [23,Thm. 2]. Note that t → δ ξ X k (t) is a sequence of L 1 ([0, 1])-valued functions. The functional F is normal and coercive in the sense of [23, (1.7 a-c)] as was shown in Lemma 18 and Lemma 20 shows that the substitute distance g is lower semicontinuous. Additionally, our sequence is tight w.r.t. F and it is equiintegrable w.r.t. g which was shown in Lemma 19 and Lemma 21 respectively.
We This shows that Theorem 2 of [23] is applicable and we receive a subsequence of δ ξ X k that converges in measure as a curve δ ξ X k : [0, T ] → L 1 ([0, 1]) to some limit curve Y .
Furthermore, since δ ξ X k is dominated on [0, T ] by the maximum-/minimum principle, we can enhance this result, possibly by passing to a subsubsequence, to strong convergence in L p ([0, T ], L p ([0, 1])).
Finally we have to establish a connection between the cluster points of δ ξ X k and X * the limit of X k . Corollary 23. Up to a subsequence δ ξ X k converges strongly w.r.t. L p ([0, T ]×(0, 1)) to ∂ ξ X * , h m (δ ξ X k ) to h m (∂ ξ X * ) and h m (δ ξ X k ) to h m (∂ ξ X * ).
Proof. The first convergence follows by by Appendix A.1 and the last two by the first convergence and Appendix A.2 .

The limit of the Euler-Lagrange equation.
The discrete Euler-Lagrange equation we received in Lemma 7 expressed in terms of approximate IDFs takes the form where we abbreviated the temporal backwards difference quotient δ τ k X k : We want to prove a convergence result for the sequence V k next, by transferring the bounds found in Proposition 14 to our newly defined sequences.
Lemma 24. The sequences V k , and consequently δ τ k X k , converge, up to a subsequence, weakly Proof. The result for δ τ k X k follows by (3.5), so we only have to show the claim for V k . We receive the uniform bound on the L p -norm by Proposition 14 as This shows that the sequence is uniformly bounded w.r.t. L p ([0, T ] × [0, 1])-norm which in turn implies, by Banach-Alaoglu, weak * -compactness and by equivalence of weak and weak * convergence in L p ([0, T ] × [0, 1]) we receive the claim.
Lemma 25. In the p-Wasserstein case, up to a subsequence, A k converges weakly in L p ([0, T ] × [0, 1]), to ∂ ξ h m (∂ ξ X * ) − v (X * ) which we will abbreviate consistently as Proof. Using the bound on A k from Proposition 14 we receive With this bound at hand and Banach-Alaoglu we arrive at a weakly convergent subsequence (w.r.t. L p ([0, T ] × [0, 1])).
In order to identify the limit, we use A k = A k + A k from Definition 16.
A k is the Lipschitz-function v applied to X k and with Appendix A.2 we receive strong convergence in L p ([0, T ] × [0, 1]) implying weak convergence. Now converges weakly w.r.t. L p ([0, T ] × [0, 1]) and is consequently bounded therein. The claim now follows from Appendix A.1 .
To finally receive the limit of the Euler-Lagrange equation (3.5), we identify the limit of δ τ k X k with the weak derivative w.r.t. t of the limit X * . This will then give us the equation Indeed, we can apply Appendix A.1 to it and receive the limit of our Euler-Lagrange equation which holds on (0, T ) × (0, 1).

Identification of the nonlinear limit
In the preceeding section we have established some convergence results for X k , δ ξ X k , A k and V k , some only up to subsequences. In this section we assume that X k etc. are already non-relabelled subsequences, such that all of the above convergence results hold.
So far we have shown that our Euler-Lagrange equation admits a limit, but the limit of the r.h.s. is, by nonlinearity of (c * ) still not identified. To identify this limit and therefore receive our IDF X * as a weak solution of (1.1) in terms of IDF we will make use of a Browder-Minty argument w.r.t. the monotone (c * ) .
The monotonicity inequality we want to consider will be where ε > 0 and ψ ∈ C ∞ ([0, 1]). Since the above inequality holds for every (t, ξ) ∈ (0, T ) × (0, 1) we can integrate it weighted in time with some non-negative u ∈ C ∞ c ((0, T )) and making use of the abbreviation V k to arrive at In order to prove the limit in (4.2), we will split the expression up by its bilinearity. Plugging together the results and using subadditivity of the lim sup then shows the sought for estimate lim sup

We begin with splitting up
The second integral can be treated easily, since A k → v (X * (t, ξ)) strongly w.r.t.
To deal with the remaining integral will require more work. We begin by applying the Euler-Lagrange equation (3.5) to the integral to receive We begin with exploiting the piecewise constant structure We will abbreviate the integral in the last line as Z u n := nτ k (n−1)τ k u(t) dt. Now we apply summation by parts w.r.t. i, a convexity estimate and summation by parts w.r.t. n to receive where the inequality is justified by the convexity of h m . Now as was shown, h m (X k ) converges strongly to h m (∂ ξ X * ) and since u ∈ C ∞ c ((0, T )) the difference quotient converges uniformly in k. Consequently we receive in the limit Now we would want to get the partial differentiation w.r.t. t back to the h m again to receive by chain rule a possibility to get our V * back. Unfortunately the expression h m (δ ξ X * ) does not have enough regularity to allow for that. But we can help ourselves by undoing the limit of just the difference quotient again.
where we used summation by parts w.r.t. n disguised in t + τ k and convexity of h m as well as integrate by parts. In the last step we used ∂ ξ h m (δ ξ X * (t, ξ))u(t) ∈ L p ([0, T ] × [0, 1]) and X * ∈ W 1,p ((0, T ) × (0, 1)) which implies a strict enough convergence of the difference quotients to pass to the limit.

4.2.
The inequality we want to show next is lim inf Recall Indeed we have by 1
5. Solution of our PDE in the sense of IDFs and the flux-limiting case 5.1. Solution of our PDE. We will now show that the limit X * is indeed a solution to our PDE (1.1) in terms of IDF. To that end, we still have to show that X * (t) is indeed an inverse distribution function for every t ∈ [0, T ], i.e. it is non-decreasing and X * (t, 0) = a as well as X * (t, 1) = b holds for all t ∈ [0, T ].
Consider the sequence X k (t).We know X k (t) is non-decreasing and X k (t, 0) = a, X k (t, 1) = b holds. These properties survive the pointwise limit, so X * (t) is again non-decreasing and X * (t, 0) = a as well as X * (t, 1) = b holds.
Furthermore we have to show that the initial data are assumed X * (t) → X (0) * for t → 0. This will be a consequence of X * being 1/p-Hölder continuous as a curve in L 1 ([0, 1]) which, in turn, will be a consequence of Corollary 13 . Indeed, we have for every k and t 1 , where C does not depend on k, since we assumed sup k H m (x (0) k ) to be finite. The strong convergence X k (t) → X * (t) in L 1 ([0, 1]) established in Remark 17 now yields the Hölder-continuity of X * and this implies in particular lim t 0 X * (t) = X (0) in the L 1 ([0, 1]) sense.

5.2.
The flux-limiting case. To prove Theorem 3, we will show that, given the prerequisites of Theorem 3, the flux-limiting cost functions can be regularized to be actually p-Wasserstein cost without changing the minimizers of our algorithm steps.
Let us assume c, x k are as in Lemma 11. Then, by the very same lemma, the bounds for δ 2 x (0) k hold for all δ 2 x (n) k , too. As a first consequence, this yields finite bounds from above and below for a (n) i,k . Indeed, by the properties of h m and consequently h m we receive . If it is the case that δ 2 x (0) and δ 2 x (0) lie on the same side of zero, one of the bounds in the interval has to be replaced with zero. Now since (c * ) is monotonously increasing, we receive bounds for the discrete temporal backward difference again with the appropriate corrections if δ 2 x (0) and δ 2 x (0) lie on the same side of zero. So to summarize, there are uniform bounds C > −γ and C < γ such that for every k, n our minimization problem in the algorithm can be narrowed down to a minimization over x such that showing that the results concerning p-Wasserstein-like soct hold for flux-limiting cost as well, as soon as we have initial data that satisfies the additional regularity assumption from Theorem 3.
6. Numerical experiments 6.1. Implementation. We perform the minimization of F : x → Φ(τ ; x (n−1) , x) in Algorithm (1.13) by a damped Newton scheme p j = −HF (x j ) −1 ∇F (x j ) x j+1 = x j + h j p j , j = 0, 1, . . . for the gradient of F . The choice of the step size h j in each step is realized by an Armijo-type heuristic, i.e. we choose h j as the largest value from the sequence {1, 1 2 , 1 4 , . . .} for which i.e. such that the next iterate x j+1 is still an IDF and has a well defined optimal transport distance in the flux limited case.
A Matlab code for the following experiments is given in the Appendix.

Linear diffusion.
We start with the case m = 1, i.e. the case of the Boltzmann entropy for the internal energy potential. All experiments have been carried out with k = 1000 grid points and time step τ = 0.01. Figs. 1 and 2 show the evolution of an initial distribution with localized support over the time interval [0, 2] for Wasserstein costs with p = 7, while Fig. 3 shows the same evolution for the flux limited case with c given by (1.4), γ = 1.