A BDF2-Approach for the Non-linear Fokker-Planck Equation

We prove convergence of a variational formulation of the BDF2 method applied to the non-linear Fokker-Planck equation. Our approach is inspired by the JKO-method and exploits the differential structure of the underlying $L^2$-Wasserstein space. The technique presented here extends and strengthens the results of our own recent work on the BDF2 method for general metric gradient flows in the special case of the non-linear Fokker-Planck equation: firstly, we do not require uniform semi-convexity of the augmented energy functional; secondly, we prove strong instead of merely weak convergence of the time-discrete approximations; thirdly, we directly prove without using the abstract theory of curves of maximal slope that the obtained limit curve is a weak solution of the non-linear Fokker-Planck equation.

In the framework of L 2 -Wasserstein gradient flows, existence of solutions has been shown via the JKOscheme, named after the authors of [18]. This scheme is a variational formulation of the Implicit Euler method given as follows: for fixed time step size τ ∈ (0, τ * ) construct inductively, starting from ρ 0 , a sequence of probability measures (ρ k τ ) k∈N as the minimizer of an augmented energy functional: . (1.4) It is known that the thus obtained discrete gradient approximation converge to a solution of the non-linear Fokker-Planck equation (1.1) as τ tends to zero. Note, this scheme has been similarly applied to a variety of PDEs and systems of PDEs with gradient flow structure in the L 2 -Wasserstein or in a L 2 -Wasserstein-like space: non-local Fokker-Planck equations [9,11,35]; Fokker-Planck equations on manifolds [13,34]; fourth order fluid and quantum models [15,16,25]; chemotaxis systems [4,5,39]; Poisson-Nernst-Planck equations [20]; multi-component fluid systems [22]; Cahn-Hilliard equations [24]; degenerate cross-diffusion systems [29,40].
Besides the theoretical use to construct solution for (1.1), this particular discretization (1.4) provides also a structure preserving numerical scheme. The approximate solution inherits automatically positivity, mass conservation and energy dissipation. Different approaches to actually compute the minimizers of (1.4) have been investigated: particle schemes [6,8,7,38]; evolving diffeomorphisms [8,10]; Lagrangian schemes [3,12,14,19,26,28]; entropic regularization [31]. However, it turns out that the application of these schemes to gradient flows in L 2 -Wasserstein space is intricate, since computing the L 2 -Wasserstein distance and its gradient is difficult in dimension two or more.
We proposed in our own recent work [27] a different variational formulation of a semi-discretization in time, i.e., of the Backward Differentiation Formula 2 (BDF2) method. In this context the BDF2 method reads as follows: for each sufficiently small time step τ ∈ (0, τ * ), let a pair of initial data (ρ −1 τ , ρ 0 τ ) be given that approximate ρ 0 . Then, define inductively the discrete solution (ρ k τ ) k∈N as the minimizers of the following augmented energy functional, Similar to the JKO-scheme the BDF2 method is structure preserving in the sense that the discrete solution inherits automatically positivity, mass preservation and is almost energy dissipating (see lemma A.1). We remark that recently also other variational formulations of formally higher-order time discretizations have been investigated, namely Runge-Kutta methods [21,23].
Our main contribution in this work is to improve the convergence result of [27] from weak to strong convergence of the discrete solution (ρ k τ ) k∈N . Also in contrast to [27], our approach is independent of the uniform semi-convexity of the augmented energy functional on the right-hand side of (1.5). More in the spirit of the original works on the linear Fokker-Planck equation of Kinderlehrer et al. [18], we solely utilize the differential structure of both the L 2 -Wasserstein space and of the augmented energy functional.
Note, the BDF2 method and the techniques presented here have two further possible applications. Firstly, PDEs with gradient flow structure such that the energy function F do not possesses any uniform semiconvexity property -like the Hele-Shaw equation seen as L 2 -Wasserstein gradient flow -are not covered in [27]. However, as long as the subdifferential calculus in the L 2 -Wasserstein space is applicable to F our method is feasible. With this technique at hand on can compute from (1.5) the discrete Euler-Lagrange equations for the discrete approximation by variations along solutions of the continuity equation (likwise theorem 4.1). Hence, passing to the limit as τ tends to zero could yield directly a distributional solution for the aforementioned class of PDEs without using the abstract theory of curves of steepest descent for λ-contractive gradient flows. Secondly, the formally higher-order approximation in time is expected to improve the performance of numerical simulations due to the better resolution of the solution with respect to a coarser time grid.
In conclusion, the BDF2 method provides a structure preserving numerical scheme of formally higherorder approximation in time with a strong notion of convergence.
Our main results concerning the well-posedness and the limit behavior as τ ց 0 of the interpolated solution ρ τ , which is defined as the piecewise constant interpolation in time of the discrete solution (ρ k τ ) k∈N obtained by the BDF2 method (1.5), is stated in the following theorem. The threshold τ * is specified in (3.2). ) n∈N of step sizes τ n ∈ (0, τ * ) and initial data (ρ −1 τn , ρ 0 τn ) satisfying Assumption 2.2, then the following hold: (i) Existence of the discrete solutions. For each step size τ ∈ (0, τ * ) there exists a sequence (ρ k τ ) k∈N obtained by the BDF2 scheme (1.5), which satisfies the step size independent bounds (3.6) on the kinetic energy, on the internal energy, and on the second moments.
(ii) Narrow convergence in P 2 (Ω). There exists a (non-relabelled) subsequence (τ n ) n∈N and a limit curve ρ * ∈ AC 2 (0, ∞; (P 2 (Ω), W 2 )) such that for any t ≥ 0: narrowly in the space P 2 (Ω) as n → ∞. (iii) Step size independent L 2 (0, T ; BV (Ω))-estimate. For each fixed time horizon T > 0 there exists a non-negative constant C, depending only on m, V, W , and T such that for each τ ∈ (0, τ * ): (ρ τ ) m L 2 (0,T ;BV (Ω)) ≤ C. (iv) Strong convergence in L m (0, T ; L m (Ω)). With the notations from (ii), there exists a further (non-relabelled) subsequence (τ n ) n∈N such that for all T > 0: (a) In the case of an open and bounded set Ω ⊂ R d with Lipschitz-continuous boundary ∂Ω, we have : (b) In the case of the entire space, i.e., Ω = R d , we have for every open and bounded set Ω ⋐ R d : (v) Solution of the non-linear Fokker-Planck equation. The limit curve ρ * from (ii) satisfies the non-linear Fokker-Planck equation with no-flux boundary condition (1.1) in the distributional sense, i.e., we have for each test function The plan of the paper is as follows. In section 2 we recall the basic notation of the theoretical framework of the gradient flow formulation of the non-linear Fokker-Planck equation, of our particular time-discretization and of BV (Ω)-spaces. Section 3 is concerned with basic properties of the augmented energy functional and of the approximation obtained by that scheme. In Section 4 we derive the discrete Euler-Lagrange equations by means of a variation of the augmented energy functional along solutions to the continuity equation. From these discrete Euler-Lagrange equations we derive BV (Ω)-regularity estimates. In Section 5 we complete the proof of the main theorem and prove the convergence of the approximation to the distributional solution of the non-linear Fokker-Planck equation (1.1).

Setup and Assumptions
i.e., narrow convergence is equal to weak * -convergence, which is induced by the pairing of the continuous and bounded functions C b (Ω) with the corresponding dual space of finite Borel measures M f (Ω). A curve µ : [0, ∞) → P 2 (Ω) is said to be L 2 -absolutely continuous, we write µ ∈ AC 2 (0, ∞; (P 2 (Ω), W 2 )), if there exists a function A ∈ L 2 loc (0, ∞) such that The corresponding energy functional F of the non-linear Fokker-Planck equation (1.1), defined in (1.3), is the sum of three parts: internal energy U m ; external energy V; interaction energy W. The internal energy U m is given by where the measure µ is absolutely continuous with respect to the Lebesgue measure L d with density ρ, i.e., µ = ρL d . For measures ρ which are singular with respect to the Lebesgue measure, we set U m (ρ) = ∞. This convention makes the internal energy U m lower semi-continuous with respect to narrow convergence, see [1]. Therefore, by a slight abuse of notation, we shall always identify an absolutely continuous measure µ with its corresponding density ρ. The according proper domains of the H and U m are given by Further, the external energy V and the interaction energy W are defined via For the rest of the paper our assumption on the external potential V and on the interaction kernel W reads as follows: Assumption 2.1. Let the external potential V ∈ C 1 (Ω) and the symmetric interaction kernel W ∈ C 1 R d be bounded as follows: Note, this standard assumption guarantees that all integrals with respect to any measure ρ ∈ P 2 (Ω) and with integrands V , W , ∇V , or ∇W are well-defined and finite. Further, the functionals V and W are continuous with respect to narrow convergence by this assumption [2].

Functions of Bounded Variation.
We recall the basic definitions and properties of functions of bounded variation, following [17]. A function ρ ∈ L 1 (Ω) is called a function of bounded variation if and only if The set of all functions of bounded variation is denoted by BV (Ω) and can be equipped with the norm: For open sets Ω ⊂ R d the set BV (Ω) is a Banach space and the norm is lower semi-continuous with respect to the weak convergence in L 1 (Ω). In case that Ω is an open and bounded set in R d with Lipschitz-continuous boundary ∂Ω, sets of functions uniformly bounded in the BV (Ω)-norm are relatively compact in L 1 (Ω), see [17,Theorem 1.19] for the statement and the proof.

Lower Bounds and Lower Semi-Continuity.
We establish the following two basic properties of the BDF2 penalization Ψ, which will be essential for the solvability of problem (2.1): Ψ(τ, η, ν; · ) is bounded from below and lower semi-continuous.
There exist a non-negative constant d 2 such that the BDF2 penalization Ψ satisfies for each τ > 0 and for all ρ, η, ν ∈ P 2 (Ω): Without loss of generality we assume that such that ρ → Ψ(τ, η, ν; ρ) is bounded from below by a constant.
Proof. Without loss of generality we can assume ρ is an absolutely continuous measure with density ρ.
Observe that H is not bounded from below by a constant on P 2 (Ω). However, we derive from the Carleman estimate a lower bound of H in terms of the second moment M 2 , see [18], i.e., there exist non-negative constants Since the external potential V and the interaction kernel W grow at most quadratically at infinity, the corresponding energies can be estimated from below in terms of the second moment M 2 by From the elementary inequality Combining all three inequalities, we can deduce the following lower bound: which is equivalent to the desired inequality (3.1).
Proof. Due to the lower semi-continuity with respect to narrow convergence of the internal energy U m , the external potential V, and the interaction energy W, the energy F is also lower semi-continuous with respect to narrow convergence as sum of lower semi-continuous functions.
Thus it remains to prove the lower semi-continuity of the auxiliary functional A : . First, we simplify the auxiliary functional A. Let p 1 ∈ Γ(ρ, ν) and p 2 ∈ Γ(ρ, η) be two optimal transport plans. Further, introduce the special three-plan p ∈ Γ(ρ, ν, η) := {p ∈ P(Ω×Ω×Ω) : (π 1 ) # p = ρ, (π 2 ) # p = ν, (π 3 ) # p = η} such that p has marginal with respect to the x-and y-components equals to p 1 and the marginal with respect to the x-and z-components is equal to p 2 , i.e., (π (1,2) ) # p = p 1 and (π (1,3) The existence of such a three-plan is guaranteed by the gluing lemma, see [2,Lemma 5.3.2]. Then we can rewrite the auxiliary functional A as Now, let (ρ n ) n∈N be a narrowly converging sequence with limit ρ * ∈ P 2 (Ω). Since (ρ n ) n∈N is narrowly converging to ρ * , the sequences (p 1 n ) n∈N and (p 2 n ) n∈N are relatively compact in P 2 (Ω 2 ) with respect to narrow convergence and any limit point is an optimal transport plan, see [2, Proposition 7.1.3]. Thus we can extract a non-relabelled subsequence such that (p 1 n ) n∈N and (p 2 n ) n∈N converge narrowly to an optimal transport plan p 1 * ∈ Γ(ρ * , ν) and to an optimal transport plan p 2 * ∈ Γ(ρ * , η), respectively. By the same argument, the sequence (p n ) n∈N of three-plans is relatively compact in P 2 (Ω 3 ) with respect to narrow convergence. Therefore we can extract a further non-relabelled subsequence such that (p n ) n∈N narrowly converges to some three-plan p * ∈ Γ(ρ * , ν, η). Taking marginals is continuous with respect to narrow convergence, so we have (π (1,2) ) # p * = p 1 * and (π (1,3) ) # p * = p 2 * , i.e., this limit three-plan p * is admissible in (3.5). Next, we want to apply the lower semi-continuity result [2, Lemma 5.1.7] to the alternative representation of A. The uniform integrability of the negative part of the integrand in (3.5) with respect to (p n ) n∈N in the sense of [2] follows by the elementary inequality Thus the lower bound on 4 Since the second moments of ν and η are finite that difference is uniform integrable with respect to the family (p n ) n∈N . Hence, we can invoke [2, Lemma 5.1.7] to conclude Therefore the auxiliary function ρ → A(ρ) = 4W 2 2 (ν, ρ) − W 2 2 (η, ρ) is lower semi-continuous with respect to narrow convergence.

Existence of Minimizer.
Recall that the well-posedness of a single step of the BDF2 scheme is equivalent to the existence of a minimizer in (2.1). The augmented energy functional Ψ shares no uniform semi-convexity as in the case of [27], so we cannot exploit the convexity to ensure the existence of a minimizer. Nevertheless, a standard technique from the calculus of variations yields the existence of a minimizer.
Proof. Take a minimizing sequence (ρ n ) n∈N for the BDF2 penalization ρ → Ψ(τ, η, ν; ρ). To extract a convergent subsequence, we use the auxiliary inequality (3.1). Since τ < τ * , the pre-factor of the second moment M 2 (ρ) in (3.1) is positive. Hence, the second moment (M 2 (ρ n )) n∈N of the minimizing sequence (ρ n ) n∈N is bounded. Also the internal energy U m (ρ n ) of the minimizing sequence is bounded, since Due to the super-linear growth of ρ → ρ log(ρ) and of ρ → ρ m , we can apply the Dunford-Pettis Theorem to the densities (ρ n ) n∈N and we can extract a non-relabelled subsequence (ρ n ) n∈N converging weakly in L 1 (Ω).

3.3.
Step size independent estimates. By the previous theorem, the sequence (ρ k τ ) k∈N given by the BDF2 method is well-defined for τ ∈ (0, τ * ). Next, we deduce three step size independent bounds: on the kinetic energy, on the internal energy, and on the second moment. We want to emphasize that these estimates are intrinsic properties of the scheme, which do not rely on any uniform semi-convexity of the augmented energy functional Ψ. The original proof of those estimates can be found in [27] and for the sake of the completeness we recall a proof in Appendix A adapted to the L 2 -Wasserstein formalism.
Proof. The proof of this theorem is given in Appendix A.
3.4. Narrow Convergence. We are able to prove our first weak convergence results. The step size independent bounds (3.6) and the Arzelà-Ascoli theorem, which can be found in [2, Proposition 3.3.1], guarantee the narrow convergence of the interpolated solution ρ τ .
Using the step size independent bounds (3.6) we obtain for N T = max{N | N τ n ≤ T }: Indeed, A n ∈ L 2 (0, T ) and the L 2 (0, T )-norm of A n is uniformly bounded independently of the step size τ n . Therefore, the sequence A n possesses a non-relabelled subsequence weakly convergent in L 2 (0, T ) with limit A ∈ L 2 (0, T ). To derive an uniform Hölder-estimate for ρ τn , choose 0 ≤ s ≤ t ≤ T arbitrary and define k t = max{k ∈ N | kτ n ≤ t}, then Taking the limit n → ∞ yields, together with the weak convergence in L 2 (0, T ) of A n to A, Moreover, the second moments of the discrete solutions (ρ k τn ) k∈N are uniformly bounded independently of the step size τ n and therefore the interpolated solutions ρ τn is uniformly contained in a set K which is compact with respect to narrow convergence. Hence, we can apply the Arzelà-Ascoli Theorem [2, Proposition 3.3.1] yielding the existence of a non-relabelled subsequence and a limit curve ρ * : [0, T ] → P 2 (Ω) such that ρ τn (t) converges narrowly to ρ * (t) for each fixed t ∈ [0, T ]. Additionally, the limit curve ρ * is L 2 -absolutely continuous with modulus of continuity A ∈ L 2 (0, T ). A further diagonal argument in T → ∞ yields the narrow convergence on for any t ≥ 0 and ρ * ∈ AC 2 (0, ∞; (P 2 (Ω), W 2 )).

Discrete Euler Lagrange Equation and Improved Regularity
In theorem 4.1, we derive the discrete Euler-Lagrange equations for the weak formulation of the non-linear Fokker-Planck equation (1.1). The key idea is the JKO-method introduced in [18], i.e., we determine the first variation of the augmented energy functional Ψ(τ, ρ k−2 τ , ρ k−1 τ ; · ) in the space (P 2 (Ω), W 2 ) along solutions to the continuity equation for an arbitrary smooth vector field ξ ∈ C ∞ c (Ω, R d ). The solution ρ s is explicitly given by the push-forward of ρ k τ under the flow Φ s , i.e., ρ s = (Φ s ) # ρ k τ , such that the flow Φ s satisfies the initial value problem: Note that the flow Φ s exists and each Φ s is a diffeomorphism on Ω. Additionally, we can calculate the derivative of det(D Φ s ) and we have an explicit representations of the perturbed density ρ s , i.e., Theorem 4.1 (Discrete Euler-Lagrange equations). The discrete solution ρ k τ k∈N obtained by the BDF2 method satisfies for each k ∈ N and for all vector fields ξ ∈ C ∞ c (Ω, R d ) where p k τ ∈ Γ(ρ k τ , ρ k−1 τ ) and q k τ ∈ Γ(ρ k τ , ρ k−2 τ ) are optimal transport plans.
Proof. Fix ρ k τ , ρ k−1 τ , ρ k−2 τ and ξ ∈ C ∞ c (Ω, R d ). We consider the perturbation ρ s of ρ k τ as the solution of the continuity equation with velocity field ξ starting at ρ k τ , i.e., ρ s is the solution of (4.1). To actually compute the first variation of the heat energy H we use exact value of the derivative of det(D Φ s ) and the explicit representation of the perturbed density ρ s , given in (4.2), to obtain as limit of the difference quotient Similarly, we can compute the first variations of the internal energy U m for m > 1, the external potential V, and the interaction energy W. The first variation of the energy F along the solution to the continuity equation amounts to The differentiability of the quadratic L 2 -Wasserstein distance W 2 along the solution ρ s of the continuity equation is more technical, for the proof we refer to [2,36]. Since ρ k−2 τ , ρ k−1 τ , ρ k τ , ρ s are all absolutely continuous measures, Theorem 8.13 from [36] is applicable and we can conclude: where p k τ ∈ Γ(ρ k τ , ρ k−1 τ ) and q k τ ∈ Γ(ρ k τ , ρ k−2 τ ) are optimal transport plans. Since ρ k τ is a minimizer of the BDF2 penalization Ψ(τ, ρ k−2 τ , ρ k−1 τ ; · ) and since s → Ψ(τ, ρ k−2 τ , ρ k−1 τ ; ρ s ) is differentiable at s = 0, Indeed, we have the desired equality (4.3).
The already obtained regularity results for the interpolated solution ρ τ are not sufficient to pass to the limit in the first term of the discrete Euler-Lagrange equation (4.3). Nevertheless, the following bounds in the BV (Ω)-norm of (ρ k τ ) m are sufficient to obtain the desired regularity results. These estimates can be derived from the discrete Euler-Lagrange equation quite naturally.

Proposition 4.2 (
Step size independent BV (Ω)-estimate). Fix a time horizon T > 0. There exists a constant C, depending only on d 1 to d 4 and T , such that the corresponding discrete solutions (ρ k τ ) k∈N satisfy for all τ ∈ (0, τ * ) and for all k ∈ N with kτ ≤ T : Proof. The L 1 (Ω)-norm of (ρ k τ ) m is equal to (m − 1)U m evaluated at ρ k τ . Hence, we can bound the first term in the definition of the BV (Ω)-norm uniformly by the classical estimates (3.6). In order to estimate the variation of (ρ k τ ) m we estimate the term inside the supremum of the definition of V ((ρ k τ ) m , Ω). Thus let ξ ∈ C ∞ c (Ω, R d ) with ξ ∞ ≤ 1, then we can use the discrete Euler-Lagrange equations (4.3) to substitute (4.7) By Assumption 2.1 we have quadratic growth bounds for ∇V and ∇W , so using the step size independent bounds on the second moment (3.6), we can estimate the first terms in (4.7) as follows: The second integral on the right-hand side of (4.7) can be estimated using Jensen's inequality and similar for the third integral of the right-hand side of (4.7). Hence, we have the following upper bound for the variation of (ρ k τ ) m : In conclusion, the discrete solution (ρ k τ ) k∈N satisfies the desired bound (4.6). Proof. We use the classical estimates on the kinetic energy (3.6) and the result from Proposition 4.2 to estimate the L 2 (0, T ; BV (Ω))-norm of (ρ τ ) m . Let N T := max{N ∈ N | N τ ≤ T }, then we have By the triangle inequality W 2 (ρ k τ , ρ k−2 τ ) ≤ W 2 (ρ k τ , ρ k−1 τ ) + W 2 (ρ k−1 τ , ρ k−2 τ ) in combination with a Cauchy type inequality we obtain Finally, we can conclude, under the step size independent bounds on the kinetic energy (3.6), the desired estimate (4.8) for some universal constant C, which only depends on d 1 to d 4 and T , but not on the step size τ ∈ (0, τ * ).

Convergence
In this section we prove our main theorem, the strong convergence of the approximation ρ τ to the solution of the non-linear Fokker-Planck equation. The convergence in the strong L m (0, T ; L m (Ω))-topology follows by the improved L 2 (0, T ; BV (Ω))-estimates (4.8) and by a general version of the Aubin-Lions Theorem [32,Theorem 2], which is recalled in Appendix B.
Theorem 5.1 (Strong convergence in L m (0, T ; L m (Ω))). Under the same assumptions as in Theorem 3.6 and given the vanishing sequence (τ n ) n∈N of step sizes τ n ∈ (0, τ * ) and the limit curve ρ * therein, then there exists a further (non-relabelled) subsequence (τ n ) n∈N such that for all T > 0: (a) In the case of an open and bounded set Ω ⊂ R d with Lipschitz-continuous boundary ∂Ω, we have: in L m (0, T ; L m (Ω)) as n → ∞. if ρ ∈ P 2 (Ω) and ρ m ∈ BV (Ω), +∞ else.
Using the remark in the introductory section about functions of bounded variations it follows that the functional A is measurable, lower semi-continuous with respect to the L m (Ω)-topology, and has compact sublevels. Next, we choose as pseudo-distance g = W 2 on L m (Ω). The L 2 -Wasserstein distance is lower semi-continuous with respect to the L m (Ω)-topology and clearly compatible with A. Next, we verify the assumption (B.1) on (ρ τn ) n∈N of Theorem B.1. By the L 2 (0, T ; BV (Ω))-estimates of Theorem 4.3 it is clear, that the sequence (ρ τn ) n∈N is tight with respect to A, since we have: For the proof of the relaxed averaged weak integral equicontinuity condition of (ρ τn ) n∈N with respect to W 2 , we use the auxiliary function A n and the estimate (3.7) from the proof of weak convergence results to obtain: Indeed, using the weak convergence in L 2 of A n to some A ∈ L 2 loc (0, ∞) it follows lim inf Therefore, we can conclude that there exists a non-relabeled subsequence (τ n ) n∈N such that ρ τn converges in M(0, T ; L m (Ω)) to some curve ρ + . Due to the uniform bounds in L ∞ (0, T ; L m (Ω)), we obtain by a dominated convergence argument also convergence in L m (0, T ; L m (Ω)) as desired. Moreover, the limit curves ρ + and ρ * have to coincide, since ρ τn converges also in measure to ρ + and ρ * , so both limits have to be equal.
In the case of Ω = R d we have to alter the proof given above, since the embedding of BV (R d ) into L 1 (R d ) is not compact anymore. So we restrict ourself to the open and bounded sets Ω = B R (0). This subset is clearly open and bounded with Lipschitz-continuous boundary ∂ Ω, so the embedding of BV ( Ω) into L 1 ( Ω) is compact again.
Proof of Theorem 5.1 for Ω = R d . Fix T > 0. Without loss of generality we can assume Ω = B R (0), since every open and bounded subsetΩ ⋐ R d is contained in a ball with radius R and convergence in L m (0, T ; L m (B R (0))) implies convergence in L m (0, T ; L m (Ω)).
As before, we want to use the Aubin-Lions Theorem B.1 for the Banach space L m ( Ω) equipped with the natural topology induced by the L m ( Ω)-norm applied to ( ρ τn Ω ) n∈N , the restriction of the density ρ τn to the subspace Ω. In this case we consider the functional A : L m ( Ω) → R, defined via Now, the functional A is measurable, lower semi-continuous with respect to the L m ( Ω) topology, and has compact sublevels. Since A( ρ| Ω ) ≤ A(ρ), we obtain by the same calculations as above the tightness of ( ρ τn Ω ) n∈N with respect to A.
Since the measure ρ| Ω does not have unit mass anymore, we cannot consider the L 2 -Wasserstein distance W 2 as pseudo-distance anymore. However, we can use the following pseudo-distance g: where C is the constant from the classical estimates (3.6) for the specific T . Since Σ(ρ) and Σ(ν) are compact sets with respect to the narrow topology, the infimum is attained at some pair ρ * , ν * . The pseudo-distance g is compatible with A, i.e., if ρ m , ν m ∈ BV ( Ω) and g(ρ, ν) = 0 then ρ = ν a.e. on Ω. The lower semicontinuity of the pseudo-distance g with respect to the L m ( Ω)-topology can be proven as follows. Choose to convergent sequences ρ n → ρ and ν n → ν in L m ( Ω) with sup n g(ρ n , ν n ) < ∞. By the remark from above, there exists ρ n , ν n such that g(ρ n , ν n ) = W 2 ( ρ n , ν n ). Since the second moments are by definition of Σ(ρ) uniformly bounded, we can extract a non-relabeled convergent subsequence which converges narrowly to ρ ∈ Σ(ρ), ν ∈ Σ(ν). By the lower semi-continuity of W with respect to narrow convergence, we get in the end Therefore, the pseudo-distance g is lower semi-continuous with respect to the L m ( Ω)-topology. Thus, g satisfies the assumptions of theorem B.1. Further, one has g( ρ| Ω , ν| Ω ) ≤ W 2 (ρ, ν). Thus we derive, using the same proof as above, the equicontinuity of ( ρ τn Ω ) n∈N with respect to the pseudo-distance g.
Hence, we can conclude that there exists a non-relabeled subsequence of ρ τn Ω which converges in M(0, T ; L m ( Ω)) to some limit ρ + . As before, we use the uniform bounds in L ∞ (0, T ; L m ( Ω)), to obtain the strong convergence in L m (0, T ; L m ( Ω)) by a dominated convergence argument. Moreover, the limit curves ρ + and ρ * | Ω have to coincide on Ω, since ρ τn Ω converges also in measure on Ω to ρ + and ρ * | Ω , so both limits have to be equal on Ω. Two diagonal arguments in T → ∞ and R → ∞ yield the desired convergence result.
To complete the proof of the main theorem 1.1, we have to validate that ρ * is indeed a solution to (1.1) in the sense of distributions. For each k ∈ N insert the smooth function x → ∇ψ((k − 1)τ, x) in the discrete Euler-Lagrange equation (4.3) for the vector field ξ ∈ C ∞ c (Ω). Summing the resulting equations from k = 1 to N T + 1 and multiplying with τ yields: Due to the strong convergence in L m (0, T ; L m ( Ω)) of ρ τ to ρ * and due to the uniform convergence of ψ to ψ lim τ ց0 To rewrite I 2 , we use, as in [18], the second order Taylor expansion for a time independent function ψ, to obtain x − y 2 dp k τ (x, y) Replacing the time independent function ψ with ψ((k − 1)τ ) yields as approximation of I 2 We rearrange the sum of the first term in I 2 as follows NT k=0 Ω Finally, use the fundamental theorem of calculus and the classical estimate (3.6) to bound the second term in I 2 , to obtain Indeed, combining the narrow convergence of ρ τ with the uniform convergence of ∂ t ψ(t + τ ) to ∂ t ψ(t) and with the narrow convergence of the initial data (ρ 0 τ , ρ −1 τ ) to ρ 0 , the limit of I 2 is given by: lim τ ց0 Finally, we can conclude that for arbitrary test functions ψ ∈ C ∞ c ([0, ∞) × Ω) the limit curve ρ * satisfies : This yields that ρ * is a distributional solution to the non-linear Fokker-Planck equation (1.1).
In this part we fill the technical gap in the proof of the step size independent bounds of Theorem 3.5. The first result is an auxiliary inequality which will be used to derive the step size independent bounds. Despite the auxiliary character of this inequality, we want to emphasize that this property is of interest by itself, since we can give a precise estimate of the energy decay of the BDF2 scheme in every step.
By the triangle inequality and the binomial formula, Substitute this in the left-hand side of (A.2). This yields (A.1) Theorem A.2 (Classical estimates). Fix a time horizon T > 0. There exists a constant C, depending only on d 1 to d 4 and T , such that the corresponding discrete solutions (ρ k τ ) k∈N satisfy for all τ ∈ (0, τ * ) and for all N ∈ N with N τ ≤ T .
This yields (A.5). A Cauchy type inequality with (A.5) yields Substitute (A.4) into this inequality: The first term of the right-hand side is estimated by The energy F evaluated at ρ 0 τ is estimated using Assumptions 2.1 and 2.2 and estimate (A.6): A lower bound of the energy F evaluated at ρ N τ is derived by the same way as in the prove of Lemma 3.1, i.e., there exist constants d 2 and γ ∈ ( d d+1 , 1) such that Hence, there is a universal constant C, not depending on the step size τ , such that We rearrange terms and use the definition of (3.2) to arrive at the time-discrete Gronwall inequality By induction on N we obtain So the second moments M 2 (ρ N τ ) of the discrete solution are uniformly bounded independent of the step size τ ∈ (0, τ * ) and for all N ∈ N with N τ < T .
The remaining estimates can be derived from this. An upper bound for the energy F (ρ N τ ) follows from (A.4) and (A.7) combined with Assumption 2.1. The lower bound on F (ρ N τ ) follows by the lower bounds on U m , V, and W in terms of the second moment. Hence, the boundedness of F (ρ N τ ), V(ρ N τ ), and W(ρ N τ ) yields the boundedness of U m (ρ N τ ). The upper bound for the kinetic energy follows from the lower bound for the energy F (ρ N τ ), (A.4), and (A.7) combined with Assumption 2.1.

Appendix B. Auxiliary theorems
The following theorem is an extension of the Aubin-Lions Theorem for Banach-spaces proven in [32]. This theorem is the main tool to prove the convergence of the interpolated solution.
Theorem B.1 (Extension of the Aubin-Lions Theorem). Let X be a separable Banach space, A : X → R ∪ {∞} be measurable, lower semi-continuous and with compact sublevels in X, and g : X × X → R ∪ {∞} be lower semi-continuous and such that g(u, v) = 0 for u, v ∈ D(A) implies u = v. Let (u n ) n∈N be a sequence of measurable functions u n : (0, T ) → X such that given in the original version of the theorem. In the proof it is sufficient to have the relaxed averaged weak integral equicontinuity, given in the theorem above.