CONTRACTION IN THE WASSERSTEIN METRIC FOR THE KINETIC FOKKER-PLANCK EQUATION ON THE TORUS

Abstract. We study contraction for the kinetic Fokker-Planck operator on the torus. Solving the stochastic differential equation, we show contraction and therefore exponential convergence in the Monge-Kantorovich-Wasserstein $W_2$ distance. Finally, we investigate if such a coupling can be obtained by a co-adapted coupling, and show that then the bound must depend on the square root of the initial distance.


1. Introduction. The kinetic Fokker-Planck equation, also known as the Kramers equation, is a basic model for the spreading of a solute due to interaction with the fluid background. It is derived from Langevin dynamics, where the time scale of observation is much larger than the correlation time of the solute-fluid interactions (see e.g. [17]).
We prove contraction properties of the spatially periodic kinetic Fokker-Planck equation in the Wasserstein metric, and show to what extent the probabilistic technique of coupling can be used in such situations. This is of interest both intrinsically and in the broader context of analytic and probabilistic methods of proving convergence to equilibrium and contraction properties of Fokker-Planck equations, which we summarise in the paragraphs below. The Monge-Kantorovich-Wasserstein (MKW) distance comes from optimal transport and is defined as
$$W_2(\mu,\nu) = \Big(\inf_{\pi\in\Pi_{\mu,\nu}} \int |z_1 - z_2|^2 \,\mathrm{d}\pi(z_1,z_2)\Big)^{1/2},$$
where $\Pi_{\mu,\nu}$ is the set of all couplings between $\mu$ and $\nu$.
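To make the role of couplings concrete, the following minimal sketch evaluates $W_2$ for two empirical measures on the real line with equally many, equally weighted atoms, where the optimal coupling is known to pair the sorted samples. It is an illustration only: the function names are hypothetical, and working on the line rather than on the phase space T × R is a simplifying assumption.

```python
import math


def w2_empirical_1d(xs, ys):
    """W_2 between two empirical measures on R with equally many,
    equally weighted atoms (optimal coupling: pair the sorted samples)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    total = sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys)))
    return math.sqrt(total / len(xs))


def coupling_cost(xs, ys):
    """Root mean square cost of the particular coupling pairing xs[i] with ys[i].
    Any such coupling gives an upper bound on W_2."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    total = sum((x - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))
```

Any explicit coupling, such as the index pairing in `coupling_cost`, bounds $W_2$ from above; this is the mechanism by which the coupling constructions in this paper yield estimates on the distance.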
A common analytic technique to show contraction or convergence to equilibrium of Fokker-Planck equations is to work in an $L^2$ space weighted by the reciprocal of the equilibrium measure. Here, in the spatially homogeneous setting, contractivity is established by showing that the generator of the Fokker-Planck semi-group is coercive on this $L^2$ space, which implies that the generator has a spectral gap. In the spatially inhomogeneous setting, which is common in kinetic theory, the generator is, however, not coercive in this space and this method fails.
This led to the development of the celebrated theory of hypocoercivity, for which an excellent reference is [15], where a spectral gap and contraction of the semi-group are shown, roughly speaking, by constructing equivalent 'skew' $L^2$ or Sobolev norms on which the generator is coercive. This theory is well developed, and in Proposition 5 in the first section of [15], Villani shows that the equation on the law for a large class of SDEs can be put in the $A^*A + B$ form he uses in his theorems. Hypocoercive SDEs are also studied in [14], which is one of the early works in the development of hypocoercivity. The techniques of hypocoercivity apply also to collisional models [6]. The kinetic Fokker-Planck equation in particular has received much attention [7,5,13], both in the case of a spatial confining potential and in the analytically simpler case of spatial periodicity. The paper [5] considers exactly our equation and finds explicitly the optimal rates of convergence in a weighted $L^2$ space. The motivation is similar to that of this paper, which is to study a simple toy model on which more explicit calculations can be performed in order to explore alternative methods for proving hypocoercivity. These works, however, do not address the question of contraction in the Wasserstein metric $W_2$, as this distance is inaccessible from these analytic tools; the closest result to this is [11], where $W_1$ results are obtained by duality. Using interpolation estimates and convergence results in other spaces, one can conclude exponential decay in the Wasserstein $W_2$ distance. However, the control in terms of the initial data then only holds for a power strictly less than one.
Another viewpoint, strongly related to the first, comes from the theory of gradient flows [8], in which the Fokker-Planck equation is identified with the steepest descent flow of an entropy functional in the Wasserstein space W 2 . However, the theory does not cover the considered model due to the kinetic structure. Dissipation in the Wasserstein distance can also be shown for non-gradient drifts in the homogeneous setting using analytic methods [1].
A common probabilistic technique to show contraction or convergence is to construct a coupling between two copies of the stochastic process that realises the desired bound on the metric between the laws. In the spatially homogeneous Fokker-Planck equation, the synchronisation coupling, where the infinitesimal motions of the noise are coupled together, gives contraction in Wasserstein metrics when the velocity potential is strongly convex. In the spatially inhomogeneous case with a confining potential, such a straightforward coupling only establishes contraction if the confining potentials are quadratic (or a small perturbation thereof), see for example [2]. Establishing contraction in the Wasserstein metric for more general confining potentials is an open problem. In the spatially periodic case results are even more limited: here the synchronisation coupling does not cause the spatial distance on the torus to decay. Thus the spatially periodic case is more difficult in the probabilistic setting. This is in contrast to the analytic setting, where having the spatial variable on the torus means that hypocoercivity can be shown by a computation very similar to, and in fact slightly simpler than, that in part 1 section 7 of [15].
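In this work we study the contraction properties in the Wasserstein metric of the kinetic Fokker-Planck equation with spatial variable on the torus and a quadratic velocity potential. Despite the simplicity of this equation, to the authors' knowledge this question has not been answered in the literature, and a second goal of this manuscript is to understand what difficulties might explain this. This kinetic Fokker-Planck equation describes the law of a particle moving in the phase space $\mathbb{T} \times \mathbb{R}$ whose location in the phase space is $(X_t, V_t)$ and evolves as
$$\mathrm{d}X_t = V_t\,\mathrm{d}t, \qquad \mathrm{d}V_t = -\lambda V_t\,\mathrm{d}t + \mathrm{d}W_t, \tag{1}$$
where $W_t$ is a Brownian motion and the spatial variable is in the torus $\mathbb{T} = \mathbb{R}/(2\pi L\mathbb{Z})$ of length $2\pi L$. The corresponding law $\mu_t$ on $\mathbb{T} \times \mathbb{R}$ evolves as
$$\partial_t \mu_t + v\,\partial_x \mu_t = \lambda\,\partial_v(v\mu_t) + \tfrac{1}{2}\,\partial_v^2 \mu_t, \tag{2}$$
where this equation is considered in the weak sense. The equilibrium state for this equation is the product of the uniform measure $\frac{1}{2\pi L}\mathrm{Leb}$ on $\mathbb{T}$ and the centred Gaussian measure with variance $1/(2\lambda)$ in the velocity variable. Solving the stochastic evolution, we show exponential decay of the distance between two solutions.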
for a constant c only depending on L.
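The dynamics can be explored numerically. The sketch below is a plain Euler-Maruyama discretisation of the SDE, assuming the velocity equation dV_t = −λV_t dt + dW_t that appears in Section 2; the function name, the step size and the parameter values in the usage note are illustrative choices and not part of the paper.

```python
import math
import random


def simulate_kfp(x0, v0, lam, L, t_final, n_steps, rng):
    """Euler-Maruyama sketch for dX = V dt, dV = -lam * V dt + dW,
    with the position wrapped modulo the torus length 2*pi*L."""
    dt = t_final / n_steps
    period = 2.0 * math.pi * L
    x, v = x0, v0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        x = (x + v * dt) % period           # transport step, wrapped onto the torus
        v = v - lam * v * dt + dw           # Ornstein-Uhlenbeck step for the velocity
    return x, v
```

For instance, drawing many samples with `rng = random.Random(0)`, `lam = 1.0`, `L = 1.0` and `t_final = 5.0` gives an empirical second moment of the velocity close to 1/(2λ), the variance of the stationary Ornstein-Uhlenbeck law, up to discretisation and sampling error.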
Remark 1. We are not aware of any paper showing the optimal rate of convergence for this process in $W_2$. The paper [5] shows that for large times this is the optimal rate of convergence in a weighted $L^2$ space. Also, we show later that we can split the process into components which are broadly an Ornstein-Uhlenbeck process with rate $\lambda$ and a Brownian motion with diffusivity $1/\lambda$ on the torus. One would expect the optimal rate of convergence for an O-U process in any reasonable distance to be $\lambda$ and the optimal rate of convergence for the diffusion process to be $1/(2\lambda^2 L^2)$. Therefore it seems likely that our rates are optimal.
The key idea is that, after conditioning on the final velocity, the spatial variable has enough randomness left to allow such a coupling. This approach is not based on a functional inequality which is integrated over time.
In fact the evolution is not a contraction semigroup in the considered distance, which we can show directly using the explicit solution to the SDE. Precisely,
Proposition 1. The kinetic Fokker-Planck operator is not coercive in the MKW distance. The inequality
In order to construct a coupling showing convergence in the MKW distance, random variables (X i continuous Markov processes with initial distribution $\mu_0$ and $\nu_0$, respectively, and whose transition semigroup is determined by (1). For such couplings we can consider a more restrictive class of couplings.
Definition 1.2 (co-adapted coupling). The coupling ((X 1
This is an important subclass of couplings, which contains many natural couplings. An even more restrictive subclass is the class of Markovian couplings, where additionally the coupling itself is required to be Markovian. The existence and obtainable convergence behaviour under this restriction has already been studied in different cases, e.g. [10,3,4]. Note that the co-adapted condition is equivalent to the condition that the filtration generated by $(X^i_t, V^i_t)$ is immersed in the filtration generated by the coupling, which motivates Kendall [9] to call such couplings immersed couplings.
By adapting the reflection/synchronisation coupling, we can still obtain exponential convergence but with a loss in dependence on the initial data.
Theorem 1.3. Given initial distributions $\mu_0$ and $\nu_0$, then there exists a co-adapted coupling (( (1 + t) $4L^2\lambda^3 = 1$ and C is a constant that depends only on $\lambda$ and $L$.
Here we used the notation $|X^1_t - X^2_t|_{\mathbb{T}}$ to emphasise that this is the distance on the torus $\mathbb{T}$. In fact the filtrations generated by $(X^1, V^1)$ and $(X^2, V^2)$ agree, which Kendall [9] calls an equi-filtration coupling.
Remark 2. This achieves the same exponential decay rate as the non-Markovian argument, except for the case $4L^2\lambda^3 = 1$, when the spatial and velocity decay rates coincide and we have an additional polynomial factor.
In general the loss in the dependence is necessary.
Theorem 1.4. Suppose there exist a function $\alpha : \mathbb{R}_+ \to \mathbb{R}_+$ and a constant $\gamma > 0$ such that for all initial distributions $\mu_0$ and $\nu_0$ there exists a co-adapted coupling
Then there exists a constant C such that for $z \in (0, \pi L]$ we have the following lower bound on the dependence on the initial distance
The idea is to focus on a drift-corrected position on the torus, which evolves as a Brownian motion. By stopping the Brownian motion at a large distance we can then prove the claimed lower bound.

WASSERSTEIN CONTRACTION FOR THE KINETIC FOKKER-PLANCK EQUATION 1431
This shows that a simple hypocoercivity argument on a Markovian coupling cannot work. Precisely, there cannot exist a semigroup P on the probability measures over (T × R) ×2 , whose marginals behave like the solution of (1) and which satisfies H(P t (π)) ≤ cH(π)e −γt for Otherwise, the Markov process associated to P would be a coupling contradicting Theorem 1.4.
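2. Set up. The stochastic differential equation (1) has an explicit solution when posed in $\mathbb{R}^2$. For clarity, when we consider $X$ to be in $\mathbb{R}$ rather than on the torus, we denote it by $\hat X$. The explicit solution is
$$\hat X_t = X_0 + \frac{1 - e^{-\lambda t}}{\lambda} V_0 + A_t, \qquad V_t = e^{-\lambda t} V_0 + B_t, \tag{3}$$
where $W_t$ is the common Brownian motion. In this we separate the stochastic driving as $(A_t, B_t)$, given by the stochastic integrals
$$A_t = \int_0^t \frac{1 - e^{-\lambda (t-s)}}{\lambda}\,\mathrm{d}W_s, \qquad B_t = \int_0^t e^{-\lambda (t-s)}\,\mathrm{d}W_s,$$
which evolve as a vector in $\mathbb{R}^2$ with the common Brownian motion $W_t$. By Itô's isometry, $(A_t, B_t)$ is a Gaussian random variable with covariance matrix $\Sigma(t)$ given by
$$\Sigma_{AA}(t) = \frac{1}{\lambda^2}\Big(t - \frac{2(1 - e^{-\lambda t})}{\lambda} + \frac{1 - e^{-2\lambda t}}{2\lambda}\Big), \quad \Sigma_{AB}(t) = \frac{(1 - e^{-\lambda t})^2}{2\lambda^2}, \quad \Sigma_{BB}(t) = \frac{1 - e^{-2\lambda t}}{2\lambda}. \tag{4}$$
From this we calculate that the conditional distribution of $A_t$ given $B_t$ is a Gaussian with variance $\Sigma_{AA}(t) - \Sigma_{AB}^2(t)\Sigma_{BB}^{-1}(t)$ and mean $\Sigma_{AB}(t)\Sigma_{BB}^{-1}(t) B_t$. We write $g_{A|B}$ for the conditional density of $A$ given $B$ and $g_B$ for the marginal density of $B$. Hence $g_{A|B}\, g_B$ is the joint density of $A$ and $B$.
The last part of the set up is the change of variables we will need for the Markovian coupling. We define new coordinates $(Y, V)$ in $\mathbb{T} \times \mathbb{R}$ by taking the drift away,
$$Y_t = X_t + \frac{V_t}{\lambda}. \tag{8}$$
The motivation for this change is the explicit formulas found in (3), from which we see that $Y$ is the limit as $t \to \infty$ of $X_t$ without additional noise. In the new coordinates the evolution is
$$\mathrm{d}Y_t = \frac{1}{\lambda}\,\mathrm{d}W_t, \qquad \mathrm{d}V_t = -\lambda V_t\,\mathrm{d}t + \mathrm{d}W_t,$$
for the common Brownian motion $W_t$. Note that the motion of $Y_t$ does not depend explicitly upon $V_t$ and is a Brownian motion on the torus.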
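The covariance structure of the Gaussian vector (A_t, B_t) can be checked numerically. The sketch below assumes that A_t and B_t are the noise parts of the position and the velocity obtained by solving the linear SDE dX = V dt, dV = −λV dt + dW, i.e. the stochastic integrals of (1 − e^{−λ(t−s)})/λ and e^{−λ(t−s)} against dW_s; the closed forms are derived from Itô's isometry under that assumption, and all function names are hypothetical.

```python
import math


def covariance(t, lam):
    """Closed-form entries (Sigma_AA, Sigma_AB, Sigma_BB) from Ito's isometry."""
    e1 = math.exp(-lam * t)
    e2 = math.exp(-2.0 * lam * t)
    s_aa = (t - 2.0 * (1.0 - e1) / lam + (1.0 - e2) / (2.0 * lam)) / lam ** 2
    s_ab = (1.0 - e1) ** 2 / (2.0 * lam ** 2)
    s_bb = (1.0 - e2) / (2.0 * lam)
    return s_aa, s_ab, s_bb


def covariance_quadrature(t, lam, n=20000):
    """Same entries via a midpoint rule applied to the Ito isometry integrals."""
    h = t / n
    s_aa = s_ab = s_bb = 0.0
    for k in range(n):
        u = (k + 0.5) * h                        # u = t - s, elapsed time
        fa = (1.0 - math.exp(-lam * u)) / lam    # integrand defining A_t
        fb = math.exp(-lam * u)                  # integrand defining B_t
        s_aa += fa * fa * h
        s_ab += fa * fb * h
        s_bb += fb * fb * h
    return s_aa, s_ab, s_bb


def conditional_variance(t, lam):
    """Variance of A_t given B_t for a jointly Gaussian pair."""
    s_aa, s_ab, s_bb = covariance(t, lam)
    return s_aa - s_ab ** 2 / s_bb
```

For large t the value of `conditional_variance(t, lam)` behaves like t/λ², which is the growth of the conditional spatial variance used in the proof of Theorem 1.1.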
It remains to show that these new coordinates define an equivalent norm on T × R. This follows from the triangle inequality and we have and the other direction is similar. Thus, the two norms are equivalent up to a constant factor that depends only on λ.
3. Non-Markovian coupling. We wish to estimate how much the spatial variable will spread out over time. We will then use this to construct a coupling at a fixed time t which exploits the fact that a proportion of the spatial density is distributed uniformly. In order to do this we give a lemma on the spreading of a Gaussian density wrapped on the torus.
We have the following estimate on the spatial spreading
Proof. We define the Fourier transform of a function on $\mathbb{T}$ to be
By the definition of $Q$, the Fourier transform of $Qh$ is given by
where we have used the well-known Fourier transform of a Gaussian.
Writing Qh in terms of its Fourier series and subtracting the k = 0 term, we have for any x ∈ T

We want this to be positive. Therefore it is sufficient to show that We estimate the left hand side by where the final equality follows from summing the geometric series.
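The geometric-series argument can be illustrated numerically. The sketch below assumes that the wrapped centred Gaussian with variance σ² on the torus of length 2πL has Fourier coefficients exp(−σ²k²/(2L²)) with respect to the characters e^{ikx/L}, so that its density is bounded below by (1 − 2r/(1 − r))/(2πL) with r = exp(−σ²/(2L²)). This bound is one way to carry out the estimate described in the proof and is not quoted from the lemma, whose statement is not reproduced here; the function names are hypothetical.

```python
import math


def wrapped_gaussian_density(x, sigma2, L, n_images=50):
    """Density at x of a centred Gaussian with variance sigma2 wrapped onto
    the torus of length 2*pi*L, computed by summing periodic images."""
    period = 2.0 * math.pi * L
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return sum(
        norm * math.exp(-((x + m * period) ** 2) / (2.0 * sigma2))
        for m in range(-n_images, n_images + 1)
    )


def fourier_lower_bound(sigma2, L):
    """Lower bound on the wrapped density: drop the k = 0 Fourier mode and
    bound the remaining modes by the geometric series 2 * sum_{k>=1} r**k."""
    r = math.exp(-sigma2 / (2.0 * L ** 2))
    return (1.0 - 2.0 * r / (1.0 - r)) / (2.0 * math.pi * L)
```

Under these assumptions the bound is positive exactly when r < 1/3, that is when σ² > 2L² log 3, so beyond this threshold a positive multiple of the uniform density can be split off from the wrapped Gaussian.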
We can now use this to construct a coupling at time t. We will use this coupling to prove exponential decrease in the Wasserstein distance.
Lemma 3.2. Let $t \ge 0$ be large enough so that the variance of $g_{A|B}$ is greater than $2L^2 \log 3$, and let $\beta$ be such that
where $g_{A|B}$ is defined by (7) above. Let $\mu_t$ resp. $\nu_t$ be the distribution of the solution to the Fokker-Planck equation (2) with deterministic initial data $\mu_0 = \delta_{(x^1_0, v^1_0)}$ and $\nu_0 = \delta_{(x^2_0, v^2_0)}$ respectively, at time $t$. Then there exists a coupling $((X^1_t, V^1_t), (X^2_t, V^2_t))$ between $\mu_t$ and $\nu_t$ satisfying
Proof. Let us construct such a coupling. Since we have seen that $g_{A|B}$ is a Gaussian density with variance $\sigma^2 = \Sigma_{AA}(t) - \Sigma_{AB}^2(t)\Sigma_{BB}^{-1}(t)$, we can use Lemma 3.1 to split the distribution $Qg_{A|B}$ as
Then by assumption $s$ is again a probability density for the variable $a$ on the torus $\mathbb{T}$. We now consider the torus as a subset of $\mathbb{R}$, and then $Qg_{A|B}$ and $1/(2\pi L)$ are probability density functions. Therefore, $s$ is also a probability density function supported on $[0, 2\pi L]$. Let $B$ be an independent random variable with density $g_B(t, b)$, let $Z$ be an independent uniform random variable over $[0, 1]$ and let $U$ be an independent uniform random variable over the torus. Finally let $S$ be a random variable on $\mathbb{R}$ with density $s(t, \cdot, B)$, viewed as a density function on $\mathbb{R}$, only depending on $B$.
With this define the random parts $A^1, A^2$ of $X^1_t, X^2_t$ as
We then construct $(X^1_t, V^1_t)$ defined by
We then construct $X^i_t$ by wrapping $\hat X^i_t$ onto the torus (i.e. $X^i_t \in [0, 2\pi L)$ and $X^i_t \equiv \hat X^i_t \bmod 2\pi L$). By construction the pairs $(X^i, V^i)$ have the right laws, so they form a valid coupling.
We find and we can use Young's inequality to find the claimed control.
We now put these two lemmas together to prove Theorem 1.1, which states exponential convergence in the MKW W 2 distance.
Proof of Theorem 1.1. We first show that we can reduce to working with deterministic initial conditions. We denote by $\mu^{x,v}_t$ the law of the solution to the SDE initialised at $(x, v)$. Suppose we know that
Since $\mu_t, \nu_t$ are the laws of Markov processes, we know that
Hence, given a coupling $\pi$ of $\mu_0, \nu_0$, we can construct a coupling of $\mu_t, \nu_t$ by π t (ψ) = ψ((y 1 , u 1 ), (y 2 , u 2 ))dµ
The couplings of this form are a subset of all the couplings of $\mu_t, \nu_t$; therefore we can take the infimum over these couplings in order to bound the Wasserstein distance. Then, given any coupling $\pi$ of the initial measures $\mu_0, \nu_0$, we have
Then taking an infimum over $\pi$ shows that this implies
Given any initial points $((x^1_0, v^1_0), (x^2_0, v^2_0))$, we can use Lemma 3.2 to construct a coupling ((X 1
asymptotically as $t/\lambda^2$. Hence by Lemma 3.1 we can choose $\beta$ so that $1 - \beta \to 0$ exponentially fast with rate $1/(2\lambda^2 L^2)$. This, combined with the control from the second lemma, shows that
The explicit solution also allows us to prove that the evolution is not a contraction semigroup.
Proof of Proposition 1. We will prove the proposition by contradiction. Suppose $\gamma > 0$ and let $a \neq b$ be two distinct points on the torus. Consider the initial measures
At time $t$ the spatial distribution of $\mu_t$ and $\nu_t$, interpreted in $\mathbb{R}$, is a Gaussian with variance $\Sigma_{AA}$, which by the explicit formula (4) can be bounded as
for a constant $C_A$ and $t \le 1$.
Hence for d > 0 and t ≤ 1 the spatial spreading is controlled as for positive constants C 1 and C 2 , where we have used the standard tail bound for the Gaussian distribution (see e.g. [12,Lemma 12.9]). For any d > 0 small enough that a ± d and b ± d do not wrap around the torus, any coupling between µ t and ν t must transfer at least the mass Hence the Wasserstein distance is bounded by Taking d = |a − b| T t 3/2 for t sufficiently small, this shows that However, for all small enough positive t, we have contradicting the assumed contraction. For the second estimate we use exp(−c/t) ≤ (1 + c/t) −1 = t/(c + t).

4.1. Existence. For Theorem 1.3 we construct a reflection/synchronisation coupling using the drift-corrected positions $Y^i_t$. As the positions are on the torus we can use a reflection coupling until $Y^1_t$ and $Y^2_t$ agree. Afterwards, we use a synchronisation coupling which keeps $Y^1_t = Y^2_t$ and reduces the velocity distance. For a formal definition let $((X^1_0, V^1_0), (X^2_0, V^2_0))$ be a coupling between $\mu$ and $\nu$ obtaining the MKW distance (the existence of such a coupling is a standard result, see e.g. [16, Theorem 4.1]).
We then define the evolution of this coupling in two stages. First, define $(X^1_t, V^1_t)$ and $(X^3_t, V^3_t)$ to be strong solutions to (1) with initial conditions $(X^1_0, V^1_0)$ and $(X^2_0, V^2_0)$ respectively and driving Brownian motion $W^1_t$. Then we recall the definition of $Y^i$ from (8), and define the stopping time T := inf{t ≥ 0 :
Then we define a new process $W^2_t$ by
By the reflection principle, $W^2$ is a Brownian motion. We use this to define a new solution $(X^2_t, V^2_t)$ to be the strong solution to (1) with driving Brownian motion $W^2$ and initial condition $(X^2_0, V^2_0)$. Note now that T = inf{t ≥ 0 :
For the analysis we introduce the notation
Then by the construction the evolution is given by
where $M_t$ evolves on the torus $\mathbb{T}$.
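The two stages of the coupling can be sketched in terms of the differences M = Y¹ − Y² and Z = V¹ − V². The sketch assumes the drift-corrected position Y = X + V/λ, so that dY = dW/λ, and assumes that the reflected noise is W² = −W¹ up to the coupling time, which gives dM = (2/λ) dW¹ and dZ = −λZ dt + 2 dW¹ before T and dZ = −λZ dt afterwards. The difference M is lifted to the interval (0, 2πL), the Euler discretisation only detects the exit on the time grid, and the function name is hypothetical.

```python
import math
import random


def reflection_sync_coupling(m0, z0, lam, L, t_final, n_steps, rng):
    """Euler sketch of the differences M = Y1 - Y2 (lifted to (0, 2*pi*L))
    and Z = V1 - V2 under reflection until M exits, then synchronisation."""
    dt = t_final / n_steps
    period = 2.0 * math.pi * L
    m, z = m0, z0
    # If the positions already coincide on the torus, start synchronised.
    coupled_time = 0.0 if (m <= 0.0 or m >= period) else None
    if coupled_time is not None:
        m = 0.0
    for k in range(n_steps):
        if coupled_time is None:
            dw = rng.gauss(0.0, math.sqrt(dt))
            # Reflection stage: the second process uses -dw, so differences see 2*dw.
            z += -lam * z * dt + 2.0 * dw
            m += (2.0 / lam) * dw
            if m <= 0.0 or m >= period:
                m = 0.0                      # drift-corrected positions now agree
                coupled_time = (k + 1) * dt
        else:
            # Synchronisation stage: the noise cancels and Z decays deterministically.
            z += -lam * z * dt
    return m, z, coupled_time
```

After the coupling time the velocity difference contracts at rate λ without further noise, which is the mechanism behind the exponential decay of the velocity distance in the second stage.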
As a first step we introduce a bound for T .
Proof. As $M_t$ evolves on the torus, $T$ is the first exit time of a Brownian motion starting at $M_0$ from the interval $(0, 2\pi L)$. See [12, (7.14-7.15)], from which the claim follows after rescaling to incorporate the $2/\lambda$ factor.
There exists a constant $C$ such that for any $t > 0$ the following holds
Proof. Using (12) and the inequality $\sin(x) \le x$ for $x \ge 0$, we have
where on the second line we have bounded the sum by an integral.
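The exit-time probability can be evaluated from the classical sine series for the survival probability of a one-dimensional Brownian motion in an interval. The sketch below assumes that M_t = M_0 + (2/λ)W_t before the exit, so that the standard formula on (0, 2πL) is rescaled in time by the factor 4/λ²; it is not claimed to reproduce the exact form of the formulas cited from [12], and the function name is hypothetical.

```python
import math


def exit_survival(t, m0, lam, L, n_terms=100):
    """P(T > t) for T the first exit time of M_t = m0 + (2/lam) * W_t from
    (0, 2*pi*L), with t > 0, via the sine series over odd n:
    sum of 4/(n*pi) * sin(n*m0/(2L)) * exp(-n^2 * t / (2 lam^2 L^2))."""
    rate = 1.0 / (2.0 * lam ** 2 * L ** 2)
    total = 0.0
    for j in range(n_terms):
        n = 2 * j + 1  # only odd modes appear in the sine series of the constant 1
        total += (4.0 / (n * math.pi)) * math.sin(n * m0 / (2.0 * L)) * math.exp(-n * n * rate * t)
    return total
```

For large t the first term dominates, so under these assumptions P(T > t) decays like e^{−t/(2λ²L²)} with a prefactor proportional to sin(M_0/(2L)), which is at most M_0/(2L) by the inequality sin(x) ≤ x.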
Using these simple estimates, we now study the convergence rate of the coupling. Lemma 4.3. There exists a constant C such that for any t ≥ 0 we have the bound 2λ > 1/(2λ 2 L 2 ).
Proof. Without loss of generality we may assume that $Z_0$ and $M_0$ are deterministic in order to avoid writing the conditional expectation. Applying Itô's lemma, we find from (11) that
$$\mathrm{d}|Z_t|^2 = -2\lambda |Z_t|^2\,\mathrm{d}t + 4 \cdot 1_{t \le T}\, Z_t\,\mathrm{d}W^1_t + 4 \cdot 1_{t \le T}\,\mathrm{d}t,$$
where the last term is the quadratic variation of the noise $2 \cdot 1_{t \le T}\,\mathrm{d}W^1_t$ driving $Z_t$. After taking expectations we see that
By explicitly solving (14) and using Lemma 4.2, we obtain E|Z t | 2 = |Z 0 | 2 e −2λt + 4e −2λt
Let us bound $I_t$. As the integrand is locally integrable, we have for a constant $C$
$$I_t \le C\Big(1 + \int_0^t e^{(2\lambda - 1/(2\lambda^2 L^2))s}\,\mathrm{d}s\Big).$$
Here the $s^{-1/2}$ term can be bounded by 1 for $s > 1$, and for $s \le 1$ the additional contribution can be absorbed into the constant. To bound the remaining integral we consider three cases:
• $2\lambda < 1/(2\lambda^2 L^2)$: The integral (and $I_t$) is uniformly bounded, $I_t \le C$.
In each case we multiply I t by e −2λt to obtain the decay rate. In the first two cases this gives the dominant term with |M 0 | T (as opposed to |Z 0 |) dependence, while in the last case it is lower order than the e −t/(2λ 2 L 2 ) decay we obtain from E|M t | 2 T below. Next let us consider E|M t | 2 T . Using the finite diameter of the torus we have the simple estimate E|M t | 2 T ≤ π 2 L 2 P(T > t).
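The case distinction for the remaining integral can be made explicit with a short computation. The sketch below evaluates e^{−2λt} ∫_0^t e^{(2λ − κ)s} ds with κ = 1/(2λ²L²) in closed form; it covers only this model integral and not the full quantity I_t, and the function names are hypothetical.

```python
import math


def exponent_gap(lam, L):
    """The exponent 2*lam - 1/(2*lam^2*L^2) appearing in the integrand."""
    return 2.0 * lam - 1.0 / (2.0 * lam ** 2 * L ** 2)


def weighted_integral(t, lam, L):
    """exp(-2*lam*t) * int_0^t exp(gap * s) ds with gap = exponent_gap(lam, L)."""
    gap = exponent_gap(lam, L)
    if abs(gap) < 1e-12:
        integral = t                            # critical case: linear growth in t
    else:
        integral = math.expm1(gap * t) / gap    # (exp(gap * t) - 1) / gap
    return math.exp(-2.0 * lam * t) * integral
```

When the gap is negative the integral stays bounded and the product decays like e^{−2λt}; when it is positive the product decays like e^{−t/(2λ²L²)}; in the critical case 4L²λ³ = 1 the integral equals t, which is consistent with the additional polynomial factor mentioned in Remark 2.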