A gradient flow approach of propagation of chaos

We provide an estimate of the dissipation of the Wasserstein 2 distance between the law of an interacting $N$-particle system and the $N$ times tensorized product of the solution to the corresponding limit nonlinear conservation law. This estimate makes it possible to recover classical propagation of chaos results [20] in the case of Lipschitz coefficients, the uniform in time propagation of chaos of [17] in the case of strictly convex coefficients, and some recent results [7] such as the case of particles in a double well potential.


1. Introduction. This paper introduces a new method to prove the so-called propagation of chaos (we refer to the classical lecture notes [20]) for the interacting particle system (1), where $((B^i_t)_{t\ge0})_{i=1,\dots,N}$ are $N$ independent $d$-dimensional Brownian motions, $(X^i_0)_{i=1,\dots,N}$ are random variables of symmetric joint law $G^N_0\in\mathcal P^{sym}(\mathbb R^{dN})$ independent of the $(B^i)_{i=1,\dots,N}$, and $b:\mathbb R^{2d}\to\mathbb R^d$ is an interaction field.
For the sake of completeness we recall some basic notions on this topic, and refer to [20] for further explanations. We begin with the

Definition 1.1 (Definition 2.1 in [20]). Let $(G_N)_{N\ge1}$ be a sequence of symmetric probabilities on $E^N$ ($G_N\in\mathcal P^{sym}(E^N)$), with $E$ some Polish space. We say that $G_N$ is $g$-chaotic, with $g\in\mathcal P(E)$, if for any $k\ge2$ and $\varphi_1,\dots,\varphi_k\in C_b(E)$ it holds
$$\lim_{N\to\infty}\int_{E^N}\varphi_1(x_1)\cdots\varphi_k(x_k)\,G_N(dx_1,\dots,dx_N)=\prod_{j=1}^k\int_E\varphi_j(x)\,g(dx).$$
In other words a sequence $(G_N)_{N\ge1}$ is $g$-chaotic if and only if for any $k\ge2$
$$G^k_N\rightharpoonup g^{\otimes k}\quad\text{weakly as } N\to\infty,$$
where $G^k_N$ is the $k$-particle marginal of $G_N$.

It is well known since the seminal work of McKean [18] (when $b$ is regular) that if the initial law of the particle system (1) is $\mu_0$-chaotic, then for any further time $t>0$ the law of the solution at time $t$ to the particle system is $\mu_t$-chaotic, where $\mu_t$ is the solution to the nonlinear conservation law (2) starting from $\mu_0$ (with the notation $b*\mu(x)=\int_{\mathbb R^d}b(x,z)\,\mu(dz)$). Usually this result is proved by the coupling method, which consists in introducing a system of nonlinear (independent, non-interacting) particles $((Y^i_t)_{t\ge0})_{i=1,\dots,N}$ driven by the same independent Brownian motions as in (1) and by the nonlinear force of the limiting solution. This technique is called coupling since the only random objects used to define the trajectories $((X^i_t)_{t\ge0})_{i=1,\dots,N}$ and $((Y^i_t)_{t\ge0})_{i=1,\dots,N}$ are the same, namely the initial condition $(X^i_0)_{i=1,\dots,N}$ and the independent Brownian motions $((B^i_t)_{t\ge0})_{i=1,\dots,N}$. Therefore the trajectories are built on the same probability space, and one says that they are coupled.
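For orientation, the objects referred to as (1) and (2), together with the nonlinear particles of the coupling method, can be sketched as follows. This is a reconstruction under assumptions rather than a quotation: the $\sqrt2$ diffusion factor and the notation $b*\mu$ come from the text, whereas the $1/N$ normalisation of the drift, the sum running over all $j$, and the choice $Y^i_0=X^i_0$ are the usual mean-field conventions and are assumed here.

```latex
% (1) interacting particle system
%     (assumed: drift normalised by 1/N, sum over all j = 1..N)
\[
  dX^i_t = \frac1N\sum_{j=1}^N b\big(X^i_t,X^j_t\big)\,dt + \sqrt2\,dB^i_t,
  \qquad i=1,\dots,N .
\]
% (2) limit nonlinear conservation law, with b*\mu(x) = \int b(x,z)\,\mu(dz)
\[
  \partial_t\mu_t + \nabla\cdot\big((b*\mu_t)\,\mu_t\big) = \Delta\mu_t,
  \qquad \mu_{t=0}=\mu_0 .
\]
% nonlinear (non-interacting) particles, driven by the same Brownian motions
% (assumed: same initial condition as the interacting system)
\[
  dY^i_t = b*\mu_t\big(Y^i_t\big)\,dt + \sqrt2\,dB^i_t,
  \qquad Y^i_0 = X^i_0 .
\]
```

With this normalisation the generator of each $Y^i$ is $\Delta + (b*\mu_t)\cdot\nabla$, which is consistent with the unit diffusion coefficient mentioned for (2).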
Assuming that the $(X^i_0)_{i=1,\dots,N}$ are i.i.d. of law $\mu_0$ (which is stronger than $\mu_0$-chaotic) makes the $(Y^i_t)_{i=1,\dots,N}$ i.i.d. of law $\mu_t$ for any $t>0$. Therefore (assuming some second order moments for the sake of simplicity) it holds But it is straightforward to obtain Then since the $(Y^i_t)_{i=1,\dots,N}$ are i.i.d. of law $\mu_t$ we straightforwardly obtain so that Gronwall's inequality yields Using again the fact that the $(Y^i_t)_{i=1,\dots,N}$ are i.i.d. of law $\mu_t$, we obtain from [9] that for some constant $C>0$ depending on $(\mu_t)_{t\ge0}$ Note that the only feature of the $(Y^i_t)_{i=1,\dots,N}$ which is used by this proof is that they are i.i.d. of law $\mu_t$ for any $t>0$, and are built on the same probability space as the $(X^i_t)_{i=1,\dots,N}$ (coupled). Due to the choice which has been made to drive the trajectories of the $(Y^i_t)_{i=1,\dots,N}$ by the exact same Brownian motions as the ones which drive the $(X^i_t)_{i=1,\dots,N}$, this kind of coupling is called synchronous. In some sense, it is the cheapest coupling one can think of. Indeed it does not take the diffusion into account, and makes the proof of propagation of chaos for the particle system (1) very similar to the proof of the mean field limit for the same particle system without diffusion (see for instance [6]). This is probably the reason for some confusion between these two notions in the literature.
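The synchronous coupling computation described above can be sketched as follows. This is the standard argument written under explicit assumptions, not a transcription of the displays of the paper: $b$ is taken $L$-Lipschitz in both variables, the drift of (1) is assumed to be $\frac1N\sum_j b(X^i_t,X^j_t)$, and the symbols $u$, $\Delta^i_t$ and $C_t$ are introduced only for this sketch.

```latex
% Assumptions (not from the source): |b(x,y)-b(x',y')| <= L(|x-x'|+|y-y'|),
% X^i_0 = Y^i_0 i.i.d. of law \mu_0, and u(t) := E|X^1_t - Y^1_t|^2.
% Same Brownian motions => X^i - Y^i has no martingale part:
\begin{align*}
\frac{d}{dt}\big|X^i_t-Y^i_t\big|^2
 &= 2\Big\langle X^i_t-Y^i_t,\ \frac1N\sum_{j=1}^N
      \big(b(X^i_t,X^j_t)-b(Y^i_t,Y^j_t)\big)\Big\rangle
  + 2\big\langle X^i_t-Y^i_t,\ \Delta^i_t\big\rangle,\\
\Delta^i_t &:= \frac1N\sum_{j=1}^N\big(b(Y^i_t,Y^j_t)-b*\mu_t(Y^i_t)\big).
\end{align*}
% First term: Lipschitz bound, Cauchy-Schwarz and exchangeability give <= 4L u(t).
% Second term: Young's inequality gives <= u(t) + E|\Delta^i_t|^2.
% The Y^j are i.i.d. of law \mu_t, so the cross terms in E|\Delta^i_t|^2 vanish:
\begin{align*}
\mathbb E\big|\Delta^i_t\big|^2
 &= \frac1{N^2}\sum_{j=1}^N
    \mathbb E\big|b(Y^i_t,Y^j_t)-b*\mu_t(Y^i_t)\big|^2 \le \frac{C_t}{N},\\
u'(t) &\le (4L+1)\,u(t)+\frac{C_t}{N},\qquad u(0)=0,\\
u(t) &\le \frac1N\int_0^t C_s\,e^{(4L+1)(t-s)}\,ds
 \qquad\text{(Gronwall)},\\
\frac1N\,W_2^2\big(G^N_t,\mu_t^{\otimes N}\big)
 &\le \frac1N\sum_{i=1}^N\mathbb E\big|X^i_t-Y^i_t\big|^2 = u(t).
\end{align*}
```

Here $C_t$ is finite as soon as $\mu_t$ has a second moment, since a Lipschitz $b$ grows at most linearly. The last line uses only that $(X_t,Y_t)$ is some coupling of $G^N_t$ and $\mu_t^{\otimes N}$, which is the point made in the text about the coupling being far from optimal.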
Later another coupling approach has been introduced (see for instance [8]), which turns out to be particularly powerful in the context of uniform in time propagation of chaos. Given $N$ independent Brownian motions $((B^i_t)_{t\ge0})_{i=1,\dots,N}$, define the interacting and nonlinear particles system as are still Brownian motions, since $I_d-2\,\mathbf 1_{s\le T_i}\,e^i_s(e^i_s)^\perp$ is an orthogonal matrix. But since, for $j\ne i$, the Brownian motions $(B^i_t)_{t\ge0}$ and $(B^j_t)_{t\ge0}$ are independent, the covariance between $(B^i_t)_{t\ge0}$ and $(B^j_t)_{t\ge0}$ is null, thus these new Brownian motions are also independent.
But the advantage is that now, compared to the synchronous coupling approach, the difference between $X^i_t$ and $Y^i_t$ includes the martingale part $-2\,\mathbf 1_{t\le T_i}\int_0^t e^i_s\,\langle e^i_s,dB^i_s\rangle$, which can be particularly handy in the context of asymptotic in time estimates. This technique has been used recently in a general framework [7], which makes it possible to treat the tricky particular case of particles in a double well confinement interacting through a small Lipschitz interaction.
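The claim that the reflected noises are still Brownian motions rests on the reflection matrix being orthogonal. A short check, assuming that $e^i_s$ is a unit vector (its definition is not reproduced above, so this is an assumption; in reflection couplings it is typically the normalised difference of the two coupled positions), with $H_s$ and $\tilde B$ being names used only in this sketch:

```latex
% For a unit vector e in R^d, set H := I_d - 2 e e^\perp (Householder reflection).
\begin{align*}
H^\perp &= H, \\
H^2 &= I_d - 4\,e e^\perp + 4\,e\,(e^\perp e)\,e^\perp = I_d
      \qquad\text{since } e^\perp e = |e|^2 = 1,
\end{align*}
% hence H^\perp H = I_d, i.e. H is orthogonal.
% Let H_s be adapted and equal to I_d - 2 e_s e_s^\perp for s <= T, I_d for s > T.
% Then \tilde B_t := \int_0^t H_s\,dB_s is a continuous local martingale with
\[
  \big\langle \tilde B^k,\tilde B^l\big\rangle_t
  = \int_0^t \big(H_s H_s^\perp\big)_{kl}\,ds = \delta_{kl}\,t ,
\]
% so \tilde B is a d-dimensional Brownian motion by Levy's characterisation.
```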
The aim of this paper is to address this question in an analytical framework, in particular to extend the strategy of [2] to the particle level. The basic idea is that we do not consider some suitable coupling, built on some probability space, between $G^N_t$, the law of the solution to the particle system (1), and $\mu_t^{\otimes N}$, the $N$ times tensorized product of the solution to (2), but the optimal coupling w.r.t. the $W_2$ metric, as given by Brenier's polarization Theorem (see [22, Theorem 9.4]). Moreover since $(G^N_t)_{t\ge0}$ and $(\mu_t^{\otimes N})_{t\ge0}$ are continuous curves of $\mathcal P(\mathbb R^{dN})$, the dissipation of this optimal coupling can be not only estimated, but explicitly computed (see for instance [22, Theorem 23.9], even if in practice we only use the upper bound). Thanks to these two basic tools, one obtains the dissipation of the best possible coupling (w.r.t. $W_2$), and thus a method whose results compare with the ones obtained by sophisticated probabilistic approaches.
The rest of the paper is organized as follows. In Section 2, we recall some earlier results about gradient flows in probability spaces and the dissipation of the Wasserstein 2 metric, give a new proof of the classical results presented in the Introduction and state the main theorem of this paper. In Section 3, we give some tools about optimal transport on $\mathcal P^{sym}(\mathbb R^{dN})$ which enable the use of the techniques recalled in Section 2 at the microscopic level. In Section 4 we give the proof of the main theorem of this paper.
In the rest of the paper (and sometimes in the above introduction as well), $M_k(\mathbb R)$ stands for the set of square matrices of size $k$ with real coefficients, $\mathrm{Tr}[\cdot]$ stands for the trace operator on this space, and $\cdot^\perp$ stands for the transpose (for a vector or rectangular matrix as well). $\mathcal P^{sym}_k(\mathbb R^{dN})$ will denote the set of symmetric probability measures over $\mathbb R^{dN}$, with $N\ge2$, with finite moment of order $k$. Whenever we consider a probability measure, it will implicitly be assumed absolutely continuous w.r.t. the Lebesgue measure, with smooth density, and we will often identify a measure with its density. For a convex function $\psi$, we denote by $\psi^*$ its Legendre convex conjugate. $W_2$ stands for the Wasserstein metric (see for instance [22]). We will use the shortcuts For a function $f\in C^2(\mathbb R^{dN})$ and $i=1,\dots,N$ we denote by $\nabla_i f\in\mathbb R^d$ the gradient, and by $\nabla^2_{i,i}f\in M_d(\mathbb R)$ the Hessian matrix, w.r.t. the $i$-th component.

2. Preliminaries and main results. The main motivation of this paper is the following. Instead of trying to find a more or less optimal coupling between the law of the particle system (1) and the solution to (2), at the level of particle trajectories as described in the introduction, we can use the gradient flow approach which links optimal transport, entropy and heat flow (see for instance [21]).
2.1. Wasserstein 2 metric dissipation and WJ inequality. We begin this section with the following observation. If we denote by $G^N_t\in\mathcal P^{sym}(\mathbb R^{dN})$ the law of the random vector $(X^1_t,\dots,X^N_t)$ solution at time $t>0$ to (1), then it solves the following Liouville equation It can be rewritten as so that $G^N_t$ can be seen as the push-forward of $G^N_0$ by the map $\xi^N_{t,0}$ defined as the solution to the ODE Let now $(\mu_t)_{t\ge0}$ be the solution to (2) starting from $\mu_0$. Similarly $\mu_t^{\otimes N}$ is the push-forward of $\mu_0^{\otimes N}$ by the map $\zeta^{\otimes N}_{t,0}$ Before giving the main result of this section, we recall some notions about the dissipation of the $W_2$ distance along continuous curves of $\mathcal P_2$, taken from [2,3].

Definition 2.1. Let $\mu,\nu\in\mathcal P_2(\mathbb R^d)$ and $\psi\in C^2(\mathbb R^d)$ the maximizing Kantorovitch potential (m.K.p.) associated to the pair $(\nu,\mu)$ as given by Brenier's Theorem, i.e. such that $\nabla\psi\#\nu=\mu$ optimally w.r.t. the $W_2$ metric, and $\psi^*$ the dual of $\psi$. We define the functional $\mathcal J$ as We say that $\nu\in\mathcal P_2(\mathbb R^d)$ satisfies a $WJ(\kappa)$ inequality if and only if there is some constant $\kappa>0$ such that for any $\mu\in\mathcal P_2(\mathbb R^d)$ it holds $\kappa\,W_2^2(\mu,\nu)\le\mathcal J(\mu|b,\nu)$.
More generally, $\nu\in\mathcal P_2(\mathbb R^d)$ satisfies a symmetric $WJ(\kappa)$ inequality if and only if there is some constant $\kappa>0$ such that for any $N\ge2$, $G^N\in\mathcal P^{sym}_2(\mathbb R^{dN})$ it holds where $\psi^N$ is the m.K.p. associated to the pair $(\nu^{\otimes N},G^N)$.
Remark 1. The first part of this definition is taken from [2, Definition 3.1]. The fact that the maximizing Kantorovitch potential is of class $C^2$ follows from the fact that we only consider probability measures with smooth density, and from the regularity theory for the Monge-Ampère equation (see [5]). One can check that if $\nu$ satisfies a symmetric $WJ(\kappa)$ inequality, then it satisfies a $WJ(\kappa)$ inequality in the sense of [2, Definition 3.1]. Indeed for any $\mu\in\mathcal P_2(\mathbb R^d)$ if $\psi^N$ is the m.K.p. associated to the pair $(\mu^{\otimes N},\nu^{\otimes N})$ and $\psi$ the one associated to the pair $(\mu,\nu)$, one has for any ( So that . Conversely, it is not clear whether a $\nu$ satisfying a $WJ(\kappa)$ inequality in the sense of [2, Definition 3.1] also satisfies a symmetric $WJ(\kappa)$ inequality. In practice, however, the same estimates which establish that some probability measure satisfies a WJ inequality also establish that it satisfies a symmetric WJ inequality, as the reader can check in the proof of Proposition 3.
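The tensorization used in Remark 1 can be made explicit. The following is a sketch of the standard fact behind it, with the convention of Definition 2.1 that the potential associated to a pair pushes the first measure onto the second; it is not a transcription of the displays of the remark.

```latex
% Let \psi be the m.K.p. of the pair (\mu,\nu): \psi convex, \nabla\psi\#\mu = \nu.
% Define, for x = (x_1,\dots,x_N) \in R^{dN},
\[
  \psi^N(x) := \sum_{i=1}^N \psi(x_i),
  \qquad
  \nabla\psi^N(x) = \big(\nabla\psi(x_1),\dots,\nabla\psi(x_N)\big).
\]
% \psi^N is convex and \nabla\psi^N \# \mu^{\otimes N} = \nu^{\otimes N};
% by Brenier's theorem a gradient of a convex function realising the
% push-forward is the optimal map, so \psi^N is the m.K.p. of
% (\mu^{\otimes N},\nu^{\otimes N}) and
\begin{align*}
W_2^2\big(\mu^{\otimes N},\nu^{\otimes N}\big)
 &= \int_{\mathbb R^{dN}} \sum_{i=1}^N \big|x_i-\nabla\psi(x_i)\big|^2
    \,\mu^{\otimes N}(dx)
  = N\,W_2^2(\mu,\nu),\\
\nabla^2_{i,i}\psi^N(x) &= \nabla^2\psi(x_i),
\qquad \nabla^2_{i,j}\psi^N(x) = 0 \quad (i\neq j).
\end{align*}
```

Because every quantity built from $\psi^N$ splits into $N$ identical one-particle contributions, a symmetric inequality tested on $G^N=\mu^{\otimes N}$ reduces, after division by $N$, to the one-particle inequality.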
The functional $\mathcal J$ so defined is the rate of time dissipation of the Wasserstein 2 distance between two solutions to (2). We emphasize that this rate is by construction sharper than the rate of time dissipation of synchronous coupling between two such solutions, which would be given (roughly speaking) by only the second term in the r.h.s. of (4) or (5). In the symmetric case, this functional appears in the estimate of the time dissipation of the $W_2$ distance between the solution to the Liouville equation associated to the $N$-particle system (1) and the $N$ times tensor product of the solution to (2). It is the object of the be the law of solution to the particle system (1) starting with initial condition of law and are locally Lipschitz. Then for any $\eta>0$, $t\ge0$ it holds The proof of this proposition relies on standard results, so that we give it in this section Then using that $\nabla\psi^{N*}(\nabla\psi^N(x))=x$ we rewrite $I_1$ is untouched. Then by integration by parts we obtain Finally using Young's inequality yields

2.2. Main result.
It is possible to recover from the estimate provided by Proposition 1 some propagation of chaos results obtained by synchronous coupling. All the discussion below relies on the assumption that the vector field $b^N-\nabla\ln G^N$ matches the required technical assumptions. We will carefully check that these assumptions are indeed fulfilled later in the paper, when $b$ is explicitly defined. We just obtained in Proposition 1 that Taking no advantage of the negative term we have just thrown away can be seen as analogous to what is done in synchronous coupling. The method is then quite close to the one used in [11] in the quantum framework, the main difference being that [11] uses the coupling version of the definition of the Wasserstein metric, whereas here we use the optimal transport one. Therefore we can reprove in this way any propagation of chaos result obtained by synchronous coupling.
Let us first consider the most classical one, studied in [20], and then by Gronwall's inequality We now consider the case studied in [17], Then in this case we have by oddness of $\nabla W$ Using the polynomial growth assumption on $W$ and assuming uniform in time estimates for moments of some adequate order on $\mu_t$ yields and choosing $\eta<2\beta$ yields the following uniform in time propagation of chaos result It would be possible to go on revisiting synchronous coupling results. We stop here and emphasize that this method has no hope of applying to the degenerate diffusion cases, such as hypoelliptic equations. Also it could be interesting to introduce the reverse crossed term in the proof of Proposition 1 to investigate what could be done with the identity which would correspond to what is done, at the particle level, in [14] in the case of Hölder interaction. Keeping in mind the relation between the $W_2$ metric and the relative entropy through the heat flow (see [21]), it could be interesting to investigate the links with the relative entropy method recently introduced in [15,16], to see whether the method presented here could be adapted to non uniform in time propagation of chaos for the kinetic or singular interaction cases, or the relative entropy method to the uniform in time propagation of chaos for non convex confinement. We defer all these questions to possible future works. In all the examples presented above, the only advantage of the gradient flow approach with respect to the usual synchronous coupling technique is that it allows more straightforwardly the particle system (1) to start from initial conditions which are other than i.i.d. of law $\mu_0$.
If one wants to obtain some more interesting results, one has to make better use of the nonnegative term in the dissipation functional $-\mathcal J$ that we have thrown away so far. In probabilistic terms, this consideration is the same as the one already discussed in the introduction regarding synchronous and reflecting coupling. As in [7], it then enables us to obtain a uniform in time result in the case of a confinement which is convex outside some ball, given in the Theorem 2.2. There is $a^*>0$ such that if $a\in(0,a^*)$ and $\varepsilon\in(0,\varepsilon_a)$, $\mu_\infty\in\mathcal P_2(\mathbb R^d)$ is the unique stationary solution to (2), and $G^N_t\in\mathcal P^{sym}(\mathbb R^{dN})$ the law at time $t>0$ of the solution to (1) with initial law $G^N_0\in\mathcal P^{sym}_6(\mathbb R^{dN})\cap L\ln L(\mathbb R^{dN})$, then there are constants $C,\alpha>0$ such that for any $t\ge0$, $N\ge2$ it holds Note that the maximal depth of the wells $a^*$ comes from the unitary diffusion coefficient in (2). For a potential of given depth of wells, the result could also be obtained if we replace the diffusion factor √ 2 in (1) with √ 2σ for $\sigma$ large enough w.r.t. the depth. For $\sigma$ not large enough the limit equation (2) is known to admit several stationary solutions (see [13]).
Also note that Theorem 2.2 only provides uniform in time propagation of $\mu_\infty$-chaos with optimal $N^{-1}$ convergence rate. One can obtain propagation of $\mu_t$-chaos for any non stationary solution $(\mu_t)_{t\ge0}$ to (2), but with a less sharp convergence rate, in the Corollary 1. Let the assumptions of Theorem 2.2 be in force. Let $(\mu_t)_{t\ge0}$ be the solution to (2) starting from $\mu_0\in\mathcal P_6(\mathbb R^d)\cap L\ln L(\mathbb R^d)$, and $G^N_t\in\mathcal P^{sym}(\mathbb R^{dN})$ be the law at time $t>0$ of the solution to (1) with initial law $\mu_0^{\otimes N}$. There are $C>0$ and $\beta\in(0,1)$ such that for any $N\ge2$ and $t\ge0$ it holds In particular, if $((X^i_t)_{i=1,\dots,N})_{t\ge0}$ denotes the solution to the particle system (1) for

3. Optimal transport on $\mathcal P^{sym}(\mathbb R^{dN})$. We begin this section with the elementary Remark 2. Such a result is likely to be already given somewhere in the literature. Indeed it can be seen, roughly speaking, as a super additivity property of the normalized $W_2^2$ functional on $\mathcal P^{sym}_2(\mathbb R^{dN})$. But this is well known (see [4], [12,10]) for Boltzmann's entropy, for which the heat flow is a gradient flow w.r.t. the $W_2$ metric, as well as for the dissipation of Boltzmann's entropy along the heat flow, which is the Fisher information (see also [19] for the case of the dissipation of Boltzmann's entropy along the fractional heat flow).
We will prove this Lemma using probabilistic formalism, and it is the only time that we will do so. Indeed it is more handy when it comes to passing to marginals.
Therefore, if for some sequence G N ∈ P sym 2 (R dN ) and some probability measure , then the sequence (G N ) is µ-chaotic, in the sense of Definition 1.1.
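The passage from a bound on the normalized distance $W_2^2(G^N,\mu^{\otimes N})/N$ to chaoticity can be sketched as follows. This is one standard argument, written under the assumption that the lemma of this section is a marginal comparison of this type; the statement and constants of the paper are not reproduced here, and the symbols $\pi$, $\bar\pi$ are introduced only for this sketch.

```latex
% Let \pi be an optimal coupling of G^N and \mu^{\otimes N} for W_2, and let
% \sigma range over the permutations of the N particle indices, acting
% simultaneously on x = (x_1..x_N) and y = (y_1..y_N).  Symmetrise:
\[
  \bar\pi := \frac1{N!}\sum_{\sigma}(\sigma\times\sigma)\#\pi .
\]
% Since G^N and \mu^{\otimes N} are symmetric, \bar\pi is again a coupling of
% them with the same quadratic cost, hence optimal, and under \bar\pi the
% pairs (X^i,Y^i), i = 1..N, are exchangeable.  Writing G^k_N for the
% k-particle marginal of G^N, the first k pairs couple G^k_N and \mu^{\otimes k}:
\begin{align*}
W_2^2\big(G^k_N,\mu^{\otimes k}\big)
 &\le \mathbb E_{\bar\pi}\sum_{i=1}^k\big|X^i-Y^i\big|^2
  = \frac kN\,\mathbb E_{\bar\pi}\sum_{i=1}^N\big|X^i-Y^i\big|^2
  = \frac kN\,W_2^2\big(G^N,\mu^{\otimes N}\big).
\end{align*}
% Hence, if W_2^2(G^N,\mu^{\otimes N})/N -> 0 as N -> infinity, then for every
% fixed k one has W_2(G^k_N,\mu^{\otimes k}) -> 0, which implies weak
% convergence of the k-marginals, i.e. \mu-chaoticity (Definition 1.1).
```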
We provide the result which makes it possible to adapt the techniques of [2,3] to the particle level in the Proposition 2. Let $F^N,G^N\in\mathcal P^{sym}_2(\mathbb R^{dN})$ be two symmetric probabilities and let $\psi^N\in C^2(\mathbb R^{dN})$ be the m.K.p. associated to the pair $(F^N,G^N)$ and $\psi^{N*}$ its convex conjugate. Then for any $x\in\mathbb R^{dN}$ it holds Proof. First observe that by definition of $\psi^N$ it holds Since $\nabla^2\psi^N(x)$ is on the one hand symmetric by Schwarz's Theorem since $\psi^N\in C^2(\mathbb R^{dN})$, and invertible on the other hand, $\nabla^2\psi^N(x)$ is symmetric positive definite for any $x\in\mathbb R^{dN}$. Moreover and we conclude by using Proposition 4 in Appendix A.

4. Proof of Theorem 2.2 and Corollary 1. In this section, we set
In order to prove the claimed uniform in time propagation of chaos result, we first need to extend the techniques of [2] used in the case of degenerately convex confinements to the case of non convex ones. We begin by the (iii) for any $\mu\in\mathcal P_2(\mathbb R^d)$ and $R>0$ using that
$$\langle|x|^2x-|y|^2y,\,x-y\rangle=|x|^4-\langle x,y\rangle|x|^2-\langle x,y\rangle|y|^2+|y|^4\ \ge\ \big(|x|^2+|x||y|+|y|^2\big)\big(|x|-|y|\big)^2\ \ge\ 0,$$
which concludes the proof of the first point. Then so that for any $|x|>R$ it holds We use [2, Lemma 3.12] to conclude the proof of the second point. Then observe that and then for any $|x|\le3R$ so that the result is proved.
Next we need some moment estimates given in the Lemma 4.2. Let (µ t ) t≥0 be a solution to (2) starting from µ 0 ∈ P 2 (R d ). Then for any t, δ > 0 it holds In particular if µ ∞ is a stationary solution to (2), then it holds Proof. Let (µ t ) t≥0 be a solution to (2) then d dt So that for any δ > 0 it holds d dt In particular for any δ > 0 We minimize the r.h.s. by choosing δ = (a + ε) 2 + d which yields the desired result. Then for k ≥ 2 d dt Using a similar argument as above and Young's inequality again yields for some constants c, C > 0 Then we use Gronwall's inequality and obtain Then by Itô's rule, it holds so that taking the expectation and averaging over i = 1, · · · , N , we obtain by symmetry and using symmetry and similar arguments as the ones used above yields Before giving the main proposition of this section, we need to look at the existence of a stationary solution to (2). Consider the functional F defined on P 2 (R d ) as We use [3, Proposition 4.4, point (iii)] to deduce that there is µ ∞ ∈ P 2 (R d ) which minimizes F , and that such a measure is a stationary solution to (2) and satisfies We finally need to state the Lemma 4.3. Let a, ε > 0 and for any R > 0 define There is a * > 0 such that if a ∈ (0, a * ) and ε ∈ (0, a/2), there are κ a > 0 and R a > a 6 such that ≤ (3R) 4 + 6(a/6 + 2ε/6)(3R) 2 + 72ε/6(a/6 + ε/6) + 2 √ dε.
Since ε < a/2, for any R > a 6 it holds for some C > 0 For a > 0, define So that it holds C 2 (R a , a, ε) > 4a. Then by the observation of the beginning of the proof it holds We define the function g(a) = 36R 2 a −1 e −2C(R 4 a +R 2 a ) −2a. Since g is continuous and g(0) > 0, there is some a * > 0 such that for any a ∈ (0, a * ), C 1 (R a , a, ε) ≥ g(a) > 0, and the result is proved with κ a = (4a) ∧ g(a).
We can now give the main result of this section, from which the main theorem of this paper will follow, in the Proposition 3. There is $a^*>0$, and for any $a\in(0,a^*)$, $\varepsilon_a>0$, such that if $\varepsilon\in(0,\varepsilon_a)$ and $\mu_\infty$ is a stationary solution to (2), then there is a constant $\kappa>0$ such that for any $N\ge2$, $G^N\in\mathcal P^{sym}_2(\mathbb R^{dN})$ it holds i.e. $\mu_\infty$ satisfies a symmetric $WJ(\kappa)$ inequality.
Most of the material of the proof of this proposition is taken from [2, Proposition 3.4]. Nevertheless we will write it out completely: on the one hand we need to make the constants in the estimates more precise, since we are in a non convex coefficients framework; on the other hand we want to emphasize what differs between the classical and the $\mathcal P^{sym}(\mathbb R^{dN})$ contexts.

Proof. First recall that
and that Step one. Estimate of T 1 First we easily obtain Step two. Estimate of T 2 Let R > a 6 1/2 to be fixed later. We denote for i = 1, · · · , N Step three. Estimate of T 1 For the rest of the proof for x = (x 1 , · · · , x N ) ∈ R dN , z ∈ R d we will denote

SAMIR SALEM
We begin by rewriting We will write the rest of the proof as if α d = Z to make the notations lighter. Consider now i = 1, · · · , N , (x 1 , · · · , x i−1 , x i+1 , · · · , x N ) ∈ R d(N −1) and θ ∈ S d−1 fixed and define R i θ := sup{r ≥ 0,x i rθ ∈ X N i }, and r i θ > 0 as Then for any r ∈ [0, R i θ ] it holds by Taylor's expansion • Estimate of R i 2 First we use Holder's inequality to obtain and for r > 0 such thatx i rθ ∈ X N i we have by definition ∇ i ψ N (x i rθ ) ≤ 2R. Moreover, by definition ofx i r i θ θ and R i θ it holds

Finally using Proposition 2
By the respective definitions of R i θ and r i θ it holds Gathering all these estimates yields Step four. Conclusion.
Gathering the results obtained in the above steps yields . We finally choose R = R a as given by Lemma 4.3 so that there is κ a > 0 such that for any ε ∈ (0, a/2) it holds (C 1 (R a , a, ε) ∧ C 2 (R a , a, ε)) > κ a .
Proof of Theorem 2.2 and Corollary 1. We begin by checking that the technical assumptions are fulfilled. Let $(G^N_t)_{t\ge0}$ be the solution to (3) with $b(x,y)=-\nabla V(x)-\varepsilon\nabla W(x-y)$, and $\mu_\infty$ be a stationary solution to (2). First the vector fields $\nabla V+\varepsilon\nabla W*\mu_\infty+\nabla\ln\mu_\infty$ and $b^N-\nabla\ln G^N_t$ are locally Lipschitz since the solutions to (3) or (2) have smooth and positive density for any time $t>0$. Then observe that and then Then using Lemma 4.2, for any $t\ge0$ it holds Similar computations would also yield By Proposition 3 and Remark 1, it holds that if $a,\varepsilon>0$ satisfy the assumptions of Theorem 2.2, then any stationary solution $\mu_\infty$ to (2) satisfies a $WJ(\kappa)$ inequality. In particular it is unique. Indeed if $\tilde\mu_\infty$ is another stationary solution, then it holds for any $t\ge0$ and then $W_2^2(\mu_\infty,\tilde\mu_\infty)=0$. Then by definition of $F^N$ it holds for any $N\ge2$

So that using Propositions 1, 3 and Lemma 4.2, we obtain for η < κ and Theorem 2.2 is proved, using Gronwall's inequality.
We now turn to the proof of Corollary 1. Consider (µ t ) t≥0 the solution to (2) starting from µ 0 . Then by [22,Theorem 23.9] we have and since µ ∞ satisfies a W J(κ) inequality we deduce which yields the classical synchronous coupling result We now assume that for a constant C > 0 which depends only on a, ε and µ 0 and µ ∞ . The second part of the statement follows from following observation. Let (X 1 , · · · , X N ) be a R dN -valued random variable of law G N ∈ P sym 2 (R dN ), and µ ∈ P 2 (R d ). By definition, we have Observe now that the function is 1/N -Lipschitz w.r.t. the Euclidean norm on R dN . Indeed, by the reverse triangular inequality, one has Then due to Kantorovitch-Rubinstein duality, and increasing property of the Wasserstein metric, we obtain that On the other hand, thanks to Jensen's inequality, one has that δ xi , µ µ ⊗N (dx 1 · · · dx N ) 1/2 , and we conclude by using [9].
Appendix A. Superadditivity of the trace of the inverse. We proceed by induction. Let $N=2$ and let $A=\begin{pmatrix}a&c\\ c&b\end{pmatrix}$ be a symmetric positive definite matrix. Note that it holds $ab-c^2>0$, $a,b>0$ and and $P(2)$ holds true. Assume now that $P(N)$ holds true for some $N\ge2$. Let $S_{N+1}\in M_{N+1}(\mathbb R)$ be a symmetric positive definite matrix which we write as Note that necessarily $S_N\in M_N(\mathbb R)$ is symmetric positive definite (and so is $(S_N)^{-1}$), $\delta_N\in\mathbb R^N$ and $z>\delta_N^\perp(S_N)^{-1}\delta_N$. Then Also note that which concludes the first step.
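The scalar step above can be illustrated by a worked computation. It is written under the assumption that $P(N)$ is the statement $\mathrm{Tr}[S^{-1}]\ge\sum_{i=1}^N (S_{i,i})^{-1}$ for symmetric positive definite $S\in M_N(\mathbb R)$, which is what the title of the appendix suggests; the exact formulation of $P(N)$ is not reproduced in the text, and the symbol $s$ is introduced only for this sketch.

```latex
% Case N = 2, with A = [[a, c], [c, b]], ab - c^2 > 0, a, b > 0:
\begin{align*}
A^{-1} &= \frac{1}{ab-c^2}\begin{pmatrix} b & -c\\ -c & a\end{pmatrix},
&
\mathrm{Tr}\big[A^{-1}\big] &= \frac{a+b}{ab-c^2}
 \ \ge\ \frac{a+b}{ab} = \frac1a+\frac1b .
\end{align*}
% Induction step, with S_{N+1} = [[S_N, \delta_N], [\delta_N^\perp, z]] and the
% Schur complement s := z - \delta_N^\perp (S_N)^{-1}\delta_N \in (0, z]:
\begin{align*}
(S_{N+1})^{-1} &=
\begin{pmatrix}
 (S_N)^{-1} + s^{-1}(S_N)^{-1}\delta_N\delta_N^\perp(S_N)^{-1}
   & -s^{-1}(S_N)^{-1}\delta_N\\[2pt]
 -s^{-1}\delta_N^\perp(S_N)^{-1} & s^{-1}
\end{pmatrix},\\
\mathrm{Tr}\big[(S_{N+1})^{-1}\big]
 &= \mathrm{Tr}\big[(S_N)^{-1}\big]
   + \frac{\big|(S_N)^{-1}\delta_N\big|^2}{s} + \frac1s
 \ \ge\ \mathrm{Tr}\big[(S_N)^{-1}\big] + \frac1z .
\end{align*}
```

The last inequality uses $s\le z$, which holds because $(S_N)^{-1}$ is positive definite; combined with $P(N)$ applied to $S_N$ it gives $P(N+1)$ under the assumed formulation.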
Step two. The multidimensional case. Let us denote the property We proceed by induction. Let $N=2$ and let $S\in M_{2d}(\mathbb R)$ be a symmetric positive definite matrix. Then Let $O\in M_d(\mathbb R)$ be an orthogonal matrix such that is symmetric positive definite, it holds $\lambda_k>a_{k,k}>0$ for any $k=1,\dots,d$. So that where we used the commutativity of the trace in the first and last line, and the first step of this proof in the second. And $P(2)$ holds true. Assume now that $P(N)$ holds true for some $N\ge2$. Let $S_{N+1}\in M_{d(N+1)}(\mathbb R)$ be a symmetric matrix which we write as