A non-linear kinetic model of self-propelled particles with multiple equilibria

We introduce and analyse a continuum model for an interacting particle system of Vicsek type. The model is given by a non-linear kinetic partial differential equation (PDE) describing the time-evolution of the density $f_t$, in the single particle phase-space, of a collection of interacting particles confined to move on the one-dimensional torus. The corresponding stochastic differential equation for the position and velocity of the particles is a conditional McKean-Vlasov type of evolution (conditional in the sense that the process depends on its own law through its own conditional expectation). In this paper, we study existence and uniqueness of the solution of the PDE in consideration. Challenges arise from the fact that the PDE is neither elliptic (the linear part is only {\em hypoelliptic}) nor in gradient form. Moreover, for some specific choices of the interaction function and for the simplified case in which the density profile does not depend on the spatial variable, we show that the model exhibits multiple stationary states (corresponding to the particles forming a coordinated clockwise/anticlockwise rotational motion) and we study convergence to such states as well. Finally, we prove mean-field convergence of an appropriate $N$-particles system to the solution of our PDE: more precisely, we show that the empirical measures of such a particle system converge weakly, as $N \rightarrow \infty$, to the solution of the PDE.


Introduction
The emergence of coordinated movements and self-organization in the collective motion of large groups of individuals is ubiquitous in nature; e.g., in biological systems (flocking of birds, swarming of bacteria, etc.) or in human dynamics (movement of crowds, communications networks, etc.). Regardless of the great differences in the specific type of individuals, the collective motion of large groups of organisms exhibits similarities suggesting the existence of general rules that underlie collective dynamics. During the last two decades, this subject has drawn the attention of physicists and applied mathematicians, who developed several mathematical models, trying to extract the essential features of these phenomena, a first step towards the build of a full theory.
In particular, in the pioneering work [58], the authors introduce the notion of self-propelled particles (SPP), which consist of locally interacting particles with an intrinsic driving force (internal to the particles, as it happens in real organisms). The model in [58], commonly known as the Vicsek model, is given in two dimension, but similar SPP models exhibiting self-organization were also introduced in one and three dimensions [23,24]. An explanation of these results by means of continuum theories have been proposed in [23,52,53], where hydrodynamic equations of motion for the density and velocity fields were used. Concerning the direct connection with real systems, we recall, e.g., the interesting work [3].
After then, many other models of self-propelled particles with pairwise interactions have been shown to exhibit coordinated motion, both microscopic particle models [49,42,14,20,21,22,43] or macroscopic (fluid-like) models in the continuum [49,54,55,30]. More recently, mesoscopic models based on kinetic equations for the distribution function in the one-particle phase space have been considered, see, e.g., [35,17], which represent a bridge between the particle-and the hydrodynamicdescription. Interesting developments are also contained in the works [7,6,33,45]. The subject is impressively vast and a comprehensive literature is out of reach, so here and in the following we only mention the works that, to the best of our knowledge, are closest to the model treated in the present paper. For more precise literature review we refer the reader to the review papers [59,29].
As already noticed, self-organization phenomena may occur also in one dimensional systems. In particular, in [23] it is demonstrated that spontaneous symmetry breaking and ordering take place in one dimension as well. This is shown through the numerical analysis of a SPP model in one dimension, and heuristically justified by means of a continuum theory describing this kind of systems. It is worthwhile to quote also the interesting research [12], where the behaviour predicted by this one dimensional SPP model is confirmed in experiments with marching locusts.
A mathematically rigorous explanation of these one dimensional phenomena as well as their continuum description is a natural query. In this paper, we introduce and study a kinetic model for a system of self-propelled particles moving in the one dimensional torus, which can be thought of as a mean-field version of the particle system introduced in [23]. Related models have been considered in [27,28,5,16].
To write the kinetic evolution equation in consideration, some preliminary definitions and notation are needed. Letting T = R/Z be the one dimensional torus of length one, the phase space distribution function is denoted by f t = f t (x, v), where (x, v) ∈ T × R and t ∈ R is the time. As customary in kinetic theory, the function f t (x, v) represents the density of particles which are at time t in phase-space point (x, v). A key ingredient in the definition of the dynamics is given by a function G : R → R such that This function G incorporates both the propulsion and friction forces, whose effect is to set the average velocity to a prescribed value; further comments on the role of the function G will be given after equation (1.9). The typical choice in the literature [23,24] is (1.2) G(u) = (u + 1)/2 for u > 0 , (u − 1)/2 for u < 0 , or its generalization, G β (u) = u + β sign(u) 1 + β , β > 0 .
The advantage of this choice is that G ′ β (u) = 1/(1 + β) for any u = 0, but the above functions have a jump at u = 0, rendering them less amenable to analysis. Therefore, we will consider smooth versions of the above functions which retain the property (1.1), see Hypothesis 1.1 and Hypothesis 1.2 for precise assumptions; the regularization does not alter the asymptotic behavior of the resulting dynamics (however, see Remark 1.3 on the matter). We also introduce a real-valued function ϕ ∈ C ∞ (T) with the properties, Such a function will describe the interaction among particles (see few lines below). Finally, for any phase space distribution function f = f (y, w) and observable F = F (y, w) we define, dy R dw f (y, w) ϕ(x − y) F (y, w) .
With this notation in place, the evolution of the system is governed by the following kinetic equation for the density f t , where σ > 0 is a given parameter and .
To stress the dependence of M on f we should write M ft (t, x) in place of M (t, x). We refrain from doing so to avoid cumbersome notations. Since the total mass is conserved, we can assume that The above and related equations are well studied in the literature (see the works [60,37,36,48]); equation (1.7) describes the evolution of the density of the law of a particle the dynamics of which is subject to transport, damping and noise (the addends on the right-hand side, respectively). More explicitly, Eq. (1.7) is the Fokker-Plank equation for the following Langevin-type dynamics with W t the standard one-dimensional Brownian motion. The measure Z −1 e −v 2 /2σ , Z being the normalization constant, is the unique invariant measure for the SDE (1.8). Let us now come to explain the nature of the nonlinearity. To this end, it may be useful to point out that the solution f t (x, v) of (1.4) can be interpreted as the joint law of the McKean-Vlasov process (x t , v t ) ∈ T × R satisfying the nonlinear SDE, where (x ′ t , v ′ t ) ∈ T × R denotes a random variable with law equal to the joint law of (x t , v t ), i.e., with probability density f t (again, a better notation for M (t, x) should be M ft (t, x)). We emphasize that, in view of (1.10), the evolution (1.9) is non-linear in the sense of McKean, as the process depends on its own law through the expectation in (1.10). Furthermore, from the expression (1.10) one can see that M (t, x) is a sort of "local average velocity" around x, weighed with the auxiliary function ϕ. For example, take ϕ to be the characteristic function of the interval [−b, b] (this choice does not satisfy the assumptions (1.3), but it is meaningful for expository purposes): in this case, M (t, x) reduces to the conditional expectation The role of the nonlinear force G(M (t, x t ))−v t is then quite clear: it tends to increase (decrease, respectively) the velocity of x t if the latter is larger (smaller, respectively) of the local average velocity M (t, x t ) around x t . In other words, while the function ϕ describes the spatial interaction, the function G has the role of "herding" the particles towards a certain average velocity. This will be more clear after the results of Section 2, see in particular equation (2.6).
In this paper we study various aspects of the evolution (1.4). Section 3 is devoted to the study of the well-posedness of the non-trivial Cauchy problem associated with (1.4) and in Section 4 we prove that an appropriate N -particle system converges, in the continuum limit N → ∞, to the solution of the PDE (1.4) . Section 2 and Section 5 contain results on invariant measures and long time behaviour, for some particular choices of the interaction function ϕ.
Let us come to describe the main results of this paper in more detail. In Section 3 we show global (in time) existence and uniqueness for the Cauchy problem associated with (1.4). We adopt here, with non-trivial modifications, the approach taken in [26] to study well-posedness of the Vlasov-Fokker-Planck equation. While the methods we use in this section are purely analytical, alternative techniques, probabilistic in flavour, have been used to study similar problems, see [11].
As already observed, Eq. (1.4) has been introduced as a mean-field counterpart of the discrete model in [23]. To make more precise the connection with particle systems, in Section 4 we rigorously show how the kinetic evolution considered here can be derived as the mean-field limit of a suitable model of self propellingparticles with additive Brownian noises. In this framework, we recall that the bulk behavior of interacting diffusions and Ornstein-Uhlenbeck processes has been already considered in the case of forces depending solely on particle position, see, e.g., [13,47,56,57,34]. Here, we consider the following system of N interacting particles: for every i ∈ {1, . . . , N }, the position and velocity of the i-th particle, denoted by x i,N t and v i,N t , respectively, evolve according to the SDE where the W i 's are one-dimensional independent standard Brownian motions. (Because G is assumed to be bounded and Lipschitz continuous, existence and uniqueness (for every N fixed) of the strong solution of the above system follows from standard SDE theory.) The main result of Section 4 can then be expressed as follows (for a precise statement see Theorem 4.1): as N → ∞, the motion of each particle converges weakly to the solution of the non-linear diffusion (1.9). To put it differently, the empirical measure of the system, i.e., the measure S N t (dx, dv) : , converges to a measure with a density with respect to the Lebesgue measure; such a density is precisely the solution of the PDE (1.4). We notice in passing that the method of proof adopted here produces an existence and uniqueness result in measure space for the evolution (1.4), see Proposition 4.4 for precise statement.
Regarding the long time behaviour, in Section 2 we start by considering the simplified setting in which the solution does not depend on the spatial-variable. That is, we only consider space-homogeneous evolutions f = f t (v). In this case we give a complete description of the asymptotic behaviour: we show that the system admits three space-homogeneous stationary densities, which are Gaussian distributions with same variance σ and average velocity 0 and ±1, respectively. To understand why this is the case, one should observe that the points 0 and ±1 are precisely the only points such that G(x) = x and reconcile such an observation with the above discussion regarding the nature of the non-linear term of the PDE (1.4). In this framework, the Gaussians N (±1, σ) correspond to clockwise/anticlockwise coordinated motion; one could argue that such measures may be the only physically relevant equilibria -in the sense that, had we considered G as in (1.2), the mean zero Gaussian would not be an equilibrium, as the function (1.2) does not intersect the diagonal at the origin, see Remark 1.3 and Remark 2.6. Still under the simplifying assumption f t = f t (v), we prove that if the initial average velocity is positive, i.e., R dv v f 0 (v) > 0, then the solution of the PDE converges to the Gaussian N (1, σ); if the initial profile f 0 has negative average velocity, then the solution converges to the Gaussian with negative mean, N (−1, σ). This implies that, regardless of the initial conditions, the system is driven towards a coherent motion (in the context of flocking models this phenomenon is sometimes called unconditional flocking). Moreover we prove that in either cases decay to equilibrium is exponential. We prove this fact by using two different approaches: in Section 2.1 we close equations on the cumulants of the distribution (closing equations directly on the moments did not seem possible); in Section 2.2 we identify a suitable entropy functional for the system and employ an appropriate modification of the Bakry-Carrillo-MacCann-Toscani-Villani strategy to show exponentially fast decay of such a functional. To summarize, in this simpler case we can determine all the equilibria of the system as well as their basin of attraction and the rate of decay. This is a rather "lucky case". Indeed, for dynamics with multiple equilibria, there is at present no general theory to address the problem of convergence (the primary problem here being the description of the basin of attraction of each invariant measure) -as opposed to the ergodic framework, where several approaches have been developed [10,60,15,1,44]. However, we flag up the work [18], where the authors characterize a class of SDEs with multiple equilibria for which one can give a complete description of the basin of attraction of each stationary state. We also mention the case of a particular nonlinear Fokker-Planck equation, introduced in [8] as a kinetic model for a one-dimensional granular medium, which is characterized by a unique, but non-Maxwellian, steady distribution, see [9].
The analysis of the long-time behavior for the full equation (1.4) appears indeed much challenging (not last, because the the dynamics (1.4) is not in gradient form) and it will be the content of future work. In particular, in this fully non-linear and non-space-homogeneous case (1.4), one can observe by direct calculation that the three space-homogeneous Gaussians N (0, σ) and N (±1, σ) are still equilibria for the system (one can also determine an evolution system of measures for such a dynamics, see Remark 2.4); but we have been unable to prove, in such generality, that these are indeed the only invariant measures for the system. However, in Section 5 we show that if the interaction function is a "perturbation of the constant function", i.e., if ϕ(x) = 1 + λψ(x), where 0 < λ ≪ 1 and ψ is a mean zero, bounded function (defined on the one-dimensional torus), then the above three Gaussians are still the only invariant measures of the dynamics.
1.1. Notation and standing assumptions. Throughout the paper, we make the following assumptions regarding the interaction function ϕ and the "herding function" G.   tions ϕ with compact support. We don't do this here to avoid further technicalities in the proofs but we reserve to drop this assumption in future work. (iii) Clearly, one could pick the function G to intersect the diagonal in an arbitrary number of points. The choice x = 0, 1, −1 is purely arbitrary. If G intersects the diagonal elsewhere, say at a > 0 (so that by antisymmetry one also has G(−a) = −a), then also the Gaussian measures N (a, σ) and N (−a, σ) will be invariant for the reduced space-homogeneous equation considered in Section 2 as well as for the full dynamics (1.4). This can be seen with calculations completely analogous to those presented at the beginning of Section 2, see also comments after (2.6).
Throughout, we will interchangeably use the notation f t and f (t) for timedependent functions. Further notation will be somewhat local to individual sections and we introduce it when needed.

The space-homogeneous case
Here, we study the space-homogeneous solutions of Eq. (1.4). We work under Hypothesis 1.1 and Hypothesis 1.2. In view of (1.3), (1.5), and (1.6), in the spacehomogeneous case f = f t (v) the kinetic equation reads, where w f denotes the average of the distribution f , that is Later on, see Eq. (2.6), we will observe that the mean M 1 (t) := w ft solves a simple ODE. Existence and uniqueness of classical solutions of the Cauchy problem associated with (2.1) with initial datum f 0 (x) such that |f 0 (x)| ≤ c exp(av 2 ), a, c > 0, is shown in [2]. Existence and uniqueness for (2.1) in Sobolev spaces follows after a change of variable: if f t (v) is the solution to (2.1) then the function which is the Fokker-Planck equation for the Ornstein-Uhlenbeck process, whose well posedness is well established (even for measure-valued solutions, see, e.g., [51,Chap.1]). 1 For such a process it is well known that if the following assumption holds, [H] The initial datum f 0 is such that f 0 , |f 0 | 2 and |∂ v f 0 | 2 have finite moments of any order then the solution at time t > 0 enjoys the same property and in the reminder of this section we shall work under such an assumption on the initial profile (see [46,Theorem 2.29 and other results within that section] or [4]). 2 We do not linger on matters of well posedness and we move forward to characterize the stationary distributions (invariant measures) of (2.1).
After observing that G( w f ) does not depend on the velocity variable, the invariant measures are the normalised solutions to the equation Clearly, solving the above amounts to solving the first order ODE, where C is a generic constant. Integrable positive solutions to the above ODE exist only for C = 0 and have the form, The normalization condition R dw f (w) = 1 givesC = exp(G 2 ( w f ))/( √ 2πσ). From (2.2), one also obtains the condition G( w f ) = w f which implies, by our choice of G, w f = 0, ±1. In conclusion, Eq. (2.1) admits three stationary solutions, namely the following three Gaussian densities, 1 Furthermore, this equation can also be seen as a special case of the more general setting considered in Proposition 3.1. 2 This assumption is certainly not sharp but sufficient to our purposes. Here we are not interested in minimal assumptions but in a description of the dynamics. For sharper conditions see the cited references.
The asymptotic behaviour of the time-dependent distributions can be fully characterized, we show below two different approaches. (We will favour the use of the notation M n (t) when we wish to stress the timedependence and the use of the notation w n ft when we wish to stress the dependence on the function f t .) By an explicit computation we havė The first equation is the conservation of total mass (so that M 0 (t) = 1 for all t ≥ 0) and by the last equation the variance converges to σ. Eq. (2.6) is instead a statement about the average velocity: if the average velocity is positive (negative, respectively) at time t = 0, then it converges to +1 (−1, respectively). Notice that in order for this fact to be true one just needs G(M 1 )−M 1 to be positive in (0, 1) and (−∞, −1), negative in (−1, 0) and (1, ∞) (the antisymmetry of G or other detailed properties of such a function don't really matter to this end). If the function G had more than three fixed points, then, for every a ∈ R such that G(a) = a, one would have an additional invariant measure, namely the Gaussian N (a, σ) (and, by antisymmetry of G, the measure N (−a, σ) as well). Whether the mean velocity ever converges to a (or −a) would depend on the sign of G(x) − x for x > a and for x > a.
For moments of any order, after multiplying (2.1) by v n and integrating by parts, 3 one finds the recursive relation, .
We also recall the following relation between moments and cumulants [50], Convergence to the stationary solutions for the dynamics (2.1) is now a consequence of the following lemma, whose proof is postponed to the end of this subsection.
Lemma 2.2. Assume [H] holds. With the notation introduced so far, we then have , then f t converges to the stationary state µ + as t → ∞. Analogously, if the initial datum has negative (zero, respectively) mean, then the solution of (2.1) converges to µ − (µ 0 , respectively).
Proof of Proposition 2.3. By lemma 2.2 the cumulants of order n ≥ 3 of the density f t tend to zero as t → ∞. Therefore, the solution to the space-homogeneous Eq. (2.1) converges to a Gaussian density (it is a standard fact that the only distributions with vanishing cumulants of order n ≥ 3 is the Gaussian distribution). Equations (2.6) and (2.7) describe the evolution of the mean and the variance of the solution of (2.1). As already observed, by (2.7) the limiting variance is σ. Because of our assumptions on G, by (2.6) the limiting mean is ±1 or zero if M 1 (0) is positive, negative or zero, respectively. Remark 2.4 (On the non space homogeneous dynamics (1.4)). Another elementary consequence of Lemma 2.2 together with (2.9) and (2.10) is the following. Suppose the initial datum for the dynamics (2.1) is a Gaussian measure with given mean M 0 and variance σ. Then the law of the process at time t is still Gaussian, with variance σ and mean M 1 (s), where M 1 (s) is the solution to (2.9) with initial datum M 0 . Otherwise stated, for each M 0 ∈ R, the family of measures {ν t ∼ N (M 1 (t), σ)} t≥0 is an evolution system of measures, see [25,41].
Clearly, these families of measures constitute an evolution system even for the non-homogeneous dynamics (1.4). More generally, we can consider densities f t (x, v) of the form, and B(t, x) > 0 , (2.13) and look for the class of (regular enough) functions A = A(t, x) and B = B(t, x) such that the density f t is a solution to (1.4). By substituting in (1.4) one obtains a long expression, which can be rearranged to be a polynomial in (v − A). By comparing the coefficients with equal power, one obtains the following system of constraints, By (2.14) and (2.17), Substituting the above into (2.15), one gets ∂ Proof of Lemma 2.2. The proof is by induction on n. The inductive basis, i.e., the case n = 3, can be done by direct calculation. Assuming (2.12) holds for every cumulant up to order n − 1, we want to show that the statement holds for n. By (2.11), for any n > 3, where we used the inductive assumption for j ≤ n − 3 and (2.9), (2.10) for j = n − 2, n − 3. Then, in view of (2.8), having used the identities, j n−1 It remains to notice that, again by (2.11), the expression inside the first square brackets on the right-hand side is equal to C n , while those inside the last two square brackets vanish. The lemma is thus proved.
Remark 2.5. We are using the properties of G only to prove convergence of the first moments. The behaviour of the higher cumulants is independent of the choice of G.

2.2.
Liapunov function and its rate of decay. If f is a probability density with finite variance, we let where V (u) is the opposite of an antiderivative of the function G(u), i.e., V ′ (u) = −G(u). We claim that the functional S(·) is a Liapunov functional for the dynamics (2.1). That is, S(·) is bounded below and dS(ft) dt ≤ 0. Let us start by proving that the functional is bounded from below; to this end, observe that the following inequality holds: Since G is bounded, V (u)/u 2 → 0 as |u| → ∞, whence the sum of the last two terms in the right-hand side is bounded from below. As for the sum of the first two terms, this is bounded below as well. Indeed, let ρ(v) be the probability measure The first addend on the right-hand side is non-negative (by Pinsker's inequality).
The sum of the last two addends is positive as well, Let us now come to compute the time derivative of S: Remark 2.6. Analogously to the previous computations, one can formally show that the Gaussians µ 0 (v) and µ ± (v) are the unique critical points of the functional S under the constraint R dv f (v) = 1; so, when t → ∞, S(f t ) can only converge towards S(f ∞ ), with f ∞ being one of such Gaussians. We already know that f t decays towards µ + (µ − , µ 0 , respectively) when the initial datum f 0 is such that In what follows we will only focus on studying the case in which w f0 = M 1 (0) = 0. This is because if M 1 (0) = 0 then M 1 (t) = 0 for every t ≥ 0 (from (2.6)); because G(0) = 0, in this case the process reduces to the simple Orstein-Uhlenbeck process and the entropy functional simplifies to the classic form which is well studied in the literature, see [15,1] and [4]. Furthermore, observe that the equilibrium µ 0 is somewhat "unphysical": had we chosen G to be as in (1.2), µ 0 would not be an invariant measure at all.
Let us now come to study the rate of decay of the functional S. To this end, we use the by now well established Bakry-Carrillo-MacCann-Toscani-Villani strategy [15,1]. We first give an outline of how to adapt this approach to our context and then state and prove the main result of this section, Proposition 2.7 below. To explain how we modify the Bakry-Carrillo-MacCann-Toscani-Villani strategy [15,1], set The functional D S (f t ) is often referred to as the entropy production functional. By (2.20) we then have Suppose there exist constants c, γ, K > 0 (possibly depending on f 0 ) such that Then, integrating the above inequality on the interval (t, s), one gets, When s tends to infinity, D S (f s ) tends to zero (this can be deduced from (2.22), with calculations analogous to those leading to (2.23) below); so that, letting s → ∞ in the above, one gets hence exponential convergence follows, It is therefore clear that, in order to prove exponential decay of the relative entropy S(f t |f ∞ ), we only need to show the inequality (2.22). This is the purpose of the following proposition. As a premise, notice that under Hypothesis 1.2 we have G ′ (1) < 1 (and, because G is an odd function, G ′ is an even function, hence G ′ (−1) = G ′ (1)). Proof. Recalling (2.21), we have, having used the identity R dv 2(v − G(M 1 ))∂ v f t = −2 (obtained by integrations by parts). Taking the time derivative along the solutions f t , and using the equations (2.1), (2.6), and (2.7), one gets By the definition of D S (f t ), the last identity can be rewritten in the following form, We claim R ≤ 0. To prove the claim, we integrate by parts the first term and then express ∂ t f via the right-hand side of (2.1). After some straightforward computation we get, We next observe that an integration by parts gives, which implies, Using above identity to rewrite the expression of R we finally obtain, By (2.24) we then conclude We finally observe that, because G ′ (1) < 1, the solution M 1 (t) to Eq. (2.6) with initial condition M 1 (0) ≷ 0 converges monotonically to ±1 exponentially fast, with any rate 0 < γ ′ < 1 − G ′ (1) (recall G ′ (1) = G ′ (−1) as G is an odd function). In particular, given γ < 2 − 2G ′ (1) as in the statement of the proposition and choosing γ ′ = γ/2 we deduce that with η := max{|M 1 (0)|, 1} and a suitable K > 0 depending on γ and M 1 (0). 4 By (2.25) and the above estimate, the inequality (2.22) follows. The proposition is thus proved.

Well posedness of the non-homogeneous equation
The main result of this section is Theorem 3.6, which is an existence and uniqueness result for the nonlinear problem (1.4). The proof of Theorem 3.6 follows the strategy adopted in [26] (in Remark 3.2, point (iv), we highlight the differences between our work and [26]). The solution of Eq. (1.4) is constructed through a classical iterative procedure, as a limit of a sequence of functions, {f n } n∈N ; each one of the functions f n = f n t (x, v) solves a linear equation of kinetic Fokker-Plank type, see (3.1) and (3.11). This section is therefore organized as follows: after introducing the necessary notation, we start by stating several preliminary results on the linear equations satisfied by the functions f n (see Subsection 3.2). The proof of such results is postponed to Appendix A. In Subsection 3.3 we then present the proof of Theorem 3.6; that is, we present the iterative procedure which produces a (unique) solution in L 1 ( √ 1 + v 2 ) for the non-linear problem (1.4) (space L 1 ( √ 1 + v 2 ) defined few lines below). We assume Hypothesis 1.1 to hold throughout.
3.1. Preliminary notation. We will work with the following function spaces: and, for any m ≥ 0, We denote by , and X := X 0 . Finally, if Z is a row vector, then Z T denotes its transpose. If H(x, v) is a vector-valued or matrix-valued function of x and v, then H L ∞ is the sum of the L ∞ -norms (as defined above) of the components of H.

3.2.
Well-posedness of the linear problem. As anticipated, we will construct the solution of the problem (1.4) as limit of a sequence of functions {f n } n≥0 , each of them solving a linear problem. The linear problems solved by the functions f n are all of the form x, v) are given and satisfy certain conditions specified later and c ∈ R is a given constant. In this section we therefore gather several results on the well posedness and on the properties of equation (3.1). The weak formulation of the linear problem (3.1) is given by and L m is the functional Proposition 3.1. Consider the problem (3.1), where the coefficient b(t, x, v) and the forcing U (t, x, v) satisfy the following assumptions: . Then, for any initial condition g 0 ∈ L 2,m , the problem (3.1) admits a unique solution in X m . This solution is unique in the sense that ifg is another solution in X m , then g −g X m = 0. Moreover, the solution belongs to the space (iii) Assumption ii) of Proposition 3.1 is satisfied as soon as (As H 1 ⊂ L 2 ⊂ H −1 , and this is true for the weighted spaces at hand as well.) (iv) This proof of Proposition 3.1 follows the structure of proof adopted in [26,Prop. A.1]. To compare with [26], observe that Eq. (3.1) can be rewritten as Equations of this type are analized in [26], see [26,Eqs. (40) and (47) directly. Nonetheless, the same scheme of proof applies and, with slight modifications, it also allows to remove the assumption that a should have bounded v-divergence (which, in our case, translates into dropping the assumption ∂ v b ∞ < ∞). It is important to notice that while in [26] the author works in flat L 2 spaces, we work in the weighted spaces introduced in Subsection 3.1. This is done in order to deal with the fact that the coefficient a is unbounded and to obtain information about the integrability (in the velocity variable) of the solution of (3.5).
Proof. See Appendix A. R v ), ∂ v b ≤ 0 and the initial datum belongs to L 1 (T x × R v ) ∩ L 2,m , for some m ≥ 2; then the following Duhamel formula holds, Proof. See Appendix A.
The practical (and straightforward) criterion that we will use in order to make sure that all of the above results hold is the following lemma, which puts an emphasis on the assumptions on U that one needs to verify in practice (typically the assumptions on b will be straightforward to check).
From this, the statement follows.

3.3.
Construction of the solution to the nonlinear problem: the iterative procedure. After the preliminary results of Subsection 3.2, this section is devoted to the proof of the main existence and uniqueness result, Theorem 3.6 below. Before stating it, we clarify that the weak formulation of problem (1.4) is given by where we recall that Φ is the space of C ∞ functions with compact support in [0, T ) × T x × R v and the functional E N L is defined as with M (t, x) defined in (1.5). In what follows, we set Df (x, v) = (∂ x f, ∂ v f ) and Then, for any initial datum f 0 satisfying the following assumptions, • f 0 ∈ L 2,m for some m ≥ 8, ,m for some m ≥ 2, there exists a unique weak solution of (1.4). The solution belongs to the space We first explain the strategy of proof and state two necessary technical lemmata, Lemma 3.8 and Lemma 3.10. We then move on to the proof of Theorem 3.6.
We construct the (unique) solution to the nonlinear Eq. (1.4) with initial condition f 0 (x, v), as the limit of a sequence of functions {f n (x, v)} n≥0 which are recursively defined according to the following scheme: we set f 0 t (x, v) := f 0 (x, v) and, for any n ≥ 1, we let f n t (x, v) be the solution to the linear equation,  1 (t, x)). Because G is assumed to be bounded, Proposition 3.1 can be applied to Eq. (3.11), which is therefore well-posed in the space Y m for any initial datum in L 2,m . Moreover, the following fundamental lemmata hold. (3.13) Then the following holds.

then there exists a non-negative function α m (t) which is bounded on compacts and such that
Df 0 ∈ L 2,2 ∩L ∞ , then there exists a non-negative function β m (t) which is bounded on compacts and such that  Proof. See Appendix A.
Remark 3.9. If f 0 ∈ L 2,5 and √ 1 + v 4 Df 0 ∈ L 2,2 ∩L ∞ , using point (b) of Lemma 3.8, the interpolation inequality (3.9) and the continuity in x of the integral function on the left-hand side (which is true by hypoellipticity), one has where κ(t) denotes (here and in the following) a generic time-dependent function, bounded on compact sets. Similarly, if vf 0 √ 1 + v 4 (which grows at infinity like Finally, let h n be defined as in (3.21).
Proof. They are consequence of the following identities, d dt f n t 2 L 2 , which can be easily verified by (3.11) and integration by parts. In doing such integrations by parts, notice that, by Lemma 3.8, v 3 (f n t ) 2 and v 2 f n t ∂ v f n t vanish at infinity.
Proof of Theorem 3.6. In the sequel, we will assume t ∈ [0, T ] where T > 0 is any fixed time, and we will denote by C a generic positive constant, whose numerical value (possibly depending on f 0 and T ) may change from line to line. Analogously, κ(t) will denote a generic time-dependent function, bounded on compact sets; the specific expression of this function may change from line to line. With the notation introduced so far, the scheme of proof is classical: after setting Indeed, by iterating (3.22) we deduce that f n and √ 1 + v 2 f n converge to f and To prove that the limit of the sequence f n is actually a solution of the nonlinear PDE (and in particular to pass to the limit in the nonlinear term (3.10)) one uses such a convergence plus Lemma 3.8, which implies the existence of a subsequence which converges weak* in L ∞ ([0, T ] × T x × R v ). We omit the details, which are standard. This proves existence of a solution of the nonlinear problem. Uniqueness can be obtained with calculations similar to those that allow one to derive (3.22). We therefore only need to prove (3.22). This is done in three steps (with calculations that are standard but lengthy, so in places we only indicate how to complete them).
• Step 1. This step consists in showing the following inequality In order to show the above, we consider the function g = g t (x, v) defined as g := f n − f n−1 . Such a function satisfies the linear equation, By Proposition 3.1 (applied to Eq. Observe now that . This can be seen by using point (b) of Lemma 3.8, together with the interpolation inequality (3.9). Under such an assumption on the initial datum f 0 , the Duhamel formula of Lemma 3.4 holds. Using this fact and the Lipschitzianity of G, we obtain Looking at the integrand on the right-hand side, The inner integral can be estimated by using (3.17). Moreover, having used the fact that ϕ is positive and uniformly bounded below, ϕ ≥ ǫ > 0, and (3.18). Therefore (3.23) follows.
• Step 2. The second step consists in bounding the first addend on the righthand side of (3.23). In particular, we will show the following inequality, (3.26) To this end, letg =g t (x, v) be defined asg := h n − h n−1 . From (3.21) and (A. 10) we deduce that the functiong solves the equation, where, setting A 1 (v) := v/ √ 1 + v 2 and recalling the definition (A.11), Because A 1 , A ′ 1 := ∂ v A 1 and G are bounded, all the functions that multiply the differences (f n −f n−1 ) and ∂ v (f n −f n−1 ) on the right-hand side above are bounded. Analogous reasoning holds for the difference R n,1 −R (n−1),1 (see (A.11)). Moreover, as soon as f 0 ∈ L 2,3 , so that the well posedness of (3.27) is ensured. By the Lipschitzianity of G we can then write Now observe that the right-hand side of (3.27) is in L 1 as, by our assumptions on f 0 , it is in L ∞,2 . We can then apply the Duhamel formula (to (3.27)) and use (3.19) to obtain having used (3.25) and (3.23) in the last inequality. Hence (3.26) is proved.
• Step 3. We now need an estimate of the last term on the right-hand side of (3.26). To this end, acting like in the proof of Lemma 3.8, we observe that the differences p(x, v) = ∂ v (f n − f n−1 ) and q(x, v) = ∂ x (f n − f n−1 ) satisfy PDEs with a structure similar to the one of the equations encountered so far (see (A.12) and comments thereafter), so one can apply again the strategy that we have already used and that we only sketch in this case. By using (A.12) we find an equation for p(x, v). The Duhamel formula applied to such an equation gives (3.29) Acting similarly, one also gets (3.30) Using point (c) of Lemma 3.8 and the interpolation inequality (3.9) we have, Therefore, the first addends on the right-hand side of (3.29) and (3.30) can be treated in a standard way. Concerning the last addend on the right-hand side of (3.30), from (A.13) and point (b) of Lemma 3.8, Now, again by (A.13) and using the Lipschitzianity of G, Using (A.14) we have, where we used that ϕ is strictly positive in the second inequality, and (3.18) in the last inequality.

Particle system
In this section we work under Hypothesis 1.1. Consider the system of N interacting particles, each of them having position ad valecity (x i,N t , v i,N t ) described by the SDE (1.11)-(1.12). We recall that because G is assumed to be bounded and Lipschitz continuous, existence and uniqueness (for every N fixed) of the strong solution of the system (1.11)-(1.12) follows from standard SDE theory. Let The empirical measure is, for each fixed t > 0 (and for each fixed N ∈ N), a random measure; in particular, S N t : Ω → Pr 1 . 6 Therefore, (for each fixed N ∈ N and T > 0) the stochastic process S N = {S N t } t∈[0,T ] can be seen as a random variable with values in C. We will denote by Q N the law of the random variable S N , so {Q N } N is a sequence of probability measures on C. As customary, if α is a measure on T × R and ψ a function on the same space, we use in this section the notation α, ψ := T×R ψ(x, v) α(dx, dv) .
With this notation in place, the main result of this section is Theorem 4.1 below: roughly speaking, as N → ∞, the empirical distribution S N converges to the solution of equation (1.4).
Theorem 4.1. Suppose the initial data of the particle system (1.11)-(1.12) are such that where f 0 is the initial datum of (1.4) and the above is supposed to hold for every ψ ∈ C ∞ 0 (T × R). Then, with the notation introduced so far, the following holds: for each t > 0, the sequence of random measures {S N t } converges to a deterministic measure, S * t , which has a density with respect to the Lebesgue measure. Such a density is a function in the space L 1 ( √ 1 + v 2 dx dv) and coincides with the unique solution of equation (1.4) (given by Theorem 3.6).
Proof. The proof is in four steps.
Step 1. We start by finding the equation satisfied by S N t (in weak sense). For any Applying Itô formula, one finds Letting ϕ x (·) = ϕ(x − ·), we can write More in general, if α is a measure, we define ϕ α (x) := α, ϕ x andφ α (x) := α, ϕ x P . Therefore, having set Step 2. We want to show that the sequence of measures {Q N } N ∈N , or, equivalently, the sequence of random variables {S N } N ∈N , is tight. To this end, we use [39,Prop. 1.7]. Namely, let {ψ k } k be a dense subset of C b (T × R). Then the sequence S N is tight if and only if, for every k ∈ N the real-valued stochastic process (4.6) H N t := S N t , ψ k is tight (it would be more correct to include the index k in the notation for H N t , but we drop such an index to avoid cumbersome notations). In other words, [39,Prop. 1.7] reduces the problem of studying tightness of a family of probability measures on C([0, T ]; Pr 1 ) to the simpler problem of studying tightness of a family of probability measures on C([0, T ]; R). We can now apply Kolmogorov's criterion to the process H N t ; we therefore need to prove i) sup  This is the content of Lemma 4.3. Notice that condition (4.8) is automatically satisfied, for example for a = 0, thanks to the boundedness of the functions ψ k . So the proof of Lemma 4.3 is concerned with checking (4.7).
Step 3. From step 2, every (sub-)sequence of S N admits a weakly convergent (sub-) sequence. We want to show that if S * = {S * t } t≥0 (with law Q * ) is the limit of any such subsequences, then S * is a weak measure-valued solution of Eq. (1.4) (in particular this implies that S * is deterministic). We clarify that a path {π t } ⊂ C is a weak measure-valued solution of (1.4) with initial datum π 0 ∈ Pr 1 if the identity is satisfied for every test function ψ ∈ C ∞ 0 (T × R). To this end, for each ψ ∈ C ∞ 0 (T × R), consider the functional J π0 ψ : C −→ R + , defined as follows (4.10) We stress that here π 0 is the initial datum for (4.9). The rationale underlying the choice of this functional can be understood by comparing with (4.5): if the path ψ is a positive functional, to prove the above it suffices to show that E Q * J π0 ψ = 0 for every ψ. The functional J π0 ψ is bounded and continuous, see Lemma 4.2. Therefore, if C > 0 is a generic positive constant, for every ψ ∈ C ∞ 0 (T × R), we have The right-hand side of the above is easy to estimate; indeed, by definition of S N , The first addend in the above goes to zero by assumption, if we take π 0 to be a measure with density f 0 . Regarding the second: Step 4. As a consequence of Step 2 and Step 3, we know that Eq. Lemma 4.2. For each function ψ ∈ C ∞ 0 (T × R), the functional J π0 ψ (defined in (4.10)) is continuous.
Proof. Let π N = {π N t } t be sequence in C. Suppose π N converges in C to π. Then, for every t > 0, π N t converges weakly to π t . Hence, by definition π N t , ψ → π t , ψ . The same argument can be applied to all the other terms on the first line of (4.10). To pass to the limit in the nonlinear term (the last term in (4.10)), we first act with manipulations analogous to those in (3.25) and then conclude with the same argument as above. We omit the details.  Moreover, if the initial datum π 0 has finite first moment, i.e. π 0 , |v| < ∞, then the solution π t has finite first moment as well.
Proof. The existence claim is a result of Step 2 and Step 3, so we concentrate on proving uniqueness. This is done by using the same strategy adopted in [32,Section 6], [31,Sect. 3] and [19,Thm. 4.2]. Here we follow [32,Sect. 6] and [31, Section 3] so we outline the strategy but we don't repeat all the details. Let A be the second order differential operator defined on smooth functions (of x ∈ T and v ∈ R) as (4.11) A := v∂ x − v∂ v + σ∂ vv and let A * denote its formal adjoint (in the flat L 2 space). Then we can formally rewrite (4.9) as . That is, (4.12) π t , ψ = π 0 , e tA ψ − t 0 ds G φ πs ϕ πs π s , ∂ v e (t−s)A ψ .
Let us now introduce the following metric on Pr 1 : where the supremum is taken over all measurable bounded functions ψ : T × R → R with ψ ∞ ≤ 1. Notice that the metric d can be equivalently defined as Showing the equivalence of the two definitions is a simple application of the dominated convergence theorem (see [31,Remark 3.2]). Let π = {π t } t≥0 and ν = {ν t } t≥0 be two solutions of (4.9), with the same initial datum. The aim of the rest of the proof is to close an integral inequality on the quantity sup t∈[0,T ] d(π t , ν t ), thereby showing that π t = ν t for every t ∈ [0, T ]. However (similarly to what happened in the calculations of Section 3.3) closing the inequality on d(π t , ν t ) is not possible. For this reason we will instead work with the quantitỹ We will repeatedly use the inequality We will show in some detail how to study the term π t −ν t , ψ , the term π t −ν t , |v| ψ can then be treated analogously (and we will only point out the slight difference, without redoing the calculation). Throughout the proof C > 0 will be a generic constant. From (4.12) we have, where In view of (4.13), the term I 2 is readily bounded, having used, in the last inequality, the following heat kernel type of bound (4.14) ∂ The above estimate is classical and holds for every continuous and bounded function ψ, see for example [40] or, for a more explicit proof adapted to this case, see [32,Lemma 12]. As for the term I 1 , after using the lipschitzianity of G and manipulations similar to those in (3.25), one finds and If we multiply and divide the integral of the inner integrand in the definition of A 1 , we then obtain In order to estimate the term I 1,2 , all we need to do is to find a bound onφ νs (x). This can be done by using the exact same approach that we have used so far. In particular, it suffices to show that π t , |v| ≤ κ(t) where κ(t) is a generic function bounded on compacts. Again we write, We now use the estimate (again, a constructive proof of the above can be found in [32,Lemma 12]) and (4.14) to conclude (4.16) By taking ψ ≡ 1, one gets T×R |v| π s (dx, dv) ≤ Ce Ct . Hence The proof is concluded after use of the (generalised) Gronwall Lemma.

Invariant measures: some comments
The non-homogeneous stationary problem reads, .
Assuming that the boundary terms do not contribute, integration with respect to the velocity gives ∂ x R dvf (x, v) v = 0. Hence the local average velocity R dvf (x, v) v does not depend on x; we denote it by α. Moreover, as TL dy ϕ(y) = 1, where * denotes convolution on T. Suppose now that the interaction ϕ is close to the constant function, i.e., ϕ = 1 + λψ with 0 < λ ≪ 1, ψ L ∞ = 1, and T dx ψ(x) = 0. Eq. (5.1) then reads, As G(0) = 0, if α = 0 the equation reduces to the linear Langevin equation with vanishing force, hence the Gaussian distribution N (0, σ) gives the (unique) solution.
Let us consider the case α = 0. By symmetry, without loss of generality we can restrict to the case α > 0.
Suppose the following holds: (i) The density f is smooth and admits a convergent expansion in λ, (ii) The coefficients f n (x, v) are differentiable in x, v and the x, v-derivatives of f can be computed by term by term differentiation, obtaining convergent series. Moreover, f n ∈ L 2 (T x ; H 1,2 v ). Then f coincides with the Gaussian density N (1, σ).
Proof. We have to prove that This is achieved by induction on n. We start with the inductive basis, which is the case n = 0. By (5.3) we have, Plugging (5.3) and the above in (5.2), after equating the 0-th order in λ on both sides we get, We now turn to the inductive step: given n > 1, prove that if f 0 = N (1, σ) and f k = 0 for any 1 ≤ k ≤ n − 1, then f n = 0. Under the inductive hypothesis and recalling that ψ has zero average, Moreover, because f 0 has mass one, Plugging the expansions (5.3) in (5.2) and equating the n-th order terms in λ we obtain, We multiply both sides of the above equation by v, and then we integrate on T x ×R v . Using that T dx R dv f n (x, v) = 0, after an integration by parts we obtain the identity −α n G ′ (1) = −α n , which implies α n = 0 because G ′ (1) = 1. Therefore, Now observe that f n = 0 is the unique solution, in L 2 (T x ; H 1,2 v ) and with zero average, of the above equation. 8 This concludes the proof.
Appendix A. Analysis of the non-homogeneous linear equation We here study Eq. (3.1). In what follows, we denote by A m (v) the derivative of √ 1 + v 2m , i.e., For any function of v, say h(v), we write h ∼ v p to indicate that asymptotically h grows like v p . I.e., we We also (improperly) use the notation Remark A.1. We point out the simple fact -which we will use more or less explicitly in the following, sometimes without mention -that if g solves the weighted variational problem (3.2) for some m > 0, then it also solves the flat problem E 0 (g, φ) = L 0 (φ) (to see this just considerφ = φ √ 1 + v 2m ).
Proof of Proposition 3.1. We follow the approach in [26, Prop. A.1], so we don't repeat the whole proof but just highlight the differences. To make comparison easier, we try to use a notation similar to the one in [26]. If g solves (3.5), after the change of unknown u(t, x, v) = e −λt g(t, x, v), the function u solves the equation In the following we will simply denote by U the function U e −λt . The weak formulation of the problem (A.2) is given by and E m , L m and the space Φ have been defined at the beginning of Subsection 3.2. After observing that for any m ≥ 0 and φ ∈ Φ one has 8 See e.g. [60].
it is easy to show that the bilinear form E m λ is also coercive (i.e., there exists a constant C > 0 such that E λ (φ, φ) ≥ C φ 2 X m , when we choose λ > 1 large enough, depending on b ∞ and c) and continuous on X m , for every fixed φ ∈ Φ. 9 Therefore Lions' Theorem [26, page 534] can be applied and gives existence of a solution in X m . As for uniqueness, we show this part in some detail; we follow [26, pages 535-536], but focus on uniqueness in the space X m : we need to prove that if u 0 = 0 and U = 0 then u ≡ 0. First of all, one needs to show that the spaceΦ of smooth functions with compact support in [0, T ]×T x ×R v is dense in Y m (with the topology induced by the X m -norm). 10 This can be done following the proof of [26, Lemma A.1], so we don't repeat it. With this technical detail in place, it is straightforward to see that for any two functions u,ũ ∈ Y m (just functions of Y m , not necessarily solutions of our problem) one has, So, if u is a solution of our problem with u 0 = 0 and U = 0 then Using (A.3) one can see that the second term in (A.5) is positive: Notice that the integration by part used to obtain the above equality is allowed and gives zero boundary terms because, being the initial datum u 0 = 0 in L 2,m for every m > 0, the function u is in X m for every m > 0. Therefore u and vu √ 1 + v 2m are in H 1 , so that the boundary terms in the integration by parts disappear. As for 9 This can be done with calculations very similar to those that we will show in detail below to prove uniqueness, so we omit them. 10 The the first term in (A.5), using the so-called Young's inequality with ǫ one has, 11 (A.7) The last term in (A.6) can be estimated analogously. Choosing ǫ small enough (and possibly λ large enough) and putting everything together one obtains 0 ≥ d u X m , where the constant d > 0 depends on λ, ǫ, b ∞ , c and σ. This implies u X m = 0. Finally, if u 0 ∈ L 2,m (which implies that u ∈ X m ) then, by definition, b∂ v u, ∂ vv u, u ∈ (X m ) * ; as for v∂ v u, this is in (X 1 ) * ⊂ (X m ) * . Overall, one has The proposition is thus proved. Proof of Lemma 3.4. Again, we just need to prove the result when c = 0 (see point (i) of Remark 3.2). If c = 0 the statement can be proved by acting similarly to what has been done in [26,Prop. A.4], and we will therefore be, once again, brief. We stress that, like in [26], for this proof we work in a flat space, with flat scalar products (see Remark A.1). In [26] the sought result is proved under the assumption that the coefficient a(t, x, v) of Eq. (A.2) has zero v-divergence. Moreover, the xdomain is the whole of R. In particular, the proof of the Duhamel formula relies on [26, Eqs. (60)-(62)]. [26,Eqs. (60) and (62)] remain unaltered (even if we change the x-domain). As for [26,Eq. (61)], upon inspecting the strategy of proof used in [26], the result follows if one proves that the left-hand side of [26,Eq. (61)] is non-negative (it doesn't need to vanish). In our case, Eq. (61) reads, 12 (A.8) 11 By acting in this way, one avoids to do integration by parts in this term, so the boundedness of ∂vb is not needed. 12 We recall that in [26] the notation f, g X ′ ,X simply stands for the pairing f, g X ′ ,X = T 0 dt T dx R dv f (t, x, v)g(t, x, v).
where ψ ǫ is defined like in [26, page 540]. That is, ψ ǫ (u) is a smooth function of u with ψ ǫ (u) = 0 if u ≤ 0, ψ ǫ (u) = 1 if u ≥ ǫ and ψ ǫ is increasing in [0, ǫ]. Letting ϕ ǫ be a primitive of ψ ǫ , i.e., (A.9) ϕ ′ ǫ := ψ ǫ , one has that ϕ ǫ (u) is non-negative, i.e., ϕ ǫ (u) ≥ 0 13 and it converges weakly to u + . Using (A.9), the first term on the right-hand side of (A.8) becomes, The boundary terms in the above integration by parts vanish thanks to the properties of the solution u, which is in H 1 v , so that overall the right-hand side is nonnegative (as ∂ v b ≤ 0). The second term in (A.8) can be treated similarly: integrating by parts 14 we have, Again, the right-hand side is positive. This is enough to conclude the proof.
Proof of Lemma 3.8. We prove the estimates (3.14), (3.15), and (3.16). Throughout this proof C > 0 will be a generic constant, so the value of C may change from line to line. Proof of (3.14). After some calculations, one finds that the function Y n,m 13 This is true by the definition of ψǫ. 14 In this integration by parts the boundary terms disappear as uv ∈ H 1 when the initial datum is at least in L 2,2 and ψǫ(u) is bounded.
Because A m (v) ∼ v m−1 and A ′ m (v) ∼ v m−2 (and they are both bounded on compact sets), all the functions of v that multiply |Y n,m t | in the addends on the right-hand side of the above, are indeed bounded functions. Therefore,