A mean-field model with discontinuous coefficients for neurons with spatial interaction

Starting from a microscopic model for a system of neurons evolving in time which individually follow a stochastic integrate-and-fire type model, we study a mean-field limit of the system. Our model is described by a system of SDEs with discontinuous coefficients for the action potential of each neuron and takes into account the (random) spatial configuration of neurons allowing the interaction to depend on it. In the limit as the number of particles tends to infinity, we obtain a nonlinear Fokker-Planck type PDE in two variables, with derivatives only with respect to one variable and discontinuous coefficients. We also study strong well-posedness of the system of SDEs and prove the existence and uniqueness of a weak measure-valued solution to the PDE, obtained as the limit of the laws of the empirical measures for the system of particles.


Introduction
We propose a model for the action potential of N neurons, with positions fixed in time, that follow integrate-and-fire type dynamics subject to noise and interact with each other through their spikes. The interaction we consider depends also on the positions of the neurons and is of mean-field type. Therefore, in the limit as N tends to infinity each neuron interacts with infinitely many other neurons. The presence of noise in the neuronal dynamics is experimentally confirmed and has been considered by various authors (see the monographs [GK02], [Tuc88]). Integrate-and-fire (IF) models describe a simplified dynamics in which such an effect can be studied in detail. Considering large networks of interacting neurons, each one having a membrane potential that evolves following an IF dynamic, leads to modeling the mean-firing rate of the network as the solution to a nonlinear partial differential equation that, at least for mean-field type interactions, is of Fokker-Planck type. Fokker-Planck PDEs for neural networks have been studied recently in [OBH09], [FTB05], [CCP11], [CP14], [DIRT15b], based on an IF model for the potential of each neuron given in [LR03]. As pointed out in [DIRT15b], not much attention is paid in the literature to how the Fokker-Planck PDE is obtained; in particular one expects that the empirical measures of a network with N neurons converge as N → ∞ to the solution of the PDE. This has been rigorously shown only in [DIRT15a], proving convergence to a McKean-Vlasov stochastic differential equation, and in [DMGLP15], where the hydrodynamic limit is considered. For certain ranges of parameters, the Fokker-Planck PDE obtained in the cited works exhibits blow-up in finite time because of the interaction term, so that there is no global well-posedness.
The model we propose here is simpler but it incorporates two additional aspects: a refractory period after the spike and a localized version of the interaction term, that is, an explicit dependence of the interaction on the positions of the neurons. The refractory period accounts for the fact that after emitting a spike, each neuron is inhibited from interaction. The dependence on a space variable allows one to prescribe precisely the interaction between different parts of the network; it can also describe the subdivision of the network in sub-populations, whose interaction with each other is of particular interest in neuroscience (see e.g. [KMS96], [SDG17]). This leads to a description of finite speed signal propagation along the network. More precisely, our mean-field interaction term has two main features: first, it depends both on the positions and on the voltage of the neurons, unlike many models available in the literature; second, it contains indicator functions of suitable intervals in R, thus requiring us to study a system of SDEs and a Fokker-Planck type PDE with irregular coefficients and dependence on the positions of the neurons that we treat as stochastic parameters. We allow for great generality in the choice of the law of the positions of the neurons, requiring only a finite first moment. Hence one can prescribe the geometry of the neural network by choosing the law accordingly. We study the limit behaviour of the empirical measures of the network and prove that the limit measure-valued function is the unique weak solution to a nonlinear PDE of Fokker-Planck type and that it exists up to any fixed time T, thus not exhibiting blow-up. From the technical point of view, to study the limit of the empirical measures we will also use some ideas of [Oel89]. Our model includes discontinuous coefficients, and is therefore a first step in the study of stochastic interacting particle systems with irregular coefficients.
Some of the results we obtain can be immediately generalized to the case of SDEs with measurable and bounded coefficients, but we are able to study the limit PDE only when the coefficients are discontinuous on a set of zero Lebesgue measure (see, in particular, Lemma 5.5). Therefore, in the main part of the paper we stick to the particular coefficients coming from the model, and mention some possible generalizations in the Appendix.
The potential V of each neuron is modeled, as a function of time, by a stochastic differential equation whose solution is projected on a torus given by the interval [0, 2] with the identification 0 ≡ 2. This choice allows us to model the cycle of spikes of each neuron as we describe below. It is important to notice that, similarly to what is done in most IF models, we do not give a precise description of the spike phenomenon, but we model only the charging phase from the resting potential $v_R = 0$ to the firing threshold $v_F = 1$ and the refractory period after the spike; moreover we assume that there are no external input currents.
Consider a single cycle, that is 0 ≤ V ≤ 2. While 0 ≤ V < 1 the neuron charges, subject to spikes by nearby neurons (i.e., to interaction), to randomness and to the effect of discharge with constant rate (that corresponds to the fact that if no spikes happen in the connected neurons then some charge is lost as time passes); when V reaches the threshold value 1 the neuron fires and emits a spike into the network. In a real neuron this would have two effects: the potential would rapidly decrease below 0 and then be restored to 0, and the neuron would be at rest, inhibited from interacting and spiking for a small amount of time (the refractory period). We model this effect by "switching off" the interaction term when V > 1 and letting V evolve as dV = dt until it reaches the value 2, where it is restored to 0 (through the equivalence relation that defines the torus) and the charging cycle begins again. Therefore the values of V between 1 and 2 do not correspond to a real life situation but are only a tool we resort to in order to have a convenient mathematical description of the phenomenon.
To consider the interaction between N neurons we deal with three factors (see also equation (2) below). Indeed if we consider the voltage V i,N and position X i of the i-th neuron, following the description above, a factor θ(X i , X j ) accounts for the neuron being connected to some of the other neurons with positions (X j ) j ; a factor 1 [0,1] (V i ) is due to the fact that the neuron feels the interaction only if it is in the charging phase; finally a factor 1 [1,1+δ] (V j ) is due to the fact that the interaction considers contributions to the charging process only from neurons that have just had their spike (δ ∈ (0, 1)). The choice of the values 0, 1 and 2 is completely arbitrary, and is just used for our mathematical description; we also do not specify explicitly the form of some of the functions involved, since we only need to make assumptions on their regularity.
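The cycle and the three interaction factors described above can be illustrated with a minimal Euler-Maruyama sketch. It is not the scheme used for the figures of the paper. The constant leak `-leak` in the charging phase, the drift `+1` in the refractory phase, the constant noise intensity `sigma`, the Gaussian-shaped kernel `theta`, the uniform positions in the unit cube and the uniform initial potentials are all assumptions made only for illustration.

```python
import math
import random

def theta(x, y):
    """Hypothetical bounded, uniformly continuous connectivity kernel."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))

def simulate_network(n_neurons=20, n_steps=500, dt=1e-3, delta=0.2,
                     leak=0.5, sigma=0.3, seed=0):
    """Euler-Maruyama sketch of N interacting neurons (assumed coefficients)."""
    rng = random.Random(seed)
    # Positions are fixed in time; here they are uniform in the unit cube (assumption).
    xs = [tuple(rng.random() for _ in range(3)) for _ in range(n_neurons)]
    # Initial potentials with a bounded density on [0, 2) (assumption: uniform).
    vs = [2.0 * rng.random() for _ in range(n_neurons)]
    kernel = [[theta(xi, xj) for xj in xs] for xi in xs]
    for _ in range(n_steps):
        ws = [v % 2.0 for v in vs]  # representation on the torus
        # Indicator of [1, 1 + delta]: neurons that have just spiked.
        fired = [1.0 if 1.0 <= w <= 1.0 + delta else 0.0 for w in ws]
        new_vs = []
        for i in range(n_neurons):
            # Leak while charging, dV = dt during the refractory phase.
            drift = -leak if ws[i] < 1.0 else 1.0
            if ws[i] <= 1.0:  # indicator of [0, 1]: interaction felt only while charging
                drift += sum(kernel[i][j] * fired[j]
                             for j in range(n_neurons)) / n_neurons
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            new_vs.append(vs[i] + drift * dt + noise)
        vs = new_vs
    return xs, vs
```

The potentials are kept as real numbers and reduced modulo 2 only when the coefficients are evaluated, so the simulated trajectories stay continuous on the real line.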
A possibly more accurate model of the inhibition phase could require that the noise term also be switched off during the refractory period, that is, in our setting, as V becomes larger than 1. We are forced to include a small noise also in the inhibition phase for mathematical reasons (i.e., we need σ below to be strictly positive; see in particular Theorems 2.2 and 3.3 and Lemma 5.1). The effect of oscillations due to the noise at the transition between the active phase and the inhibition phase appears to be negligible on macroscopic scales, thanks to the drift (see for example figure 1b). On the other hand, the analysis of a model in which noise contributes only to the charging phase is mathematically extremely interesting, and we will address it in future work. We now introduce precisely the equations describing the model and give an account of our results and of the following sections. We also include some figures obtained by simulating our model for a finite number of interacting neurons, showing that, even if simple, the model we propose gives a realistic description of single neurons and networks.

The model
For a Borel set A in a Euclidean space, we will denote by $\mathcal{L}_A$ the Lebesgue measure restricted to A. Let (Ω, F, P) be a probability space, let D be an open connected domain in $\mathbb{R}^3$ and [0, T ] ⊂ R a time interval. The microscopic model is as follows: for each N ∈ N consider N neurons, each identified by random variables with finite first moment and such that $P(\xi^i \in D) = 1$ for all $i$. We denote by ν the law of each $\xi^i$. Since the neurons do not move, their position is modeled by the system of trivial equations ii. its action potential given by We assume that all random variables $\eta^i$ are i.i.d. with law $\rho_0 \mathcal{L}_{[0,2)}$ and $\rho_0 \in L^2(0, 2)$. Moreover, we assume that $\{\xi^1, \dots, \xi^N, \eta^1, \dots, \eta^N\}$ are independent for any N ∈ N.
The functions appearing above are given by θ(x, y) is a bounded uniformly continuous function on D × D ; For each i ∈ N the processes B i t are independent real-valued Brownian motions, independent of ξ i and η i , and δ is a fixed real number in (0, 1).
One could use as λ any bounded function on [0, 2) that has a jump discontinuity at v = 1 and is continuous elsewhere; all the results herein apply in this case with no modifications in the arguments, therefore we stick to the simple case given just above. We will show that for each N the system of equations (2) has a unique solution $(V^{i,N})_{i=1,\dots,N}$ with $V^{i,N}$ having continuous trajectories in R. This forces the trajectories of $V^{i,N}$ (mod 2) to have jump discontinuities at every t such that $V^{i,N}_t \in 2\mathbb{Z}$. However continuity is easily restored by seeing $V^{i,N}$ (mod 2) as a process with values on the torus $T := \mathbb{R}/2\mathbb{Z}$ (where $2\mathbb{Z}$ is seen as a subgroup of translations on R). This corresponds to considering the interval [0, 2] with the identification 0 ≡ 2. Moreover T is homeomorphic to $\frac{1}{\pi} S^1 \subset \mathbb{R}^2$, the circle with radius $1/\pi$.
On T we consider the metric $d_T(v_1, v_2) = \min\{|v_1 - v_2|,\ 2 - |v_1 - v_2|\}$, where on the right-hand side $v_1$ and $v_2$ are seen as elements in [0, 2) ⊂ R; this corresponds to the shortest-path (or geodesic) metric, which is the arc-length on $\frac{1}{\pi} S^1$. This metric induces the quotient topology on T. We will always consider the Euclidean metric on D and endow D × T with the product metric, denoted by $d_{D \times T}$.
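As a small illustration of the geodesic metric on the torus of length 2, the following sketch computes the shortest arc between two potentials given as real numbers. The function names are hypothetical, and the choice of combining the Euclidean distance on D and the torus distance by a sum is an assumption: the text only says "the product metric", and other equivalent choices (for instance the maximum) would serve as well.

```python
import math

def torus_distance(v1, v2, period=2.0):
    """Shortest arc between v1 and v2 on the circle of circumference `period`."""
    a, b = v1 % period, v2 % period  # representatives in [0, period)
    gap = abs(a - b)
    return min(gap, period - gap)

def product_distance(p, q):
    """Distance on D x T: Euclidean part plus torus part (assumed sum form)."""
    (x1, v1), (x2, v2) = p, q
    return math.dist(x1, x2) + torus_distance(v1, v2)
```

For example, the potentials 0.1 and 1.9 are at distance 0.2 on the torus, although they are 1.8 apart as real numbers.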
The choice to represent solutions on the torus is natural since the coefficients we introduced above are clearly 2-periodic. To stress periodicity and also to lighten notation for v ∈ R and x, y ∈ D we define the functions so that equation (2) takes the more readable form Since T is homeomorphic to 1 /πS 1 ⊂ R 2 , one can define the Lebesgue measure on T as the push-forward of the Lebesgue measure on [0, 2) through the map t → (cos(πt), sin(πt)); since T is endowed with the quotient topology, any measure on the Borel sets of T can be obtained in this way. Therefore we can interpret a Borel measure on D×T as a Borel measure on D×[0, 2), and we will do so henceforth. Notice that any Borel measure on T defines a Borel measure on the whole R by 2-periodic replication; we will not distinguish between the two in the sequel. We will show that the solution to (4) has a density which is 2-periodic, thanks to the form of the coefficients; hence this density can be identified with a Borel measure on D × T . Let S N t denote the empirical measure and set for every x, y ∈ D and v ∈ R σ 2 (x, v) := σ 2 (v) , θ(x, y, v) := θ(x, y) and λ 2 (x, v) := λ 2 (v) for later use (see for instance (8)). To any function φ on D × T which is continuous corresponds a unique continuous function on D × R that is 2-periodic with respect to its second variable, given by (x, v) → φ(x, v (mod 2)); in the sequel we will identify these two functions and denote by φ also its 2-periodic representation on D × R. With this convention, for any smooth and compactly supported function φ on D × T we have where we use the notation → 0 as N → ∞ (due to the stochastic integrals being uncorrelated). 
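The identification of a function on D × T with its 2-periodic representation can be made concrete when pairing a test function with the empirical measure. The sketch below assumes that the empirical measure is the uniform average of Dirac masses at the points $(X^i, V^{i,N}_t \bmod 2)$, which is how it is described later in the text; the function name `empirical_pairing` and the sample test function are hypothetical.

```python
import math

def empirical_pairing(phi, positions, potentials):
    """Average of phi over the points (X^i, V^i mod 2), i = 1, ..., N."""
    n = len(potentials)
    return sum(phi(x, v % 2.0) for x, v in zip(positions, potentials)) / n

def sample_test_function(x, v):
    """A smooth function of (x, v) that is 2-periodic in v (illustrative only)."""
    return math.exp(-sum(c * c for c in x)) * math.cos(math.pi * v)
```

Because the potentials are reduced modulo 2 before evaluation, potentials that differ by a multiple of 2 contribute the same value to the average.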
If we suppose that the sequence of random measures S N t ( dx, dv) converges in probability (in a suitable space) to a probability measure ρ t (x, v) dx dv on D × T , then, heuristically, a passage to the limit in N suggests that ρ t solves weakly the partial differential equation of Fokker-Planck type In the sequel we will prove rigorously a similar assertion involving measures µ t instead of densities ρ t .

Main results
The main aim of the paper is to show that the empirical measure actually converges in a weak sense to a limiting probability measure µ t ( dx, dv) whose marginal with respect to v has an $L^2$-density and which is the unique solution to the above PDE (8). This is the content of Theorem 5.7. The paper is organized as follows. In Section 2 we will prove strong well-posedness for the system of SDEs (2); a modification of the standard theory for finite-dimensional SDEs with bounded and measurable drift is needed here to deal with the dependence of the equations on the random variables X i 0 . In Section 3 we define a weak measure-valued solution and show that the PDE (8) has at most one such solution.
To show existence of a solution to the PDE we first prove that the laws Q N of the empirical measures of (X i , V i,N (mod 2)) (see (5)) are tight as probability measures on the space of continuous measure-valued functions of time (Section 4). Then we prove that any limit point Q of Q N gives full measure to the set of functions with values that are continuous measures with marginal with respect to v having an $L^2$-density, that Q is supported by the set of weak measure-valued solutions to the Fokker-Planck PDE and that actually the whole sequence Q N converges to the same limit. This provides existence of a solution and is discussed in Section 5. Section 6 briefly shows that well-posedness of the Fokker-Planck PDE implies existence of a unique strong solution to the McKean-Vlasov SDE associated with the particle system. We conclude with an appendix giving some immediate generalizations of our results together with some indications on how the proofs have to be adapted to this more general setting.

The system of particles
Consider independent Brownian motions B i , i ∈ N and assume that the random variables η i introduced in the previous section are i.i.d. and independent from all augmented with the P-null sets. We also denote by G the σ-algebra σ ξ i , i ∈ N and assume that for any t ≥ 0, G and F 0 t are independent. Finally we introduce the filtration Let us write our system of equations in vector form: we fix N ∈ N and introduce, for the Setting Ξ = ξ 1 , . . . , ξ N , Ψ = η 1 , . . . , η N , we want to show existence and uniqueness of strong solutions to The classical reference for existence of a strong solution for SDEs with bounded measurable drift is [Ver81]. However, the results proved therein do not apply directly to equation (9), because they do not guarantee the measurable dependence of the solution V on the stochastic parameter Ξ. Therefore we introduce the following definition.
Let us denote by S the Banach space of all continuous paths from [0, T ] into R N endowed with the supremum norm |·| S . We also denote by ν N the law of the random variable Ξ on the Borel σ-algebra of D N .
The above definition is motivated by the fact that if (V x t ) is a strong solution, then the F t -adapted process (V Ξ t ) satisfies equation (9) P-almost surely for any t ∈ [0, T ]. In fact, for a random variable Ξ as above, the process (V Ξ t ) is well-defined with values in R N , has continuous paths and is F t -adapted.
To prove that V Ξ t satisfies (9) it is enough to compute, using conditional expectation with respect to G and independence,
Theorem 2.2. For every N ∈ N there exists a strong solution to (4). Two strong solutions on the same probability space associated to the same initial condition Ψ are indistinguishable for admits a unique strong solution V x by the results proved in [Ver81]. We now clarify the measurability of V x with respect to x. One difficulty is that the proof of the main well-posedness results in [Ver81] is based on the Yamada-Watanabe theorem and is indeed abstract and nonconstructive. This is why to prove such a measurability property we follow the approach in [GK96]. Fix a sequence of partitions {π n } n∈N of [0, T ], where each π n is given by points $0 = t^n_0 < t^n_1 < \dots < t^n_n = T$, set $\kappa_n(t) = \sum_{i=0}^{n-1} t^n_i \mathbf{1}_{[t^n_i, t^n_{i+1})}(t)$ and consider Euler's approximations to equation (10) given by (11) In the next steps, we will use that, for any n ≥ 1, the processes V x,n enjoy all the measurability properties we need. By Theorem 2.8 in [GK96] we know that, for any x ∈ D N and δ > 0 Note that, for any p ≥ 2 (using the boundedness of the coefficients of the SDEs) there exists C p > 0 (independent of n and x ∈ D) such that, for any n ≥ 1, and using also the Hölder inequality, we easily deduce that, for any x ∈ D N , Hence, for any x ∈ D N , V x,n converges to V x in the Banach space L 2 (Ω; S) and so the mapping i.e., (V ·,n ) converges to V · in Z = L 2 (D N , ν N ); L 2 (Ω; S) . It follows that (V x,n ) is a Cauchy sequence in Z. Using that, for any n, m ≥ 1, It follows that, for a.e. x ∈ D N , we haveṼ x = V x in S, P-a.s. (we have obtained a version of the strong solution which has the required measurability properties with respect to x).
Uniqueness. It follows directly from the celebrated Veretennikov result.
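The time-freezing map $\kappa_n$ and the resulting Euler approximations used in the proof can be sketched as follows. This is only an illustration under assumptions: the display of the Euler scheme (11) is not reproduced here, so the code implements the standard scheme in which the coefficients are frozen at $\kappa_n(t)$; the names `kappa` and `euler_path` are hypothetical, and the Brownian increments are supplied by the caller.

```python
from bisect import bisect_right

def kappa(t, partition):
    """Largest partition point t_i with t_i <= t, for t in [0, T).

    At the single point t = T this sketch returns t_{n-1}; that convention
    differs from the indicator formula only on a set of times of measure zero.
    """
    if not partition[0] <= t <= partition[-1]:
        raise ValueError("t must lie in [0, T]")
    i = bisect_right(partition, t) - 1
    return partition[min(i, len(partition) - 2)]

def euler_path(v0, drift, sigma, partition, increments):
    """Values of the Euler approximation at the partition points.

    increments[k] stands for the Brownian increment over [t_k, t_{k+1}].
    """
    path = [v0]
    for k in range(len(partition) - 1):
        h = partition[k + 1] - partition[k]
        v = path[-1]  # coefficients are frozen at the left endpoint kappa_n(t)
        path.append(v + drift(v) * h + sigma(v) * increments[k])
    return path
```

Each step of the scheme is an explicit measurable function of the parameter, the initial datum and finitely many Brownian increments, which is the kind of property the proof exploits for the approximations.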
Remark 2.3. To show existence and uniqueness of a solution one could weaken the assumptions on the regularity of σ, similarly to what is done in the references [Ver81] and [GK96]. What is needed above is that pathwise uniqueness holds for equation (10), and there are many well-known conditions ensuring that this happens. However at a later stage in the paper (Section 5) we will need to assume that σ is differentiable with bounded derivative. This does not seem to be a limitation on the model, since there is no reason to assume that the diffusion coefficient is particularly rough.

The limit PDE: uniqueness of measure-valued solutions
Let Pr(D × T ) be the space of Borel probability measures over D × T . Let Pr 1 (D × T ) ⊂ Pr(D × T ) be the space of probability measures over D × T with finite first moment, endowed with the 1-Wasserstein metric W 1 . Suppose as above that the empirical measures S N t converge in a weak sense to a probability measure µ t on D × T . Without assuming that µ t has a density, we expect that it solve the PDE with initial condition ν ×ρ 0 L T , meaning that µ 0 ( dx, dv) = ν( dx)ρ 0 (v)L T ( dv). Fix T > 0; we will denote by C the space and for a measure ζ ∈ Pr(D × T ) we will adopt the notation throughout the rest of the paper. In the sequel we will often use that if g : R → R is 2-periodic and differentiable on R then its derivative is also 2-periodic (thus g can be identified with a differentiable function on T ). Recall the Banach space B b (D × T ) consisting of all Borel and bounded functions f : D × T → R endowed with the supremum norm · ∞ . We will also consider consisting of all bounded continuous functions. We introduce the space of test functions Definition 3.1. We say that µ ∈ C is a weak measure-valued solution of the nonlinear Fokker-Planck equation (16), with initial condition for every test function φ ∈ T .
Consider now the total variation distance on Borel probability measures on D × T Remark 3.2. Let µ 1 , µ 2 ∈ C . One can show that the mapping To this purpose we first remark that, for given probability measures ν 1 and ν 2 on Borel sets of D × T , one has and has compact support). As before if , by truncating f and by considering standard mollifiers (defined on R 4 ) we can find a sequence and the Lebesgue convergence theorem we obtain that the previous formula holds even when f n is replaced by f ; this leads to (19).
Then we show that there exists a countable set As a simple consequence we get that To prove assertion (20) and so there exists a countable set K n ⊂ F n which is dense in F n . To finish we define K ∞ = ∪ n≥1 K n .
Proof. Given a function f : T → R we still denote by f its 2-periodic version defined on R. Let ψ ∈ T and define the operator It is well known that A is the infinitesimal generator of a diffusion semigroup T t : where the density p t (·, v ) ∈ C 2 (R), for any v ∈ R, t > 0 (see, for instance, Chapter 6 in [Fri75]). Moreover, for t > 0, are continuous functions on R 2 . In addition, for any g ∈ C b (D × R), t ∈ (0, T ), we have Finally Since in our case σ 2 is also 2-periodic, it is not difficult to prove that, for t > 0, p t is 2-periodic in both variables, i.e., are 2-periodic continuous functions in both variables. Hence, in particular, T t ψ ∈ T if ψ ∈ T , t ≥ 0. One can prove that µ ∈ C is a weak solution to (16) if and only if it is a mild solution, i.e., if and only if We only show that any weak solution is a mild solution (this is the part we need to prove our uniqueness claim). We fix φ and t > 0. Differentiating with respect to s ∈ (0, t) the mapping Integrating with respect to s on [0, t] we find the assertion. Now we prove the claim of the theorem. Let µ 1 , µ 2 ∈ C be two solutions to (16) with the same initial condition µ 0 ; then for every t (in the sequel we can consider the supremum over φ ∈ K ∞ ⊂ C ∞ c such that φ ∞ ≤ 1 as in the previous remark) The function b(µ 2 s )∂ v T t−s φ is bounded and measurable and we have the estimate (cf. (23)) we can thus bound the term (26) by Similarly, (25) is bounded by An application of a generalized version of Gronwall's lemma (see, for instance, [Hen81, Section 1.2.1]) yields that µ 1 t = µ 2 t for every t.

The laws of the empirical measures
We denote by Q N the law of S N on C (we are considering each S N as a r.v. with values in C ). As explained in the introduction, we need to show tightness of the family Q N .
Theorem 4.1. The sequence Q N N ∈N is tight in C . Proof. Fix any (x 0 , v 0 ) ∈ D × T and consider the set where we choose α ∈ (0, 1) and p ≥ 1 such that αp > 1. We show that K M,R is relatively compact in C . Let B (x 0 ,v 0 ) (r) denote the open ball with radius r and center (x 0 , v 0 ) in D × T . Then for µ ∈ K M,R and t ∈ [0, T ] Therefore for every e > 0 and for every t ∈ [0, T ] we can find r = r(e, t) such that for every µ ∈ K M,R . By the Sobolev embedding theorem, if β < (αp − 1)/p, we have that, for any Lipschitz continuous function φ on D × T and any t, s ∈ [0, T ], so that, thanks to Kantorovich-Rubinstein characterization of the 1-Wasserstein distance, we can take the supremum over Lipschitz functions on D × T with Lipschitz seminorm bounded by 1 on both sides of the previous inequality, obtaining Therefore the collection of measures K M,R is equicontinuous; this together with (27) implies relative compactness by the Ascoli-Arzelà theorem.
To show tightness we now compute For the first term we have P sup for a certain constant C = C(λ, θ, σ , T ), thanks to the Burkholder-Davis-Gundy inequality, the boundedness of λ 2 , g 2 and σ 2 and the fact that ν ×ρ 0 L T has finite first moment. For the second term Let φ be a Lipschitz function on D × T with Lipschitz constant K φ ≤ 1. Then so that, by the Kantorovich characterization of the 1-Wasserstein distance and by Hölder's inequality, Recalling (3), (4) and notation (17), we can write for a suitable constant C = C (λ, θ, σ ), again by boundedness of the coefficients and the Burkholder-Davis-Gundy inequality. Choosing p > 2 and α such that αp < p /2 − 1 we find For any e > 0 we can now choose M and R so that Q N K M,R < e, concluding the proof.
An alternative approach to prove Theorem 4.1 could be based on tightness results from [Szn89, Chapters I and II], using the boundedness of the coefficients and the exchangeability of the V i,N . However, the above direct proof can be applied to more general situations as well.
The limit PDE: existence and convergence

Density estimates
Recall that the empirical measure We consider a smooth probability density γ : T → R defined as follows: and introduce a correspondent family of mollifiers γ N (v) = α −1 N γ α −1 N v on T . Note that there exists C > 0 such that Concerning the positive scaling factor, we assume that α N → 0 as N → ∞ and (where sums and differences are understood on T , i.e. for v 1 , v 2 ∈ T , v 1 ± v 2 = (v 1 ± R v 2 ) (mod 2) ∈ T ). It satisfies and according to (7) we write In the next lemma we will use that σ 2 (v) is differentiable with bounded derivative and Lemma 5.1. There exists a constant C > 0 such that Proof.
Step 1 (energy identity). One has by Itô's formula, integrating by parts, Step 2 (deterministic terms). Using the assumptions on σ 2 (v), one has integrating by parts Since (due to the boundedness of λ 2 and g 2 ) We have got that P-a.s.
Step 3 (martingale terms and conclusion). It remains to handle the sum dv is a martingale, hence it has mean zero. Indeed, for every N and i = 1, ..., N , As to the corrector, we have under the assumption α −3 N ≤ N . Using the assumption that the law of the initial data η i has an L 2 density, it is not difficult to show that the L 2 norm of u N 0 is bounded uniformly with respect to N . To this purpose let us recall that we denote by ρ the density of each η i . Using also standard properties of convolutions we get: where C > 0 is independent of N . We can therefore take expectation and apply Gronwall's lemma, thus deducing the claim from the results of the two previous steps.
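To make the object estimated in this lemma more tangible, the sketch below evaluates a mollified empirical density of the potentials on the torus, reading $u^N_t$ as the convolution of the rescaled kernel $\gamma_N$ with the v-marginal of the empirical measure. Both this reading and the kernel are assumptions: the code uses the Epanechnikov kernel $\frac{3}{4}(1 - w^2)_+$ as a stand-in, which is a probability density but is neither smooth nor the $\gamma$ of the paper. The scale `alpha` should be chosen so that `alpha ** -3 <= N`, as in the text.

```python
def signed_torus_diff(a, b, period=2.0):
    """Representative of a - b in [-period/2, period/2)."""
    return (a - b + period / 2.0) % period - period / 2.0

def mollified_density(v, potentials, alpha):
    """Value at v of (1/N) * sum_i gamma_alpha(v - V^i) on the torus of length 2.

    gamma_alpha(w) = gamma(w / alpha) / alpha with a stand-in kernel gamma;
    alpha < 1 is assumed so that the rescaled kernel fits inside the torus.
    """
    total = 0.0
    for p in potentials:
        w = signed_torus_diff(v, p) / alpha
        if abs(w) < 1.0:
            total += 0.75 * (1.0 - w * w) / alpha
    return total / len(potentials)
```

As a usage example, with N = 200 potentials one may take `alpha = 200 ** (-1.0 / 3.0)` and evaluate `mollified_density` on a uniform grid of [0, 2); the resulting values are nonnegative and integrate to approximately one.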
Proof. Arguing as in Lemma 5.1, we have, P-a.s., for any 0 ≤ s ≤ t ≤ T , φ ∈ H 2 (T ), Step 1 of the previous lemma. Then (using the same inequalities proved above in Step 2 of the previous lemma) It is sufficient (because of the claim of the previous lemma) to prove that Recall that where we have used the estimate T |γ N (v)| 2 dv ≤ Cα −1 N and the assumption α −3 N ≤ N . The proof is complete. Now let Q u N denote the law of the process u N . From the previous two lemmas, we deduce that the family (Q u N ) is tight in L 2 0, T ; L 2 (T ) due to a generalized version of Aubin-Lions lemma, which claims that the space is relatively compact in L 2 0, T ; L 2 (T ) , for α > 0 (cf. [Sim87]).
Remark 5.3. Introducing the mollifiers γ n (v) = α −1 n γ α −1 n v , n ≥ 1, with |γ (w) w| ≤ Cγ (w) , w ∈ T , α n → 0 as n → ∞, and α −3 n ≤ N, and following the proof of Lemma 5.1, we can obtain that there exists a constant C > 0 such that . Now note that given a Borel probability measure ν on T , if there exists c > 0, such that, for any n ≥ 1, then ν ∈ L 2 (T ) and ν L 2 (T ) ≤ c. Indeed, by (29), for any φ ∈ L 2 (T ), we have Passing to the limit as n → ∞ and using the Riesz theorem we get the assertion. Estimate (28) and the previous argument could be used to prove existence of solutions to (16) inX (see the next section) avoiding the previous Aubin-Lions lemma.

Convergence and existence of solutions
Set for notational convenience where π v µ t is the marginal on the v-component of µ t : Lemma 5.4. The spaceX is a Borel subset of C .
Proof. It is enough to show that Λ = µ ∈ Pr 1 (D × T ) : We consider the continuous mapping J : Pr 1 (D × T ) → Pr 1 (T ) given by J µ = π v µ, for any µ ∈ Pr 1 (D × T ). If we prove that is Borel in Pr 1 (T ) then we get that Λ = J −1 (Γ) is Borel and this finishes the proof. Let us check the assertion on Γ.
Let µ ∈ Pr 1 (T ) . Using the Riesz theorem and the fact that C(T ) is dense in L 2 (T ), we know that µ ∈ Γ if and only if there exists c > 0 such that (indeed if (31) holds for µ ∈ Pr 1 (T ) then µ can be uniquely extended to a linear functional on L 2 (T )). Let us define, for integers N ≥ 1, Γ N = {µ ∈ Pr 1 (T ) : (31) holds with c replaced by N } It is easy to check that each Γ N is closed in Pr 1 (T ). We have Γ = N ≥1 Γ N and this shows that Γ is Borel.
For any test function φ ∈ T and any µ 0 ∈ Pr(D × T ) define on C the functional Lemma 5.5. For every φ ∈ T and every µ 0 ∈ Pr(D × T ), the bounded Borel measurable functional Φ µ 0 φ : C → R is continuous at every point of X. Therefore, if Q N N ∈N and Q are probability measures on C such that Q N → Q weakly and Q X = 1, then Proof. Since in the definition of Φ µ 0 φ we can consider the supremum over rational numbers in [0, T ], to prove the measurability of Φ µ 0 φ we can fix t ∈ [0, T ] and study separately the measurability of three functionals: for µ ∈ C . Note that Φ 1 and Φ 3 are even continuous mappings on C . Concerning the measurability of Φ 2 we first approximate pointwise the functions λ 2 and g 2 by regular functions λ n 2 and g n 2 (indeed λ 2 (·) and g 2 (x, ·, y, ·) have only simple discontinuities) and then consider the corresponding functions b n given by It is not difficult to prove that for each n the functional Φ n 2 : C → R, is continuous on C . By the dominated convergence theorem we deduce that Φ n 2 (µ) → Φ 2 (µ) as n → ∞, for any µ ∈ C . This shows that also Φ 2 is measurable.
Let now µ ∈ X and µ n ∈ C be given with µ n → µ in C . This implies µ n t → µ t in the weak sense, hence µ n t , φ → µ t , φ , uniformly in t ∈ [0, T ]. The convergence of µ n s , T ] is similar and, by the Lebesgue dominated convergence theorem, the last integral in the definition of Φ µ 0 φ converges, uniformly in t ∈ [0, T ]. It remains to prove that the first integral converges. Again by the Lebesgue dominated convergence theorem, the problem is reduced to proving that, for a.e. s ∈ [0, T ], . This is more difficult since λ 2 and g 2 contain discontinuities. Since µ ∈ X, we know that $\pi_v \mu_s \ll \mathcal{L}_T$ for a.e. s ∈ [0, T ], thus in the sequel we restrict to such values of s. Let us first explain why The is bounded; and it is continuous except on the set These sets are exceptional for the measure µ s : Now, the following fact is known: if a sequence of probability measures $\rho_n$ on a Polish space $Y$ converges weakly to a probability measure $\rho$ and $f : Y \to \mathbb{R}$ is a bounded Borel measurable function, continuous on a set $Y' \subset Y$ with $\rho(Y') = 1$, then $\int_Y f \, d\rho_n \to \int_Y f \, d\rho$. The proof is easy using the Skorohod representation theorem. We apply this fact with $Y = D \times T$, $\rho_n = \mu^n_s$, $\rho = \mu_s$, $Y' = S^c$, $f = \lambda_2 \partial_v \phi$ and deduce (34).
Finally, let us explain why The previous difference can be rewritten as the sum of two terms: The convergence to zero of the second term is similar to (34), because the function is continuous on S c . To treat the first term in the sum, we first fix τ > 0. By the weak convergence, we know that (µ n s ) is tight and so there exists a compact set To check (35) we have to prove that g n → g uniformly on K . We know it converges pointwise, by the same argument used above for (34), because the function $T \ni v' \mapsto \mathbf{1}_{[1,1+\delta]}(v')$ is continuous except at $v' = 1$ and $v' = 1 + \delta$. Uniform convergence then follows from the fact that the family {g n } is equi-bounded and equi-uniformly continuous; the last fact is a consequence of the assumption that θ is uniformly continuous on D × D.
This completes the proof of the first claim of the lemma. The second claim is a simple consequence using the convergence criterion recalled above in this proof, applied with Y = X, Y = X, ρ n = Q n , ρ = Q, f = Φ µ 0 φ .
Lemma 5.6. Recall that Q N are the laws on C of the empirical process S N . If Q is a weak limit point of any subsequence of Q N then Q X = 1. Proof.
Step 1. We have already proved not only tightness of the family $Q^N$ in $\mathcal{C}$ (see Section 4) but also tightness of the family of laws of $u^N$ in $\mathcal{H} := L^2(0,T; L^2(\mathbb{T}, dv))$ (see the end of Section 5.1). Consider the pair $(S^N, u^N)$ with values in $\mathcal{C} \times \mathcal{H}$; their laws $\rho^N = \mathcal{L}(S^N, u^N)$ form a tight family in $\mathcal{C} \times \mathcal{H}$. Given a weak limit point $Q$ of $Q^N$, there is thus a subsequence $N_k$ such that $\rho^{N_k}$ converges weakly to a probability measure $\rho$ on $\mathcal{C} \times \mathcal{H}$, with marginal $Q$ on $\mathcal{C}$.
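By the Skorohod representation theorem there exist a new probability space $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P)$ and $\mathcal{C} \times \mathcal{H}$-valued random variables $(\tilde S^{N_k}, \tilde u^{N_k})$ and $(\tilde S, \tilde u)$, with laws $\rho^{N_k}$ and $\rho$ respectively, such that $(\tilde S^{N_k}, \tilde u^{N_k}) \to (\tilde S, \tilde u)$ in the strong topology of $\mathcal{C} \times \mathcal{H}$, with $\tilde P$-probability one. Notice that the law of $\tilde S$ is $Q$.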
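Step 2. Let $\mathcal{D}$ be a countable dense family in $C(\mathbb{T})$, and let $\mathcal{L}_{[0,T]}$ be the Lebesgue measure on $[0,T]$. We claim that (36) holds for every $\varphi \in \mathcal{D}$. To prove this claim, let us start from the definition of the empirical density, $u^N_t(v) = \int_{D \times \mathbb{T}} \gamma_N(v - v')\, S^N_t(dx', dv')$. It follows that, given $N_k$, the same relation between $\tilde u^{N_k}$ and $\tilde S^{N_k}$ holds with $\tilde P$-probability one, as an identity in $\mathcal{H}$. We can also say that, $\tilde P \otimes \mathcal{L}_{[0,T]}$-almost everywhere, for any $k \in \mathbb{N}$ the previous identity holds in $L^2(\mathbb{T})$. Therefore, given $\varphi \in \mathcal{D}$, we can pair both sides with $\varphi$. Up to passing to a subsequence of $N_k$, we can assume that $\tilde u^{N_k}$ converges to $\tilde u$ in the strong topology of $L^2(\mathbb{T})$, $\tilde P \otimes \mathcal{L}_{[0,T]}$-almost everywhere. Moreover, for $\tilde P$-a.e. $\tilde\omega \in \tilde\Omega$ and for every $t \in [0,T]$, $\tilde S^{N_k}_t$ converges in the weak topology of probability measures to $\tilde S_t$, so we can pass to the limit on the measure side as well, where we use that $\gamma_N * \varphi \to \varphi$ uniformly, because $\varphi$ is continuous and the $\gamma_N$ are mollifiers on the torus. This proves (36).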
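Step 3. Since $\mathcal{D}$ is countable, property (36) holds simultaneously for all $\varphi \in \mathcal{D}$ outside a single $\tilde P \otimes \mathcal{L}_{[0,T]}$-null set. Hence we can say that, for $\tilde P \otimes \mathcal{L}_{[0,T]}$-a.e. $(\tilde\omega, t)$, $\pi_v \tilde S_t(\tilde\omega) \ll \mathcal{L}_{\mathbb{T}}$ with density in $L^2(\mathbb{T})$. This implies that, for $\tilde P$-a.e. $\tilde\omega \in \tilde\Omega$, the measure $\pi_v \tilde S_t(\tilde\omega)$ has this property for a.e. $t \in [0,T]$. Hence $\tilde P(\tilde\omega \in \tilde\Omega : \tilde S(\tilde\omega) \in \mathcal{X}) = 1$, which implies $Q(\mathcal{X}) = 1$ (recall that the law of $\tilde S$ is $Q$). The proof is complete.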
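To see why testing against a countable dense family is enough to identify the density, here is a sketch. It rests on an assumption about the form of (36), namely that it identifies the pairing of $\pi_v \tilde S_t$ with $\varphi$ and the pairing of $\tilde u_t$ with $\varphi$, where $\tilde S$, $\tilde u$ denote the Skorohod copies of Step 1; if (36) has a different form the argument has to be adapted.

```latex
% Assumed form of (36): for a.e. (\tilde\omega, t) and every \varphi \in \mathcal{D},
\[
\int_{\mathbb{T}} \varphi \, d(\pi_v \tilde S_t)
  = \int_{\mathbb{T}} \tilde u_t(v)\, \varphi(v)\, dv .
\]
% Both sides are continuous in \varphi for the sup norm: the left one because
% \pi_v \tilde S_t is a probability measure, the right one because
% \tilde u_t \in L^2(\mathbb{T}) \subset L^1(\mathbb{T}) for a.e. t.
% By density of \mathcal{D} in C(\mathbb{T}) the identity extends to all
% \varphi \in C(\mathbb{T}), and a finite signed Borel measure on the compact
% space \mathbb{T} is determined by its integrals against continuous functions. Hence
\[
\pi_v \tilde S_t(dv) = \tilde u_t(v)\, dv ,
\qquad
\frac{d(\pi_v \tilde S_t)}{d\mathcal{L}_{\mathbb{T}}} = \tilde u_t \in L^2(\mathbb{T}) .
\]
```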
Let us finally study the nonlinear Fokker-Planck equation (16) with initial condition $\mu_0$, where $b(\mu_t)$ is given by (17).
Theorem 5.7. Let $\mu_0 = \nu \times \bar\rho_0 \mathcal{L}_{\mathbb{T}}$ with $\nu \in \mathrm{Pr}_1(D)$ and $\bar\rho_0 \in L^2(0,2)$ (cf. Section 1.1). Then: i) the nonlinear Fokker-Planck equation (16), with initial condition $\mu_0$, has one and only one weak measure-valued solution $\mu \in \mathcal{C}$; this measure belongs to the space $\mathcal{X}$ (see (30)); ii) let $Q^N$ be the laws on $\mathcal{C}$ of the empirical process $S^N$: then $Q^N$ converges weakly to $\delta_\mu$. Moreover, $S^N$ converges in probability to $\mu$, in the topology of $\mathcal{C}$.
Proof. Let us consider the sequence $(\gamma_N)$ of Section 5.1. Note that the $L^2$ norm of the corresponding mollified initial condition is bounded uniformly in $N$. From Section 4 we know that the family $Q^N$ is tight. Let $Q^{N_k}$ be a weakly convergent subsequence, with limit $Q$. We are going to prove below that $Q$ is supported by the set of weak measure-valued solutions of equation (16), with initial condition $\mu_0$. This implies existence of at least one such solution. Uniqueness has been proved in Section 3; recalling Lemma 5.6, we then immediately have claim (i). Moreover, we also have $Q = \delta_\mu$, by the uniqueness of weak measure-valued solutions; therefore, since every weak limit point of $Q^N$ is the same measure $\delta_\mu$, it follows that the full sequence $Q^N$ converges weakly to $\delta_\mu$. Since $S^N$ converges in law to a constant, it also converges in probability (in the topology of $\mathcal{C}$).
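The last implication (convergence in law to a deterministic limit gives convergence in probability) follows from the portmanteau theorem. A short sketch, where $d$ denotes the metric of $\mathcal{C}$ and the closed set $F_\varepsilon$ is notation introduced here:

```latex
% For \varepsilon > 0 let
%   F_\varepsilon := \{ m \in \mathcal{C} : d(m, \mu) \ge \varepsilon \},
% a closed subset of \mathcal{C} that does not contain \mu.
% Weak convergence Q^N \to \delta_\mu and the portmanteau theorem
% (upper bound on closed sets) give
\begin{align*}
\limsup_{N \to \infty} P\big( d(S^N, \mu) \ge \varepsilon \big)
  = \limsup_{N \to \infty} Q^N(F_\varepsilon)
  \le \delta_\mu(F_\varepsilon) = 0 .
\end{align*}
```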
It remains to prove the claim made above that $Q$ is supported by the set of weak measure-valued solutions of equation (16) with initial condition $\mu_0$, i.e., that the set (cf. (32)) $\Sigma = \{\mu \in \mathcal{C} : \Phi^{\mu_0}_\varphi(\mu) = 0 \text{ for all } \varphi \in \mathcal{T}\}$ is a Borel subset of $\mathcal{C}$ and $Q(\Sigma) = 1$. Arguing as in Remark 3.2, it is not difficult to show that, given $\varphi \in \mathcal{T}$, there exists an approximating sequence $(\varphi_k)$ along which, by the dominated convergence theorem, the functionals $\Phi^{\mu_0}_{\varphi_k}(\mu)$ converge to $\Phi^{\mu_0}_\varphi(\mu)$. Moreover, there exists a countable set $H_0 \subset C^2_c(D \times \mathbb{T})$ such that any $\varphi \in C^2_c(D \times \mathbb{T})$ can be approximated in the same sense by a sequence $(\varphi_k) \subset H_0$. We thus obtain that $\Sigma = \{\mu \in \mathcal{C} : \Phi^{\mu_0}_\varphi(\mu) = 0 \text{ for all } \varphi \in H_0\}$, which is a Borel subset of $\mathcal{C}$. To finish the proof we need to prove that (38) holds for every $\varphi \in H_0$. Using identity (6), and then Doob's inequality together with the boundedness of $\sigma^2$ and $\partial_v \varphi$, we obtain an estimate, with constants generically denoted by $C > 0$, which implies (38) and completes the proof.

The McKean-Vlasov SDE
Similarly to what is done in Section 2, let $B$ be a standard real Brownian motion defined on $(\Omega, \mathcal{F}, P)$, let $\eta$ be a $\mathbb{T}$-valued random variable independent of $B$ with density $\bar\rho_0 \in L^2(0,2)$, and denote by $(\mathcal{G}^0_t)_t$ the augmentation of $\sigma(B_s, 0 \le s \le t) \vee \sigma(\eta)$ with the $P$-null sets. Let then $\xi$ be a random variable with values in $D$ and law $\nu$, having finite first moment, independent of $\mathcal{G}^0_t$ for every $t$. Finally, denote by $\mathcal{G}_t$ the completion of $\mathcal{G}^0_t \vee \sigma(\xi)$. Let us consider the so-called McKean-Vlasov SDE (39) associated with the system (10). In analogy with Definition 2.1, we say that a strong solution to equation (39) is a family of continuous $\mathcal{G}^0_t$-adapted processes $(V^x_t, \mu^x_t)$, $x \in D$, with values in $\mathbb{R} \times \mathrm{Pr}_1(D \times \mathbb{T})$, such that $(\xi, V^\xi_t, \mu^\xi_t)_{t \in [0,T]}$ is a well-defined continuous $\mathcal{G}_t$-adapted process with values in $D \times \mathbb{R} \times \mathrm{Pr}_1(D \times \mathbb{T})$ which satisfies equations (39) $P$-almost surely for any $t \in [0,T]$.
By the same arguments as in Section 2 it can be proved that, for a given $\bar\mu$, there exists a unique strong solution $(V^{\bar\mu,x})_x$ to the corresponding SDE with the measure argument frozen at $\bar\mu$. Hence, for $X_0$ as in (39), the process $V^{\bar\mu} := V^{\bar\mu, X_0}$ satisfies the same equation with $x$ replaced by $X_0$. Now choose as $\bar\mu$ the unique weak solution in $\mathcal{C}$ to the nonlinear Fokker-Planck PDE (16) with initial condition $\bar\mu_0 = \nu \times \bar\rho_0 \mathcal{L}_{\mathbb{T}}$; given the associated process $V^{\bar\mu}$ as above, denote by $\mu_t$ the law of the vector $(\xi, V^{\bar\mu}_t)$. Then $\mu$ is a solution in $\mathcal{C}$ to the linear PDE obtained from (16) by freezing the nonlinearity at $\bar\mu$. Since measure-valued solutions of the latter are unique (the proof being a simplified version of that of uniqueness in the nonlinear case; see Theorem 3.3), and clearly $\bar\mu$ is also a solution, we necessarily have $\bar\mu = \mu$, and $(V, \mu)$ is a strong solution to (39).
Let now $(\tilde V, \tilde\mu)$ be another solution. Then $\tilde\mu^x$ solves the Fokker-Planck equation (16) with initial condition $\delta_x \times \bar\rho_0 \mathcal{L}_{\mathbb{T}}$, for $\nu$-almost every $x \in D$, and therefore $\tilde\mu^x = \mu^x$. Finally, $\tilde V^x_t = V^x_t$ a.s. for every $t$ and $\nu$-almost every $x \in D$, since they both satisfy an SDE like (41) with $\bar\mu = \mu^x$, for which strong uniqueness holds. Hence $(V, \mu)$ is the unique solution to (39).

A Appendix: Extension of some results
We state here some further results that are easy generalizations of what we presented so far; we will comment on the proofs when needed.
Notice that in Section 2 neither the particular form of the coefficients nor the fact that they are 2-periodic plays any role. Since the cited results on which we built our proof apply to multidimensional SDEs with bounded and measurable drift, we immediately obtain the following theorem, with proof identical to that of Theorem 2.2. Here, similarly to what was done previously, for $k, l, m, n \in \mathbb{N}$ we consider an $m$-dimensional Brownian motion $W$ and independent random variables $\Xi = (\Xi^j)_{j=1,\dots,n}$ and $\Psi = (\Psi^j)_{j=1,\dots,n}$ (independent of $W$ as well) with values in $E^n \subseteq (\mathbb{R}^k)^n$ and $(\mathbb{R}^l)^n$, respectively ($E$ is an open subset of $\mathbb{R}^k$), all defined on the common probability space $(\Omega, \mathcal{F}, P)$ and with finite first moment. We assume that the $\Xi^j$'s are identically distributed with law $\nu$ and the $\Psi^j$'s are identically distributed with absolutely continuous law $\rho_0 \mathcal{L}_{\mathbb{R}^l}$, with $\rho_0 \in L^2(\mathbb{R}^l)$. We define the $\sigma$-algebra $\mathcal{E} = \sigma(\Xi)$, the filtration $(\mathcal{A}^0_t)_t$ as the augmentation of $\sigma(W_s, 0 \le s \le t) \vee \sigma(\Psi)$ with the $P$-null sets, and the filtration $(\mathcal{A}_t)_t$ as the completion of $\mathcal{A}^0_t \vee \mathcal{E}$. Finally we fix $T > 0$. Further refinements are possible: for example one can treat the cases when $T = \infty$ and the SDE is to be solved on a domain $U \subset \mathbb{R}^{l \times n}$, and the assumptions on $\sigma$ can be weakened (cf. Remark 2.3). We refer to [GK96] for details.
Also when studying our Fokker-Planck PDE the periodicity of the coefficients plays no particular role, nor does the compactness of the torus $\mathbb{T}$. We do, however, need some further assumptions on the coefficients $b(t, x, y, x', y')$, $t \in [0,T]$, $x \in E^n$, $y \in \mathbb{R}^{l \times n}$, $x' \in E$, $y' \in \mathbb{R}^l$, and $\sigma(t, y)$ above, namely:
(E1) $b$ does not depend on $t$, is bounded and uniformly continuous in $x$ and $x'$ (uniformly in $y$ and $y'$), and the set of discontinuities of the map $(y, y') \mapsto b(x, y, x', y')$ has Lebesgue measure $0$ in $\mathbb{R}^{l \times n} \times \mathbb{R}^l$, for any $x$, $x'$;
(E2) $\sigma$ does not depend on $t$ and belongs to $C^1_b(\mathbb{R}^{l \times n}; \mathbb{R}^{l \times n \times m})$.
One can repeat the arguments of Sections 3, 4, 5 with minor modifications, working in the space $\mathrm{Pr}_1(E \times \mathbb{R}^l)$ with the 1-Wasserstein metric, using the Euclidean norm in place of the metric $d_{D \times \mathbb{T}}$ and choosing all the test functions accordingly. If we solve equation (42) above for $Y$ and consider the empirical measure $S^n$, we can define the empirical density $\bar u^n$ as $\bar u^n_t(y) = \int_{E \times \mathbb{R}^l} \bar\gamma_n(y - y')\, S^n_t(dx', dy')$, where $(\bar\gamma_n)$ is a family of compactly supported mollifiers in $\mathbb{R}^l$.
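Since $S^n_t$ is an empirical measure, the integral defining the empirical density is just an average of mollifier bumps centred at the particle positions. The following minimal sketch illustrates this in the simplest case $l = 1$. Everything in it is an assumption made for illustration: the particular bump function, the free width parameter `eps` (the text ties the mollifier to $n$, with a scaling not specified here), and the Gaussian sample used as a stand-in for the $y$-components of the particles.

```python
import numpy as np


def bump(z):
    """Unnormalised smooth bump exp(-1 / (1 - z^2)), supported on (-1, 1)."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    inside = np.abs(z) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - z[inside] ** 2))
    return out


# Mass of the bump on (-1, 1), computed by a Riemann sum on a fine grid.
_GRID = np.linspace(-1.0, 1.0, 20001)
_BUMP_MASS = float(np.sum(bump(_GRID)) * (_GRID[1] - _GRID[0]))


def mollifier(z, eps):
    """Compactly supported mollifier of width eps, integrating to 1 (hypothetical choice)."""
    return bump(np.asarray(z, dtype=float) / eps) / (eps * _BUMP_MASS)


def empirical_density(y_grid, particles, eps):
    """Evaluate u(y) = (1/n) * sum_i mollifier(y - Y_i) on the points of y_grid."""
    diffs = np.asarray(y_grid)[:, None] - np.asarray(particles)[None, :]
    return mollifier(diffs, eps).mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    particles = rng.normal(size=500)  # stand-in for the y-components of the particles
    y_grid = np.linspace(-6.0, 6.0, 2401)
    u = empirical_density(y_grid, particles, eps=0.3)
    mass = np.sum(u) * (y_grid[1] - y_grid[0])
    print(f"total mass of the empirical density: {mass:.4f}")
```

The $x'$-component of the empirical measure plays no role in the formula, which is why only the $y$-positions enter `empirical_density`. Each summand is nonnegative and integrates to one, so the empirical density is itself a probability density.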
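We say that $f \in L^2_{\mathrm{loc}}(\mathbb{R}^l)$ if $f \in L^2(K)$ for every compact set $K \subset \mathbb{R}^l$. Consider a strictly increasing sequence $(P_j)_{j \in \mathbb{N}}$ of compact sets in $\mathbb{R}^l$ such that $\mathbb{R}^l = \cup_j P_j$; then the norms of $L^2(P_j)$, $j \in \mathbb{N}$, induce a metric on $L^2_{\mathrm{loc}}(\mathbb{R}^l)$. Lemmata 5.1 and 5.2 apply to the $\bar u^n$'s as well thanks to assumption (E2), and imply tightness of their laws in the space $L^2(0,T; L^2_{\mathrm{loc}}(\mathbb{R}^l))$ due to a generalized version of the Aubin-Lions lemma, which states that the space $L^2(0,T; W^{1,2}(\mathbb{R}^l)) \cap W^{\alpha,2}(0,T; H^{-2}(\mathbb{R}^l))$ is compactly embedded in $L^2(0,T; L^2_{\mathrm{loc}}(\mathbb{R}^l))$. Let $\mathcal{C} := C([0,T]; \mathrm{Pr}_1(E \times \mathbb{R}^l))$. The space $\mathcal{X}$ of Subsection 5.2 has to be replaced accordingly with $\mathcal{X} = \{\mu \in \mathcal{C} : \pi_v \mu_t \ll \mathcal{L}_{\mathbb{R}^l} \text{ with } \frac{d(\pi_v \mu_t)}{d\mathcal{L}_{\mathbb{R}^l}} \in L^2(\mathbb{R}^l), \text{ for a.e. } t \in [0,T]\}$.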
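For the metric on $L^2_{\mathrm{loc}}(\mathbb{R}^l)$ built from the exhausting sequence $(P_j)$, one standard choice is sketched below. This is an assumption made for illustration and may differ from the formula the authors use; the name $d_{\mathrm{loc}}$ is introduced here.

```latex
% A standard metric on L^2_loc built from the exhaustion (P_j) (hypothetical choice):
\[
d_{\mathrm{loc}}(f, g) \;=\; \sum_{j \in \mathbb{N}} 2^{-j}\,
  \min\!\big\{ 1,\ \| f - g \|_{L^2(P_j)} \big\},
\qquad f, g \in L^2_{\mathrm{loc}}(\mathbb{R}^l).
\]
% d_loc(f, g) = 0 forces f = g a.e. on every P_j, hence a.e. on R^l = \cup_j P_j;
% symmetry is clear, and the triangle inequality follows from that of each
% L^2(P_j) norm together with min{1, a + b} <= min{1, a} + min{1, b} for a, b >= 0.
% Moreover d_loc(f_n, f) -> 0 if and only if f_n -> f in L^2(P_j) for every j.
```

If in addition every compact set is contained in some $P_j$ (for instance when $P_j$ is the closed ball of radius $j$), convergence in this metric is the same as convergence in $L^2(K)$ for every compact $K$.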
To show that $\mathcal{X}$ is Borel it is enough to repeat the argument in the proof of Lemma 5.4 using the density of $C_c(\mathbb{R}^l)$ in $L^2(\mathbb{R}^l)$. The functionals $\Phi^{\mu_0}_\varphi$ are now continuous on $\mathcal{X}$, because the proof of Lemma 5.5 can be repeated thanks to assumption (E1). To show that any limit point of the family of laws of the $S^n$ gives full measure to $\mathcal{X}$ one can repeat the proof of Lemma 5.6, noting that it is enough to check identity (36) for $\varphi \in C_b(\mathbb{R}^l)$. Now we can repeat verbatim the arguments in the proof of Theorem 5.7, obtaining:
Theorem A.2. Let $\zeta_0 = \nu \times \rho_0 \mathcal{L}_{\mathbb{R}^l}$. Then: i) the nonlinear Fokker-Planck equation
$$\partial_t \zeta_t + \operatorname{div}_y\big(\zeta_t \langle b, \zeta_t \rangle\big) = \operatorname{Tr}\Big(D^2_y\, \frac{\sigma\sigma^{\top}}{2}\, \zeta_t\Big) \qquad (43)$$
with initial condition $\zeta_0$, has one and only one weak measure-valued solution $\zeta$; this measure belongs to the space $\mathcal{X}$.
ii) Let $Q^n$ be the laws on $\mathcal{C}$ of the empirical process $S^n$; then $Q^n$ converges weakly to $\delta_\zeta$. Furthermore, $S^n$ converges in probability to $\zeta$, in the topology of $\mathcal{C}$.
The extension of the well-posedness result for the McKean-Vlasov equation given in Section 6 is then straightforward. For related results on strong well-posedness for McKean-Vlasov equations (without dependence on stochastic parameters) see [MV16] and references therein.