Nonlinear Fokker-Planck Equations for Probability Measures on Path Space and Path-Distribution Dependent SDEs

By investigating path-distribution dependent stochastic differential equations, the following type of nonlinear Fokker--Planck equations for probability measures $(\mu_t)_{t \geq 0}$ on the path space $\mathcal C:=C([-r_0,0];\mathbb R^d)$ is analyzed: $$\partial_t \mu(t)=L_{t,\mu_t}^*\mu_t,\ \ t\ge 0,$$ where $\mu(t)$ is the image of $\mu_t$ under the projection $\mathcal C\ni\xi\mapsto \xi(0)\in\mathbb R^d$, and $$L_{t,\mu}(\xi):= \frac 1 2\sum_{i,j=1}^d a_{ij}(t,\xi,\mu)\frac{\partial^2} {\partial_{\xi(0)_i} \partial_{\xi(0)_j}} +\sum_{i=1}^d b_i(t,\xi,\mu)\frac{\partial}{\partial_{\xi(0)_i}},\ \ t\ge 0, \xi\in \mathcal C, \mu\in \mathcal P^{\mathcal C}.$$ Under reasonable conditions on the coefficients $a_{ij}$ and $b_i$, existence, uniqueness, Lipschitz continuity in Wasserstein distance, total variational norm and entropy, as well as derivative estimates, are derived for the martingale solutions.


Introduction
In this paper, we investigate nonlinear PDEs for probability measures on the path space using path-distribution dependent SDEs. To explain the motivation of the study, let us start from the following classical PDE on $\mathcal P(\mathbb R^d)$, the set of probability measures on $\mathbb R^d$ equipped with the weak topology: $$\partial_t\mu(t)=L^*\mu(t) \tag{1.1}$$ for a second-order differential operator $$L:=\frac12\sum_{i,j=1}^d a_{ij}\partial_i\partial_j+\sum_{i=1}^d b_i\partial_i,$$ where $a=(a_{ij}):\mathbb R^d\to\mathbb R^d\otimes\mathbb R^d$ and $b=(b_i):\mathbb R^d\to\mathbb R^d$ are locally integrable. Equation (1.1) is just the (linear) Fokker-Planck-Kolmogorov equation (FPKE) associated to the operator $L$ in the sense of [2]. We call $\mu\in C(\mathbb R_+;\mathcal P(\mathbb R^d))$ a solution of (1.1) if $$\mu(t)(f)=\mu(0)(f)+\int_0^t\mu(s)(Lf)\,ds,\quad t\ge0,\ f\in C_0^\infty(\mathbb R^d).$$ To construct and analyze solutions of (1.1) using the time marginal distributions of Markov processes as proposed by A. N. Kolmogorov [10], K. Itô developed the theory of stochastic differential equations (SDEs), see e.g. [9]. Let $\sigma$ be a matrix-valued function such that $a=\sigma\sigma^*$, and let $W(t)$ be a $d$-dimensional Brownian motion. Consider the following Itô SDE $$dX(t)=b(X(t))\,dt+\sigma(X(t))\,dW(t). \tag{1.2}$$
By Itô's formula, the time marginals $\mu(t):=\mathcal L_{X(t)}$, the law of $X(t)$ for $t\ge0$, solve the equation (1.1). This enables one to investigate FPKEs using a probabilistic approach. Obviously, (1.1) is a linear equation. In applications, many important PDEs for probability measures (or probability densities) are nonlinear, see, for instance, [4,5,6,7,8,15] and references therein for the study of Landau type equations. Such PDEs are also of Fokker-Planck type, but are nonlinear (see Sections 6.7 and 9.8 (v) in [2]). To analyze nonlinear FPKEs for probability measures, the following distribution-dependent version of (1.2) has been studied in the recent paper [23] by the third named author: $$dX(t)=b(t,X(t),\mathcal L_{X(t)})\,dt+\sigma(t,X(t),\mathcal L_{X(t)})\,dW(t), \tag{1.3}$$ where $b:\mathbb R_+\times\mathbb R^d\times\mathcal P(\mathbb R^d)\to\mathbb R^d$ and $\sigma:\mathbb R_+\times\mathbb R^d\times\mathcal P(\mathbb R^d)\to\mathbb R^d\otimes\mathbb R^d$ are measurable. For any $t\ge0$ and $\mu\in\mathcal P(\mathbb R^d)$, consider the second order differential operator $$L_{t,\mu}:=\frac12\sum_{i,j=1}^d(\sigma\sigma^*)_{ij}(t,\cdot,\mu)\,\partial_i\partial_j+\sum_{i=1}^d b_i(t,\cdot,\mu)\,\partial_i.$$ Under reasonable integrability conditions on $\sigma$ and $b$, by Itô's formula we see that for a solution $X(t)$ of (1.3), $\mu(t):=\mathcal L_{X(t)}$ solves the nonlinear FPKE $$\partial_t\mu(t)=L_{t,\mu(t)}^*\mu(t) \tag{1.4}$$ in the sense that $$\mu(t)(f)=\mu(0)(f)+\int_0^t\mu(s)(L_{s,\mu(s)}f)\,ds,\quad t\ge0,\ f\in C_0^\infty(\mathbb R^d).$$ In [23], by investigating existence, uniqueness, exponential convergence, and gradient-Harnack type inequalities for the distribution dependent SDE (1.3), the existence of a class of regular solutions to the nonlinear FPKE (1.4) is proved.
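The passage from the SDE (1.2) to the FPKE (1.1) is a single application of Itô's formula. The following sketch spells it out for a test function $f\in C_0^\infty(\mathbb R^d)$, assuming enough integrability of $\sigma$ and $b$ for the stochastic integral to be a true martingale; the same computation with coefficients evaluated at $(t,X(t),\mathcal L_{X(t)})$ gives the distribution dependent case.

```latex
% Ito's formula for f(X(t)), where X solves (1.2) and a = \sigma\sigma^*:
\begin{align*}
 df(X(t)) &= (Lf)(X(t))\,dt
            + \big\langle \sigma(X(t))^*\nabla f(X(t)),\, dW(t)\big\rangle,\\
 Lf &= \frac12\sum_{i,j=1}^d a_{ij}\,\partial_i\partial_j f
      + \sum_{i=1}^d b_i\,\partial_i f .
\end{align*}
% Taking expectations removes the martingale part. With \mu(t) the law of X(t):
\[
 \mu(t)(f) = \mathbb E f(X(t))
           = \mu(0)(f) + \int_0^t \mu(s)(Lf)\,ds , \qquad t\ge 0 .
\]
```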
In the above two situations, the stochastic systems are Markovian (or memory-free); i.e. the evolution of the system does not depend on its past. However, many real-world models, in particular those arising from mathematical finance and biology, have memory, so that the associated evolution equations are path dependent. In this case, the distributions of the solutions solve nonlinear FPKEs for probability measures on path space. In this paper, we investigate such a class of FPKEs by using path-distribution dependent SDEs.
In Section 2, we introduce the framework of the study and the main results on nonlinear FPKEs for probability measures on path space. To prove these results, we investigate the corresponding path-distribution dependent SDEs in Sections 3-5, where strong/weak existence and uniqueness of solutions as well as Harnack type inequalities are derived respectively. We will mainly follow the ideas of [23], but substantial additional effort is needed to generalize the results therein to the case where the coefficients depend not only on the time marginals but also on the distribution of the path.

Nonlinear PDEs for measures on path space
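Throughout the paper, we fix $r_0>0$ and consider the path space $\mathcal C:=C([-r_0,0];\mathbb R^d)$ equipped with the uniform norm $\|\xi\|_\infty:=\sup_{\theta\in[-r_0,0]}|\xi(\theta)|$. Let $\mathcal P_2^{\mathcal C}$ be the class of probability measures on $\mathcal C$ of finite second-order moment, i.e. $\mu(\|\cdot\|_\infty^2):=\int_{\mathcal C}\|\xi\|_\infty^2\,\mu(d\xi)<\infty$, equipped with the Wasserstein distance $$W_2(\mu,\nu):=\inf_{\pi\in\mathscr C(\mu,\nu)}\Big(\int_{\mathcal C\times\mathcal C}\|\xi-\eta\|_\infty^2\,\pi(d\xi,d\eta)\Big)^{1/2},$$ where $\mathscr C(\mu,\nu)$ denotes the class of couplings for $\mu$ and $\nu$. It is well known that $(\mathcal P_2^{\mathcal C},W_2)$ is a Polish space and the $W_2$-metric is consistent with the weak topology. We will study nonlinear FPKEs on $\mathcal P_2^{\mathcal C}$. Let $$b:\mathbb R_+\times\mathcal C\times\mathcal P_2^{\mathcal C}\to\mathbb R^d,\qquad \sigma:\mathbb R_+\times\mathcal C\times\mathcal P_2^{\mathcal C}\to\mathbb R^d\otimes\mathbb R^d \tag{2.1}$$ be measurable. For any $t\ge0$, $\mu\in\mathcal P_2^{\mathcal C}$, consider the following differential operator $L_{t,\mu}$ from $C_0^\infty(\mathbb R^d)$ to the set of all $\mathscr B(\mathcal C)$-measurable functions: for $f\in C_0^\infty(\mathbb R^d)$, $$L_{t,\mu}f(\xi):=\frac12\sum_{i,j=1}^d(\sigma\sigma^*)_{ij}(t,\xi,\mu)\,\partial_i\partial_jf(\xi(0))+\sum_{i=1}^db_i(t,\xi,\mu)\,\partial_if(\xi(0)),\quad \xi\in\mathcal C.$$ Then the associated nonlinear FPKE for probability measures $(\mu_t)_{t\ge0}$ on the path space $\mathcal C$ is $$\partial_t\mu(t)=L_{t,\mu_t}^*\mu_t, \tag{2.2}$$ where $\mu(t)$ is the marginal distribution of $\mu_t$ at $\theta=0$; i.e. $\mu(t)(A):=\mu_t(\{\xi\in\mathcal C:\ \xi(0)\in A\})$, $A\in\mathscr B(\mathbb R^d)$.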
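Two elementary consequences of the definition of $W_2$ on the path space may help to fix ideas. They are recorded here only as a quick check of the definitions; the symbol $p_0$ for the evaluation map at $\theta=0$ and the superscript in $W_2^{\mathbb R^d}$ are notation introduced for this illustration.

```latex
% W_2 on the path space, with cost given by the squared uniform norm:
\[
 W_2(\mu,\nu)^2=\inf_{\pi\in\mathscr C(\mu,\nu)}
   \int_{\mathcal C\times\mathcal C}\|\xi-\eta\|_\infty^2\,\pi(d\xi,d\eta).
\]
% (i) Dirac masses: the only coupling of \delta_\xi and \delta_\eta
%     is \delta_{(\xi,\eta)}, hence
\[
 W_2(\delta_\xi,\delta_\eta)=\|\xi-\eta\|_\infty .
\]
% (ii) Projection to \theta = 0: let p_0(\xi):=\xi(0). If \pi couples \mu and \nu,
%      then \pi\circ(p_0\times p_0)^{-1} couples the image measures, and
%      |\xi(0)-\eta(0)|\le\|\xi-\eta\|_\infty, hence
\[
 W_2^{\mathbb R^d}\big(\mu\circ p_0^{-1},\,\nu\circ p_0^{-1}\big)\le W_2(\mu,\nu).
\]
```

In particular, any $W_2$-estimate for the path-space measures $\mu_t$ is inherited by the time marginals $\mu(t)$.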
We will investigate martingale solutions of (2.2) which are realized by marginals of probability measures on the infinite-time path space C ∞ := C([−r 0 , ∞); R d ). For a probability measure µ ∞ on C ∞ , consider its marginal distributions where σ(π(s) : s ≤ t 1 ) is the σ-field on C ∞ induced by the projections π(s) for s ∈ [−r 0 , t 1 ].
To construct the martingale solutions of (2.2) using path-distribution dependent SDEs, we need the following assumptions.
From now on, for any $\nu_0,\mu_0\in\mathcal P_2^{\mathcal C}$, we denote by $\mu_t$ and $\nu_t$ the martingale solutions of (2.2) starting at $\mu_0$ and $\nu_0$ respectively.
To estimate the continuity of µ t in µ 0 with respect to entropy and total variational norm, we make the following stronger assumption.
Remark 2.1. According to Theorem 2.1(2), if there exists a constant $\varepsilon\in(0,1)$ such that holds for some constants $c,\lambda>0$; i.e. the solution to (2.2) has exponential contraction in $W_2$. If $\sigma(t,\cdot,\cdot)$ and $b(t,\cdot,\cdot)$ do not depend on $t$, i.e. the equation is time-homogeneous, we write $\mu_t=P_t^*\mu_0$. By the uniqueness we see that $P_t^*$ is a semigroup, i.e. $P_{t+s}^*=P_t^*P_s^*$, $s,t\ge0$. Then (2.8) implies that $P_t^*$ has a unique invariant probability measure $\mu\in\mathcal P_2^{\mathcal C}$. Combining (2.9) with the semigroup property of $P_t^*$ and (2.5)-(2.6), we conclude that (2.8) also implies the exponential convergence in entropy and total variational norm:
Finally, we investigate the shift quasi-invariance and differentiability of $\mu_t$ along Cameron-Martin vectors in $H^1:=\{\xi\in\mathcal C:\ \int_{-r_0}^0|\xi'(s)|^2\,ds<\infty\}$. For $\xi\in\mathcal C$ and a probability measure $\mu$ on $\mathcal C$, we say that $\mu$ is differentiable along $\xi$ if for any $A\in\mathscr B(\mathcal C)$, $\partial_\xi\mu(A):=\frac{d}{d\varepsilon}\mu(A+\varepsilon\xi)\big|_{\varepsilon=0}$ exists and $\partial_\xi\mu(\cdot)$ is a signed measure on $\mathcal C$.
Theorem 2.3. Assume (A) and let $b(t,\cdot,\mu)$ be differentiable on $\mathcal C$, $\sigma(t,x)=\sigma(t)$ be independent of $x$. Then for any $t>r_0$, $\eta\in H^1$ and $\mu_0\in\mathcal P_2^{\mathcal C}$, $\mu_t$ is differentiable along $\eta$, both $\partial_\eta\mu_t$ and $\mu_t(\cdot+\eta)$ are absolutely continuous with respect to $\mu_t$, and for some Ψ ∈ C(R + ;
Proof of Theorems 2.1-2.3. For $\mu_0\in\mathcal P_2^{\mathcal C}$, take an $\mathscr F_0$-measurable random variable $X_0$ on $\mathcal C$ such that $\mathcal L_{X_0}=\mu_0$. According to Theorem 3.1, Corollary 4.2, Corollary 5.2 and (2.4), $\mu_t:=\mathcal L_{X_t}$ satisfies the estimates in Theorems 2.1-2.3 under the corresponding assumptions. So, it suffices to show that $(\mathcal L_{X_t})_{t\ge0}$ is the unique martingale solution of (2.2). Let $\mu^\infty:=\mathcal L_{\{X(s)\}_{s\in[-r_0,\infty)}}$. We have $\mathcal L_{X_t}=\mu_t^\infty$. By (3.1) and Itô's formula, for any is a martingale solution of (2.2). When the coefficients are distribution-free, it is well known that the weak solution of (3.1) is equivalent to the martingale solution, so that the uniqueness of the martingale solutions of (2.2) follows from Theorem 3.1(3) below.
In the following, we explain that the same is true for the present distribution dependent case.
Let µ t = µ ∞ t , for some probability measure µ ∞ on C ∞ , be a martingale solution of (2.2). We intend to prove µ ∞ = L {X(s)} s∈[−r 0 , ∞) , so that the martingale solution is unique. Let Ω := C ∞ , F t for t ≥ 0 be the completion of σ(π(s) : s ≤ t) with respect to µ ∞ , and P := µ ∞ . By Theorem 3.1(3) below, it suffices to prove that the coordinate process is a weak solution to (3.1). To this end, for the given (µ t ) t≥0 , define and consider the corresponding operator is a martingale solution of (2.2), for any f ∈ C ∞ 0 (R d ), the process is a martingale on the probability space (Ω, (F t ) t≥0 , P). By (H1)-(H3), the martingale property also holds for f being polynomials of order 2. In particular, by taking f (x) = x we see that Then according to Stroock-Varadhan (see, for example, Theorems 4.5.1 and 4.5.2 in [13]), we may construct a d-dimensional Brownian motion W (t) on a product probability space of (Ω, F t , P) with (Ω, F t , P) as a marginal space, and when σ is invertible these two spaces coincide, such that Combining this with (2.10), we see that X(t) solves the stochastic functional differential equation i.e. (X, W ) is a weak solution of (3.1). Noting that µ ∞ := L X |P = L X |P , by the weak uniqueness of (3.1) due to Theorem 3.1(3) below, we obtain µ ∞ = L {X(s)} s∈[−r 0 , ∞) as desired.

Path-distribution dependent SDEs
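Recall that for $\gamma(\cdot)\in C([-r_0,\infty);\mathbb R^d)$, the segment functional $\gamma_\cdot\in C(\mathbb R_+;\mathcal C)$ is defined by $$\gamma_t(\theta):=\gamma(t+\theta),\quad \theta\in[-r_0,0],\ t\ge0.$$ For $\sigma,b$ in (2.1), consider the following path-distribution dependent SDE on $\mathbb R^d$: $$dX(t)=b(t,X_t,\mathcal L_{X_t})\,dt+\sigma(t,X_t,\mathcal L_{X_t})\,dW(t), \tag{3.1}$$ where $W=(W(t))_{t\ge0}$ is a $d$-dimensional standard Brownian motion with respect to a complete filtered probability space $(\Omega,\mathscr F,\{\mathscr F_t\}_{t\ge0},\mathbb P)$, and $\mathcal L_{X_t}$ is the distribution of $X_t$. We investigate the strong solutions of (3.1) and determine properties of their distributions. We first recall the definition of strong and weak solutions, see for instance [23, Definition 1.1] in the path independent setting. For simplicity, we will only consider square integrable solutions.
Definition 3.1. (1) For any $s\ge0$, a continuous adapted process $(X_{s,t})_{t\ge s}$ on $\mathcal C$ is called a (strong) solution of (3.1) from time $s$, if $$\int_s^t\mathbb E\big\{|b(r,X_{s,r},\mathcal L_{X_{s,r}})|+\|\sigma(r,X_{s,r},\mathcal L_{X_{s,r}})\|^2\big\}\,dr<\infty,\quad t\ge s,$$ and $(X_{s,\cdot}(t):=X_{s,t}(0))_{t\ge s}$ satisfies $\mathbb P$-a.s. $$X_{s,\cdot}(t)=X_{s,s}(0)+\int_s^tb(r,X_{s,r},\mathcal L_{X_{s,r}})\,dr+\int_s^t\sigma(r,X_{s,r},\mathcal L_{X_{s,r}})\,dW(r),\quad t\ge s.$$
We say that (3.1) has (strong or pathwise) existence and uniqueness, if for any $s\ge0$ and $\mathscr F_s$-measurable random variable $X_{s,s}$ with $\mathbb E\|X_{s,s}\|_\infty^2<\infty$, the equation from time $s$ has a unique solution $(X_{s,t})_{t\ge s}$. When $s=0$ we simply denote $X_{0,\cdot}=X$; i.e. $X_{0,\cdot}(t)=X(t)$, $X_{0,t}=X_t$, $t\ge0$.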
(2) A couple (X s,t , W (t)) t≥s is called a weak solution to (3.1) from time s, if W (t) is a d-dimensional Brownian motion on a complete filtered probability space (Ω, {F t } t≥s , P), and X s,t solves (3.1) is said to satisfy weak uniqueness, if for any s ≥ 0, the distribution of a weak solution (X s,t ) t≥s to (3.1) from time s is uniquely determined by L Xs,s .
When (3.1) has strong existence and uniqueness, the solution (X t ) t≥0 is a Markov process in the sense that for any s ≥ 0, (X t ) t≥s is determined by solving the equation from time s with initial state X s . More precisely, letting {X ξ s,t } t≥s denote the solution of the equation from time s with initial state X s,s = ξ, the existence and uniqueness imply When (3.1) also has weak uniqueness, we may define a semigroup (P * s,t ) t≥s on P C 2 by letting P * s,t µ = L Xs,t for L Xs,s = µ ∈ P C 2 . Indeed, by (3.3) we have For simplicity we set P * t = P * 0,t , t ≥ 0.
(1) For any s ≥ 0 and X s,s ∈ L 2 (Ω → C ; F s ), (3.1) has a unique strong solution (X s,t ) t≥s with for some increasing function H : R + → R + .
We will prove this result by using the argument of [23]. For fixed s ≥ 0 and F s -measurable C -valued random variable X s,s with E X s,s 2 ∞ < ∞, we construct the solution of (3.1) by iterating in distribution as follows. Firstly, let For any n ≥ 1, let (X (n) s,t ) t≥s solve the classical path-dependent SDE Moreover, for any T > 0, there exists t 0 > 0 such that for all s ∈ [0, T ] and X s,s ∈ L 2 (Ω → C ; F s ), Proof. The proof is similar to that of [23, Lemma 2.1]. Without loss of generality, we may assume that s = 0 and simply denote X 0, (t) = X(t), X 0,t = X t , t ≥ 0.
(1) We first prove that the SDE (3.6) has a unique strong solution and (3.7) holds. For n = 1, let $\bar b$ Then (3.6) reduces to By (H1)-(H3), the coefficients $\bar b$ and $\bar\sigma$ satisfy the standard monotonicity condition, which implies strong existence, uniqueness and non-explosion for the stochastic functional differential equation (3.9), see e.g. [18, Corollary 4.1.2] with D = R d and u n = 1. It is also standard to prove (3.7) using Itô's formula Combining this with (H3) and applying the BDG inequality for p = 1, for any N ∈ [1, ∞) and τ N := inf{t ≥ 0 : This implies By first applying Gronwall's Lemma and then letting N → ∞, we arrive at Therefore, (3.7) holds for n = 1. Now, assuming that the assertion holds for n = k for some k ≥ 1, we intend to prove it for n = k + 1. This can be done by repeating the above argument with (X · ), so we omit the proof.
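For the reader's convenience, the Gronwall step used above can be recalled in the following generic form. This is a sketch only: the constants $\alpha$, $K$ and the precise stopped quantity $\varphi_N$ are placeholders chosen for illustration, since the actual constants come from (H1)-(H3) and the BDG inequality.

```latex
% Gronwall's lemma (integral form): for a bounded measurable
% \varphi:[0,T]\to[0,\infty) and constants \alpha, K \ge 0,
\[
 \varphi(t)\le \alpha+K\int_0^t\varphi(s)\,ds,\ \ t\in[0,T]
 \quad\Longrightarrow\quad
 \varphi(t)\le \alpha\,e^{Kt},\ \ t\in[0,T].
\]
% Assumed form of the localised second moment (stopping keeps it finite):
\[
 \varphi_N(t):=\mathbb E\sup_{s\in[0,\,t\wedge\tau_N]}\|X_s\|_\infty^2 .
\]
% If \varphi_N satisfies the integral inequality with \alpha, K independent of N,
% then \varphi_N(t)\le\alpha e^{Kt} for every N. Since \tau_N\uparrow\infty by
% non-explosion, monotone convergence gives
\[
 \mathbb E\sup_{s\in[0,t]}\|X_s\|_\infty^2\le \alpha\,e^{Kt},\qquad t\in[0,T].
\]
```

The localisation by $\tau_N$ is what guarantees that $\varphi_N$ is bounded, which the lemma requires before the limit $N\to\infty$ is taken.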
(1) Since the uniqueness follows from Theorem 3.1(2), which will be proved in the next step, in this step we only prove existence and estimate (3.5). By Lemma 3.2, there exists a unique adapted continuous process (X t ) t∈[0,t 0 ] such that where µ t is the distribution of X t . By (3.6), Then (3.10), (H1), (H3) and the dominated convergence theorem imply that P-a.s.
Finally, since C is a Polish space, for any µ 0 , ν 0 ∈ P C 2 , we can take F 0 -measurable random variables X 0 , Y 0 such that , we deduce the estimate in Theorem 3.1(3) from that in Theorem 3.1(2).
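To make the class of equations treated in this section concrete, the following is a minimal numerical sketch, not part of the analysis above. It approximates a path-distribution dependent SDE by an interacting particle system with an Euler-Maruyama time step. The coefficients are hypothetical and chosen only for illustration: a drift $b(\xi,\mu)=-\xi(0)+a\,\xi(-r_0)+\kappa\int\eta(-r_0)\,\mu(d\eta)$ in dimension one and a constant $\sigma$. The function name `simulate_particles` and all parameter names are assumptions of this sketch, and the law of the segment is replaced by the empirical measure of the particles.

```python
import math
import random


def simulate_particles(n_particles=200, r0=1.0, horizon=2.0, dt=0.01,
                       a=0.5, kappa=0.3, sigma=0.2, seed=0):
    """Euler-Maruyama particle approximation of a delay mean-field SDE.

    Hypothetical model (illustration only), in dimension one:
        dX(t) = [-X(t) + a*X(t - r0) + kappa*E X(t - r0)] dt + sigma dW(t).
    The expectation is replaced by the empirical mean over the particles.
    Returns one list per particle holding its values on the time grid
    -r0, -r0 + dt, ..., horizon.
    """
    rng = random.Random(seed)
    lag = int(round(r0 / dt))           # number of grid steps in the delay window
    n_steps = int(round(horizon / dt))  # number of steps on [0, horizon]

    # Initial segments on [-r0, 0]: constant paths at a random Gaussian level.
    paths = []
    for _ in range(n_particles):
        x0 = rng.gauss(0.0, 1.0)
        paths.append([x0] * (lag + 1))

    for k in range(n_steps):
        now = lag + k                   # grid index of the current time k*dt
        delayed = [p[now - lag] for p in paths]
        delayed_mean = sum(delayed) / n_particles  # empirical law at theta = -r0
        for p, x_del in zip(paths, delayed):
            x = p[now]
            drift = -x + a * x_del + kappa * delayed_mean
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            p.append(x + drift * dt + noise)
    return paths
```

A call such as `paths = simulate_particles(n_particles=500)` returns the discretised trajectories; the last `lag + 1` entries of each list form a sample of the segment $X_t$ at the final time, so the lists together give an empirical approximation of $\mu_t$ for the hypothetical model.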

Harnack inequality and applications
To prove Theorem 2.2, we investigate Harnack inequalities of the operator P t defined by We will consider the Harnack inequality with a power p > 1 introduced in [16], and the log-Harnack inequality developed in [12,19], where classical SDEs on R d and manifolds are considered. To establish these inequalities for the present path-distribution dependent SDEs, we will adopt coupling by change of measures introduced in [1,17]. We refer to [18] for a general theory on this method and applications.
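For orientation, the two types of inequality mentioned above have the following shape in the classical setting of a Markov operator $P$ with transition kernel $P(x,\cdot)$ on a state space $E$. The functions $\Phi_p$ and $\Phi$ are generic placeholders, not the constants obtained in this paper. The last two displays indicate why such inequalities are suited to entropy and total variation estimates.

```latex
% Shape of the two inequalities for a Markov operator P on a space E:
\begin{align*}
 \text{Harnack with power } p>1:\quad
   & (Pf(x))^p\le (Pf^p)(y)\,e^{\Phi_p(x,y)},
   && f\ge 0 \text{ bounded measurable},\\
 \text{log-Harnack}:\quad
   & P\log f(x)\le \log Pf(y)+\Phi(x,y),
   && f\ge 1 \text{ bounded measurable}.
\end{align*}
% The log-Harnack inequality yields an entropy bound between the kernels:
\[
 \mathrm{Ent}\big(P(x,\cdot)\,\big|\,P(y,\cdot)\big)
 :=\int_E\log\frac{dP(x,\cdot)}{dP(y,\cdot)}\,dP(x,\cdot)\le\Phi(x,y).
\]
% Pinsker's inequality then controls the total variation norm by the entropy:
\[
 \|\mu-\nu\|_{var}^2
 :=\Big(\sup_{|f|\le 1}|\mu(f)-\nu(f)|\Big)^2\le 2\,\mathrm{Ent}(\mu|\nu).
\]
```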
As a consequence of Theorem 4.1, we have the following result, see, for instance, the proof of [22, Proposition 3.1].
(c) There exists C ∈ C(R + ; R + ) such that By the definition ofb we see that (Y t ,W (t)) is a weak solution to the equation (4.5) with initial distribution ν 0 , so that by the weak uniqueness, L Yt |P T = ν t , t ∈ [0, T ]. Combining this with (b) we obtain Letting R T =R TRT , by Young's inequality and Hölder's inequality respectively, we obtain Then, Theorem 5.1(2) implies that is a densely defined bounded linear functional on L 2 (µ T ) with By the Riesz Representation Theorem, it uniquely extends to a bounded linear functional for some g ∈ L 2 (µ T ) with µ T (g 2 ) ≤ C(T ). Consequently, µ T is differentiable along η with (∂ η µ T )(A) = A gdµ T , A ∈ B(C ), and ∂ η µ T is absolutely continuous with respect to µ T such that