Entropy dissipation of Fokker-Planck equations on graphs

We study the nonlinear Fokker-Planck equation on graphs, which is the gradient flow in the space of probability measures supported on the nodes with respect to the discrete Wasserstein metric. The energy functional driving the gradient flow consists of a Boltzmann entropy, a linear potential and a quadratic interaction energy. We show that the solution converges to the Gibbs measure exponentially fast, with a rate that can be given analytically. The continuous analog of this asymptotic rate is related to Yano's formula.


Introduction
Optimal transport theory reveals many deep connections between partial differential equations and geometry. For example, the seminal work [14] proves that the linear Fokker-Planck equation (FPE) is the gradient flow of a free energy in the probability space equipped with the Wasserstein metric [12,21,24,25]. This gradient flow interpretation has been extended to mean field settings, in which the free energy contains an interaction energy [1]. Many studies have shown that the solution of the FPE converges to its equilibrium at an exponential rate, which is known as entropy dissipation in the literature [4,8,18]. The goal of this paper is to study the entropy dissipation of the FPE in discrete settings, for example on finite graphs. This is motivated by applications in biology, game theory, and numerical schemes for partial differential equations (PDEs). The optimal transport metric on graphs has been established by several groups independently [7,16,19]. The gradient flow structure based on such a metric has attracted a lot of attention in recent years. For example, Maas and Erbar studied the discrete heat flow, and further gave a Ricci curvature lower bound in the discrete space [10]. Further generalizations followed in [11,13,17]. Mielke proposed the discrete reaction diffusion equation [20]. Erbar, Fathi and collaborators introduced a discrete McKean-Vlasov equation [9], which is the evolution equation for the probability density function of a mean field Markov process. Various convergence properties of these gradient flows have also received attention [5,10,13].
Following the setup in [7], we further study the dynamical properties of gradient flows in the discrete Wasserstein geometry. Special attention is given to a free energy containing a quadratic interaction energy, a linear potential and the Boltzmann entropy. In this case, the gradient flow can be viewed as the nonlinear FPE on graphs, which is a set of ordinary differential equations (ODEs). We show that the solution of the FPE converges exponentially fast to a Gibbs measure (the unique one, or one of several when the free energy is non-convex), which mimics the entropy dissipation property, but in a discrete space. We further provide an explicit formula that bounds the convergence rate. The continuous analog of the asymptotic rate formula is related to Yano's formula in Riemannian geometry [26,27].
This paper is organized as follows. We review the discrete 2-Wasserstein metric and Fokker-Planck equations on graphs in the next section, then study their convergence in Section 3. In Section 4, we discuss some properties stemming from the convergence rate, including the connection with Yano's formula.

Optimal transport on finite graphs
In this section, we briefly review the construction of the 2-Wasserstein metric and the corresponding FPE on a graph. We mainly follow the approaches given in [7,16], with some modified notation for a simpler presentation.
Consider a weighted finite graph $G = (V, E, \omega)$, where $V = \{1, 2, \cdots, n\}$ is the vertex set, $E$ is the edge set, and $\omega = (\omega_{ij})_{i,j\in V}$ collects the edge weights, with $\omega_{ij} = \omega_{ji} > 0$ if $(i,j) \in E$ and $\omega_{ij} = 0$ otherwise. We assume that $G$ is undirected and contains no self loops or multiple edges. The adjacency set of vertex $i \in V$ is denoted by
$$N(i) = \{j \in V : (i,j) \in E\}.$$
The probability set (simplex) supported on all vertices of $G$ is defined by
$$\mathcal{P}(G) = \Big\{(\rho_i)_{i=1}^n : \sum_{i=1}^n \rho_i = 1,\ \rho_i \ge 0\Big\},$$
where $\rho_i$ is the discrete probability function at node $i$. Its interior is denoted by $\mathcal{P}_o(G)$. We introduce the following notations and operations on $G$ and $\mathcal{P}(G)$ and use them for the construction of the discrete 2-Wasserstein metric.
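For a given probability function $\rho \in \mathcal{P}(G)$ and a vector field $v = (v_{ij})_{(i,j)\in E}$ on $G$, we define the product $\rho v \in \mathbb{R}^{n\times n}$, called the flux function on $G$, by
$$\rho v := \big(v_{ij}\,\theta_{ij}(\rho)\big)_{(i,j)\in E},$$
where $\theta_{ij}(\rho)$ are specially chosen functions. For example, $\theta_{ij}(\rho)$ can be the logarithmic mean or an upwind function of $\rho$, which are used in [7,16]; see [15] for more details. In this paper, we select $\theta_{ij}(\rho)$ as the average of $\rho_i$ and $\rho_j$, i.e.
$$\theta_{ij}(\rho) = \frac{\rho_i + \rho_j}{2},$$
for simplicity of illustration.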
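The choices of $\theta_{ij}$ mentioned above can be compared with a short sketch. The function names, the 0-based node labels and the dictionary representation of an antisymmetric vector field are assumptions made for this illustration only; the upwind choice is omitted because it depends on a potential that is not specified at this point.

```python
import math

def theta_average(rho_i, rho_j):
    """Arithmetic mean of the two endpoint densities (the choice used in this paper)."""
    return 0.5 * (rho_i + rho_j)

def theta_log_mean(rho_i, rho_j):
    """Logarithmic mean (rho_i - rho_j) / (log rho_i - log rho_j) for positive inputs."""
    if math.isclose(rho_i, rho_j, rel_tol=1e-12):
        return rho_i  # limiting value when the two densities coincide
    return (rho_i - rho_j) / (math.log(rho_i) - math.log(rho_j))

def flux(v, rho, theta=theta_average):
    """Flux function: (rho v)_ij = v_ij * theta_ij(rho) on every oriented edge."""
    return {(i, j): v_ij * theta(rho[i], rho[j]) for (i, j), v_ij in v.items()}

# Hypothetical 3-node path graph with an antisymmetric vector field v_ij = -v_ji.
rho = [0.2, 0.4, 0.4]
v = {(0, 1): 1.0, (1, 0): -1.0, (1, 2): 0.5, (2, 1): -0.5}
rho_v = flux(v, rho)
```

Both means are symmetric in their arguments, so the flux inherits the antisymmetry of $v$. The logarithmic mean always lies between the geometric and the arithmetic mean.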
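We define the divergence of $\rho v$ on $G$ by
$$\mathrm{div}_G(\rho v) := -\Big(\sum_{j\in N(i)} \sqrt{\omega_{ij}}\, v_{ij}\,\theta_{ij}(\rho)\Big)_{i=1}^n.$$
Given two vector fields $v$, $w$ on a graph and $\rho \in \mathcal{P}(G)$, the discrete inner product is defined by
$$(v, w)_\rho := \frac{1}{2}\sum_{(i,j)\in E} v_{ij}\, w_{ij}\,\theta_{ij}(\rho).$$
The coefficient 1/2 in front of the summation accounts for the fact that every edge in $G$ is counted twice. In particular, for the gradient of a potential $\Phi$ on $V$, $\nabla_G\Phi := \big(\sqrt{\omega_{ij}}(\Phi_i - \Phi_j)\big)_{(i,j)\in E}$, we have
$$(\nabla_G\Phi, \nabla_G\Phi)_\rho = \frac{1}{2}\sum_{(i,j)\in E} \omega_{ij}(\Phi_i - \Phi_j)^2\,\theta_{ij}(\rho).$$
With these definitions, we introduce an integration by parts formula on graphs that will be used throughout this paper: for any vector field $v$ and potential function $\Phi$ on a graph, the following properties hold:
$$-\sum_{i=1}^n \mathrm{div}_G(\rho v)\big|_i\,\Phi_i = (v, \nabla_G\Phi)_\rho$$
and
$$\sum_{i=1}^n \mathrm{div}_G(\rho v)\big|_i = 0.$$
It is worth remarking that we prefer not to replace $\theta_{ij}$ by its explicit formula, as done in [7,16], to emphasize the freedom of using different $\theta_{ij}$, which can result in different definitions of the flux function, divergence operator and inner product, and hence lead to different formulas for the discrete 2-Wasserstein metric.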
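The integration by parts formula can be checked numerically on a small example. The sketch below assumes the sign conventions $\nabla_G\Phi = (\sqrt{\omega_{ij}}(\Phi_i - \Phi_j))$ and $\mathrm{div}_G(\rho v)|_i = -\sum_{j\in N(i)}\sqrt{\omega_{ij}}\,v_{ij}\theta_{ij}(\rho)$ with the arithmetic-mean $\theta_{ij}$; the graph, the helper names and all numerical values are hypothetical.

```python
import math

# Hypothetical 4-cycle; keys are undirected edges, values are weights omega_ij.
WEIGHTS = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 0.5, (0, 3): 1.5}

def oriented(weights):
    """Every undirected edge appears twice, once per orientation."""
    return list(weights) + [(j, i) for (i, j) in weights]

def weight(weights, i, j):
    return weights.get((i, j), weights.get((j, i)))

def theta(rho, i, j):
    return 0.5 * (rho[i] + rho[j])

def gradient(weights, phi):
    """Assumed convention: (grad_G phi)_ij = sqrt(omega_ij) * (phi_i - phi_j)."""
    return {(i, j): math.sqrt(weight(weights, i, j)) * (phi[i] - phi[j])
            for (i, j) in oriented(weights)}

def divergence(weights, rho, v):
    """Assumed convention: div_i = -sum_j sqrt(omega_ij) * v_ij * theta_ij."""
    div = [0.0] * len(rho)
    for (i, j) in oriented(weights):
        div[i] -= math.sqrt(weight(weights, i, j)) * v[(i, j)] * theta(rho, i, j)
    return div

def inner(weights, rho, v, u):
    """(v, u)_rho with the factor 1/2 compensating the double edge count."""
    return 0.5 * sum(v[e] * u[e] * theta(rho, *e) for e in oriented(weights))

def antisymmetric(values):
    """Extend values given on one orientation to v_ji = -v_ij."""
    field = dict(values)
    field.update({(j, i): -x for (i, j), x in values.items()})
    return field

rho = [0.1, 0.2, 0.3, 0.4]
phi = [1.0, -0.5, 2.0, 0.3]
v = antisymmetric({(0, 1): 0.7, (1, 2): -1.2, (2, 3): 0.4, (0, 3): 2.0})

div = divergence(WEIGHTS, rho, v)
lhs = -sum(p * d for p, d in zip(phi, div))            # -sum_i div_i * phi_i
rhs = inner(WEIGHTS, rho, v, gradient(WEIGHTS, phi))   # (v, grad_G phi)_rho
```

Under these conventions `lhs` and `rhs` agree up to rounding, and the entries of `div` sum to zero, which is the discrete counterpart of mass conservation.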
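2.1. 2-Wasserstein metric on a graph. The discrete analogue of the 2-Wasserstein metric $W_2$ on the probability set $\mathcal{P}_o(G)$ can be given as follows. For any given $\rho^0, \rho^1 \in \mathcal{P}_o(G)$, define
$$W_2(\rho^0, \rho^1)^2 := \inf_{v}\Big\{\int_0^1 (v, v)_\rho\, dt \;:\; \frac{d\rho}{dt} + \mathrm{div}_G(\rho v) = 0,\ \rho(0) = \rho^0,\ \rho(1) = \rho^1\Big\}, \qquad (1)$$
where the infimum is taken over all vector fields $v$ on a graph, and $\rho$ is a continuously differentiable curve $\rho : [0, 1] \to \mathcal{P}_o(G)$. This is the corresponding Benamou-Brenier formula [2] in discrete space. Modifying the proof given in [16], one can show the following lemma; see [15] for details.
Lemma 1. Given a vector field on a graph $v = (v_{ij})_{(i,j)\in E}$ with $v_{ij} = -v_{ji}$, and a measure $\rho \in \mathcal{P}_o(G)$, there exists a unique decomposition such that $v = \nabla_G\Phi + u$ and $\mathrm{div}_G(\rho u) = 0$, where $\Phi$ is a function defined on $V$. In addition, the following property holds:
$$(v, v)_\rho = (\nabla_G\Phi, \nabla_G\Phi)_\rho + (u, u)_\rho.$$
One may view Lemma 1 as a discrete analogue of the well-known Hodge decomposition. Using it, the metric (1) can be proven equivalent to
$$W_2(\rho^0, \rho^1)^2 = \inf_{\Phi}\Big\{\int_0^1 (\nabla_G\Phi, \nabla_G\Phi)_\rho\, dt \;:\; \frac{d\rho}{dt} + \mathrm{div}_G(\rho\nabla_G\Phi) = 0,\ \rho(0) = \rho^0,\ \rho(1) = \rho^1\Big\},$$
where the infimum is taken over all potentials $\Phi : [0, 1] \to \mathbb{R}^n$.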
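Let us denote the tangent space at $\rho \in \mathcal{P}_o(G)$ as
$$T_\rho\mathcal{P}_o(G) = \Big\{(\sigma_i)_{i=1}^n \in \mathbb{R}^n : \sum_{i=1}^n \sigma_i = 0\Big\}.$$
We define a weighted graph Laplacian matrix $L(\rho) \in \mathbb{R}^{n\times n}$:
$$L(\rho) = D^T\Theta(\rho)D,$$
where $D \in \mathbb{R}^{|E|\times|V|}$ is the discrete gradient matrix
$$D_{(i,j)\in E,\ k\in V} = \begin{cases}\sqrt{\omega_{ij}} & \text{if } i = k,\\ -\sqrt{\omega_{ij}} & \text{if } j = k,\\ 0 & \text{otherwise,}\end{cases}$$
and $\Theta(\rho) \in \mathbb{R}^{|E|\times|E|}$ is the diagonal weight matrix with entries $\theta_{ij}(\rho)$. We would like to emphasize that the weights in $L(\rho)$ depend on the distribution $\rho$, and this is very different from the commonly used graph Laplacian matrices.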
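A minimal sketch of how $L(\rho) = D^T\Theta(\rho)D$ can be assembled is given here. It assumes one row of $D$ per undirected edge, the arithmetic-mean $\theta_{ij}$, 0-based node labels, and a hypothetical triangle graph; plain nested lists are used instead of a matrix library to keep the example dependency-free.

```python
import math

def weighted_laplacian(weights, rho):
    """Assemble L(rho) = D^T Theta(rho) D as a dense n x n nested list."""
    n = len(rho)
    edges = sorted(weights)              # one row of D per undirected edge (assumption)
    D = [[0.0] * n for _ in edges]
    theta = []
    for row, (i, j) in enumerate(edges):
        s = math.sqrt(weights[(i, j)])
        D[row][i] = s                    # +sqrt(omega_ij) at the first endpoint
        D[row][j] = -s                   # -sqrt(omega_ij) at the second endpoint
        theta.append(0.5 * (rho[i] + rho[j]))
    return [[sum(D[e][a] * theta[e] * D[e][b] for e in range(len(edges)))
             for b in range(n)]
            for a in range(n)]

# Hypothetical weighted triangle and a density in the interior of the simplex.
weights = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 0.5}
rho = [0.2, 0.3, 0.5]
L = weighted_laplacian(weights, rho)
```

The off-diagonal entries come out as $-\omega_{ij}\theta_{ij}(\rho)$ and every row sums to zero, so the constant vector lies in the kernel of $L(\rho)$ for any $\rho$, while the remaining spectrum moves with $\rho$.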
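Lemma 2. For any given $\sigma \in T_\rho\mathcal{P}_o(G)$, there exists a unique function $\Phi$, up to a constant shift, satisfying
$$\sigma = L(\rho)\Phi.$$
Proof. If $\rho \in \mathcal{P}_o(G)$, all diagonal entries of the weight matrix $\Theta(\rho)$ are nonzero. Consider
$$\Phi^T L(\rho)\Phi = (\nabla_G\Phi, \nabla_G\Phi)_\rho = 0.$$
Then $\Phi_i = \Phi_j$ for every $(i, j) \in E$, and since $G$ is connected, $\Phi$ is constant. Hence 0 is a simple eigenvalue of $L(\rho)$ with eigenvector $(1, \cdots, 1)^T$, and $L(\rho)$ is invertible on $T_\rho\mathcal{P}_o(G)$, which proves the lemma.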
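Based on Lemma 2, we write
$$L(\rho) = T\,\mathrm{diag}\big(0, \lambda_{sec}(L(\rho)), \cdots, \lambda_{max}(L(\rho))\big)\,T^{-1},$$
where $0 < \lambda_{sec}(L(\rho)) \le \cdots \le \lambda_{max}(L(\rho))$ are eigenvalues of $L(\rho)$ arranged in ascending order, and $T$ is its corresponding eigenvector matrix. We denote the pseudo-inverse of $L(\rho)$ by
$$L^{-1}(\rho) = T\,\mathrm{diag}\Big(0, \frac{1}{\lambda_{sec}(L(\rho))}, \cdots, \frac{1}{\lambda_{max}(L(\rho))}\Big)\,T^{-1}.$$
Then the matrix $L^{-1}(\rho)$ endows an inner product on $T_\rho\mathcal{P}_o(G)$.
Definition 3. For any two tangent vectors $\sigma_1, \sigma_2 \in T_\rho\mathcal{P}_o(G)$, define the inner product $g : T_\rho\mathcal{P}_o(G) \times T_\rho\mathcal{P}_o(G) \to \mathbb{R}$ by
$$g(\sigma_1, \sigma_2) := \sigma_1^T L^{-1}(\rho)\,\sigma_2 = \Phi_1^T L(\rho)\,\Phi_2,$$
where $\sigma_1 = L(\rho)\Phi_1$ and $\sigma_2 = L(\rho)\Phi_2$. Hence the metric (1) is equivalent to
$$W_2(\rho^0, \rho^1)^2 = \inf_{\rho(t)\in\mathcal{C}}\Big\{\int_0^1 \dot\rho^T L^{-1}(\rho)\,\dot\rho\; dt \;:\; \rho(0) = \rho^0,\ \rho(1) = \rho^1\Big\},$$
where $\mathcal{C}$ is the set of all continuously differentiable curves $\rho(t) : [0, 1] \to \mathcal{P}_o(G)$.
2.2. Gradient flows on finite graphs. We now consider the gradient flow of a smooth free energy $\mathcal{F} : \mathcal{P}(G) \to \mathbb{R}$ on the Riemannian manifold $(\mathcal{P}_o(G), W_2)$.
Theorem 4 (Gradient flows). For a finite graph $G$ and a constant $\beta > 0$, the gradient flow of $\mathcal{F}(\rho)$ on $(\mathcal{P}_o(G), W_2)$ is
$$\frac{d\rho}{dt} = -L(\rho)\nabla_\rho\mathcal{F}(\rho), \quad\text{i.e.}\quad \frac{d\rho_i}{dt} = \sum_{j\in N(i)} \omega_{ij}\,\theta_{ij}(\rho)\big(F_j(\rho) - F_i(\rho)\big). \qquad (4)$$
Proof. For any $\sigma \in T_\rho\mathcal{P}_o(G)$, Lemma 2 gives $\Phi$ with $\sigma = L(\rho)\Phi$, so that
$$g\Big(\frac{d\rho}{dt}, \sigma\Big) = \Big(\frac{d\rho}{dt}\Big)^T L^{-1}(\rho) L(\rho)\Phi = \Big(\frac{d\rho}{dt}\Big)^T\Phi. \qquad (5)$$
On the right hand side,
$$d\mathcal{F}(\rho)\cdot\sigma = \sum_{i=1}^n F_i(\rho)\,\sigma_i = \big(L(\rho)\nabla_\rho\mathcal{F}(\rho)\big)^T\Phi, \qquad (6)$$
where we denote $F_i(\rho) = \frac{\partial}{\partial\rho_i}\mathcal{F}(\rho)$. From (5), (6), and the definition of the gradient flow on a manifold, $g(\frac{d\rho}{dt}, \sigma) = -d\mathcal{F}(\rho)\cdot\sigma$ for all $\sigma$, we obtain (4).
Clearly, (4) is the discrete analog of the Wasserstein gradient flow in continuous space
$$\frac{\partial\rho}{\partial t} = \nabla\cdot\Big(\rho\nabla\frac{\delta}{\delta\rho}\mathcal{F}(\rho)\Big),$$
where $\frac{\delta}{\delta\rho}\mathcal{F}$ is the first variation of $\mathcal{F}$. In what follows, we consider a particular free energy, which contains a quadratic interaction energy, a linear potential and the Boltzmann entropy:
$$\mathcal{F}(\rho) = \frac{1}{2}\rho^T W\rho + V^T\rho + \beta\sum_{i=1}^n \rho_i\log\rho_i,$$
where $V \in \mathbb{R}^n$, and $W \in \mathbb{R}^{n\times n}$ is a symmetric matrix. Its gradient flow becomes
$$\frac{d\rho_i}{dt} = \sum_{j\in N(i)} \omega_{ij}\,\theta_{ij}(\rho)\Big[(W\rho)_j - (W\rho)_i + V_j - V_i + \beta(\log\rho_j - \log\rho_i)\Big], \qquad (7)$$
which is the discrete analog of the nonlinear FPE
$$\frac{\partial\rho}{\partial t} = \nabla\cdot\Big[\rho\nabla\Big(V(x) + \int W(x, y)\rho(t, y)\,dy\Big)\Big] + \beta\Delta\rho.$$
So we call (7) the nonlinear FPE on graphs. Particular attention is given to
$$\sum_{j\in N(i)} \omega_{ij}\,\theta_{ij}(\rho)(\log\rho_j - \log\rho_i),$$
which can be viewed as a nonlinear representation of the Laplacian operator for $\rho$. We shall show later that this nonlinearity is the key to many dynamical properties of (7).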
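To make the ODE system concrete, here is a minimal forward-Euler sketch of the nonlinear FPE on a graph. It assumes the flow takes the form $\frac{d\rho_i}{dt} = \sum_{j\in N(i)}\omega_{ij}\theta_{ij}(\rho)(F_j - F_i)$ with $F_i = (W\rho)_i + V_i + \beta\log\rho_i$ and the arithmetic-mean $\theta_{ij}$. The triangle graph, the potential, the step size and all function names are hypothetical, and forward Euler is a stand-in integrator chosen for brevity, not a scheme taken from the paper.

```python
import math

def energy_gradient(rho, V, W, beta):
    """F_i = (W rho)_i + V_i + beta * log(rho_i); additive constants cancel in F_j - F_i."""
    n = len(rho)
    return [sum(W[i][k] * rho[k] for k in range(n)) + V[i] + beta * math.log(rho[i])
            for i in range(n)]

def free_energy(rho, V, W, beta):
    """Interaction energy + linear potential + beta * Boltzmann entropy."""
    n = len(rho)
    interaction = 0.5 * sum(rho[i] * W[i][j] * rho[j] for i in range(n) for j in range(n))
    potential = sum(V[i] * rho[i] for i in range(n))
    entropy = sum(r * math.log(r) for r in rho)
    return interaction + potential + beta * entropy

def fpe_rhs(rho, weights, V, W, beta):
    """Right-hand side of the graph FPE, accumulated edge by edge."""
    F = energy_gradient(rho, V, W, beta)
    rate = [0.0] * len(rho)
    for (i, j), w_ij in weights.items():
        flow = w_ij * 0.5 * (rho[i] + rho[j]) * (F[j] - F[i])
        rate[i] += flow   # node i gains exactly what node j loses,
        rate[j] -= flow   # so total mass is conserved
    return rate

def simulate(rho0, weights, V, W, beta, dt=0.01, steps=3000):
    """Forward Euler; returns the final density and the free energy after every step."""
    rho = list(rho0)
    energies = [free_energy(rho, V, W, beta)]
    for _ in range(steps):
        rate = fpe_rhs(rho, weights, V, W, beta)
        rho = [r + dt * dr for r, dr in zip(rho, rate)]
        energies.append(free_energy(rho, V, W, beta))
    return rho, energies

# Hypothetical example: unit-weight triangle, linear case W = 0, beta = 1.
weights = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
V = [0.0, 1.0, 2.0]
W = [[0.0] * 3 for _ in range(3)]
beta = 1.0
rho_final, energies = simulate([0.2, 0.3, 0.5], weights, V, W, beta)

# With W = 0 the equilibrium is proportional to exp(-V_i / beta).
unnormalized = [math.exp(-v / beta) for v in V]
K = sum(unnormalized)
gibbs = [u / K for u in unnormalized]
```

In this linear example the recorded free energies are non-increasing and `rho_final` approaches `gibbs`. For a nonzero symmetric `W` the same code applies, but the equilibrium is then only defined implicitly.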

Entropy dissipation
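In this section, we focus on the convergence properties of FPE (7). Denote the nonlinear Gibbs measure
$$\rho^\infty_i = \frac{1}{K}\exp\Big(-\frac{(W\rho^\infty)_i + V_i}{\beta}\Big), \qquad K = \sum_{j=1}^n \exp\Big(-\frac{(W\rho^\infty)_j + V_j}{\beta}\Big).$$
It is easy to verify that the Gibbs measure is an equilibrium of (7). Our main theorem shows how fast $\rho(t)$, the solution of FPE (7), converges to $\rho^\infty$.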
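The equilibrium property can be verified in two lines. The computation below assumes that the Gibbs measure has the fixed-point form $\rho^\infty_i = \frac{1}{K}\exp(-((W\rho^\infty)_i + V_i)/\beta)$ with normalizing constant $K$, and writes $F_i(\rho) = (W\rho)_i + V_i + \beta\log\rho_i$ for the quantity whose differences drive (7); both are notational assumptions of this sketch.

```latex
\begin{aligned}
F_i(\rho^\infty)
  &= (W\rho^\infty)_i + V_i + \beta\log\rho^\infty_i \\
  &= (W\rho^\infty)_i + V_i - \big((W\rho^\infty)_i + V_i\big) - \beta\log K
   = -\beta\log K , \\[4pt]
\frac{d\rho_i}{dt}\Big|_{\rho=\rho^\infty}
  &= \sum_{j\in N(i)} \omega_{ij}\,\theta_{ij}(\rho^\infty)
     \big(F_j(\rho^\infty) - F_i(\rho^\infty)\big) = 0 .
\end{aligned}
```

Since $F_i(\rho^\infty)$ takes the same value at every node, every edge difference vanishes regardless of the choice of $\theta_{ij}$.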
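Theorem 5. Assume $\rho^0 \in \mathcal{P}_o(G)$ and $\mathcal{F}(\rho)$ is strictly positive definite in $\mathcal{P}(G)$. Then there exists a constant $C > 0$ such that
$$\mathcal{F}(\rho(t)) - \mathcal{F}(\rho^\infty) \le e^{-Ct}\big(\mathcal{F}(\rho^0) - \mathcal{F}(\rho^\infty)\big). \qquad (8)$$
Furthermore, $C$ can be given explicitly in terms of quantities that include $\mathrm{Deg}(G)$, the maximal degree of the graph; $\hat L = D^T D$, the graph Laplacian matrix; and $\lambda_{sec}(\hat L)$ and $\lambda_{max}(\hat L)$, the second smallest and the largest eigenvalues of $\hat L$ respectively.
Before giving the complete proof, we want to point out the main difficulties that we must overcome. Since $\mathcal{F}(\rho)$ is strictly convex and $\rho^\infty$ is its unique minimizer, it is not hard to show that $\rho(t)$ converges to $\rho^\infty$. In general, the rate of convergence is determined by comparing the first and second derivatives of $\mathcal{F}(\rho(t))$ along the gradient flow. If one can find a constant $C > 0$ such that
$$\frac{d^2}{dt^2}\mathcal{F}(\rho(t)) \ge -C\,\frac{d}{dt}\mathcal{F}(\rho(t)) \qquad (9)$$
holds for all $t \ge 0$, one can obtain, by integration, a first-order differential inequality for $\mathcal{F}(\rho(t)) - \mathcal{F}(\rho^\infty)$. Then (8) follows from Gronwall's inequality.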
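The integration step can be spelled out as follows. This is a sketch under two assumptions: the comparison between the second and first derivatives is taken in the form $\frac{d^2}{dt^2}\mathcal{F} \ge -C\frac{d}{dt}\mathcal{F}$ with the same constant $C$ as in the decay estimate, and $\rho(t) \to \rho^\infty$ with $\frac{d}{dt}\mathcal{F}(\rho(t)) \to 0$ as $t \to \infty$.

```latex
\begin{aligned}
&\int_t^{\infty} \frac{d^2}{ds^2}\mathcal{F}(\rho(s))\,ds
  \;\ge\; -C\int_t^{\infty} \frac{d}{ds}\mathcal{F}(\rho(s))\,ds \\
&\quad\Longrightarrow\quad
  -\frac{d}{dt}\mathcal{F}(\rho(t))
  \;\ge\; C\big(\mathcal{F}(\rho(t)) - \mathcal{F}(\rho^\infty)\big) \\
&\quad\Longrightarrow\quad
  \frac{d}{dt}\big[\mathcal{F}(\rho(t)) - \mathcal{F}(\rho^\infty)\big]
  \;\le\; -C\big[\mathcal{F}(\rho(t)) - \mathcal{F}(\rho^\infty)\big] \\
&\quad\Longrightarrow\quad
  \mathcal{F}(\rho(t)) - \mathcal{F}(\rho^\infty)
  \;\le\; e^{-Ct}\big(\mathcal{F}(\rho^0) - \mathcal{F}(\rho^\infty)\big).
\end{aligned}
```

The last implication is Gronwall's inequality applied to the nonnegative quantity $\mathcal{F}(\rho(t)) - \mathcal{F}(\rho^\infty)$.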
For FPE (7), the first derivative of $\mathcal{F}$ along (7) gives
$$\frac{d}{dt}\mathcal{F}(\rho(t)) = -\dot\rho^T L^{-1}(\rho)\,\dot\rho,$$
where $\dot\rho = \frac{d\rho}{dt}$. Comparing $\frac{d}{dt}\mathcal{F}(\rho(t))$ with $\frac{d^2}{dt^2}\mathcal{F}(\rho(t))$, we find a dissipation formula (10) in which the second derivative consists of a term that is quadratic in $\dot\rho$ and a term that is cubic in $\dot\rho$. However, it is not simple to estimate $C$. In the continuous case, there are only a few examples [4], depending on special interaction potentials $W$, that allow us to find $C$ explicitly. In the discrete space, we overcome this difficulty by borrowing techniques from dynamical systems. If $\rho$ is close enough to the equilibrium ($\dot\rho$ is near zero), estimating $C$ in (10) becomes possible. This is because the cubic term in $\dot\rho$ in (10) becomes one order smaller than $\dot\rho^T L^{-1}(\rho)\dot\rho$, and the dominating quadratic term can be estimated by a solvable eigenvalue problem.
Following this idea, the sketch of the proof is as follows. In Lemma 6, we first show that the solution of FPE (7) is well defined and that it converges to $\rho^\infty$. In fact, it can be shown that $\rho(t) \in B(\rho^0)$, a compact subset of $\mathcal{P}_o(G)$. Then we estimate the convergence rate in $B(\rho^0)$ in two parts, depending on a parameter $x > 0$ controlling the closeness between $\rho$ and $\rho^\infty$. If $\rho(t)$ is far away from $\rho^\infty$, the dissipation formula (10)
Proof. First, we prove (i) by constructing a compact set $B(\rho^0)$. Then we define We shall show that if $\rho^0 \in B(\rho^0)$, then $\rho(t) \in B(\rho^0)$ for all $t \ge 0$. In other words, the boundary of $B(\rho^0)$ is a repeller for the ODE (7). Assume $\rho(t_1) \in \partial B(\rho^0)$ at time $t_1$; this means that there exist indices $i$ We will show $\frac{d}{dt}$ On the other hand, since $\rho(t_1) \in B(\rho^0)$, for any $i \in A$ we have $\sum_{k\in A\setminus\{i\}} \rho_k(t_1) \le 1 - \epsilon_{l-1}$, and from the assumption (11), $\rho_i(t_1) + \sum_{k\in A\setminus\{i\}} \rho_k(t_1) = 1 - \epsilon_l$, we obtain
$$\rho_i(t_1) \ge \epsilon_{l-1} - \epsilon_l. \qquad (13)$$
Combining equations (12) and (13), we know that for any $i \in A$ and $j \in A^c$, where the last inequality follows from the choice of $\epsilon_l$ in terms of $\epsilon_{l-1}$ and $M$. Since the graph is connected, there exist $i^* \in A$, $j^* \in A^c \cap N(i^*)$ such that By combining (14) and (15), we have where the third equality is from $\sum_{(i,j)\in A} \theta_{ij}(F_j - F_i) = 0$. Therefore, we have $\rho(t) \in B(\rho^0)$, thus $\min_{i\in V,\, t>0} \rho_i(t) \ge m(\rho^0)$. Part (ii) can be proved similarly as in [7], so we omit it here.
Lemma 7. For $\rho \in \mathcal{P}_o(G)$, then and the Laplacian matrix $L$ has the simple eigenvalue 0 with eigenvector $(1,$ This implies that $\min$ By the definition of $L^{-1}(\rho)$, we can prove the other inequality.
We are now ready to prove the main result.
Proof of Theorem 5. Given a parameter $x > 0$, we divide $B(\rho^0)$ into two parts, $B_1$ and $B_2$. We consider the convergence rate in $B_1$ first.
Lemma 8. Denote $r_1(x) = C_1 x$, where for any $t \le T = \inf\{\tau > 0 :$ Proof. We shall show $\min$ If this is true, then for $t \le T$, From Gronwall's inequality, (16) is proven.
By Taylor expansion on $\rho$, we have where $\bar\rho = \rho + s(\rho^\infty - \rho)$, for some constant $s \in (0, 1)$. Denote the Euclidean projection matrix onto $T_\rho\mathcal{P}_o(G)$ by The above implies Thus which finishes the proof.
Next we give the convergence rate in $B_2$. where and Proof. We shall show $\min$ Suppose it is true; then holds for all $t \ge T$. Integrating this formula over $[t, +\infty)$, we obtain By Gronwall's inequality, (17) is proven.
We come back to estimate $r_2(x)$. Since T is a constant vector, by Taylor expansion, we have where $\bar\rho$, $\tilde\rho$ are two discrete densities on the line segment between $\rho$ and $\rho^\infty$.
Combining these two estimates, we get where the last inequality comes from $\lambda_{sec}(L(\rho)) \ge m(\rho^0)\lambda_{sec}(\hat L)$ and We are ready to find the overall convergence rate. By Lemma 8 and Lemma 9, one can show that for any $t \ge 0$, for any $x > 0$. We estimate a constant rate $C$ by showing $\max$ It is clear that the maximizer $x^* > 0$ is achieved at which finishes the proof.
We remark that for a general choice $\theta_{ij}(\rho) \in C^1$, the explicit rate can also be established. Following the proof of Lemma 9, one only needs to replace $r$ by $\tilde r = r \max_{\rho\in B(\rho^0),\,(i,j)\in E} \frac{\partial\theta_{ij}}{\partial\rho_i}$; then the generalized convergence rate is In addition, when $W = 0$, Theorem 5 gives the exponential convergence of the linear FPE on graphs, for any potential $V \in \mathbb{R}^n$.
3.1. Inequalities. It is well known in the literature that the convergence of the FPE can be used to prove the so-called Log-Sobolev inequality and a few others. We mimic this result on graphs and further extend the inequality to the case that includes the nonlinear interaction energy. For simplicity, we take $\beta = 1$ and consider $\mathcal{F}(\rho) = \frac{1}{2}\rho^T W\rho + V^T\rho + \sum_{i=1}^n \rho_i\log\rho_i$, which is strictly convex in $\mathcal{P}(G)$. Again, we denote by $\rho^\infty$ the Gibbs measure.
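The Log-Sobolev inequality describes a relationship between two functionals, named relative entropy and relative Fisher information, which can be expressed in our notation by the following formulas:
$$\text{Relative entropy}\qquad H(\rho|\rho^\infty) := \mathcal{F}(\rho) - \mathcal{F}(\rho^\infty); \qquad (18)$$
and
$$\text{Relative Fisher information}\qquad I(\rho|\rho^\infty) := \frac{1}{2}\sum_{(i,j)\in E} \omega_{ij}\big(F_i(\rho) - F_j(\rho)\big)^2\theta_{ij}(\rho), \qquad (19)$$
where $F_i(\rho) = \frac{\partial}{\partial\rho_i}\mathcal{F}(\rho)$.
Corollary 10. If $\mathcal{F}(\rho)$ is strictly convex in $\mathcal{P}(G)$, then there exists a constant $\lambda > 0$ such that
$$H(\rho|\rho^\infty) \le \frac{1}{2\lambda}\, I(\rho|\rho^\infty).$$
We want to point out that when $W = 0$, Corollary 10 reduces to the standard Log-Sobolev inequality. In this case, functionals (18) and (19) can be written as
$$H(\rho|\rho^\infty) = \sum_{i=1}^n \rho_i\log\frac{\rho_i}{\rho^\infty_i}, \qquad I(\rho|\rho^\infty) = \frac{1}{2}\sum_{(i,j)\in E} \omega_{ij}\Big(\log\frac{\rho_i}{\rho^\infty_i} - \log\frac{\rho_j}{\rho^\infty_j}\Big)^2\theta_{ij}(\rho).$$
Their continuous counterparts are
$$H(\rho|\rho^\infty) = \int \rho(x)\log\frac{\rho(x)}{\rho^\infty(x)}\,dx, \qquad I(\rho|\rho^\infty) = \int \Big|\nabla\log\frac{\rho(x)}{\rho^\infty(x)}\Big|^2\rho(x)\,dx.$$
Proof. We use the fact that the dissipation of relative entropy along FPE (7) is the relative Fisher information,
$$\frac{d}{dt}H(\rho(t)|\rho^\infty) = -I(\rho(t)|\rho^\infty).$$
Similarly to Theorem 5, we divide $\mathcal{P}(G)$ into two regions, $D_1$ and $D_2$, based on a given parameter $x > 0$. We shall show two upper bounds of $\frac{H(\rho)}{I(\rho)}$ in $D_1$ and $D_2$ respectively. On one hand, consider FPE (7), with $\rho(t)$ starting from an initial measure $\rho \in D_1$. Since $H(\rho(t)|\rho^\infty)$ is a Lyapunov function, $\rho(t) \in D_1$ for all $t > 0$. Following Lemma 9, there exists $r_2(x) > 0$ such that where $\lim_{t\to\infty}$ On the other hand, if $\rho \in D_2$, we shall show It is trivial that $H(\rho|\rho^\infty)$ is bounded above. We only need to show $\inf_{\rho\in D_2} I(\rho|\rho^\infty) > 0$. Suppose this is not true; then there exists $\rho^* \in D_2$ with $I(\rho^*|\rho^\infty) = 0$. This implies $F_i(\rho^*) = F_j(\rho^*)$ for any $(i, j) \in E$. Since $G$ is connected, $\rho^* = \rho^\infty = \arg\min_{\rho\in\mathcal{P}(G)} \mathcal{F}(\rho)$, which contradicts $\rho^* \in D_2$. By choosing $\frac{1}{2\lambda} = \max\{\lambda_1, \lambda_2\}$, we prove the result.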
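In the linear case $W = 0$, $\beta = 1$, the two functionals and the dissipation identity are easy to evaluate numerically. The sketch below assumes the forms $H(\rho|\rho^\infty) = \sum_i \rho_i\log(\rho_i/\rho^\infty_i)$ and $I(\rho|\rho^\infty) = \frac{1}{2}\sum_{(i,j)\in E}\omega_{ij}(\log\frac{\rho_i}{\rho^\infty_i} - \log\frac{\rho_j}{\rho^\infty_j})^2\theta_{ij}(\rho)$ with the arithmetic-mean $\theta_{ij}$, where the factor 1/2 is absorbed by visiting each undirected edge once. The graph, the potential and the function names are hypothetical.

```python
import math

def gibbs_measure(V):
    """Linear case (W = 0, beta = 1): rho_inf_i proportional to exp(-V_i)."""
    unnormalized = [math.exp(-v) for v in V]
    K = sum(unnormalized)
    return [u / K for u in unnormalized]

def relative_entropy(rho, rho_inf):
    return sum(r * math.log(r / q) for r, q in zip(rho, rho_inf))

def relative_fisher(rho, rho_inf, edge_weights):
    """One term per undirected edge, equal to the half-sum over oriented edges."""
    total = 0.0
    for (i, j), w_ij in edge_weights.items():
        theta = 0.5 * (rho[i] + rho[j])
        diff = math.log(rho[i] / rho_inf[i]) - math.log(rho[j] / rho_inf[j])
        total += w_ij * theta * diff ** 2
    return total

def fpe_rhs(rho, rho_inf, edge_weights):
    """Linear graph FPE written with log(rho / rho_inf) as the edge potential."""
    rate = [0.0] * len(rho)
    for (i, j), w_ij in edge_weights.items():
        theta = 0.5 * (rho[i] + rho[j])
        flow = w_ij * theta * (math.log(rho[j] / rho_inf[j]) - math.log(rho[i] / rho_inf[i]))
        rate[i] += flow
        rate[j] -= flow
    return rate

edge_weights = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 0.5}   # hypothetical triangle
rho_inf = gibbs_measure([0.0, 1.0, 2.0])
rho = [0.2, 0.3, 0.5]

H = relative_entropy(rho, rho_inf)
I = relative_fisher(rho, rho_inf, edge_weights)
# Chain rule: dH/dt = sum_i log(rho_i / rho_inf_i) * d(rho_i)/dt, since the rates sum to zero.
dH_dt = sum(math.log(r / q) * dr
            for r, q, dr in zip(rho, rho_inf, fpe_rhs(rho, rho_inf, edge_weights)))
```

At any interior density, `dH_dt` equals `-I` up to rounding, and both `H` and `I` vanish at `rho_inf`. The ratio `H / I` at a given density gives a pointwise lower bound on the constant $\frac{1}{2\lambda}$ appearing in the inequality.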
3.2. Asymptotic properties. If $W$ is not a positive definite matrix, there may exist multiple Gibbs measures. With multiple equilibria, it may not be possible to find one explicit rate valid for all initial conditions, unless there are only finitely many equilibria. However, the asymptotic convergence rate can be established whenever the solution is near an equilibrium. In what follows, we study such an asymptotic rate.
Assume that the initial measure $\rho^0$ is in a basin of attraction of an equilibrium $\rho^\infty$, meaning (A) $\lim_{t\to\infty}\rho(t) = \rho^\infty$ and $\rho^\infty$ is an isolated equilibrium.
Theorem 11. Let (A) hold and Then for any sufficiently small $\epsilon > 0$ satisfying $(\lambda - \epsilon) > 0$, there exists a time $T > 0$ such that when $t > T$, Proof. Since $\lim_{t\to\infty}\rho(t) = \rho^\infty$, for sufficiently small $\epsilon > 0$ there exists $T > 0$ such that when $t > T$, Similar to the proof of Lemma 9, we have Following the strategy in (9), we prove the result.
The techniques used in this proof can also be applied to some non-gradient flows, for example, FPEs with a non-symmetric interaction potential $W$. In this setting, the free energy $\mathcal{F}(\rho)$ no longer exists. However, the relative Fisher information always exists, and it is used to measure the closeness between $\rho(t)$ and $\rho^\infty$.
Corollary 12. Let (A) hold and where $J_F$ is the Jacobian of the vector function $F(\rho)$. Then for any sufficiently small $\epsilon > 0$ satisfying $(\lambda - \epsilon) > 0$, there exists a time $T > 0$ such that when $t > T$, Following the proof of Theorem 11, it is straightforward to show that if $t > T$, there exists $\epsilon > 0$ such that By Gronwall's inequality, we prove the result.
In the end, we shall give an explicit formula for the quadratic form in (10), i.e.
From Lemma 2, there exists a unique $\Phi \in \mathbb{R}^n$, up to a constant shift, such that $\sigma = L(\rho)\Phi$. Thus We can rewrite the formula (20) explicitly. Introducing s.t. In fact, it is not hard to show that $\lambda$ solves the eigenvalue problem of the Hessian operator at the equilibrium in $(\mathcal{P}_o(G), W_2)$. In the next section, we shall present what (21) suggests in its continuous analog.

Connection with Wasserstein geometry
In this section, we explore the meaning of $h_{ij,kl}$ by examining its continuous analog. Our calculation indicates a nice relation to a famous identity in Riemannian geometry, known as Yano's formula [26,27].
Consider a smooth finite dimensional Riemannian manifold $M$. We assume that $M$ is oriented, compact and has no boundary. We denote by $\mathcal{P}(M)$ the space of density functions supported on $M$, and by $T_\rho\mathcal{P}(M)$ the tangent space at $\rho \in \mathcal{P}(M)$, i.e. $T_\rho\mathcal{P}(M) = \{\sigma(x) : \int_M \sigma(x)\,dx = 0\}$. Following the calculus in [22,25], for any $\sigma(x) \in T_\rho\mathcal{P}(M)$, there exists a function $\Phi(x)$ satisfying $\sigma(x) = -\nabla\cdot(\rho\nabla\Phi(x))$. This correspondence and the 2-Wasserstein metric endow $T_\rho\mathcal{P}(M)$ with the inner product $(\sigma(x), \tilde\sigma(x)) = (\nabla\Phi, \nabla\tilde\Phi)_\rho := \int_M \nabla\Phi\cdot\nabla\tilde\Phi\,\rho\,dx$. Now consider a smooth free energy $\mathcal{F} : \mathcal{P}(M) \to \mathbb{R}$. We assume that $\rho^* \in \mathcal{P}(M)$ is an equilibrium satisfying where $\frac{\delta}{\delta\rho(x)}$ is the first variation operator in the $L^2$ metric. To understand $h_{ij,kl}$, we calculate the Hessian of $\mathcal{F}$ at $\rho^*$ with respect to the 2-Wasserstein metric, and show where the third equality holds by the integration by parts formula and the fact that $M$ has no boundary.
where ρ is an arbitrary density function, D 2 is the second covariant derivative and Ric is the Ricci curvature tensor on M. Evaluating (29) at the equilibrium ρ * and comparing it with (28), we observe