Quantitative logarithmic Sobolev inequalities and stability estimates

We establish an improved form of the classical logarithmic Sobolev inequality for the Gaussian measure restricted to probability densities which satisfy a Poincaré inequality. The result implies a lower bound on the deficit in terms of the quadratic Kantorovich-Wasserstein distance. We similarly investigate the deficit in the Talagrand quadratic transportation cost inequality, this time by means of an $L^1$-Kantorovich-Wasserstein distance, optimal for product measures, and deduce a lower bound on the deficit in the logarithmic Sobolev inequality in terms of this metric. Applications are given in the context of the Bakry-Émery theory and the coherent state transform. The proofs combine tools from semigroup and heat kernel theory and optimal mass transportation.


Introduction and main results
The classical logarithmic Sobolev inequality of L. Gross [21] for the standard Gaussian measure $d\gamma(x) = d\gamma^n(x) = (2\pi)^{-n/2} e^{-|x|^2/2}\,dx$ on the Borel sets of $\mathbb{R}^n$ (cf. e.g. [34,35,4]) states that if $d\nu = f\,d\gamma$ is a probability measure with density $f$ with respect to $\gamma$, then
\[ H(\nu) \le \frac{1}{2}\, I(\nu), \tag{1.1} \]
where $H(\nu) = H(\nu \,|\, \gamma) = \int_{\mathbb{R}^n} f \log f \, d\gamma$ is the relative entropy of $\nu$ with respect to $\gamma$ and $I(\nu) = I(\nu \,|\, \gamma) = \int_{\mathbb{R}^n} \frac{|\nabla f|^2}{f}\, d\gamma$ is the Fisher information of $\nu$ with respect to $\gamma$. Logarithmic Sobolev inequalities (LSI) are a useful tool in analysis and probability in the study of convergence to equilibrium, large deviations, and measure concentration. They are also equivalent to hypercontractivity of the associated semigroup (cf. [34,35,4]). To ensure that the various terms of the LSI are well-defined, some smoothness and positivity properties of the density $f$ of $\nu$ have to be assumed. These may be handled by approximation and regularization (see e.g. [4]). When dealing with the entropy $H(\nu)$ and the Fisher information $I(\nu)$ (and below the LSI deficit $\delta_{LSI}(\nu)$ of (1.3)), it will usually be implicitly understood that they are well-defined (and finite) for suitable density functions $f$.
The constant 1/2 in the Gaussian LSI (1.1) is known to be optimal, and it was first shown in [11] that the cases of equality are exactly the measures of the form
\[ d\gamma_b(x) = e^{b\cdot x - |b|^2/2}\, d\gamma(x), \quad b \in \mathbb{R}^n. \tag{1.2} \]
In other words, the extremal densities $f$ are exponential functions. (Note that $b$ is the barycenter of $\gamma_b$, so that in particular the only centered extremal measure is $\gamma$ itself.) However, the study of the logarithmic Sobolev deficit
\[ \delta_{LSI}(\nu) = \frac{1}{2}\, I(\nu) - H(\nu) \tag{1.3} \]
as a way to quantify proximity to the extremal measures is still largely open, in spite of recent developments for classical Sobolev and related isoperimetric inequalities. In the broader context of stability results for functional inequalities, when looking at a functional inequality with known optimal constants and optimizers, a natural question is indeed whether functions that are close to achieving the optimum are close to some optimizer. The task is to bound the deficit from below by some functional that measures how far we are from some optimizer (typically, a distance). Examples of such results are the recent quantitative stability estimates for Sobolev [12,19], Brunn-Minkowski [17,16], and isoperimetric inequalities [20,18,15,23].
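As a quick sanity check of the equality cases, the following worked computation verifies that the deficit vanishes on the shifted Gaussians. It is only a sketch: it assumes the extremal density has the form $f_b(x) = e^{b\cdot x - |b|^2/2}$ (the form recalled in the proof of Theorem 11), so that $\gamma_b$ is the Gaussian law with mean $b$ and identity covariance, and it uses $I(\nu) = \int |\nabla \log f|^2\, d\nu$.

```latex
\begin{align*}
H(\gamma_b) &= \int_{\mathbb{R}^n} \log f_b \, d\gamma_b
  = \int_{\mathbb{R}^n} \Big( b\cdot x - \tfrac12 |b|^2 \Big)\, d\gamma_b(x)
  = |b|^2 - \tfrac12 |b|^2 = \tfrac12 |b|^2, \\
I(\gamma_b) &= \int_{\mathbb{R}^n} |\nabla \log f_b|^2 \, d\gamma_b = |b|^2, \\
\tfrac12 I(\gamma_b) - H(\gamma_b) &= \tfrac12 |b|^2 - \tfrac12 |b|^2 = 0 .
\end{align*}
```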
The first main result of this note is to propose a (strict) strengthening of the Gaussian LSI (1.1) within a subclass of probability measures $\nu$ which in turn produces a lower bound on the deficit $\delta_{LSI}(\nu)$. Denote by $P(\lambda)$ the class of probability measures $\nu$ on the Borel sets of $\mathbb{R}^n$ satisfying a Poincaré inequality with constant $\lambda > 0$ in the sense that for every smooth $g : \mathbb{R}^n \to \mathbb{R}$ such that $\int_{\mathbb{R}^n} g\, d\nu = 0$,
\[ \lambda \int_{\mathbb{R}^n} g^2 \, d\nu \le \int_{\mathbb{R}^n} |\nabla g|^2 \, d\nu. \tag{1.4} \]
Note that under such a Poincaré inequality, the measure $\nu$ necessarily has a second moment.
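Theorem 1. For any centered ($\int_{\mathbb{R}^n} x\, d\nu = 0$) probability measure $d\nu = f\,d\gamma$ in the class $P(\lambda)$,
\[ H(\nu) \le \frac{c(\lambda)}{2}\, I(\nu), \tag{1.6} \]
where
\[ c(\lambda) = \frac{1 - \lambda + \lambda \log \lambda}{(1-\lambda)^2} < 1 \qquad (c(1) = \tfrac12). \]
The constant is sharp, as can be seen when taking $\nu$ with density $f(x) = \sqrt{\lambda}\, e^{(1-\lambda)x^2/2}$, $\lambda > 0$, on the line. Of course, since the constant 1/2 in the Gaussian LSI is optimal, such a strengthening can only be expected to hold on a subset of probability measures.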
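The sharpness claim can be checked numerically on the Gaussian example. The sketch below assumes that the constant in Theorem 1 is $c(\lambda) = (1-\lambda+\lambda\log\lambda)/(1-\lambda)^2$ (the value obtained by integrating the differential inequality of Section 2), and it uses the standard closed forms for $\nu = N(0, 1/\lambda)$, whose density with respect to $\gamma$ is $\sqrt{\lambda}\,e^{(1-\lambda)x^2/2}$. The function names are illustrative only.

```python
import math

def c(lam: float) -> float:
    """Assumed constant c(lambda) = (1 - lam + lam*log(lam)) / (1 - lam)**2."""
    x = lam - 1.0
    if abs(x) < 1e-3:
        # Taylor expansion around lam = 1 avoids catastrophic cancellation.
        return 0.5 - x / 6.0 + x * x / 12.0
    return (1.0 - lam + lam * math.log(lam)) / (1.0 - lam) ** 2

def entropy(lam: float) -> float:
    """H(nu | gamma) for nu = N(0, 1/lam): Kullback-Leibler divergence of centered Gaussians."""
    return 0.5 * (math.log(lam) + 1.0 / lam - 1.0)

def fisher(lam: float) -> float:
    """I(nu | gamma) = (1 - lam)^2 * Var(nu) for nu = N(0, 1/lam)."""
    return (1.0 - lam) ** 2 / lam

for lam in (0.25, 0.5, 2.0, 4.0):
    h, i = entropy(lam), fisher(lam)
    # Equality H = c(lam)/2 * I on this family, while the plain LSI bound I/2 is strict.
    print(f"lambda={lam}: H={h:.6f}  c/2*I={0.5 * c(lam) * i:.6f}  I/2={0.5 * i:.6f}")
```

On this family the first two printed columns agree, which is consistent with the constant being attained, while the classical bound $I/2$ is strictly larger for $\lambda \neq 1$.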
In dimension $n = 1$, the class of probability measures satisfying a Poincaré inequality (1.4) has been completely characterized. A probability measure $\nu$ with density $p$ with respect to the Lebesgue measure and median $m$ satisfies a Poincaré inequality if and only if the following holds (see [6,4]):
\[ B = \max\Big( \sup_{x > m} \nu([x,\infty)) \int_m^x \frac{dt}{p(t)},\ \sup_{x < m} \nu((-\infty,x]) \int_x^m \frac{dt}{p(t)} \Big) < \infty. \]
Moreover, the optimal Poincaré constant $\lambda_{opt}$ for $\nu$ satisfies
\[ \frac{1}{4B} \le \lambda_{opt} \le \frac{2}{B}. \]
In higher dimension, there is no such simple characterization, but fairly general sufficient conditions are available. For example, if $\nu$ has a density of the form $e^{-V}$ with respect to the Lebesgue measure, a sufficient condition is the existence of $a \in\, ]0,1[$ such that $a|\nabla V|^2 - \Delta V$ is bounded from below by some positive constant outside of some ball (see [2]). A more classical condition is the Bakry-Émery criterion on the potential $V$ ([3,34,4]),
\[ \mathrm{Hess}(V) \ge \eta\, \mathrm{Id}, \quad \eta > 0, \tag{1.5} \]
ensuring a Poincaré inequality with constant $\lambda = \eta$.
Theorem 1 improves upon the recent work [22], where stronger conditions on the Hessian of the density $f$ are considered (in particular parts of the class $P(\lambda)$), with a weaker dependence of the constant. The work [22] actually investigates how far an admissible density is from saturating the logarithmic Sobolev inequality as measured with Wasserstein distance, providing a control of the deficit $\delta_{LSI}(\nu)$ in the logarithmic Sobolev inequality by the (quadratic) Kantorovich-Wasserstein distance $W_2(\nu,\gamma)$. Within the class $P(\lambda)$, this is easily achieved via Theorem 1 together with the Talagrand quadratic transportation cost inequality [33] (cf. [34,35,4])
\[ W_2(\nu,\gamma)^2 \le 2 H(\nu) \tag{1.8} \]
holding for all probability measures $\nu$ (absolutely continuous with respect to $\gamma$). Recall that the Kantorovich-Wasserstein distance $W_2(\nu,\mu)$ between two probability measures $\nu$ and $\mu$ is given by
\[ W_2(\nu,\mu) = \inf_{\pi} \Big( \int_{\mathbb{R}^n \times \mathbb{R}^n} |x-y|^2 \, d\pi(x,y) \Big)^{1/2}, \]
where the infimum is over all couplings $\pi$, that is, probability measures on $\mathbb{R}^n \times \mathbb{R}^n$ with respective marginals $\nu$ and $\mu$. Note that if $\nu \in P(\lambda)$, it necessarily has a second moment, so that the Kantorovich-Wasserstein distance $W_2(\nu,\gamma)$ is finite.
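To make the definition of $W_2$ concrete, here is a minimal illustration in the simplest setting: two empirical measures on the line with the same number of equally weighted atoms. In dimension one the monotone (sorted) coupling is optimal for the quadratic cost, so the infimum over couplings reduces to a sort. The helper name `w2_empirical_1d` is hypothetical, and nothing here is specific to the Gaussian setting of the paper.

```python
import math

def w2_empirical_1d(xs, ys):
    """W_2 between two uniform empirical measures on the line with equally many atoms."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty samples of equal size")
    # In dimension one the monotone (sorted) coupling is optimal for quadratic cost.
    cost = sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys)))
    return math.sqrt(cost / len(xs))

# Atoms {0, 2} against atoms {1, 3}: every atom moves by 1.
print(w2_empirical_1d([0.0, 2.0], [3.0, 1.0]))  # 1.0
```

A translation of the sample by $b$ gives distance $|b|$, in line with the role of the barycenter in the non-centered statements.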
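This corollary may be compared to the Otto-Villani HWI inequality [29] (cf. [34,35,4]), valid for any probability measure $\nu$,
\[ H(\nu) \le W_2(\nu,\gamma)\sqrt{I(\nu)} - \frac12\, W_2(\nu,\gamma)^2. \tag{1.9} \]
It should be mentioned that one cannot expect
\[ \delta_{LSI}(\nu) \ge c\, W_2(\nu,\gamma)^2 \]
to hold for some $c > 0$ and all probability measures $\nu$. Indeed, such an inequality combined with the HWI inequality would then imply the logarithmic Sobolev inequality $H(\nu) \le \frac{1+c}{2+4c}\, I(\nu)$, with therefore a constant strictly better than the optimal 1/2. A complete stability result for the Gaussian LSI therefore requires a distance weaker than $W_2$. In this direction, Theorem 1 may also be used to provide a lower bound on the deficit $\delta_{LSI}$ in terms of the total variation. Indeed, as the standard Gaussian measure $\gamma$ satisfies a (1,1)-Poincaré inequality (cf. e.g. [24])
\[ \int_{\mathbb{R}^n} |g| \, d\gamma \le C \int_{\mathbb{R}^n} |\nabla g| \, d\gamma \]
for some numerical constant $C > 0$ and every smooth $g : \mathbb{R}^n \to \mathbb{R}$ with mean zero, if $d\nu = f\,d\gamma$,
\[ \|\nu - \gamma\|_{TV} = \int_{\mathbb{R}^n} |f - 1| \, d\gamma \le C \int_{\mathbb{R}^n} |\nabla f| \, d\gamma \le C \sqrt{I(\nu)} \]
by the Cauchy-Schwarz inequality. We then only state the consequence of (1.6) in the centered case.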
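The constant $\frac{1+c}{2+4c}$ can be recovered by a short computation. The following is a sketch under the assumptions that the hypothetical stability inequality reads $\frac12 I(\nu) - H(\nu) \ge c\, W_2^2$ and that the HWI inequality takes its usual form $H \le W_2 \sqrt{I} - \frac12 W_2^2$, with $W_2 = W_2(\nu,\gamma)$; the only extra ingredient is Young's inequality $W_2\sqrt{I} \le W_2^2 + \frac14 I$.

```latex
\begin{align*}
H(\nu) &\le W_2 \sqrt{I(\nu)} - \tfrac12 W_2^2
   \le \Big( W_2^2 + \tfrac14 I(\nu) \Big) - \tfrac12 W_2^2
   = \tfrac12 W_2^2 + \tfrac14 I(\nu) \\
  &\le \tfrac{1}{2c} \Big( \tfrac12 I(\nu) - H(\nu) \Big) + \tfrac14 I(\nu),
\end{align*}
so that
\[
\Big( 1 + \tfrac{1}{2c} \Big) H(\nu) \le \Big( \tfrac{1}{4c} + \tfrac14 \Big) I(\nu),
\qquad \text{i.e.} \qquad
H(\nu) \le \frac{1+c}{2+4c}\, I(\nu) < \tfrac12\, I(\nu).
\]
```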
Corollary 4. For any centered probability measure dν = f dγ in the class P(λ), While Corollaries 3 and 4 are strictly weaker than Theorem 1, they have the advantage of providing a lower bound on the deficit in the Gaussian LSI in terms of a metric.
A one-dimensional stability result of the same kind as Corollary 3 is proven in Corollary 4.4 of [7], however with a worse constant of proportionality. The main assumption is uniform log-concavity of $\nu$ (i.e. (1.5)), which is used to apply a (1,1)-Poincaré inequality. As far as we know, the argument of [7] does not extend to higher dimensions. Nevertheless, the one-dimensional result may be combined with a tensorization argument to cover the case of $n$-dimensional random vectors with uniformly log-concave distributions whose one-dimensional projections form a martingale. Such an assumption is not the same as simply assuming that the mean of $\nu$ is zero. More generally, the estimates of [7] are expressed in terms of $\bar\nu$, the law of a random vector $\bar X$ obtained by modifying a random vector $X$ with law $\nu$ in such a way that its one-dimensional marginals $\bar X_1, \dots, \bar X_n$ form a martingale [7]. For unconditional random variables, this is the same as assuming the mean to be zero, but in general it does not seem that $W_2(\nu,\gamma)$ and $W_2(\bar\nu,\gamma)$ can be easily compared. The contribution [7] also contains deficit estimates for general $\nu$, but with lower bounds that are either not a power of a distance, are dimension-dependent, or involve $\bar\nu$. For example, there is a universal constant $c > 0$ such that for all smooth probability measures $\nu$ on $\mathbb{R}^n$,
\[ \delta_{LSI}(\nu) \ge c\, T(\bar\nu, \gamma), \tag{1.11} \]
where $\bar\nu$ is the previously discussed martingale rearrangement of $\nu$ and $T$ is a transportation cost associated to the function $t \mapsto t - \log(1+t)$.
The second main result of this note investigates the deficit in the Talagrand quadratic transportation cost inequality (1.8). A result of Otto and Villani [29] states that a measure satisfying a logarithmic Sobolev inequality automatically satisfies a Talagrand-type inequality. It is easy to see, using the HWI inequality (1.9), that the cases of equality for Talagrand's inequality are exactly the same as for the Gaussian LSI. Therefore, it is natural to investigate lower bounds on the Talagrand deficit
\[ \delta_{Tal}(\nu) = 2 H(\nu) - W_2(\nu,\gamma)^2. \]
In dimension one, it was shown by Barthe and Kolesnikov [5] that, for some numerical constant $c > 0$, the deficit $\delta_{Tal}(\nu)$ satisfies
\[ \delta_{Tal}(\nu) \ge c \inf_{\pi} \int \varphi(|x-y|) \, d\pi(x,y), \]
where the infimum is over couplings $\pi$ of $\nu$ and $\gamma$, and $\varphi(t) = t - \log(1+t)$. Note that the right-hand side in this inequality is an optimal transport cost, with a cost that is quadratic-then-linear in the distance. This inequality immediately yields the weaker version
\[ \delta_{Tal}(\nu) \ge c \min\big( W_1(\nu,\gamma)^2,\ W_1(\nu,\gamma) \big), \]
where $W_1$ is the $L^1$-Kantorovich-Wasserstein distance (with $\ell^2$-cost function on $\mathbb{R}^n$) between the one-dimensional measures $\nu$ and $\gamma$.
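The "quadratic-then-linear" behavior of the cost $\varphi(t) = t - \log(1+t)$ can be made explicit: for $t \ge 0$ one has $(1-\log 2)\min(t, t^2) \le \varphi(t) \le \min(t^2/2,\, t)$. The short script below is only a numerical illustration of this elementary two-sided bound (the constant $1 - \log 2 = \varphi(1)$ is our choice for the illustration, not a constant taken from [5]).

```python
import math

def phi(t: float) -> float:
    """Cost profile phi(t) = t - log(1 + t) for t >= 0 (log1p keeps small t accurate)."""
    return t - math.log1p(t)

C_LOW = 1.0 - math.log(2.0)  # value of phi at t = 1

def lower(t: float) -> float:
    """Quadratic-then-linear minorant C_LOW * min(t, t^2)."""
    return C_LOW * min(t, t * t)

def upper(t: float) -> float:
    """Quadratic-then-linear majorant min(t^2 / 2, t)."""
    return min(0.5 * t * t, t)

for t in (1e-3, 1e-2, 0.1, 1.0, 10.0, 1e3):
    # phi/t^2 tends to 1/2 as t -> 0, and phi/t tends to 1 as t -> infinity.
    print(f"t={t:g}  phi/t^2={phi(t) / t**2:.4f}  phi/t={phi(t) / t:.4f}")
```

This is the mechanism behind passing from the transport cost with profile $\varphi$ to a minimum of a squared distance and a distance.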
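We establish here the following multi-dimensional version of the Barthe-Kolesnikov result. Let
\[ W_{1,1}(\nu,\mu) = \inf_{\pi} \int_{\mathbb{R}^n \times \mathbb{R}^n} \sum_{i=1}^n |x_i - y_i| \, d\pi(x,y) \]
be the $L^1$-Kantorovich-Wasserstein distance with $\ell^1$-cost function on $\mathbb{R}^n$, where the infimum is over couplings $\pi$ of $\nu$ and $\mu$.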
Theorem 5. There is a numerical constant $c > 0$ such that for any centered probability measure $d\nu = f\,d\gamma$ on $\mathbb{R}^n$ with finite second moments and $f > 0$ locally bounded,
\[ \delta_{Tal}(\nu) \ge c \min\Big( \frac{W_{1,1}(\nu,\gamma)^2}{n},\ \frac{W_{1,1}(\nu,\gamma)}{\sqrt{n}} \Big). \]
One feature of this result is that it is valid for general measures. Moreover, the lower bound is expressed in terms of a metric on the space of probability measures on $\mathbb{R}^n$ and the exponent is independent of the dimension. In general, the lower bound in Theorem 5 is only of optimal order for small perturbations of the Gaussian. For an $n$-dimensional product measure $\nu^n = \nu^{\otimes n}$, $\delta_{Tal}(\nu^n) = n\,\delta_{Tal}(\nu)$ grows linearly in $n$. This is also the behavior of
\[ \frac{W_{1,1}(\nu^n,\gamma^n)^2}{n} = n\, W_{1,1}(\nu,\gamma_1)^2. \]
When $n \gg W_{1,1}(\nu,\gamma_1)^{-2}$, the expected growth is lost. Nevertheless, for product measures whose one-dimensional marginals are close enough to $\gamma = \gamma_1$ (i.e. such that $W_{1,1}(\nu,\gamma_1)^2 \le \frac{c}{n}$), Theorem 5 yields the correct order of magnitude in the dimension.
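The discussion of product measures can be illustrated with a few lines of arithmetic. The sketch assumes that the lower bound of Theorem 5 has the form $\min(W_{1,1}^2/n,\ W_{1,1}/\sqrt{n})$ up to the constant $c$, and uses the additivity $W_{1,1}(\nu^{\otimes n}, \gamma^n) = n\,W_{1,1}(\nu,\gamma_1)$ of the $\ell^1$-cost over product measures. The value `w = 0.1` is a hypothetical one-dimensional distance chosen for illustration.

```python
import math

def theorem5_bound(w11: float, n: int) -> float:
    """min(W^2 / n, W / sqrt(n)): assumed form of the Theorem 5 bound, constant c omitted."""
    return min(w11 ** 2 / n, w11 / math.sqrt(n))

def product_bound(w: float, n: int) -> float:
    """Same bound for the n-fold product, using W_{1,1}(nu^n, gamma^n) = n * w."""
    return theorem5_bound(n * w, n)

w = 0.1  # hypothetical W_{1,1}(nu, gamma_1); crossover expected near n = w**-2 = 100
for n in (25, 100, 400):
    # Columns: n, bound for the product, linear growth n*w^2, sublinear growth sqrt(n)*w.
    print(n, product_bound(w, n), n * w ** 2, math.sqrt(n) * w)
```

Below the crossover the bound equals $n\,w^2$ and grows linearly like the deficit itself; above it, the bound switches to $\sqrt{n}\,w$ and the linear growth is lost.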
Theorem 5 furthermore yields a new proof of the equality case for the Gaussian LSI. Indeed, by the HWI inequality,
\[ \delta_{LSI}(\nu) = \frac12 I(\nu) - H(\nu) \ge \frac12 \Big( \sqrt{I(\nu)} - W_2(\nu,\gamma) \Big)^2. \]
Therefore, if $\nu$ is such that $\delta_{LSI}(\nu) = 0$, then $I(\nu) = W_2(\nu,\gamma)^2$. By the conjunction of the Talagrand (1.8) and LSI (1.1) inequalities,
\[ W_2(\nu,\gamma)^2 \le 2 H(\nu) \le I(\nu), \]
so that there is also equality in Talagrand's inequality and thus $\delta_{Tal}(\nu) = 0$. Therefore, Theorem 5 implies that the only centered measure satisfying $\delta_{LSI}(\nu) = 0$ is precisely $\gamma$. The non-centered case follows as for Corollary 2.
The preceding argument may be quantified in terms of the W 1,1 metric and yields a general stability result for LSI. Recall ν b from (1.7).

Corollary 6.
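There is a numerical constant $c > 0$ such that for any probability measure $d\nu = f\,d\gamma$ on $\mathbb{R}^n$ with $f > 0$ locally bounded and positive entropy, and with barycenter $b = b(\nu)$,
\[ \delta_{LSI}(\nu) \ge \frac{c}{H(\nu)} \min\Big( \frac{W_{1,1}(\nu_b,\gamma)^4}{n^2},\ \frac{W_{1,1}(\nu_b,\gamma)^2}{n} \Big). \]
Indeed, as above, by the HWI (1.9), logarithmic Sobolev (1.1) and Talagrand's (1.8) inequalities,
\[ \delta_{LSI}(\nu) \ge \frac12 \Big( \sqrt{I(\nu)} - W_2(\nu,\gamma) \Big)^2 \ge \frac12 \Big( \sqrt{2H(\nu)} - W_2(\nu,\gamma) \Big)^2 = \frac{\delta_{Tal}(\nu)^2}{2 \big( \sqrt{2H(\nu)} + W_2(\nu,\gamma) \big)^2}. \]
Hence, since $W_2(\nu,\gamma) \le \sqrt{2H(\nu)}$,
\[ \delta_{LSI}(\nu) \ge \frac{\delta_{Tal}(\nu)^2}{16\, H(\nu)}. \]
The result then follows from Theorem 5 for a centered $\nu$, and in the general case by recentering as above.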
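The chain of inequalities used in this argument (LSI, Talagrand, HWI, and the resulting comparison between the two deficits) can be checked by hand on the one-dimensional family $\nu = N(0, s^2)$, for which all quantities are explicit. The script below is a sanity check only; it assumes the comparison takes the form $\delta_{LSI} \ge \delta_{Tal}^2 / (16 H)$, which is what the HWI, LSI and Talagrand inequalities give when combined as described, and the function names are illustrative.

```python
import math

def gaussian_functionals(s: float):
    """Closed forms for nu = N(0, s^2) against gamma = N(0, 1) on the line."""
    H = 0.5 * (s * s - 1.0 - 2.0 * math.log(s))  # relative entropy H(nu | gamma)
    I = (s * s - 1.0) ** 2 / (s * s)             # Fisher information I(nu | gamma)
    W2 = abs(s - 1.0)                            # quadratic Wasserstein distance
    return H, I, W2

def deficits(s: float):
    """Return (delta_LSI, delta_Tal) = (I/2 - H, 2H - W2^2)."""
    H, I, W2 = gaussian_functionals(s)
    return 0.5 * I - H, 2.0 * H - W2 ** 2

for s in (0.3, 0.5, 0.9, 1.5, 3.0):
    H, I, W2 = gaussian_functionals(s)
    d_lsi, d_tal = deficits(s)
    hwi_gap = 0.5 * (math.sqrt(I) - W2) ** 2
    print(f"s={s}: dLSI={d_lsi:.5f} >= HWI gap={hwi_gap:.5f} >= dTal^2/(16H)={d_tal**2 / (16 * H):.5f}")
```

Both deficits vanish only at $s = 1$, and on this family the lower bound $\delta_{Tal}^2/(16H)$ is far from tight, which is expected since it is designed for general measures.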
Note that the inequality given by Corollary 6 is of a similar form to (1.11) established in [7] for smooth measures. It does not seem that the measure $\bar\nu$ involved in (1.11) is directly comparable to $\nu$ in general, whereas $\nu_b$ is an explicit transformation of $\nu$. In particular, Corollary 6 immediately implies the equality cases of LSI for general measures without any additional argument.
Finally, there is also a lower bound on the deficit δ LSI (ν) which may be expressed only in terms of Kantorovich-Wasserstein distances. For simplicity, only the centered case is considered.
There is a numerical constant c > 0 such that for any centered probability measure For the proof, argue as for Corollary 6 combining the HWI, logarithmic Sobolev and Talagrand inequalities to get that Write W 2 = W 2 (ν, γ) and W 1,1 = W 1,1 (ν, γ) to ease the notation. By Theorem 5, for some c ′ > 0 only depending on c, and the claim follows.
The rest of the paper is organized as follows. In Section 2, we prove the main results. In Section 3, we establish several one-dimensional results. Lastly, in Section 4, we present an improvement of the Bakry-Émery theorem for symmetric measures satisfying a Poincaré inequality and obtain quantitative versions of the Wehrl conjectures established by Lieb [25] and Carlen [10] in the context of the coherent state transform.

Proofs of Theorems 1 and 5
We start with the proof of Theorem 1. The results in [22] rely on mass transportation tools. The arguments here are based on the standard semigroup interpolation along the Ornstein-Uhlenbeck semigroup going back to [3] (cf. [1,4]), together with heat kernel inequalities as developed in [4] (to which we refer for the necessary background).
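Proof of Theorem 1. Recall the Ornstein-Uhlenbeck semigroup $(P_t)_{t \ge 0}$ given on suitable functions $g : \mathbb{R}^n \to \mathbb{R}$ by
\[ P_t g(x) = \int_{\mathbb{R}^n} g\big( e^{-t} x + \sqrt{1 - e^{-2t}}\, y \big) \, d\gamma(y), \quad x \in \mathbb{R}^n. \]
The Ornstein-Uhlenbeck semigroup $(P_t)_{t \ge 0}$ is invariant and symmetric with respect to $\gamma$ and, on smooth functions, $\nabla P_t g = e^{-t} P_t(\nabla g)$ (as vectors). For each $t \ge 0$, set $d\nu_t = P_t f\, d\gamma$. The classical de Bruijn formula indicates that
\[ H(\nu) = \int_0^\infty I(\nu_t) \, dt. \tag{2.12} \]
This identity follows from the fact that $-I(\nu_t)$ is the time-derivative of the entropy $H(\nu_t)$ along the Ornstein-Uhlenbeck flow, together with $H(\nu_t) \to 0$ as $t \to \infty$.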
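In the first step of the argument, we show that for any $t \ge 0$, $\nu_t$ satisfies a Poincaré inequality (1.4) with constant
\[ \lambda_t = \frac{\lambda}{\lambda + (1-\lambda)\, e^{-2t}}. \]
To prove this, consider a smooth function $g$ with $\int_{\mathbb{R}^n} g \, d\nu_t = \int_{\mathbb{R}^n} g\, P_t f \, d\gamma = \int_{\mathbb{R}^n} P_t g \, d\nu = 0$ (by symmetry of $P_t$). First, by the local Poincaré inequalities for $(P_t)_{t \ge 0}$ (cf. [4]), for every $t \ge 0$,
\[ P_t(g^2) \le (P_t g)^2 + (1 - e^{-2t})\, P_t(|\nabla g|^2). \]
Then, by the Poincaré inequality applied to $P_t g$, since $\int_{\mathbb{R}^n} P_t g \, d\nu = 0$,
\[ \int g^2 \, d\nu_t = \int P_t(g^2) \, d\nu \le \frac{1}{\lambda} \int |\nabla P_t g|^2 \, d\nu + (1 - e^{-2t}) \int P_t(|\nabla g|^2) \, d\nu \le \Big( \frac{e^{-2t}}{\lambda} + 1 - e^{-2t} \Big) \int |\nabla g|^2 \, d\nu_t, \]
where we used the heat kernel inequality $|\nabla P_t g|^2 \le e^{-2t} P_t(|\nabla g|^2)$ and again the symmetry of $P_t$. The claim follows. Towards the second step of the argument, recall that by integration by parts, for every $t > 0$,
\[ \frac{d}{dt} H(\nu_t) = - \int \frac{|\nabla P_t f|^2}{P_t f} \, d\gamma = - I(\nu_t). \]
As is classical (cf. [1,4]), with $v_t = \log P_t f$,
\[ \frac{d}{dt} I(\nu_t) = -2 \int \big( |\mathrm{Hess}(v_t)|^2 + |\nabla v_t|^2 \big) \, d\nu_t. \tag{2.13} \]
Since $\nu$ has a first moment, $|\nabla P_t f| \in L^1(\gamma)$ for every $t > 0$. Then, if $v_t = \log P_t f$, by the Gaussian integration by parts formula,
\[ \int \nabla v_t \, d\nu_t = \int \nabla P_t f \, d\gamma = \int x\, P_t f \, d\gamma = e^{-t} \int x f \, d\gamma = 0. \]
Since $\nu_t$ satisfies a Poincaré inequality with constant $\lambda_t$, applied to each partial derivative $\partial_i v_t$ of $v_t = log P_t f$ (which has mean zero under $\nu_t$ by the preceding identity), therefore
\[ \lambda_t \int |\nabla v_t|^2 \, d\nu_t \le \int |\mathrm{Hess}(v_t)|^2 \, d\nu_t. \]
As a consequence, $\frac{d}{dt} I(\nu_t) \le -2(\lambda_t + 1)\, I(\nu_t)$.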
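Integrating this differential inequality, for every $t \ge 0$,
\[ I(\nu_t) \le \frac{e^{-2t}}{\lambda e^{2t} + 1 - \lambda}\, I(\nu). \]
Finally, by de Bruijn's formula (2.12),
\[ H(\nu) \le I(\nu) \int_0^\infty \frac{e^{-2t}}{\lambda e^{2t} + 1 - \lambda} \, dt = \frac{1 - \lambda + \lambda \log \lambda}{2 (1-\lambda)^2}\, I(\nu), \]
and the conclusion follows. The proof of Theorem 1 is complete.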
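The last integration step can be double-checked numerically. The sketch below assumes that the Poincaré constant along the flow is $\lambda_t = \lambda / (\lambda + (1-\lambda)e^{-2t})$ and that the resulting decay factor of the Fisher information is $e^{-2t}/(\lambda e^{2t} + 1 - \lambda)$; it then compares a Simpson approximation of the time integral (after the substitution $u = e^{-2t}$) with the closed form $c(\lambda)/2$. All function names are illustrative.

```python
import math

def c(lam: float) -> float:
    """Closed form (1 - lam + lam*log(lam)) / (1 - lam)**2, with value 1/2 at lam = 1."""
    if lam == 1.0:
        return 0.5
    return (1.0 - lam + lam * math.log(lam)) / (1.0 - lam) ** 2

def lambda_t(lam: float, t: float) -> float:
    """Assumed Poincare constant of nu_t along the Ornstein-Uhlenbeck flow."""
    return lam / (lam + (1.0 - lam) * math.exp(-2.0 * t))

def fisher_decay_integral(lam: float, panels: int = 1000) -> float:
    """Simpson rule for int_0^inf e^{-2t} / (lam*e^{2t} + 1 - lam) dt.

    With u = e^{-2t} the integral becomes (1/2) * int_0^1 u / (lam + (1 - lam)*u) du.
    `panels` must be even.
    """
    def g(u: float) -> float:
        return u / (lam + (1.0 - lam) * u)
    h = 1.0 / panels
    total = g(0.0) + g(1.0)
    for k in range(1, panels):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return 0.5 * total * h / 3.0

for lam in (0.25, 0.5, 2.0, 4.0):
    print(f"lambda={lam}: integral={fisher_decay_integral(lam):.10f}  c/2={0.5 * c(lam):.10f}")
```

The interpolation $\lambda_0 = \lambda$, $\lambda_t \to 1$ as $t \to \infty$ reflects the fact that $\nu_t$ converges to $\gamma$, whose Poincaré constant is 1.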
We now turn to the proof of Theorem 5, which is based on mass transportation arguments.
Optimizing in α > 0 concludes the proof of Theorem 5.
Remark 8. In the proof of Theorem 5, [28] was employed to deduce $W^{2,1}$-regularity of the potential function $\varphi$. In our framework, one may also infer the regularity in a different way. Indeed, from [8] it follows that if $\varphi$ is not strictly convex at a point, then it is affine on a line. Since $\varphi$ is globally convex, this implies that it only depends on $(n-1)$ variables. In particular, $\nabla\varphi(\mathbb{R}^n)$ is contained in an $(n-1)$-dimensional subspace, and this contradicts that $\nabla\varphi$ pushes $d\gamma$ onto $f\,d\gamma$. Hence, $\varphi$ is strictly convex on $\mathbb{R}^n$, and the desired regularity follows from [14].

One dimensional estimates via mass transfer
The proof of Theorem 1 relies on heat kernel theory. In this section, we establish an $L^1$ estimate via mass transfer theory for measures satisfying a (1,1)-Poincaré inequality
\[ \lambda \int_{\mathbb{R}} |g| \, d\nu \le \int_{\mathbb{R}} |g'| \, d\nu \]
on the real line for some $\lambda > 0$ and every smooth mean zero $g : \mathbb{R} \to \mathbb{R}$. Sufficient conditions to guarantee the (1,1)-Poincaré inequality are given in [2] (see e.g. Theorem 1.5 there). In general, the $L^1$ Poincaré inequality is stronger than the standard $L^2$ inequality (1.4), which makes Theorem 9 below weaker than Theorem 1. However, the emphasis here is on the method of proof.
The next corollary is achieved as Corollary 2. As already mentioned, since Theorem 1 cannot hold for all probability measures, one may not hope to generalize Corollary 2 by enlarging the function space. However, this does not prevent the weaker estimates in Theorem 9 and Corollary 10 from being true in general. If these estimates held in full generality, without the assumption that ν satisfies some Poincaré inequality, then they would automatically recover the equality cases of the Gaussian logarithmic Sobolev inequality.
We conclude this section by proving a version of Corollaries 2 and 3 on the real line for probability measures satisfying a second moment bound (without assuming a Poincaré inequality). The proof is again based on mass transfer. Recall the functionφ (3.17) from the proof of Theorem 9.
Theorem 11. Let dν = f dγ be a probability measure on R with barycenter b = b(ν) such that Var ν (x) ≤ 1. Then, for some C > 0, In particular, for some numerical c > 0, where γ b is given in (1.2).
A multidimensional version of this result was proved in [7], with a smoothness assumption on f . The proof there is based on a rescaling property of the LSI. The contribution here is an alternative technique of proof. It would be of interest to see if the multidimensional version can be similarly obtained using transport arguments.
Proof. By approximation, it may be assumed that $f$ has compact support and is smooth enough with derivative at least in $L^1(\gamma)$. Letting as above $T : \mathbb{R} \to \mathbb{R}$ be the increasing map pushing $\nu$ onto $\gamma$, we have By Gaussian integration by parts, $\int_{\mathbb{R}} f' \, d\gamma = \int_{\mathbb{R}} x f \, d\gamma = b$ and similarly After some algebra, it follows that Using (3.15) and (3.18), we get that But $\int_{\mathbb{R}} |(\log f)' - b|^2 \, d\nu$ is the relative Fisher information of $\nu$ with respect to the non-centered Gaussian $d\gamma_b = e^{b \cdot x - b^2/2} \, d\gamma$, which satisfies a logarithmic Sobolev inequality with constant $\frac12$. Therefore, together with Talagrand's inequality (1.8), By definition of the Wasserstein distance $W_2$, under the assumption $\mathrm{Var}_\nu(x) \le 1$. Since $\varphi$ behaves quadratically near the origin, it finally follows that for some numerical $c > 0$,

The Bakry-Émery theorem for symmetric measures in P(λ)
In what follows we describe an extension of Theorem 1 to families of log-concave measures. Let $d\mu = e^{-V} dx$, where $V : \mathbb{R}^n \to \mathbb{R}$ is a smooth potential, be a probability measure on $\mathbb{R}^n$ satisfying the convexity condition (1.5), that is $\mathrm{Hess}(V) \ge \eta\, \mathrm{Id}$ for some $\eta > 0$. The Gaussian case corresponds to the quadratic potential $V(x) = \frac{|x|^2}{2}$ with $\eta = 1$.
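Given a probability measure $d\nu = f\,d\mu$ with density $f$ with respect to $\mu$, the relative entropy and Fisher information with respect to $\mu$ are defined as in the Gaussian case by
\[ H(\nu \,|\, \mu) = \int_{\mathbb{R}^n} f \log f \, d\mu, \qquad I(\nu \,|\, \mu) = \int_{\mathbb{R}^n} \frac{|\nabla f|^2}{f} \, d\mu, \]
and the Bakry-Émery LSI (see [3,34,35,4]) ensures that
\[ H(\nu \,|\, \mu) \le \frac{1}{2\eta}\, I(\nu \,|\, \mu). \]
As for the Gaussian LSI, the proof relies on the semigroup $(P^V_t)_{t \ge 0}$ with infinitesimal generator $L^V = \Delta - \nabla V \cdot \nabla$ for which the analogues of (2.12) and (2.13) read, with $d\nu_t = P^V_t f \, d\mu$ and $v_t = \log P^V_t f$,
\[ H(\nu \,|\, \mu) = \int_0^\infty I(\nu_t \,|\, \mu) \, dt, \qquad \frac{d}{dt} I(\nu_t \,|\, \mu) = -2 \int \Gamma_2(v_t) \, d\nu_t, \]
where, this time,
\[ \Gamma_2(v) = |\mathrm{Hess}(v)|^2 + \mathrm{Hess}(V)(\nabla v, \nabla v) \ge |\mathrm{Hess}(v)|^2 + \eta\, |\nabla v|^2. \]
If we try to mimic the proof of Theorem 1 in this context, it should be proved that as soon as $\nu$ belongs to $P(\lambda)$, $\nu_t$ belongs to $P(\lambda_t)$ with
\[ \lambda_t = \frac{\lambda \eta}{\lambda + (\eta - \lambda)\, e^{-2\eta t}} \]
(which is proved as in the Gaussian case), and that, whenever $\nu$ is centered, $\int_{\mathbb{R}^n} \nabla v_t \, d\nu_t = 0$ for all $t \ge 0$. The latter requirement is however not true in this general context. It can nevertheless hold in some more restricted setting, for example as soon as $V$ is even and $\nu$ is symmetric (i.e. if $f$ is also even), in which case $\int_{\mathbb{R}^n} \nabla v_t \, d\nu_t = \int_{\mathbb{R}^n} \nabla V \, d\nu_t = 0$.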
These observations lead to the following improvement of the Bakry-Émery theorem for symmetric measures in P(λ).
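Theorem 12. Assume that $d\mu = e^{-V} dx$ is a symmetric probability measure such that $\mathrm{Hess}(V) \ge \eta\, \mathrm{Id}$ for some $\eta > 0$, and let $d\nu = f\,d\mu$ be a symmetric probability measure in the class $P(\lambda)$ for some $\lambda > 0$. Then, for every $t \ge 0$,
\[ I(\nu_t \,|\, \mu) \le \frac{\eta\, e^{-2\eta t}}{\eta - \lambda + \lambda\, e^{2\eta t}}\, I(\nu \,|\, \mu). \]
Consequently, if $\lambda \ne \eta$,
\[ H(\nu \,|\, \mu) \le \frac{\eta - \lambda + \lambda \log(\lambda/\eta)}{2(\eta - \lambda)^2}\, I(\nu \,|\, \mu), \]
and, if $\lambda = \eta$,
\[ H(\nu \,|\, \mu) \le \frac{1}{4\eta}\, I(\nu \,|\, \mu). \]
Note that this result is not a stability result, since the constant given by the Bakry-Émery theorem is not optimal in general. Theorem 12 nevertheless yields improved estimates on the speed of convergence to equilibrium for the semigroup, of interest for example in the context of Markov chain Monte Carlo sampling of the measure $\mu$.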
Similar estimates can be obtained for measures which are given by bounded perturbations of uniformly convex potentials, using the Holley-Stroock approach. This includes the important example of the quartic double-well potential $V(x) = (x^2 - 1)^2$ (which is used in statistical physics for continuous versions of the Ising model).

Coherent state transform
For $h > 0$, let $d\mu_h$ denote $h^{-n}$ times the Lebesgue measure on $\mathbb{C}^n$ viewed as $\mathbb{R}^{2n}$. The coherent state transform is an integral transform mapping $(L^2(\mathbb{R}^n), dx)$ isometrically onto a subspace of $(L^2(\mathbb{R}^{2n}), d\mu_h)$ and given explicitly by
\[ \psi \mapsto L\psi(p,q) = e^{i p \cdot q / 2h_*} \int_{\mathbb{R}^n} e^{i p \cdot x / h_*}\, e^{-|x - q|^2 / 2h_*}\, \psi(x) \, dx \]
with $h_* = \frac{h}{2\pi}$. The map $L$ is built out of Weyl's representation of the Heisenberg group and has applications in quantum mechanics, where $|L\psi|^2$ is interpreted as the phase space density in the state $\psi$. Bounds on $|L\psi|^2$ are useful in estimating, e.g., the ground state energy of a Schrödinger operator (see [26,10]).
The concentration of a density $\rho$ can be measured via the entropy functional $S$ defined by
\[ S(\rho) = - \int_{\mathbb{R}^{2n}} \rho \log \rho \, d\mu_h. \]
Note that this is the physical entropy, which is the negative of the mathematical entropy. Wehrl [36] conjectured $n$ to be a lower bound on the entropy of phase space densities induced by $L$ acting on $(L^2(\mathbb{R}^n), dx)$, that is $S(\rho) \ge n$ whenever $\rho = |L\psi|^2$ and $\psi \in (L^2(\mathbb{R}^n), dx)$. Lieb [25] established this inequality with a method based on the sharp Young and Hausdorff-Young inequalities. Carlen [10] recovered Lieb's result via an approach based on the logarithmic Sobolev inequality and also settled the problem of characterizing the cases of equality.
In what follows we apply our results from the previous sections to show that in some configurations, one can obtain positive lower bounds on the Wehrl deficit Thus, if M > 3π h , the previous theorem applies in $F_M$. It is well known that the range of $L$ is closely related to the space $A^2$ of entire functions $\Phi$ on $\mathbb{C}^n$ such that $\int |\Phi(z)|^2 e^{-2\pi |z|^2 / h} \, dp \, dq < \infty$, where $z = (q + ip)/\sqrt{2}$. The precise statement is that for every $\psi \in (L^2(\mathbb{R}^n), dx)$, $L\psi(p,q) = e^{i p \cdot q / 2h_*}\, \Phi((q - ip)/\sqrt{2})\, e^{-(p^2 + q^2)/4h_*}$ where $\Phi \in A^2$. In fact, Segal [30,31] (see also [32]) proved that the map $L : \psi \mapsto \Phi$ is unitary from $(L^2(\mathbb{R}^n), dx)$ onto $A^2$, and therefore Carlen [10] calls $L$ the Segal transform. With this in mind, the Segal transform may be useful in characterizing the subspace of functions $\psi$ in the domain of $L$ mapping to functions $|L\psi|^2$ admitting a Poincaré inequality and hence a dimensionless $W_2$-estimate via Theorem 14.