Weak regularization by stochastic drift: result and counter example

In this paper, weak uniqueness of hypoelliptic stochastic differential equations with Hölder drift is proved when the Hölder exponent is strictly greater than 1/3. This result then “extends” to a weak framework the previous works [4, 23, 10], where strong uniqueness was proved when the regularity index of the drift is strictly greater than 2/3. Part of the result is also shown to be almost sharp thanks to a counter example when the Hölder exponent of the degenerate component is just below 1/3. The approach is based on the martingale problem formulation of Stroock and Varadhan and hence on smoothing properties of the associated PDE which is, in the current setting, degenerate. This is a reprint version of an article published in Discrete & Continuous Dynamical Systems - A, 2018, 38 (3): 1269-1291.


Introduction
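Let $d$ be a positive integer and $\mathcal{M}_d(\mathbb{R})$ be the set of $d \times d$ matrices with real coefficients. For a given positive $T$, given measurable functions $F_1, F_2, \sigma : [0,T] \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d \times \mathbb{R}^d \times \mathcal{M}_d(\mathbb{R})$ and $(B_t, t \geq 0)$ a standard $d$-dimensional Brownian motion defined on some filtered probability space $(\Omega, \mathcal{F}, \mathbb{P}, (\mathcal{F}_t)_{t \geq 0})$, we consider the following $\mathbb{R}^d \times \mathbb{R}^d$ system for any $t$ in $[0,T]$: $$ dX^1_t = F_1(t, X^1_t, X^2_t)\, dt + \sigma(t, X^1_t, X^2_t)\, dB_t, \quad X^1_0 = x_1, \qquad dX^2_t = F_2(t, X^1_t, X^2_t)\, dt, \quad X^2_0 = x_2, \qquad (1.1) $$ where $x_1$ and $x_2$ belong to $\mathbb{R}^d$ and where the diffusion matrix $a := \sigma\sigma^*$ is supposed to be uniformly elliptic.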
In this work, we aim at proving that this system is well-posed (i.e. there exists a unique solution), in the weak sense, when the drift is singular. In that case, the uniqueness result for the associated martingale problem from Stroock and Varadhan's theory [21] does not apply, since the noise of the system degenerates. We nevertheless show that under a suitable Hölder assumption on the drift, a Lipschitz condition on the diffusion matrix, and a hypoellipticity condition on the system, weak well-posedness holds for (1.1). By suitable, we mean that there exists a threshold for the Hölder-continuity of the drift with respect to (w.r.t.) the degenerate argument: this Hölder exponent is supposed to be strictly greater than 1/3. We also show that this threshold is almost sharp thanks to a counter-example when the Hölder exponent is strictly less than 1/3.
Mathematical background. It may be a real challenge to show well-posedness of a differential system with drift less than Lipschitz continuous (see [9] for a work in that direction). The Peano example is a very good illustration of this phenomenon: for any $\alpha$ in $(0,1)$ the equation $$ dY_t = \mathrm{sign}(Y_t)\,|Y_t|^{\alpha}\, dt, \quad Y_0 = 0, \qquad (1.2) $$ has an infinite number of solutions of the form $\pm c_\alpha (t - t^\star)^{1/(1-\alpha)} \mathbf{1}_{[t^\star; +\infty)}$, $t^\star \in [0,T]$. Nevertheless, it has been shown that this equation is well-posed (in a strong and weak sense) as soon as it is infinitesimally perturbed by a Brownian motion. More precisely, the equation $$ dX_t = b(X_t)\, dt + dB_t, \quad X_0 = x \in \mathbb{R}^d, \qquad (1.3) $$ admits a unique strong solution (i.e. there exists an almost surely unique solution adapted to the filtration generated by the Brownian motion) as soon as the function $b : \mathbb{R}^d \ni x \mapsto b(x) \in \mathbb{R}^d$ is measurable and bounded. This phenomenon is known as regularization by noise.
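As a quick check of the explicit family of solutions above, one can verify directly that $t \mapsto c_\alpha t^{1/(1-\alpha)}$ solves the Peano equation. The sketch below assumes that the equation reads $\dot Y_t = \mathrm{sign}(Y_t)|Y_t|^{\alpha}$ with $Y_0 = 0$ and that $c_\alpha = (1-\alpha)^{1/(1-\alpha)}$, which is the normalization used later in the counter-example.

```latex
% Sketch: Y_t = c_alpha * t^{1/(1-alpha)} solves Y' = sign(Y)|Y|^alpha, Y_0 = 0,
% for alpha in (0,1), under the assumed normalization of c_alpha.
\begin{align*}
\dot Y_t &= \frac{c_\alpha}{1-\alpha}\, t^{\frac{1}{1-\alpha}-1}
          = \frac{c_\alpha}{1-\alpha}\, t^{\frac{\alpha}{1-\alpha}}, \\
|Y_t|^{\alpha} &= c_\alpha^{\alpha}\, t^{\frac{\alpha}{1-\alpha}}, \\
\frac{c_\alpha}{1-\alpha} = c_\alpha^{\alpha}
  &\iff c_\alpha^{1-\alpha} = 1-\alpha
   \iff c_\alpha = (1-\alpha)^{\frac{1}{1-\alpha}} .
\end{align*}
```

Since $Y \equiv 0$ is also a solution and the equation is autonomous and odd, gluing the zero solution on $[0, t^\star]$ with a time-shifted copy of $\pm c_\alpha t^{1/(1-\alpha)}$ gives the whole family indexed by $t^\star$.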
Regularization by noise of systems with singular drift has been widely studied in the past few years. Since the pioneering one-dimensional work of Zvonkin [26] and its generalization to the multidimensional setting by Veretennikov [22] (where stochastic systems with bounded drift and additive noise are handled), several authors have extended the result. Krylov and Röckner [17] showed that SDEs with additive noise and $L^p$ drift (where $p$ depends on the dimension of the system) are also well-posed, and Zhang [24] proved the case of multiplicative noise with uniformly elliptic and Sobolev diffusion matrix. More recently, Flandoli, Issoglio and Russo [11] studied the weak well-posedness of (1.3) for a distributional drift, i.e. in the Hölder space $C^{\alpha}$ where $\alpha$ is strictly greater than $-1/2$, and Delarue and Diel [5] for Hölder regularity strictly greater than $-2/3$ in the one-dimensional case. This last result has been generalized to any dimension by Cannizzaro and Chouk [2]. Also, Catellier and Gubinelli [3] considered systems perturbed by fractional Brownian motion. We refer to the notes of Flandoli [12] for a general account of this topic.
In our case the setting is a bit different since the noise added in the system acts only by means of a random drift (i.e. the system degenerates). Indeed, taking $F_1 = 0$, $\sigma = I_d$, $F_2(t, x_1, x_2) = x_1 + f_2(x_2)$, the archetypal example of system (1.1) reads $$ dX^1_t = dB_t, \qquad dX^2_t = \big(X^1_t + f_2(X^2_t)\big)\, dt, \qquad (1.4) $$ where the function $f_2$ is supposed to be only Hölder-continuous. Thus, the system can be seen as a classical ODE whose drift is perturbed by a Brownian motion: the perturbation is then of macroscopic type. We hence consider a regularization by stochastic drift. The first work in that direction is due to Chaudru de Raynal [4], where strong well-posedness of (1.1) is proved when the drift is Hölder continuous with Hölder exponent w.r.t. the degenerate argument strictly greater than 2/3 and where the system is also supposed to be hypoelliptic. Since then, several authors have studied the strong well-posedness of (1.1) with different approaches and have obtained, with weaker conditions, the same kind of threshold: in [23], the authors used an approach based on gradient estimates on the associated semi-group to show that the system is strongly well-posed when the drift satisfies a Hölder-Dini condition with Hölder exponent of 2/3 w.r.t. the degenerate component; in [10], the authors used a PDE approach and obtained strong well-posedness as soon as the drift is weakly differentiable in the degenerate direction, with order of differentiation of 2/3.
Hence, many techniques have been used to study the strong well-posedness of such a system and all of them end with this particular threshold of 2/3. Thus, this critical value does not seem to be an artefact, but something deeper, related to the nature of the system and of the well-posedness as well. When investigating the sharpness of such a threshold, we obtained a class of counter-examples for weak uniqueness as soon as the Hölder regularity of the drift (of the degenerate component) is less than 1/3. Actually, this counter-example is somewhat more general since it allows one to obtain thresholds for the weak well-posedness of a large class of perturbations of (1.2) (including our present degenerate setting). This then led us to investigate what could be expected in that case. Using the Zvonkin theory together with a martingale problem approach, we succeeded in extending the previously known results to this setting and obtained an almost sharp weak well-posedness result for the drift of the second component.

Strategy of proof.
Our strategy relies on the martingale problem approach of Stroock and Varadhan [21]. We indeed know that under our setting (coefficients with at most linear growth) the system (1.1) admits at least a weak solution. We then show that this solution is unique. To do so, we investigate the regularity of the (mild) solution of the associated PDE. Namely, denoting by $\mathrm{Tr}(a)$ the trace of the matrix $a$, by "·" the standard Euclidean inner product on $\mathbb{R}^d$ and by $\mathcal{L}$ the generator of (1.1): $$ \mathcal{L} := F_1 \cdot D_1 + F_2 \cdot D_2 + \frac{1}{2} \mathrm{Tr}\big(a D_1^2\big), \qquad (1.5) $$ we exhibit a "good" theory for the PDE (1.6), driven by $\mathcal{L}$ with source term $f$, set on the cylinder $[0,T) \times \mathbb{R}^{2d}$ with terminal condition 0 at time $T$ and where the function $f$ belongs to a certain class of functions $\mathcal{F}$.
By "good", we mean that we can consider a sequence of classical solutions $(u_n)_{n \geq 0}$ and associated derivatives in the non-degenerate direction $(D_{x_1} u_n)_{n \geq 0}$ along a sequence of mollified coefficients $(F_1^n, F_2^n, a^n)_{n \geq 0}$ that satisfy a priori estimates depending only on the regularity of $(F_1, F_2, a)$. By the Arzelà-Ascoli Theorem, this allows us to extract a subsequence converging to the mild solution of (1.6) on every compact subset of $[0,T] \times \mathbb{R}^{2d}$.
Hence, thanks to Itô's Formula, one can show that the process obtained by compensating $u(t, X^1_t, X^2_t)$ with the time integral of $f$ along the weak solution is a martingale. By letting the class of functions $\mathcal{F}$ be sufficiently rich, this allows us to prove uniqueness of the marginals of the weak solution of (1.1) and then of the law itself.
Here, the crucial point is that the operator is not uniformly parabolic: the second order differentiation operator in L only acts in the first (and non-degenerate) direction "x 1 ". Therefore, we expect a loss of the regularization effect w.r.t. the degenerate component of (1.1). Nevertheless, we show that the noise still regularizes, even in the degenerate direction, by means of the random drift: we can benefit from the hypoellipticity of the system.
The system (1.4) indeed relies on the so-called Kolmogorov example [16], which is also the archetypal example of a hypoelliptic system without elliptic diffusion matrix. In our setting, the hypoellipticity assumption translates as a non-degeneracy assumption on the derivative of the drift function $F_2$ w.r.t. the first component. Together with the Hölder assumption, this can be seen as a weak Hörmander condition, in reference to the work of Hörmander [15] on degenerate operators in divergence form.
Let us emphasize that, from the point of view of Hörmander's framework, the term "regularization by stochastic drift" reflects the fact that one needs the vector field of the drift of the second component to be (uniformly) non-degenerate w.r.t. the first spatial direction in order to make the family of Lie brackets associated with the vector fields of $\mathcal{L}$ span the whole space, including the degenerate direction.
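The bracket computation behind this remark can be illustrated on a toy case. The sketch below is only formal: it assumes $d = 1$, $\sigma = 1$, $F_1 = 0$ and a smooth drift $F_2$, and the names $A_0$, $A_1$ for the vector fields are introduced here for illustration only.

```latex
% Formal sketch, d = 1: A_1 is the diffusive field, A_0 the drift field of the
% second component; phi is a smooth test function.
\begin{align*}
A_1 &= \partial_{x_1}, \qquad A_0 = F_2(x_1, x_2)\, \partial_{x_2}, \\
[A_1, A_0]\varphi
  &= \partial_{x_1}\big(F_2\, \partial_{x_2}\varphi\big)
   - F_2\, \partial_{x_2}\partial_{x_1}\varphi
   = (\partial_{x_1} F_2)\, \partial_{x_2}\varphi .
\end{align*}
% Kolmogorov's example: F_2(x_1, x_2) = x_1 gives [A_1, A_0] = \partial_{x_2}.
```

Thus $A_1$ and $[A_1, A_0]$ span $\mathbb{R}^2$ at every point exactly when $\partial_{x_1} F_2$ does not vanish, which is the non-degeneracy required of the drift of the second component.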
Our system appears as a non-linear generalization of Kolmogorov's example. Degenerate operators of this form have been studied by many authors; see e.g. the works of Di Francesco and Polidoro [8], and Delarue and Menozzi [7]. We also emphasize that, in [19], Menozzi proved the weak well-posedness of a generalization of (1.1) with Lipschitz drift and Hölder diffusion matrix.
Nevertheless, to the best of our knowledge a "good" theory, in the sense mentioned above, for the PDE (1.6) has not been exhibited yet. We here prove the aforementioned estimates by using a first order parametrix (see [13]) expansion of the operator L defined by (1.5). This parametrix expansion is based on the knowledge of the related linearized and frozen version of (1.1) coming essentially from the previous work of Delarue and Menozzi [7].
Minimal setting to restore uniqueness. Obviously, all the aforementioned works, as well as this one, lead to the question of the minimal assumption that could be made on the drift in order to restore well-posedness. Having in mind that most of these works use a PDE approach, it seems clear that the assumption on the drift relies on the regularization properties of the semi-group generated by the solution. In comparison with the previous works, the threshold of 1/3 can be seen as the price to pay to balance the degeneracy of the system: the smoothing effect of the semi-group associated with a degenerate Gaussian process is less efficient than that of a non-degenerate Gaussian process. We prove that our assumptions are (almost) minimal by giving a counter-example in the case where the drift $F_2$ is Hölder continuous with Hölder exponent just below 1/3.
Although this example concerns our degenerate case, we feel that the method could be adapted in order to obtain the optimal threshold (for the weak well posedness) in other settings. This is the reason why we wrote it in a general form. Let us briefly explain why and expose the heuristic rule behind our counter example.
It relies on the work of Delarue and Flandoli [6]. In this paper, the Peano example is investigated: namely, the system of interest is $$ dX^{\epsilon}_t = \mathrm{sign}(X^{\epsilon}_t)\,|X^{\epsilon}_t|^{\alpha}\, dt + \epsilon\, dB_t, \quad X^{\epsilon}_0 = 0. \qquad (1.7) $$ The authors studied the zero-noise limit of the system ($\epsilon \to 0$) pathwise. In doing so, they highlighted the following crucial phenomenon: in small time there is a competition between the irregularity of the drift and the fluctuations of the noise. The fluctuations of the noise allow the solution to leave the singularity while the irregularity of the drift (possibly) captures the solution in the singularity. Thus, the more singular the drift is, the more irregular the noise has to be.
This competition can be made explicit. In order to regularize the equation, the noise has to dominate the system in small time. This means that there must exist a time $0 < t_\epsilon < 1$ such that, below this instant, the noise dominates the system and pushes the solution far enough from the singularity, while above, the drift dominates the system and constrains the solution to fluctuate around one of the extreme solutions of the deterministic Peano equation. A good way to see what the instant $t_\epsilon$ looks like is to compare the fluctuations of the extreme solutions ($\pm t^{1/(1-\alpha)}$) with the fluctuations of the noise. Denoting by $\gamma$ the order of the fluctuations of the noise, this leads to the equation $$ \epsilon\, t_\epsilon^{\gamma} = t_\epsilon^{1/(1-\alpha)}, $$ which gives $t_\epsilon = \epsilon^{(1-\alpha)/(1-\gamma(1-\alpha))}$ and leads to the condition: $$ \alpha > 1 - \frac{1}{\gamma}. \qquad (1.8) $$ The counter example, which also compares the fluctuations of the noise with the extreme solutions, leads to the same threshold and says that weak uniqueness fails below this threshold.
Obviously, cases where α < 0 have to be considered carefully. But if we formally consider the case of a Brownian perturbation, we get γ = 1/2 and so α > −1, which is the sharp threshold exhibited in the recent work of Beck, Flandoli, Gubinelli and Maurelli [1].
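The arithmetic behind this heuristic rule can be spelled out as follows. This is only a sketch, assuming that the balance between the noise and the extreme solution takes the form $\epsilon\, t^{\gamma} = t^{1/(1-\alpha)}$ with $\alpha < 1$, $\gamma > 0$ and $\epsilon \in (0,1)$.

```latex
% Sketch: solving the assumed balance equation for t_eps and reading off the threshold.
\begin{align*}
\epsilon\, t_\epsilon^{\gamma} = t_\epsilon^{\frac{1}{1-\alpha}}
 &\iff \epsilon = t_\epsilon^{\frac{1}{1-\alpha}-\gamma}
             = t_\epsilon^{\frac{1-\gamma(1-\alpha)}{1-\alpha}}
 \iff t_\epsilon = \epsilon^{\frac{1-\alpha}{1-\gamma(1-\alpha)}}, \\
0 < t_\epsilon < 1 \text{ for all } \epsilon \in (0,1)
 &\iff 1-\gamma(1-\alpha) > 0
 \iff \alpha > 1-\frac{1}{\gamma}, \\
\gamma = \tfrac12 &\implies \alpha > -1, \qquad
\gamma = \tfrac32 \implies \alpha > \tfrac13 .
\end{align*}
```

The first value corresponds to a Brownian perturbation and the second to the degenerate setting of this paper.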
In our setting, as suggested by the example (1.4), the noise added in (4.1) can be seen as the integral of a Brownian path, which gives $\gamma = 3/2$. We deduce from equation (1.8) that the threshold for the Hölder-regularity of the drift is 1/3. We finally emphasize that this heuristic rule gives another (pathwise) interpretation for our threshold in comparison with the one obtained in the non-degenerate cases. Since the noise added in our system degenerates, the fluctuations (which are typically of order 3/2) are not strong enough to push the solution far enough from the singularity when the drift is too singular (say less than $C^{1/3}$).
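The order 3/2 of the fluctuations can be checked by a direct second-moment computation. The sketch below assumes a one-dimensional standard Brownian motion $B$ and uses only $\mathbb{E}[B_s B_r] = \min(s,r)$.

```latex
% Sketch: variance of the time integral of a one-dimensional Brownian path.
\begin{align*}
\mathrm{Var}\Big(\int_0^t B_s\, ds\Big)
 &= \int_0^t\!\!\int_0^t \mathbb{E}[B_s B_r]\, dr\, ds
  = \int_0^t\!\!\int_0^t \min(s,r)\, dr\, ds \\
 &= 2\int_0^t\!\!\int_0^s r\, dr\, ds
  = \int_0^t s^2\, ds
  = \frac{t^3}{3} .
\end{align*}
```

The standard deviation is therefore $t^{3/2}/\sqrt{3}$, to be compared with $t^{1/2}$ for the Brownian motion itself.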
Organization of this paper. This paper is organized as follows. In Section 2, we give our main results: weak existence and uniqueness hold for (1.1), and part of this result is almost sharp. Smoothing properties of the PDE (1.6) are given in Section 3 as well as the proof of our main result. Then, we discuss the almost sharpness of the result in Section 4 by giving a counter-example. Finally, the regularization properties of the PDE (1.6) are proved in Section 5.

Notations, assumptions and main results
Notations. In order to simplify the notations, we adopt the following convention: $x, y, z, \xi$, etc. denote the $2d$-dimensional real variables $(x_1, x_2)$, $(y_1, y_2)$, $(z_1, z_2)$, $(\xi_1, \xi_2)$, etc. Likewise, $X_t$ denotes $(X^1_t, X^2_t)$ and, when necessary, we write $(X^{t,x}_s)_{t \leq s \leq T}$ for the process defined by (1.1) which starts from $x$ at time $t$, i.e. such that $X^{t,x}_t = x$. We denote by $\mathcal{M}_d(\mathbb{R})$ the set of real $d \times d$ matrices, by "Id" the identity matrix of $\mathcal{M}_d(\mathbb{R})$ and by $B$ the $2d \times d$ matrix $B = (\mathrm{Id}, 0_{\mathbb{R}^d \times \mathbb{R}^d})^*$. We write $GL_d(\mathbb{R})$ for the set of $d \times d$ invertible matrices with real coefficients. We recall that $a$ denotes the square of the diffusion matrix $\sigma$, $a := \sigma\sigma^*$. The canonical Euclidean inner product on $\mathbb{R}^d$ is denoted by "·". Subsequently, we denote by $c, C, c', C', c''$, etc. a positive constant, depending only on known parameters in (H), given just below, that may change from line to line and from one equation to another.
For any function defined on $[0,T] \times \mathbb{R}^d \times \mathbb{R}^d$, we use the notation $D$ to denote the total space derivative, and we denote by $D_1$ (resp. $D_2$) the derivative with respect to the first (resp. second) $d$-dimensional space component. In the same spirit, the notation $D_z$ means the derivative w.r.t. the variable $z$. Hence, for all integers $n$, $D^n_z$ is the $n$-th derivative w.r.t. $z$ and for all integers $m$ the $n \times m$ cross differentiations w.r.t. $z$, $y$ are denoted by $D^n_z D^m_y$. Furthermore, the partial derivative $\partial/\partial t$ is denoted by $\partial_t$.
Assumptions (H). We say that assumptions (H) hold if the following assumptions are satisfied.
(H1) Regularity of the coefficients: there exist $0 < \beta_i^j \leq 1$, $1 \leq i, j \leq 2$, and three positive constants $C_1$, $C_2$, $C_\sigma$ such that for all $t$ in $[0,T]$ and all $(x_1, x_2)$ and $(y_1,$ Moreover, the coefficients are supposed to be continuous w.r.t. the time and the exponents $\beta_i^2$, $i = 1, 2$, are supposed to be strictly greater than 1/3. Note that $\beta_2^1 = 1$ but we keep it for notational convenience.
(H2) Uniform ellipticity of $\sigma\sigma^*$: The function $\sigma\sigma^*$ satisfies the uniform ellipticity hypothesis:
(H3-a) Differentiability and regularity of is continuously differentiable and there exist $0 < \eta < 1$ and a positive constant $\bar{C}_2$ such that, for all $(t,$ We emphasize that this implies that
Here are the main results of this paper:
Theorem 2.1. Suppose that assumptions (H) hold and in addition that Then, there exists a unique weak solution to (1.1).
Remark. Let us say more about the regularity assumed on $F_2$ w.r.t. the non-degenerate variable $x_1$: the uniform differentiability assumption together with the non-degeneracy of the derivative (H3-b) are crucial in that framework because they guarantee the hypoellipticity of the system and so allow the noise to propagate through the second component.
We emphasize that one may object, as is proved in the works [25] and [10], that the optimal condition on the drift of the non-degenerate component $F_1$ that ensures weak uniqueness is an appropriate integrability condition. This is, in our opinion, true. But the main point is that in those two works the system considered is linear in the second component: $F_2(t, x_1, x_2) = x_1$. This makes it possible, in particular, to use the Girsanov Theorem and to reduce the system to a degenerate Ornstein-Uhlenbeck process. Here, the dependence of $F_2$ upon $x_2$ as well as the non-linearity break down these arguments and, as far as we can see, the generalization to our case is non-trivial.
The way we apply our parametrix method to investigate the smoothing properties of the semi-group of (1.1) is somewhat global, in the sense that it does not allow us to distinguish between the different components $(F_1, F_2)$ of the drift of (1.1). We then have to impose the same kind of assumptions on the drift function $F_1$ and the linearized drift of $F_2$ (see system (5.2) below). This is the reason why both $\beta_1^1$ and $\eta$ are strictly positive and $\beta_1^2$ and $\beta_2^2$ are supposed to be strictly greater than 1/3, although our counter-example only holds for the Hölder exponent of the drift of the second component in the degenerate variable. Finally, let us mention that in [20], the author proved $L^p$ estimates for the semi-group of (1.1) by using a parametrix approach when the drift functions are supposed to be Lipschitz in space. This could be a first step to extend our result in further work.
Finally, let us comment on the almost sharpness of this result, namely Theorem 2.2, which will be proved in Section 4. This exponent of 1/3 can be immediately deduced from the heuristic rule previously given. Concerning the critical value for the Hölder exponent (1/3), our feeling is that weak uniqueness holds, but we were not able to prove this result, especially because our approach is not adapted to reach this critical case.

PDE result and proof of Theorem 2.1
Let us first begin by giving the smoothing properties of the PDE (1.6). Let (F n 1 , F n 2 , a n ) n≥0 be a sequence of mollified coefficients (say infinitely differentiable with bounded derivatives of all orders greater than 1) satisfying (H) uniformly in n that converges to (F 1 , F 2 , a) uniformly on [0, T ]×R d ×R d (such an example of coefficients can be found in [4]). Let us denote by (L n ) n≥0 the associated sequence of regularized versions of the operator L defined by (1.5). We have the following result.
Moreover, there exist a positive $T_{3.1}$, a positive $\delta_{3.1}$ and a positive $\nu$, depending on known parameters in (H) only, such that for all $T$ less than $T_{3.1}$ the solution of the regularized PDE (1.6) with source term $f$ satisfies: where Moreover, each classical solution $u_n$ is uniformly bounded on every compact subset of Let us just notice that the constant $C$ above also depends on the class of functions $\mathcal{F}$ through the Lipschitz norm of functions belonging to this class.
Proof. The proof of this result is postponed to Section 5.
We are now in position to prove uniqueness of the martingale problem associated to (1.1). Under our assumptions, it is clear from Theorem 6.1.7 of [21] that the system (1.1) has at least one weak solution (the linear growth assumption assumed here is not a problem to do so). Let $u_n$ be the classical solution of the regularized PDE (1.6) with source term $f$ and let $(X^1, X^2)$ be a weak solution of (1.1) starting from $x$ at time 0. Let us now suppose that $T$ is less than $T_{3.1}$ given in Theorem 3.1. Applying Itô's Formula on $u_n(t, X^1_t, X^2_t)$ we obtain that since $u_n$ is the solution of the regularized version of (1.6) and where we recall that $B$ is the $2d \times d$ matrix: Thanks to Theorem 3.1 and the Arzelà-Ascoli Theorem, we know that we can extract a subsequence of $(u_n)_{n \geq 0}$ and $(D_1 u_n)_{n \geq 0}$ that converge respectively to a function $u$ and $D_1 u$ uniformly on compact subsets of $[0,T] \times \mathbb{R}^d \times \mathbb{R}^d$. Thus, together with the uniform convergence of the regularized coefficients, we can deduce that is a $\mathbb{P}$-martingale by letting $n$ tend to infinity in the regularization procedure.
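The Itô expansion used in this step can be sketched as follows. This is only an illustration under assumptions made explicit here: the system (1.1) is taken in the form $dX^1_t = F_1\, dt + \sigma\, dB_t$, $dX^2_t = F_2\, dt$, the generator is $\mathcal{L} = F_1 \cdot D_1 + F_2 \cdot D_2 + \frac{1}{2}\mathrm{Tr}(a D_1^2)$, and $\mathcal{L}^n$ denotes the same operator with the mollified coefficients.

```latex
% Sketch (assumed form of (1.1) and of the generator L; u_n is smooth, X_0 = x).
\begin{align*}
u_n(t, X_t) - u_n(0, x)
 &= \int_0^t (\partial_s + \mathcal{L}) u_n(s, X_s)\, ds
  + \int_0^t D_1 u_n(s, X_s) \cdot \sigma(s, X_s)\, dB_s \\
 &= \int_0^t (\partial_s + \mathcal{L}^n) u_n(s, X_s)\, ds
  + \int_0^t (\mathcal{L} - \mathcal{L}^n) u_n(s, X_s)\, ds \\
 &\quad + \int_0^t D_1 u_n(s, X_s) \cdot \sigma(s, X_s)\, dB_s .
\end{align*}
```

In this decomposition the first integrand is prescribed by the regularized PDE in terms of the source term $f$, the second is the error due to the mollification of the coefficients, and the last term is a stochastic integral against $B$ which only involves the derivative in the non-degenerate direction.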
Let us now come back to the canonical space, and let $\mathbb{P}$ and $\tilde{\mathbb{P}}$ be two solutions of the martingale problem associated to (1.1) with initial condition $(x_1, x_2)$ in $\mathbb{R}^d \times \mathbb{R}^d$. Thus, for all continuous in time and Lipschitz in space functions $f : [0,T] \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ we have from (3.3) (recall that by definition we have that $u(T, \cdot, \cdot) = 0$), so that the marginal laws of the canonical process are the same under $\mathbb{P}$ and $\tilde{\mathbb{P}}$. We extend the result to $\mathbb{R}_+$ thanks to regular conditional probabilities, see [21], Chapter 6.2. Uniqueness then follows from Corollary 6.2.4 of [21].

Counter example
We here prove Theorem 2.2. As we said, we feel that this counter example is not restricted to our current setting. Hence, we wrote it in a general form so that it can be adapted to different cases. Let $W$ be a random process with continuous paths satisfying, in law, and $\mathbb{E}|W_1| < +\infty$ for some given $\gamma > 0$. Let $\alpha < 1$ and $c_\alpha := (1-\alpha)^{1/(1-\alpha)}$. We suppose that $W$ and $\alpha$ are such that there exists a weak solution of (4.1) for any $x \geq 0$ that satisfies Kolmogorov's criterion. Given $0 < \beta < 1$ we define for any continuous path $Y$ from $\mathbb{R}_+$ to $\mathbb{R}$ the variable $\tau(Y)$ as We now have the following Lemma whose proof is postponed to the end of this section:
Lemma 4.1. Let $X$ be a weak solution of (4.1) starting from some $x > 0$ and suppose that $\alpha < 1 - 1/\gamma$. Then, there exists a positive $\rho$, depending on $\alpha$, $\beta$, $\gamma$ and $\mathbb{E}|W_1|$ only, such that $$ \mathbb{P}_x\big(\tau(X) \geq \rho\big) \geq 3/4. \qquad (4.2) $$
We are now in position to give our counter-example. Note that if $(X, W)$ is a weak solution of (4.1) with the initial condition $x = 0$, then $(-X, -W)$ is also a weak solution of (4.1). Hence, if uniqueness in law holds, $X$ and $-X$ have the same law.
Let us consider a weak solution $X^n$ of (4.1) starting from $1/n$, $n$ being a positive integer. Since each $X^n$ satisfies Kolmogorov's criterion, the sequence of laws $(\mathbb{P}_{1/n})_{n \geq 0}$ of $X^n$ is tight, so that we can extract a subsequence $(\mathbb{P}_{1/n_k})_{k \geq 0}$ converging to $\mathbb{P}_0$, the law of the weak solution $X$ of (4.1) starting from 0. Since the bound in (4.2) does not depend on the initial condition we get that $\mathbb{P}_0(\tau(X) \geq \rho) \geq 3/4$ and, thanks to uniqueness in law, $\mathbb{P}_0(\tau(-X) \geq \rho) \geq 3/4$, which is a contradiction. Choosing $W$ to be the time integral of a Brownian path, $W = \int_0^{\cdot} B_s\, ds$, so that $\gamma = 3/2$, we get that weak uniqueness fails as soon as $\alpha < 1 - 1/\gamma = 1/3$. We now prove Lemma 4.1, which explains how the threshold above, exhibited in the introduction, also appears in our counter-example.

Smoothing properties of the PDE
This section is dedicated to the proof of Theorem 3.1. This proof is in the same spirit and uses the same kind of tools as the one introduced in the previous work [4]. We nevertheless emphasize that our analysis is here quite different: although the tools are the same, the objective differs from the previously mentioned work. Firstly, our PDE does not have the same source term; secondly the controls we want to obtain on the solution are weaker; thirdly our regularity assumptions on the coefficients of the operator L in (1.6) are weaker so that we have to be careful.
Our main strategy for proving Theorem 3.1 rests upon the parametrix approach (see [18], [13]). This perturbation approach consists in expanding the operator $\mathcal{L}$ around a well-chosen proxy, denoted by $\tilde{\mathcal{L}}$, which enjoys suitable properties and can be handled. Hence we rewrite our PDE (1.6) as: and we call parametrix kernel the second term on the right hand side of the equation above (i.e. the approximation error). By doing so, we obtain a representation of our PDE solution in terms of a time-space convolution of the source term and the parametrix kernel against the fundamental solution of our proxy. This is the reason why the choice of this proxy as well as its properties are crucial in our analysis.
The idea is to obtain a Gaussian approximation, which means that we aim at obtaining a proxy $\tilde{\mathcal{L}}$ which is the generator of a Gaussian process. This Gaussian process has to be as close as possible to our original process. Having in mind the Kolmogorov example, this means that the dependence w.r.t. the noise of each component has to be of linear form (see Section 2 of [4] for more details or [7] for a more general account of this topic).
We thus end with a proxy $\tilde{\mathcal{L}}$ which is an operator whose coefficients are the linearized versions of those of the operator $\mathcal{L}$. As already emphasized, it is the generator of a degenerate Gaussian process relying on the Kolmogorov example. Namely, this process has a transition density whose covariance matrix is homogeneous to the covariance matrix of a Brownian motion and its time integral. This proxy is not new: it was introduced by Delarue and Menozzi in [7] and then successfully used by Menozzi in [19, 20] to prove weak well-posedness of a generalization of (1.1) when the drift coefficients are Lipschitz continuous and the diffusion matrix $\sigma$ is respectively Hölder continuous and solely continuous.
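For reference, the covariance structure mentioned above can be written down explicitly in the simplest case. The sketch below assumes $d = 1$ and a standard Brownian motion $B$ started at 0; the symbols $K_t$ and $\mathbb{T}_t$ are introduced here for illustration only.

```latex
% Sketch, d = 1: covariance matrix K_t of the pair (B_t, \int_0^t B_s ds),
% using E[B_s B_r] = min(s, r).
\begin{align*}
\mathrm{Cov}\Big(B_t, \int_0^t B_s\, ds\Big) &= \int_0^t \min(t, s)\, ds = \frac{t^2}{2},
\qquad
K_t = \begin{pmatrix} t & t^2/2 \\ t^2/2 & t^3/3 \end{pmatrix}, \\
\det K_t &= \frac{t^4}{3} - \frac{t^4}{4} = \frac{t^4}{12} > 0 \quad (t > 0), \\
K_t &= \mathbb{T}_t\, K_1\, \mathbb{T}_t,
\qquad \mathbb{T}_t := \mathrm{diag}\big(t^{1/2},\, t^{3/2}\big).
\end{align*}
```

The matrix is non-degenerate for every $t > 0$ even though the noise only acts on the first component, and the factorization displays the two time scales $t^{1/2}$ and $t^{3/2}$ of the non-degenerate and degenerate directions.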
Since our Gaussian proxy (process, semi-group, generator, transition density) plays a central role in our analysis, we dedicated the first part of this section (Subsection 5.1) to the study and the properties of such a system. These tools have already been introduced in [4], we then collect from this work all the necessary ingredients and let the reader check the proof in the aforementioned paper. Then, we prove Theorem 3.1 in Subsection 5.2.
Let us notice that Theorem 3.1 concerns the solution of the regularized version of (1.6). Thus, for the sake of clarity, we forget the superscript n that follows from the regularization procedure and we suppose throughout this section that the coefficients F 1 , F 2 and a := σσ * are smooth (say infinitely differentiable with bounded derivatives of all orders greater than one). We then specify the dependence of the constants when necessary.

The frozen system
We here define our Gaussian proxy. As already underlined, this process is a linear (w.r.t. the noise) version of (1.1). Because of our degenerate setting we have to be careful when doing this linearization: it has to be done around the forward flow associated to the deterministic version of (1.1) (i.e. when $\sigma = 0$ therein). Namely, given any frozen point $(\tau, \xi)$ in $[0,T] \times \mathbb{R}^{2d}$, we consider the system (5.1), which is well posed under our regularized framework, and we extend the definition of its solution on $[0, \tau)$ by assuming that for all $(v > r)$ in $[0,T]^2$, for all $\xi$ in $\mathbb{R}^{2d}$, $\theta_{v,r}(\xi) = 0$. Given the solution $(\theta_{\tau,s}(\xi))_{s \leq T}$ of this system, we define the linearized and frozen version of (1.1): τ,s (ξ)) ds and uniformly non-degenerate covariance matrix $(\Sigma_{t,s})_{t \leq s \leq T}$: where:
(ii) This solution is a Gaussian process with transition density: for all $s$ in $(t,T]$.
(iii) This transition density $\tilde q$ is the fundamental solution of the PDE driven by $\tilde{\mathcal{L}}^{\tau,\xi}$ and given by:
(iv) There exist two positive constants $c$ and $C$, depending only on known parameters in (H), such that $\tilde q(t, x_1, x_2; s, y_1, y_2) \leq C \hat q_c(t, x_1, x_2; s, y_1, y_2)$, (5.7) where $\hat q_c$ , for all $s$ in $(t,T]$ and any integers $N^{x_1}$, $N^{x_2}$, $N^{y_1}$ less than 2.
The second assertion in (iii) follows from the same arguments. The last assertion (iii) of the Proposition follows from the Gaussian decay of $\tilde q$. Indeed, by definition we have Note that for all $s$ in $[t,T]$, the mean $(m^{1,t,x}_{t,s}(x), m^{2,t,x}_{t,s}(x))$ satisfies the ODE (5.1) with initial data $(t,x)$. Hence, the forward transport function defined by (5.1) with the initial data $(\tau, \xi) = (t,x)$ is equal to the mean: $\theta_{t,s}(x) = m^{t,x}_{t,s}(x)$ (recall that under our regularized setting (5.1) has a unique solution). We deduce the result by letting $(\tau, \xi) = (t,x)$ and by using the following inequality: For the proof of the last assertion, note that on $\mathcal{S}$ we have for any measurable function $\varphi$: for every $0 < \bar\nu < 1$. The last inequality essentially follows from the definitions (5.10) and (5.11), the form of $\tilde q$ and a convexity inequality (see also the computations done in the proof of Claim 4.4, p. 29 of [4]).

Estimation of the solution
Let us now expand the regularized solution of (1.6) to a first order parametrix: we rewrite this PDE as on $[0,T) \times \mathbb{R}^{2d}$ with terminal condition 0 at time $T$. Thus, using the definitions given in the previous subsection, we obtain that for every $(t,x)$ in $[0,T] \times \mathbb{R}^{2d}$, the solution $u$ writes $\mathrm{Tr}\big((a - a(s, \theta_{t,s}(\xi))) D_1^2 u\big)(s,x)\, ds$ (5.14) $=: \int_t^T \big( I^{1,\xi}_{t,s}(x) + I^{2,\xi}_{t,s}(x) + I^{3,\xi}_{t,s}(x) + I^{4,\xi}_{t,s}(x) \big)\, ds$, (5.15) by choosing $\tau = t$. We make this choice for the freezing time $\tau$ in the following.
We next assume without loss of generality that T < 1. We are now in position to prove the main estimates of Theorem 3.1. This is done by proving the following results and then using a circular argument: letting the time interval being sufficiently small to obtain each estimate separately.
Proposition 5.4. There exist four positive constants $C_1$, $C_2$, $C'$ and $C''$, and three positive numbers $\delta$, $\delta'$ and $\delta''$, depending on known parameters in (H) only, such that: where $\|\cdot\|_{\infty,\infty,\nu}$ is defined by (3.2) and such that: Before entering into the (long) proof of this result, let us mention that one may guess that the bound on the regularized solution depends on the regularity of the source term $f$ of the PDE. This is indeed true, and this is why there is a 1 in the bounds above.
The main strategy consists in estimating the derivatives of the time integrands $I^j$, $j = 1, \ldots, 4$, of the representation (5.14) and then inverting the differentiation and integration operators. Hence, in the following, we estimate the derivatives of these time integrands and investigate their Hölder regularity.
By using the regularity of the coefficients assumed in (H), the Lipschitz regularity of $f$ and expanding $F_2$ around the forward transport $\theta$: By letting $\xi = x$ we obtain from estimate (5.12) in Proposition 5.3 that $D^n_{x_1}\big( I^{1,\xi}_{t,s}(x) + I^{2,\xi}_{t,s}(x) + I^{3,\xi}_{t,s}(x) + I^{4,\xi}_{t,s}(x) \big)$ where all the time-singularities in the right hand side are integrable. Therefore Proof of estimate (5.17). We now estimate the derivative of the solution in the degenerate direction.
We are unfortunately not able to repeat the previous strategy since the Gaussian smoothing (see eq. (5.13)) of part of the coefficients is not strong enough to smooth the time singularity coming from the derivative (of order $-3/2$) in the degenerate direction. To overcome this problem we are led to re-center some of the terms around the solution of the PDE (i.e. to use the second estimate in (ii) of Proposition 5.3 on the function $D_1 u$): namely, $I^2$ and $I^4$ in (5.14). In order to clarify our exposition, we hence estimate each term $I^{j,\xi}_{t,s}$, $j = 1, \ldots, 4$, defined in (5.14) separately. From Proposition 5.3, (ii) and by using the Lipschitz regularity of $f$ and the regularity of the coefficients assumed in (H) where $\|D_1 u\|_{\infty,\infty,\nu}$ is defined by (3.2) and $\Delta^2_\nu$ is defined by (5.9) and where we used the second estimate in (ii) of Proposition 5.3 on $D_1 u$. We also have We now deal with the term $I^4$ which is the delicate part. We first define for all measurable functions $\varphi : [0,T] \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, for all $t < s$ in $[0,T]^2$, $\xi$, $x$ in $\mathbb{R}^{2d}$ and $l = 1, \ldots, d$: $\varphi(s,y)\, D_{y_{1l}} \tilde q(t, x; s, y)\, dy$, (5.23) where $y_{1l}$ denotes the $l$-th component of the $d$-dimensional variable $y_1$. Note that the derivative in the integral above is w.r.t. the integration variable.
Hence, collecting all the previous estimates ((5.20), (5.21), (5.22), (5.24)) we eventually deduce By letting $\xi = x$ we obtain from estimate (5.12) in Proposition 5.3 that Since $\beta_j^2 > 1/3$ (condition (2.1)), and $\nu$ is given by (5.19), all the time-singularities of the right hand side above are integrable on $(t,T]$. Hence, we deduce from (5.14) and the estimate above that there exists a positive $\delta'$, depending on known parameters in (H) only, such that: Proof of estimate (5.18). Finally, we compute the Hölder semi-norm of $D_{x_1} u$. Let $x_2 \neq z_2$ belong to $\mathbb{R}^d$. We have from (5.14): We first estimate for any $s$ in $(t,T]$ the quantity: To do this, we split the time interval w.r.t. the characteristic time-scale of the second space variable: let $\mathcal{S} := \{ s \in (t,T] : |x_2 - z_2| \leq (s-t)^{3/2} \}$. Note that on $\mathcal{S}$, the result is quite obvious: from the last assertion of Proposition 5.3 we have from (H) that (5.26) is bounded on $\mathcal{S}$ by (see the computations already done for the proof of (5.16)): where we chose $\xi = x$. Note that this term is integrable on $(t,T]$ for all $\bar\nu < (1 + \beta_1^1)/3$. We now estimate (5.26) on $\mathcal{S}^c$. On the one hand, we have from the computations done when estimating $D_{x_1} u$ that: since for any positive number $\bar\nu$ on $\mathcal{S}^c$ we have $1 \leq (s-t)^{-3\bar\nu/2} |x_2 - z_2|^{\bar\nu}$ and since we chose $\xi = x$ and then used Proposition 5.3. We emphasize that this is the same bound as (5.27) so that all the time singularities above are again integrable provided $\bar\nu < (1 + \beta_1^1)/3$. It thus only remains to estimate the last part of (5.26) on $\mathcal{S}^c$, namely $$ D_{x_1} \sum_{j=1}^{4} I^{j,\xi}_{t,s}(x_1, z_2) = D_{x_1} \Big\{ P^{\xi}_{t,s} f(s, x_1, z_2) - P^{\xi}_{t,s}\big[(F_1 - F_1(s, \theta_{t,s}(\xi))) \cdot D_1 u\big](s, x_1, z_2) - P^{\xi}_{t,s}\big[\big(F_2 - F_2(s, \theta_{t,s}(\xi)) - D_1 F_2(s, \theta_{t,s}(\xi))(\cdot - \theta_{t,s}(\xi))_1\big) \cdot D_2 u\big](s, x_1, z_2) - P^{\xi}_{t,s}\big[\tfrac{1}{2} \mathrm{Tr}\big((a - a(s, \theta_{t,s}(\xi))) D_1^2 u\big)\big](s, x_1, z_2) \Big\}. $$
The main issue here is that the Gaussian smoothing of Proposition 5.3 cannot be applied immediately (last part of (iii)), the semi-group being evaluated at the point $(s, x_1, z_2)$ and the freezing point being previously chosen as $\xi = (x_1, x_2)$. To smooth the time singularity our control has to be of the form $P^{x}_{t,s}\big[ |(\cdot - m^{t,x}_{t,s}(x_1, z_2))_2|^{\gamma} \big](s, x_1, z_2)$, which is (see the proof of assertion (iii) of Proposition 5.3) bounded by $C(s-t)^{3\gamma/2}$. To do so, the main idea consists in re-centering all the terms above around $m^{2,t,\xi}_{t,s}(x_1, z_2)$. When choosing $\xi = x$, we end with the remaining difference $|\theta^2_{t,s}(x) - m^{2,t,x}_{t,s}(x_1, z_2)|$ which is, fortunately, bounded by $|x_2 - z_2|$ (this follows from the definitions (5.1) and (5.3) of $\theta$ and $m$ respectively).