Polynomial 3-mixing for smooth time-changes of horocycle flows

Let $(h_t)_{t\in \mathbb{R}}$ be the horocycle flow acting on $(M,\mu)=(\Gamma \backslash \text{SL}(2,\mathbb{R}),\mu)$, where $\Gamma$ is a co-compact lattice in $\text{SL}(2,\mathbb{R})$ and $\mu$ is the homogeneous probability measure locally given by the Haar measure on $\text{SL}(2,\mathbb{R})$. Let $\tau\in W^6(M)$ be a strictly positive function and let $\mu^{\tau}$ be the measure equivalent to $\mu$ with density $\tau$. We consider the time changed flow $(h_t^\tau)_{t\in \mathbb{R}}$ and we show that there exists $\gamma=\gamma(M,\tau)>0$ and a constant $C>0$ such that for any $ f_0, f_1, f_2\in W^6(M)$ and for all $0=t_0<t_1<t_2$, we have $$\ \left|\int_M \prod_{i=0}^{2} f_i\circ h^\tau_{t_i} d \mu^\tau -\prod_{i=0}^{2}\int_M f_i d \mu^\tau \right|\leq C \left(\prod_{i=0}^{2} \|f_i\|_6\right) \left(\min_{0\leq i<j\leq 2} |t_i-t_j|\right)^{-\gamma}.$$ With the same techniques, we establish polynomial mixing of all orders under the additional assumption of $\tau$ being fully supported on the discrete series.

Unipotent flows and their time-changes
Unipotent flows on compact (or, in general, finite volume) quotients of Lie groups are homogeneous flows given by the action of one-parameter unipotent subgroups. An important example of a unipotent flow is the horocycle flow on compact quotients $\Gamma\backslash\text{SL}(2,\mathbb{R})$ of $\text{SL}(2,\mathbb{R})$, defined by multiplication on the right by $\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$. Identifying $\Gamma\backslash\text{SL}(2,\mathbb{R})$ with the unit tangent bundle of the compact hyperbolic surface $S=\Gamma\backslash\mathbb{H}$, the horocycle flow is the unit speed parametrization of translations along the stable leaves of the geodesic flow on $T^1S$. Dynamical properties of horocycle flows have been studied in great detail and are now well understood: they have zero entropy [14]; in the compact setting they are minimal [15], uniquely ergodic [13], mixing and mixing of all orders [20]; and they have countable Lebesgue spectrum [22] (mixing and spectral properties hold for general finite volume quotients). Finer ergodic properties were investigated by Ratner [23,24].
Another important class of unipotent flows is given by nilflows on nilmanifolds, namely homogeneous flows on compact quotients of (non-abelian) nilpotent Lie groups. The prototypical examples of nilflows are Heisenberg nilflows on quotients of the 3-dimensional Heisenberg group.
One key feature of unipotent flows, in particular of the horocycle flow, is a form of slow divergence: the distance between nearby points lying on different orbits grows at most polynomially in time (quadratically, in the case of horocycle flows). This property is in sharp contrast with the dynamics of hyperbolic flows, such as the geodesic flow, for which the divergence of orbits is exponential. Unipotent flows are hence examples of smooth parabolic flows, namely smooth flows for which nearby points diverge polynomially in time.
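The quadratic divergence can be seen in a short matrix computation. The following is only a sketch, assuming the standard normalization in which the horocycle flow is right multiplication by $\exp(tU)$ with $U$ the upper triangular nilpotent matrix, and in which the base point is perturbed in the opposite horocycle direction $V$; the parameter $\varepsilon$ is introduced here purely for illustration.

```latex
% Assumption: U = [[0,1],[0,0]], V = [[0,0],[1,0]], h_t(\Gamma x) = \Gamma x \exp(tU).
\[
\exp(-tU)\exp(\varepsilon V)\exp(tU)
= \begin{pmatrix} 1 & -t \\ 0 & 1 \end{pmatrix}
  \begin{pmatrix} 1 & 0 \\ \varepsilon & 1 \end{pmatrix}
  \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1-\varepsilon t & -\varepsilon t^{2} \\ \varepsilon & 1+\varepsilon t \end{pmatrix}.
\]
% Hence h_t(x \exp(\varepsilon V)) = h_t(x) \cdot (matrix above):
% the displacement between the two orbits has entries of size at most \varepsilon t^2.
```

Under these assumptions, two points at distance of order $\varepsilon$ in the $V$ direction drift apart at most like $\varepsilon t^2$, in contrast with the exponential growth produced by conjugating with the geodesic flow.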
Outside the homogeneous setting, very little is known for general smooth parabolic flows, even for smooth perturbations of homogeneous ones. Perhaps the simplest case of such perturbations are smooth time-changes, or time-reparametrizations. Roughly speaking, a smooth time-change of a flow is obtained by moving along the same orbits, but varying smoothly the speed of the points. In other words, a smooth time-change is defined by rescaling the generating vector field by a smooth function τ , called the generator of the time-change, see Section 2.1 for definitions. A time-change is said to be trivial if its generator is a quasi-coboundary for the flow, see Section 2.1. It is easy to see that trivial time-changes are isomorphic to the original flow.
On the other hand, performing a non-trivial smooth time-change can significantly alter the ergodic properties of the flow. This is the case, for example, for ergodic nilflows. Indeed, nilflows are never weakly mixing, because of the presence of a toral factor, corresponding to the projection onto the abelianization of the nilpotent group. Nevertheless, non-trivial time-changes, within a natural class of "polynomial" functions on the nilmanifold, destroy the toral factor and are strongly mixing, as was shown by Avila, Forni, Ulcigrai, and the second author in [1], extending previous results in [2] and in [28]. For time-changes of bounded type Heisenberg nilflows, one obtains an even stronger dichotomy [12]: either the time-change is trivial (in which case the toral factor persists), or the time-changed flow is mildly mixing (it has no non-trivial rigid factors).
In the case of the horocycle flow, the study of the cohomological equation by Flaminio and Forni [8] implies that a generic time-change of the horocycle flow is non-trivial and thus, by the rigidity result of Ratner [25], not even measurably conjugate to the horocycle flow itself. Hence, non-trivial time-changes form an important family of smooth non-homogeneous parabolic flows. Similarly to the unperturbed horocycle flow, they are mixing, as was shown by Marcus [19]. Moreover, it was conjectured by Katok and Thouvenot [18] that sufficiently smooth time-changes of horocycle flows have countable Lebesgue spectrum. Lebesgue spectral type for smooth time-changes was proved by Forni and Ulcigrai [10] (independently, Tiedra de Aldecoa [30] obtained the absolute continuity property). The full version of the Katok-Thouvenot conjecture (countable multiplicity) was recently obtained in [7].
However, as it happens for nilflows, other finer properties of non trivial time-changes are different from their homogeneous counterpart. One such example is the set of joinings between their rescalings: whilst all rescalings of the horocycle flow are isomorphic to each other, the first author, Lemańczyk and Ulcigrai [17], and Flaminio and Forni [9] independently, showed that different rescalings of non-trivial time-changes are always disjoint.

Quantitative mixing
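Let $k\in\mathbb{N}$, $k\geq 2$. We recall that a measure preserving flow $\{\varphi_t\colon M\to M\}_{t\in\mathbb{R}}$ on a probability space $(M,\mathcal{B},\mu)$ is said to be $k$-mixing if for any $f_0,\dots,f_{k-1}\in L^\infty(M,\mu)$ we have $$\int_M\prod_{i=0}^{k-1}f_i\circ\varphi_{t_i}\,d\mu\;\longrightarrow\;\prod_{i=0}^{k-1}\int_M f_i\,d\mu\qquad\text{as }\min_{0\leq i<j\leq k-1}|t_i-t_j|\to\infty.\tag{1}$$ We say that $\varphi_t$ is mixing of all orders if it is $k$-mixing for all $k\geq 2$. In the case of the horocycle flow, it follows from [27] that Ratner's property persists under smooth time-changes, hence all smooth time-changes of the horocycle flow are mixing of all orders.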
Under some regularity assumptions on the observables $f_i$, one can ask about the rate of decay in the limit (1) in terms of the minimum of $|t_i-t_j|$ for $i\neq j$. It turns out that, in the parabolic setting, quantitative 2-mixing is more tractable than quantitative higher order mixing, as we describe below.

Quantitative 2-mixing
For parabolic flows (i.e. flows of intermediate orbit growth), quantitative 2-mixing is in most cases based on controlled (quantitative) stretching of certain curves by the flow. Ratner [26] proved that the rate of 2-mixing of the horocycle flow is polynomial; namely, she showed that there exists an explicit $\gamma>0$, depending only on the co-compact lattice $\Gamma$, such that for all sufficiently regular observables $f,g$ the correlations $\int_M f\circ h_t\cdot g\,d\mu-\int_M f\,d\mu\int_M g\,d\mu$ are $O(t^{-\gamma})$ as $t\to\infty$. Moreover, it can be shown that this bound is optimal. In the case of time-changes of horocycle flows, quantitative mixing estimates were obtained by Forni and Ulcigrai in [10], although they are conjecturally not optimal. Their result is based on sharp bounds on ergodic integrals of the horocycle flow proved by Flaminio and Forni in [8] and refined by Bufetov and Forni for "horocycle-like" arcs in [4], together with stretching of geodesic curves. For other parabolic flows, Forni and the first author in [11] showed that, for a full dimensional set of Heisenberg nilflows and for a generic set of smooth time-changes, if the time-change is not trivial, the rate of mixing is polynomial. This is the only quantitative result available for mixing properties of time-changes of nilflows.
A shearing phenomenon analogous to the one described above is at the base of several results on quantitative 2-mixing for non-homogeneous parabolic flows, see e.g., [6], [28], [7]. We will use a version of this mechanism in this paper as well, see the proof of Theorems 1 and 3 in Sections 4 and 5.4.

Quantitative higher order mixing
Quantitative higher order mixing (in particular, 3-mixing) for parabolic flows is much harder to get and, until recently, there were no results in the literature on this problem. The main reason for this is that mechanisms for obtaining higher order mixing are, by their very nature, non-quantitative: singular spectrum criterion of Host [16], Ratner's property [24], or Marcus multiple mixing mechanism [20].
The first, and to the best of our knowledge the only, quantitative higher order mixing result for parabolic systems appears in the very recent work of Björklund, Einsiedler, and Gorodnik [3], where the authors proved a very general quantitative result for multiple mixing of group actions which, in the very specific case of the regular action of $\text{SL}(2,\mathbb{R})$, implies that, for all $k\geq 2$, the rate of $k$-mixing of the horocycle flow is polynomial. Such results are difficult to obtain for non-homogeneous flows, and in particular for non-trivial time-changes of unipotent flows, since one cannot exploit the algebraic properties of the actions and the powerful representation theory machinery.

Statement of the main results
In this paper, we establish polynomial 3-mixing estimates for any smooth time-change of the horocycle flow, see Theorem 1 below. To the best of our knowledge, this is the first quantitative mixing result beyond 2-mixing for smooth non-homogeneous parabolic flows.
Let $(h_t)_{t\in\mathbb{R}}$ be the horocycle flow on $(M,\mu)$, where $M=\Gamma\backslash\text{SL}(2,\mathbb{R})$ is compact and $\mu$ is locally given by the Haar measure. Let $W^6(M)\subset L^2(M)$ denote the standard Sobolev space of order 6 (see Section 2.3 for definitions), and let $\tau\in W^6(M)$ be a positive function. We consider the time-changed flow $(h^\tau_t)_{t\in\mathbb{R}}$ generated by $\tau$ as defined in Section 2.1. The following is our main result.
Theorem 1. Let $\tau\in W^6(M)$ be a positive function. There exist $\gamma=\gamma(M,\tau)>0$ and a constant $C>0$ such that for any $f_0,f_1,f_2\in W^6(M)$ and for all $0=t_0<t_1<t_2$, we have $$\left|\int_M\prod_{i=0}^{2}f_i\circ h^\tau_{t_i}\,d\mu^\tau-\prod_{i=0}^{2}\int_M f_i\,d\mu^\tau\right|\leq C\left(\prod_{i=0}^{2}\|f_i\|_6\right)\left(\min_{0\leq i<j\leq 2}|t_i-t_j|\right)^{-\gamma}.$$
With the same techniques, we are able to prove polynomial mixing of all orders only for time-changes supported on the discrete series $H_d$ (see Section 2.3 for definitions). The proof, however, presents several additional technical difficulties compared to the 3-mixing case, hence we present it in Appendix 5.
Theorem 2. Let $\tau\in W^6(M)\cap H_d$ be a positive function, and let $k\in\mathbb{N}$. There exists $\gamma=\gamma(M,k,\tau)>0$ such that for any $f_0,\dots,f_{k-1}\in W^6(M)$ there exists $C=C(f_0,\dots,f_{k-1})>0$ such that for all $0=t_0<t_1<\dots<t_{k-1}$, we have $$\left|\int_M\prod_{i=0}^{k-1}f_i\circ h^\tau_{t_i}\,d\mu^\tau-\prod_{i=0}^{k-1}\int_M f_i\,d\mu^\tau\right|\leq C\left(\min_{0\leq i<j\leq k-1}|t_i-t_j|\right)^{-\gamma}.$$
The driving idea of the proof is to refine Marcus' approach to multiple mixing of the horocycle flow in [20] by making it quantitative. Our argument shares some similarities with the one in [3], notably in exploiting the shearing of a transverse vector field under the action (see in particular [3, §7.2]). For homogeneous flows, the push-forward of left-invariant vector fields is given by the Adjoint, which can be controlled using the algebraic structure of the group, see [3, §2]. In our setting, however, due to the non-homogeneous structure of the flow, we employ a more geometric approach and we exploit precise bounds on the growth of ergodic integrals and good quantitative control of the (non-uniform) stretching of geodesic curves. In the proofs of Theorem 1 and Theorem 2, the problem is reduced to studying the $L^2$ norm of some multiple ergodic averages, see Proposition 3.5 (and the more general version in Proposition 5.2), which are estimated using a sharp quantitative version of the van der Corput inequality (Lemma 3.1). We hope that the local mechanism that we use has the potential to be applied to other non-homogeneous flows, such as time-changes of higher step nilflows, or some smooth surface flows.
We should emphasize that at this moment we do not know how to generalize Theorem 1 to higher order correlations (for functions having non-trivial support outside the discrete series). The main reason is that in the case of 3-mixing we face one of two situations: either $t_1$ and $t_2$ are of similar order (in which case it is possible to apply Proposition 3.5), or $t_1$ is much smaller than $t_2$ (in which case we use the fact that geodesic segments of appropriate length are not stretched for time $t_1$, whereas they are stretched for time $t_2-t_1$, and we use invariance of the measure). The reader will notice that in both cases the choice of the length $\sigma$ of the geodesic segments is rather delicate. This mechanism does not seem to work even for 4-mixing, especially when $t_1$ is much smaller than $t_3$ and of order $t_3-t_2$: on the one hand, a meaningful estimate using Proposition 5.2 would force $\sigma$ to be larger than $(t_3-t_2)^{-1}$; on the other hand, controlling the deviations from the homogeneous case requires $\sigma$ to be smaller than some negative power of $t_3$, and hence an appropriate choice of $\sigma$ is not possible. We can handle this problem assuming additionally that the time-change $\tau$ is fully supported on the discrete series (Theorem 2): in this case the deviations of ergodic averages for $\tau$ are logarithmic (see Lemma 2.4). We then inductively obtain polynomial $(k+1)$-mixing from polynomial $k$-mixing (using logarithmic deviation bounds for the time-change).

Time changes of flows
Let $(\varphi_t)$ be a flow on $(X,\mathcal{B},\mu)$ and let $\tau\in L^1(X,\mu)$ be a strictly positive function. Then, for a.e. $x\in X$, for every $t\in\mathbb{R}$, there exists a unique solution $u=u(x,t)$ of $$\int_0^{u}\tau(\varphi_s x)\,ds=t.$$ The function $u(x,\cdot)$ defined this way is an $\mathbb{R}$-cocycle, i.e. for $t_1,t_2\in\mathbb{R}$, we have $u(x,t_1+t_2)=u(x,t_1)+u(\varphi_{u(x,t_1)}x,t_2)$. We define the time-change flow $(\varphi^\tau_t)_{t\in\mathbb{R}}$ induced by $\tau$ by setting $\varphi^\tau_t(x)=\varphi_{u(x,t)}(x)$, and we say that $\tau$ is its generator. Since $u(x,\cdot)$ is a cocycle, the latter equality defines an $\mathbb{R}$-action. Moreover, $(\varphi^\tau_t)_{t\in\mathbb{R}}$ preserves the measure $\mu^\tau$ given by $d\mu^\tau=\frac{\tau}{\int_X\tau\,d\mu}\,d\mu$. We will always assume, without loss of generality, that $\int_X\tau\,d\mu=1$. Since the flow $(\varphi_t)$ has the same orbits as any of its time-changes, and since the invariant measure $\mu^\tau$ is equivalent to $\mu$, ergodicity is preserved when performing a smooth time-change. Mixing and other spectral properties, however, are more delicate, as discussed in the introduction.
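The relation between the cocycle $u$ and the speed of the time-changed flow can be made explicit by differentiating the defining equation. The following is a minimal sketch, assuming that $u(x,t)$ is defined by $\int_0^{u(x,t)}\tau(\varphi_s x)\,ds=t$ and that $\tau$ is continuous, bounded, and bounded away from zero (as is the case for positive functions in $W^6(M)$ on a compact $M$).

```latex
% Assumption: \int_0^{u(x,t)} \tau(\varphi_s x)\,ds = t, with \tau continuous and 0 < \inf\tau \le \sup\tau < \infty.
\[
\frac{d}{dt}\int_0^{u(x,t)}\tau(\varphi_s x)\,ds
 = \tau\bigl(\varphi_{u(x,t)}x\bigr)\,\partial_t u(x,t) = 1
\quad\Longrightarrow\quad
\partial_t u(x,t)=\frac{1}{\tau\bigl(\varphi^{\tau}_t x\bigr)} .
\]
% Integrating in t \ge 0 gives two-sided linear bounds on u:
\[
\frac{t}{\sup\tau}\;\le\; u(x,t)\;\le\;\frac{t}{\inf\tau}\qquad (t\ge 0).
\]
```

In particular, the time-changed flow moves along the orbits of $(\varphi_t)$ with speed $1/\tau$, which is the sense in which the generating vector field is rescaled by the generator.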
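We say that a function $\tau$ is a quasi-coboundary for $(\varphi_t)_{t\in\mathbb{R}}$ if there exists a measurable solution $\xi\colon X\to\mathbb{R}$ of the cohomological equation $$\int_0^t\Bigl(\tau-\int_X\tau\,d\mu\Bigr)\circ\varphi_s(x)\,ds=\xi(\varphi_t x)-\xi(x)$$ for a.e. $x\in X$ and all $t\in\mathbb{R}$. It follows that if $\tau$ is a quasi-coboundary, then $(\varphi_t)_{t\in\mathbb{R}}$ and $(\varphi^\tau_t)_{t\in\mathbb{R}}$ are isomorphic. We call such time-changes trivial.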

Horocycle and geodesic flows
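Let $G:=\text{SL}(2,\mathbb{R})$ be the group of $2\times 2$ real matrices with determinant 1 and let $\mu$ be the Haar measure on $G$. We denote the Lie algebra of $G$ by $\mathfrak{g}$, which consists of $2\times 2$ matrices of zero trace. Let $U,X,V\in\mathfrak{g}$ be given by $$U=\begin{pmatrix}0&1\\0&0\end{pmatrix},\qquad X=\begin{pmatrix}1/2&0\\0&-1/2\end{pmatrix},\qquad V=\begin{pmatrix}0&0\\1&0\end{pmatrix}.$$ Then $U,X,V$ are generators of, respectively, the (stable) horocycle, geodesic and opposite (unstable) horocycle flows. We will be dealing with flows generated by $U$ and $X$. More precisely, let $\exp\colon\mathfrak{g}\to G$ be the exponential map and let $\Gamma\subset G$ be a co-compact lattice in $G$. We will consider the following $\mathbb{R}$-actions on the homogeneous space $M:=\Gamma\backslash G$: the horocycle flow $h_t(\Gamma x)=\Gamma x\exp(tU)$ and the geodesic flow $g_t(\Gamma x)=\Gamma x\exp(tX)$.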
The flows $h_t$ and $g_t$ both preserve a smooth measure on $M$, locally given by the Haar measure $\mu$, which we will also denote by $\mu$. Recall that the horocycle and geodesic flows satisfy the following renormalization equation: $$h_t\circ g_s=g_s\circ h_{e^{s}t}\qquad\text{for all }s,t\in\mathbb{R}.\tag{4}$$
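The renormalization equation reduces to a conjugation of $2\times 2$ matrices. The following sketch assumes the normalization $X=\mathrm{diag}(1/2,-1/2)$ and $U$ upper triangular nilpotent, with both flows acting by right multiplication; with a different normalization of $X$ the exponent $e^{s}$ would change accordingly.

```latex
% Assumption: X = diag(1/2,-1/2), U = [[0,1],[0,0]], g_s(\Gamma x) = \Gamma x \exp(sX), h_t(\Gamma x) = \Gamma x \exp(tU).
\[
\exp(sX)\exp(tU)\exp(-sX)
= \begin{pmatrix} e^{s/2} & 0 \\ 0 & e^{-s/2} \end{pmatrix}
  \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}
  \begin{pmatrix} e^{-s/2} & 0 \\ 0 & e^{s/2} \end{pmatrix}
= \begin{pmatrix} 1 & e^{s}t \\ 0 & 1 \end{pmatrix}
= \exp\bigl(e^{s}tU\bigr),
\]
\[
h_t\circ g_s(\Gamma x)=\Gamma x\exp(sX)\exp(tU)
=\Gamma x\exp\bigl(e^{s}tU\bigr)\exp(sX)
=g_s\circ h_{e^{s}t}(\Gamma x).
\]
```

Equivalently, $g_s\circ h_t=h_{e^{-s}t}\circ g_s$, so horocycle arcs are contracted by the geodesic flow in forward time, consistent with $U$ generating the stable horocycle flow.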

Spectral theory of horocycle flows
We will briefly recall some facts from the spectral theory of horocycle flows; for details see e.g. [8]. Let $\Theta=\begin{pmatrix}0&1/2\\-1/2&0\end{pmatrix}$ be the generator of the maximal compact subgroup $\text{SO}(2)$ of $G=\text{SL}(2,\mathbb{R})$. Let $H=L^2(M,\mu)$ be the Hilbert space of square integrable functions on $M=\Gamma\backslash G$, on which $G$ acts unitarily. We define the Laplacian by setting $\Delta:=-(X^2+U^2/2+V^2/2)$; it is an elliptic element of the universal enveloping algebra of $\mathfrak{g}$ which acts as an essentially self-adjoint operator on $H$. Note that $\Delta$ on $\text{SO}(2)$-invariant functions coincides with the Laplace-Beltrami operator on the compact hyperbolic surface $S=\Gamma\backslash\mathbb{H}$. The Sobolev space of order $s>0$, $W^s(M)$, is defined as the completion of the space $C^\infty(M)$ of infinitely differentiable functions with respect to the inner product $$\langle f,g\rangle_s:=\langle(1+\Delta)^s f,g\rangle_H.$$ We will denote by $\|\cdot\|_6$ the norm in $W^6(M)$.
Let $\square:=-X^2-(V+\Theta)^2+\Theta^2=\Delta+2\Theta^2$ be the Casimir operator, a generator of the centre of the universal enveloping algebra of $\mathfrak{g}$. By the classical theory of unitary representations of $\text{SL}(2,\mathbb{R})$, we have the following orthogonal decomposition into irreducible components, listed with multiplicity: $$H=\bigoplus_{\mu\in\text{Spec}(\square)}H_\mu.$$ The decomposition above induces a corresponding decomposition of the Sobolev spaces $W^r(M)$, for all $r>0$.
We call $H_p$ the principal series, $H_c$ the complementary series, and $H_d$ the discrete series. On each irreducible representation $H_\mu$, the Casimir operator acts as multiplication by the constant $\mu$. The representation $H_0$ is the trivial representation and appears with multiplicity 1. We recall that the positive eigenvalues $\mu$ of the Casimir operator coincide with the eigenvalues of the Laplace-Beltrami operator on the surface $S=\Gamma\backslash\mathbb{H}$; in particular, there is a spectral gap: there exists $\mu_0>0$ such that $(0,\mu_0)\cap\text{Spec}(\square)=\emptyset$. Let us further define In the second part of the paper, Appendix 5, we will be interested in functions $\tau\in H_d$. We remark that it follows from a recent work of D. Dolgopyat and O. Sarig [5] that functions coming from non-zero harmonic forms are not measurable coboundaries; in particular, there exist positive functions in $H_d$ which are not measurable quasi-coboundaries, and hence generate a time-change which is not measurably conjugate to the horocycle flow.
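The identity relating the Casimir operator and the Laplacian can be checked directly in the universal enveloping algebra. The following sketch assumes that the generator of $\text{SO}(2)$ is normalized as $\Theta=(U-V)/2$; the auxiliary symbol $Y:=V+\Theta$ is introduced here only for the computation.

```latex
% Assumption: \Theta = (U - V)/2, so that Y := V + \Theta = (U + V)/2.
% No commutation relations are used: the cross terms UV + VU cancel.
\[
Y^{2}+\Theta^{2}
=\tfrac14\bigl[(U+V)^{2}+(U-V)^{2}\bigr]
=\tfrac12\bigl(U^{2}+V^{2}\bigr),
\]
\[
\Delta=-\bigl(X^{2}+\tfrac12U^{2}+\tfrac12V^{2}\bigr)=-\bigl(X^{2}+Y^{2}+\Theta^{2}\bigr),
\qquad
\square=-X^{2}-Y^{2}+\Theta^{2}=\Delta+2\Theta^{2}.
\]
```

Since $\Theta$ annihilates $\text{SO}(2)$-invariant functions, this also shows that $\square$ and $\Delta$ agree on such functions, which is consistent with the statement on the eigenvalues of the Laplace-Beltrami operator.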

2-mixing estimates for time changes of the horocycle flow
Let us denote by $(h^\tau_t)$ the time-change of the horocycle flow $(h_t)$ induced by the positive function $\tau$. We make the standing assumption that $\int_M\tau\,d\mu=1$.
We recall a result of G. Forni and C. Ulcigrai, [10], on estimates of rates of 2-mixing for time-changes of the horocycle flow. In the homogeneous setting, optimal rates of mixing for the classical horocycle flow were obtained by M. Ratner in [26].
Lemma 2.1 (Theorem 3, [10]). Let τ ∈ W 6 (M ), τ > 0 and let (h τ t ) denote the time change induced by τ . There exists a constant C 0 = C 0 (τ ) > 0 such that for any functions f, g ∈ W 6 (M ) and any t > 1, we have In order to prove Theorem 3 in [10], the authors establish the following lemma, which will be useful for us as well.

Deviation of ergodic averages
We will first state a result on the growth of ergodic integrals, which is a straightforward consequence of Theorem 1.5 in [8].
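Lemma 2.3. Let $\tau\in W^6(M)$. There exists a constant $C_2$ such that for every $0<s<1$, every $T>1$ and every $x\in M$, we have $$\left|\int_0^T(\tau-\tau\circ g_s)\circ h_t(x)\,dt\right|\leq C_2\,s\,T^{1-\beta}.$$ Moreover, if $\tau\in W^6(M)\cap H_d$, the integral above is bounded by $C_2\,s\log T$.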
Proof. By Theorem 1.5 in [8], we have which finishes the proof by the choice of β and by taking C 2 = CC ′ . If we further assume that τ belongs to the discrete series, the estimate follows again from Theorem 1.5 in [8], after noticing that the space H d is invariant for the action of the geodesic flow g s , so that τ − τ • g s ∈ H d for any 0 < s < 1.
We remark that, since any non-trivial time change destroys the homogeneous structure, the commutation relation (4) does not in general hold for time-changes. Below we state an important lemma which estimates the error in the renormalization formula for the time changed flow. Lemma 2.4. Let τ ∈ W 6 (M ), τ > 0. There exists C 3 > 0 such that for every x ∈ M , 0 < s < 1 and T > 1, we have Proof. Let A(x, s, T ) be such that u(x, e s T + A(x, s, T )) = e s u(g s x, T ).
Notice that for every fixed x ∈ M , the function u(x, ·) is strictly increasing, hence the term A(x, s, T ) as in (5)  Changing variables r = e s t and using (4), we get Therefore, (5) gives us Using Lemma 2.3, we obtain Since max x∈M |u(x, T )| 1 inf M τ T , the proof is complete.
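The way the error term $A(x,s,T)$ reduces to an ergodic integral of $\tau-\tau\circ g_s$ can be traced through the following computation. It is only a sketch, under two assumptions that are not spelled out in the text as extracted: that $u$ is defined by $\int_0^{u(x,T)}\tau(h_r x)\,dr=T$, and that the renormalization equation (4) reads $h_r\circ g_s=g_s\circ h_{e^{s}r}$.

```latex
% Assumptions: \int_0^{u(x,T)} \tau(h_r x)\,dr = T  and  h_r \circ g_s = g_s \circ h_{e^s r}.
% Step 1: definition of u at the point g_s x, then the change of variables r -> e^s r.
\[
T=\int_0^{u(g_sx,T)}\tau(h_r g_s x)\,dr
 =\int_0^{u(g_sx,T)}(\tau\circ g_s)(h_{e^{s}r}x)\,dr
 =e^{-s}\int_0^{e^{s}u(g_sx,T)}(\tau\circ g_s)(h_r x)\,dr .
\]
% Step 2: definition of A, i.e. u(x, e^sT + A) = e^s u(g_s x, T).
\[
e^{s}T+A(x,s,T)=\int_0^{e^{s}u(g_sx,T)}\tau(h_r x)\,dr .
\]
% Step 3: subtract e^s times Step 1 from Step 2.
\[
A(x,s,T)=\int_0^{e^{s}u(g_sx,T)}\bigl(\tau-\tau\circ g_s\bigr)(h_r x)\,dr ,
\qquad
e^{s}u(g_sx,T)\le \frac{e\,T}{\inf_M\tau}\quad(0<s<1).
\]
```

With this representation, a bound on ergodic integrals of $\tau-\tau\circ g_s$ over horocycle arcs of length comparable to $T$ translates directly into a bound on $|A(x,s,T)|$.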

Van der Corput inequality
We recall a version of the van der Corput inequality that will be useful in our setting.
The following lemma is valid in general Hilbert spaces $H$; for simplicity we state it just for $H=L^2(X,\mu)$, where $(X,\mu)$ is a probability space. The notation $X=O(Y)$ means that $X\leq cY$ for some global constant $c>0$.
Lemma 3.1 (Van der Corput inequality). Let $(\phi_u)_{u\in\mathbb{R}}\subset L^2(X,\mu)$ with $\|\phi_u\|_2\leq 1$ for every $u\in\mathbb{R}$, and assume that $\langle\phi_u,\phi_w\rangle=\langle\phi_0,\phi_{w-u}\rangle$ for every $u,w\in\mathbb{R}$. Then, for every $N>0$ and $0<L<N$, we have Remark 3.2. As mentioned before, the result above is true for general Hilbert spaces and without the extra invariance assumption on $(\phi_u)_{u\in\mathbb{R}}$. We will use Lemma 3.1 for $\phi_u=f\circ h^\tau_u$, for which the above assumption is satisfied. A nice proof of the more general statement in the non-quantitative version can be found in J. Moreira's blog post [21].
The proof follows standard steps; we provide it here for completeness.
Proof of Lemma 3.1. Notice first that Moreover, by the Cauchy-Schwarz inequality, where we use invariance and the fact that $-L\leq l_1-l_2\leq L$. This finishes the proof.
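For orientation, one standard way to obtain an inequality of van der Corput type under the hypotheses above is sketched below. This is not claimed to be the exact statement or the constants of Lemma 3.1; it assumes $\|\phi_u\|_2\le 1$, the invariance $\langle\phi_u,\phi_w\rangle=\langle\phi_0,\phi_{w-u}\rangle$, and enough measurability in $u$ for the vector-valued integrals to make sense. The auxiliary family $\psi_u$ is introduced only for this sketch.

```latex
% Step 1: shifting the window by 0 \le l \le L changes the average by at most 2L/N.
\[
\Bigl\|\frac1N\int_0^N\phi_{u+l}\,du-\frac1N\int_0^N\phi_u\,du\Bigr\|_2\le\frac{2l}{N}\le\frac{2L}{N}.
\]
% Step 2: average over l \in [0,L], with \psi_u := \frac1L\int_0^L \phi_{u+l}\,dl.
\[
\Bigl\|\frac1N\int_0^N\phi_u\,du\Bigr\|_2\le\Bigl\|\frac1N\int_0^N\psi_u\,du\Bigr\|_2+\frac{2L}{N}.
\]
% Step 3: Cauchy-Schwarz in u, then invariance, using -L \le l_2 - l_1 \le L.
\[
\Bigl\|\frac1N\int_0^N\psi_u\,du\Bigr\|_2^{2}
\le\frac1N\int_0^N\|\psi_u\|_2^{2}\,du
=\frac{1}{L^{2}}\int_0^L\!\!\int_0^L\langle\phi_0,\phi_{l_2-l_1}\rangle\,dl_1\,dl_2
\le\frac2L\int_0^L\bigl|\langle\phi_0,\phi_l\rangle\bigr|\,dl .
\]
% Conclusion:
\[
\Bigl\|\frac1N\int_0^N\phi_u\,du\Bigr\|_2
\le\Bigl(\frac2L\int_0^L\bigl|\langle\phi_0,\phi_l\rangle\bigr|\,dl\Bigr)^{1/2}+\frac{2L}{N}.
\]
```

The last inequality in Step 3 uses $|\langle\phi_0,\phi_{-l}\rangle|=|\langle\phi_0,\phi_l\rangle|$, which follows from the invariance assumption, together with $\int_0^L\int_0^L g(|l_2-l_1|)\,dl_1\,dl_2=2\int_0^L(L-l)g(l)\,dl$ for nonnegative $g$. The error $O(L/N)$ and the correlation average over $[0,L]$ are the two quantities that are balanced when optimizing over $L$.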
The following observations will be important in what follows.

Remark 3.3. There exists a constant $D>0$ such that for every $f\in W^6(M)$ and every $r\geq 1$, we have $\|f\cdot f\circ h_r\|_6\leq D\|f\|_6^2\,r^6$. This follows from the fact that $W^6(M)$ has the algebra property, i.e. $\|f\cdot g\|_6\leq D'\|f\|_6\|g\|_6$, and the fact that $\|f\circ h_r\|_6\leq D''\|f\|_6\,r^6$.
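The two facts quoted in Remark 3.3 combine in one line. The following spells out the chain of inequalities, taking the algebra property and the growth bound for $f\circ h_r$ as given, with the constants $D'$ and $D''$ as named in the remark.

```latex
% Inputs: \|f g\|_6 \le D' \|f\|_6 \|g\|_6  and  \|f\circ h_r\|_6 \le D'' \|f\|_6\, r^6  for r \ge 1.
\[
\|f\cdot f\circ h_r\|_6
\le D'\,\|f\|_6\,\|f\circ h_r\|_6
\le D'D''\,\|f\|_6^{2}\,r^{6},
\qquad\text{so one may take } D=D'D''.
\]
```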
Using the van der Corput inequality in Lemma 3.1, we can prove the important estimate below. Proposition 3.5 will be generalized in Appendix 5, see Proposition 5.2.
Proof. Notice that we always have so we can assume that |n − m|(1 − K) 1. Up to replacing m with n, we can also assume that n > m. We will use van der Corput inequality (see Lemma 3.1) with N = n − m and and optimize for 0 < L N = n − m.
With the notation introduced in Remark 3.4, we can estimate where we used the fact that, by definition, Applying the mixing estimates of Lemma 2.1, since f i 6 1, we get From Remarks 3.3, 3.4, and f i 6 1, it follows that and similarly for ( where we can take C ′ = max{C 2 0 , 4D 2 C 0 }. By Lemma 3.1, recalling that N = n − m, we obtain By assumption, there exists 0 < α 3/2 such that K > N −α . We fix so that, moreover, L N 1 N 5/24 1 ((1 − K)N ) 5/24 . Thus, the term O(L/N ) in the right hand-side of (8) satisfies an estimate of the desired form. It remains to bound the two summands in the square brackets in (8). By the choice of L, we get 1 where we use the fact that (1 − K)N 1. This concludes the proof.

Polynomial 3-mixing
This section is devoted to the proof of Theorem 1. The strategy of the proof is similar to the proof of Theorem 1.1 in [3]; however, since we are in the non-algebraic setting, our reasoning is local and we use estimates on the stretching of geodesic arcs by the time-changed flow. We also use some ideas from Marcus' proof in [20].
The first step is to exploit the shearing property of the horocycle flow and its time-changes: transverse segments in the geodesic direction get sheared by $h^\tau_t$. We will fix $\sigma>0$, the length of such segments. The proof will be divided into two cases (Case A and Case B below), depending on the relative size of the gaps $t_1$ and $t_2-t_1$. Roughly speaking, if $t_1$ is "much smaller" than $t_2-t_1$ (Case A), our choice of $\sigma$ will ensure that the length of the sheared arc $h^\tau_{t_1}\circ g_s$ for time $t_1$ is sufficiently small, so that the correlations can be estimated, up to a small error, by the integral of $f_2$ along the arc $h^\tau_{t_2}\circ g_s$. If $t_1$ and $t_2-t_1$ are "of the same order" (Case B), we will reduce the problem of estimating the multiple correlations to the setting of Proposition 3.5, namely to a "multiple ergodic integral".
Proof of Theorem 1. Let 0 = t 0 < t 1 < t 2 be fixed. Up to considering the inverse flow composing with h τ −t 2 and relabeling t ′ 2−i = t 2 − t i , we can assume that t 1 t 2 − t 1 , so that, in particular t 2 − t 1 t 2 /2. We will also assume that t 1 1 Let f 0 , f 1 , f 2 ∈ W 6 (M ); define C f = f 0 6 f 1 6 f 2 6 and C f,τ = τ 6 C f C f . Recalling the notation introduced in Remark 3.4, we have By the mixing estimates of Lemma 2.1, we can bound each term in the sum in the right hand-side above by thus, by Remark 3.4, we have since t 1 = min{t 1 , t 2 − t 1 }. Therefore, it remains to bound the correlations for functions of zero average; we will simply denote f i instead of f ⊥ i , and we will assume that f i ∈ W 6 (M ) ∩ L 2 0 (M ). Let us define We recall that the invariant measure µ τ is equivalent to the Haar measure µ, with density τ . By invariance of µ under the geodesic flow, we have For all s ∈ [0, σ], we have We now estimate the first term in the right hand-side above in two different ways, depending on t 1 . Case A. Let us assume that t 1 t 1−β/2 2 . By Lemma 2.4 and the triangle inequality, . Therefore, We obtain that Lemma 2.2 gives us a uniform bound for the term in brackets in (11). Combining (9), (10), (11), and using Lemma 2.2, we conclude for some constant C 5 > 0 and where γ = min{β 2 /3, β/6}. This concludes the proof for Case A. Case B. Let us now assume that t 1 > t . From (10) and Cauchy-Schwartz inequality, we get (12) For any point x ∈ M and any 0 s σ < 1, This implies that x)|+ + f 1 6 f 2 6 (|A(x, s, t 1 )| + |A(x, s, t 2 )| + |A(x, s, t 1 )| · |A(x, s, t 2 )|) We now estimate the first term in the right hand-side above using Proposition 3.5. Define 0 < K = t 1 /t 2 < 1. 
For all x ∈ M , changing variable u = e s t 2 and by the second mean-value theorem for integrals, there exists z ∈ [t 2 , e σ t 2 ] such that and the second term in the right hand-side above is Since we are in the case t 1 > t 1−β/2 2 , we have that Hence, the assumptions of Proposition 3.5 are satisfied. We then get . This estimate and (13) conclude the proof.

Appendix: higher order mixing for discrete-series reparametrizations
In this section, we show how to refine the argument above to obtain polynomial mixing of all orders for time-changes (h τ t ) t∈R supported on the discrete series, namely for τ ∈ W 6 (M ) ∩ H d .

Preliminaries
We start with the following definition.
We will use the above definition for $(h^\tau_t)$ on $(M,\mu)$ and $F=W^6(M)\cap L^2_0(M,\mu)$ with $\|\cdot\|_F=\|\cdot\|_6$. To shorten the notation we denote $Q(k)=Q(k,W^6(M)\cap L^2_0(M,\mu))$. Notice that, by Lemma 2.1, $(h^\tau_t)$ has the $Q(2)$-property. The theorem below is a quantitative version of Proposition 1 in [20].
Theorem 3. Let $\tau\in W^6(M)\cap H_d$. For every $k\geq 2$, if $(h^\tau_t)_{t\in\mathbb{R}}$ has the property $Q(\ell)$ for every $2\leq\ell\leq k$, then it has the property $Q(k+1)$. Moreover, there exists an explicit lower bound on $\beta_k$ in terms of $\tau$, $M$ and $k$ for every $k\geq 2$.
Theorem 2 follows easily from Theorem 3 above.

A van der Corput estimate
We need to generalize the statement of Proposition 3.5 to all $k\geq 2$. The proof is analogous, with additional technical difficulties.
Proof. As in the proof of Proposition 3.5, we will assume that $n>m$ and $|n-m|\min_i(K_{i+1}-K_i)\geq 1$. We will use the van der Corput inequality (see Lemma 3.1).
Using the notation of Remark 3.4, we can write where in the last term we allow Notice moreover that if |J c | = 1, then the last integral vanishes by measure invariance. Hence we will assume that the sum is taken over J {1, . . . , k} with |J| k − 2.
We will now bound each term on RHS of (16): first trivially, we have therefore the last term in (16) is bounded by Let J be as above and let a J 1 < a J 2 < . . . < a J m J denote all the elements in J c (recall that m J 2). Using the Q(ℓ)-property for ℓ := |J c | k, we can bound the last term in the RHS of (16) by where in the last inequality we use the fact that the minimum on the LHS is taken over a smaller set than on the RHS ((a J i ) is a subset of {1, . . . k}). We now have the following important estimate (see Remark 3.3): for every 1 i k, The inequality above implies that i∈J c The above bounds and (16) imply that (18) Notice that by the Q(2) property (used only for the last term in the product), we have (using also that K k = 1) where, we recall, . We now define L ∈ [0, N ]. By assumption, let 0 < ε < (k+1)/k be such that K 1 > N −ε . Let us define θ = θ k,ε and L by (where we allow J = ∅). This implies that for every J ⊂ {1, . . . , k} with |J| k − 2, Moreover, if a set J 0 ⊂ {1, . . . k} realizes the minimum in the definition of L, then From our assumption N −ε K 1 · · · K k = 1, we deduce that in particular, by (19) for J = ∅ and using the definition of θ, Notice that by the bound on a L and (20) it follows that it follows, using also (19), that notice that δ > 0 by the definition of θ. We get from (18), using (21), (22), and (23) (for each J {1, . . . , k} and summing over J), we obtain

Combinatorial argument
We describe an inductive procedure that will be used in the proof of Theorem 3 (see the outline of the proof below).
Step 1. Let r 1 = t ζ 1 12k , the procedure stops. If not, let s 1 < k be the largest such that t s 1 / Step 2. Let r 2 := r ζ 2 12k the procedure stops. If not, let s 2 < k be the largest such that t Step ℓ + 1. If the procedure does not stop at Step ℓ, let r ℓ+1 := r ζ ℓ+1 12k ℓ and take s ℓ < k to be the largest such that t s ℓ / . Notice that the procedure will definitely stop no later than Step k. Moreover, notice that if the procedure stops exactly at Step k, then by the definition of (r ℓ ), for ξ k := It is crucial that ξ k depends on (ζ i ) and k but not on the (t i ) i k .

Proof of Theorem 3
The rest of the section is devoted to the proof of Theorem 3. We first present an outline for the reader's convenience.
Outline of the proof. The proof consists of two cases (Case A and Case B below), depending on whether the inductive procedure described above stops at Step $k$ or before.

If it stops at Step $\ell$ for $\ell<k$ (Case A), it means that there exist $j\neq i$ such that the corresponding times $t_j$ and $t_i$ are close, namely $|t_i-t_j|\leq 2r_\ell$. We then write $$f_i\circ h^\tau_{t_i}\cdot f_j\circ h^\tau_{t_j}=\bigl[f_i\cdot f_j\circ h^\tau_{t_j-t_i}\bigr]\circ h^\tau_{t_i},$$ and, by assumption, the Sobolev norm of the term in brackets is small, namely it is of order $O(r_\ell^6)$. We do the same for all the times $t_j$ contained in an interval of the form $[t_{s_i}-r_\ell,t_{s_i}+r_\ell]$ as described in the inductive procedure, and we consider the corresponding terms in brackets as a single observable. In this way, we reduce the number of observables to $\ell<k$ (with appropriate bounds on their Sobolev norms), and we can apply the inductive hypothesis on quantitative $\ell$-mixing to conclude.
If the procedure stops exactly at Step $k$ (Case B), we proceed as in the proof of Theorem 1, exploiting the shearing properties of geodesic segments of length $\sigma$. We remark that our assumption on the time-change ensures that the deviations of the shearing property from the unperturbed homogeneous case are logarithmic (see Lemma 2.4), hence the error term is of order $\sigma\log^k|t_k|$. In this case, for the assumptions of Proposition 5.2 to be satisfied, we will need to choose $\sigma=|t_k|^{-\alpha^2}$ for some small $\alpha>0$. In order to conclude, it will be crucial to exploit (24), which will ensure that $\sigma=|t_k|^{-\alpha^2}=O(\min_{0\leq i<k}|t_{i+1}-t_i|^{-\alpha})$, for some $\alpha>0$.
Since, for every r ∈ [0, s] and x ∈ M , there exists a constant C ′ such that Moreover, by Lemma 2.4, we have for some constant C > 0. From (28), using (29) and (30), we obtain (31) where we have defined C τ,f = 6 τ 6 k i=0 f i 6 . We now bound the two terms in the right hand-side of (31) separately. For the first term, by the choice of σ and by (24), we have that for every ε > 0 We now bound the second term in (31). Define 0 < K i = t i /t k 1. For all x ∈ M , changing variable u = e r t k , and integrating by parts, Therefore, In both cases 0 s |t k | −2α or |t k | −2α < s σ, by (34)  6 τ 6 f 0 6 σ|t k | for some δ > 0, so that, by (33), The claim then follows by (32) and (36).