Möbius disjointness for interval exchange transformations on three intervals

We show that Sarnak's conjecture on Möbius disjointness holds for interval exchange transformations on three intervals (3-IETs) that satisfy a mild Diophantine condition.


Introduction
Let µ : N → {−1, 0, 1} denote the Möbius function, namely, µ(n) = 0 if n is not square-free, µ(n) = 1 if n is square-free and has an even number of prime factors, and µ(n) = −1 if n is square-free and has an odd number of prime factors.
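For numerical experiments it is convenient to tabulate µ with a sieve. The following is a minimal sketch working directly from the definition above; the function name `mobius_sieve` is illustrative and not part of the paper.

```python
def mobius_sieve(limit):
    """Return a list mu with mu[n] equal to the Möbius function of n for 1 <= n <= limit.

    mu[0] is a placeholder and carries no meaning.
    """
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        # p is prime: mark its proper multiples as composite
        for multiple in range(2 * p, limit + 1, p):
            is_prime[multiple] = False
        # every multiple of p gains one prime factor, so the sign flips
        for multiple in range(p, limit + 1, p):
            mu[multiple] = -mu[multiple]
        # multiples of p^2 are not square-free
        for multiple in range(p * p, limit + 1, p * p):
            mu[multiple] = 0
    return mu
```

With such a table, the averages appearing in Möbius disjointness can be evaluated numerically for a given orbit and observable.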
Let X be a topological space, and let T : X → X be an invertible map. We think of the map T as a dynamical system. Peter Sarnak made the following far-reaching conjecture:
Conjecture 1.1 (Möbius Disjointness). Suppose the topological entropy of T is 0. Then, for any x ∈ X, and any (continuous) function f : X → R,
(1.1) $\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mu(n) f(T^n x) = 0.$
Definition 1.2 (IET). An interval exchange transformation (IET) is given by a vector $\ell = (\ell_1, \ldots, \ell_d) \in \mathbb{R}^d_+$ and a permutation π on {1, . . . , d}. From $\ell$ we obtain d subintervals of $[0, \sum_{i=1}^{d} \ell_i)$ as follows:
$I_1 = [0, \ell_1),\ I_2 = [\ell_1, \ell_1 + \ell_2),\ \ldots,\ I_d = [\ell_1 + \cdots + \ell_{d-1}, \ell_1 + \cdots + \ell_d).$
Now we obtain a d-Interval Exchange Transformation $T = T_{\pi,\ell}$, which exchanges the intervals $I_j$ according to π. More precisely, if $x \in I_j$, then
$T(x) = x - \sum_{k<j} \ell_k + \sum_{\pi(k)<\pi(j)} \ell_k.$
It is well known that the topological entropy of any interval exchange transformation is 0. Thus, if Conjecture 1.1 is true, then (1.1) should hold for any interval exchange transformation.
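Definition 1.2 can be turned into a few lines of code. The sketch below assumes the usual convention that the interval $I_j$ is moved to position π(j), so that its new left endpoint is the total length of the intervals placed before it. The names `make_iet`, `lengths` and `perm` are illustrative, not taken from the paper.

```python
from bisect import bisect_right
from itertools import accumulate

def make_iet(lengths, perm):
    """Build the map T of a d-IET.

    lengths: the interval lengths (ell_1, ..., ell_d)
    perm:    perm[j] is pi(j + 1), written with 1-based values, e.g. (3, 2, 1)
    """
    d = len(lengths)
    # left[j] is the left endpoint of the (j+1)-st interval before the exchange
    left = [0.0] + list(accumulate(lengths))
    # new_left[j] is its left endpoint after the exchange
    new_left = [sum(lengths[k] for k in range(d) if perm[k] < perm[j])
                for j in range(d)]

    def T(x):
        j = min(bisect_right(left, x) - 1, d - 1)  # index of the interval containing x
        return x - left[j] + new_left[j]

    return T

# A 3-IET with the order-reversing permutation, and a rotation written as a 2-IET
T3 = make_iet([0.2, 0.3, 0.5], (3, 2, 1))
T2 = make_iet([0.3, 0.7], (2, 1))
```

For d = 2 and the transposition, the map is the rotation x ↦ x + ℓ_2 mod (ℓ_1 + ℓ_2), which is why rotations are referred to as 2-IETs.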
In this paper, we consider only the case d = 3. Extending our results, e.g. to d = 4, will require fundamental new ideas.
Lemma 1.3. If T is a 3-IET with permutation $\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}$, then T is also the induced map of a rotation on an interval.
Let $g_t = \begin{pmatrix} e^{t} & 0 \\ 0 & e^{-t} \end{pmatrix} \in SL(2, \mathbb{R})$. We refer to the action of the 1-parameter subgroup $g_t$ as the geodesic flow on $M_1$ (or $M_{1,2}$). The action of $g_t$ on both $M_1$ and $M_{1,2}$ is ergodic.

Renormalization.
We will need to put a Diophantine condition on the IET T. In terms of $X \in M_{1,2}$, we want the geodesic ray $\{g_t X : t > 0\}$ to spend significant time in compact subsets of $M_{1,2}$. Directly in terms of the IET data, our conditions are the following:
ASSUMPTIONS: There exist constants $C_1, C_2, C_3, C_4 > 1$ such that the following holds: Suppose $\ell \in \mathbb{N}$ and $0 < \eta$ are small enough. Then there exist $\hat c_\eta > 0$ and $c_\ell$ (depending on η and $\ell$ respectively) so that for every $0 < c < c_\ell$ there exist a constant $k_c \in \mathbb{N}$ and infinite sequences $L_i$, $k_i$ so that:
(A0) $q_{k_i - 1} < c L_i < q_{k_i}$.
(A1) $a_{k_i} < C_1$,
(A2) $a_{k_i + 1} < C_2$ and
(A3) $a_{k_i + 2} < C_3$.
(A4) The shortest vertical trajectory on the torus from one marked point to a $\|q_k \alpha\|$ neighborhood of the other has length at least $q_k / C_4$.
(A5) There exists $u_i$ so that either $\lambda(\psi_{L_i}^{-1}(u_i)) > \hat c_\eta$ and $\lambda(\psi_{L_i}^{-1}((-\infty, u_i))) < \eta \hat c_\eta$, or $\lambda(\psi_{L_i}^{-1}(u_i)) > \hat c_\eta$ and $\lambda(\psi_{L_i}^{-1}((u_i, \infty))) < \eta \hat c_\eta$, where $\psi_r(x) = \sum_{\ell=0}^{r-1} \chi_J(R^{\ell} x)$ and our 3-IET is the first return map of R to J.
(A6) $\lim_{i\to\infty} \int_0^1 d(R^{L_i} x, x)\, d\lambda = 0$.
(A7) We have $L_i > q_{k_i + \ell}$.
(A8) We have $\max\{j : \psi_{L_i}^{-1}(j) \neq \emptyset\} - \min\{j : \psi_{L_i}^{-1}(j) \neq \emptyset\} < k_c$.
(A9) There exists $v_i$ so that $q_{v_i} \le L_i < q_{v_i + 1}$ and either $L_i = q_{v_i}$, or $a_{v_i + 1} > 4$ and $L_i = j q_{v_i}$ for some $j \le a_{v_i+1}/4$.
Our main result is the following:
Theorem 1.4. Suppose T is a 3-IET satisfying the assumptions (A0)-(A9). Then Möbius disjointness, i.e. (1.1), holds.
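The quantities entering the assumptions are easy to compute for a concrete rotation. The sketch below assumes, as is standard, that $a_k$ and $q_k$ are the partial quotients and convergent denominators of the rotation number α, with $q_k = a_k q_{k-1} + q_{k-2}$, and that J = [0, z). All function names are illustrative. Floating point is used, so only the first few continued fraction terms should be trusted, and a rational α will eventually cause a division by zero.

```python
import math

def continued_fraction(alpha, count):
    """Partial quotients a_1..a_count and denominators q_0..q_count of alpha in (0, 1)."""
    quotients, denominators = [], [1]
    q_prev, q_cur = 0, 1          # q_{-1} = 0, q_0 = 1
    x = alpha
    for _ in range(count):
        x = 1.0 / x               # Gauss map step; fails if x has become 0
        a = math.floor(x)
        x -= a
        q_prev, q_cur = q_cur, a * q_cur + q_prev
        quotients.append(a)
        denominators.append(q_cur)
    return quotients, denominators

def psi(r, x, alpha, z):
    """psi_r(x): the number of 0 <= l < r with R^l x in J = [0, z), R x = x + alpha mod 1."""
    return sum(1 for l in range(r) if (x + l * alpha) % 1.0 < z)

def bounded_quotients(quotients, k, bounds):
    """Check a_k < C_1, a_{k+1} < C_2 and a_{k+2} < C_3, as in (A1)-(A3); k is 1-based."""
    return all(quotients[k - 1 + j] < bounds[j] for j in range(3))
```

For example, `continued_fraction(math.sqrt(2) - 1, 5)` describes a rotation with all partial quotients equal to 2, for which conditions of the type (A1)-(A3) hold at every index.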
Assumptions (A0)-(A9) are reasonable in view of the following:
Proposition 1.5. Let $X \in M_{1,2}$. Let $\nu_T$ be the measure on $M_{1,2}$ given by $\int f \, d\nu_T = \frac{1}{T} \int_0^T f(g_t X)\, dt$ for all $f \in C_c(M_{1,2})$. If there exists a weak-* limit of $\nu_T$ that is not the zero measure, then the corresponding 3-IET satisfies assumptions (A0)-(A9).
Disjointness. As in e.g. [BSZ], we derive the Möbius disjointness result from a result about joinings of powers of T. In fact, we prove the following:
Theorem 1.7. If T is a 3-IET that satisfies assumptions (A0)-(A9), then there exists κ > 1 so that for all n > 0, the set $B_n = \{m < n : T^m \text{ is not disjoint from } T^n\}$ has the property that if $m_1 < m_2$ are elements of $B_n$, then $m_2 / m_1 > \kappa$.
See Appendix A for a proof that Theorem 1.7 implies Theorem 1.4. This is a straightforward modification of a note of Harper [Ha]. It is included for completeness.
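One elementary consequence of the lacunarity in Theorem 1.7 is that the exceptional set $B_n$ has at most logarithmic size. The following count is only a sketch and assumes that the elements of $B_n$ are positive integers.

```latex
Let $m_1 < m_2 < \dots < m_r$ be the elements of $B_n$, with $m_1 \ge 1$.
Since consecutive elements have ratio greater than $\kappa$,
\[
  n > m_r > \kappa\, m_{r-1} > \dots > \kappa^{\,r-1} m_1 \ge \kappa^{\,r-1},
\]
and therefore
\[
  |B_n| = r < 1 + \frac{\log n}{\log \kappa}.
\]
```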
Remark 1.8. In Appendix B, we prove that for almost every 3-IET, T , T n is disjoint from T m for all 0 < n < m. This gives an alternative (and much easier) proof of Corollary 1.6. However, the proof in Appendix B does not give a useful diophantine condition under which Möbius disjointness holds.
Related work: Möbius disjointness has been shown for a variety of systems; see for example [ELR], [GT] and [W], among others. Most closely related to this work are [D], where Vinogradov's circle method is used to prove that every rotation (2-IET) is disjoint from Möbius; [Bou], which shows that 3-IETs satisfying a certain measure 0 condition are disjoint from Möbius; and [BSZ], where a slightly stronger version of our criterion is introduced to show that the time 1 maps of horocycle flows are disjoint from Möbius. This last paper motivated our approach.
Further questions and conjectures.
Question 1.9. What is the Hausdorff codimension of the set of $X \in M_{1,2}$ so that any weak-* limit point of $\nu_T$ is the zero measure?
Conjecture 1.10. For almost every IET that is not of rotation type and all n < m ∈ N, $T^n$ is disjoint from $T^m$. In fact, if $U_T$ is the unitary operator associated to (composition with) T on the $L^2$ functions of integral zero, then there is a sequence $k_1, k_2, \ldots$ so that $U_T^{n k_i}$ converges to the 0 operator in the weak operator topology and $U_T^{m k_i}$ converges to the identity operator in the strong operator topology.
Outline: Section 2 establishes an abstract disjointness criterion, Proposition 2.2. Section 4 uses this to prove Theorem 1.7. Section 3 recalls standard facts about rotations used in Section 4. Section 5 proves Proposition 1.5. Appendix A proves that Theorem 1.7 implies Theorem 1.4. Appendix B proves that almost every 3-IET has the property that all of its distinct positive powers are disjoint.
Acknowledgments: J. C. was supported in part by NSF grants DMS-135500 and DMS-1452762 and the Sloan foundation. A. E. is supported in part by NSF grant DMS 1201422 and the Simons Foundation. The authors thank Adam Harper for graciously letting us modify his note and a helpful correspondence. A. E. thanks Princeton University and the Institute for Advanced Study for support during part of this work. The authors would like to thank the Isaac Newton Institute for Mathematical Sciences, Cambridge, for support and hospitality during the program Dynamics of Group Actions and Number Theory where work on this paper was undertaken. This work was supported by EPSRC grant no EP/K032208/1.

Disjointness criterion
Let (X, d) be a metric space. We set $X_1 = X_2 = X$, and write the product $X \times X$ as $X_1 \times X_2$. Let λ be a measure on X, and let $T_1 : X_1 \to X_1$ and $T_2 : X_2 \to X_2$ be λ-preserving maps. Let σ be a joining of $(X_1, T_1, \lambda)$ and $(X_2, T_2, \lambda)$, i.e. σ is an ergodic $T_1 \times T_2$-invariant measure on $X_1 \times X_2$ which projects to λ in either factor.
Our basic strategy is due to Ratner [Ra]. In fact, we use the following proposition: Proposition 2.1. Suppose S : X → X is a λ-preserving map which commutes with T 1 and T 2 . Suppose d 1 ≥ 0, d 2 > 0, and for every δ > 0 and any compact set K ⊂ X 1 × X 2 with σ(K) > 1 − δ and for every δ > > 0 there exist points (x, y) ∈ X 1 × X 2 , (x , y ) ∈ X 1 × X 2 and r ∈ N so that the following conditions hold: σ is an ergodic T 1 ×T 2invariant measure which is distinct from σ. Thus, (S d 1 × S −d 2 )σ and σ are mutually singular. It follows that for any δ > 0 there exists a compact set K with σ (K) This is not consistent with conditions (a)-(c).
Proposition 2.2. Suppose S is continuous except for finitely many points, and suppose λ gives zero measure to the points of discontinuity of S. Assume (1) There exists a sequence of measurable partitions of X 1 , U and a sequence of numbers r i so that (2) There exists ∈ {1, 2, 3}, a sequence of measurable sets A i , and functions F i preserving the measure λ, so that There exists an absolute constant δ 0 > 0 such that the following holds: for any 0 < δ < δ 0 , either there exists a ∈ Z so that for infinitely many i, or there exists a ∈ Z so that for infinitely many i, Under the assumptions (1) Lemma 2.3. Suppose > 0 and δ > 0. Then, for any compact set K ⊂ X 1 ×X 2 with σ(K) > 1 − δ and all i ∈ N sufficiently large, there exists a compact set For y ∈ X 2 , letσ y be the conditional measure ofσ along X 1 × {y}. Let B(x, ) denote the open ball of radius . Forσ-almost all (x, y) ∈ X 1 × X 2 , σ y (B(x, /2)) > 0. Therefore, there exists ρ( , δ) > 0 and a set K 1 ⊂ K with σ(K 1 ) > 1 − δ such that for all (x, y) ∈ K 1 , σ y (B(x, /2)) > ρ( , δ).
Let π 2 : X 1 × X 2 → X 2 denote projection to the second factor. Since the function y →σ y is measurable, by Lusin's theorem there exists a compact set K 2 ⊂ X 2 with π * 2 (σ) (K 2 ) > 1 − δ on which it is uniformly continuous relative to the Kantorovich-Rubinstein metric, where the sup is taken over all 1-Lipshitz functions f : X 1 → R with sup |f (x)| ≤ 1. Then, there exists δ > 0 such that for all y, y ∈ K 2 with d(y , y) < δ and for all x ∈ X 1 such that (x, y) ∈ K 1σ y (B(x, )) > ρ( , δ)/2. Then, for all y, y ∈ K 2 , with d(y, y ) < δ and all x with (x, y) ∈ K 1 , For large enough i, (2.4) holds, and also (x, y) ∈ K 1 × K 2 and F i y ∈ K 2 . Thus (2.3) holds (with y = F i y). This implies the first statement of the lemma. The proof of the second statement is identical.
Proof of Proposition 2.2. We establish the (2.1) case. The (2.2) case is analogous. The basic strategy is to choose (x, y) ∈ U (i) a × A i , and apply Proposition 2.1 with r = r i to the points (x, y) and (x , F y), where x is as in Lemma 2.3. We now give the details.
Suppose δ > 0 and 0 < < δ are arbitrary. Let ∆ denote the union of the points of discontinuity of S j , 1 ≤ j ≤ k. There exists c 1 ( ) > 0 such that if we let Since K 00 is compact and S is continuous on K 00 , there exists > 0 such that if x 1 , x 2 ∈ K 00 , with d(x 1 , x 2 ) < then for all 1 ≤ j ≤ k, d(S j x 1 , S j x 2 ) < /6. Without loss of generality, we may assume that < c 1 ( )/2. Then, we have, for all 1 ≤ j ≤ k, Let a be as in Proposition 2.2 (3). Write We may assume that i is large enough so that there exists a compact set In view of assumption (1) of Proposition 2.2, there exists a compact set j . In view of assumption (2b) of Proposition 2.2, there exists a compact set K 2b ⊂ X 2 with λ(K 2b ) > 1 − δ such that for y ∈ K 2b ∩ A i and i ∈ N sufficiently large, As in the proof of Proposition 2.1, let K be a compact set so that σ(K) > 1 − δ and (T d 1 × T − 2 )K are compact and disjoint for all 0 ≤ d ≤ k and 0 ≤ ≤ 3. Formally, K may depend on d, but without loss of generality we may assume that the same K works for all 0 ≤ d ≤ k. Let a , and thus we may assume x ∈ U (i) b for some b ≥ a. Then, since Also, in view of Lemma 2.3, d(x , x) < , We have x ∈ K 0 and T r i 1 x ∈ K 0 . Therefore, by (2.5), Similarly, Therefore, assumption (c) in Proposition 2.1 also holds with d 1 = b − a and d 2 = , and Proposition 2.1 can be applied.
Corollary 2.4. If S is weakly mixing and the conditions of Proposition 2.2 are satisfied then σ = λ × λ.
We include a proof because the statement in [Ru] is slightly more specific.
This is a measure on X. Because σ has marginals µ, ν 2 this measure is absolutely continuous with respect to µ. So it has a Radon-Nikodym derivative f A . By our assumption this is a T invariant function and so it is constant. This implies any two rectangles with the same dimensions have the same measure and thus σ is the product measure.
Proof of Corollary 2.4. We show only the (2.1) case, since the (2.2) case is similar.
Using the fact that σ is a joining of S n and S m , this implies that σ is S md 1 +nr × id invariant. Since S is weak mixing and thus totally ergodic, S md 1 +nr is ergodic and so by Lemma 2.5, σ = λ × λ.
In §3- §4 we will show that Proposition 2.2 can be applied to prove Theorem 1.7.
Proof. First, we assume k + 1 is odd. If $x \in I$ then $R^{q_{k+1}} x = x - \|q_{k+1}\alpha\|$. So if x is not in the leftmost $\|q_{k+1}\alpha\|$ of I then $R^{q_{k+1}} x \in I$. Otherwise, $R^{q_k} x = x + \|q_k \alpha\|$ is on the right of I and within $\|q_{k+1}\alpha\|$ of I. So $R^{q_{k+1} + q_k} x \in I$. The case of k + 1 even is similar.
Note that if f is the characteristic function of an interval, then var(f) = 2. This is the only case we use in the sequel and we present the proof of this case below. A similar argument will be used to prove the more general Lemma 3.7.
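Later sections use this special case in the form "$\psi_{q_k}$ takes at most 5 values, which are consecutive". The following numerical sketch illustrates the bound for one rotation; the golden rotation number, the interval J = [0, 0.3), the choice q_k = 34 and the sampling grid are all assumptions made for illustration, and a finite sample of course proves nothing.

```python
import math

def psi(r, x, alpha, z):
    """psi_r(x): the number of 0 <= l < r with R^l x in J = [0, z), R x = x + alpha mod 1."""
    return sum(1 for l in range(r) if (x + l * alpha) % 1.0 < z)

alpha = (math.sqrt(5) - 1) / 2   # golden rotation; its q_k are the Fibonacci numbers
q_k = 34                         # a convergent denominator of alpha
z = 0.3                          # J = [0, z)

# Values of psi_{q_k} on a grid of 500 starting points
values = sorted({psi(q_k, i / 500, alpha, z) for i in range(500)})
spread = values[-1] - values[0]
```

The observed values cluster around $q_k z$, in line with the variation bound var(f) = 2 for the characteristic function of an interval.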
Proof of special case. Following the paragraph in the introduction 'Connection to tori and tori with marked points', we want to understand the intersections of a (half open) vertical line segment of length $q_k$ with a horizontal line segment of length z on $\hat X$, see Figure 1. (Indeed, $R^{q_k}$ is given by a vertical trajectory of length $q_k$, and $\sum_{j=0}^{q_k - 1} \chi_J(R^j x)$ is given by the intersection of the corresponding vertical trajectory of length $q_k$ with a horizontal trajectory of length z.) This is equivalent to understanding the intersections of a vertical segment of length 1 with a horizontal line segment of length $q_k z$ on $g_{\log(q_k)} \hat X$. Call these segments $\gamma_1$ and $\gamma_2$ respectively, see Figure 2. We close up these two curves as pictured in Figure 3 using the following observations:
(1) Any vertical trajectory of length $q_k$ on $\hat X$ has endpoints that differ by a horizontal vector of length at most $\|q_k \alpha\| < \frac{1}{a_{k+1} q_k}$. This implies we can close $\gamma_1$ up by a horizontal segment, $\zeta_1$, of length less than $\frac{1}{a_{k+1}} \le 1$. Call the resulting closed curve $\hat\gamma_1$.
(2) We may close up γ 2 by a vertical segment ζ 2 of length at most 1, union a horizontal segment ζ 2 of length at most 1 2 which is either contained in γ 2 or disjoint from it. Call the resulting closed curveγ 2 . Any vertical segment of length 1 on g log(q k )X is a translate of γ 1 and so we may close it up so that it is a translate ofγ 1 . The intersection of any translate ofγ 1 witĥ γ 2 is constant (it is a topological invariant of these curves). So now we study the intersection of translates ofγ 1 and ζ 2 ∪ ζ 2 . γ 1 can intersect ζ 2 either 0 and 1 times. γ 1 does not intersect ζ 2 and ζ 1 does not intersect ζ 2 . Once again ζ 1 intersects ζ 2 at most once. To summarize the intersections with γ 2 of any two translates of γ 1 differ by at most 2. Lemma 3.7. For all k ∈ N with a k+1 > 4, and i ∈ N with i ≤ a k+1 4 we have that there exists j with λ(ψ −1 iq k (j)) > 1 12 and either j − min{ : Proof. This is similar to the proof of Lemma 3.6, but the vertical segment onX has length iq k . We once again work on g log(q k )X , where the vertical segment γ 1 has length i, and the slit γ 2 has length q k z (See Figure 3). Thus, we need to estimate the number of intersections between γ 1 and γ 2 . As in the proof of Lemma 3.6, we make the following observations (see Figure 3): (1) Any vertical trajectory of length iq k onX has that its endpoints differ by a horizontal vector of length at most i q k α < i a k+1 q k . This implies we can close γ 1 up by a horizontal segment, ζ 1 , of length less than i a k+1 < 1 4 . Call the resulting closed curveγ 1 .  . Closing the curves. We complete the vertical segment γ 1 to a closed curveγ 1 by adding a horizontal segment ζ 1 (drawn in green). Simularly, we close up the horizontal slit γ 2 to obtain a closed curveγ 2 by adding in a horizontal segment ζ 2 and a vertical segment ζ 2 (drawn in purple).
(2) We may close up γ 2 by a vertical segment of length at most 1, ζ 2 , union a horizontal segment of length at most 1 2 , ζ 2 , which is either contained in γ 2 or disjoint from it. Call the resulting closed curveγ 2 .
Any vertical segment of length i on g log(q k )X is a translate of γ 1 and so we may close it up so that it is a translate ofγ 1 . As in the proof of Lemma 3.6, the intersection of any translate ofγ 1 withγ 2 is constant (it is a topological invariant of these curves). So now we study the intersection of translates ofγ 1 and ζ 2 ∪ ζ 2 . γ 1 can intersect ζ 2 between 0 and i times. γ 1 does not intersect ζ 2 and ζ 1 does not intersect ζ 2 . Also ζ 1 intersects ζ 2 at most once. To summarize the intersections with γ 2 of any two translates of γ 1 differ by at most i + 1.
Observe that on every horizontal line, a segment of at least 1 4 has that all the corresponding translates ofγ 1 for this line segment intersect ζ 2 or all of them do not. (Indeed there is a segment of size at least 1 2 so that a vertical segment of length 1 from any point on this segment misses ζ 2 and {j q k α : 0 ≤ j ≤ i} is contained in an interval of length at most 1 4 . That is, there is a subinterval of size 1 4 so that for each x in this subinterval we have that j q k α + x is in the subinterval of size 1 4 for all 0 ≤ j ≤ i.) So on a subset of this set of measure at least 1 8 the translates ofγ 2 either all intersect ζ 2 or all miss ζ 2 . This set satisfies the lemma and it is either within one of the maximal or within 1 of the minimal.
Lemma 3.8. IfÎ is an interval of size at least γ q k α and q L > 12γ −1 q k α then for all x we have 1 . Also for all γ > 0 there exists u so that ifÎ is an interval of size at least γ q k α then 1 , and the second claim follows from the first with u = 2b.
To see the last claim, and apply the previous sentence to obtain that this is at least t q k+u q k+u 1 2 λ(Î). Since t < 2 t q k+u q k+u we have the final claim.
Proof. By Lemma 3.1 we have that q > a q −1 and q +2 > 2q . So by induction we have that q k > 2 In view of Proposition 2.2 and Corollary 2.4, to prove Theorem 1.7 it is enough to prove the following: Proposition 4.1. Suppose assumptions (A0)-(A9) are satisfied. Then, there exists a constant * > 0 (depending only on the constants in (A0)-(A9)) such that the following holds: Suppose n ∈ N, m < m < n, and Then the assumptions of Proposition 2.2 can be satisfied for X 1 = X 2 = J, T 1 = S n and T 2 either S m or S m . (Note that we view T 1 and T 2 as maps from J to J).
Notation. Before starting the proof of Proposition 4.1 we introduce some notation. Let Then, for any x ∈ J so that R M x ∈ J, Picking parameters. Let Thus, in order to prove Theorem 1.7, we may assume that c is small. Then, Thus, in view of (A0) and (4.3), we have The proof of Proposition 4.1 relies on the following technical result: Proposition 4.2. There exists c 0 ,c > 0,Ĉ > 0 and u ∈ N depending only on the constants C 1 , ..., C 4 of the assumptions (A0)-(A4) so that if c < c 0 (where c is as in (4.3)), then there existsm ∈ {m, m } and d ∈ {−3, −2, −1, 1, 2, 3} so that, after passing to a subsequence, for all large enough k, (1) there existsÃ ⊂ J with λ(Ã) >c so that for all y ∈Ã we have (4.5) R q k Sm r k y = S d Sm r k R q k y.
(2) If N > q k+u 2m then for any y ∈ [0, 1] we have In view of assumption (A8) there exists an interval K 1 (L) ⊂ N of size at most k c so that for any x ∈ J, ψ L (x) ∈ K 1 (L). Since R(J c ) ⊂ J, there exists an interval K 2 (L) of size (k c + 1) such that for all x ∈ [0, 1], ψ L (x) ∈ K 2 (L).
We have Now, for x ∈ J, by (4.2), (1) of Proposition 2.2 follows, (with the size of the partition dependent on n).
Let F k be the first return map of R q k to J. (Essentially we want F k to be R q k , but we want F k to be a map from J to J). Since R q k tends to the identity map as k → ∞, condition (2a) of Proposition 2.2 follows.
For x ∈Â, and since we are assuming d = 1, (4.5) becomes Since R q k tends to the identity as k → ∞, there exists a subset E ⊂ J of almost full measure such that for x ∈ E, R q k x = F k x. Then, for x ∈ E ∩Â, We now begin the proof of Condition (3) of Proposition 2.2. In (A0)-(A9) we choose η < (96 · 25) −1 . Let ρ =ĉ η , and choose δ 0 < 1 12 ηρ. Then, by (A5), we can either choose a such that for i sufficiently large, or choose a so that for i sufficiently large, We need a lemma to obtain Condition (3) of Proposition 2.2 from Proposition 4.2: Lemma 4.3. For every ρ > 0 there exists b ∈ N so that if for some s ∈ Z, λ(ψ −1 L (s)) > ρ where L ≥ q so that either L = q or a +1 > 4 and L = iq for i ≤ a +1 4 then there exists a measurable set V with the following properties: Proof of Lemma 4.3. We claim that there exists b ∈ N so that To prove (4.8), note that for any x ∈ [0, 1] there exist at most 2 different 0 Any orbit of length q − 1 can hit each of these intervals at most once.) Thus for any r ∈ N, (Indeed each orbit of length q − 1 can have at most 3 consecutive stretches in this set. These stretches have length at most r.) Now (4.9) implies (4.8).
Proof of Condition (3) of Proposition 2.2 continued. We assume that (4.6) holds. (The proof in the case (4.7) holds is virtually identical). Let a and ρ be as in (4.6). We then apply Lemma 4.3 with this ρ and s = a. Then, let V , and b be as in Lemma 4.3. We assume * (and thus c) is small enough so that (A7) implies that (4.10) Let σ be any joining of S n × Sm and for each x let Σ x denote the points y so that (x, y) is σ-generic. Let We are assuming that x ∈ V ⊂ J, and also we are assuming that Ry ∈ J whenever y ∈ J. Then, for at least half of 0 ≤ i < n we have R i x ∈ J, it follows that We can choose N ∈ N so that N ≥ q −b 12n and also (4.11) ψ L (S nj x) = a for all x ∈ E 1 and all 0 ≤ j < 2N . Let Then, in view of (4.11), Note that R i x, R j x are in distinct S n orbits if |i − j| < n and R i x, R j x ∈ J. This means that the above union is disjoint, and thus Since σ is a self joining of λ, we can find E ⊂ E 1 × [0, 1] so that (4.13) σ(E) > 1 2 λ(E 1 ) > 1 96 λ(ψ −1 L (a)).

4.2. The main lemma. The next lemma about rotations is the key step in the proof of Proposition 4.2.
Lemma 4.4. Assume (A0)-(A4) are satisfied and also that $(m - m') w_k \in [q_{k-1}, q_k)$. Let $C_1, \ldots, C_4$ be as in assumptions (A0)-(A4). Then there exist $c_2 > 0$ and C > 0 depending only on $C_1, \ldots, C_4$ such that for all k ∈ N there exist an interval I ⊂ [0, 1) and a set of natural numbers $E = \{e, \ldots, e + c_2 q_k\}$ so that
(1) $|I| \ge C \|q_k \alpha\|$
(2) For all $x \in \bigcup_{i \in E} R^i I$. (4.16)
In the rest of this subsection, we will prove Lemma 4.4. We will derive Proposition 4.2 from Lemma 4.4 in §4.3. The proof of this lemma is complicated and so we provide a brief sketch: We use (4.19) to have a criterion for mw k −1 1}. Claims 4.5, 4.6 and 4.7 use this criterion to prove the claim. Claim 4.5 and subsequent comments identify I. Claim 4.6 identifies E. Claim 4.7 is used to show the criterion given by (4.19) holds for these I and E.
Recall that J = [0, z]. Assume k is odd. (This is an assumption of convenience of exposition. If k is odd then R q k 0 = − q k α , if k is even it is q k α . Thus if k is even all sets [− q k α , 0) should be [0, q k α ) and [z − q k α , z) should be [z, z + q k α )). Observe (4.17) Recall that J = [0, z). We have Consider [− q k α , 0). By Lemma 3.5, the function that assigns to a point in [− q k α , 0) its first return time takes two values, q k+1 and q k+1 + q k . The return time of q k+1 + q k occurs on [− q k+1 α , 0).
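The two-valued return time quoted from Lemma 3.5 can be observed numerically. The sketch below uses the interval $[0, \|q_k\alpha\|)$ instead of $[-\|q_k\alpha\|, 0)$, which is harmless because return times of a rotation to an interval depend only on its length. The golden rotation number, the choice $q_k = 13$, $q_{k+1} = 21$ and the sample points are assumptions made for illustration.

```python
import math

def first_return_time(x, alpha, length, max_steps=10_000):
    """Smallest n >= 1 with R^n x in [0, length), where R x = x + alpha mod 1."""
    for n in range(1, max_steps + 1):
        if (x + n * alpha) % 1.0 < length:
            return n
    raise ValueError("no return within max_steps")

def norm(t):
    """Distance from t to the nearest integer."""
    return abs(t - round(t))

alpha = (math.sqrt(5) - 1) / 2   # golden rotation: all partial quotients equal 1
q_k, q_k1 = 13, 21               # consecutive convergent denominators of alpha
delta = norm(q_k * alpha)        # ||q_k alpha||, the length of the interval

# 40 sample points spread over the interval, away from its endpoints
samples = [(i + 0.5) * delta / 40 for i in range(40)]
return_times = {first_return_time(x, alpha, delta) for x in samples}
```

In this example the longer return time $q_{k+1} + q_k = 34$ is seen on a subinterval of length $\|q_{k+1}\alpha\|$, as the lemma describes.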
Claim 4.7. Suppose (4.20) holds and ∈ (j − c 2 q k , j + q k ) or (4.21) holds and ∈ (j − q k , j + c 2 q k ). Also assume that = j. Then, Proof of Claim 4.7. Recall that I ⊂ [− q k+1 α , 0) and thus by Lemma 3.5, Also, by Lemma 3.5, the return time of any point in the interval [z − q k α , z) to itself is at least q k+1 > q k . Thus, for such that | − j| < q k , Claim 4.7 follows.
We now continue the proof of Lemma 4.4. Let r = min(c 2 q k , (m − m )w k ). Recall that (m − m )w k < q k . If (4.20) holds, let If (4.21) holds, let Then, for all i ∈ E, Hence, by Claim 4.7, for all x ∈ I and for all i ∈ E, (the only contribution is from the case where i + = j). Therefore, in view of (4.17) and (4.19), for x ∈ R −m w k I and ∈ E, (4.16) holds. From the definition, |E| ≥ c 2 q k . We now estimate λ(I). By Lemma 3.1, q k+2 = a k+2 q k+1 + q k < (a k+2 + 1)q k+1 . q k+1 = a k+1 q k + q k−1 < (a k+1 + 1)q k By Lemma 3.2, Thus, by (A3), This completes the proof of Lemma 4.4.

Proof of Proposition 4.2 from Lemma 4.4.
Recall w k = r k λ(J) as above.
Corollary 4.8. (Corollary to Lemma 4.4) Given w k so that q k < (m − m )w k < q k+1 as before there existsm ∈ {m, m } and a set A k with λ(A k ) ≥c (depending on our non-divergence condition, that is C 1 , ..., C 4 , C ) so that for all x ∈ A k we have Proof. Lemma 4.4 establishes that there existsc > 0 andc 1 > 0, and for an infinite sequence of k ∈ N there exists an interval I ⊂ [0, 1] with λ(I ) >c q k α so that for any x ∈ I there exists H x ⊂ {0, 1, . . . , q k − 1} with |H x | >c 1 q k so that for any x ∈ I and any ∈ H x , and any w k with (m − m )w k ∈ [q k−1 , q k ) we have (4.26) Also note that by Lemma 3.6, ψ q k takes at most 5 values (which are also consecutive) and so for any s ∈ N and any x we have Note that the left-hand-side of (4.26) is By (4.26), for all x ∈ I and for all ∈ H x , S 1 (R x) − S 2 (R x) ∈ {−1, 1}, and by (4.27), we have |S 1 (R x)| ≤ 4, and |S 2 (R x)| ≤ 4. It follows that for all x ∈ I and all ∈ H x , Thus, there existsm ∈ {m, m } and d ∈ {−3, −2, −1, 1, 2, 3} and a set A k with so that for for x ∈ A k (4.25) holds.
We frequently use the following trivial result in this section.
Lemma 4.9. If λ(B) ≥ γ and B is the union of at most $\ell$ intervals, then there exists $B' \subset B$ with $\lambda(B') \ge \frac{1}{2}\lambda(B)$ such that $B'$ is the union of intervals of size at least $\frac{\gamma}{2\ell}$.
Lemma 4.10. There exists $A_k' \subset A_k$ with $\lambda(A_k') > \frac{1}{2}\lambda(A_k)$ and so that $A_k'$ is made of at most $4q_k + 1$ intervals with length at least $\frac{\tilde c}{2} \cdot \frac{1}{4q_k + 1}$.
Proof. Recall that $A_k$ is a level set of a function which has at most $4q_k$ discontinuities. The lemma follows from Lemma 4.9 since this implies that any level set is made of at most $4q_k + 1$ intervals.
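Lemma 4.9 is a pigeonhole count: the intervals that are discarded have total length less than γ/2. Writing ℓ for the number of intervals, the sketch below takes the length threshold to be γ/(2ℓ), which is the reading consistent with Lemma 4.10. The function name and the sample data are illustrative.

```python
def keep_long_intervals(intervals, gamma):
    """Discard the short pieces of a finite union of disjoint intervals.

    intervals: list of disjoint (left, right) pairs with total length >= gamma
    Returns the intervals of length at least gamma / (2 * len(intervals)).
    The discarded ones have total length < gamma / 2, so at least half of the
    measure survives.
    """
    threshold = gamma / (2 * len(intervals))
    return [(a, b) for a, b in intervals if b - a >= threshold]

def total_length(intervals):
    return sum(b - a for a, b in intervals)

B = [(0.0, 0.01), (0.1, 0.4), (0.5, 0.52), (0.6, 0.9)]
B_prime = keep_long_intervals(B, gamma=0.6)
```

Here the threshold is 0.6 / 8 = 0.075, so the two short intervals are dropped and the two long ones are kept.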

In the previous results we have proved properties of a level set of mw
In this lemma we relate that to proving nice properties about a set, G = {x : Sm n x = R x} for some , to obtain the setÃ in Proposition 4.2.
Lemma 4.11. For all large enough k there existsm ∈ {m, m }, d ∈ {−3, −2, −1, 1, 2, 3} and a setÃ k with λ(Ã k ) >c 4 and which is the union of at most 8q k intervals of size at leastc 2 1 8·4q k so that for x ∈Ã k , (4.28) Proof. By Lemma 3.6, for all h, j ∈ N, and any x ∈ [0, 1], Let 0 < N < q b be a positive integer, and write Note that D b = N . Then, by (4.29), for x ∈ [0, 1], We now apply (4.30) with N (x) instead of N . We obtain that for each x ∈ J there exists N (x) ∈ N so that (4.31)mr k = and so N (x) =mw k + o(mw k ). Therefore, for all large enough k we have |N (x) − mw k | <c 32 q k . Observe that if (4.32) then R j x is in one of two intervals of size at most q k α for some j ∈ [min(N (x),mw k ), max(N (x),mw k )].
It follows that there existsÃ k ⊂ A k with λ(Ã k ) >c 4 which is a union of intervals each of size at leastc 2 1 8·4q k , such that for x ∈Ã k (4.32) does not hold. Indeed, we are removing at most 2c 32 q k intervals of size q k α so we obtain a set of measure at least λ(A k ) −c 16 that is a union of at most 4q k + 1 + 4c 32 q k intervals and can invoke Lemma 4.9.
To complete the proof we need to show that $\{S^{i\tilde m} x\}$ hits $\tilde A_k$ frequently enough. Lemma 3.8 lets us show that R orbits hit $\tilde A_k$ frequently enough. The key observation we use is that if we define $j_i$ by $S^{\tilde m i} x = R^{j_i} x$, then $j_{i+1} - j_i \le 2\tilde m$. We call a set with this property $2\tilde m$ dense. This motivates us to build an auxiliary set, $\hat A_k$, so that the hits of an R orbit to $\hat A_k$ give a lower bound for the hits of an $S^{\tilde m}$ orbit to $\tilde A_k$.
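The gap bound behind the notion of a 2m dense set comes from the standing assumption that R maps the complement of J into J, so each return of R to J takes one or two steps. The sketch below checks this for one example; the rotation number, J = [0, 0.7), the starting point and the power m = 5 are assumptions chosen so that this standing assumption holds.

```python
import math

def visit_times(x, alpha, z, steps):
    """Times 0 <= j < steps at which R^j x lies in J = [0, z), R x = x + alpha mod 1."""
    return [j for j in range(steps) if (x + j * alpha) % 1.0 < z]

alpha = (math.sqrt(5) - 1) / 2
z = 0.7      # with this alpha, R maps [z, 1) into [0, z)
x = 0.1      # a starting point in J
m = 5        # the power of the first return map S under consideration

visits = visit_times(x, alpha, z, 400)
# S^i x = R^{visits[i]} x, hence S^{m i} x = R^{j[i]} x with j[i] = visits[m * i]
j = visits[::m]
gaps = [b - a for a, b in zip(j, j[1:])]
```

Every gap lies between m and 2m, which gives both the 2m density used here and the m separation used at the end of the proof of Proposition 4.2.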
Proof of Proposition 4.2. We assume k is large enough so that Lemma 4.11 holds, q k > 16m and 4m q k α <c 16 .
ConsiderÃ k and remove from it all x so that for some j < 2m. This means that if x remains in this set then (4.34) ∃ j ≤ 0 ≤ k so that k − j ≥ 2m and R x ∈Ã k for all ∈ {j, . . . , k}.
The set remaining, A k has measure at least λ(Ã k ) − 4m q k α and is made up of at most 8m + 8q k intervals. (Indeed, we are removing at most 2m pre-images of 2 intervals of size q k α .) LetÂ k be a subset of A k of measure at least 1 2 λ(A k ) made of intervals of length at leastc 32·64(q k ) . (Indeed, we invoke Lemma 4.9 using that A k is a set of measure at leastc 8 which is made up of at most 16q k intervals.) By the estimate of the size of intervals inÂ k , Lemma 3.8 implies that if q k+u q k is large enough we have for t > q k+u , Because R i x ∈Â k , the equation (4.34) implies there exists j ≤ i ≤ k with k − j ≥ 2m so that R x ∈Ã k for all ∈ {j, . . . , k}. Then, we have where C is any 2m dense subset of {0, ..., t + 2m}.
Observing that {j ∈ [0, k + 2m] : ∃i with Sm i x = R j x} is 2m dense this implies that where N t (x) = min{j : Sm j x = R x with ≥ t}. We obtain the proposition witĥ C = 16. Indeed, λ(Â) ≥ 1 4 λ(Ã) and so by (4.35) we have that for t > q k+u , we have that (4.36) Lastly, G x := {j : ∃i with Sm i x = R j x} is at leastm separated (that is if j ∈ G x and |i − j| <m then i / ∈ G x ) and so N t (x) ≤ t m , letting us obtain, using (4.36), Multiplying the sequence of inequalities bym completes the estimate.

Renormalization
Recall that X is a torus with two marked points related to a 3-IET T, and $\hat X$ is the torus obtained by forgetting the two marked points.
Divergence in the space of tori, $M_1$: By Mahler's compactness criterion the divergence of $g_t \hat X$ is controlled by the shortest (non-homotopically trivial) simple closed curve on $g_t \hat X$. This sequence is given by curves $\gamma_k$ with vertical holonomy $q_k$ and horizontal holonomy $\pm|q_k \alpha - p_k| = \pm\|q_k \alpha\|$. Coarsely, this curve is contracted from t = 0 to $t = \log(q_k \sqrt{a_{k+1}})$ and then expanded. Additionally, there is a fixed compact set $\hat K$ so that $g_{\log(q_k)} \hat X \in \hat K$ for all k (and in particular $|\gamma_k|$ is proportional to 1 at $g_{\log(q_k)}$).
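The turning time $\log(q_k\sqrt{a_{k+1}})$ can be checked by a short computation. The sketch below uses only the holonomy of $\gamma_k$ stated above and the standard continued fraction bounds $\frac{1}{q_{k+1} + q_k} < \|q_k\alpha\| < \frac{1}{q_{k+1}}$. The symbol $t_k^*$ for the time of minimal length is introduced here for illustration.

```latex
At time $t$ the curve $g_t\gamma_k$ has holonomy
$(\pm e^{t}\|q_k\alpha\|,\; e^{-t}q_k)$, so
\[
  |g_t\gamma_k|^2 = e^{2t}\|q_k\alpha\|^2 + e^{-2t}q_k^2 ,
\]
which is minimized at the time $t_k^*$ with
$e^{2t_k^*} = q_k / \|q_k\alpha\|$.
Since $a_{k+1}q_k \le q_{k+1} < q_{k+1} + q_k \le (a_{k+1}+2)\,q_k$,
\[
  a_{k+1}\,q_k^2 < e^{2t_k^*} < (a_{k+1}+2)\,q_k^2 ,
  \qquad\text{i.e.}\qquad
  0 < t_k^* - \log\bigl(q_k\sqrt{a_{k+1}}\bigr) < \tfrac12\log 3 .
\]
```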
Proof. For anyK there exists δ so that if g sX ∈K then the shortest simple closed curve on g sX is at least δ. As in the previous paragraph, consider the curve γ k onX, with vertical holonomy q k and horizontal holonomy ±|q k α − p k |. On g sX the curve g s γ k has vertical holonomy e −s q k and horizontal holonomy ±e s |q k α − p k |. If s ∈ [log(q k ), log(q k+1 )] then, since we are assuming that the length of g s γ k is at least δ, we must have e s |q k α − p k | ≥ δ 2 or e −s q k ≥ δ 2 . By Lemma 3.2 the first condition can only hold if e s > δ 2 a k+1 q k . Noticing that a k+1 q k > 1 2 q k+1 this implies s > log(q k+1 ) + 2 log(2) + log(δ). The second condition can only hold if s < 2q k 1 δ . The lemma follows with K = −2 log(δ) − 3 log(2).
We now assume the assumption of Proposition 1.5. This means there exists a compact set K ⊂ M 1,2 so that lim sup T →∞ 1 T |{0 < t < T : g t X ∈ K}| = c > 0. Let D 1 , ... be a sequence chosen so that 1 D i |{0 < t < D i : g t X ∈ K}| > 99c 100 and sup ζ>D 1 The next lemma is used to obtain (A7).
Lemma 5.2. For all r > 0 for all i large enough we have Proof. This is a standard application of the Vitali covering lemma. Indeed let and so for each t ∈ B we have λ({s ∈ [t, t + r] : g s X ∈ K}) < c 99 r. By applying the Vitali covering lemma to the intervals [t, t + r] where t ∈ B, we may take a disjoint subcollection of these intervals I 1 , . . . I so that Indeed let U 1 = {[t, t + r] : t ∈ B} and choose I 1 to be an interval [t, t + r] in this set so that λ({s ∈ [t, t + r] : g s X ∈ K} is maximal. Let U 2 = {[t, t + r] : t ∈ B and [t, t + r] ∩ I 1 = ∅ and let I 2 be an interval [t, t + r] in this set so that λ({s ∈ [t, t + r] : g s X ∈ K} is maximal. Also observe λ({s : s ∈ [τ, τ +r] with τ ∈ B, [τ, τ +r]∩I 1 = ∅ and g s X ∈ K}) ≤ 2λ({s ∈ I 1 : g s X ∈ K}).
Repeating this procedure we obtain our intervals I 1 , . . . , I . Having established (5.1) we see at most c 99 r of the points in each interval are in K and the measure of the union of these intervals is at most D i . This is a contradiction unless λ(B) ≤ c 33 D i + r.
By the same proof we obtain: Let f (t) = max{j : q j ≤ e t }. The next lemma is used to obtain (A8).
We use the following straightforward consequence of Lemma 5.3.
Proof of Lemma 5.4. LetĈ be the compact set in M given by projecting K to M by forgetting the marked points. Let = Ĉ as in Lemma 5.1 and δ be the shortest simple closed curve on any surface inĈ. Obtain L from the sublemma with r = r, = 2 , = . LetK denote the set of all tori whose shortest simple closed curve is at least δe −L . The lemma holds for thisK. Indeed, if g sX / ∈K then by examining the size of the shortest simple closed curve we see g s+τ X / ∈ K for all −L < τ < L. That is, considering A = {s : g s X ∈ K} and ρ = log(q f (t+r) ) − t we are asking that |{s ∈ [t + ρ, t + ρ + L] : s ∈ A}| < . So by the sublemma the set of such t has small density and so we have the lemma.
Lemma 5.5. For all $t$ there exists $-2 \le s \le 2$ so that either
• there exists $k$ with $a_{k+1} > 4$ and $i \le \frac{a_{k+1}}{4}$ so that $e^{t+s} = i q_k$,
• or there exists $k$ so that $e^{t+s} = q_k$.
Proof. Let $j = f(t)$. If $a_{j+1} \le 4$ then, since $e^2 > 5$, we may choose $s$ so that $e^{t+s} = q_j$ (and so $k = j$). If $a_{j+1} > 4$ and $i > \frac{q_{j+1}}{2}$, choose $s$ so that $e^{t+s} = q_{j+1}$ (and so $k = j+1$). Otherwise choose $s$ so that $e^{s+t} = i q_j$ with $i \le \frac{a_{j+1}}{4}$ (and so $k = j$).
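The first case of this proof can be checked numerically. The sketch below is only an illustration: it assumes the standard recursion $q_{k+1} = a_{k+1} q_k + q_{k-1}$ with $q_{-1} = 0$ and $q_0 = 1$ (an indexing convention we assume here, which may differ from the one used in the text), computes $f(t)$, and returns the shift $s = \log q_{f(t)} - t$. When $a_{f(t)+1} \le 4$ we have $q_{f(t)} \le e^t < q_{f(t)+1} \le 5 q_{f(t)}$, so $-\log 5 < s \le 0$, which lies in $[-2, 2]$ because $e^2 > 5$. The function names are ours.

```python
import math
from typing import List, Tuple


def convergent_denominators(partial_quotients: List[int]) -> List[int]:
    """Return [q_0, q_1, ..., q_n] for partial quotients [a_1, ..., a_n],
    using q_{-1} = 0, q_0 = 1 and q_{k+1} = a_{k+1} * q_k + q_{k-1}."""
    q_prev, q = 0, 1
    qs = [q]
    for a in partial_quotients:
        q_prev, q = q, a * q + q_prev
        qs.append(q)
    return qs


def f(t: float, qs: List[int]) -> int:
    """f(t) = max{j : q_j <= e^t}. Requires t >= 0, and qs must extend
    past e^t for the answer to be meaningful."""
    target = math.exp(t)
    return max(j for j, q in enumerate(qs) if q <= target)


def shift_to_denominator(t: float, qs: List[int]) -> Tuple[int, float]:
    """Return (j, s) with j = f(t) and e^(t + s) = q_j."""
    j = f(t, qs)
    return j, math.log(qs[j]) - t


# Hypothetical partial quotients a_1, ..., a_7, for illustration only.
example_quotients = [1, 2, 3, 4, 1, 1, 2]
example_qs = convergent_denominators(example_quotients)
j, s = shift_to_denominator(math.log(50.0), example_qs)
```

In this example $a_{j+1}$ is `example_quotients[j]`, since the list of partial quotients is indexed from zero.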
Proof of Corollary 5.6. Applying Lemma 5.5 we obtain $e^{t+s}$. If $q_k = e^{t+s}$ then by Lemma 3.6 the function $\psi_{q_k}$ takes at most 5 values, and these are consecutive. Letting $x_i$ be the measure of the $i$th level set and applying the sublemma yields the corollary. Otherwise, Lemma 3.7 and the sublemma imply the corollary.
The next lemma is used to obtain (A0)-(A4). Its proof is similar to that of Lemma 5.4 and is omitted.
By choosing = 1 9 in Lemmas 5.4, 5.2 and 5.7, for each $r$, we may choose a sequence of $t$ going to infinity which is simultaneously in the three sets whose measure is bounded from below in these lemmas. For each $t$ there exists $s$ as in Lemma 5.5, and this choice verifies (A6) and (A9). Consider $L = e^{s+t}$ and $c = e^{-r}$. Since $|s| < 3$, by Lemma 5.7 assumptions (A0)-(A4) hold for this $L$ and $c$. Indeed, $C_1, C_2, C_3 = M$ and $C_4$ is $e^{-3}$ times the minimum of the shortest distance between the marked points taken over surfaces in $K$. By Lemma 5.2 (and the fact that the projection of $K$ to $M_1$ is compact) (A7) holds. Indeed if $\hat{C}$ is the projection of $K$ to M and c 99 r > Ĉ (u + 2) (where Ĉ is as in Lemma 5.1) then $L > q_{k+u}$. Moreover, by Corollary 5.6, for each $\eta > 0$ there exists $\hat{c}_\eta$ so that (A5) holds. We now just need to show that (A8) holds. If $e^{s+t} \in \{q_{f(t)}, q_{f(t)+1}\}$ this is by Lemma 3.6. Otherwise note that $\psi_{n q_i}$ is at most $5 + 2n$ valued by Lemma 3.7. By Lemma 5.4 there exist N r, so that a f (t)+1 < N r, and thus e s+t = q j for some < N r, . We obtain (A8) with k e −r = 5 + 2N r, .

A. The Sarnak conjecture and joinings of powers
The following result is a trivial modification of a note [Ha] of Harper, which is included for completeness. What is below is a lightly edited version of his note. See that note for connections with the work of other authors.
Theorem A.1. Let $(X, T)$ be a topological dynamical system. Assume that there exists $C > 1$ so that for every $n$, the set $B_n = \{m < n : T^m \text{ is not disjoint from } T^n\}$ has the property that if $m' > m$ are in $B_n$ then $\frac{m'}{m} > C$. Then $T$ is disjoint from Möbius. Indeed, for any continuous compactly supported function $F$ with integral 0, we have
Lemma 1 is a special case of the Turán-Kubilius inequality, but since the proof is just a short calculation we shall give it in full. Expanding the sum in the statement we obtain
and on removing the square brackets, and paying attention to the diagonal contribution in the double sum, we see that is at most
A.1. Completion of proof. Let $F(n) = F(T^n x)$. In view of Lemma 1 and the Cauchy-Schwarz inequality, we have that
Observe that for each $\tau$ there exists $N_0$ so that for all $N > N_0$ we have N (N µτ +O(e 1 τ )) µ 2
Because $\sum_{n \le N} 2|\{p < e^{1/\tau} : p^2 \mid n\}|$ is $O(N)$ we focus on the other term,
We apply Cauchy-Schwarz to $\sum_{2^j \le k < 2^{j+1}} |\mu(k) \sum_{p \le \min\{e^{1/\tau}, N/k\}} \mu(p) F(pk)|$ and bound (A.2) by
The contribution of the diagonal terms ($p_1 = p_2$) is at most $2^j \pi(\min\{e^{1/\tau}, \frac{N}{2^j}\}) \|F\|_{\sup}$, where $\pi(n)$ is the number of primes less than or equal to $n$. The contribution of the $p_1 \neq p_2$ where $T^{p_1}$ and $T^{p_2}$ are not disjoint is at most
Summing over $j$, these terms give a contribution that is $O(N)$. Indeed, we estimate by . This is clearly $O(N)$.
For $\tau$ fixed we choose $M_0$ large enough so that for any $M > M_0$, any $p_1, p_2 < e^{1/\tau}$ with $T^{p_1}$ disjoint from $T^{p_2}$, and any $L \le M$ we have $|\sum_{n \le L} F(p_1 n) F(p_2 n)| < \tau M$.
The contribution of the $p_1, p_2$ where $T^{p_1}$ and $T^{p_2}$ are disjoint and $2^j > M_0$ is at most $2^j (\tau 2^j) \pi(\min\{e^{1/\tau}, \frac{N}{2^j}\})^2$. For fixed $\tau$, summing over $j$, this is also $O(N)$. Indeed we focus on $\sum_{j : N/2^j < e^{1/\tau}} 2^j (\tau 2^j) \pi(\min\{e^{1/\tau}, \frac{N}{2^j}\})^2$ and observe that this is bounded by $O(N \tau \log(\frac{1}{\tau}))$. If $N$ is large enough the terms with $2^j < M_0$ are also $O(N)$. Since µ n → ∞, plugging this into the last line of (A.1) and possibly choosing an even larger $N$ so that N (N µτ +O(e 1 τ )) µ 2 τ < N 2 µτ completes the proof.
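For orientation, the averages $\frac{1}{N} \sum_{n \le N} \mu(n) F(T^n x)$ controlled by Theorem A.1 can be computed numerically for a simple zero-entropy system. The following sketch is an illustration only and proves nothing: it assumes that $T$ is the rotation $x \mapsto x + \alpha$ on the circle and that $F(y) = \cos(2\pi y)$, both choices being ours, and the function names are hypothetical.

```python
import math
from typing import List


def mobius_sieve(n_max: int) -> List[int]:
    """Return a list mu with mu[n] equal to the Mobius function of n for
    1 <= n <= n_max (mu[0] is set to 0 and is unused)."""
    mu = [1] * (n_max + 1)
    mu[0] = 0
    is_prime = [True] * (n_max + 1)
    for p in range(2, n_max + 1):
        if not is_prime[p]:
            continue
        # Each prime factor p flips the sign of mu on its multiples.
        for m in range(p, n_max + 1, p):
            mu[m] = -mu[m]
        for m in range(2 * p, n_max + 1, p):
            is_prime[m] = False
        # Multiples of p^2 are not square-free.
        for m in range(p * p, n_max + 1, p * p):
            mu[m] = 0
    return mu


def mobius_average(mu: List[int], alpha: float, x: float, N: int) -> float:
    """(1/N) * sum_{n <= N} mu(n) F(T^n x) for the rotation T by alpha and
    the mean-zero observable F(y) = cos(2 pi y)."""
    total = sum(mu[n] * math.cos(2 * math.pi * (x + n * alpha))
                for n in range(1, N + 1))
    return total / N


# Hypothetical parameters, for illustration only.
N = 10_000
mu_values = mobius_sieve(N)
average = mobius_average(mu_values, math.sqrt(2) - 1, 0.0, N)
```

For rotations the decay of these averages is a classical theorem of Davenport, so the example serves only to make the objects in the statement concrete.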

B. Disjointness of powers for generic 3-IETs
Theorem B.1. For almost every 3-IET $T$, we have that $T^n$ is disjoint from $T^m$ for all $0 < n < m$.
We prove this by the following straightforward disjointness criterion:
Proposition B.2. Let $T$ be an ergodic 3-IET, $R$ be an irrational rotation and $0 < n < m$ be natural numbers. Assume there exist $c > 0$, $r \in \mathbb{N}$, a sequence $k_1, \ldots$ and sets $F_i, G_i$ so that for all $i$
(1) lim
Then $T^n$ and $T^m$ are disjoint.
Sketch of proof. Let $\sigma$ be an ergodic joining of $T^n \times T^m$ that is a probability measure. Because $T$ is ergodic, it suffices to show that $\sigma$ is $\mathrm{id} \times T^{-1}$ invariant. Since ergodic probability measures are either mutually singular or equal, it suffices to show that $(\mathrm{id} \times R^{-1})_* \sigma$ is not singular with respect to $\sigma$. By our assumptions, for any $i$ we have $\sigma(F_i \times G_i) \ge c$. Similarly to Section 2, $\sigma$ is not singular with respect to $(\mathrm{id} \times R^{-1})_* \sigma$.
For any $\alpha$ let $\langle\langle q_j \alpha \rangle\rangle = (-1)^j \|q_j \alpha\|$, the signed distance of $R^{q_j} x$ from $x$, where $\|\cdot\|$ denotes the distance to the nearest integer. If $x \in [0, 1)$ there exist $b_1, \ldots$ so that $b_i \le a_i$, if $b_i = a_i$ then $b_{i+1} = 0$, and $x = \sum_{i=1}^{\infty} b_i \langle\langle q_{i-1} \alpha \rangle\rangle$. Notice that for any fixed $\alpha$ the set of $x$ whose Ostrowski expansion begins with an allowable $b_1, \ldots, b_k$ is an interval of length at least $\|q_{k+1} \alpha\|$.
Lemma B.3. Given a 3-IET, consider it as rotation by $\alpha$ induced on an interval $[0, x)$. Let $[a_1, \ldots]$ be the continued fraction of $\alpha$ and $(b_1, \ldots)$ be the $\alpha$-Ostrowski expansion of $x$. For $\lambda^2$-almost every $(\alpha, x)$ we have that for any ordered $k$-tuple of pairs $(c_1, d_1), \ldots, (c_k, d_k)$ of natural numbers with $d_i \le c_i - 1$ there are infinitely many $i$ with $((a_i, b_i), \ldots, (a_{i+k-1}, b_{i+k-1})) = ((c_1, d_1), \ldots, (c_k, d_k))$.
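Lemma B.3 relies on viewing a 3-IET as a rotation induced on an interval (Lemma 1.3). The sketch below checks one concrete normalization of this fact, which is our assumption and need not be the normalization used in the text: for lengths $(\ell_1, \ell_2, \ell_3)$ and the permutation reversing the three intervals, the 3-IET on $[0, \ell_1 + \ell_2 + \ell_3)$ agrees with the first return map to that interval of the rotation by $\ell_2 + \ell_3$ on a circle of length $\ell_1 + 2\ell_2 + \ell_3$. Rational lengths are used only so that every comparison is exact. The function names are ours.

```python
from fractions import Fraction


def three_iet(x: Fraction, l1: Fraction, l2: Fraction, l3: Fraction) -> Fraction:
    """3-IET on [0, l1 + l2 + l3) that reverses the order of the intervals
    I_1 = [0, l1), I_2 = [l1, l1 + l2), I_3 = [l1 + l2, l1 + l2 + l3)."""
    if x < l1:
        return x + l2 + l3
    if x < l1 + l2:
        return x + l3 - l1
    return x - l1 - l2


def induced_rotation(x: Fraction, l1: Fraction, l2: Fraction, l3: Fraction) -> Fraction:
    """First return to J = [0, l1 + l2 + l3) of the rotation by l2 + l3
    on the circle of length l1 + 2 * l2 + l3 (assumed normalization)."""
    circle = l1 + 2 * l2 + l3
    step = l2 + l3
    j_length = l1 + l2 + l3
    y = (x + step) % circle
    while y >= j_length:  # keep rotating until the orbit re-enters J
        y = (y + step) % circle
    return y


# Hypothetical lengths, for illustration only.
l1, l2, l3 = Fraction(2, 7), Fraction(3, 11), Fraction(5, 13)
total = l1 + l2 + l3
agree = all(
    three_iet(Fraction(k, 997) * total, l1, l2, l3)
    == induced_rotation(Fraction(k, 997) * total, l1, l2, l3)
    for k in range(997)
)
```

A point of $I_2$ needs two steps of the rotation to return, while points of $I_1$ and $I_3$ return after one, which is why the induced map has three branches.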
Proof. For almost every $\alpha$, any $(k+1)$-tuple of natural numbers occurs infinitely often in its continued fraction expansion, by the ergodicity of the Gauss map with respect to a fully supported finite invariant measure and the fact that having a fixed initial $(k+1)$-tuple $(c_1, \ldots, c_{k+1})$ is a set of positive measure. For any $\alpha$ with this property, the set of $x$ so that the pair $(\alpha, x)$ satisfies the conclusion of Lemma B.3 is a set of full measure, because the complement has no Lebesgue density points. Indeed, let $\alpha$ have $a_{j+i} = c_i$ for $i \le k+1$ and let $y \in [0, 1)$. Then the points of an interval of length at least $\|q_{j+k+1} \alpha\|$ in $B(y, \|q_j \alpha\|)$ have $d_1, \ldots, d_{k-1}$ as the $(j+1)$st through $(j+k)$th terms of their Ostrowski expansion. Since q j+k+1 α q j α > 3 −(j+1) c 1 · · · c k+1 we have the claim.
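The first step of this proof concerns blocks of partial quotients, which are read off by iterating the Gauss map $y \mapsto \frac{1}{y} - \lfloor \frac{1}{y} \rfloor$. The following sketch computes the partial quotients of a rational $\alpha$ exactly and counts the occurrences of a given block. It is an illustration under our own conventions: a rational number has only finitely many partial quotients, so it cannot exhibit the almost-sure statement, and for irrational $\alpha$ one would need arbitrary-precision arithmetic. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def continued_fraction_digits(alpha: Fraction, n_digits: int) -> List[int]:
    """Partial quotients a_1, a_2, ... of alpha in (0, 1), obtained by
    iterating the Gauss map; stops early if the orbit reaches 0."""
    digits: List[int] = []
    y = alpha
    for _ in range(n_digits):
        if y == 0:
            break
        inverse = 1 / y
        a = inverse.numerator // inverse.denominator  # floor(1 / y)
        digits.append(a)
        y = inverse - a  # Gauss map applied to y
    return digits


def count_block(digits: Sequence[int], block: Sequence[int]) -> int:
    """Number of positions i with digits[i : i + len(block)] equal to block."""
    k = len(block)
    return sum(1 for i in range(len(digits) - k + 1)
               if list(digits[i:i + k]) == list(block))


# Hypothetical input, for illustration only.
example_digits = continued_fraction_digits(Fraction(8, 13), 10)
occurrences = count_block(example_digits, (1, 1))
```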