Möbius disjointness for topological models of ergodic systems with discrete spectrum

We provide a criterion for a point to satisfy the required disjointness condition in Sarnak's Möbius Disjointness Conjecture. As a direct application, the conjecture holds for any topological model of an ergodic system with discrete spectrum.


Introduction
The Möbius function µ : N → {−1, 0, 1} is defined as follows: µ(n) = (−1)^k when n is the product of k distinct primes, and µ(n) = 0 otherwise. The well-known Möbius Randomness Law in [IK04, Section 13.1] speculates that summing the Möbius sequence µ(n) against any reasonable sequence ξ(n) leads to significant cancellations. This was verified in [GT12] for polynomial nilsequences, a class of sequences of low complexity.
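For readers who wish to experiment, the definition above can be sketched in a few lines of Python. This is only an illustration and not part of the argument; the helper names `mobius_sieve` and `mertens` are ours. The partial sums of µ (the Mertens function) give a first impression of the cancellation that the Möbius Randomness Law refers to.

```python
def mobius_sieve(limit):
    """Return a list mu with mu[n] = Möbius function of n for 1 <= n <= limit.

    mu[0] is unused and set to 0. Hypothetical helper, for illustration only.
    """
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            # Each prime factor p flips the sign once.
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            # Multiples of p^2 are not squarefree, so mu vanishes there.
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    return mu


def mertens(mu, N):
    """Partial sum mu(1) + ... + mu(N)."""
    return sum(mu[1:N + 1])
```

For instance, `mertens(mobius_sieve(10**5), 10**5)` is tiny compared with 10^5, in line with the expected cancellation.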
In [Sar09] Sarnak reformulated the law as the Möbius Disjointness Conjecture by making precise the notion of a "reasonable sequence", namely a bounded sequence arising from a topological dynamical system with zero topological entropy. Since then, the Möbius Disjointness Conjecture has become an important theme in dynamical systems and number theory.
Recall that a topological dynamical system (TDS for short) is a pair (X, T) consisting of a compact metric space X and a continuous self-map T : X → X. The distance on X will be denoted by d(·, ·). In this paper, E stands for a finite average, for instance,
\[ E_{n=1}^{N} a_n = \frac{1}{N} \sum_{n=1}^{N} a_n. \]
Conjecture 1.1. (Möbius Disjointness Conjecture, [Sar09]) Let (X, T) be a TDS with zero topological entropy. Then, for every x ∈ X,
\[ \lim_{N \to \infty} E_{n=1}^{N} f(T^n x)\mu(n) = 0, \quad \forall f \in C(X). \tag{1.1} \]
The case where (X, T) is a finite periodic system is equivalent to the Prime Number Theorem in arithmetic progressions, and the case where T is a rotation on the circle is Davenport's theorem [Dav37]. Many other special cases of Conjecture 1.1 have been established more recently. Here we just list a few of them: [
Note that the conjecture holds for any minimal rotation over a compact abelian metric group, and hence for every isomorphic extension of such a rotation (for details see [DK15, Theorem 4.1], [EAKL16, Proposition 5.2] or [Vee16]). Thus it is natural to ask whether the conjecture holds for any topological model of an ergodic system with discrete spectrum, as the minimal rotation over a compact abelian metric group is the "standard model" of an ergodic system with discrete spectrum by the well-known Halmos-von Neumann Representation Theorem (see for example [HvN42] or [Wal82, Theorem 3.6]). By a topological model of an ergodic system (X , B, ξ, S) we mean any uniquely ergodic TDS (X, T) (with the unique invariant Borel probability measure ν) such that (X, B_X, ν, T) and (X , B, ξ, S) are measure-theoretically isomorphic, where B_X is the Borel σ-algebra of X. Recall that a TDS is uniquely ergodic if it admits a unique invariant Borel probability measure. We say that a TDS (X, T) is minimal if X is the only nonempty closed invariant subset K (that is, T K ⊂ K) of the system.
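As a numerical illustration of the disjointness condition (1.1) in the circle rotation case, the following Python sketch evaluates the finite average for the rotation T(t) = t + α on R/Z, the function f(t) = e^{2πit} and the starting point x = 0, so that f(T^n x) = e^{2πinα}. The function names are ours and the experiment is only a sanity check under these assumptions; it proves nothing.

```python
import cmath
import math


def mobius_sieve(limit):
    """mu[n] = Möbius function of n for 1 <= n <= limit (mu[0] = 0, unused)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    return mu


def rotation_average(alpha, N, mu):
    """Absolute value of E_{n=1}^{N} e^{2 pi i n alpha} mu(n).

    Assumes len(mu) > N. Hypothetical helper, for illustration only.
    """
    total = sum(mu[n] * cmath.exp(2j * math.pi * alpha * n) for n in range(1, N + 1))
    return abs(total) / N
```

A possible experiment is to build `mu = mobius_sieve(20000)` and print `rotation_average(alpha, 20000, mu)` for a few values of `alpha`, rational and irrational, and compare the results with the trivial bound 1.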
Though an ergodic system with discrete spectrum may be the simplest kind of ergodic system with zero measure-theoretic entropy, the question does not seem easy, even if we additionally require the topological models to be minimal. The trivial example of an ergodic system with discrete spectrum is a finite periodic system, each of whose minimal topological models is again a finite periodic system, so the conjecture follows from the Prime Number Theorem in arithmetic progressions. In general, an ergodic system with discrete spectrum need not be a finite periodic system, and its minimal topological models may exhibit very complicated dynamical behavior, apart from its standard model, a minimal rotation over a compact abelian metric group. In fact, Lehrer [Leh87] showed that each non-periodic ergodic system admits a minimal topological model which is topologically strongly mixing.
In this paper we answer the question by proving the following result (without assuming that the topological model is a minimal TDS).
Theorem 1.2. Conjecture 1.1 holds for all topological models of an ergodic system with discrete spectrum.
We remark that the conjecture was solved for any topological model of an ergodic system with irrational discrete spectrum in the recent interesting work of El Abdalaoui, Lemańczyk and de la Rue [EALdlR16].¹ In [DG15], Downarowicz and Glasner asked whether the conjecture holds for all topological models of an ergodic system with rational discrete spectrum {−1, 1} (whose standard model is the two-point periodic system). Thus Theorem 1.2 answers affirmatively the question raised in [DG15]. After finishing a preprint version of the paper, we learned from Mariusz Lemańczyk that El Abdalaoui, Kułaga-Przymus, Lemańczyk and de la Rue had previously solved the conjecture for all topological models of a finite periodic system and had communicated it to others on several occasions. Theorem 1.2 follows directly from the following more general result.
Theorem 1.3. Suppose that a TDS (X, T) admits only countably many ergodic invariant Borel probability measures, and that each of these measures has discrete spectrum. Then Conjecture 1.1 holds for the system (X, T).
We shall prove Theorem 1.3 by showing the following stronger result.
Theorem 1.4. Let (X, T) be a TDS with x ∈ X and {N_i : i ∈ N} ⊂ N tending to ∞, such that the sequence E_{n=1}^{N_i} δ_{T^n(x)} converges to a Borel probability measure ρ (which is obviously invariant). Suppose that ρ is a convex combination of countably many ergodic invariant Borel probability measures, and that each of these ergodic measures has discrete spectrum. Then
\[ \lim_{i \to \infty} E_{n=1}^{N_i} f(T^n x)\mu(n) = 0, \quad \forall f \in C(X). \tag{1.2} \]
Our criterion, Theorem 1.4, provides a sufficient condition for a point to satisfy the disjointness condition (1.1), even though it is not hard to see that the point may produce positive topological entropy. Observe that, as shown in [DK15] and [EAKL16], the equation (1.1) may fail for some point from a class of Toeplitz sequences which may have positive topological entropy.
As shown in the next section, Theorem 1.4, the main technical result of the paper, is used to deduce Theorem 1.3 and to recover some recent results obtained by other mathematicians. We now briefly outline its proof.
Given the point x, the sequence {N_i} and the limit measure ρ, we first choose finitely many distinct ergodic measures ρ_1, …, ρ_J such that, up to an ε-error, ρ is a convex combination of ρ_1, …, ρ_J. Thanks to the Halmos-von Neumann Representation Theorem, one can approximate, in a measurable way, the dynamics on ρ_j by a rotation of a compact abelian metric group. Then, by Lusin's Theorem, one can fix a compact set A_j on which this approximation is continuous. As the ρ_j's are mutually singular, one can make the A_j's disjoint. Then for a typical 1 ≤ n ≤ N_i, T^n x is very close to one of these A_j's. By continuity of the map T, this allows us to approximate the finite orbit segment T^n x, …, T^{n+L−1} x by a finite orbit segment that stays inside A_j for most of the time. After further approximating measurable functions on the rotation by continuous ones on a finite dimensional torus, we may approximate the values of the function f observed along the finite orbit segment above by a linear combination of exponential sequences of the form {e^{2πiβ(l+n)}}_{l=0}^{L−1}. The value of β may depend on the starting position n of the segment. We then apply a recent theorem of Matomäki, Radziwiłł and Tao on short averages of non-pretentious multiplicative functions to assert that, for sufficiently large N_i and appropriately chosen L, such exponential sequences have significant cancellations against the Möbius sequence {µ(l+n)}_{l=0}^{L−1} for most 1 ≤ n ≤ N_i. Finally, one combines the bounds on these short segments to obtain the conclusion.
¹ In fact, in [EALdlR16] Conjecture 1.1 is solved for any topological model of an ergodic system with quasi-discrete spectrum. A transformation with quasi-discrete spectrum is by definition totally ergodic [EALdlR16, Definition 2], which holds if and only if all of its eigenvalues except 1 are irrational. In particular, any ergodic automorphism with irrational discrete spectrum has quasi-discrete spectrum.
Acknowledgements. Part of the work was carried out during a visit of Z. Wang to the School of Mathematical Sciences and Shanghai Key Laboratory for Contemporary Applied Mathematics of Fudan University. He gratefully acknowledges the hospitality of Fudan University.
We thank Ai-Hua Fan, Yunping Jiang, Peter Sarnak, Weixiao Shen and Xiangdong Ye for helpful and encouraging discussions.
We also thank Lemańczyk for bringing to our attention the work of [EAKL16] and [Vee16], and for informing us, upon the posting of a preprint version of this paper, of the earlier unpublished proof by El Abdalaoui, Kułaga-Przymus, Lemańczyk and de la Rue for topological models of a finite periodic system.

Consequences of Theorem 1.4
In this section, we present and prove some consequences of Theorem 1.4.
2.1. Direct consequences of Theorem 1.4. Firstly we can deduce Theorem 1.3 easily from Theorem 1.4 as follows.
Proof of Theorem 1.3. Let (X, T) be as in the statement of the theorem. Given x ∈ X and f ∈ C(X), it suffices to prove that any increasing sequence {Ñ_i} of positive integers has a subsequence {N_i} for which the convergence (1.2) holds. Indeed, one can always choose a subsequence {N_i} such that the sequence E_{n=1}^{N_i} δ_{T^n x} converges to a Borel probability measure ρ. By the assumption, ρ is a convex combination of countably many ergodic invariant Borel probability measures, and each of these ergodic measures has discrete spectrum. Thus (1.2) holds by Theorem 1.4.
Observe that if the compact metric state space of a TDS contains at most countably many points, then the system satisfies the assumption of Theorem 1.3, as in this case ergodic invariant Borel probability measures of the system are supported on disjoint periodic orbits. And so we recover the following recent result obtained by Wei [Wei16] as a direct corollary of Theorem 1.3.
Theorem 2.1. [Wei16, Theorem 5.16] Assume that X contains at most countably many points. Then Conjecture 1.1 holds for the system (X, T ).
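The simplest instance of this situation is a single periodic orbit, where the average in (1.1) reduces to a sum of µ over residue classes. The following Python sketch, with names of our own choosing, evaluates E_{n=1}^{N} f(T^n x_0) µ(n) for the rotation T(x) = x + 1 on Z/qZ and a function f given by its q values; it is meant only as an illustration of the finite periodic case.

```python
def mobius_sieve(limit):
    """mu[n] = Möbius function of n for 1 <= n <= limit (mu[0] = 0, unused)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    return mu


def periodic_average(q, x0, f_values, N, mu):
    """E_{n=1}^{N} f(T^n x0) mu(n) for T(x) = x + 1 mod q.

    f_values is a list of length q holding f(0), ..., f(q - 1).
    Hypothetical helper, for illustration only; assumes len(mu) > N.
    """
    return sum(f_values[(x0 + n) % q] * mu[n] for n in range(1, N + 1)) / N
```

Taking f to be the indicator function of a single residue class recovers the averaged sum of µ over an arithmetic progression, which is the quantity controlled by the Prime Number Theorem in arithmetic progressions.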
Recently, in [FJ15] Fan and Jiang related Möbius disjointness to the notion of stability in the mean in the sense of Lyapunov, or simply mean-L-stability, which was introduced in [Fom51] by Fomin when studying ergodic systems with discrete spectrum and further discussed in [Oxt52, Aus59, LTY15, DG15, FJ15].
In the following we will deduce from Theorem 1.4 the following Proposition 2.2, which allows us to recover the Möbius function case [FJ15, Corollary 1] of Fan and Jiang's recent work [FJ15]. As implied by a characterization due to Downarowicz-Glasner [DG15, Theorem 2.1] and Li-Tu-Ye [LTY15, Theorem 3.8], each minimal mean-L-stable TDS is an isomorphic extension of a minimal rotation over a compact abelian metric group, and then any minimal mean-L-stable TDS (and hence the system considered in Proposition 2.2) has zero topological entropy. See [DG15] for the detailed definition of an isomorphic extension.
Recall that a TDS (X, T) is mean-L-stable if for any ε > 0 there is δ > 0 such that d(x_1, x_2) < δ implies d(T^n x_1, T^n x_2) < ε for all n ∈ N except a set of upper density less than ε. A minimal set is a nonempty closed invariant set K ⊂ X such that the subsystem (K, T) is minimal. Now let ν be an invariant Borel probability measure of (X, T) and K ∈ B_X. We say that ν is supported on K if ν(K) = 1.
Proposition 2.2. Let (X, T) be a TDS with x ∈ X and {N_i : i ∈ N} ⊂ N tending to ∞, such that the sequence E_{n=1}^{N_i} δ_{T^n(x)} converges to a Borel probability measure ρ (which is obviously invariant). Suppose that ρ is supported on countably many minimal mean-L-stable subsystems (X_j, T). Then
\[ \lim_{i \to \infty} E_{n=1}^{N_i} f(T^n x)\mu(n) = 0, \quad \forall f \in C(X). \]
Proof. Each TDS (X_j, T) is uniquely ergodic by [Oxt52, (6.4)] (see also [LTY15, Corollary 3.4]), and we let ν_j be the corresponding unique invariant Borel probability measure, which is ergodic. In particular, ρ is a convex combination of these countably many ν_j. By the characterization of a minimal mean-L-stable TDS in [DG15, LTY15], each (X_j, T) is an isomorphic extension of a minimal rotation over a compact abelian metric group, and hence ν_j has discrete spectrum. Then the conclusion follows from Theorem 1.4.
As a byproduct of Proposition 2.2 we have the following result. Note that each mean-L-stable TDS has zero topological entropy (see Remark 2.5).
Proof. Let (X, T) be a mean-L-stable TDS and x ∈ X. Let X* be the closure of the orbit {T^n x : n ∈ N}. Then the TDS (X*, T) is uniquely ergodic (again by [Oxt52, (6.4)] or [LTY15, Corollary 3.4]), and we let ν_x be the corresponding unique invariant Borel probability measure. In particular, the sequence E_{n=1}^{N} δ_{T^n(x)} converges to ν_x, which is necessarily supported on the unique minimal subsystem of (X*, T), which is also mean-L-stable. The conclusion then follows from Proposition 2.2.
Following [FJ15], we say that a TDS (X, T) is minimally mean attractable if for each x ∈ X there exists a minimal subset M_x of (X, T) such that x is mean attracted to M_x, that is, for any ε > 0 there exists z ∈ M_x such that
\[ \limsup_{N \to \infty} E_{n=1}^{N} d(T^n x, T^n z) < \varepsilon; \]
and that (X, T) is minimally mean-L-stable if every minimal subsystem is mean-L-stable.
We remark that the second part of Proposition 2.4 is in fact [FJ15, Corollary 1]. We give here an alternative proof of it based on Proposition 2.2.
Proposition 2.4. Let (X, T) be a TDS.
• If (X, T) is mean-L-stable, then it is both minimally mean-L-stable and minimally mean attractable.
• If (X, T) is both minimally mean-L-stable and minimally mean attractable, then it satisfies Conjecture 1.1.
Proof. First assume that (X, T) is mean-L-stable. It is obviously minimally mean-L-stable. Let x ∈ X, and let X* be the closure of {T^n x : n ∈ N}. Then the TDS (X*, T) is uniquely ergodic and hence contains a unique minimal subset X_x. Let ε > 0. By [LTY15, Lemma 3.1] we can select δ > 0 d(T^n x, T^n x*) < ε, and so (X, T) is minimally mean attractable.
Now assume that (X, T) is both minimally mean-L-stable and minimally mean attractable, and let x ∈ X. Fix any f ∈ C(X) and ε > 0. Take δ > 0 such that d(x_1, x_2) < δ implies |f(x_1) − f(x_2)| < ε/2, and choose M > 1 to be a finite upper bound for |f|. By the assumption, x is mean attracted to a minimal subset M_x of (X, T), and so there is z ∈ M_x with lim sup (2.1) By the choice of δ, it is easy to obtain lim sup_{N→∞} E_{n=1}^{N} |f(T^n x) − f(T^n z)| < ε from (2.1). Again by the assumption, (M_x, T) is mean-L-stable, and so it admits a unique invariant Borel probability measure ν_x. In fact, the sequence E_{n=1}^{N} δ_{T^n(z)} converges to ν_x, and hence lim sup By the arbitrariness of ε and f we obtain that the sequence E_{n=1}^{N} δ_{T^n(x)} converges to the measure ν_x. Now applying Proposition 2.2 to the point x we obtain the conclusion.
Remark 2.5. By the proof of Proposition 2.4, if a TDS (X, T ) is both minimally mean-L-stable and minimally mean attractable, then each ergodic invariant Borel probability measure is supported on a minimal mean-L-stable subsystem and hence has zero measure-theoretic entropy by [DG15, LTY15], thus the system (X, T ) has zero topological entropy. In particular, by Proposition 2.4, mean-L-stability implies zero topological entropy.

Proof of Theorem 1.4
From now on, we fix the point x ∈ X, the sequence {N_i} and the measure ρ as in Theorem 1.4. By the assumption, we can write ρ as a countable average Σ_i w_i ρ_i, where Σ_i w_i = 1 with each w_i > 0, the ρ_i's are distinct ergodic invariant Borel probability measures of (X, T), and each of these ergodic measures has discrete spectrum.
We shall also fix any f ∈ C(X) and ε ∈ (0, 1/30). Without loss of generality, we assume |f| ≤ 1. It suffices to prove (3.1)
3.1. Approximation by functions on tori. Clearly there are finitely many of the ρ_i's, say ρ_1, …, ρ_J, such that
Proposition 3.1. For some integer d ≥ 1, there exist
• a continuous function h ∈ C(X) with |h| < 2 such that ∫_X |f − h| dρ < 7ε; (3.3)
• mutually disjoint compact subsets A_1, …, A_J ⊂ X with each (3.4)
• for each 1 ≤ j ≤ J, a vector α_j ∈ T^d and a continuous map p_j : A_j → T^d such that for each nonnegative integer l and any point x* ∈ X, if x* and T^l x* are both contained in A_j, then p_j(T^l x*) = R^l_{α_j}(p_j(x*)); (3.5)
with integer M_j ≥ 1, coefficients a_{j,1}, …, a_{j,M_j} ∈ C and frequencies ξ_{j,1}, …, ξ_{j,
Proof. By the Halmos-von Neumann Representation Theorem, any ergodic system with discrete spectrum is measurably isomorphic to a minimal rotation over a compact abelian metric group (with the normalized Haar measure). It is standard that a compact abelian metric group is topologically isomorphic to a closed subgroup of T^N (see for example [Mor77, Chapter 5, Corollary 1]). Hence for each j = 1, …, J, there exist α̃_j ∈ T^N and an R_{α̃_j}-invariant ergodic Borel probability measure θ_j on T^N, such that (X, B_X, ρ_j, T) is measurably isomorphic to (T^N, B_{T^N}, θ_j, R_{α̃_j}), where R_{α̃_j}(ω) = ω + α̃_j for each ω ∈ T^N. We write φ_j : (X, B_X, ρ_j, T) → (T^N, B_{T^N}, θ_j, R_{α̃_j}) for the measurable isomorphism. Then there exists a subset Y_j of full ρ_j-measure such that φ_j : Y_j → φ_j(Y_j) is invertible and measure-preserving and (3.8) Since these countably many ρ_i's are mutually singular, we may assume without loss of generality that Y_1, …, Y_J are mutually disjoint, and additionally that ρ_i(Y_j) = 0 for each 1 ≤ j ≤ J and all i ≠ j (no matter whether i ∈ {1, …, J} or not).
For each 1 ≤ j ≤ J, by Lusin's Theorem there exists a compact subset

and then by Tietze Extension Theorem there exists a function
where π_d : T^N → T^d denotes the projection of T^N onto the first d coordinates.
As the Fourier basis {e(ξ · z) : ξ ∈ Z^d} generates a dense subspace of C(T^d), one may assume without loss of generality that each h'_j has the form
\[ h'_j(z) = \sum_{m=1}^{M_j} a_{j,m}\, e(\xi_{j,m} \cdot z) \]
for some integer M_j ∈ N, coefficients a_{j,1}, …, a_{j,M_j} ∈ C and frequencies ξ_{j,1}, …, ξ_{j,M_j} ∈ Z^d.
As these compact subsets A_j ⊂ Y_j are mutually disjoint, by the Tietze Extension Theorem we can find a function h ∈ C(X) with |h| < 2 and h = h_j over each A_j. Note that by (3.9), |f − h| = |f − h_j| < ε over each A_j. Furthermore, (3.10) For each 1 ≤ j ≤ J, we take p_j = π_d ∘ φ_j and α_j = π_d(α̃_j). Note that ∫_X |f − h| dρ < ε + 2ε · 3 (using (3.10)) = 7ε, which implies the inequality (3.3). Moreover, (3.5) and (3.7) follow respectively from (3.8) and the construction. This finishes the proof.
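It may help to record why Proposition 3.1 reduces the problem to exponential sequences. The following short computation is a sketch under two assumptions: that h'_j is a trigonometric polynomial with coefficients a_{j,m} and frequencies ξ_{j,m} as in the proof above, and that the rotation acts additively, R_{α_j}(z) = z + α_j. The symbols c_{j,m} and β_{j,m} are our own notation, introduced only for this illustration.

```latex
\begin{align*}
h'_j\bigl(R_{\alpha_j}^{l} z\bigr)
  &= \sum_{m=1}^{M_j} a_{j,m}\, e\bigl(\xi_{j,m} \cdot (z + l\alpha_j)\bigr) \\
  &= \sum_{m=1}^{M_j} c_{j,m}(z)\, e(l \beta_{j,m}),
  \qquad c_{j,m}(z) := a_{j,m}\, e(\xi_{j,m} \cdot z),
  \quad \beta_{j,m} := \xi_{j,m} \cdot \alpha_j \bmod 1 .
\end{align*}
```

In particular |c_{j,m}(z)| = |a_{j,m}| for every z, so along an orbit segment that stays in A_j the values of h are a linear combination of at most M_j exponential sequences l ↦ e(lβ_{j,m}), with coefficients bounded independently of the starting point z = p_j(x*).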
Recalling the assumption that the sequence E_{n=1}^{N_i} δ_{T^n(x)} converges to the measure ρ, one has that the sequence Therefore, to prove (3.1), it suffices to estimate
3.2. Decomposition into short orbit segments. Now let C be a large constant, to be specified later, and select a large integer L with
\[ C\, \frac{\log \log L}{\log L} < \varepsilon. \tag{3.12} \]
Further define for each 1 ≤ j ≤ J a subset B_j ⊂ A_j by
\[ B_j = \bigl\{ y \in A_j : \#\{0 \le l \le L-1 : T^l y \in A_j\} \ge (1-\varepsilon)L \bigr\}, \tag{3.13} \]
which is clearly a compact subset. Note that, by (3.4), (3.14) Finally, let η > 0 be small enough such that |h(T^l y) − h(T^l y')| < ε whenever d(y, y') < η and 0 ≤ l ≤ L − 1. (3.15) For every n ∈ N, choose once and for all a point x_n ∈ ⋃_{j=1}^{J} B_j such that d(T^n x, x_n) = min {d(T^n x, y) : y ∈ ⋃_{j=1}^{J} B_j}.
Lemma 3.2.
Proof. Applying the Tietze Extension Theorem we can find a continuous function φ : X → [0, 1] that equals 0 on ⋃_{j=1}^{J} B_j, and equals 1 outside the open η-neighborhood of ⋃_{j=1}^{J} B_j. Then clearly
Since the sequence E_{n=1}^{N_i} δ_{T^n(x)} converges to the measure ρ, which is supported on ⋃_{j=1}^{J} B_j except for a portion strictly smaller than 3ε by (3.14), one has that the sequence E_{n=1}^{N_i} φ(T^n x) converges to ∫_X φ dρ < 3ε by the construction of the function φ. The conclusion then follows from (3.16).
In the sequel, for each i ∈ N denote by E_i the set of n ∈ {1, …, N_i} with d(T^n x, x_n) < η. Lemma 3.2 asserts that #(E_i) > (1 − 3ε)N_i if i is large enough.
Proof. Once i is large enough, L/N_i < ε/2 and Lemma 3.2 holds. As one has that, because of |h| ≤ 2, Moreover, by the construction (3.15) of η one has that |h(T^{l+n} x) − h(T^l x_n)| < ε whenever n ∈ E_i and 0 ≤ l ≤ L − 1, and then Observing that the density of the exceptional set E_i^c in {1, …, N_i} is strictly smaller than 3ε by Lemma 3.2, we have
Proposition 3.4. There are constants C_0, κ_0 such that, for all N ≥ L ≥ 10,
The proposition is a direct application of [MRT15, Theorem 1.7] to the Möbius function µ, following the discussion about the non-pretentiousness of µ preceding the theorem in that paper.
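The passage from a long average to an average of short-segment averages, used in this subsection, rests on an elementary boundary estimate. The Python sketch below checks numerically that for a bounded sequence a_1, a_2, … the difference between E_{n=1}^{N} a_n and E_{n=1}^{N} E_{l=0}^{L−1} a_{n+l} is at most (L − 1) sup|a| / N when L ≤ N. This explicit constant comes from our own computation and is not quoted from the text; the function names are hypothetical.

```python
def plain_average(a, N):
    """E_{n=1}^{N} a_n, where a is a list indexed from 1 (a[0] is ignored)."""
    return sum(a[1:N + 1]) / N


def segment_average(a, N, L):
    """E_{n=1}^{N} E_{l=0}^{L-1} a_{n+l}; requires len(a) >= N + L."""
    total = 0.0
    for n in range(1, N + 1):
        total += sum(a[n + l] for l in range(L)) / L
    return total / N


def boundary_bound(N, L, sup_norm):
    """Upper bound (L - 1) * sup|a| / N for the difference of the two averages.

    Valid for L <= N: only indices below L or above N are weighted differently.
    """
    return (L - 1) * sup_norm / N
```

With a_n = h(T^n x) µ(n) and |h| ≤ 2, the bound is smaller than ε as soon as L/N_i < ε/2.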
We now define the value of the constant C to be where the coefficients a_{j,m} are defined as in Proposition 3.1.
Proof. For every n ∈ N, the point x_n belongs to ⋃_{j=1}^{J} B_j and thus lies in B_{j_n} for some 1 ≤ j_n ≤ J. Let p_{j_n} and h'_{j_n} be defined as in Proposition 3.1. Then by the construction (3.13) of the compact subset B_{j_n} ⊂ A_{j_n}, the set F_n = {0 ≤ l ≤ L − 1 : T^l x_n ∈ A_{j_n}} has cardinality #(F_n) ≥ (1 − ε)L.
Thus for each l ∈ F_n, x_n ∈ A_{j_n} and T^l x_n ∈ A_{j_n}, and then one has h(T^l x_n) = h'_{j_n}(p_{j_n}(T^l x_n)) = h'_{j_n}(R^l_{α_{j_n}}(p_{j_n}(x_n))) by the properties (3.5) and (3.7). From this, we first deduce that, for every n,
\[ \Bigl| E_{l=0}^{L-1} \bigl[ h(T^l x_n) - h'_{j_n}\bigl(R^l_{\alpha_{j_n}} p_{j_n}(x_n)\bigr) \bigr] \mu(l+n) \Bigr| \le \frac{1}{L} \cdot \varepsilon L \cdot \bigl(|h| + |h'_{j_n}|\bigr) < 4\varepsilon. \tag{3.19} \]
On the other hand, given (3.6), for every 0 ≤ l ≤ L − 1, we can write h'_{j_n}(R^l_{α_{j_n}}(p_{j_n}(x_n))) = once i is large enough. Since the second term is bounded by ε by (3.12), one has that
\[ E_{n=1}^{N_i} \Bigl| E_{l=0}^{L-1} h'_{j_n}\bigl(R^l_{\alpha_{j_n}} p_{j_n}(x_n)\bigr) \mu(l+n) \Bigr| < 2\varepsilon \tag{3.20} \]
as long as C(log N_i)^{−κ_0} < ε. Thus we obtain the conclusion by adding together (3.19) and (3.20).
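The cancellation of short exponential sums against µ that drives this step can be explored numerically. The Python sketch below computes E_{n=1}^{N} |E_{l=0}^{L−1} e^{2πiβ(l+n)} µ(l+n)| for a single fixed frequency β. This simplifies the situation in the proof, where the frequency may depend on n, and the experiment is not a substitute for [MRT15, Theorem 1.7]; the function names are ours.

```python
import cmath
import math


def mobius_sieve(limit):
    """mu[n] = Möbius function of n for 1 <= n <= limit (mu[0] = 0, unused)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    return mu


def short_segment_average(beta, N, L, mu):
    """E_{n=1}^{N} | E_{l=0}^{L-1} e(beta (l + n)) mu(l + n) | for one fixed beta.

    Requires len(mu) >= N + L. Hypothetical helper, for illustration only.
    """
    total = 0.0
    for n in range(1, N + 1):
        inner = sum(
            mu[n + l] * cmath.exp(2j * math.pi * beta * (n + l)) for l in range(L)
        )
        total += abs(inner) / L
    return total / N
```

For L = 1 the quantity is just the density of squarefree integers up to N, so no cancellation is visible; one may then increase L and N to observe how the average decreases.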
Proof of Theorem 1.4. Now we are ready to complete the proof of (3.1) (and hence the proof of Theorem 1.4), by adding together the estimates from the inequality (3.11), Corollary 3.3 and Proposition 3.5.