Automatic sequences as good weights for ergodic theorems

We study correlation estimates of automatic sequences (that is, sequences computable by finite automata) with polynomial phases. As a consequence, we provide a new class of good weights for classical and polynomial ergodic theorems, not coming themselves from dynamical systems. We show that automatic sequences are good weights in $L^2$ for polynomial averages and totally ergodic systems. For totally balanced automatic sequences (i.e., sequences converging to zero in mean along arithmetic progressions) the pointwise weighted ergodic theorem in $L^1$ holds. Moreover, invertible automatic sequences are good weights for the pointwise polynomial ergodic theorem in $L^r$, $r>1$.

1. Introduction. The study of weighted ergodic averages goes back to Wiener and Wintner [51], who showed that sequences of the form $(\lambda^n)$ for $\lambda$ in the unit circle $\mathbb{T}$ form a set of good weights for the pointwise ergodic theorem in $L^1$, with the set of convergence being independent of $\lambda$. Recall that a sequence $(a_n) \subset \mathbb{C}$ is called a good weight for the pointwise ergodic theorem in $L^p$ if for every measure-preserving system $(X, \mu, T)$ and every $f \in L^p(\mu)$ the weighted averages $\frac{1}{N}\sum_{n=1}^{N} a_n T^n f$ converge a.e. Note that some authors call the above averages "modulated" instead of "weighted", see, e.g., Berend, Lin, Rosenblatt, Tempelman [10].
Since then quite a few classes of good weights have been discovered. The celebrated return time theorem of Bourgain [12] states that for an ergodic measure-preserving system $(X, \mu, T)$, every $f \in L^\infty(\mu)$ and a.e. $x \in X$, the sequence $(f(T^n x))$ is a good weight in $L^\infty$ (and hence in $L^1$ by the maximal inequality). Lesigne [36,37] has extended the linear weights $(\lambda^n)$ in the Wiener-Wintner result to polynomial ones $(\lambda^{p(n)})$, see also Frantzikinakis [29] for the study of uniform convergence, and the general case of so-called nilsequences was treated in Host, Kra [32] with the uniform convergence version in Eisner, Zorin-Kranich [26]. Thus, for good systems (in this case nilsystems), Bourgain's return times result holds everywhere instead of almost everywhere.
So far, only a limited number of different examples of good weights are known. Such examples are the von Mangoldt and the Möbius function treated in Wierdl [52] and El Abdalaoui, Kułaga-Przymus, Lemańczyk, de la Rue [27], respectively, q-multiplicative sequences as in Lesigne, Mauduit, Mossé [39,38], and Hardy field weights studied in Eisner, Krause [25], see also Krause, Zorin-Kranich [34] for a random version. Note that the argument in [27] also shows that the Möbius function times a nilsequence is a good weight.
The purpose of this paper is to give a new class of examples of good weights for the pointwise ergodic theorem and its polynomial version, namely automatic sequences.
Automatic sequences are simply the sequences computable by finite automata (see Section 2 for a precise definition). They are of considerable interest in computer science, as they give rise to one of the weakest notions of computability. For extensive background, see [2].
Computations of exponential sums involving automatic sequences are a standard tool, often used to solve number theoretic problems. Given how ubiquitous the many variants of the circle method are in modern number theory, this comes as no surprise. For a good source of background and discussion, see [46].
Before we move on with the discussion, let us note that in the simplest instance, one may consider the average value $\mathbb{E}_{n<N} a(n)$ (for notation, see the end of this section), where $a(n)$ is an automatic sequence. Unfortunately, these averages do not converge in general as $N \to \infty$, although the logarithmic averages $\frac{1}{\log N}\sum_{n<N} a(n)/(n+1)$ do. Some of the technical complications in this paper can be traced back to this kind of behaviour.
Perhaps the simplest non-trivial result in this vein is due to Gelfond [30], who showed that for the Thue-Morse sequence $(t(n))$ it holds that $\mathbb{E}_{n<N} t(n)e(n\alpha) \ll N^{-c}$ uniformly in $\alpha$, and gave the optimal value of $c$. Here, $t$ is given by $t(n) = (-1)^{s_2(n)}$, where $s_2(n)$ denotes the sum of binary digits of $n$. For similar results involving the Rudin-Shapiro sequence, see [44] and references therein. The Rudin-Shapiro sequence is given by $r(n) = -1$ if the number of times the pattern 11 appears in the binary expansion of $n$ is odd, and $r(n) = +1$ if the said number is even.
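For illustration only, the two sequences and the normalised exponential sums $|\mathbb{E}_{n<N} a(n)e(n\alpha)|$ can be computed directly. The following is a minimal Python sketch; the function names are hypothetical, and it assumes that occurrences of the pattern 11 are counted with overlaps (the usual convention for the Rudin-Shapiro sequence).

```python
import cmath


def thue_morse(n: int) -> int:
    """t(n) = (-1)^{s_2(n)}, where s_2(n) is the sum of binary digits of n."""
    return -1 if bin(n).count("1") % 2 else 1


def rudin_shapiro(n: int) -> int:
    """r(n) = -1 if the pattern 11 occurs an odd number of times in binary n.

    Assumption: overlapping occurrences are counted. Bit i of n & (n >> 1)
    is set exactly when bits i and i+1 of n are both set.
    """
    return -1 if bin(n & (n >> 1)).count("1") % 2 else 1


def correlation(a, alpha: float, N: int) -> float:
    """|E_{n<N} a(n) e(n*alpha)| with e(x) = exp(2*pi*i*x)."""
    total = sum(a(n) * cmath.exp(2j * cmath.pi * n * alpha) for n in range(N))
    return abs(total) / N
```

For instance, `correlation(thue_morse, 1/3, 2**12)` is approximately $3^6/2^{12} \approx 0.178$, since for $N = 2^L$ the sum factors as $\prod_{j<L}(1 - e(2^j\alpha))$ and each factor has modulus $\sqrt{3}$ at $\alpha = 1/3$.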
In [32] it is shown that the Thue-Morse sequence is a good sequence of weights for mean convergence of multiple ergodic averages. In [33], Konieczny obtained bounds on the Gowers norms of the Thue-Morse and Rudin-Shapiro sequences, which imply that these sequences do not correlate with any polynomial phases.
For the purposes of this paper, we will study correlations of fairly general classes of automatic sequences with linear phases (Prop. 5.1, 8.1) and polynomial phases (Cor. 5.2, Prop. 9.3). Similar results for Kloosterman sums are obtained in [17]. Related work for infinite automata can be found in [43].
For almost everywhere convergence, we need to impose some mild conditions on the automatic sequence. Let us say that a (bounded) sequence $a : \mathbb{N}_0 \to \mathbb{C}$ is balanced if $\mathbb{E}_{n<N} a(n) \to 0$ as $N \to \infty$, and totally balanced if $\mathbb{E}_{n<N} a(qn+r) \to 0$ as $N \to \infty$ for any $q \in \mathbb{N}$, $r \in \mathbb{N}_0$. Equivalently, $a(n)$ is totally balanced if it does not correlate with any periodic sequence $b(n)$: $\mathbb{E}_{n<N} a(n)b(n) \to 0$ as $N \to \infty$.
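As a numerical sanity check of this definition (not a proof), one can compute the means $\mathbb{E}_{n<N} a(qn+r)$ exactly for the Thue-Morse sequence. The sketch below uses hypothetical helper names and exact rational arithmetic.

```python
from fractions import Fraction


def thue_morse(n: int) -> int:
    """t(n) = (-1)^{s_2(n)}."""
    return -1 if bin(n).count("1") % 2 else 1


def progression_mean(a, q: int, r: int, N: int) -> Fraction:
    """E_{n<N} a(q*n + r), computed exactly as a rational number."""
    return Fraction(sum(a(q * n + r) for n in range(N)), N)


# Means along n -> 3n + 1 for growing N; they tend to 0, but only slowly.
example_means = [float(progression_mean(thue_morse, 3, 1, 4**j)) for j in range(2, 8)]
```

The decay is only polynomial: for example, `progression_mean(thue_morse, 3, 1, 1365)` equals $-243/1365 \approx -0.178$, which reflects the fact that the Thue-Morse sequence correlates comparatively strongly with the phase $e(n/3)$.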
Theorem B. Let $a : \mathbb{N} \to \mathbb{C}$ be a totally balanced automatic sequence. Then, for any ergodic measure-preserving system $(X, \mu, T)$ and any $f \in L^1(\mu)$ we have $\mathbb{E}_{n<N} a(n) T^n f(x) \to 0$ for a.e. $x \in X$ as $N \to \infty$.
Remark. Note that since automatic sequences are bounded, it suffices to show a.e. convergence for $L^2$-functions in both theorems by the classical maximal inequality.
For polynomial averages, by Bourgain's maximal inequality for polynomials [11], a.e. convergence in $L^2$ implies a.e. convergence in $L^p$ for every $p > 1$. Note that even unweighted monomial averages diverge in $L^1$ (and that monomials have even a much stronger property of being universally bad in $L^1$) by Buczolich, Mauldin [13] and LaVictoire [35].
Theorem B holds (with a natural modification of the limit) also when $a(n)$ is a sum of a totally balanced sequence and a periodic sequence. Unfortunately, not every $k$-automatic sequence admits a decomposition into a periodic and a totally balanced part, as is seen from the example of the sequence $a(n) = (-1)^{\nu_2(n)}$ (where $2^{\nu_2(n)}$ is the largest power of 2 dividing $n$). However, invertible automatic sequences (see Section 9 for details) admit such a decomposition. In fact, for invertible sequences we obtain a considerably stronger conclusion.
Theorem C. Let $a : \mathbb{N}_0 \to \mathbb{C}$ be an invertible automatic sequence and let $p \in \mathbb{Z}[x]$ be a polynomial with $p(\mathbb{N}_0) \subset \mathbb{N}_0$. Then, for any ergodic measure-preserving system $(X, \mu, T)$ and any $f \in L^r(\mu)$, $r > 1$, the averages $\mathbb{E}_{n<N} a(n) T^{p(n)} f(x)$ converge a.e. as $N \to \infty$. If $p$ is linear then the convergence holds for any $f \in L^1(\mu)$.
Notation. We denote $\mathbb{N} = \{1, 2, \dots\}$ and $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. The symbol $\mathbb{E}$ is borrowed from probability theory: $\mathbb{E}_{x \in A} f(x) = \frac{1}{|A|}\sum_{x \in A} f(x)$ for a finite set $A$. We write $[N] = \{0, 1, \dots, N-1\}$ and $e(\theta) = e^{2\pi i\theta}$. We use standard asymptotic notation: $X = O(Y)$ or $X \ll Y$ if there exists an absolute constant $c$ such that $|X| < cY$. If $X$ and $Y$ depend on a parameter $n$ then $X = o(Y)$ as $n \to \infty$ if $Y > 0$ for sufficiently large $n$ and $X/Y \to 0$ as $n \to \infty$.

2. Definitions.
Automatic sequences. A sequence $(a(n))_{n \ge 0}$ taking values in a finite set $\Delta$ is $k$-automatic if $a(n)$ can be computed by a finite device, given the expansion of $n$ base $k$ on input. We now make this more precise. For the canonical introduction to the theory of automatic sequences, we refer to [2].
Let $k \ge 2$ be an integer. We will denote by $\Sigma_k = \{0, 1, \dots, k-1\}$ the set of digits base $k$, and by $\Sigma_k^* = \bigcup_{l \ge 0} \Sigma_k^l$ the set of words over $\Sigma_k$, including the empty word $\epsilon$. With the operation of concatenation, $\Sigma_k^*$ is a monoid. If $w = (w_i)_{i=0}^{l-1} \in \Sigma_k^*$, then by $[w]_k \in \mathbb{N}_0$ we denote the corresponding integer $\sum_{i=0}^{l-1} w_i k^i$, and for $n \in \mathbb{N}_0$ by $(n)_k \in \Sigma_k^*$ we denote the expansion of $n$ base $k$ with no leading 0's. (In particular, $(0)_k = \epsilon$.) Similarly, for $n \in \mathbb{N}_0$ and $t \in \mathbb{N}_0$ by $(n)_k^t \in \Sigma_k^t$ we denote the terminal $t$ digits of $n$ (padded with leading 0's if necessary).
A finite $k$-automaton with output $A$ (which we will subsequently just call automaton) consists of the following data:
i. a finite set of "states" $S$,
ii. a distinguished "initial" state $s_0 \in S$,
iii. a "transition" function $\delta : S \times \Sigma_k \to S$,
iv. an "output" function $\tau : S \to \Delta$ (where $\Delta$ is some finite set).
For instance, the Thue-Morse sequence, given by $t(n) = s_2(n) \bmod 2$ where $s_2(n)$ denotes the sum of binary digits of $n$, can be computed by the automaton with $S = \{s_0, s_1\}$, $\delta(s_i, 0) = s_i$, $\delta(s_i, 1) = s_{1-i}$ and $\tau(s_i) = i$ for $i \in \{0, 1\}$. We will also occasionally be interested in automata without output, or without a distinguished initial state; in this case we will refer to them as partial automata (it will always be clear from the context which data is present).
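The data (i)-(iv) translate directly into a small simulator. The following is a minimal sketch with hypothetical names (`Automaton`, `output`); it assumes that the digits of $n$ are read starting from the least significant one, which is the reading suggested by the later description of kernels as a change of initial state. For the Thue-Morse automaton the reading order is immaterial.

```python
from dataclasses import dataclass


@dataclass
class Automaton:
    k: int          # base
    initial: str    # initial state s_0
    delta: dict     # transition function: (state, digit) -> state
    tau: dict       # output function: state -> output value

    def digits(self, n: int) -> list:
        """Base-k digits of n, least significant first, no leading zeros."""
        out = []
        while n > 0:
            n, d = divmod(n, self.k)
            out.append(d)
        return out

    def output(self, n: int):
        """Feed the digits of n to the automaton and return tau of the final state."""
        s = self.initial
        for d in self.digits(n):
            s = self.delta[(s, d)]
        return self.tau[s]


# Thue-Morse automaton: delta(s_i, 0) = s_i, delta(s_i, 1) = s_{1-i}, tau(s_i) = i.
thue_morse_automaton = Automaton(
    k=2,
    initial="s0",
    delta={("s0", 0): "s0", ("s0", 1): "s1", ("s1", 0): "s1", ("s1", 1): "s0"},
    tau={"s0": 0, "s1": 1},
)
```

With these definitions, `thue_morse_automaton.output(n)` agrees with $s_2(n) \bmod 2$.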
The class of k-automatic sequences is closed under arithmetic operations and restriction to arithmetic progressions for any k ≥ 2. That is, if a(n) and b(n) are k-automatic sequences taking values in C, then a(n) + b(n), a(n) · b(n) are k-automatic; and if a(n) is any k-automatic sequence and q ∈ N, r ∈ N 0 then a(qn + r) is k-automatic.
A (partial) $k$-automaton $A = (S, \delta)$ without output and initial state is strongly connected if there exists a path between any pair of vertices, i.e. for each $s, s' \in S$ there exists $v \in \Sigma_k^*$ with $\delta(s, v) = s'$. A strongly connected component of $A$ is a set $S' \subset S$ of states such that for any states $s, s' \in S'$ there exists $v \in \Sigma_k^*$ such that $\delta(s, v) = s'$, i.e. $S'$ is strongly connected as a directed multigraph.
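Strong connectivity of a partial automaton can be tested by a plain graph search. The sketch below uses hypothetical helper names and represents $\delta$ as a dictionary; it simply checks that every state reaches every other state.

```python
def reachable(delta: dict, k: int, start) -> set:
    """States s' such that delta(start, v) = s' for some word v (possibly empty)."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for d in range(k):
            t = delta[(s, d)]
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def is_strongly_connected(states, delta: dict, k: int) -> bool:
    """True if every state can be reached from every state."""
    return all(reachable(delta, k, s) == set(states) for s in states)


# Thue-Morse transitions: strongly connected.
tm_delta = {("s0", 0): "s0", ("s0", 1): "s1", ("s1", 0): "s1", ("s1", 1): "s0"}
# A hypothetical automaton with an absorbing state "q": not strongly connected.
sink_delta = {("p", 0): "p", ("p", 1): "q", ("q", 0): "q", ("q", 1): "q"}
```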
We will usually treat the base $k \ge 2$ as fixed, although occasionally it will be convenient to replace it by a power $k^t$. If a $k$-automatic sequence $a(n)$ is given, then this is essentially the only freedom we have in the choice of $k$. Indeed, we have the following result.
Theorem 2.1 (Cobham). Let $k, l \in \mathbb{N}_{\ge 2}$, and let $a(n)$ be a $k$-automatic sequence. Then $a(n)$ is $l$-automatic if and only if either $a(n)$ is ultimately periodic or $\log l/\log k \in \mathbb{Q}$.
Here, a sequence is ultimately periodic if it agrees with a periodic sequence away from a finite set.
A slight technical difficulty stems from the fact that elements of $\Sigma_k^*$ may well have leading 0's. Luckily, whenever an automatic sequence $a : \mathbb{N}_0 \to \Delta$ is given, it is always possible to find an automaton $A$ which produces $a(n)$ so that the corresponding sequence $a : \Sigma_k^* \to \Delta$ has the property that $a(w) = a([w]_k)$ for all $w \in \Sigma_k^*$, i.e. $\tau(\delta(s, 0)) = \tau(s)$ for all $s$. In this case, we will say that $A$ ignores leading 0's. We will assume that all our automatic sequences $a : \mathbb{N}_0 \to \Delta$ are produced by automata which ignore leading 0's.
For any sequence $a : \Sigma_k^* \to \Delta$, we define the $k$-kernel of $a$ to be the set $N_k(a)$ of sequences of the form $b(u) = a(uv)$ where $v \in \Sigma_k^*$. Accordingly, for any sequence $a : \mathbb{N}_0 \to \Delta$, we define the $k$-kernel $N_k(a)$ of $a$ to be the set of sequences $b(n) = a(k^l n + m)$ where $l \in \mathbb{N}_0$ and $0 \le m < k^l$. Note that these definitions are consistent with the way that we identify sequences $\Sigma_k^* \to \Delta$ and $\mathbb{N}_0 \to \Delta$. The relevance of $k$-kernels to the study of automatic sequences stems from the following well-known characterisation.
Then $a$ is $k$-automatic if and only if $N_k(a)$ is finite. The analogous statement holds for sequences $\mathbb{N}_0 \to \Delta$.
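The kernel criterion can be explored numerically by comparing finite prefixes of the sequences $n \mapsto a(k^l n + m)$. This is only a heuristic sketch with hypothetical names: agreement of prefixes does not prove equality of sequences, and only finitely many pairs $(l, m)$ are inspected.

```python
from math import isqrt


def kernel_prefixes(a, k: int, max_l: int, length: int) -> set:
    """Distinct length-`length` prefixes of n -> a(k^l n + m), 0 <= l <= max_l, 0 <= m < k^l."""
    seen = set()
    for l in range(max_l + 1):
        for m in range(k**l):
            seen.add(tuple(a(k**l * n + m) for n in range(length)))
    return seen


def thue_morse(n: int) -> int:
    return -1 if bin(n).count("1") % 2 else 1


def is_square(n: int) -> int:
    """Indicator of perfect squares, used here as a contrasting example."""
    return 1 if isqrt(n) ** 2 == n else 0


tm_kernel = kernel_prefixes(thue_morse, 2, 4, 64)
square_kernel = kernel_prefixes(is_square, 2, 4, 64)
```

For the Thue-Morse sequence one has $t(2^l n + m) = t(n)t(m)$, so only the two prefixes of $t$ and $-t$ appear, whereas the number of prefixes found for the indicator of squares keeps growing with `max_l`.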
We will also use the complementary notion of "co-kernel". For $a : \Sigma_k^* \to \Delta$, the co-kernel $N_k'(a)$ consists of the sequences $b'(u) = a(vu)$ where $v \in \Sigma_k^*$. This notion does not have a satisfactory analogue for sequences $\mathbb{N}_0 \to \Delta$. Let is finite.
Suppose that the $k$-automatic sequence $a : \Sigma_k^* \to \Omega$ is produced by the automaton $A = (S, s_0, \delta, \tau)$. Then, the sequences in $N_k(a)$ are obtained by changing the initial state, and the sequences in $N_k'(a)$ are obtained by changing the output function. More precisely, if $b \in N_k(a)$ and $b' \in N_k'(a)$ are given by $b(u) = a(uv)$ and $b'(u) = a(vu)$, then $b$ is produced by the automaton $(S, \delta(s_0, v), \delta, \tau)$, and $b'$ is produced by $(S, s_0, \delta, \tau')$ where $\tau'(s) = \tau(\delta(s, v))$.
In particular, given a "partial automaton" consisting of a set of states $S$ and a transition function $\delta$, as well as a (not necessarily finite) target set $\Omega$, if we let $M$ denote the family of sequences $a : \Sigma_k^* \to \Omega$ produced by all possible automata $(S, s_0, \delta, \tau)$ where $s_0 \in S$ and $\tau : S \to \Omega$, then $M$ is closed under the operation of taking kernels and co-kernels.
Measure-preserving systems. By a measure-preserving system we mean a triple $(X, \mu, T)$, where $(X, \mu)$ is a probability space and $T : X \to X$ is a $\mu$-preserving transformation. For every $p \ge 1$ one calls the corresponding map $T : L^p(\mu) \to L^p(\mu)$ defined by $(Tf)(x) := f(Tx)$ the Koopman operator; the Koopman operator is a linear isometry.
A measure-preserving system $(X, \mu, T)$ is called ergodic if for measurable sets $A \subset X$, $T^{-1}A = A$ implies $\mu(A) \in \{0, 1\}$. For the Koopman operator $T$ this means that the space of $T$-invariant functions $\mathrm{Fix}(T)$ consists of constant functions only, i.e., $\dim \mathrm{Fix}(T) = 1$. Moreover, $(X, \mu, T)$ is called totally ergodic if $T^n$ is ergodic for every $n \in \mathbb{N}$. The equivalent spectral condition is that the Koopman operator $T$ does not have any rational eigenvalue (that is, a root of unity) on the unit circle other than 1 and $\dim \mathrm{Fix}(T) = 1$.
For the basic theory of measure-preserving transformations we refer to any book on ergodic theory, e.g., to Walters [50], Petersen [47] or [24].

3. Outline. In this section, we outline the main argument, and explain how the proofs of Theorems A, B and C can be reduced to Fourier analysis. We also discuss some examples, showing where our methods are (or are not) applicable.
When it comes to L 2 -convergence, the reduction is rather straightforward. Indeed, if (X, µ, T ) is a measure-preserving system and f ∈ L 2 (µ), then by the Spectral Theorem the space spanned by T n f for n ∈ N 0 can be identified with a subspace of L 2 (S, ν f ) for some measure ν f on the complex unit circle S = {z ∈ C | |z| = 1} through a map induced by T n f → z n . (1) Then for any totally ergodic measure-preserving system (X, µ, T ), and f ∈ L 2 (µ) Proof. This is a standard application of the Spectral Theorem.
Hence, Theorem A will follow as soon as we can prove that (1) holds for any balanced automatic sequence a : N 0 → C and polynomial sequence p : N 0 → N 0 . This is carried out in Section 5.
For pointwise convergence, more precise estimates are needed. Note also that for this part we restrict our attention to the case when p is a polynomial.
Then, for every measure-preserving system (X, µ, T ) and every f ∈ L r (µ), r > 1, there exists a set X ⊂ X with µ(X ) = 1, such that for any x ∈ X . If p is linear, then (4) holds for every f ∈ L 1 (µ).
Proof. For a proof, see Corollary 2 in [28]. This also follows by an adaptation of the proof of Proposition 3.1 in [27].
Note that the restriction $r > 1$ in Proposition 3.2 is due to the fact that the pointwise ergodic theorem (and hence also the maximal inequality which is crucial for the proof) for nonlinear polynomials fails in $L^1$ in general, see Buczolich, Mauldin [13] and LaVictoire [35].
As before, it follows that in order to prove Theorems B and C, it will suffice to verify that condition (3) holds for the sequence a and the appropriate class of polynomial sequences. This is carried out in Sections 8 and 9, respectively.
As an example, we consider the Thue-Morse sequence, given by $t(n) = (-1)^{s_2(n)}$, where $s_2(n)$ denotes the sum of digits of $n$ base 2. We use log to denote logarithm base 2. The following lemma with a superior value of $c = 1 - \log 3/\log 4$ can be found in [30], but we present the following argument as a source of motivation.
, and note that the identity

It follows by induction that
4 . Now, for arbitrary N , note that [N ] can be decomposed into a disjoint union of intervals I j taking the form I j = [m j 2 lj , (m j + 1)2 lj ) where m i , l i ∈ N 0 and l 1 > l 2 > l 3 > . . . . Such decomposition can be constructed greedily, taking in each step the largest possible value of l i . Then, As a consequence, the conclusion of Theorem C (hence also A and B) holds for the Thue-Morse sequence. (The Thue-Morse sequence is invertible and totally balanced, see Sec. 9.) It is natural to ask about the degree to which our results can be extended. Theorem A deals with arbitrary automatic sequences. The assumption that X f dµ = 0 cannot be relaxed, since automatic sequences need not be Cesàro convergent. (Note, however, that for Cesàro convergent sequences this condition is irrelevant since we can replace f with f − X f dµ.) For similar reasons, the assumption of total ergodicity cannot be dropped (pick f with T q f = f and pick p such that q|p(n) for all n).
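The greedy decomposition of $[N]$ into intervals $I_j = [m_j 2^{l_j}, (m_j+1)2^{l_j})$ with $l_1 > l_2 > \dots$, used in the Thue-Morse argument above, amounts to reading off the binary expansion of $N$. A minimal sketch, with a hypothetical function name:

```python
def dyadic_decomposition(N: int) -> list:
    """Split [N] = {0, ..., N-1} into intervals [m * 2^l, (m + 1) * 2^l) with decreasing l.

    Greedy choice: at each step take the largest power of 2 that still fits.
    Returns a list of half-open intervals (start, end).
    """
    intervals = []
    start = 0
    for l in reversed(range(N.bit_length())):
        if (N >> l) & 1:
            # `start` is a sum of larger powers of 2, hence a multiple of 2^l.
            intervals.append((start, start + (1 << l)))
            start += 1 << l
    return intervals
```

For example, `dyadic_decomposition(13)` returns `[(0, 8), (8, 12), (12, 13)]`, matching $13 = 8 + 4 + 1$; the number of intervals is at most $\log N + 1$.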
Theorem B cannot be extended to all automatic sequences for similar reasons, as shown in the following proposition. Here, f, g is shorthand for X f (x)ḡ(x)dµ(x). for n ≥ 1, so that a(n) = 1 if the length of n base 2 is odd, and a(n) = 0 otherwise. Then a is 2-automatic and Cesàro divergent. Moreover, for every measurepreserving system (X, µ, T ) and every f ∈ L 1 (µ), the following assertions are equivalent: converge to 0 almost everywhere.
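The Cesàro divergence of this sequence is easy to observe numerically. The sketch below (hypothetical names; it sets $a(0) = 0$, which is an assumption, since the sequence is only specified for $n \ge 1$) computes the averages $\mathbb{E}_{n<N} a(n)$ exactly.

```python
from fractions import Fraction


def a(n: int) -> int:
    """1 if the binary expansion of n has odd length, 0 otherwise (a(0) = 0 by convention)."""
    return n.bit_length() % 2


def cesaro_mean(N: int) -> Fraction:
    """E_{n<N} a(n), computed exactly."""
    return Fraction(sum(a(n) for n in range(N)), N)
```

Along $N = 2^L$ the averages equal $(2^L - 1)/(3 \cdot 2^L)$ for even $L$ and $(2^{L+1} - 1)/(3 \cdot 2^L)$ for odd $L$, so they oscillate between values close to $1/3$ and $2/3$.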
Proof. It is straightforward to construct a 2-automaton which produces a(n); it is enough to use two states as outlined below (the initial state is s 0 and the output at Let (X, µ, T ) be a system and let f ∈ L 1 (µ) satisfy (i). It follows from the von Neumann decomposition that we may decompose and the classical maximal inequality, the set of functions for which the averages (6) converge a.e. is closed in L 1 (µ). So we can assume that f = h − T h for some h ∈ L ∞ (µ). Moreover, we can assume without loss of generality that h ∞ ≤ 1. Since a(n) is constant on the interval blocks between any two consecutive powers of 2 we obtain Conversely, assume that (ii) holds and that f, g = 0 for some g ∈ Fix(T ) ∩ L ∞ (µ). Then the averages In Theorem C we impose a relatively strong condition of invertibility. To the best of our knowledge, the analogous result might hold for a wider class of sequences; in particular it is possible that the same statement holds for totally balanced sequences. However, our proof does not extend to such sequences because of the use of van der Corput lemma. See Section 9 for further discussion.

4. Preliminaries.
In this section, we discuss some basic lemmas which will be useful in subsequent sections. The main new insight is that if α ∈ R and β n ∈ R takes finitely many values then the average En<N e(nα + β n ) cannot be close to 1 for large N , unless α is very well approximable by rationals. where The following fact is well-known, for instance it is a very special case of the Quantitative Kronecker Theorem in [31].
There exists a constant C such that for any α ∈ R, N ∈ N and δ with 0 < δ < 1/2, one of the following holds: i. the sequence (nα mod 1) n<N is δ-equidistributed in R/Z; or ii. there exists p q ∈ Q with 0 < q < 1/δ C such that α − p q < 1/(δ C N ).
The following elementary fact will be useful on several occasions.

TANJA EISNER AND JAKUB KONIECZNY
Another result which we will extensively use is the classical van der Corput inequality. One of its many formulations is the following (see e.g. [49,Lemma 1.4.3]).
Lemma 4.5 (van der Corput inequality). Let x : N 0 → C be a sequence with |x(n)| ≤ 1 for all n. Then for any H, N ∈ N with H < N we have All implicit constants are absolute (i.e. do not depend on x, N and H).
As an immediate consequence, we note that (with the above notation) to prove that $\mathbb{E}_{n<N} x(n) \to 0$ as $N \to \infty$, it suffices to prove that for all $h$ except for a set of density 0 we have $\mathbb{E}_{n<N} x(n+h)\overline{x(n)} \to 0$ as $N \to \infty$.

5. Mean convergence. In this section we finish the proof of Theorem A. The main technical tool used for this purpose is the following proposition. We could derive this result directly from earlier work by Mauduit [42, Thm. 1], but we present an independent argument which motivates the approach we take in subsequent sections when proving Theorems B and C. For yet another approach, see Remark 5.3.
Proof. For an automatic sequence a : Σ * k → C, denote Our first goal is to prove that A(a) = 0 for any choice of a. Observe that Fixing the value of l and sending L to infinity, we conclude that Using Corollary 4.4 with r = |N k (a)|, and letting l be sufficiently large that condition (ii) in Corollary 4.4 does not hold, we obtain where c = 1/ 6 N k (a) 2 > 0. Letting α vary and repeating the same argument for all b ∈ N k (a), we conclude that which is only possible when A(a) = max b∈N k (a) A(b) = 0. In particular, we conclude that (11) holds for N restricted to powers of k.
We now proceed to prove (11) for arbitrary N . Assume that a(u) ignores leading 0's, and identify it with a sequence N 0 → C. For any N , take L = 9 10 log k N and Corollary 5.2. Let a : N 0 → C be a k-automatic sequence, and let p ∈ R[x] be a polynomial with at least one irrational coefficient other than the constant term. Then lim Proof. Splitting [N ] into a union of arithmetic progressions, we may assume that the leading coefficient of p is irrational. We proceed by induction on deg p. The case when deg p = 1 follows easily from Proposition 5.1. If deg p ≥ 2, then using the van der Corput Lemma, it will suffice to verify that for each h ∈ N, Proof of Theorem A. Immediate from Corollary 5.2 and Proposition 3.1.
6. Growth rate of partial sums. In this section we prove a lemma describing possible growth of partial sums of automatic sequences. Questions of this type have been extensively studied and our estimate is rather standard, but we provide a detailed proof for the convenience of the reader.
To provide context, let us introduce the notion of a $k$-regular sequence (first put forward in [1]). A sequence $a : \mathbb{N}_0 \to \mathbb{Z}$ is said to be $k$-regular if $N_k(a)$ is a finitely generated $\mathbb{Z}$-module; the same definition makes sense with other domains in place of $\mathbb{Z}$. Any $k$-automatic sequence is automatically $k$-regular, and it can be shown that conversely a finitely valued $k$-regular sequence is $k$-automatic [2, Theorem 16.1.5]. Moreover, if $a(n)$ is a $k$-regular sequence, then the sequence of partial sums $(\Sigma a)(n) = \sum_{m<n} a(m)$ is again $k$-regular. In particular, partial sums of automatic sequences are regular.
In [7], Bell, Coons and Hare showed that if $a(n)$ is an unbounded regular sequence then $|a(n)| \gg \log n$ for infinitely many $n$. A more precise description was obtained by the same authors in [8]: $\limsup_{n \to \infty} \log|a(n)| / \log\log n \in \mathbb{N} \cup \{\infty\}$. Related results are also obtained in [19,20].
Proof. Fix the choice of $a$, and assume it is generated by an automaton which ignores leading 0's. Our first goal is to show that there exists $c > 0$ such that for all $b \in N_k(a)$ we have the bound Because $N_k(a)$ is finite, it suffices to prove the claim for a single $b \in N_k(a)$.
Using automaticity of b(u), we obtain a linear recurrence s b (L + 1) = d∈N k (a) α b,d s d (L) for some coefficients α b,d ∈ R ≥0 . Since a recursive sequence tending to 0 tends to 0 at an exponential rate, we obtain (14).
We have thus proved the desired bound for N = k L . For general N , pick L = Proof. Immediate application of Proposition 6.1 to a(n)b(n).

7. Aperiodicity. Even though a totally balanced sequence $a(n)$ is guaranteed to have mean 0 along any arithmetic subsequence, in the sense that $\mathbb{E}_{n<N} a(qn+r) \to 0$ as $N \to \infty$, it is by no means guaranteed that the various sequences $a(qn+r)$ ($0 \le r < q$) obtained by restricting $a(n)$ to arithmetic subsequences are in any way related. For instance, the sequence $a(n) = t(n)(1 + (-1)^n)/2$ is clearly 2-automatic, and it is not hard to verify that it is totally balanced. It is also easy to notice that $a(2n) = t(n)$ while $a(2n+1) = 0$. A much closer relation exists between the sequences $t(qn+r)$ for fixed $q, r$ with $0 \le r < q$. Hence, in proving Theorem B for the sequence $a(n)$ it will be more convenient to work instead with $a(2n)$ and $a(2n+1)$ independently. The goal of this section is to obtain a similar decomposition for a general totally balanced automatic sequence.
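The two identities for this example can be confirmed by a direct computation. The sketch below takes the factor multiplying $t(n)$ to be the indicator of even $n$, that is $(1 + (-1)^n)/2$, which is the choice for which $a(2n) = t(n)$ and $a(2n+1) = 0$ hold; the function names are hypothetical.

```python
def t(n: int) -> int:
    """Thue-Morse sequence t(n) = (-1)^{s_2(n)}."""
    return -1 if bin(n).count("1") % 2 else 1


def a(n: int) -> int:
    """a(n) = t(n) * (1 + (-1)^n) / 2, i.e. t(n) on even n and 0 on odd n."""
    return t(n) * (1 + (-1) ** n) // 2


# Restrictions to the two residue classes modulo 2.
even_part = [a(2 * n) for n in range(16)]      # coincides with t(0), ..., t(15)
odd_part = [a(2 * n + 1) for n in range(16)]   # identically zero
```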
For a k-automaton A = (S, s 0 , δ, τ ) we may consider the frequencies which may or may not exist. More generally, for q ∈ N, r ∈ N 0 with 0 ≤ r < q, let π(s, s ; r(q)) = lim If s = s 0 , we simply write π(s ) or π(s , r(q)). Note that these quantities depend only on (S, s 0 , δ), which we will call a k-automaton without output. For lack of a better phrase, we shall say that an automaton without output A = (S, s 0 , δ) is strongly aperiodic if for each q ∈ N, for each r ∈ N 0 with r < q and s ∈ S, the frequencies π(s, r(q)) exist and are equal to π(s). (To motivate this piece of nomenclature, note, in particular, that a non-constant automatic sequence produced by a strongly aperiodic automaton does not become periodic even after restricting to an arithmetic progression; hence strong aperiodicity can be seen as a far reaching strengthening of the property of not being periodic. This property can also be viewed as an analogue of the notion of aperiodicity for graphs.) We will always assume that all states s ∈ S are reachable from the initial state s 0 . Under this assumption, if A is aperiodic then it is also strongly connected. Note that for a strongly connected automaton, changing the initial state does not alter strong aperiodicity; we will say that a strongly connected automaton (S, δ) without a distinguished initial state is strongly aperiodic if (S, s 0 , δ) is aperiodic for some (all) s 0 ∈ S.
It is relevant to the study of aperiodic behaviour of an automaton $A$ to know what the possible values associated with the cycles are. Let $A = (S, \delta)$ be an automaton without output nor initial state. For $s \in S$, we will consider the sets $D_{A,s} = \{[u]_k - [v]_k : u, v \in \Sigma_k^*,\ |u| = |v|,\ \delta(s, u) = \delta(s, v) = s\}$ and put $d_{A,s} = \gcd(D_{A,s})$.
Lemma 7.1. Fix $k \ge 2$. Let $A = (S, \delta)$ be a strongly connected $k$-automaton without output and initial state. Then $d_{A,s}$ does not depend on $s$ and is coprime to $k$.
We will denote the common value of d A,s by d A .
Proof. To verify that $k$ is coprime to $d_{A,s}$, take two words $u, v \in \Sigma_k^*$ such that $u$ ends with 0, $v$ ends with 1, and $\delta(s, u) = \delta(s, v) = s$. Such $u, v$ exist because $A$ is strongly connected. Replacing $u, v$ with $u^{|v|}$ and $v^{|u|}$, we may assume that $|u| = |v|$. Hence, $[u]_k - [v]_k \in D_{A,s}$ is coprime to $k$, and so $d_{A,s}$ is coprime to $k$.
To verify that $d_{A,s}$ is independent of $s$, let us first fix some $s, s' \in S$. Pick $x, y \in \Sigma_k^*$ such that $\delta(s, x) = s'$ and $\delta(s', y) = s$. Then, for any $u, v$ as in definition of $D_{A,s}$ we have that Hence, there exists $m = m(s, s')$ such that $d_{A,s'} \mid k^m d_{A,s}$. Since $d_{A,s'}$ is coprime to $k$, $d_{A,s'} \mid d_{A,s}$. By symmetry, $d_{A,s} = d_{A,s'}$.
Proposition 7.2. Fix $k \ge 2$. Let $A = (S, \delta)$ be a strongly connected automaton (without output and initial state). Suppose that $d_A = 1$ and that there exists a state $s \in S$ such that $\delta(s, 0) = s$. Then $A$ is strongly aperiodic.
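To get a feeling for the quantity $d_A$, one can enumerate short cycles. The sketch below assumes the reading of $D_{A,s}$ suggested by the proofs, namely differences $[u]_k - [v]_k$ over words of equal length with $\delta(s,u) = \delta(s,v) = s$; it also assumes that words are read from index 0 and that $[w]_k = \sum_i w_i k^i$. The names are hypothetical. Since only words up to a fixed length are inspected, the result is a multiple of the true $d_{A,s}$, and is conclusive only when it equals 1.

```python
from itertools import product
from math import gcd


def step(delta: dict, s, word) -> object:
    """Extend delta to words: apply the digits of `word` in order."""
    for d in word:
        s = delta[(s, d)]
    return s


def value(word, k: int) -> int:
    """[w]_k = sum_i w_i k^i."""
    return sum(d * k**i for i, d in enumerate(word))


def cycle_gcd(delta: dict, k: int, s, max_len: int) -> int:
    """gcd of [u]_k - [v]_k over equal-length cycles u, v at s of length <= max_len."""
    g = 0
    for length in range(1, max_len + 1):
        loops = [value(w, k) for w in product(range(k), repeat=length)
                 if step(delta, s, w) == s]
        for x in loops:
            # gcd of all pairwise differences equals gcd of differences to one element
            g = gcd(g, x - loops[0])
    return g


tm_delta = {("s0", 0): "s0", ("s0", 1): "s1", ("s1", 0): "s1", ("s1", 1): "s0"}
```

For the Thue-Morse automaton, cycles of length at most 2 only give the bound 3, while cycles of length 3 already show $d_A = 1$; together with the loop $\delta(s_0, 0) = s_0$, this places the automaton within the scope of Proposition 7.2.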
Proof. We wish to show that π (s, s ; r(q)) = π(s ) for any r, q with 0 ≤ r < q and any s, s ∈ S, and both quantities exist.
Note that if the claim holds for a given value of q, then it also holds for any q which divides q. Hence, we may assume without loss of generality that q = q 0 q 1 where q 0 = k m and q 1 = k l − 1 for some l, m ∈ N. Note further that for any r with 0 ≤ r < q there exist some w ∈ Σ m k and r 1 with 0 ≤ r 1 < q 1 such that for all s, s ∈ S we have π (s, s ; r(q)) = π (δ(s, w), s ; r 1 (q 1 )) (in the sense that if one of the quantities is defined then so is the other, and if so they are equal). Hence, we may without loss of generality assume that q 0 = 1, whence q = k l − 1. Fix the value of q from now on.
We will consider the random walk W on the vertex set S × Z/qZ, where at each step we select u ∈ Σ l k randomly and pass from (s, n) to (δ(s, u), n + [u] k ). Note that a sequence u = u t−1 u t−2 . . . u 0 ∈ Σ l k t Σ tl k gives rise to a path from (s, n) to (s , n + r) if and only if δ(s, u) = s and [u] k ≡ r mod q.
In order to apply the Perron-Frobenius theorem to W, we need to verify that it is aperiodic (in the sense that the greatest common divisor of all cycles is 1) and strongly connected (in the sense that there exists a path from any vertex (s, r) to any other vertex (s , r )). The former condition is clear because for any s ∈ S such that δ(s, 0) = s, we have a loop in W at (s, 0) corresponding to taking u = 0.
We now proceed to prove strong connectedness. Note that a path from a vertex (s, r) to a vertex (s , r ) exists if and only if there exists a path from the vertex (s, 0) to (s , r − r). Hence, for a fixed choice of s ∈ S, the set I s of r ∈ Z/qZ such that there exists a path from (s, 0) to (s, r) is a subgroup of Z/qZ.
We will show that actually I s = Z/qZ. Pick any n ∈ D A,s and let u, v ∈ Σ * k be two words with δ(s, u) = δ(s, v) = s and |u| = |v| with n = [u] k − [v] k . Let w 0 = u l and w 1 = u l−1 v. By construction, there exists a path from (s, 0) to (s, [w 0 ] k ) and (s, It follows that 1 = d A ∈ I s and I s = Z/qZ, as claimed. To prove that W is strongly connected, it will now suffice to verify that for each s, s ∈ S, there exists a path from (s, 0) to (s, r ) for some choice of r . This is equivalent to the statement that for any s, s ∈ S, there exists u ∈ Σ * k with l | |u| and δ(s, u) = s . If either δ(s, 0) = s or δ(s , 0) = s , then it is enough to pick any v ∈ Σ * k with δ(s, v) = s and set u = |v| 0 (l−1)|v| (or u = 0 (l−1)|v| |v|). Otherwise, pick s with δ(s , 0) = s and note that there exists a path from (s, 0) to (s , r ) and from (s , r ) to (s , r ) for some r , r ∈ Z/qZ.
It now follows from Perron-Frobenius that there exist unique limiting probabilities for the random walk W and they are independent of the starting point. Hence, the limit π (s, s ; r(q)) = lim K→∞ P (W goes from (s, q − r) to (s , 0) in K steps) exists for any r with 0 ≤ r < q and s, s ∈ S, and does not depend on r and s. Denote the common value of π (s, s , r(q)) by π (s ).
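The convergence of the random walk on $S \times \mathbb{Z}/q\mathbb{Z}$ can be observed numerically. The sketch below (hypothetical names; words are read from index 0 with $[u]_k = \sum_i u_i k^i$, which is an assumption about conventions) iterates the step distribution for the Thue-Morse transitions with $k = 2$, $l = 2$, $q = k^l - 1 = 3$.

```python
from itertools import product


def walk_distribution(delta, states, k, l, steps, start):
    """Distribution after `steps` steps of the walk (s, n) -> (delta(s, u), n + [u]_k mod q),
    where u is drawn uniformly from words of length l and q = k^l - 1."""
    q = k**l - 1
    dist = {(s, r): 0.0 for s in states for r in range(q)}
    dist[start] = 1.0
    words = list(product(range(k), repeat=l))
    for _ in range(steps):
        new = dict.fromkeys(dist, 0.0)
        for (s, r), p in dist.items():
            for w in words:
                t = s
                for d in w:
                    t = delta[(t, d)]
                val = sum(d * k**i for i, d in enumerate(w))
                new[(t, (r + val) % q)] += p / len(words)
        dist = new
    return dist


tm_delta = {("s0", 0): "s0", ("s0", 1): "s1", ("s1", 0): "s1", ("s1", 1): "s0"}
limit = walk_distribution(tm_delta, ["s0", "s1"], 2, 2, 200, ("s0", 0))
```

In this example every vertex receives limiting probability $1/6$ regardless of the starting vertex, in line with the independence of the limit from $r$ and $s$.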
For any j with 0 ≤ j < l we may similarly compute that It follows that for any s, s and r, the limit π(s, s , r(q)) exists and equals π (s ). In particular, π(s) exists and π(s) = π (s) = π(s, s , r(q)), and thus A is strongly aperiodic.
Let a(n) be a k-automatic sequence which is produced by a k-automaton A = (S, s 0 , δ, τ ) which is strongly connected and ignores leading 0's.
Then there exist $k'$ which is a power of $k$ and $q \in \mathbb{N}$ such that for any $r$ with $0 \le r < q$, the sequence $a_r(n) = a(qn + r)$ is produced by some $k'$-automaton $A_r$ which ignores leading 0's and has the property that any of its terminal components is strongly aperiodic. Moreover, $k'$ and $q$ depend only on $(S, \delta)$.
Proof. We begin with a reduction to a case when A has some additional favourable properties.
Let $k' = k^l$ be a power of $k$. Then $a(n)$ is a $k'$-automatic sequence, and it is produced by the automaton $A' = (S', s_0', \delta', \tau')$ constructed as follows. The set of states $S'$ will be a subset of $S$, defined below, and $s_0' = s_0$, $\tau' = \tau|_{S'}$. The transition function is defined for $u \in \Sigma_k^l$ by $\delta'(s, [u]_k) = \delta(s, u)$. Finally, $S'$ is the set of states $s \in S$ reachable by $\delta'$ from $s_0$, or equivalently the set of states which are reachable in $A$ from $s_0$ by a path of length divisible by $l$. Note that under these definitions, for any $n \in \mathbb{N}$, $\delta'(s_0, (n)_{k'}) = \delta(s_0, 0^j (n)_k)$ (with natural identifications, where $j$ is chosen so that the length of $0^j (n)_k$ is divisible by $l$), whence the condition of ignoring the leading 0's ensures that $A$ and $A'$ produce the same sequence. Moreover, $A'$ ignores the leading 0's and is strongly connected. Thus, we may freely replace $k$ with $k' = k^l$ and $A$ with $A'$. We will perform this replacement several times; to avoid obfuscating the notation we will reuse the same symbols $k$ and $A$.
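The passage from base $k$ to base $k^l$ can be sketched as follows. The helper names are hypothetical, digits are read starting from the least significant one (an assumption about conventions), and for brevity the restriction to the reachable set of states is omitted.

```python
from itertools import product


def run(k: int, delta: dict, tau: dict, s0, n: int):
    """Output of the automaton on n, reading base-k digits least significant first."""
    s = s0
    while n > 0:
        n, d = divmod(n, k)
        s = delta[(s, d)]
    return tau[s]


def power_base(k: int, l: int, delta: dict, states) -> dict:
    """Transition function in base k^l: a digit [u]_k acts like the word u of length l."""
    new_delta = {}
    for s in states:
        for u in product(range(k), repeat=l):   # u = (u_0, ..., u_{l-1}), u_0 least significant
            t = s
            for d in u:
                t = delta[(t, d)]
            new_delta[(s, sum(d * k**i for i, d in enumerate(u)))] = t
    return new_delta


# Thue-Morse automaton in base 2 and its base-4 counterpart.
delta2 = {("s0", 0): "s0", ("s0", 1): "s1", ("s1", 0): "s1", ("s1", 1): "s0"}
tau = {"s0": 0, "s1": 1}
delta4 = power_base(2, 2, delta2, ["s0", "s1"])
```

Because the Thue-Morse automaton ignores leading 0's, padding the binary expansion to even length does not change the output, so `run(2, delta2, tau, "s0", n)` and `run(4, delta4, tau, "s0", n)` agree for every `n`.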
Note that the set of states S may become smaller as we change the base k. Replacing k by its power, we may assume that S has stabilised, i.e. that subsequent replacements will not decrease the set of states further. We may also assume that the action of 0 is idempotent, in the sense that δ(s, 00) = δ(s, 0) for any s (this property is preserved under the change of base). Let s 1 = δ(s 0 , 0) ∈ S (change of base will not affect s 1 ). Because any two paths from s 1 to s 1 can be padded by an arbitrary number of 0's, the set D A,s1 does not change when the base is changed, and any subsequent change of base does not change d A . We put q = d A . We may assume that k is much larger than q, and in particular that δ(s, 0j) = δ(s, j) for any j with 0 ≤ j < q and any s ∈ S.

We next construct the automata A r = (R, s r , δ , τ ) which produce the sequences a r (n) = a(qn + r). They are defined as follows: δ(s, m)).
(In the last line, note that [q] may be considered as a subset of Σ k .) It is routine to check that R is preserved under δ(·, j) for j ∈ Σ k , that A r ignores the leading 0's, and that δ (s , 00) = δ (s , 0) for all s ∈ R. Moreover, an inductive argument shows that for any n ∈ N, we have qn + r k t , t = log k n + 1.
Note that there is no guarantee that A r is strongly connected for any choice of r, or even that all states s ∈ R are reachable from s r . We will show that every strongly connected component of A r is strongly aperiodic.
Proof of Theorem B, assuming Proposition 8.1. Immediate by Proposition 3.2.
We devote the remainder of this section to proving Proposition 8.1. To begin with, we reduce to the case when N is a power of k.
$k^{-cL}$ (16) for some constant $c > 0$. Then it also holds that where $c' = \frac{2}{3}c$.
Proof. Let N be a large number, and put L := log k N . Observe In order to use Fourier analysis, we replace $1_{[N]}$ by its smoothed version. Fix ε > 0 (independent of N , to be determined later), and take M = k L− εL . It will be convenient to do Fourier analysis in the finite group $\mathbb{Z}/k^L\mathbb{Z}$, with which the interval $[k^L]$ can naturally be identified. We will approximate $1_{[N]}$ with In order to estimate the right hand side of (19), expand where we identify $\mathbb{Z}/k^L\mathbb{Z}$ with $k^{-L}\mathbb{Z} \bmod 1$. We may now estimate, using (16): Using the Cauchy-Schwarz inequality and Parseval's equality we find Combining the above bounds, we find that Choosing ε = 2c/3 we obtain the claim.
We will derive Proposition 8.1 from the following more technical result.
Let (S, δ) be a strongly aperiodic k-automaton without output and initial state, and let M be the set of totally balanced sequences produced by a k-automaton (S, s 0 , δ, τ ) for some initial state s 0 ∈ S and output τ : S → {z ∈ C | |z| ≤ 1}. Then, there exists c > 0 (depending only on (S, δ)) such that
Proof of Proposition 8.1 assuming Proposition 8.3. Let a be a C-valued totally balanced k-automatic sequence, and let N be a large integer. Our goal is to prove that the estimate (15) holds. The reduction will consist of several steps. Let A = (S, s 0 , δ, τ ) be a k-automaton producing a, and assume without loss of generality that A ignores the leading 0's.
Step 1. It suffices to prove the assertion for all A which are strongly connected and ignore leading 0's.
Proof. Take L = (log k N )/2 . We may estimate If a is produced by A = (S, δ, s 0 , τ ), then for any m with 0 ≤ m < k L , the sequence a m given by a m (n) = a(k L n + m) is produced by the automaton A m = (S, δ, s m , τ ), where s m = δ(s 0 , (m) k ). The proportion of m < k L such that s m fails to belong to a strongly connected component is k −c1L for some constant c 1 > 0 (depending only on (S, δ)). Note that for any m with 0 ≤ m < k L , the sequence a m is totally balanced, the automaton A m ignores leading 0's, and if s m lies in a strongly connected component then A m is strongly connected.
Suppose that we already know that (15) holds with c = c sc for those of A m which are strongly connected. Applying this estimate where applicable (and estimating the remaining summands trivially by 1) we conclude that (15) holds for A with c = min(c 1 , c sc /2) > 0.
Step 2. It suffices to prove the assertion for all A with the property that each strongly connected component of A is strongly aperiodic and ignores the leading 0's (but A may not be strongly connected).
Proof. By Proposition 7.3, there exist q ∈ N and k′ (depending only on (S, δ)) such that each of the sequences a r (n) = a(qn + r) for r with 0 ≤ r < q is produced by a k′-automaton all of whose strongly connected components are strongly aperiodic. Note that for r with 0 ≤ r < q, the sequences a r are again totally balanced.
Using the analogue of (22), and the argument reminiscent of that in Step 1, we conclude that it is enough to prove the claim for automata with strongly aperiodic strongly connected components.
Step 3. Without loss of generality, A is strongly connected, strongly aperiodic and ignores the leading 0's.
Proof. Assume, as we may, that each strongly connected component of A is strongly aperiodic, and apply the same reduction as in Step 1. Note that, with notation as in Step 1, each of A m has the property that each of its strongly connected components is strongly aperiodic. Hence, if (15) holds for strongly connected and strongly aperiodic automata with constant c = c sa , then it holds for A with c = min(c 1 , c sa /2).
Step 4. Without loss of generality, A is strongly connected, strongly aperiodic, ignores the leading 0's, and N is a power of k.
Proof. It follows from Lemma 8.2 that if (15) holds for N which are powers of k with c = c pow , then it holds for general N with c = 2c pow /3.
The remainder of the claim follows directly from Proposition 8.3 applied to the partial automaton (S, δ).
Proof of Proposition 8.3. Note that $M$ is compact (in the $\|\cdot\|_\infty$ topology) and closed under the operations of taking kernels. Denote for $a \in M$, $\alpha \in \mathbb{R}$ and $L \ge 0$ the corresponding averages where the sets $S^l_{a,b}$ are given by Note that for given $a \in M$ and $l \ge 0$, the sets $S^l_{a,b}$ for $b \in M$ are a partition of $\Sigma_k^l$, and $S^l_{a,b} = \emptyset$ unless $b \in N_k(a)$. Hence, the sum in (24) is really finite for each $a \in M$.
We aim to recursively exploit (24) to obtain (21). In fact, depending on Diophantine properties of α and the value of L, we will use one of several estimates, with different values of l. Let Q be the constant from Corollary 4.4 applied with r = |S|. Note that Q may be replaced by any larger number; in particular we may assume without loss of generality that Q ≥ k.
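A toy instance of a digit-by-digit recursion for exponential sums, not taken from the paper, is given by the Thue-Morse sequence $t(n) = (-1)^{s_2(n)}$, where $s_2$ is the binary sum of digits. Since both $t$ and the phase factorise over binary digits, one has $\mathbb{E}_{n < 2^L}\, t(n) e(n\alpha) = \prod_{j < L} \frac{1 - e(2^j \alpha)}{2}$. The sketch below compares the two sides numerically; the function names are hypothetical.

```python
import cmath

def e(x):
    """The standard character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def thue_morse(n):
    """t(n) = (-1)^(number of 1's in the binary expansion of n)."""
    return -1 if bin(n).count("1") % 2 else 1

def average_direct(L, alpha):
    """Average of t(n) e(n alpha) over 0 <= n < 2^L, computed term by term."""
    N = 2 ** L
    return sum(thue_morse(n) * e(n * alpha) for n in range(N)) / N

def average_product(L, alpha):
    """The same average, computed from the factorisation over binary digits."""
    result = 1
    for j in range(L):
        result *= (1 - e(2 ** j * alpha)) / 2
    return result
```

For $\alpha = 1/3$ every factor has modulus $\sqrt{3}/2$, so the average has modulus $(\sqrt{3}/2)^L$ and decays exponentially in $L$, which is the shape of bound that the estimates in this section establish in much greater generality.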
Trivial estimate. Let α ∈ R and l with 0 ≤ l ≤ L be arbitrary. Then Proof. This is an immediate consequence of (24).
Proof. This is an immediate consequence of (24) and Corollary 4.4 applied for each
Major arc estimate. There exist constants c maj and L 0 (depending only on A) such that the following holds. Let α ∈ R and l with L 0 ≤ l ≤ L. Suppose that there exist p, q with 0 < p < q ≤ Q such that α − p q < 1/k l . Then
Proof. Fix any choice of a ∈ M, b ∈ N k (a). In order to prove the claim, it will suffice to estimate the term appearing in (24). Let g : Σ * k → {0, 1} be the sequence given by Thus, for v ∈ Σ l k we have v ∈ S l a,b precisely when g(v) = 1. We further note that g is generated by an automaton with states and transition function (S, δ). Indeed, suppose that a is produced by the automaton (S, s 0 , δ, τ ) for some choice of s 0 ∈ S and τ : S → C. Let S 1 be the set of states s 1 ∈ S such that b is produced by the automaton (S, s 1 , δ, τ ). Then, one immediately sees that g(v) = 1 precisely when δ(s 0 , v) ∈ S 1 , so g is produced by the automaton (S, s 0 , δ, 1 S1 ).
Note that g(u)e([u] k p/q) is an automatic sequence, and it is balanced. To see this, note that we have where in the last step we use p/q ∉ Z. It follows from Proposition 6.1 that there exists a constant c > 0 such that A priori, the value of c and of the implicit constant in (26) depend on g; however only finitely many choices of g are possible (since g is fully determined by S 1 ⊂ S), hence we may choose these constants uniformly. By the same token (possibly after changing the value of c), we have for each v ∈ Σ * k the bound where c and the implicit constant do not depend on v.

Using (27), we may now estimate (24). Take $m = l(1 - 2c_{\mathrm{maj}})$. We find for a constant $C$, assuming (as we may) that $c_{\mathrm{maj}}$ is chosen small enough with respect to $c$ that $c(1 - 2c_{\mathrm{maj}}) > 2c_{\mathrm{maj}}$. Taking $L_0$ sufficiently large that $k^{c_{\mathrm{maj}} L_0} > C$ we conclude that
We now proceed to prove (21). Take any $L \ge 0$ and $\alpha \in \mathbb{R}$. If $\|\alpha\|_{\mathbb{R}/\mathbb{Z}} < k^{-L/10}$ then (21) follows from the estimate obtained for the neighbourhood of 0, so suppose this is not the case. Thus, there is some $l$ with $0 \le l \le L/10$ such that $\|k^l \alpha\|_{\mathbb{R}/\mathbb{Z}} \ge 1/k$, and by the trivial estimate we have $A(L, \alpha) \le A(L - l, k^l \alpha)$.
If the estimate (21) holds for $A(L - l, k^l \alpha)$, then it also holds for $A(L, \alpha)$ (with the constant $c$ smaller by the factor of 9/10). Hence, replacing $\alpha$ with $\{k^l \alpha\}$ and $L$ with $L - l$, we may now assume that $\|\alpha\|_{\mathbb{R}/\mathbb{Z}} \ge 1/k$. We now combine the obtained estimates to obtain an inductive step.
Combined estimate. There exist constants $c_{\mathrm{cmb}} > 0$ and $L_1 \ge 0$ such that for any $\alpha \in \mathbb{R}$ with $\|\alpha\|_{\mathbb{R}/\mathbb{Z}} \ge 1/k$ and any $L \ge L_1$, there exists $l$ with $0 < l \le L$ such that $A(L, \alpha) \le k^{-c_{\mathrm{cmb}} l} A(L - l, k^l \alpha)$ and if $l \ne L$ then $\|k^l \alpha\|_{\mathbb{R}/\mathbb{Z}} \ge 1/k$.
Proof. To begin with, we branch off into cases depending on the length of the longest string of 0's in the first $L_1$ digits of $\alpha$ base $k$.
Suppose that there is a string of $\ge L_1/2$ consecutive 0's in the initial $L_1$ digits of $\alpha$. This means that there is some $l_1 \le L_1/2$ and a digit $j$ with $0 \le j < k$ such that $\|k^{l_1} \alpha - j/k\|_{\mathbb{R}/\mathbb{Z}} \le k^{-L_1/2}$. Using the trivial estimate we have $A(L, \alpha) \le A(L - l_1, k^{l_1} \alpha)$. Pick the largest $l_2 \le L$ such that $\|k^{l_1} \alpha - j/k\|_{\mathbb{R}/\mathbb{Z}} \le k^{-l_2}$; note that $l_2 \ge L_1/2$. Recall that $Q \ge k$. Hence, the major arc estimate can be used to estimate $A(L - l_1, k^{l_1} \alpha) \le k^{-c_{\mathrm{maj}} l_2} A(L - l_1 - l_2, k^{l_1 + l_2} \alpha)$.
By the choice of $l_2$ we have $\|k^{l_1 + l_2} \alpha\|_{\mathbb{R}/\mathbb{Z}} \ge 1/k$ or $l_2 = L - l_1$. Hence, the claim holds with $l = l_1 + l_2$ (as long as $c_{\mathrm{cmb}} \le c_{\mathrm{maj}}/2$).
Suppose now that there is no string of 0's of length ≥ L 1 /2. Again, there are two cases to consider, depending on whether there exist p, q with 0 ≤ p < q ≤ Q such that α − p If no, then we may apply the minor arc estimate with l 1 = L 1 /2 to obtain (as long as c cmb is small enough that 1 − 1 6|S| 2 ≤ k −c cmb L1 ). Otherwise, the major arc estimate is applicable with l 1 = L 1 /2 − log k Q ≥ L 1 /3 ≥ L 0 , provided that L 1 is chosen large enough that L 1 ≥ max(6 log k Q , 3L 0 ). We obtain the same estimate (as long as c cmb is small enough that c cmb ≤ c maj /3).
In either case, let l 2 with 0 ≤ l 2 ≤ L 1 − l 1 be the least integer such that k l1+l2 α R/Z ≥ 1/k. Such an l 2 exists, because α is assumed to have at least one non-zero digit at positions between l 1 + 1 and l 1 + L 1 /2 ≤ L 1 . Applying the trivial bound with l 2 to the right hand side of (29) or (30) respectively, we thus so the claim holds with l = l 1 + l 2 .
Iterating the combined estimate we have just obtained gives (21) (with c = c cmb ) by a simple inductive argument.
9. Invertible sequences. In this section, we deal with invertible automatic sequences, and show that, in a quantitative sense, they cannot correlate with polynomial phases. As discussed in Section 3, once this is accomplished, Theorem C will follow.
A k-automatic sequence a : N 0 → Ω is invertible if it is generated by an automaton A = (S, s 0 , δ, τ ) such that for each j ∈ Σ k , the map δ(·, j) : S → S is invertible [18]. A generalised Thue-Morse sequence taking values in a finite group G is a k-automatic sequence g : Σ * k → G such that g(uv) = g(u)g(v) for any u, v ∈ Σ * k , and g(0) = id G . Note that g(n) is then characterised by g(1), . . . , g(k − 1). Invertible sequences are now precisely the ones of the form a(n) = π(g(n)), where g(n) ∈ G is a generalised Thue-Morse sequence, and π : G → Ω is any function (see [18] for further discussion).
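To make the definition concrete, here is a minimal sketch of a generalised Thue-Morse sequence with values in the symmetric group $S_3$ for $k = 3$. Everything specific in it is an assumption chosen for illustration: the generators `G_DIGIT`, the map $\pi(\sigma) = \sigma(0)$, and the convention that the automaton with state set $G$ acts by right multiplication $s \mapsto s\,g(j)$.

```python
from itertools import permutations, product

K = 3
IDENTITY = (0, 1, 2)
# Hypothetical generators g(0), g(1), g(2) in S_3; g(0) must be the identity.
G_DIGIT = {0: IDENTITY, 1: (1, 0, 2), 2: (1, 2, 0)}

def compose(p, q):
    """Product p*q of permutations stored as tuples (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def g(word):
    """g(u) = g(u_1) g(u_2) ... g(u_m) for a word u over the digits 0..K-1."""
    result = IDENTITY
    for j in word:
        result = compose(result, G_DIGIT[j])
    return result

def digits(n, k=K):
    """Base-k digits of n, most significant first; 0 has the empty expansion."""
    out = []
    while n > 0:
        out.append(n % k)
        n //= k
    return out[::-1]

def a(n):
    """a(n) = pi(g(n)), with pi(sigma) = sigma(0) as an arbitrary illustrative choice."""
    return g(digits(n))[0]

def transition_is_invertible(j):
    """Check that s -> s * g(j) is a bijection of the state set S = S_3."""
    group = list(permutations(range(3)))
    return len({compose(s, G_DIGIT[j]) for s in group}) == len(group)
```

Multiplicativity $g(uv) = g(u)g(v)$ holds here by construction, and each transition map is a bijection because it is multiplication by a group element.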
Lemma 9.1. If a 1 , a 2 are invertible k-automatic sequences, then so is (a 1 , a 2 ). In particular, the family of C-valued invertible k-automatic sequences is a ring.
The following is a direct consequence of [18,Theorem 3]. A sequence a(n) is d-periodic if a(n + d) = a(n) for all n; we do not require that d should be the least period.
Proposition 9.2. Fix k ≥ 2. Let a : N 0 → C be an invertible k-automatic sequence. Then, there exists a decomposition a(n) = a per (n) + a bal (n), where a per (n) is (k − 1)-periodic and a bal (n) is totally balanced (and invertible).
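A simple example, not taken from the paper, shows why a $(k-1)$-periodic part can genuinely occur. Since $k \equiv 1 \pmod{k-1}$, the base-$k$ sum of digits satisfies $s_k(n) \equiv n \pmod{k-1}$. For $k = 3$ the invertible sequence $a(n) = (-1)^{s_3(n)}$ (with $G = \mathbb{Z}/2\mathbb{Z}$, $g(1) = 1$, $g(2) = 0$) therefore equals $(-1)^n$, so one admissible decomposition is $a_{\mathrm{per}} = a$ and $a_{\mathrm{bal}} = 0$. The sketch below checks this numerically; the function names are hypothetical.

```python
def digit_sum(n, k):
    """Sum of the base-k digits of n."""
    total = 0
    while n > 0:
        total += n % k
        n //= k
    return total

def a(n):
    """a(n) = (-1)^{s_3(n)}, an invertible 3-automatic sequence."""
    return -1 if digit_sum(n, 3) % 2 else 1

def is_periodic(seq, period, length):
    """Check seq(n + period) == seq(n) for all 0 <= n < length."""
    return all(seq(n + period) == seq(n) for n in range(length))
```

Here `is_periodic(a, 2, 1000)` is expected to hold, in line with the period $k - 1 = 2$.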
Theorem C reduces to the following proposition, which is the main result of this section.
Proposition 9.3. Fix k ≥ 2 and d ∈ N. Let a : N 0 → Ω ⊂ C be a totally balanced invertible k-automatic sequence. Then there exists a constant c > 0 such that
Proof of Theorem C, assuming Proposition 9.3. Immediate by Proposition 3.2.
The following technical lemma will be crucial in the argument. Recall that ∆ h p(n) := p(n + h) − p(n).
Then, there exists a decomposition p(x) = p(x) + r(x), where e(r(n)) is periodic with period H O(1) M O(1) and p(n) is approximately linear in the sense that for each n 0 ∈ Z there exist β 0 (n 0 ), β 1 (n 0 ) ∈ R such that for all n ∈ Z we have where the implicit constants depend on d only. In fact, β 0 (n) = p(n) and β 1 (n) = ∆ 1 p(n).
Proof. Fix a choice of h and m. Let us expand It follows from the Quantitative Weyl Theorem (see e.g. [31, Prop. 4.4]) and (32) that there exists 0 < Q 1/ε O(1) such that Note that γ i and α i are related by a linear relation of the form where all the hidden coefficients are polynomials in M, h and m with total degree bounded in terms of d. As for p(n), for any n ∈ Z we find that Fix a choice of n 0 , and put β 0 = p(n 0 ) and β 1 = ∆ 1 p(n 0 ). A simple inductive argument shows that for all n we have
Another ingredient which we will need consists of some observations about co-kernels of invertible sequences. Pick a k-automatic invertible sequence a : Σ * k → Ω. Recall that a has a representation a = π • g where g : Σ * k → G is a generalised Thue-Morse sequence taking values in a group G and π : G → Ω. In particular, a ignores the leading 0's. There is a natural choice of an automaton A = (S, s 0 , δ, τ ) producing a, namely S = G, s 0 = h, δ(s, j) = g(j)s, τ (g) = π(gh −1 ), where h ∈ G (one may take h = id G for concreteness). Conversely, any automaton of the above form is invertible.
It is clear from the above description that if a : Σ * k → Ω is an invertible k-automatic sequence, then any sequence b ∈ N k (a) ∪ N k (a) is also invertible.
Note also that any invertible sequence automatically ignores the leading 0's. Hence, we may freely identify invertible sequences N 0 → Ω with invertible sequences Σ * k → Ω. In particular, if a : N 0 → Ω is an invertible sequence then it makes sense to consider N k (a), and any b ∈ N k (a) may naturally be viewed as a sequence N 0 → Ω.
Lemma 9.5. Let a : N 0 → C be an invertible and totally balanced k-automatic sequence, and let b ∈ N k (a). Then b is totally balanced.
Proof. Fix q ∈ N; we will show that any b ∈ N k (a) does not correlate with q-periodic sequences. Denote For any N , we may partition [N ] into disjoint intervals of the form I m,l = [mk l , (m+ 1)k l ) where each value of l appears at most k − 1 times. It follows that for any b ∈ N k (a) and any r with 0 ≤ r < q we have Hence, it will suffice to prove that A(L) → 0 as L → ∞. Fix b ∈ N k (a). There exists some v ∈ Σ * k such that b(u) = a(vu) for all u ∈ Σ * k . Hence max 0≤r<q E n<k L b(n)e(nr/q) = max a(n)e(nr/q) → 0 as L → ∞, because a is totally balanced.
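The partition into intervals $I_{m,l} = [mk^l, (m+1)k^l)$ used in the proof above can be obtained greedily from the base-$k$ digits of $N$. The sketch below assumes that $[N]$ denotes $\{0, 1, \dots, N-1\}$, which is a guess about the notation of the paper, and the function name is hypothetical.

```python
def partition_into_kadic_intervals(N, k):
    """Split [0, N) into intervals [m*k^l, (m+1)*k^l), returned as pairs (m, l).

    Works from the largest power of k downwards, so each exponent l is used
    as many times as the corresponding base-k digit of N, hence at most k - 1 times.
    """
    intervals = []
    start = 0
    l = 0
    while k ** (l + 1) <= N:
        l += 1  # largest l with k^l <= N (or l = 0 when N < k)
    while l >= 0:
        block = k ** l
        while start + block <= N:
            # start is a multiple of k^l at this point, so the division is exact.
            intervals.append((start // block, l))
            start += block
        l -= 1
    return intervals
```

For example, with $k = 10$ and $N = 203$ this produces two intervals of length 100 followed by three intervals of length 1.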
Proof of Proposition 9.3. We proceed by induction on d. The case d = 1 follows from Proposition 8.1. Fix d ∈ N and assume that the claim holds for d − 1.
Using the analogue of Lemma 8.2, we may assume that N = k L for some L. (In fact, we may assume also that L is divisible by a specified large integer D in order to ensure later in the argument that various small multiples of L are also integers.) Let p ∈ R[x] with deg p = d and assume without loss of generality that p(0) = 0. The implicit (and explicit) constants below are independent of p.
We begin by using the van der Corput inequality 4.5 with H = k δL , where δ > 0 is a small constant to be determined later, and we assume for the sake of clarity that δL is an integer. We obtain sequences. As the following example shows, even for relatively simple sequences, direct generalisations fail.
In particular, the methods in the proof of Proposition 9.3 cannot be directly applied to the sequence t(n + 1).
Remark. The only point where the proof of Proposition 9.3 essentially uses the assumption that the sequence a(n) is invertible is to ensure that any sequence b(n) in the (multiplicative) group generated by N k (a) has a decomposition b(n) = b per (n) + b bal (n) into a periodic and totally balanced part. We believe that similar results can be proven for any class of automatic sequences having this property.