Automatic sequences are orthogonal to aperiodic multiplicative functions

Given a finite alphabet $\mathbb{A}$ and a primitive substitution $\theta:\mathbb{A}\to\mathbb{A}^\lambda$ (of constant length $\lambda$), let $(X_\theta,S)$ denote the corresponding dynamical system, where $X_{\theta}$ is the closure of the orbit via the left shift $S$ of a fixed point of the natural extension of $\theta$ to a self-map of $\mathbb{A}^{\mathbb{Z}}$. The main result of the paper is that all continuous observables in $X_{\theta}$ are orthogonal to any bounded, aperiodic, multiplicative function $\mathbf{u}:\mathbb{N}\to\mathbb{C}$, i.e. \[ \lim_{N\to\infty}\frac1N\sum_{n\leq N}f(S^nx)\mathbf{u}(n)=0\] for all $f\in C(X_{\theta})$ and $x\in X_{\theta}$. In particular, each primitive automatic sequence, that is, a sequence read by a primitive finite automaton, is orthogonal to any bounded, aperiodic, multiplicative function.

Introduction. Throughout the paper, by an automatic sequence $(a_n)_{n\geq0}\subset\mathbb{C}$, we mean a continuous observable in a substitutional system $(X_\theta,S)$, i.e. $a_n=f(S^nx)$, $n\geq0$, for some $f\in C(X_\theta)$ and $x\in X_\theta$.$^{1}$ Here, we assume that $\theta:\mathbb{A}\to\mathbb{A}^\lambda$ is a substitution of constant length $\lambda\in\mathbb{N}$ (over the alphabet $\mathbb{A}$), and we let $(X_\theta,S)$ denote the corresponding subshift of the full shift $(\mathbb{A}^{\mathbb{Z}},S)$ (see Remark 2 and Section 2 for details).
By $\mu:\mathbb{N}\to\{-1,0,1\}$ we denote the classical Möbius function: $\mu(p_1\cdots p_k)=(-1)^k$ for different primes $p_1,\ldots,p_k$, $\mu(1)=1$, and $\mu(n)=0$ whenever $n$ is divisible by a square greater than $1$. In connection with the celebrated Sarnak's conjecture [39] on Möbius orthogonality of zero entropy systems, i.e.
\[\lim_{N\to\infty}\frac1N\sum_{n\leq N}f(S^nx)\mu(n)=0\tag{1}\]
for each zero entropy dynamical system $(X,S)$, all $f\in C(X)$ and $x\in X$ (for more information on the subject, see the survey article [12]), it has been proved in [37] that all automatic sequences $(a_n)$ satisfy (1). This triggers the question whether (1) remains true if we replace $\mu$ by another arithmetic function. The Möbius function is an example of an arithmetic function which is multiplicative ($\mu(mn)=\mu(m)\mu(n)$ whenever $m,n$ are coprime); hence, it is natural to ask whether automatic sequences are orthogonal to each zero mean,$^{2}$ bounded, multiplicative function $u$. That said, one realizes immediately that the answer to such a question is negative, as periodic functions are automatic sequences and there are many examples of periodic, multiplicative functions.$^{3}$ Besides, even amongst non-periodic automatic sequences there are examples of (completely) multiplicative, zero mean functions [2], [40], [43], see also the recent [24], [26] for significant progress on characterizing such sequences. On the other hand, it has been proved in [13] that many automatic sequences given by so-called bijective substitutions are orthogonal to all zero mean, bounded, multiplicative functions (in [9], it is proved that they are orthogonal to the Möbius function). A stronger requirement on $u$ than zero mean is that of aperiodicity, that is,
\[\lim_{N\to\infty}\frac1N\sum_{n\leq N}u(an+b)=0\]
for each $a,b\in\mathbb{N}$, i.e. $u$ has a mean, equal to zero, along each arithmetic progression. Many classical multiplicative functions are aperiodic, e.g. $\mu$ or the Liouville function $\boldsymbol{\lambda}$: $\boldsymbol{\lambda}(n)=(-1)^{\Omega(n)}$, where $\Omega(n)$ denotes the number of prime divisors of $n$ (counted with multiplicities).
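To make these definitions concrete, here is a small Python sketch (our own illustration, not part of the argument of the paper) that tabulates $\mu$ and $\boldsymbol{\lambda}$ with a smallest-prime-factor sieve and evaluates the finite averages $\frac1N\sum_{n\leq N}u(an+b)$ appearing in the definition of aperiodicity. The helper names `mobius_and_liouville` and `progression_average` are hypothetical. Finite averages can only suggest, never prove, that a limit vanishes.

```python
import math


def mobius_and_liouville(n_max):
    """Return lists mu, lam with mu[n] = mu(n) and lam[n] = lambda(n) for 1 <= n <= n_max.

    Smallest-prime-factor sieve; index 0 is unused (left as 0).
    """
    spf = list(range(n_max + 1))  # spf[n] = smallest prime factor of n, for n >= 2
    for i in range(2, math.isqrt(n_max) + 1):
        if spf[i] == i:  # i is prime
            for j in range(i * i, n_max + 1, i):
                if spf[j] == j:
                    spf[j] = i
    mu = [0] * (n_max + 1)
    lam = [0] * (n_max + 1)
    if n_max >= 1:
        mu[1] = lam[1] = 1
    for n in range(2, n_max + 1):
        p = spf[n]
        m = n // p
        lam[n] = -lam[m]                     # Omega(n) = Omega(m) + 1
        mu[n] = 0 if m % p == 0 else -mu[m]  # p^2 | n forces mu(n) = 0
    return mu, lam


def progression_average(u, a, b, N):
    """Finite average (1/N) * sum_{n=1}^{N} u(a*n + b), with u given as a list indexed by n."""
    return sum(u[a * n + b] for n in range(1, N + 1)) / N


if __name__ == "__main__":
    mu, lam = mobius_and_liouville(10_000)
    for a, b in [(1, 0), (2, 1), (3, 2)]:
        print(a, b, progression_average(mu, a, b, 3000), progression_average(lam, a, b, 3000))
```

The same helper separates zero mean from aperiodicity: the periodic multiplicative function $n\mapsto(-1)^{n+1}$ from footnote 3 has mean zero, yet its average along $a=2$, $b=0$ is identically $-1$.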
The aim of the present paper is to prove the following ($h(\theta)$ and $c(\theta)$ stand, respectively, for the height and the column number of the substitution $\theta$, see Section 2):

Theorem 0.1. Let $\theta$ be a primitive substitution of constant length $\lambda$. Then each automatic sequence $a_n=f(S^nx)$, $n\geq0$, in $(X_\theta,S)$ is orthogonal to any bounded, aperiodic, multiplicative function $u:\mathbb{N}\to\mathbb{C}$, i.e.
\[\lim_{N\to\infty}\frac1N\sum_{n\leq N}a_nu(n)=0.\tag{2}\]
More precisely:
(i) If $c(\theta)=h(\theta)$ then each automatic sequence $(f(S^nx))_{n\geq0}$ is orthogonal to any bounded, aperiodic, arithmetic function $u$.$^{4}$
(ii) If $c(\theta)>h(\theta)$ then the automatic sequences $(f(S^nx))_{n\geq0}$ for which the spectral measure of $f$ is continuous are orthogonal to all bounded, multiplicative functions. If the spectral measure is discrete then (i) applies. All other automatic sequences in $(X_\theta,S)$ are orthogonal to all bounded, aperiodic, multiplicative functions.
We have already mentioned that examples of automatic sequences which are multiplicative functions are known, but they are quite special. For example, in [40], it is proved that completely multiplicative, never vanishing functions "produced" by finite automata are limits in the upper density of periodic sequences, that is, they are Besicovitch rationally almost periodic.$^{5}$ In fact, we can strengthen this result: as a consequence of Theorem 0.1, they are even Weyl rationally almost periodic.

Corollary 1. All multiplicative, automatic sequences produced by primitive automata$^{6}$ are Weyl rationally almost periodic. Furthermore, the automaton/substitution that produces such an automatic sequence can be chosen to have equal column number and height.

$^{2}$ Recall that a sequence $u:\mathbb{N}\to\mathbb{C}$ has zero mean if $M(u):=\lim_{N\to\infty}\frac1N\sum_{n\leq N}u(n)$ exists and equals zero.
$^{3}$ Indeed, examples of periodic multiplicative functions are given by Dirichlet characters, or by $n\mapsto(-1)^{n+1}$.
$^{4}$ We emphasize that no multiplicativity of $u$ is required.
The main problem in this paper belongs to number theory and combinatorics, and the proof of Möbius orthogonality for automatic sequences in [37] relied on combinatorial properties of sequences produced by automata and an application of the (number-theoretic) method of Mauduit and Rivat [33], [34]. However, the problem of orthogonality of sequences is deeply related to classical ergodic theory, namely, to Furstenberg's disjointness of dynamical systems; see [12] for an exhaustive presentation of this approach. Our strategy in this paper is to apply ergodic theory tools to prove Theorem 0.1. Specifically, we make use of some old results on the centralizer of substitutional systems [22] and, more surprisingly, we find an ergodic interpretation of the combinatorial approach from [37] in terms of joinings of substitutional systems. Finally, we relativize some of the arguments from [13] to reach the goal. For example, the reason behind the orthogonality of automatic sequences to all bounded, multiplicative functions (whenever the spectral measure of an automatic sequence is continuous) in (ii) of Theorem 0.1 is that the essential centralizer of substitutional systems is finite [22]. This "small size" of the essential centralizer puts serious restrictions on possible joinings between different, sufficiently large prime powers $S^p$, $S^q$ of $S$ and allows one to use the DKBSZ criterion on the orthogonality of numerical sequences with bounded, multiplicative functions (see Section 1.5).
The ergodic theory approach (together with number-theoretic tools) has turned out to be very effective in some attempts to prove Sarnak's conjecture. The recent results of Frantzikinakis and Host [16] show that if, instead of orthogonality (1), we ask for its weaker, logarithmically averaged version
\[\lim_{N\to\infty}\frac{1}{\log N}\sum_{n\leq N}\frac{f(S^nx)\mu(n)}{n}=0,\tag{3}\]
then this orthogonality indeed holds for all zero entropy systems $(X,S)$ whose set of ergodic (invariant) measures is countable; in particular, it applies to substitutional systems. Moreover, in (3), we can replace $\mu$ by any so-called strongly aperiodic multiplicative function.

$^{5}$ A sequence $(b_n)$ taking values in a finite set $B$ is called Besicovitch rationally almost periodic if it can be approximated by periodic sequences (with values in $B$) in the pseudo-metric
\[d_B((b_n),(c_n)):=\limsup_{N\to\infty}\frac1N\#\{n\leq N:b_n\neq c_n\}.\]
If $d_B$ is replaced by $d_W((b_n),(c_n)):=\limsup_{N\to\infty}\sup_{m\geq0}\frac1N\#\{m\leq n<m+N:b_n\neq c_n\}$, we speak about Weyl rationally almost periodic (WRAP) sequences.
$^{6}$ Sequences produced by finite automata take only finitely many values and are, therefore, bounded.
The ergodic theory approach allows for a further extension of the notion of automatic sequences, which suggests one more natural question. A uniquely ergodic topological dynamical system $(Y,T)$ (with a unique invariant measure $\nu$) is called MT-substitutional if there is a primitive substitution $\theta:\mathbb{A}\to\mathbb{A}^\lambda$ such that the measure-theoretic dynamical systems $(Y,\nu,T)$ and $(X_\theta,\mu_\theta,S)$ are measure-theoretically isomorphic. Then any sequence $a_n=g(T^ny)$, $n\geq0$ (with $g\in C(Y)$ and $y\in Y$), can be called MT-automatic.$^{7}$ It is natural to ask whether (2) holds not only for any automatic sequence, but more generally for any MT-automatic sequence as well.
To cope with the MT-automatic case we need to be able to control the behaviour of $u$ on short intervals, i.e.
\[\lim_{K\to\infty}\frac{1}{b_{K+1}}\sum_{k\leq K}\Big|\sum_{b_k\leq n<b_{k+1}}u(n)\Big|=0\tag{4}\]
(for each $(b_k)$ satisfying $b_{k+1}-b_k\to\infty$), which is much stronger than the requirement that $M(u)=0$. It turns out, however, that (4) is satisfied for all non-pretentious multiplicative functions, see [31], [32]; in particular, it is satisfied for the Möbius function $\mu$.
Corollary 2. Any MT-automatic sequence $(a_n)$ satisfies (2) for any multiplicative, bounded, aperiodic $u:\mathbb{N}\to\mathbb{C}$ satisfying (4). Furthermore, for each MT-substitutional system $(Y,T)$ and any $g\in C(Y)$, we have $\frac1N\sum_{n\leq N}g(T^ny)u(n)\to0$ when $N\to\infty$, uniformly in $y\in Y$. In particular, the uniform convergence takes place in $(X_\theta,S)$ for any primitive substitution $\theta$.
Corollary 2 answers Question 43 from [1] (see Lemma 8.1 below) on the validity of Möbius orthogonality in all uniquely ergodic models of substitutional systems.
It is clear that Möbius orthogonality of a dynamical system $(X,S)$ implies Möbius orthogonality of any topological factor of $(X,S)$. Thus, one of the key ideas in the proof of Theorem 0.1, given a substitutional system $(X_\theta,S)$, is to build successive continuous extensions of it, which are also given by substitutions, such that Theorem 0.1 is easier to handle for the largest extension.$^{8}$ Each continuous function $f$ on the largest extension can be decomposed as $f=f_1+f_2$ (with both $f_i$ continuous), where $f_1$ is highly structured, which implies its orthogonality to all bounded, aperiodic functions, while $f_2$ is very unstructured and more difficult to handle. To show the orthogonality of $f_2$ to all aperiodic, multiplicative (bounded) functions, we show the relative independence (over the common Kronecker factor) of different prime powers of the shift, so that we can make use of a numerical criterion of orthogonality (see Theorem 1.3) applicable to $f_2(S^nx)$ for each $x$ in the largest extension.
The structure of the paper is as follows. In Section 1 we give an overview of the ergodic theory we need, e.g. joinings, extensions and odometers. We recall basics on (dynamical systems associated with) substitutions of constant length in Section 2. This includes a classical description of the Kronecker factor for substitutions of constant length. In Section 3, we give the proofs of Theorem 0.1 for two special cases of $\theta$. They either follow the lines of the proof of orthogonality of $f_1$ alone or represent a simplified version of the proof of the general case of $f$ above. In Section 4 we present some more ergodic theory prerequisites, like joinings of powers of finite extensions of odometers. Moreover, we give a more detailed overview of the main ideas and the structure of the proof in Section 4.3. We finally build the aforementioned extensions of $(X_\theta,S)$ in Section 5. We finish the paper by proving Theorem 0.1 and Corollary 1 in Section 6 and Corollary 2 in Section 7.
1. Ergodic theory necessities. Below, we provide some basics of ergodic theory needed for the proof of Theorem 0.1. We refer the reader to [19] for more information on ergodic theory, in particular, on the theory of joinings.
1.1. Joinings and disjointness. By a (measure-theoretic) dynamical system we mean $(X,\mathcal{B},\mu,S)$, where $(X,\mathcal{B},\mu)$ is a standard Borel probability space and $S:X\to X$ is an a.e. bijection which is bimeasurable and measure-preserving. If no confusion arises, we will speak about $S$ itself and call it an automorphism.$^{9}$

Remark 1. Each homeomorphism $S$ of a compact metric space $X$ determines many (measure-theoretic) dynamical systems $(X,\mathcal{B}(X),\mu,S)$ with $\mu\in M(X,S)$, where $M(X,S)$ stands for the set of $S$-invariant Borel probability measures on $X$ ($\mathcal{B}(X)$ stands for the $\sigma$-algebra of Borel sets of $X$). Recall that, by the Krylov-Bogolyubov theorem, $M(X,S)\neq\emptyset$; moreover, $M(X,S)$ endowed with the weak-$*$ topology is a compact metrizable space. The set $M(X,S)$ has a natural structure of a convex set (in fact, it is a Choquet simplex) and its extremal points are precisely the ergodic measures. We say that the topological system $(X,S)$ is uniquely ergodic if it has only one invariant measure (which must then be ergodic). The system $(X,S)$ is called minimal if it does not contain a proper subsystem (equivalently, the orbit of each point is dense).
Remark 2. The basic systems considered in the paper are subshifts, whose definition we now recall. Let $\mathbb{A}$ be a finite, nonempty set (alphabet). By a block (or word) over $\mathbb{A}$, we mean $B\in\mathbb{A}^n$ (for some $n\geq0$), and $n=:|B|$ is the length of $B$. Hence $B=(a_{i_0},a_{i_1},\ldots,a_{i_{n-1}})$ with $a_{i_k}\in\mathbb{A}$ for $k=0,\ldots,n-1$. We will also use the following notation: $B[k,\ell]:=(a_{i_k},a_{i_{k+1}},\ldots,a_{i_\ell})$ for $0\leq k\leq\ell\leq n-1$, and $B[k]:=B[k,k]=a_{i_k}$. We say that the (sub)block $B[k,\ell]$ appears in $B$. The notation we have just presented has its natural extension to infinite sequences.
Given $\eta\in\mathbb{A}^{\mathbb{N}}$, we can define $X_\eta:=\{x\in\mathbb{A}^{\mathbb{Z}}:\text{each block appearing in }x\text{ appears in }\eta\}$.
It is not hard to see that $X_\eta$ is closed and $S$-invariant, where $S$ is the left shift: $S((x_n)_{n\in\mathbb{Z}})=(y_n)_{n\in\mathbb{Z}}$ with $y_n=x_{n+1}$, $n\in\mathbb{Z}$.
Then the dynamical system $(X_\eta,S)$ is called a subshift (given by $\eta$). If $\eta$ has the property that each block appearing in it reappears infinitely often (this is the case for the substitutional systems considered in this paper), then there exists $\bar\eta\in\mathbb{A}^{\mathbb{Z}}$ satisfying $\bar\eta_k=\eta_k$ for each $k\geq0$ and $X_\eta=\overline{\{S^m\bar\eta:m\in\mathbb{Z}\}}$.
Assume that $T\in\mathrm{Aut}(Y,\mathcal{C},\nu)$ is another automorphism. Then $(X,\mathcal{B},\mu,S)$ and $(Y,\mathcal{C},\nu,T)$ are isomorphic if $W\circ S=T\circ W$ for some (invertible$^{10}$) measure-preserving map $W:(X,\mathcal{B},\mu)\to(Y,\mathcal{C},\nu)$.

Given $(X,\mathcal{B},\mu,S)$ and $(Y,\mathcal{C},\nu,T)$, we may consider the set $J(S,T)$ of joinings of the automorphisms $S$ and $T$. (When $S=T$, we speak about self-joinings of $S$.) Namely, $\kappa\in J(S,T)$ if $\kappa$ is an $S\times T$-invariant probability measure on $\mathcal{B}\otimes\mathcal{C}$ with projections $\mu$ and $\nu$ on $X$ and $Y$, respectively. Note that the projection maps $p_X:X\times Y\to X$, $p_Y:X\times Y\to Y$ settle factor maps between the dynamical system $(X\times Y,\mathcal{B}\otimes\mathcal{C},\kappa,S\times T)$ and $(X,\mathcal{B},\mu,S)$, $(Y,\mathcal{C},\nu,T)$, respectively.
If both $S$ and $T$ are ergodic, the subset $J^e(S,T)$ of ergodic joinings (i.e. of those $\rho\in J(S,T)$ for which the system $(X\times Y,\mathcal{B}\otimes\mathcal{C},\rho,S\times T)$ is ergodic) is non-empty; in fact, the ergodic decomposition of a joining consists (a.e.) of ergodic joinings.
If $(X,\mathcal{B},\mu,S)$ and $(Y,\mathcal{C},\nu,T)$ are isomorphic via $W:(X,\mathcal{B},\mu)\to(Y,\mathcal{C},\nu)$, then $W$ yields the corresponding graph joining $\mu_W\in J(S,T)$ determined by $\mu_W(A\times B):=\mu(A\cap W^{-1}B)$ for $A\in\mathcal{B}$, $B\in\mathcal{C}$. The automorphisms $S$ and $T$ are called disjoint if their only joining is the product measure $\mu\otimes\nu$, i.e. $J(S,T)=\{\mu\otimes\nu\}$. We then write $S\perp T$. Note that if $S\perp T$ then at least one of these automorphisms must be ergodic.
We recall that an automorphism $R\in\mathrm{Aut}(Z,\mathcal{D},\kappa)$ has discrete spectrum if $L^2(Z,\mathcal{D},\kappa)$ is spanned by the eigenfunctions of the unitary operator $U_R:f\mapsto f\circ R$ on $L^2(Z,\mathcal{D},\kappa)$. Assuming ergodicity, we have:
\[\text{If } R \text{ has discrete spectrum then each of its ergodic self-joinings is graphic.}\tag{5}\]
Each automorphism $S\in\mathrm{Aut}(X,\mathcal{B},\mu)$ has a maximal factor which has discrete spectrum. It is called the Kronecker factor of $S$.
Joinings are also considered in topological dynamics (cf. Remark 12). If $S_i$ is a homeomorphism of a compact metric space $X_i$, $i=1,2$, then each $S_1\times S_2$-invariant, closed subset $M\subset X_1\times X_2$ with full natural projections is called a topological joining of $S_1$ and $S_2$.
It is not hard to see that $\tilde S\in\mathrm{Aut}(\tilde X,\tilde{\mathcal{B}},\tilde\mu)$; it is called the h-discrete suspension of $S$ (it is ergodic if and only if $S$ is). Note that the map $(x,j)\mapsto j$ for $(x,j)\in\tilde X$ yields a factor map between $\tilde S$ and the rotation $\tau_h:x\mapsto x+1$ on $\mathbb{Z}/h\mathbb{Z}$. Note also that $\tilde S^h(x,0)=(Sx,0)$, so in fact we can view $\tilde S$ as the h-discrete suspension of $\tilde S^h|_{X\times\{0\}}$. As a matter of fact, an automorphism $R\in\mathrm{Aut}(Z,\mathcal{D},\kappa)$ is an h-discrete suspension if and only if $R$ has the rotation $\tau_h$ as a factor.
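The following Python sketch models the h-discrete suspension on a finite toy system. It assumes the usual formula $\tilde S(x,j)=(x,j+1)$ for $j<h-1$ and $\tilde S(x,h-1)=(Sx,0)$, which is consistent with $\tilde S^h(x,0)=(Sx,0)$ and with the factor map $(x,j)\mapsto j$ onto $\tau_h$ described above. It also compares $\tilde\tau_m$ with $\tau_m\times\tau_h$ as in (9): on a finite set, two cyclic permutations of the same number of points are conjugate, so (9) reduces to both maps being single cycles of length $mh$, which for the product happens exactly when $(m,h)=1$. All function names are hypothetical.

```python
from math import gcd


def suspension(S, h):
    """h-discrete suspension of S: (x, j) -> (x, j + 1) if j < h - 1, and (x, h - 1) -> (S(x), 0)."""
    def S_tilde(point):
        x, j = point
        return (x, j + 1) if j < h - 1 else (S(x), 0)
    return S_tilde


def power(T, k, point):
    """Apply the map T to point k times."""
    for _ in range(k):
        point = T(point)
    return point


def cycle_length(T, start):
    """Length of the orbit of start under a permutation T of a finite set."""
    x, n = T(start), 1
    while x != start:
        x, n = T(x), n + 1
    return n


if __name__ == "__main__":
    m, h = 5, 3
    rot = lambda x: (x + 1) % m                        # tau_m on Z/mZ
    S_tilde = suspension(rot, h)
    # tilde S^h restricted to the level X x {0} is the original map.
    assert all(power(S_tilde, h, (x, 0)) == (rot(x), 0) for x in range(m))
    # (x, j) -> j intertwines tilde S with tau_h.
    assert all(S_tilde((x, j))[1] == (j + 1) % h for x in range(m) for j in range(h))
    prod = lambda p: ((p[0] + 1) % m, (p[1] + 1) % h)  # tau_m x tau_h
    print(cycle_length(S_tilde, (0, 0)), cycle_length(prod, (0, 0)), gcd(m, h))
```

With $m=4$, $h=2$ the suspension is still a single cycle of length $8$, while $\tau_4\times\tau_2$ splits into two cycles of length $4$, so the coprimality assumption in (9) cannot be dropped.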
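Indeed, if $\pi:Z\to\mathbb{Z}/h\mathbb{Z}$ settles a factor map between $R$ and $\tau_h$, then set $Z_0:=\pi^{-1}(0)$; identifying $\mathbb{Z}/h\mathbb{Z}$ with $\{0,\ldots,h-1\}$, the map $z\mapsto(R^{-\pi(z)}z,\pi(z))$ settles an isomorphism between $R$ and the h-discrete suspension of $R^h|_{Z_0}$. Finally, consider $(m,h)=1$ and let $\tau_m$ denote the rotation by $1$ on $\mathbb{Z}/m\mathbb{Z}$. Then it is easy to see that:
\[\text{The h-discrete suspension } \tilde\tau_m \text{ of } \tau_m \text{ is isomorphic to the direct product } \tau_m\times\tau_h.\tag{9}\]

1.3. Group and isometric extensions. Given an ergodic dynamical system $(X,\mathcal{B},\mu,S)$, consider a measurable $\varphi:X\to G$, where $G$ is a compact metric group. Then $\varphi$ defines a cocycle
\[\varphi^{(m)}(x):=\varphi(S^{m-1}x)\cdots\varphi(Sx)\varphi(x)\ \text{ for } m\geq1,\qquad\varphi^{(0)}(x):=e,\]
and the automorphism $S_\varphi(x,g):=(Sx,\varphi(x)g)$ is called a compact group extension of $S$. We obtain the dynamical system $(X\times G,\mathcal{B}\otimes\mathcal{B}(G),\mu\otimes m_G,S_\varphi)$, which need not be ergodic. For example, it is not ergodic when $\varphi(x)=\xi(Sx)^{-1}\xi(x)$ for a measurable $\xi:X\to G$, i.e. when $\varphi$ is a coboundary (indeed, the map $(x,g)\mapsto(x,\xi(x)g)$ settles an isomorphism between $S_\varphi$ and $S\times\mathrm{Id}_G$). Note that $(S_\varphi)^m(x,g)=(S^mx,\varphi^{(m)}(x)g)$ for $m\geq0$.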
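As a sanity check of the identity $(S_\varphi)^m(x,g)=(S^mx,\varphi^{(m)}(x)g)$ and of the coboundary example, the Python sketch below takes $S$ to be the rotation $x\mapsto x+1$ on $\mathbb{Z}/m\mathbb{Z}$ and $G=S_3$, a non-abelian group, so that the order of the factors in $\varphi^{(m)}$ actually matters. The choice of $S$, $G$, $\varphi$, $\xi$ and all function names are illustrative assumptions; only finitely many points are checked.

```python
from itertools import permutations

E = (0, 1, 2)  # identity of S_3; permutations are tuples, (p * q)(i) = p[q[i]]


def mul(p, q):
    """Product p * q in S_3 (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def inv(p):
    """Inverse permutation."""
    out = [0] * len(p)
    for i, pi in enumerate(p):
        out[pi] = i
    return tuple(out)


def make_extension(m, phi):
    """Group extension S_phi(x, g) = (x + 1 mod m, phi(x) g) over the rotation on Z/mZ."""
    return lambda point: ((point[0] + 1) % m, mul(phi[point[0]], point[1]))


def cocycle(m, phi, n, x):
    """phi^(n)(x) = phi(S^{n-1}x) ... phi(Sx) phi(x) for n >= 1, and the identity for n = 0."""
    c = E
    for i in range(n):
        c = mul(phi[(x + i) % m], c)  # the newest factor is multiplied on the left
    return c


if __name__ == "__main__":
    m = 4
    G = list(permutations(range(3)))
    phi = [G[1], G[3], G[0], G[5]]  # an arbitrary map Z/4Z -> S_3
    S_phi = make_extension(m, phi)
    for x in range(m):
        for g in G:
            p = (x, g)
            for _ in range(7):
                p = S_phi(p)
            assert p == ((x + 7) % m, mul(cocycle(m, phi, 7, x), g))
    # Coboundary phi(x) = xi(Sx)^{-1} xi(x): W(x, g) = (x, xi(x) g) conjugates S_phi to S x Id.
    xi = [G[2], G[4], G[1], G[0]]
    phi_cb = [mul(inv(xi[(x + 1) % m]), xi[x]) for x in range(m)]
    S_cb = make_extension(m, phi_cb)
    W = lambda p: (p[0], mul(xi[p[0]], p[1]))
    assert all(W(S_cb((x, g))) == ((x + 1) % m, W((x, g))[1]) for x in range(m) for g in G)
    print("cocycle identity and coboundary conjugacy verified")
```

The second assertion is the finite analogue of the statement that a coboundary extension is isomorphic to $S\times\mathrm{Id}_G$: after conjugating by $W$ the fibre coordinate no longer moves, so each set $X\times\{g\}$ is invariant.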

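Proposition 1 ([18]). Assume that $S$ and $T$ are ergodic automorphisms of $(X,\mathcal{B},\mu)$ and $(Y,\mathcal{C},\nu)$, respectively. Assume moreover that $S\perp T$ and $S_\varphi$, $T_\psi$ are ergodic group extensions of $S$ and $T$, respectively ($\psi:Y\to H$). If the product measure $(\mu\otimes m_G)\otimes(\nu\otimes m_H)$ is ergodic for $S_\varphi\times T_\psi$, then $S_\varphi\perp T_\psi$.

The following results are also classical.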
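Lemma 1.1. Assume that $S_\varphi$ and $T_\psi$ are ergodic and let $\rho\in J^e(S_\varphi,T_\psi)$. Then (up to a natural permutation of coordinates) $\rho|_{X\times Y}\in J^e(S,T)$. Moreover, if the relatively independent extension$^{12}$ $\widetilde{\rho|_{X\times Y}}$ of $\rho|_{X\times Y}$ is ergodic, then $\rho=\widetilde{\rho|_{X\times Y}}$.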
A group extension is a special case of so-called skew products. Assume that we have a measurable map $\Sigma:X\to\mathrm{Aut}(Z,\mathcal{D},\rho)$. Then we can consider $S_\Sigma:(x,z)\mapsto(Sx,\Sigma(x)z)$, which is an automorphism of $(X\times Z,\mathcal{B}\otimes\mathcal{D},\mu\otimes\rho)$. If, additionally, $Z$ is a compact metric space and the maps $\Sigma(x)$, $x\in X$, are isometries, then we call $S_\Sigma$ an isometric extension.
Since the group $\mathrm{Iso}(Z)$ of isometries of $Z$, considered with the uniform topology, is a compact metric group (by the Arzelà-Ascoli theorem), it is not hard to see that each isometric extension is a factor of a group extension (especially if we assume that the isometric extension is uniquely ergodic).$^{13}$ If the isometric extension is ergodic, one can choose the group extension to be ergodic as well.
Remark 3. If $S_\varphi$ is a group extension, then for each closed subgroup $F\subset G$, the automorphism $S_{\varphi,F}$ given by $S_{\varphi,F}(x,gF):=(Sx,\varphi(x)gF)$ is an isometric extension of $S$; it acts on the space $X\times G/F$ considered with $\mu\otimes m_{G/F}$, where the measure $m_{G/F}$ on the homogeneous space $G/F$ is the natural image of Haar measure $m_G$.

Remark 4. Each finite extension is isometric. It is a factor of a group extension by a finite group $G$.

1.4. Odometers. Odometers are given by the inverse limits of cyclic groups, $X=\varprojlim\mathbb{Z}/n_t\mathbb{Z}$, where $n_{t-1}|n_t$ for each $t\geq1$, with the rotation $R$ by $1$ on each coordinate. If $n_{t+1}/n_t=\lambda\geq2$ for each $t$, then we speak about the $\lambda$-odometer and denote it by $H_\lambda$ ($=H^{(\lambda)}$).
Each odometer $(X,R)$ is uniquely ergodic (with the unique invariant measure being Haar measure $m_X$ of $X$); in particular, $R\in\mathrm{Aut}(X,m_X)$. Moreover, $(X,m_X,R)$ has discrete spectrum, with the group of eigenvalues given by all roots of unity of degree $n_t$, $t\geq1$. Furthermore, $R^r$ is ergodic (uniquely ergodic) if and only if $(r,n_t)=1$ for each $t\geq1$. In this case $R^r$ and $R$ are isomorphic: both are ergodic with discrete spectrum and they have the same group of eigenvalues, so the claim follows from the Halmos-von Neumann theorem (e.g. [19]). It now follows from (5) that whenever $p,q\in\mathbb{P}$ are different prime numbers (by $\mathbb{P}$ we denote the set of prime numbers) not dividing any $n_t$, then each $\rho\in J^e(R^p,R^q)$ is a graph joining (of an isomorphism between $R^p$ and $R^q$).
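The criterion $(r,n_t)=1$ can be seen level by level: on each quotient $\mathbb{Z}/n_t\mathbb{Z}$ the map $R^r$ must be a single cycle. The Python sketch below does this for the $\lambda$-odometer, representing a point of the level-$t$ quotient $\mathbb{Z}/\lambda^t\mathbb{Z}$ by its first $t$ digits and implementing $R$ as addition of $1$ with carry. The truncation level and the function names are our own choices for illustration.

```python
def add_one(digits, lam):
    """Rotation by 1 on the level-t quotient Z/lam^t Z of the lam-odometer.

    digits = (d_0, ..., d_{t-1}) encodes sum d_i * lam**i; adding 1 carries to the right.
    """
    out = list(digits)
    for i in range(len(out)):
        if out[i] < lam - 1:
            out[i] += 1
            return tuple(out)
        out[i] = 0  # carry to the next digit
    return tuple(out)  # overflow: (lam-1, ..., lam-1) + 1 wraps around to (0, ..., 0)


def orbit_length_of_power(r, lam, t):
    """Size of the orbit of 0 under R^r on Z/lam^t Z; equals lam^t iff gcd(r, lam) = 1."""
    start = (0,) * t
    x, n = start, 0
    while True:
        for _ in range(r):
            x = add_one(x, lam)
        n += 1
        if x == start:
            return n


if __name__ == "__main__":
    for r in [1, 2, 3, 4, 5]:
        print(r, orbit_length_of_power(r, 2, 6))  # 64 exactly when r is odd
```

In general the orbit has length $\lambda^t/\gcd(r,\lambda^t)$, so $R^r$ is transitive on every level precisely when $r$ is coprime to $\lambda$, matching the condition $(r,n_t)=1$ for all $t$.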
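1.4.1. h-discrete suspensions of odometers. Assume that $(h,n_t)=1$ for each $t\geq1$. Let $\tilde R$ denote the h-discrete suspension of $R$. Then (cf. (9)), we obtain that
\[\tilde R \text{ is isomorphic to } R\times\tau_h.\tag{10}\]
Indeed, both automorphisms are ergodic and have discrete spectrum. The group of eigenvalues of $U_{\tilde R}$ is equal to $\{e^{2\pi ij/(hn_t)}:j\in\mathbb{Z},\,t\geq0\}$, while the group of eigenvalues of $U_{R\times\tau_h}$ is generated by $\{e^{2\pi ij/n_t}:j\in\mathbb{Z},\,t\geq0\}$ and the group of $h$-th roots of unity. Since $(h,n_t)=1$, the $n_t$-th and the $h$-th roots of unity together generate the group of $(hn_t)$-th roots of unity. It follows that $U_{R\times\tau_h}$ and $U_{\tilde R}$ have the same group of eigenvalues; hence, again by the Halmos-von Neumann theorem, $R\times\tau_h$ and $\tilde R$ are isomorphic.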

1.4.2. Group extensions of odometers: special assumptions. We will assume that $R$ is an odometer with "small spectrum", that is, the set $\{p\in\mathbb{P}:p|n_t\text{ for some }t\geq1\}$ is finite (this is always the case for $\lambda$-odometers). Moreover, we will assume that $R_\varphi$ (more precisely, $U_{R_\varphi}$) has continuous spectrum on the space $L^2(X\times G,m_X\otimes m_G)\ominus L^2(X,m_X)$.$^{14}$ It follows that if $p\in\mathbb{P}$ is sufficiently large, then $(R_\varphi)^p$ is ergodic (uniquely ergodic if $\varphi$ is continuous). Furthermore, we assume that if $p,q\in\mathbb{P}$ are different and sufficiently large, then the only ergodic joinings between $(R_\varphi)^p$ and $(R_\varphi)^q$ are relatively independent extensions of isomorphisms between $R^p$ and $R^q$.$^{15}$ By all these assumptions, for each sufficiently large $p\in\mathbb{P}$, $(R_\varphi)^p$ is ergodic and has the same eigenvalues as the odometer $R$.
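Lemma 1.2. Assume that $\varphi$ is continuous. Under the above assumptions, for each $F\in C(X\times G)$ satisfying $\int_GF(x,g)\,dm_G(g)=0$ for each $x\in X$, we have
\[\lim_{N\to\infty}\frac1N\sum_{n\leq N}F((R_\varphi)^{pn}(x,g))\overline{F((R_\varphi)^{qn}(x,g))}=0\]
for each $(x,g)\in X\times G$ and different primes $p,q$ sufficiently large.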
Proof. First notice that any accumulation point $\rho$ of $\frac1N\sum_{n\leq N}\delta_{(R^p\times R^q)^n(x,x)}$ is ergodic (cf. e.g. [27]), hence is graphic. It follows that any accumulation point $\tilde\rho$ of $\frac1N\sum_{n\leq N}\delta_{((R_\varphi)^p\times(R_\varphi)^q)^n((x,g),(x,g))}$ is the relatively independent extension of an isomorphism between $R^p$ and $R^q$, in view of the second part of Lemma 1.1. By the definition of the relatively independent extension, and since the conditional expectation of $F$ with respect to $L^2(X,m_X)$ vanishes, we get $\int F\otimes\overline{F}\,d\tilde\rho=0$, and the result follows.
Remark 5. We can apply the above proof to $R_\varphi$ instead of $(R_\varphi)^p$, and to $\tau$ being any rotation on finitely many points instead of $(R_\varphi)^q$. Then, similarly as in the proof of Corollary 11.37 in [12], using the fact that the spectral measure of $F$ is continuous, we obtain $\frac1N\sum_{n\leq N}F((R_\varphi)^n(x,g))v(n)\to0$ for each $(x,g)\in X\times G$ and each periodic sequence $v$. That is, the sequence $(F(R_\varphi^n(x,g)))_n$ itself is aperiodic. This argument shows that we cannot expect (i) in Theorem 0.1 to hold for all automatic sequences.
1.5. Application: the DKBSZ criterion. Recall the following result (to which we refer as the DKBSZ criterion) about the orthogonality of numerical sequences with bounded, multiplicative functions:

Theorem 1.3 (Daboussi, Kátai [23], Bourgain, Sarnak and Ziegler [6]). Let a bounded sequence $(a_n)\subset\mathbb{C}$ satisfy
\[\lim_{N\to\infty}\frac1N\sum_{n\leq N}a_{pn}\overline{a_{qn}}=0\]
for all different primes $p,q$ sufficiently large. Then
\[\lim_{N\to\infty}\frac1N\sum_{n\leq N}a_nu(n)=0\]
for each bounded, multiplicative function $u:\mathbb{N}\to\mathbb{C}$.

In the context of topological dynamical systems, that is, given $(X,S)$, we use this result with $a_n=f(S^nx)$, $f\in C(X)$ and $x\in X$. It is not hard to see that this criterion applies to any uniquely ergodic $(X,S)$ such that $S^p\perp S^q$ for all different, sufficiently large primes $p,q$ (disjointness is meant with respect to the unique invariant measure $\mu$), for each $f\in C(X)$ with $\int_Xf\,d\mu=0$ and arbitrary $x\in X$. However, even if we do not have disjointness of sufficiently large (prime) powers, we can apply this criterion to particular continuous functions if we control the limit joinings, as we have already seen in Lemma 1.2. In fact, this guides the strategy of the proof of Theorem 0.1, as explained in Section 4.3.

$^{15}$ We recall that if $W:X\to X$ settles an isomorphism of $R^p$ and $R^q$, then it yields an ergodic joining $(m_X)_W(A\times B):=m_X(A\cap W^{-1}B)$; its relatively independent extension $\widetilde{(m_X)_W}$ is defined by
\[\int F_1\otimes F_2\,d\widetilde{(m_X)_W}:=\int_X\Big(\int_GF_1(x,g)\,dm_G(g)\Big)\Big(\int_GF_2(Wx,g)\,dm_G(g)\Big)\,dm_X(x).\]
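The hypothesis of Theorem 1.3 concerns the correlations between the dilated sequences $(a_{pn})$ and $(a_{qn})$. The Python sketch below (the helper name `dilation_correlation` is hypothetical) only evaluates the finite-$N$ averages; it cannot verify a limit, but it shows how periodic sequences fail the hypothesis, consistently with the existence of periodic multiplicative functions.

```python
def dilation_correlation(a, p, q, N):
    """Finite-N version of (1/N) * sum_{n=1}^{N} a(p*n) * conj(a(q*n)) from Theorem 1.3."""
    return sum(complex(a(p * n)) * complex(a(q * n)).conjugate() for n in range(1, N + 1)) / N


if __name__ == "__main__":
    const = lambda n: 1
    alt = lambda n: (-1) ** n
    for p, q in [(11, 13), (13, 17), (17, 19)]:
        print(p, q, dilation_correlation(const, p, q, 1000), dilation_correlation(alt, p, q, 1000))
```

For $a_n=(-1)^n$ and odd primes $p,q$, every term equals $(-1)^{(p+q)n}=1$, so the correlation is $1$ for all large $p\neq q$. This is as it must be: this sequence is not orthogonal to the multiplicative function $n\mapsto(-1)^{n+1}$.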

2. Basics on substitutions of constant length. We refer the reader to [38], Chapters 5, 6 and 9, for most of the statements about substitutions of constant length listed below.
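2.1. Subshifts determined by substitutions. Let $\mathbb{A}$ be an alphabet (a nonempty finite set). Denote $\mathbb{A}^*:=\bigcup_{m\geq0}\mathbb{A}^m$, where $\mathbb{A}^m$ stands for the set of words $w=a_0a_1\ldots a_{m-1}$ over $\mathbb{A}$ of length $|w|$ equal to $m$ ($\mathbb{A}^0$ consists only of the empty word). Fix $\mathbb{N}\ni\lambda\geq2$. By a substitution of (constant) length $\lambda$ we mean a map $\theta:\mathbb{A}\to\mathbb{A}^\lambda$, which we also write as $\theta(a)=\theta(a)_0\theta(a)_1\ldots\theta(a)_{\lambda-1}$ for $a\in\mathbb{A}$. Via the concatenation of words, there is a natural extension of $\theta$ to a map from $\mathbb{A}^m$ to $\mathbb{A}^{m\lambda}$ (for each $m\geq1$), from $\mathbb{A}^*$ to itself, or even from $\mathbb{A}^{\mathbb{Z}}$ to itself. In particular, we can iterate $\theta$ $k$ times, $\theta^k:\mathbb{A}\to\mathbb{A}^{\lambda^k}$, which can be viewed as the substitution $\theta^k$ (the $k$th iterate of $\theta$) of length $\lambda^k$: $\theta^k(a)=\theta^k(a)_0\theta^k(a)_1\ldots\theta^k(a)_{\lambda^k-1}$. The following formula is well known and follows directly from the definition:
\[\theta^{k+\ell}(a)_{j\lambda^k+j'}=\theta^k\big(\theta^\ell(a)_j\big)_{j'}\tag{11}\]
for each $a\in\mathbb{A}$, $j<\lambda^\ell$, $j'<\lambda^k$. Indeed, $|\theta^\ell(a)|=\lambda^\ell$; consider $\theta^\ell(a)_j$, i.e. the $j$-th letter in the word $\theta^\ell(a)$. We now let $\theta^k$ act on $\theta^\ell(a)$, which transforms the letters of the word $\theta^\ell(a)$ into blocks of length $\lambda^k$; in particular, the $j$-th letter of $\theta^\ell(a)$ becomes the $j$-th block of length $\lambda^k$. So counting the $j'$-th letter in this block is the same as counting the $(j\lambda^k+j')$-th letter in $\theta^{k+\ell}(a)$.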
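Formula (11) is easy to test mechanically. The Python sketch below encodes a constant-length substitution as a dictionary from letters to words (a representation we choose for illustration; `iterate` and `check_formula_11` are hypothetical names), iterates it by concatenation, and checks (11) exhaustively for given $k,\ell$.

```python
def iterate(theta, word, k):
    """Apply the constant-length substitution theta (dict: letter -> word) k times to word."""
    for _ in range(k):
        word = "".join(theta[c] for c in word)
    return word


def check_formula_11(theta, k, ell):
    """Check theta^{k+ell}(a)_{j*lam^k + jp} == theta^k(theta^ell(a)_j)_{jp} for all a, j, jp."""
    lam = len(next(iter(theta.values())))
    for a in theta:
        big = iterate(theta, a, k + ell)
        small = iterate(theta, a, ell)
        for j in range(lam ** ell):
            block = iterate(theta, small[j], k)
            for jp in range(lam ** k):
                if big[j * lam ** k + jp] != block[jp]:
                    return False
    return True


if __name__ == "__main__":
    thue_morse = {"0": "01", "1": "10"}
    print(iterate(thue_morse, "0", 4))          # 0110100110010110
    print(check_formula_11(thue_morse, 2, 3))   # True
```

Since $\theta(0)_0=0$ for the Thue-Morse substitution, the words $\theta^k(0)$ are successive prefixes of its one-sided fixed point, which is how the fixed points discussed next are obtained.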
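The subshift $X_\theta\subset\mathbb{A}^{\mathbb{Z}}$ is determined by all words that appear in $\theta^k(a)$ for some $k\geq1$ and $a\in\mathbb{A}$:
\[X_\theta:=\{x\in\mathbb{A}^{\mathbb{Z}}:\text{each block appearing in }x\text{ appears in }\theta^k(a)\text{ for some }k\geq1,\ a\in\mathbb{A}\}.\tag{12}\]
Clearly, $X_\theta$ is closed and invariant under the left shift $S$ acting on $\mathbb{A}^{\mathbb{Z}}$. Moreover, we have
\[X_{\theta^k}\subset X_\theta\quad\text{for each }k\geq1.\tag{13}\]
Note also that for each $a\in\mathbb{A}$, there exist $k,\ell\geq1$ such that $\theta^{k+\ell}(a)_0=\theta^k(a)_0$ (the sequence $(\theta^k(a)_0)_{k\geq1}$ takes only finitely many values), from which we deduce, using (11), that there is a letter $a_0\in\mathbb{A}$ and an $\ell\geq1$ such that $\theta^\ell(a_0)_0=a_0$. Hence, by iterating the substitution $\theta^\ell$, we obtain a fixed point $u\in\mathbb{A}^{\mathbb{N}}$ of the map $\theta^\ell:\mathbb{A}^{\mathbb{N}}\to\mathbb{A}^{\mathbb{N}}$. Similarly to the RHS of (12), this defines a subshift $X_u\subset\mathbb{A}^{\mathbb{Z}}$ for which we have $X_u\subset X_\theta$, cf. Remark 2. In general, we do not have equalities in (13), and a fortiori $X_u$ may be a proper subshift of $X_\theta$. The situation changes if we assume that $\theta$ is primitive, that is, when there exists $k\geq1$ such that for each $a\in\mathbb{A}$ the word $\theta^k(a)$ contains all letters from $\mathbb{A}$.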
Then we have equalities in (13), and moreover, for each $\ell\geq1$, if $u\in\mathbb{A}^{\mathbb{N}}$ satisfies $\theta^\ell(u)=u$, then $X_u=X_\theta$. Moreover, we have Assume that $\theta$ is a primitive substitution of constant length $\lambda$.
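In what follows, we assume that $\theta:\mathbb{A}\to\mathbb{A}^\lambda$ is primitive. Under this assumption, it follows directly from the Perron-Frobenius theorem that, for any letter $a\in\mathbb{A}$, the density
\[\delta_a:=\lim_{n\to\infty}\frac{\#\{j<\lambda^n:\theta^n(a')_j=a\}}{\lambda^n}\]
exists and is independent of $a'\in\mathbb{A}$. More precisely, first denote by $M(\theta)\in\mathbb{Z}^{|\mathbb{A}|\times|\mathbb{A}|}$ the incidence matrix of $\theta$, i.e. $M(\theta)_{a,b}:=\#\{j<\lambda:\theta(b)_j=a\}$ for $a,b\in\mathbb{A}$.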
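The vector $(\delta_a)_{a\in\mathbb{A}}$ is then the unique right eigenvector for the (maximal) eigenvalue $\lambda$ of $M(\theta)$ normalized so that $\sum_{a\in\mathbb{A}}\delta_a=1$; that is, the relation $\lambda\delta_a=\sum_{b\in\mathbb{A}}M(\theta)_{a,b}\delta_b$ holds for all $a\in\mathbb{A}$ and, together with the normalization, uniquely defines $(\delta_a)_{a\in\mathbb{A}}$, see [38].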
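The following Python sketch illustrates these notions: it builds $M(\theta)$ with the convention $M(\theta)_{a,b}=\#\{j<\lambda:\theta(b)_j=a\}$, tests primitivity (some power of $M(\theta)$ is entrywise positive; it is a standard fact, Wielandt's bound, that powers up to $(|\mathbb{A}|-1)^2+1$ suffice), and approximates $(\delta_a)$ by power iteration of $v\mapsto M(\theta)v/\lambda$. The function names and the iteration count are our own assumptions; power iteration is just one convenient way to compute the Perron-Frobenius vector.

```python
def incidence_matrix(theta, alphabet):
    """M[a][b] = number of occurrences of letter a in theta(b) (rows/columns ordered by alphabet)."""
    idx = {a: i for i, a in enumerate(alphabet)}
    M = [[0] * len(alphabet) for _ in alphabet]
    for b in alphabet:
        for c in theta[b]:
            M[idx[c]][idx[b]] += 1
    return M


def is_primitive(M):
    """True iff some power of M is entrywise positive (powers up to (n-1)^2 + 1 are checked)."""
    n = len(M)
    P = [[M[i][j] > 0 for j in range(n)] for i in range(n)]
    A = [row[:] for row in P]  # boolean pattern of the current power of M
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in A):
            return True
        A = [[any(A[i][k] and P[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return False


def letter_densities(theta, alphabet, iterations=200):
    """Power iteration v <- M v / lam; for primitive theta it converges to delta with sum 1."""
    lam = len(theta[alphabet[0]])
    M = incidence_matrix(theta, alphabet)
    n = len(alphabet)
    v = [1.0 / n] * n  # columns of M sum to lam, so the total mass 1 is preserved
    for _ in range(iterations):
        v = [sum(M[i][j] * v[j] for j in range(n)) / lam for i in range(n)]
    return dict(zip(alphabet, v))


if __name__ == "__main__":
    period_doubling = {"0": "01", "1": "00"}
    M = incidence_matrix(period_doubling, "01")
    print(M, is_primitive(M), letter_densities(period_doubling, "01"))
```

For the period-doubling substitution $0\mapsto01$, $1\mapsto00$ the relation $\lambda\delta_a=\sum_bM(\theta)_{a,b}\delta_b$ reads $2\delta_0=\delta_0+2\delta_1$, $2\delta_1=\delta_0$, giving $\delta_0=2/3$, $\delta_1=1/3$.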
Then, by what is said around (14) and (15), we can also assume that $\theta(a_0)_0=a_0$ for some $a_0\in\mathbb{A}$, and by iterating $\theta$ at $a_0$ we obtain $u\in\mathbb{A}^{\mathbb{N}}$ such that $\theta(u)=u$ (and $X_u=X_\theta$).

2.2. The height and the column number. We first recall the definition of the height $h(\theta)$ of $\theta$ following [38]. For $k\geq0$, set
\[S_k=S_k(\theta):=\{n\geq1:u[k+n]=u[k]\}\quad\text{and}\quad g_k:=\gcd S_k.\]
This allows us to define
\[h(\theta):=\max\{n\geq1:(n,\lambda)=1,\ n\,|\,g_0\}.\]
We now list some basic properties of $h=h(\theta)$.$^{16}$

3. If, for $j=0,\ldots,h-1$, we consider the set $C_j:=\{u[j+nh]:n\geq0\}$, then, by 2., the letter $u[j]$ can be equal to some $u[s]$ only if $u[s]\in C_j$. Hence, the sets $C_j$ form a partition of $\mathbb{A}$. If we identify, in $u$, the letters in the same set $C_j$, $j=0,\ldots,h-1$, we thus obtain a periodic (of period $h$) sequence (cf. (8)), and $h$ is the largest integer $\leq|\mathbb{A}|$, coprime to $\lambda$, with this property.
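The definition of the height can be explored numerically on a prefix of the fixed point $u$. In the Python sketch below (hypothetical helper names), the gcd is taken only over the finitely many $n\in S_0$ visible in the prefix, so it is a multiple of the true $g_0$ and the estimate is trustworthy only once the prefix is long enough for this gcd to stabilise. Restricting to $n\leq|\mathbb{A}|$ uses the fact, recalled above, that $h\leq|\mathbb{A}|$.

```python
from math import gcd


def fixed_point_prefix(theta, a, length):
    """Prefix of the one-sided fixed point u = theta(u) starting with a (needs theta(a)_0 = a)."""
    assert theta[a][0] == a, "need theta(a)_0 = a"
    w = a
    while len(w) < length:
        w = "".join(theta[c] for c in w)
    return w[:length]


def height_from_prefix(u, lam, alphabet_size):
    """Estimate h = max{n >= 1 : gcd(n, lam) = 1, n | g_0}, with g_0 computed from the prefix u."""
    g0 = 0
    for n in range(1, len(u)):
        if u[n] == u[0]:  # n belongs to S_0
            g0 = gcd(g0, n)
    return max(n for n in range(1, alphabet_size + 1) if gcd(n, lam) == 1 and g0 % n == 0)


if __name__ == "__main__":
    toy = {"0": "010", "1": "201", "2": "102"}
    u = fixed_point_prefix(toy, "0", 3 ** 6)
    print(u[:27], height_from_prefix(u, 3, 3))
```

For the toy substitution $0\mapsto010$, $1\mapsto201$, $2\mapsto102$ (chosen only for illustration), one checks by induction that in the fixed point starting with $0$ the letter $0$ occupies exactly the even positions, so $g_0=2$ and $h=2$, with $C_0=\{0\}$ and $C_1=\{1,2\}$. The Thue-Morse substitution has $h=1$.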
If $\theta$ is primitive then so is $\theta^k$ (for each $k\geq1$). Take now a fixed point $u$ of $\theta$; it is also a fixed point of $\theta^k$. Moreover, $h=h(\theta)=h(\theta^k)$. It follows easily that where the block on the RHS is divided into blocks of length $h$ which are elements of $\mathbb{A}^{(h)}$. The $j$-th such block ($j<\lambda$) begins at the position $jh$, which is of the form Replacing in this reasoning $\theta$ by $\theta^k$ and using (22), we obtain the following: if $(a_0,\ldots,$ Denote by $c=c(\theta)$ the column number of $\theta$. Recall its definition: we consider the iterations $\theta^k:\mathbb{A}\to\mathbb{A}^{\lambda^k}$ and, each time, the sets $\{\theta^k(a)_j:a\in\mathbb{A}\}$ for $j<\lambda^k$; then
\[c(\theta):=\min_{k\geq1}\ \min_{j<\lambda^k}\ \#\{\theta^k(a)_j:a\in\mathbb{A}\}.\]
In what follows we will make use of the following observation.

Lemma 2.1. If $c(\theta)=1$ then $h(\theta)=1$.
Proof. By assumption, there exist $a\in\mathbb{A}$, $k\in\mathbb{N}$ and $0\leq j<\lambda^k$ such that $\theta^k(x)_j=a$ for all $x\in\mathbb{A}$. Recalling that $u$ is the fixed point of $\theta$, we find that $u[j]=u[j+\lambda^k]=a$. This shows that $\lambda^k\in S_j(\theta)$ and thus $g_j\,|\,\lambda^k$. As $h(\theta)\,|\,g_j$ and $h(\theta)$ is coprime with $\lambda$, we have $h(\theta)=1$.
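The column number can be computed exactly. By (11), the column maps $a\mapsto\theta^k(a)_j$ of the iterates are precisely the $k$-fold compositions of the basic column maps $f_j(a)=\theta(a)_j$, $j<\lambda$, so $c(\theta)$ is the smallest image size in the finite semigroup generated by $f_0,\ldots,f_{\lambda-1}$. The Python sketch below enumerates that semigroup; the representation of maps as tuples and the function name are our own choices.

```python
def column_number(theta, alphabet):
    """c(theta) = min over k >= 1 and j < lam^k of #{theta^k(a)_j : a in alphabet}.

    Column maps are stored as tuples: f[i] = index of theta(alphabet[i])_j.
    """
    idx = {a: i for i, a in enumerate(alphabet)}
    lam = len(theta[alphabet[0]])
    gens = [tuple(idx[theta[a][j]] for a in alphabet) for j in range(lam)]
    seen = set(gens)
    frontier = list(gens)
    while frontier:  # breadth-first closure of the generated semigroup
        new = []
        for f in frontier:
            for g in gens:
                comp = tuple(f[g[i]] for i in range(len(alphabet)))  # f o g
                if comp not in seen:
                    seen.add(comp)
                    new.append(comp)
        frontier = new
    return min(len(set(f)) for f in seen)


if __name__ == "__main__":
    examples = {
        "Thue-Morse": ({"0": "01", "1": "10"}, "01"),
        "period doubling": ({"0": "01", "1": "00"}, "01"),
        "Rudin-Shapiro": ({"a": "ab", "b": "ac", "c": "db", "d": "dc"}, "abcd"),
    }
    for name, (theta, alphabet) in examples.items():
        print(name, column_number(theta, alphabet))
```

Thue-Morse is bijective ($c=2=|\mathbb{A}|$), period doubling has a constant column ($c=1$, hence $h=1$ by Lemma 2.1), and Rudin-Shapiro has $c=2<|\mathbb{A}|=4$. For the toy substitution $0\mapsto010$, $1\mapsto201$, $2\mapsto102$ one gets $c=2=h$.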
This seems to be a folklore result but we could not find a proof of it in the literature. We postpone the proof of Lemma 2.2 to Section 5.4.
The following result is well known for pure substitutions but an application of Lemma 2.2 yields the more general one below.
Proposition 3. Assume that $\theta:\mathbb{A}\to\mathbb{A}^\lambda$ is primitive. Then, as a measure-

Proof. We have an isomorphism of $X_\theta$ (we omit automorphisms and measures) with $\tilde X_{\theta^{(h)}}$. If $\pi:X_{\theta^{(h)}}\to H_\lambda$ denotes the factor map which is $c(\theta^{(h)})$-to-$1$ (a.e.), then $\pi\times\mathrm{Id}_{\mathbb{Z}/h\mathbb{Z}}$ yields a factor map between the h-discrete suspensions $\tilde X_{\theta^{(h)}}$ and $\tilde H_\lambda$. It is again $c(\theta^{(h)})$-to-$1$ (a.e.). However, in view of (10), the suspension $\tilde H_\lambda$ is isomorphic to the direct product $H_\lambda\times\mathbb{Z}/h\mathbb{Z}$.$^{17}$ This makes $H_\lambda$ a factor of $X_\theta$ with (a.e.) fiber of cardinality $c(\theta^{(h)})\cdot h$. The result follows from Lemma 2.2.
As a matter of fact, whenever $\theta$ is primitive and $h(\theta)=1$, the system $(H_\lambda,R)$ represents the so-called maximal equicontinuous factor of $(X_\theta,S)$ (i.e., a maximal topological factor of $(X_\theta,S)$ represented by a translation on a compact Abelian group). The corresponding factor map is usually seen in the following way: each point $x\in X_\theta$ has a unique $\lambda^t$-skeleton structure. By that, one means a sequence $(j_t)_{t\geq1}$ with $0\leq j_t<\lambda^t$ for which, for each $t\geq1$, we have
\[x[s\lambda^t-j_t,(s+1)\lambda^t-j_t-1]=\theta^t(c_s)\]
for each $s\in\mathbb{Z}$ and some $c_s\in\mathbb{A}$. Now, the map $x\mapsto(j_t)_{t\geq1}$ yields the factor map which we seek.
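Notice also that if $R\subset\mathbb{A}\times\mathbb{A}$ is an equivalence relation which is $\theta$-consistent, that is,
\[(a,b)\in R\implies(\theta(a)_j,\theta(b)_j)\in R\ \text{ for each } j<\lambda,\]
then the quotient substitution $\theta_R:\mathbb{A}/R\to(\mathbb{A}/R)^\lambda$, $\theta_R([a]):=[\theta(a)_0][\theta(a)_1]\ldots[\theta(a)_{\lambda-1}]$, is correctly defined, and the dynamical system $(X_{\theta_R},S)$ is a (topological) factor of $(X_\theta,S)$. Hence, $(X_{\theta_R},\mu_{\theta_R},S)$ is also a measure-theoretic factor of $(X_\theta,\mu_\theta,S)$ (clearly, $\theta_R$ is primitive). A particular instance of a $\theta$-consistent equivalence relation $R$ is given by $(a,b)\in R$ if and only if $\theta(a)=\theta(b)$.

$^{17}$ The factor $H_\lambda$ is represented in $H_\lambda\times\mathbb{Z}/h\mathbb{Z}$ as the first coordinate $\sigma$-algebra, and it is a factor of $\tilde H_\lambda$ represented by an invariant $\sigma$-algebra. However, because of ergodicity and the fact that $\tilde H_\lambda$ has discrete spectrum, there is only one invariant $\sigma$-algebra in $\tilde H_\lambda$ representing $H_\lambda$. The same argument shows that $H_\lambda$ is a factor of $X_\theta$ in a canonical way.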
In this situation, the quotient dynamical system $(X_{\theta_R},S)$ is in fact (topologically) isomorphic to $(X_\theta,S)$. Therefore, no harm arises if we assume that
\[\theta(a)\neq\theta(b)\ \text{ whenever } a\neq b.\tag{25}\]
Finally, we assume that
\[\theta \text{ is aperiodic},\tag{26}\]
that is, there is a non-periodic element $x\in X_\theta$. In fact, since we assume that $\theta$ is primitive, $\theta$ is aperiodic if and only if $X_\theta$ is infinite.
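Reaching (25) may take several rounds, since identifying letters with equal images can make further images coincide. The Python sketch below (a hypothetical helper, assuming the dictionary encoding of substitutions used in our other examples) repeatedly forms the quotient $\theta_R$ for the relation $\theta(a)=\theta(b)$, representing each class by its smallest letter, until $a\mapsto\theta(a)$ is injective.

```python
def merge_identical_images(theta):
    """Identify letters with equal images ((a, b) in R iff theta(a) == theta(b)) until injective.

    Each class is represented by its smallest letter, and the quotient substitution is
    theta_R([a]) = [theta(a)_0] ... [theta(a)_{lam-1}].
    """
    while True:
        rep_of_image = {}
        cls = {}
        for a in sorted(theta):
            rep_of_image.setdefault(theta[a], a)
            cls[a] = rep_of_image[theta[a]]
        if all(cls[a] == a for a in theta):
            return theta  # no two letters share an image: (25) holds
        theta = {a: "".join(cls[c] for c in theta[a]) for a in theta if cls[a] == a}


if __name__ == "__main__":
    print(merge_identical_images({"0": "02", "1": "03", "2": "01", "3": "01"}))  # {'0': '02', '2': '00'}
```

In this example the first round identifies $2$ and $3$, after which $0$ and $1$ have equal images and are identified in a second round. The relation $\theta(a)=\theta(b)$ is always $\theta$-consistent, because equal images have equal letters in every column.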
3.1. Proof of Theorem 0.1 (i). The main ingredient in the proof is to show that every primitive automatic sequence (with substitution $\theta$) with $c(\theta)=h(\theta)$ is WRAP. In view of Lemma 2.2, it is enough to consider the case $c(\theta)=1$. Indeed, $(X_\theta,S)$ is topologically isomorphic to the $h(\theta)$-discrete suspension of $(X_\eta,S)$, and the automatic sequence with substitution $\eta$ being WRAP immediately implies that the $h$-discrete suspension is also WRAP. Hence, according to Proposition 3, we deal (from the ergodic theory point of view) with the discrete spectrum case.$^{18}$ In fact, we will prove an even stronger property.
So we have $\theta:\mathbb{A}\to\mathbb{A}^\lambda$, and we assume that $\theta(a)_0=a$ for some $a\in\mathbb{A}$. Moreover, by replacing $\theta$ by its iterate if necessary, we can assume that Let us pass to $\theta^2$. We are interested in In view of (27), we have and since, by (28), the sequence $(\ell_k/\lambda^k)$ is convergent, the above recurrence formula implies
\[\lim_{k\to\infty}\frac{\ell_k}{\lambda^k}=1.\tag{29}\]
Note that an interpretation of $\ell_k$ is that it is the number of coordinates $j\in\{0,1,\ldots,\lambda^k-1\}$ such that $u[j+s\lambda^k]=u[j]$ for each $s\geq1$. Now, (29) and footnote 5 immediately imply:

Proposition 4. If $c(\theta)=1$ then the fixed point $u=\theta(u)$ is WRAP; in particular, it is a limit of periodic sequences of period $\lambda^k$, $k\geq1$, with respect to $d_W$.
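The interpretation of $\ell_k$ given above can be explored on a finite prefix of $u$. The Python sketch below (hypothetical helper names) counts the $j<\lambda^k$ for which $u[j+s\lambda^k]=u[j]$ holds for every $s\geq1$ visible in the prefix; on a prefix this condition can be refuted but never fully verified, so the count is only an upper estimate of $\ell_k$. If $u$ agrees with the $\lambda^k$-periodic sequence $n\mapsto u[n\bmod\lambda^k]$ outside the residues counted as unstable, their $d_W$-distance is at most $1-\ell_k/\lambda^k$, which is how (29) yields Proposition 4.

```python
def fixed_point_prefix(theta, a, length):
    """Prefix of the one-sided fixed point u = theta(u) starting with a (needs theta(a)_0 = a)."""
    assert theta[a][0] == a, "need theta(a)_0 = a"
    w = a
    while len(w) < length:
        w = "".join(theta[c] for c in w)
    return w[:length]


def stable_count(u, lam, k):
    """Number of j < lam^k with u[j + s*lam^k] == u[j] for every s >= 1 that fits in the prefix u."""
    period = lam ** k
    count = 0
    for j in range(period):
        last_s = (len(u) - 1 - j) // period
        if all(u[j + s * period] == u[j] for s in range(1, last_s + 1)):
            count += 1
    return count


if __name__ == "__main__":
    pd = fixed_point_prefix({"0": "01", "1": "00"}, "0", 2 ** 12)  # c(theta) = 1
    tm = fixed_point_prefix({"0": "01", "1": "10"}, "0", 2 ** 12)  # c(theta) = 2
    for k in range(1, 7):
        print(k, stable_count(pd, 2, k) / 2 ** k, stable_count(tm, 2, k) / 2 ** k)
```

For period doubling, $u[n]$ is the parity of the $2$-adic valuation of $n+1$, so every $j<2^k$ except $j=2^k-1$ is stable and the ratio $(2^k-1)/2^k$ tends to $1$, as (29) predicts. For Thue-Morse ($c=2$), $u[j+2^k]\neq u[j]$ for all $j<2^k$, so no position is stable.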
From the point of view of Möbius orthogonality, dynamical systems $(X_x,S)$ given by WRAP sequences $x$ have already been studied in [4], and it is proved there that all continuous observables $(f(S^ny))$ (for $f\in C(X_x)$) are orthogonal to the Möbius function $\mu$. However, a quick look at the proof in [4] shows that the only property of $\mu$ used in it is the aperiodicity of $\mu$ (what is essential in the proof is that all points $y\in X_x$ are also WRAP).
Consequently, if $x$ is WRAP then, for each bounded, aperiodic $u:\mathbb{N}\to\mathbb{C}$,
\[\lim_{N\to\infty}\frac1N\sum_{n\leq N}f(S^ny)u(n)=0\]
for all $f\in C(X_x)$ and $y\in X_x$. In particular, the above assertion holds for substitutional dynamical systems with $c(\theta)=1$.$^{19}$

Remark 6. Theorem 0.1 (i) also follows from [10], but because the relations between synchronized automata and substitutions with $c(\theta)=1$ do not seem to be explained explicitly in the literature, we gave a more general and direct argument.

If, additionally, $c(\theta)=|\mathbb{A}|$, then we speak about a bijective substitution (sometimes such a substitution is also called invertible). The proof of Theorem 0.1 in the bijective case is provided in [13]. Moreover, the Rudin-Shapiro substitution is also treated in [13], and the method of proof can be extended to other quasi-bijective substitutions. The proof of Theorem 0.1 also covers the general case of quasi-bijective substitutions.

Essential centralizer.
Assume that $T$ is an ergodic$^{21}$ automorphism of $(Y,\mathcal C,\nu)$. By the centralizer $C(T)$ of $T$ we mean the group of all automorphisms $V\in\mathrm{Aut}(Y,\mathcal C,\nu)$ commuting with $T$. Clearly, $\{T^n:n\in\mathbb Z\}$ is a normal subgroup of $C(T)$, and the group $EC(T):=C(T)/\{T^n:n\in\mathbb Z\}$ is called the essential centralizer of $T$.

Lemma 4.1. Assume that $EC(T)$ is finite and $C(T)=C(T^p)$ for all sufficiently large $p\in\mathbb P$. Then, for all sufficiently large $p,q\in\mathbb P$ with $p\neq q$, the automorphisms $T^p$ and $T^q$ are not isomorphic.
Proof. Since $EC(T)$ is finite, we have In what follows, we consider only prime numbers $p,q$ which divide none of the $m_i$, $i=1,\dots,K$. Suppose that for $p\neq q$ (sufficiently large) we have an isomorphism of $T^p$ and $T^q$. As $T^q$ obviously has a $q$-th root, it follows that $T^p$ also has a root of degree $q$, i.e. there exists $W\in\mathrm{Aut}(Y,\mathcal C,\nu)$ such that $W^q=T^p$.$^{22}$ Now, $W\in C(T^p)$, hence (by assumption) $W\in C(T)$. It follows that, for some $n\in\mathbb Z$ and $0\le i\le K$,

Remark 7. Notice that in the above proof we in fact proved that $T^p$ cannot be isomorphic to $U^q$ with $U\in\mathrm{Aut}(Y,\mathcal C,\nu)$. In other words, $T^p$ cannot have a $q$-th root.

4.1.1.
Centralizer of $h$-discrete suspensions. We assume that $T\in\mathrm{Aut}(Y,\mathcal C,\nu)$ and let $\widetilde T$ denote its $h$-discrete suspension, see (6) and (7). Note that whenever $V\in C(T)$, the formula $\widetilde V(y,j):=(Vy,j)$ for $(y,j)\in\widetilde Y$ defines an element of the centralizer of $\widetilde T$. In fact, we have the following: It follows immediately from Proposition 5 that:

Essential centralizer of substitutions of constant length. The result below has been proved by Host and Parreau in [22] for pure substitutions of constant length. However, each substitutional system is an $h$-discrete suspension of its pure basis, which is also given by a substitution of constant length (see Section 2). Taking this fact and Corollary 5 into account, we obtain the following result.$^{23}$

Remark 8. If $c(\theta)=h(\theta)$ (in particular, if $c(\theta)=1$), then we are in the synchronizing case. Thus, the spectrum of the corresponding dynamical system is discrete. Therefore, the essential centralizer is uncountable. In fact, $S^p$ is isomorphic to $S^q$ for all sufficiently large $p,q\in\mathbb P$.
Remark 9. Assume that $V:\mathbb A\to\mathbb A$ is a bijection "commuting" with $\theta$, i.e. $\theta(V(a))_j=V(\theta(a)_j)$ for all $a\in\mathbb A$ and each $j=0,1,\dots,\lambda-1$. Then we claim that $V$ has a natural extension to a homeomorphism $V:X_\theta\to X_\theta$ (given by $V(x)_m=V(x_m)$ for all $m\in\mathbb Z$), which obviously commutes with the shift. Clearly, we only need to show that for some $a\in\mathbb A$, $k\ge1$ and $m\ge0$. It follows from (30) that and the claim follows. Note that if $\theta$ is primitive, then $V:X_\theta\to X_\theta$ preserves $\mu_\theta$ (by unique ergodicity), so $V\in C(S)$.
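The letterwise condition in Remark 9 is easy to test mechanically. The sketch below assumes that the commuting condition reads $\theta(V(a))_j=V(\theta(a)_j)$ for all letters $a$ and all $j$. As an example of our own choosing, it uses the Thue-Morse substitution with the flip $0\leftrightarrow1$; the function names are hypothetical. The letterwise extension of $V$ maps the fixed point starting with $0$ to the one starting with $1$.

```python
def commutes(sub, V):
    """Check theta(V(a))_j == V(theta(a)_j) for every letter a and position j."""
    return all(
        sub[V[a]][j] == V[sub[a][j]]
        for a in sub
        for j in range(len(sub[a]))
    )


def apply_letterwise(V, word):
    """The natural extension of V, acting coordinatewise on a finite piece of a sequence."""
    return [V[a] for a in word]


def fixed_point_prefix(sub, start, length):
    """Prefix of the fixed point obtained by iterating sub at `start` (assumes sub[start][0] == start)."""
    word = [start]
    while len(word) < length:
        word = [b for a in word for b in sub[a]]
    return word[:length]


thue_morse = {"0": "01", "1": "10"}
period_doubling = {"0": "01", "1": "00"}
flip = {"0": "1", "1": "0"}

print(commutes(thue_morse, flip))       # True: the flip commutes with Thue-Morse
print(commutes(period_doubling, flip))  # False: but not with period-doubling
u0 = fixed_point_prefix(thue_morse, "0", 64)
u1 = fixed_point_prefix(thue_morse, "1", 64)
print(apply_letterwise(flip, u0) == u1)  # True
```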

4.2.
Joinings of powers of finite extensions of odometers.
Assume that $T$ acting on $(Y,\mathcal C,\nu)$ is ergodic (aperiodic) and has discrete spectrum. Let $\varphi:Y\to G$ be a cocycle with $G$ finite. Assume that $T_\varphi$ is ergodic. Moreover, assume that for $p\in\mathbb P$ large enough the corresponding group extension $(T_\varphi)^p$ is also ergodic.$^{24}$ Then for $p\in\mathbb P$ large enough, we have

Using Lemmas 4.1 and 4.3, we obtain the following.
Proposition 6. Assume that $T\in\mathrm{Aut}(Y,\mathcal C,\nu)$ is ergodic (aperiodic) and has discrete spectrum. Assume moreover that $T_\varphi$ is an ergodic $G$-extension ($G$ a finite group) for which $EC(T_\varphi)$ is finite and $(T_\varphi)^p$ is ergodic for all sufficiently large $p\in\mathbb P$. Then $(T_\varphi)^p$ and $(T_\varphi)^q$ are not isomorphic for all distinct, sufficiently large $p,q\in\mathbb P$.
In the case of odometers, we can prove more. Assume that $R$ is the odometer given by $\lambda$. That is, $R\in\mathrm{Aut}(H_\lambda,m_{H_\lambda})$ is ergodic and has discrete spectrum, and its eigenvalues are the roots of unity of degree $\lambda^k$, $k\ge1$. Consider $p,q\in\mathbb P$ coprime to $\lambda$. Then $R$ is isomorphic to $R^p$ (and to $R^q$), so $R^p$ is isomorphic to $R^q$. Recall also that the only ergodic joinings between $R^p$ and $R^q$ are the graph joinings determined by isomorphisms between $R^p$ and $R^q$.
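The isomorphism $R^p\cong R^q$ can be made concrete. Identify $H_\lambda$ with the $\lambda$-adic integers and $R$ with $x\mapsto x+1$, so that $R^p$ is translation by $p$. One explicit isomorphism is multiplication by $qp^{-1}$. This choice is ours and is not stated in the text; it makes sense in $H_\lambda$ when $p$ and $q$ are coprime to $\lambda$. The sketch below checks the conjugacy relation on the finite quotients $\mathbb Z/\lambda^k\mathbb Z$. The function names are hypothetical, and `pow(p, -1, m)` requires Python 3.8 or later.

```python
from math import gcd


def conjugacy(lam, k, p, q):
    """W(x) = q * p^{-1} * x on Z/lam^k Z, a candidate isomorphism from R^p to R^q."""
    mod = lam ** k
    c = q * pow(p, -1, mod) % mod  # the modular inverse exists because gcd(p, lam) == 1
    return lambda x: c * x % mod


def check_conjugacy(lam, k, p, q):
    """Verify that W is a bijection of Z/lam^k Z with W(x + p) = W(x) + q."""
    assert gcd(p, lam) == 1 and gcd(q, lam) == 1
    mod = lam ** k
    W = conjugacy(lam, k, p, q)
    bijective = len({W(x) for x in range(mod)}) == mod
    intertwines = all(W((x + p) % mod) == (W(x) + q) % mod for x in range(mod))
    return bijective and intertwines


print(check_conjugacy(2, 10, 3, 5))   # lambda = 2, p = 3, q = 5
print(check_conjugacy(10, 4, 7, 13))  # lambda = 10, p = 7, q = 13
```

The inverse of $p$ modulo $\lambda^{k+1}$ reduces to the inverse of $p$ modulo $\lambda^k$. Hence these finite-level maps are compatible and define a single map on $H_\lambda$.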
Proposition 7. Assume that $R$ is the $\lambda$-odometer. Let $\varphi:H_\lambda\to G$ be a cocycle with $G$ finite, such that $R_\varphi$ is ergodic and $(R_\varphi)^p$ is also ergodic for $p\in\mathbb P$ large enough. Assume moreover that $EC(R_{\varphi_F})$ is finite for each proper subgroup $F\subset G$.$^{25}$ Then, for each pair of sufficiently large, distinct primes $p,q$, the only joinings$^{26}$ between $(R_\varphi)^p$ and $(R_\varphi)^q$ that project onto ergodic joinings of $R^p$ and $R^q$ are the relatively independent extensions of the projections.
Proof. In view of Section 1.3, (R ϕ ) p = R p ϕ (p) . The general theory of groups extensions (see e.g. [30], [36]) tells us that if ) then there exist F 1 , F 2 ⊂ G normal subgroups of G such that an isomorphism W between R p and R q (we have assumed that ρ| H λ ×H λ is the graph (m H λ ) W of W ) lifts to an isomorphism W of the factors given by H λ × G/F 1 and H λ × G/F 2 . But R p ϕ (p) F1 = (R ϕF1 ) p . Moreover, F 1 , F 2 depend on p, q but altogether we have only finitely many possibilities for F 1 , F 2 . Assume that F 1 is a proper subgroup of G. We use our assumption and Lemma 4.1 together with Remark 7 to get a contradiction. It follows that ρ = (m H λ ) W is the relatively independent extension of the graph joining (m H λ ) W , see Footnote 12. We have proved that there is only one ergodic joining between (R ϕ ) p and (R ϕ ) q projecting on (m H λ ) W , whence there is only one joining between (R ϕ ) p and (R ϕ ) q projecting on (m H λ ) W and the result follows.
Remark 10. By Mentzen's theorem [35] on factors (with partly continuous spectrum) of substitutions of constant length, when $R_\varphi$ is given by a substitution (primitive and non-synchronizing, i.e. $h(\theta)<c(\theta)$), the natural factors $R_{\varphi_F}$, for $F$ a proper subgroup of $G$, are also (up to measure-theoretic isomorphism) given by substitutions (cf. Footnote 25). Hence the assumptions of Proposition 7 are satisfied.

4.3.
Strategy of the proof of the main result. To prove Theorem 0.1, we first show that each substitutional system $(X_\theta,S)$ is a topological factor of another substitutional system $(X_\Theta,S)$. This system is uniquely ergodic and, from the measure-theoretic point of view, is isomorphic to some $R_\varphi$ satisfying the assumptions of Proposition 7, see Remark 10. Moreover, we will show that $R$ is the odometer of the original substitutional system.
Let $(H_\lambda,R)$ denote the $\lambda$-odometer associated with $(X_\Theta,S)$. If the height of $\Theta$ is one, the $\lambda$-odometer is the maximal equicontinuous factor$^{27}$ of $(X_\Theta,S)$. Hence, there is a continuous equivariant map $\pi:X_\Theta\to H_\lambda$. Moreover, we note the following observation, which was already used in the proof of Lemma 1.2:

Lemma 4.4. Take different $p,q\in\mathbb P$ sufficiently large. Then, for each $y\in H_\lambda$, the point $(y,y)$ is generic for an ergodic $R^p\times R^q$-invariant measure $\kappa$, i.e. for each $F\in C(H_\lambda\times H_\lambda)$ we have $\lim_{N\to\infty}\frac1N\sum_{n\le N}F((R^p\times R^q)^n(y,y))=\int F\,d\kappa$. It follows that $(y,y)$ is generic for $\kappa=(m_{H_\lambda})_W$, the graph joining given by an isomorphism $W$ between $R^p$ and $R^q$.
Take any $x\in X_\Theta$. We now study the sequence (with $p\neq q$ sufficiently large) Any of its limit points yields a joining $\rho\in J((R_\varphi)^p,(R_\varphi)^q)$. Now, $\rho|_{H_\lambda\times H_\lambda}$ is precisely obtained as the limit of $\frac1{N_k}\sum_{n\le N_k}\delta_{(R^p\times R^q)^n(\pi(x),\pi(x))}$ for a relevant subsequence $(N_k)$. But we have already noticed that any such limit must be an ergodic joining of $R^p$ and $R^q$. Hence, it is a graph joining (cf. (5)). Now, we use our results to obtain that the limit of $\frac1{N_k}\sum_{n\le N_k}\delta_{(S^p\times S^q)^n(x,x)}$ also exists and is the relatively independent extension of the underlying graph joining between $R^p$ and $R^q$. We have proved the following.

$^{27}$ If $h(\Theta)>1$, then we must replace $H_\lambda$ with $H_\lambda\times\mathbb Z/h\mathbb Z$.
Proposition 8. If different $p,q\in\mathbb P$ are large enough, then the set is contained in the set of relatively independent extensions of the graphs of isomorphisms between $R^p$ and $R^q$.$^{28}$ Assume now that $F\in C(X_\Theta)$ and $F\perp L^2(\pi^{-1}(\mathcal B(H_\lambda)))$.
Again, fix $x\in X_\Theta$. We have by the definition of the relative product. What is still missing is to make sure that we have sufficiently many functions $F$ satisfying (31). In fact, we aim at showing that $(X_\Theta,S)$ has a topological factor $(X_{\tilde\theta},S)$ which measure-theoretically is equal to $(H_\lambda,R)$, and that for each $F\in C(X_\Theta)$ we have where $F\in C(X_{\tilde\theta})$ (with some abuse of notation) and $F\perp L^2(\pi^{-1}(\mathcal B(H_\lambda)))$ (of course, $F$ is also continuous). This will allow us to conclude the proof of Theorem 0.1 by using Theorem 1.3 for $F$ (as here we will deal with relatively independent joinings) and dealing separately with $F$. When $\theta$ is bijective, the existence of a "good" $\Theta$ is known, see Section 3.2 below: in $(X_\Theta,S)$ we have many "good" continuous functions, as the Kronecker factor of $(X_\Theta,S)$ has a "topological" realization (see [13]). However, in the general case such an approach seems to be unknown. In Section 5, we will show a new general construction of an extension of a substitutional system in which we will see sufficiently many continuous functions satisfying (31).

5.
Substitutions of constant length: one more point of view.

5.1.
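Substitution joinings of substitutional systems. Assume that we have two substitutions $\theta:\mathbb A\to\mathbb A^\lambda$ and $\zeta:\mathbb B\to\mathbb B^\lambda$. Assume that $\mathcal A\subset\mathbb A\times\mathbb B$ satisfies $p_{\mathbb A}(\mathcal A)=\mathbb A$ and $p_{\mathbb B}(\mathcal A)=\mathbb B$, where $p_{\mathbb A}$ and $p_{\mathbb B}$ stand for the projections on $\mathbb A$ and $\mathbb B$, respectively. Moreover, we assume that $(\theta(a)_j,\zeta(b)_j)\in\mathcal A$ for all $(a,b)\in\mathcal A$ and each $j=0,\dots,\lambda-1$. Then it is easy to see that the formula $\Sigma(a,b)_j:=(\theta(a)_j,\zeta(b)_j)$ defines a substitution $\Sigma:\mathcal A\to\mathcal A^\lambda$ of length $\lambda$. We can also use the notation $\Sigma=\theta\vee\zeta$.

Remark 11. We note that in general the above $\Sigma$ need not be primitive, that is, it does not necessarily satisfy (15). Indeed, consider for example $\zeta=\theta$ and take for $\mathcal A$ the product set $\mathbb A\times\mathbb A$. Then the "diagonal" $\{(a,a):a\in\mathbb A\}$ is invariant under $\Sigma$. On the other hand, the diagonal $\mathcal A:=\{(a,a):a\in\mathbb A\}$ itself yields a primitive substitution (clearly isomorphic to $\theta$). (which is still primitive with $c=h=2$) then $\Sigma$ is primitive but (18) is not satisfied.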
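The construction of $\Sigma=\theta\vee\zeta$ and the phenomenon in Remark 11 can be reproduced on a small example. The sketch below uses hypothetical helper names. It builds $\Sigma(a,b)_j=(\theta(a)_j,\zeta(b)_j)$ on a given set of pairs and tests primitivity: it checks whether, for some power $\Sigma^k$, every letter occurs in every $\Sigma^k(\alpha)$, using Wielandt's bound $k\le(n-1)^2+1$ for $n$ letters. With $\zeta=\theta$ the Thue-Morse substitution (our choice), the full product is not primitive, while the diagonal is.

```python
from itertools import product


def join(theta, zeta, pairs):
    """Sigma(a, b)_j = (theta(a)_j, zeta(b)_j) on a set of pairs closed under this rule."""
    sigma = {}
    for a, b in pairs:
        word = tuple(zip(theta[a], zeta[b]))
        if not set(word) <= set(pairs):
            raise ValueError(f"{(a, b)} is mapped outside the given set of pairs")
        sigma[(a, b)] = word
    return sigma


def is_primitive(sub):
    """True if some power sub^k contains every letter in every image sub^k(a)."""
    letters = list(sub)
    n = len(letters)
    one_step = {a: set(sub[a]) for a in letters}  # letters occurring in sub(a)
    reach = dict(one_step)                         # letters occurring in sub^k(a), starting at k = 1
    for _ in range((n - 1) ** 2 + 1):              # Wielandt's bound on the exponent
        if all(len(reach[a]) == n for a in letters):
            return True
        reach = {a: set().union(*(one_step[b] for b in reach[a])) for a in letters}
    return all(len(reach[a]) == n for a in letters)


thue_morse = {"0": "01", "1": "10"}
full = join(thue_morse, thue_morse, set(product("01", repeat=2)))
diagonal = join(thue_morse, thue_morse, {("0", "0"), ("1", "1")})
print(is_primitive(thue_morse), is_primitive(full), is_primitive(diagonal))  # True False True
```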
Remark 12. Note that $X_\Sigma$ is a topological joining of $X_\theta$ and $X_\zeta$. Indeed, up to a natural rearrangement of coordinates, $X_{\theta\vee\zeta}\subset X_\theta\times X_\zeta$. Then, for every $(x,y)\in X_\Sigma$, the orbits of $x$ and $y$ are dense in $X_\theta$ and $X_\zeta$, respectively.$^{30}$ Now, the image of the natural projection $(x,y)\mapsto x$ is contained in $X_\theta$. If $\theta(u)=u$ and $\zeta(v)=v$, then $(\theta\vee\zeta)(u,v)=(u,v)$ (after a rearrangement of coordinates). So $u$ is in the image of $p_{X_\theta}(x,y)=x$, and therefore $p_{X_\theta}(X_\Sigma)=X_\theta$, as this image is closed, $S$-invariant and contains $u$. Since $p_{X_\theta}$ is continuous and equivariant, it defines a topological factor map between the relevant substitutional systems.

5.2.
Joining with the synchronizing part. In general, when dealing with the dynamical system given by a substitution of constant length, we would like to see its Kronecker factor as a topological factor realized "in the same category of objects", that is, realized by another substitution. This is not always possible, even in the class of bijective substitutions, see Herning's example [21]. For the purpose of orthogonality with an arithmetic function $\mathbf u$, however, we only need an extension of the original substitution which is given by another substitution (of the same length), and we require that the extended system contains a "good" realization of the Kronecker factor. This is done by a joining of $\theta$ with its synchronizing part.

$^{29}$ It is not hard to see that in this example $\theta$ is primitive and $c(\theta)=2$. If by $u$ we denote the fixed point of $\theta$ obtained by the iterations of 0, then $S_0=2\mathbb N$, so $g_0=2$ and hence $h(\theta)=2=c(\theta)$. This means that the dynamical system $(X_\theta,\mu_\theta,S)$ has discrete spectrum. Hence, since $\Sigma$ is also primitive, $(X_\theta,\mu_\theta,S)$ and $(X_\Sigma,\mu_\Sigma,S)$ are isomorphic as measure-theoretic dynamical systems, cf. (5).

$^{30}$ This follows by the minimality of $(X_\theta,S)$ and $(X_\zeta,S)$, respectively.
Proposition 9. The substitution $\tilde\theta$ has the following basic properties:

Proof. For (i), let $M\in\mathcal X$ and suppose that $\theta^{k_M}(\mathbb A)_{j_M}=M$. Then, for each $M'\in\mathcal X$, we have $\tilde\theta^{k_M}(M')_{j_M}=M$. The validity of (ii) follows by the same argument. Finally, (iii) follows from Lemma 2.1.
Definition 5.2. We call $\tilde\theta$ the synchronizing part of $\theta$.
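The set $\mathcal X$ and the synchronizing part can be computed directly from the recursion $\theta^{k+1}(\mathbb A)_{j\lambda+i}=\{\theta(b)_i:b\in\theta^k(\mathbb A)_j\}$, which enumerates all columns $\theta^k(\mathbb A)_j$, $k\ge1$. The sketch below returns $c(\theta)$, $\mathcal X$ and $\tilde\theta$ as a dictionary on $\mathcal X$. For the Rudin-Shapiro substitution we assume the common form $a\mapsto ab$, $b\mapsto ac$, $c\mapsto db$, $d\mapsto dc$. The rule itself is not displayed in Example 3, but this form gives the set $\mathcal X=\{\{a,d\},\{b,c\}\}$ stated there. The function names are hypothetical.

```python
def columns(sub):
    """All sets theta^k(A)_j with k >= 1 and 0 <= j < lam^k, as frozensets."""
    lam = len(next(iter(sub.values())))
    seen, stack = set(), [frozenset(sub)]
    while stack:
        S = stack.pop()
        for i in range(lam):
            T = frozenset(sub[a][i] for a in S)  # theta(S)_i
            if T not in seen:
                seen.add(T)
                stack.append(T)
    return seen


def synchronizing_part(sub):
    """Return c(theta), the set X of minimal columns and tilde-theta on X."""
    lam = len(next(iter(sub.values())))
    cols = columns(sub)
    c = min(len(S) for S in cols)
    X = {S for S in cols if len(S) == c}
    tilde = {M: tuple(frozenset(sub[a][i] for a in M) for i in range(lam)) for M in X}
    return c, X, tilde


rudin_shapiro = {"a": "ab", "b": "ac", "c": "db", "d": "dc"}  # assumed form
thue_morse = {"0": "01", "1": "10"}
period_doubling = {"0": "01", "1": "00"}

for name, sub in [("Rudin-Shapiro", rudin_shapiro), ("Thue-Morse", thue_morse),
                  ("period-doubling", period_doubling)]:
    c, X, _ = synchronizing_part(sub)
    print(name, c, sorted(sorted(M) for M in X))
```

For the bijective Thue-Morse substitution this gives $|\mathcal X|=1$, in line with Proposition 10.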
Also, note that
$$\mathbb A=\bigcup_{M\in\mathcal X}M. \tag{38}$$
Indeed, fix $M\in\mathcal X$ and let $a\in\mathbb A$. Take any $x\in M$ and (by primitivity) choose $k\ge1$ so that $\theta^k(x)_j=a$ for some $0\le j<\lambda^k$. Then $a\in\theta^k(M)_j$ and $\theta^k(M)_j\in\mathcal X$.

Remark 13. If the union in (38) is additionally a partition, then we obtain an equivalence relation on $\mathbb A$ which is $\theta$-consistent (cf. (24)), and the dynamical system $(X_{\tilde\theta},S)$ is a topological factor of $(X_\theta,S)$. However, in general, there is no reason for $(X_{\tilde\theta},S)$ to be a topological factor of $(X_\theta,S)$.

Note that, in general, the union in (38) is not a partition, as the following example shows.
By looking at $\theta^3$, we see that $\theta$ is primitive. Moreover, straightforward computations give $c(\theta)=2$ and $h(\theta)=1$. In fact, $\mathcal X=\{\{a,b\},\{a,c\}\}$. Therefore, the union in (38) is not a partition. Note that the first column of (39) and all its iterates yield the set $\{a,b,c\}$, so $\theta$ is not quasi-bijective.

MARIUSZ LEMAŃCZYK AND CLEMENS MÜLLNER
We also have the following result:

Proposition 10. Let $\theta:\mathbb A\to\mathbb A^\lambda$ be a substitution. Then:
$\theta$ is bijective if and only if $\tilde\theta$ is trivial, i.e. $|\mathcal X(\theta)|=1$; (40)
$\theta$ is quasi-bijective if and only if $X_{\tilde\theta}$ is finite, i.e. a fixed point $\tilde u$ of $\tilde\theta$ is periodic. (41)
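Proof. First of all, if $\theta$ is bijective, then $\mathcal X(\theta)=\{\mathbb A\}$. Conversely, if $|\mathcal X(\theta)|=1$, then $\mathcal X(\theta)=\{\mathbb A\}$ by (38). Thus (40) follows. Let us pass to the proof of (41).
⇒: We assume that $\theta$ is quasi-bijective, i.e. for all $n\ge n_0$ and all $j<\lambda^n$, we have $|\theta^n(\mathbb A)_j|=c(\theta)$. We define $M_j:=\theta^{n_0}(\mathbb A)_j$, and it follows immediately that $\tilde\theta^{n_0}(M)_j=M_j$ for all $M\in\mathcal X$. Let $\tilde u$ be a fixed point of $\tilde\theta$. Now, for $i\ge0$ and $0\le j<\lambda^{n_0}$, we find that $\tilde u[i\lambda^{n_0}+j]=\tilde\theta^{n_0}(\tilde u[i])_j=M_j$ (use (11), letting $\ell\to\infty$). This shows that $\tilde u$ is periodic (in fact, $\lambda^{n_0}$ is a period of $\tilde u$).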
⇐: Let $\tilde u$ be a periodic fixed point of $\tilde\theta$ with period $p$. First, we claim that $\tilde u$ is also periodic with period $\lambda^{n_0}$ for some $n_0\ge0$ (if so, $\tilde u$ is periodic of period $\lambda^n$ for all $n\ge n_0$). Indeed, by Proposition 4 (and Proposition 9), in the Weyl pseudo-metric $d_W$ we can approximate $\tilde u$ by periodic sequences of period $\lambda^n$. Thus, there exist $n_0$ large enough and a $\lambda^{n_0}$-periodic sequence $v$ such that $d_W(\tilde u,v)<\frac1{2p}$. By basic properties of $d_W$, we obtain $d_W(S^{\lambda^{n_0}}\tilde u,S^{\lambda^{n_0}}v)=d_W(\tilde u,v)<\frac1{2p}$. Hence, remembering that $S^{\lambda^{n_0}}(v)=v$, the triangle inequality implies $d_W(\tilde u,S^{\lambda^{n_0}}(\tilde u))<\frac1p$.
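Since $\tilde u$ and $S^{\lambda^{n_0}}(\tilde u)$ are both $p$-periodic sequences at distance less than $1/p$, they must coincide, and the claim follows. As $\tilde\theta$ is primitive by Proposition 9, there exists $n\ge0$ such that for each $M\in\mathcal X$ we can find $j_1<\lambda^n$ with $\tilde\theta^n(\tilde u[0])_{j_1}=M$. Moreover, for all $j_0<\lambda^{n_0}$ and all $j<\lambda^n$ (use (11) and $j_0+j\lambda^{n_0}<\lambda^{n+n_0}$), we have
$$\tilde u[j_0]=\tilde u[j_0+j\lambda^{n_0}]=\tilde\theta^{n+n_0}(\tilde u[0])_{j_0+j\lambda^{n_0}}=\tilde\theta^{n_0}\big(\tilde\theta^n(\tilde u[0])_j\big)_{j_0}.$$
Letting $j=j_1$, this shows that $\tilde\theta^{n_0}(M)_{j_0}=\tilde u[j_0]$ for all $M\in\mathcal X$ and $j_0<\lambda^{n_0}$. As $\mathcal X$ covers $\mathbb A$ in view of (38), it follows that $\theta^{n_0}(\mathbb A)_{j_0}=\tilde u[j_0]$ for all $j_0<\lambda^{n_0}$. Therefore, $|\theta^{n_0}(\mathbb A)_{j_0}|=c(\theta)$ for all $j_0<\lambda^{n_0}$, i.e. $\theta$ is quasi-bijective, which concludes the proof.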

Remark 15.
Although, by primitivity, we can assume that there are $a\in\mathbb A$ and $M\in\mathcal X$ such that $\theta(a)_0=a$ and $\tilde\theta(M)_0=M$, it may happen that (18) is not satisfied for $\Theta$. However, as is the case for any substitution of constant length, there exists some $k\in\mathbb N$ such that $\Theta^k=\theta^k\vee\tilde\theta^k$ (cf. Footnote 31) satisfies (18). The following proposition shows that $\Theta$ satisfies (15), which assures us that $X_\Theta=X_{\Theta^k}$. Hence we can assume without loss of generality that $\Theta$ satisfies (18).
Hence, using Proposition 11, the above remark and the ergodic interpretation of the column number (cf. Proposition 3), we obtain the following (which is also an immediate consequence of Lemma 5.5$^{33}$):

Remark 17. Note also that $(X_\theta,S)$ is a topological factor of $(X_{\theta\vee\tilde\theta},S)$ via the map:

Assume that $M\in\mathcal X$. As, for each $j=0,\dots,\lambda-1$, the set $\{\theta(a)_j:a\in M\}$ has $c(\theta)$ elements, we obtain the following:

Lemma 5.3. The substitution $\theta\vee\tilde\theta$ has the following "relative invertibility" property: for each $M\in\mathcal X$, $(\theta\vee\tilde\theta)(\cdot,M)_j$ is a bijection from $M$ to $\theta(M)_j$ for each $j=0,\dots,\lambda-1$.
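Lemma 5.3 can be checked on the Rudin-Shapiro rule assumed earlier ($a\mapsto ab$, $b\mapsto ac$, $c\mapsto db$, $d\mapsto dc$). The sketch below, with hypothetical names, does three things:

- it recomputes $\mathcal X$;
- it builds $\theta\vee\tilde\theta$ on the alphabet $\{(a,M):a\in M\in\mathcal X\}$;
- it verifies that each map $a\mapsto\theta(a)_j$ is injective on every $M\in\mathcal X$.

```python
def minimal_columns(sub):
    """The set X of columns theta^k(A)_j (k >= 1) of minimal cardinality c(theta)."""
    lam = len(next(iter(sub.values())))
    seen, stack = set(), [frozenset(sub)]
    while stack:
        S = stack.pop()
        for i in range(lam):
            T = frozenset(sub[a][i] for a in S)
            if T not in seen:
                seen.add(T)
                stack.append(T)
    c = min(len(S) for S in seen)
    return {S for S in seen if len(S) == c}


def join_with_synchronizing_part(sub):
    """theta v tilde-theta: (a, M) -> ((theta(a)_j, theta(M)_j))_j for a in M in X."""
    lam = len(next(iter(sub.values())))
    X = minimal_columns(sub)
    return {
        (a, M): tuple((sub[a][j], frozenset(sub[b][j] for b in M)) for j in range(lam))
        for M in X
        for a in M
    }


def relatively_invertible(sub):
    """Lemma 5.3: a -> theta(a)_j is a bijection from M onto theta(M)_j for all M in X, j < lam."""
    lam = len(next(iter(sub.values())))
    return all(
        len({sub[a][j] for a in M}) == len(M)
        for M in minimal_columns(sub)
        for j in range(lam)
    )


rudin_shapiro = {"a": "ab", "b": "ac", "c": "db", "d": "dc"}  # assumed form
Theta = join_with_synchronizing_part(rudin_shapiro)
print(relatively_invertible(rudin_shapiro), len(Theta))  # True 4
```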
Remark 18. In this way, using joinings, we may explain one of the strategies in [37] which consists in representing each automatic sequence as a "combination" of the synchronized and relatively invertible parts.

5.2.3.
Description of the subshift $(X_{\theta\vee\tilde\theta},S)$. Each point of the space $X_{\theta\vee\tilde\theta}$ is of the form $(x_n,M_n)_{n\in\mathbb Z}$, where $M:=(M_n)\in X_{\tilde\theta}$ with $M_n\in\mathcal X$, and $x=(x_n)\in X_\theta$, so that $x_n\in M_n\subset\mathbb A$ for each $n\in\mathbb Z$. Moreover, like every point of a substitutional system, such a point must have its (unique) $\lambda^t$-skeleton structure $(j_t)_{t\ge1}$, which we now describe. First of all, by the definition of $\theta\vee\tilde\theta$, $(j_t)$ must be the $\lambda^t$-skeleton structure of both $M$ and $x$. Here we make crucial use of the fact that the skeleton structure is unique. For that, we need the substitution $\tilde\theta$ to be non-trivial; in other words, the argument makes no sense if $\theta$ is quasi-bijective. That is, for each $t\ge1$, we have $M[-j_t+s\lambda^t,-j_t+(s+1)\lambda^t-1]=\tilde\theta^t(R_s)$ for each $s\in\mathbb Z$ and some $R_s\in\mathcal X$. Fix $t\ge1$ and consider $s=0$. We have and $R_0=\{r_0,\dots,r_{c-1}\}\in\mathcal X$, where $c=c(\theta)$ and $r_j\in\mathbb A$. Hence

$^{32}$ Alternatively, $h(\theta)\mid h(\Theta)$ follows from the fact that $(X_\theta,S)$ is a topological factor of $(X_\Theta,S)$.

$^{33}$ Indeed, as $c(\tilde\theta)=1$, Lemma 5.5 shows that $c(\theta)\le c(\Theta)\le c(\theta)$.
where the columns "represent" sets in X . We look at θ t (R 0 ) jt the j t -th element of θ t (R 0 ) which is "represented" by the j t -th column of the matrix above and is equal to M 0 (the zero coordinate of M ). Now, Note that, we can reverse this reasoning in the case: Indeed, in this case, given M , we can choose x 0 ∈ M 0 arbitrarily, and then successively fill in the "first" coordinate by placing there the -th row in the matrix (42), where x 0 = θ t (r ) jt . This shows that over all M ∈ X θ we have c points (x, M ) ∈ X θ∨ θ whenever the λ t -skeleton of M satisfies (43). What remains are points for which When projected down to the maximal equicontinuous factor, this condition defines a countable set, in particular, of (Haar) measure zero. For the unique measure µ θ we have hence a.e. a c-point extension, while for M which do not satisfy (43), we need (at most) two coordinates for x to determine all the others, so the fibers have at most c 2 elements.
0 . Acting on it by θ, we obtain 3 . By iterating this procedure and passing to the limit, we obtain a two-sided sequence which is a fixed point for θ and which we denote by M = θ ∞ (M 0 ). θ ∞ (M 0 ), the "dot" indicating the zero position of the sequence. A similar procedure can be made on the θ- side, starting with a.a, a.b, b.a and b.b. It is not hard to see that, up to a natural rearrangement of coordinates, the following four points (θ ∞ (a).θ ∞ (a), M ), , M ) are members of X θ∨ θ . It follows that the fiber over M has four points.

5.3.
Toward a skew-product measure-theoretic representation: making the relative invertibility clearer. In this part we want to rename the alphabet of $\Theta$ to get a new substitution which makes the invertible part easier to handle. Namely, our new alphabet will be the set $\{0,\dots,c-1\}\times\mathcal X$, where $c=c(\theta)$. The only thing we need is a "good" identification of $\{0,\dots,c-1\}\times\{M\}$ with $\{(a,M):a\in M\}$ for $M\in\mathcal X$. We start by giving a classical example.
Example 3 (Rudin-Shapiro sequence). We consider the Rudin-Shapiro substitution $\theta$ defined by $\mathbb A=\{a,b,c,d\}$ and We find that $\mathcal X=\{\{a,d\},\{b,c\}\}$, and consequently for $\Theta$ we have: Let us now fix a partial ordering on $\mathbb A$ that is complete on every $M\in\mathcal X$, e.g., a < b, c < d. Thus

We will be dealing with $\theta$ which is not quasi-bijective (which, via Propositions 9 and 10, guarantees that $(X_{\tilde\theta},S)$ "captures" the whole discrete spectrum of $(H_\lambda,m_{H_\lambda},R)$). We repeat the above construction. We fix a partial order on $\mathbb A$ that is complete on every $M\in\mathcal X$. We identify $(a,M)$ with $(j,M)$ if $a$ is the $(j+1)$-th smallest element of $M$ with respect to the ordering restricted to $M$. Finally, we rewrite the substitution rules in the new alphabet. This yields the substitution The second coordinate is still $\tilde\theta$, which corresponds to the synchronizing part and can be defined independently of the first coordinate. The first coordinate (which now depends on the second coordinate) gives the "invertible part".
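The renaming can be carried out mechanically. The sketch below has hypothetical names and uses the Rudin-Shapiro rule assumed earlier.

- It uses the alphabetical order on $\{a,b,c,d\}$, which restricts to $a<d$ and $b<c$ on the two sets in $\mathcal X$.
- It identifies $(a,M)$ with $(i,M)$ when $a$ is the $(i+1)$-th smallest element of $M$.
- For each $(M,j)$, it records the permutation of $\{0,\dots,c-1\}$ that sends $i$ to the first coordinate of $\widetilde\Theta(i,M)_j$.

```python
def minimal_columns(sub):
    """The set X of columns theta^k(A)_j (k >= 1) of minimal cardinality c(theta)."""
    lam = len(next(iter(sub.values())))
    seen, stack = set(), [frozenset(sub)]
    while stack:
        S = stack.pop()
        for i in range(lam):
            T = frozenset(sub[a][i] for a in S)
            if T not in seen:
                seen.add(T)
                stack.append(T)
    c = min(len(S) for S in seen)
    return {S for S in seen if len(S) == c}


def rename(sub):
    """Return tilde-Theta on {0..c-1} x X and the permutations sigma[(M, j)] of {0..c-1}."""
    lam = len(next(iter(sub.values())))
    Theta, sigma = {}, {}
    for M in minimal_columns(sub):
        ordered = sorted(M)  # the chosen order restricted to M
        images = [frozenset(sub[a][j] for a in M) for j in range(lam)]  # tilde-theta(M)_j
        for j in range(lam):
            target_order = sorted(images[j])
            sigma[(M, j)] = tuple(target_order.index(sub[a][j]) for a in ordered)
        for i in range(len(ordered)):
            Theta[(i, M)] = tuple((sigma[(M, j)][i], images[j]) for j in range(lam))
    return Theta, sigma


rudin_shapiro = {"a": "ab", "b": "ac", "c": "db", "d": "dc"}  # assumed form
Theta, sigma = rename(rudin_shapiro)
for (M, j), perm in sorted(sigma.items(), key=lambda item: (sorted(item[0][0]), item[0][1])):
    print(sorted(M), j, perm)
```

Only the pair $(\{b,c\},1)$ produces a non-trivial permutation, which reflects where the "invertible part" of the Rudin-Shapiro substitution lives.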
Proof. As $\widetilde\Theta$ is obtained by renaming the alphabet of $\Theta$, this follows immediately from Proposition 11 and Corollary 6.

Remark 19.
A point in the space $X_{\widetilde\Theta}$ is of the form $(y_n,M_n)_{n\in\mathbb Z}$ with $y_n\in\{0,\dots,c-1\}$, and there is an equivariant map between $X_{\widetilde\Theta}$ and $X_{\theta\vee\tilde\theta}$ given by
$$(y_n,M_n)_{n\in\mathbb Z}\mapsto(z_n,M_n)_{n\in\mathbb Z}, \tag{44}$$
where $z_n\in M_n$ is the $(y_n+1)$-th letter in the ordering on $M_n$. Composing this map with the projection on the first coordinate yields the topological factor $(X_\theta,S)$.
The projection of (44) on the second coordinate yields the factor $(X_{\tilde\theta},S)$. According to Subsection 5.2.3, we can now represent $(X_{\widetilde\Theta},\mu_{\widetilde\Theta},S)$ as a skew product over $(X_{\tilde\theta},\mu_{\tilde\theta},S)$ using the following: first

Remark 20. From the measure-theoretic point of view, $(X_{\widetilde\Theta},S)$ is still $(X_\theta,S)$, and $(X_{\tilde\theta},S)$ is the same as $(H_\lambda,R)$. Hence the above formula (45) clarifies Remark 9.1 in [38], p. 229, about the form of a cocycle representing $(X_\theta,\mu_\theta,S)$ as a skew product over $(H_\lambda,m_{H_\lambda},R)$.
Coming back to our main problem, note that $(X_{\widetilde\Theta},S)$ has $(X_\theta,S)$ as a topological factor. It also has as a topological factor $(X_{\tilde\theta},S)$, which "represents" $(H_\lambda,R)$ measure-theoretically. Still, we do not know whether we have decomposition (32) for each $F\in C(X_{\widetilde\Theta})$. To assure it, we will need another extension.
To conclude this section we want to discuss one particular ordering that has useful properties.
We start by taking an arbitrary complete order $<_{M_0}$ on $M_0$ such that $a_0$ is the minimum. By Proposition 9, $\tilde\theta$ is synchronizing and primitive. Thus we find $k_0',j_0'$ such that $\tilde\theta^{k_0'}(M)_{j_0'}=M_0$ for all $M\in\mathcal X$. As $\theta^{k_0'}(\cdot)_{j_0'}$ permutes the elements of $M_0$, by iterating it we find some $k_0$ and $j_0<\lambda^{k_0}$ such that $\theta^{k_0}(a)_{j_0}=a$ for all $a\in M_0$ and $\tilde\theta^{k_0}(M)_{j_0}=M_0$ for all $M\in\mathcal X$.
Using (38), we now extend the ordering on $M_0$ to a partial (but, in general, not complete) ordering on $\mathbb A$ as below. This gives a complete ordering on any $M\in\mathcal X$ and thus allows us to define $\widetilde\Theta$. With this ordering we obtain directly that for all $M\in\mathcal X$ and $i=0,1,\dots,c-1$, we have which will be useful later. In particular, we see that $a$ is the $i$-th largest element of some $M\in\mathcal X$ if and only if $\theta^{k_0}(a)_{j_0}$ is the $i$-th largest element of $M_0$. Thus, being the $i$-th largest element of some set $M\in\mathcal X$ does not depend on $M$.

Remark 21.
As the order on each $M\in\mathcal X$ is fixed, we have $\widetilde{\Theta}^k=\widetilde{\Theta^k}$ (cf. Footnote 31).

5.4.
Column number, height and pure base revisited. In this part we come back to connections between the column number and the height of primitive substitutions. We start with simple observations about substitution joinings.
Lemma 5.5. Let $\theta:\mathbb A\to\mathbb A^\lambda$ and $\zeta:\mathbb B\to\mathbb B^\lambda$ be substitutions of length $\lambda$ fulfilling (15), (18), (25) and (26), which are not necessarily pure. For some $\mathcal A\subset\mathbb A\times\mathbb B$, let $\Sigma=\theta\vee\zeta:\mathcal A\to\mathcal A^\lambda$ be a substitution joining (in particular, we assume that $\Sigma$ is primitive). Then, which also yields a lower bound for $h(\Sigma)$. Furthermore, which, in particular, yields lower and upper bounds for the column number of $\Sigma$.
whenever j ≥ 0 and j < λ k . As f (u[ ]) ≡ mod h for each ≥ 0, it follows that (letting j be such that (i, M ) = u[j ] and = j λ k + j) By the same token, f (i, M ) ≡ j mod h, so finally for each k ≥ 0 and j < λ k . Furthermore, we know by the construction of Θ that there exist k 0 , j 0 such that Θ k0 (i, M ) j0 = (i, M 0 ) for every i ∈ {0, . . . , c − 1} and M ∈ X , as noticed in (46). 34 One can also see this result dynamically, as both (X θ , S) and (X ζ , S) are topological factors of (X Σ , S).
Thus, in view of (48) (for j = j 0 and k = k 0 ), for all i ∈ {0, . . . , c − 1} and M ∈ X , we obtain that This implies that f (i, M ) does not depend on M (as λ is a multiplicatively invertible element in the ring Z/hZ), so f (i, M ) = f (i, M 0 ) and f only depends on the first coordinate. Therefore, we denote f (i) := f (i, M 0 ). Moreover, we recall that a is the i-th largest element of some set M ∈ X if and only if θ k0 (a) j0 is the i-th largest element of M 0 . Thus, we can actually view f also as a map from A to {0, . . . , h−1}.
As the second coordinate in Θ equals θ is independent of the first coordinate and Θ(·, M ) j is a bijection from {0, . . . , c − 1} to itself, we obtain the following formula involving the incidence matrix M ( Θ) (see (16)), We now claim that i.e. the density of (i, M ) for Θ is equal to e Equation (51) shows directly that h|c and #{i < c : f (i) = j} is constant and equals c/h. It remains to show that the column number of the pure base of θ equals c/h. To this end, let us first recall that f can also be defined on A and that for all M ∈ X and j < h, we have Recall also the construction of the pure basis θ (h) of θ. It is a substitution defined over the alphabet A (h) which is the set {u[mh(θ), mh(θ) + h − 1] : m ≥ 0} (which is of course finite). Then, we define θ (h) : A (h) → (A (h) ) λ by setting for j = 0, . . . , λ − 1.
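The height and the alphabet $\mathbb A^{(h)}$ of the pure base can be read off a long enough prefix of the fixed point. The sketch below assumes the usual definitions $g_0=\gcd\{k\ge1:u[k]=u[0]\}$ and $h(\theta)=\max\{n\ge1:\gcd(n,\lambda)=1,\ n\mid g_0\}$. It also assumes the rule that $\theta^{(h)}(w)$ splits the word $\theta(w)$ of length $\lambda h$ into $\lambda$ blocks of length $h$. In general, a finite prefix only gives an estimate of $g_0$. Here the estimate is exact, because $u[2]=u[0]$ and the letter $0$ only occurs at even positions. The example $0\mapsto010$, $1\mapsto102$, $2\mapsto201$ of height 2 is ours and is not taken from the text; the function names are hypothetical.

```python
from math import gcd


def fixed_point_prefix(sub, start, length):
    """Prefix of the fixed point obtained by iterating sub at `start` (assumes sub[start][0] == start)."""
    word = [start]
    while len(word) < length:
        word = [b for a in word for b in sub[a]]
    return word[:length]


def height(u, lam):
    """Largest n coprime to lam dividing g0 = gcd{k >= 1 : u[k] == u[0]} (estimated on a prefix)."""
    g0 = 0
    for k in range(1, len(u)):
        if u[k] == u[0]:
            g0 = gcd(g0, k)
    return max(n for n in range(1, g0 + 1) if g0 % n == 0 and gcd(n, lam) == 1)


def pure_base(sub, u, h):
    """Alphabet {u[mh, mh+h-1]} and theta^(h)(w)_j = j-th block of length h in theta(w)."""
    lam = len(next(iter(sub.values())))
    alphabet = {tuple(u[m * h:(m + 1) * h]) for m in range(len(u) // h)}
    rule = {}
    for w in alphabet:
        image = [b for a in w for b in sub[a]]  # theta(w), a word of length lam * h
        rule[w] = tuple(tuple(image[j * h:(j + 1) * h]) for j in range(lam))
    return rule


sub = {"0": "010", "1": "102", "2": "201"}  # hypothetical example of height 2
u = fixed_point_prefix(sub, "0", 3 ** 7)
h = height(u, 3)
print(h, pure_base(sub, u, h))
```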
Next, we want to give a lower bound for $c(\theta^{(h)})$. We fix any $k>0$ and $j<\lambda^k$. Denote i = jh λ k . In view of (23), it follows that we have for any (w 0 , . . . , (53)). Thus, we find where $A_i:=\{a\in\mathbb A:f(a)=i\}$. Finally, we recall the relative bijectivity: for each $M\in\mathcal X$ and $0\le j<\lambda$ we have $|\theta(M)_j|=c$. Therefore, for any different $a,a'\in M$, we have $\theta^k(a)_j\neq\theta^k(a')_j$ for each $j<\lambda^k$, see also the remark before Lemma 5.3. This shows that which completes the proof.
Repeating the same with θ replaced with θ k (cf. Remark 21), we define the permutations σ This is a very important formula, as it highlights the connection between Θ and Θ. Furthermore, analogously to (56), we find that for j = 0, . . . , λ k − 1.
It follows directly that σ M,j = σ M,j and from (11) (applied to Θ) that for j 1 < λ k1 and j 2 < λ k2 . If θ(M 0 ) 0 = M 0 , then clearly θ k (M 0 ) 0 = M 0 and applying (59) for (σ, M 0 ), with j 1 = j 2 = 0, we obtain σ for each σ ∈ G. We recall (46), which shows the existence of some k 0 ∈ N and j 0 < λ k0 such that Θ k0 (i, M ) j0 = (i, M 0 ) for all i < c and M ∈ X . This gives directly, by (57), that σ holds for all σ ∈ G, M ∈ X . The next few lemmas will be used to show that Θ is primitive. We start by giving some kind of a dual statement to (61).
Next, we find a better description for G.
Proof. By (11), it is clear that any σ which, by (58), ends the proof.
This allows us to give the final description for G. M0,j : k ∈ N, j < λ k , θ k (M 0 ) j = M 0 } is indeed a (finite) group. As each element in our group G is of finite order, all we need to show is that the multiplication of two elements σ M0,j2 ), M 0 ), so the result follows.
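The group $G$ can be computed by a finite search. Starting from $(M_0,\mathrm{id})$, follow the transitions $(M,\pi)\mapsto(\tilde\theta(M)_j,\sigma_{M,j}\circ\pi)$. Collect the permutations $\pi$ found at states whose first coordinate is $M_0$; these are the $\sigma^{(k)}_{M_0,j}$ with $\tilde\theta^k(M_0)_j=M_0$. The sketch below makes two assumptions: $\sigma_{M,j}$ is the permutation read off the first coordinate of $\widetilde\Theta(\cdot,M)_j$ for the alphabetical order, and composition along a path is taken in this order. For the Rudin-Shapiro rule assumed earlier, it returns the two-element group. The names are hypothetical. Since $G$ is finite, it is enough to check closure under composition, as in the argument above.

```python
from collections import deque


def minimal_columns(sub):
    """The set X of columns theta^k(A)_j (k >= 1) of minimal cardinality c(theta)."""
    lam = len(next(iter(sub.values())))
    seen, stack = set(), [frozenset(sub)]
    while stack:
        S = stack.pop()
        for i in range(lam):
            T = frozenset(sub[a][i] for a in S)
            if T not in seen:
                seen.add(T)
                stack.append(T)
    c = min(len(S) for S in seen)
    return {S for S in seen if len(S) == c}


def local_permutations(sub, X):
    """(M, j) -> (tilde-theta(M)_j, sigma_{M,j}) with positions taken in alphabetical order."""
    lam = len(next(iter(sub.values())))
    sigma = {}
    for M in X:
        ordered = sorted(M)
        for j in range(lam):
            target = frozenset(sub[a][j] for a in M)
            sigma[(M, j)] = (target, tuple(sorted(target).index(sub[a][j]) for a in ordered))
    return sigma


def group_G(sub, M0):
    """All products sigma^{(k)}_{M0, j} (k >= 1) along paths returning to M0."""
    lam = len(next(iter(sub.values())))
    sigma = local_permutations(sub, minimal_columns(sub))
    c = len(M0)
    G, seen, queue = set(), set(), deque()

    def push(state):
        if state not in seen:
            seen.add(state)
            queue.append(state)

    for j in range(lam):  # paths of length k >= 1 start with one step from (M0, id)
        push(sigma[(M0, j)])
    while queue:
        M, pi = queue.popleft()
        if M == M0:
            G.add(pi)
        for j in range(lam):
            target, s = sigma[(M, j)]
            push((target, tuple(s[pi[i]] for i in range(c))))  # s o pi
    return G


rudin_shapiro = {"a": "ab", "b": "ac", "c": "db", "d": "dc"}  # assumed form
print(sorted(group_G(rudin_shapiro, frozenset("ad"))))  # [(0, 1), (1, 0)]
```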
Thus, for any g ∈ G, we have some k g ≥ 1 and j g < λ kg such that g = σ Proof. As we have Θ(σ, M 0 ) 0 = (σ, M 0 ) for all σ ∈ G, it is sufficient to show that for all g, h ∈ G and M, M ∈ X there exist k 1 ≥ 1, j 1 < λ k1 such that Θ k1 (id, M 0 ) j1 = (g, M ) and k 2 ≥ 1, j 2 < λ k2 such that Θ k2 (h, M ) j2 = (id, M 0 ). Indeed, for all k ≥ k 1 + k 2 , we then obtain that As there are only finitely many g, h ∈ G and M, M ∈ X , we find that there exists some (large) k such that for all g, h ∈ G and M, M ∈ X there exists j such that Θ k (h, M ) j = (g, M ).
Proof. We note that a point in the space X Θ is of the form (σ n , M n ) n∈Z and define the equivariant map where i n is defined by σ n (i n ) = 0. We claim that the fixed point of Θ that starts with (id, M 0 ) is mapped to the fixed point of Θ that starts with (0, M 0 ). Indeed, in view of (58), we have for all k ≥ 0, n < λ k that Θ k (id, M 0 ) n = (σ As (X Θ , S) is topologically isomorphic to (X Θ , S), we obtain the following (cf. Remark 20).
We have seen that $h(\theta)=h(\Theta)=h(\widetilde\Theta)$ and $c(\theta)=c(\Theta)=c(\widetilde\Theta)$. These equalities do not carry over to $\widehat\Theta$. Proof. We recall that, by (46), there are $k_0,j_0$ such that Furthermore, it follows from the definition of $\widehat\Theta$ that for all $M\in\mathcal X$, we have This shows directly that $\widehat\Theta$ which implies that $c(\widehat\Theta)\ge|G|$. By Corollary 7, it follows that the maximal equicontinuous factor of $(X_\theta,S)$ is a factor of the maximal equicontinuous factor of $(X_{\widehat\Theta},S)$. As $\theta$ and $\widehat\Theta$ have the same length, it follows that $h(\theta)\mid h(\widehat\Theta)$. One could expect these two numbers to be equal, but this is not the case in general. The following example shows that $h(\widehat\Theta)$ and $h(\theta)$ can be different (and, in general, $h(\widehat\Theta)$ does not divide $c(\theta)$).
This example shows that $\widetilde\Theta$ is much closer to $\theta$ than $\widehat\Theta$, but $\widehat\Theta$ has a structural advantage, as it relies on a group $G$. Indeed, we are able to find a representation of the height within the group $G$.
Lemma 5.10. Let Θ : G × X → (G × X ) λ be as described above. Then, there exists then ζ is the joining of a periodic substitution p (of period h) and θ, i.e. ζ = p ∨ θ.
Proof. Similarly as at the beginning of Lemma 2.2, by setting h = h( Θ), there exists f : G × X → {0, . . . , h − 1} (which we also treat as a 1-code) such that the fixed point u of Θ obtained by iterations of Θ at (id, M 0 ) is mapped via f to 01 . . . (h − 1)01 . . ., i.e. f (u[n]) = n mod h. Furthermore, we can assume without loss of generality that λ ≡ 1 mod h, as we can always replace Θ by Θ t , if necessary. We find similarly to (48) that M ))λ k + j mod h for each k ≥ 0 and j < λ k . Using additionally (61), this gives which shows that f ((σ, M )) only depends on σ (f (σ) = f (σ, M 0 )). Since (for each k) u begins with Θ k (id, M 0 ), by the definition of f , we have f (σ (k) M0,j ) = j mod h. Due to Lemma 5.8, we can write each g 1 , g 2 ∈ G as g i = σ (ki) M0,ji . Thus, we find This shows, that for all g 1 , g 2 ∈ G we have It follows that f is a group homomorphism and therefore G 0 := f −1 (0) is a normal subgroup. We can identify G 0 g with f (g) and the last statement follows immediately for p : {0, . . . , h − 1} → {0, . . . , h − 1} λ defined by p(i) j = λi + j mod h. Remark 22. Lemma 5.10 allows us to find a representation of the maximal equicontinuous factor of Θ in the case, where θ is not periodic. For a general substitution of constant length, we need to rely on a more complicated construction.
First, we would like to note that we can find a representation of $G$ in the centralizer of $(X_{\widehat\Theta},S)$. Indeed, given $\tau\in G$, consider $V_\tau:G\times\mathcal X\to G\times\mathcal X$, $V_\tau(\sigma,M):=(\tau\circ\sigma,M)$. Note that, in view of (56), $\widehat\Theta$ "commutes" with $V_\tau$. It follows from Remark 9 that $V_\tau$ (uniquely) extends to a homeomorphism $V_\tau$ of $X_{\widehat\Theta}$ which commutes with $S$: $V_\tau((\sigma_n,M_n)_{n\in\mathbb Z})=(\tau\circ\sigma_n,M_n)_{n\in\mathbb Z}$ for each $(\sigma_n,M_n)_{n\in\mathbb Z}\in X_{\widehat\Theta}$. It follows that we have a (finite) group $\mathcal V:=\{V_\tau:\tau\in G\}$ of homeomorphisms of $X_{\widehat\Theta}$ commuting with the shift. Next, we want to determine the dynamical system obtained by factoring $(X_{\widehat\Theta},S)$ by $\mathcal V$. To this end, consider the map $g:(G\times\mathcal X)^{\mathbb Z}\to(G\times\mathcal X\times\mathcal X)^{\mathbb Z}$ defined by
$$g((\sigma_n,M_n)_{n\in\mathbb Z})=(\sigma_{n+1}^{-1}\circ\sigma_n,M_n,M_{n+1})_{n\in\mathbb Z}.$$
This means that $g$ is a sliding block code with (right-)radius 1, i.e. there exists some $\Phi:(G\times\mathcal X)^2\to G\times\mathcal X\times\mathcal X$ such that $g(x)[n]=\Phi(x[n,n+1])$. We see directly that $g((\sigma_n,M_n)_{n\in\mathbb Z})=g((\sigma'_n,M'_n)_{n\in\mathbb Z})$ if and only if $(\sigma'_n,M'_n)_{n\in\mathbb Z}=V_\tau((\sigma_n,M_n)_{n\in\mathbb Z})$ for some $\tau\in G$. We want to describe $g(X_{\widehat\Theta})$. As $g$ is a sliding block code, $g(X_{\widehat\Theta})$ is a (minimal) subshift. We want to show that it is actually a substitutional dynamical system whose substitution has column number 1, i.e. is synchronizing.
To this end, let us define a substitution η : G× X × X → (G×X × X ) λ as follows: We denote the fixpoint of Θ that starts with (id, M 0 ) by u and denote u[n] = (σ n , M n ). This allows us to reduce the alphabet of the substitution η from G×X ×X to B, where The formula above shows that g maps the fixed point of Θ starting with (σ 0 , M 0 ) (σ 1 , M 1 ) . . . to the fixed point of η starting with (σ −1 1 • σ 0 , M 0 , M 1 ). Furthermore, we see directly that the restriction of f : X Θ → B Z is still well-defined on the fixed point of Θ and therefore on X Θ . This restriction is necessary to ensure that η is primitive.
Thus we showed that $g(X_{\widehat\Theta})=X_\eta$. It remains to show that $\eta$ has column number 1. As $\tilde\theta$ has column number 1, we find by (29) some integers $k\ge1$, $j<\lambda^k-1$ and $M\in\mathcal X$ such that $\tilde\theta^k(M')_j=M$ holds for all $M'\in\mathcal X$. It follows that for any $\sigma\in G$ and $M_1,M_2\in\mathcal X$, we have

Example 5. We give the explicit construction of $\eta$ for the Rudin-Shapiro sequence, which was already introduced in Example 3. We recall that,

This procedure works well to find a representation of the maximal equicontinuous factor when $h(\widehat\Theta)=1$. However, in general we are factoring out too much and need to take a subgroup of $\mathcal V$, namely $\mathcal V_1:=\{V_\tau:\tau\in G_0\}$ ($G_0$ comes from Lemma 5.10). Therefore, we define $g_h:(G\times\mathcal X)^{\mathbb Z}\to(G\times\mathcal X\times\mathcal X\times\{0,\dots,h-1\})^{\mathbb Z}$ as an "extension" of the previously defined function $g:(G\times\mathcal X)^{\mathbb Z}\to(G\times\mathcal X\times\mathcal X)^{\mathbb Z}$ in the following way (see the proof of Lemma 5.10 for $f$ and its properties):
$$g_h((\sigma_n,M_n)_{n\in\mathbb Z})=(\sigma_{n+1}^{-1}\circ\sigma_n,M_n,M_{n+1},f(\sigma_n))_{n\in\mathbb Z}.$$
We see directly that $g_h((\sigma_n,M_n)_{n\in\mathbb Z})=g_h((\sigma'_n,M'_n)_{n\in\mathbb Z})$ if and only if $(\sigma'_n,M'_n)_{n\in\mathbb Z}=V_\tau((\sigma_n,M_n)_{n\in\mathbb Z})$ for some $\tau\in G_0$, as $f(\tau\sigma)\equiv f(\tau)+f(\sigma)\bmod h$.
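The invariance statements for $g$ and $g_h$ can be checked on finite windows. The sketch below is a toy stand-in built on the following assumptions:

- $G$ is the symmetric group $S_3$ acting on $\{0,1,2\}$;
- $V_\tau$ acts by $(\sigma,M)\mapsto(\tau\circ\sigma,M)$, which is the action under which $g$ is invariant;
- $h=2$ and $f$ is the sign homomorphism, whose kernel $G_0$ is the alternating group.

The labels `"M0"`, `"M1"` and all function names are placeholders. Under these assumptions, $g$ is unchanged by every $V_\tau$, while $g_h$ is unchanged exactly when $\tau\in G_0$.

```python
import random
from itertools import permutations


def compose(s, t):
    """(s o t)(i) = s(t(i)) for permutations stored as tuples."""
    return tuple(s[t[i]] for i in range(len(t)))


def inverse(s):
    inv = [0] * len(s)
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)


def sign(s):
    """f(s) in Z/2Z: parity of the number of inversions (a homomorphism S_3 -> Z/2Z)."""
    return sum(s[i] > s[j] for i in range(len(s)) for j in range(i + 1, len(s))) % 2


def g(seq):
    """(sigma_n, M_n)_n -> (sigma_{n+1}^{-1} o sigma_n, M_n, M_{n+1})_n on a finite window."""
    return [(compose(inverse(s1), s0), m0, m1) for (s0, m0), (s1, m1) in zip(seq, seq[1:])]


def g_h(seq):
    """g extended by the extra coordinate f(sigma_n)."""
    return [block + (sign(s0),) for block, (s0, _) in zip(g(seq), seq)]


def V(tau, seq):
    """Coordinatewise action (sigma, M) -> (tau o sigma, M)."""
    return [(compose(tau, s), m) for s, m in seq]


random.seed(1)
S3 = list(permutations(range(3)))
seq = [(random.choice(S3), random.choice(["M0", "M1"])) for _ in range(20)]
for tau in S3:
    print(tau, g(V(tau, seq)) == g(seq), g_h(V(tau, seq)) == g_h(seq), sign(tau) == 0)
```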
Thus, we now consider $\eta_h\colon G\times\mathcal{X}\times\mathcal{X}\times\{0,\dots,h-1\}\to(G\times\mathcal{X}\times\mathcal{X}\times\{0,\dots,h-1\})^{\lambda}$, where the first three coordinates coincide with $\eta$ and the last coordinate can be seen as another substitution $p$ on $\{0,\dots,h-1\}$. This gives $\eta_h=\eta\vee p$. As we have seen that $(X_\eta,S)$ is measure-theoretically isomorphic to $(H_\lambda,m_{H_\lambda},R)$, and $(X_p,S)\cong(\mathbb{Z}/h\mathbb{Z},m_{\mathbb{Z}/h\mathbb{Z}},\tau_h)$, we have found a factor of $X_{\Theta}$ that is itself given by $X_{\eta_h}$ and is measure-theoretically isomorphic to the maximal equicontinuous factor of $X_{\Theta}$ (cf. Proposition 3).
6. Proof of Theorem 0.1, Corollary 1 and some questions.
Proof of Theorem 0.1. We have already shown (i) of Theorem 0.1. We now need to handle the case c(θ) > h(θ).
To this end, we have defined θ∨ θ extending θ (which is primitive), where (X θ , S) "represents" (measure-theoretically) either the (H λ , m H λ , R) factor of (X θ , S) if θ is not quasi-bijective, or it is finite. Via an isomorphic copy Θ of θ ∨ θ, we finally obtain the substitution Θ, which is primitive by Proposition 12. Moreover, by Corollary 7, (X Θ , S) has (X θ , S) as its topological factor, and therefore c( Θ) > h( Θ) (it cannot be synchronizing). 35 Hence we only need to prove (ii) of Theorem 0.1 for (X Θ , S).
Via (65), we have shown that the centralizer of $S$ contains a compact group of homeomorphisms which is a copy of $G$, namely $\mathcal{V}=\{V_\tau:\tau\in G\}$. Because of Lemma 5.9 and Proposition 3, the factor $\sigma$-algebra $\mathcal{B}(\mathcal{V})$ of subsets which are fixed by all elements of $\mathcal{V}$ "represents" (measure-theoretically) $(H_\lambda,m_{H_\lambda},R)$. Continuous functions are dense in the $L^2$-space of $(X_{\Theta},\mu_{\Theta},S)$. Now, the conditional expectation of $F\in C(X_{\Theta})$ with respect to $\mathcal{B}(\mathcal{V})$ is given by $\frac{1}{|G|}\sum_{\tau\in G}F\circ V_\tau$, so it is still a continuous function. It follows that each continuous function $F$ can be written as $F=F_1+F_2$, where $F_1=\frac{1}{|G|}\sum_{\tau\in G}F\circ V_\tau$ is continuous and belongs to $L^2(\mathcal{B}(\mathcal{V}))$, while $F_2=F-F_1$ is continuous and orthogonal to $L^2(\mathcal{B}(\mathcal{V}))$. If we knew that $H_\lambda$ represents the Kronecker factor of $(X_{\Theta},\mu_{\Theta},S)$, we could apply Section 4.3 via Proposition 7 to conclude the proof. However, as we have already noticed, we cannot control $h=h(\Theta)$, and the Kronecker factor $\mathcal{K}$ of $(X_{\Theta},\mu_{\Theta},S)$ is given by $(H_\lambda\times\mathbb{Z}/h\mathbb{Z},m_{H_\lambda}\otimes m_{\mathbb{Z}/h\mathbb{Z}},R\times\tau_h)$. Since $\mathcal{B}(\mathcal{V})\subset\mathcal{K}$, by a result of Veech [42], there is a compact subgroup $\mathcal{V}_1\subset\mathcal{V}$ such that $\mathcal{K}$ is the $\sigma$-algebra of subsets fixed by all elements of $\mathcal{V}_1$. (We are in fact able to give a concrete description of $\mathcal{V}_1$ as $\{V_\tau:\tau\in G_0\}$, where $G_0$ was defined in Lemma 5.10.) So, we can now repeat all the above arguments (cf. Footnote 25) with $\mathcal{V}$ replaced by $\mathcal{V}_1$ to complete the proof.
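In a finite toy model the decomposition $F=F_1+F_2$ can be verified directly. The Python sketch below assumes the uniform measure on $G\times\mathcal{X}$, with $G=S_3$ acting by $(\sigma,M)\mapsto(\tau\circ\sigma,M)$, as a stand-in for $\mathcal{V}$ acting on $(X_{\Theta},\mu_{\Theta})$. Because this finite group preserves the measure, the conditional expectation onto the invariant sets is the group average. The sketch checks that $F_1$ is invariant and that $F_2$ is orthogonal to invariant functions; all names and the test function are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def compose(s, t):
    """(s o t)(i) = s(t(i)) for permutations stored as tuples."""
    return tuple(s[j] for j in t)

G = list(permutations(range(3)))
labels = ["A", "B"]                       # placeholder for the second coordinate
space = [(s, m) for s in G for m in labels]

def act(tau, point):
    s, m = point
    return (compose(tau, s), m)

def group_average(F):
    """(1/|G|) * sum over tau of F o V_tau: conditional expectation onto invariant sets."""
    return {pt: Fraction(sum(F[act(tau, pt)] for tau in G), len(G)) for pt in space}

def inner(F, H):
    """Inner product for the uniform measure (real-valued functions, no conjugation)."""
    return sum(Fraction(F[pt]) * H[pt] for pt in space) / len(space)

F = {pt: i * i - 3 * i for i, pt in enumerate(space)}   # an arbitrary test function
F1 = group_average(F)
F2 = {pt: F[pt] - F1[pt] for pt in space}
H = {pt: (5 if pt[1] == "A" else -2) for pt in space}   # an invariant function
print(all(F1[act(tau, pt)] == F1[pt] for tau in G for pt in space), inner(F2, H))
```

Exact `Fraction` arithmetic is used so that the orthogonality relations hold exactly rather than up to rounding.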
Proof of Corollary 1. We begin with a 1-code $F\colon X_\theta\to\mathbb{C}$ (that is, $F(y)$ depends only on the $0$-coordinate $y[0]$ of $y\in X_\theta$). We assume that for some fixed point $y\in X_\theta$, the sequence $(F(S^ny))_{n\in\mathbb{N}}$ is multiplicative. Such an $F$ is continuous, and $(X_\theta,S)$ is a topological factor of $(X_{\Theta},S)$; let $\pi\colon X_{\Theta}\to X_\theta$ be a factor map. Now, $F\circ\pi\in C(X_{\Theta})$ and (as in the proof of Theorem 0.1) we have $F\circ\pi=F_1+F_2$, where both $F_1,F_2$ are continuous, $F_1\in L^2(\mathcal{B}(\mathcal{V}))$ and $F_2\perp L^2(\mathcal{B}(\mathcal{V}))$. Take $x\in\pi^{-1}(y)$, which can be chosen to be a fixed point of $\Theta$. By Theorem 0.1, $$\frac1N\sum_{n\le N}F_2(S^nx)\overline{F(S^n(\pi(x)))}\to0,$$ since $(\overline{F(S^n(\pi(x)))})_{n}$ is multiplicative. On the other hand, by unique ergodicity, $$\frac1N\sum_{n\le N}F_2(S^nx)\overline{F(S^n(\pi(x)))}\to\int_{X_{\Theta}}F_2\cdot\overline{F\circ\pi}\,d\mu_{\Theta}.$$ It follows that $F\circ\pi\perp F_2$, which implies $F_2=0$, because $\langle F_2,F\circ\pi\rangle=\langle F_2,F_1\rangle+\|F_2\|^2=\|F_2\|^2$. Consequently, the spectral measure of $F\circ\pi=F_1$ is discrete. Let us first assume that $h(\Theta)=1$. We first note that $\pi$ is in fact a 1-code, so $F\circ\pi$ is also a 1-code. Indeed, $\pi$ is given by the composition $(\sigma_n,M_n)_{n\in\mathbb{Z}}\mapsto(i_n,M_n)_{n\in\mathbb{Z}}\mapsto(z_n)_{n\in\mathbb{Z}}$, where $i_n$ is defined by $\sigma_n(i_n)=0$ and $z_n$ is the $(i_n+1)$-st element of $M_n$ (see Remark 19 and Proposition 13). Furthermore, we have an explicit formula for $F_1$: $$F_1=\frac{1}{|G|}\sum_{\tau\in G}F\circ\pi\circ V_\tau.$$ As $F\circ\pi$ and each $V_\tau$ are 1-codes, $F_1$ is a 1-code as well. Moreover, by (66), we obtain that Thus, $F_1(\sigma,M)$ only depends on $M$, and we see that (with $y=(\sigma,M)$) $F(S^ny)=F_1(S^nM)=F_1(M_n)$.
35 It has also (X θ , S) as its topological factor.
As we have shown that $F_1\in C(X_\theta)$ is a 1-code and $\theta$ is synchronizing by Proposition 9 (i), the result follows. Consider now the case $h(\Theta)>1$. We recall that there exists some $f\colon G\times\mathcal{X}\to\{0,\dots,h-1\}$ such that $f(y[n])\equiv n\bmod h$. As we have seen at the end of Section 5.5, we can identify $\mathcal{V}_1$ with $\{V_\tau:\tau\in G_0\}$. Now, $F_1(\sigma,M)$ depends only on $M$ and $\sigma G_0$, or equivalently on $M$ and $f(\sigma,M)=f(\sigma G_0,M)$. This gives $F(S^ny)=F_1(n\bmod h,M_n)$.
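The 1-code $\pi$ and the averaged function $F_1$ can be tried out on a toy example. In the Python sketch below, a letter is a pair $(\sigma,M)$ with $\sigma$ a permutation of $\{0,1,2\}$ stored as a tuple and $M$ a tuple of three placeholder symbols; $\pi$ returns the $(i+1)$-st entry of $M$, where $\sigma(i)=0$, and $F_1$ is the group average of $F\circ\pi$. Assuming that $G$ acts transitively (here $G=S_3$), $F_1(\sigma,M)$ is just the mean of $F$ over the entries of $M$, so it does not depend on $\sigma$. This only illustrates the conclusion; the argument in the text relies on (66) instead. All names and values are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def compose(s, t):
    """(s o t)(i) = s(t(i)) for permutations stored as tuples."""
    return tuple(s[j] for j in t)

def pi_letter(sigma, M):
    """1-code: take i with sigma(i) = 0 and return the (i+1)-st entry of M."""
    i = sigma.index(0)
    return M[i]

def pi(sequence):
    """Apply the 1-code letter by letter."""
    return [pi_letter(sigma, M) for sigma, M in sequence]

G = list(permutations(range(3)))
F = {"p": 1, "q": -1, "r": 4}             # a 1-code F on the target alphabet

def F1(sigma, M):
    """Average of F o pi o V_tau over tau in G, with V_tau(sigma, M) = (tau o sigma, M)."""
    return Fraction(sum(F[pi_letter(compose(tau, sigma), M)] for tau in G), len(G))

M = ("p", "q", "r")
print({F1(sigma, M) for sigma in G})      # a single value: the mean of F over M
```

For a non-transitive group (for instance the trivial one) the average would in general still depend on $\sigma$, which is why the transitivity assumption is stated explicitly.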
Remark 23. If we are content with a weaker conclusion, 36 namely that such functions are Besicovitch rationally almost periodic, another proof of Corollary 1 can be given as follows.
We first recall that if $\mathcal{M}:=\{v\colon\mathbb{N}\to\mathbb{C}:|v|\le1\text{ and }v\text{ is multiplicative}\}$, then to check that $v\in\mathcal{M}$ has a mean along an arbitrary arithmetic progression, it is enough to show that $v\cdot\chi$ has a mean for each Dirichlet character $\chi$, see e.g. Proposition 3.1 in [5]. Now, the argument used in the proof of Lemma 2 in [40] applies here and shows that a function $v\in\mathcal{M}$ taking finitely many values has a mean along any arithmetic progression. Furthermore, Theorem 1.3 in [5] says that such a $v$ is either Besicovitch rationally almost periodic or uniform. Moreover, uniformity is equivalent to aperiodicity by [15]. So if our $v$ is not Besicovitch rationally almost periodic, then, on the one hand, $\limsup_{N\to\infty}\frac1N\sum_{n\le N}|v_n|>0$ (otherwise $v$ would be Besicovitch-equal to zero), and, on the other hand, $v$ is an aperiodic multiplicative automatic sequence. This is in conflict with Theorem 0.1: applied to the automatic sequence $(\overline{v_n})$, it yields $\frac1N\sum_{n\le N}|v_n|^2\to0$, which is impossible since the nonzero values of $v$ have modulus bounded away from zero.
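The reduction to twisted means rests on the orthogonality of Dirichlet characters: for $\gcd(a,q)=1$, $\mathbf{1}_{n\equiv a\bmod q}=\frac{1}{\varphi(q)}\sum_{\chi}\overline{\chi(a)}\chi(n)$, hence $\frac1N\sum_{n\le N,\,n\equiv a\bmod q}v(n)=\frac{1}{\varphi(q)}\sum_{\chi}\overline{\chi(a)}\,\frac1N\sum_{n\le N}v(n)\chi(n)$ for every $N$. The Python sketch below checks this identity numerically for the Liouville function. For simplicity it assumes a prime modulus with a known primitive root and a residue coprime to $q$; the statement quoted from [5] concerns arbitrary progressions, and the extra step for residues not coprime to $q$ is not reproduced. Helper names are hypothetical.

```python
import cmath

def liouville_table(n_max):
    """lam[n] = (-1)^Omega(n) for 1 <= n <= n_max, via a smallest-prime-factor sieve."""
    spf = list(range(n_max + 1))
    i = 2
    while i * i <= n_max:
        if spf[i] == i:
            for j in range(i * i, n_max + 1, i):
                if spf[j] == j:
                    spf[j] = i
        i += 1
    lam = [0] * (n_max + 1)
    lam[1] = 1
    for n in range(2, n_max + 1):
        lam[n] = -lam[n // spf[n]]     # Omega(n) = 1 + Omega(n / p) for p = spf(n)
    return lam

def characters_mod_prime(q, g):
    """All Dirichlet characters mod a prime q, given a primitive root g mod q."""
    dlog, x = {}, 1
    for m in range(q - 1):
        dlog[x] = m                    # discrete logarithm: g^m = x (mod q)
        x = x * g % q
    assert len(dlog) == q - 1, "g must be a primitive root mod q"

    def make(k):
        return lambda n: 0 if n % q == 0 else cmath.exp(2j * cmath.pi * k * dlog[n % q] / (q - 1))

    return [make(k) for k in range(q - 1)]

N, q, g, a = 3000, 7, 3, 2
lam = liouville_table(N)
direct = sum(lam[n] for n in range(1, N + 1) if n % q == a) / N
chars = characters_mod_prime(q, g)
via_chars = sum(
    chi(a).conjugate() * sum(lam[n] * chi(n) for n in range(1, N + 1)) / N for chi in chars
) / (q - 1)
print(direct, via_chars)
```

The two printed numbers agree up to floating-point rounding, since the identity holds exactly for every $N$.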

7. Proof of Corollary 2.
Let $u\colon\mathbb{N}\to\mathbb{C}$ be a bounded multiplicative function. Following [1], we say that a topological dynamical system $(Y,T)$ satisfies the strong $u$-OMO property 37 if for each increasing sequence $(b_k)_{k\ge1}\subset\mathbb{N}$ with $b_1=1$ and $b_{k+1}-b_k\to\infty$, each sequence $(y_k)\subset Y$ and each $f\in C(Y)$, we have $$\lim_{K\to\infty}\frac{1}{b_K}\sum_{k\le K}\Big|\sum_{b_k\le n<b_{k+1}}f(T^ny_k)\,u(n)\Big|=0.\qquad(67)$$ Substituting $f=1$ above, we see that $u$ has to satisfy (4). Clearly, given $(Y,T)$, (67) implies (2). In fact (for $u=\mu$), in the class of zero entropy systems, the two properties are equivalent in the following sense: it is proved in [1] that Sarnak's conjecture holds if and only if all zero entropy systems enjoy the strong MOMO property. 38
Lemma 7.1. For each primitive substitution $\theta$, the system $(X_\theta,S)$ has the strong $u$-OMO property for each bounded, aperiodic, multiplicative $u\colon\mathbb{N}\to\mathbb{C}$ satisfying (4).
Proof. We only need to prove the result for $(X_{\Theta},S)$. The result then follows immediately from Theorem 25 of [1] or, more precisely, from its proof. Indeed, all ergodic rotations satisfy the strong $u$-OMO property (Corollary 28 in [1]), and the only ergodic joinings between $S^p$ and $S^q$, whenever $p\neq q$ are large enough, are relative products over isomorphisms of Kronecker factors. Moreover, the structure of the space of continuous functions allows us to represent each $F\in C(X_{\Theta})$ as $F=F_1+F_2$, with both $F_i$ continuous, $F_1$ measurable with respect to the Kronecker factor, and $F_2$ orthogonal to $L^2$ of that factor. Finally, we apply the reasoning from the proof of Theorem 25 of [1] to $F_2$.
36 Note that $\mu^2$ is Besicovitch rationally almost periodic but is not Weyl rationally almost periodic, see e.g. [4].
37 Instead of the strong $\mu$-OMO property, we speak about the strong MOMO property. The acronym MOMO stands for Möbius Orthogonality of Moving Orbits.
38 Moreover, no positive entropy system has the strong MOMO property. We recall (see [11]) that there are positive entropy systems $(Y,T)$ satisfying (1) for all $f\in C(Y)$.
Proof of Corollary 2. It follows from Lemma 7.1 and the equivalence of Properties 1 and 3 in the Main Theorem of [1] that all MT-substitutional systems satisfy the strong $u$-OMO property for each bounded, aperiodic, multiplicative $u$ satisfying (4). The uniform convergence now follows from Theorem 7 in [1].
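The quantity in (67) is easy to evaluate on finite ranges. The Python sketch below computes $\frac{1}{b_K}\sum_{k\le K}\big|\sum_{b_k\le n<b_{k+1}}f(T^ny_k)u(n)\big|$ for the sample choice $b_k=k^2$ (so $b_1=1$ and $b_{k+1}-b_k=2k+1\to\infty$) and $f=1$, that is, the short-interval quantity behind (4). A finite computation says nothing rigorous about the limit; it only displays the cancellation one expects for the Liouville function, in contrast with the constant function $u=1$, for which every block contributes its full length. The parameters and names are hypothetical.

```python
def liouville_table(n_max):
    """lam[n] = (-1)^Omega(n) for 1 <= n <= n_max (smallest-prime-factor sieve)."""
    spf = list(range(n_max + 1))
    i = 2
    while i * i <= n_max:
        if spf[i] == i:
            for j in range(i * i, n_max + 1, i):
                if spf[j] == j:
                    spf[j] = i
        i += 1
    lam = [0] * (n_max + 1)
    lam[1] = 1
    for n in range(2, n_max + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

def strong_omo_average(orbit_values, u, b, K):
    """(1/b(K)) * sum_{k=1}^{K} | sum_{b(k) <= n < b(k+1)} orbit_values(k, n) * u(n) |.

    orbit_values(k, n) plays the role of f(T^n y_k).
    """
    total = 0.0
    for k in range(1, K + 1):
        block = sum(orbit_values(k, n) * u(n) for n in range(b(k), b(k + 1)))
        total += abs(block)
    return total / b(K)

def b(k):
    return k * k          # b_1 = 1 and b_{k+1} - b_k = 2k + 1 -> infinity

def constant_one(*args):
    return 1

K = 200
lam = liouville_table(b(K + 1))
liouville_value = strong_omo_average(constant_one, lambda n: lam[n], b, K)
constant_value = strong_omo_average(constant_one, constant_one, b, K)
print(liouville_value, constant_value)
```

For $u=1$ the value is $((K+1)^2-1)/K^2>1$, so (4) fails, while the Liouville function gives a much smaller value at this range.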
Note that among MT-substitutional systems there are uniquely ergodic models which are topologically mixing [28]. In such models the maximal equicontinuous factor must be trivial, that is, there is no topological "realization" of the Kronecker factor.
Theorem 0.1 tells us that for each primitive substitution $\theta$ and each bounded, multiplicative, aperiodic $u$, (2) holds for each $f\in C(X_\theta)$ and arbitrary $x\in X_\theta$. We do not know, however, whether this convergence is uniform in $x$. Note that Corollary 2 gives a positive answer if $u$ additionally satisfies (4).

Remark 24.
In the recent paper [25], it is proved that all $q$-multiplicative sequences are either almost periodic, which roughly corresponds to (i) of Theorem 0.1, or orthogonal to all bounded, multiplicative functions. This result overlaps to some extent with our main result (some automatic sequences are $q$-multiplicative), but the classes considered in [25] and in the present paper are essentially different.
Remark 25. Both of the main results of the paper, Theorem 0.1 and Corollary 1, remain valid if we drop the assumption of primitivity of substitutions. This will be treated in detail in a forthcoming paper.