From Odometers to Circular Systems: A global structure theorem

The main result of this paper is that two large collections of ergodic measure preserving systems, the Odometer Based and the Circular Systems, have the same global structure with respect to joinings. The classes are canonically isomorphic by a continuous map that takes factor maps to factor maps, measure-isomorphisms to measure-isomorphisms, weakly mixing extensions to weakly mixing extensions and compact extensions to compact extensions. The first class includes all finite entropy ergodic transformations with an odometer factor. By results in a previous paper, the second class contains all transformations realizable as diffeomorphisms using the strongly uniform untwisted Anosov-Katok method. An application of the main result will appear in a forthcoming paper that shows that the diffeomorphisms of the torus are inherently unclassifiable up to measure-isomorphism. Other consequences include the existence of measure distal diffeomorphisms of arbitrary countable distal height.


Introduction
The isomorphism problem in ergodic theory was formulated by von Neumann in 1932 in his pioneering paper [19]. Simply put, it asks to determine when two measure preserving actions are isomorphic, in the sense that there is a measure isomorphism between the underlying measure spaces that intertwines the actions. It has been solved completely only for some special classes of transformations. Halmos and von Neumann [13] used the unitary operators defined by Koopman to completely characterize ergodic measure preserving transformations with pure point spectrum; these transformations can be concretely realized (in a Borel way) as translations on compact groups. Another notable success was the use of the Kolmogorov entropy to distinguish between measure preserving systems. Ornstein's work showed that entropy completely classifies a large class of highly random systems, such as independent processes, mixing Markov chains and certain smooth systems such as geodesic flows on surfaces of negative curvature.
Closely related to the isomorphism problem is the study of structural properties of measure preserving systems. These include mixing properties and compactness. A famous example is the Furstenberg-Zimmer structure theorem for ergodic measure preserving transformations, which describes every ergodic transformation as an inverse limit of a tower of compact extensions, followed by a weakly mixing extension. This result is fundamental for studying recurrence properties of measure preserving systems and the related proofs of Szemerédi-type combinatorial theorems ([9]).
In this paper we present a new phenomenon, Global Structure Theory. Most structure theorems in ergodic theory consider a single transformation in vitro. The approach here is to study whole, intact ecosystems of transformations with their inherent relationships.
Our main result shows that two large collections of measure preserving transformations have exactly the same structure with respect to factors and isomorphisms (and more generally, joinings). More concretely, define the odometer based transformations to be those finite entropy transformations that contain a non-trivial odometer factor. Spectrally, this is equivalent to the associated unitary operator having infinitely many finite period eigenvalues. To each odometer, we can associate a class of symbolic systems, the circular systems. In [5], it is shown that the circular systems coincide exactly with the ergodic transformations realizable as diffeomorphisms of the torus using the untwisted method of Approximation-by-Conjugacy, due to Anosov and Katok.
We can make two categories by taking the objects to be these two classes of systems and by taking morphisms to be factor maps (or more generally joinings) that preserve the underlying timing structure. The main theorem of this paper says that these two categories are isomorphic by a map that takes measure-isomorphisms to measure-isomorphisms, weakly mixing extensions to weakly mixing extensions and compact extensions to compact extensions. It follows that it takes distal towers to distal towers. Moreover the map preserves the simplex of non-atomic invariant measures, takes rank one transformations to rank one transformations and much more. (This will be discussed further in the forthcoming [8].) In other words the global structure of these two categories is identical.
We can get more detail by considering systems based on a fixed odometer map and circular systems based on that odometer map and an arbitrary fast growing coefficient sequence. Doing so gives us collections of pairwise isomorphic categories that can be amalgamated to yield the statement above. The main theorem is framed in this more granular setting.
Our result might be a mere curiosity, were it not for an application which we now describe.
Foreshadowed by a remarkable early result by Feldman [4], in the late 1990's a different type of result began to appear: anti-classification results that demonstrate in a rigorous way that classification is not possible. This type of theorem requires a precise definition of what a classification is. Informally a classification is a method of determining isomorphism between transformations perhaps by computing (in a liberal sense) other invariants for which equivalence is easy to determine.
The key words here are method and computing. For negative theorems, the more liberal a notion one takes the stronger the theorem. One natural notion is the Borel/non-Borel distinction. Saying that a set X or a function f is Borel is a loose way of saying that membership in X or the computation of f can be done using a countable (possibly transfinite) protocol whose basic input is membership in open sets. Saying that X or f is not Borel amounts to saying that determining membership in X or computing f cannot be done with any countable amount of resources.
In the context of classification problems, saying that an equivalence relation E on a space X is not Borel is saying that there is no countable amount of information and no countable transfinite protocol for determining, for arbitrary x, y ∈ X whether xEy. Any such method must inherently use uncountable resources. 1 In considering the isomorphism relation as a collection I of pairs (S, T ) of measure preserving transformations, Hjorth showed that I is not a Borel set. However the pairs of transformations he used to demonstrate this were inherently non-ergodic 2 , leaving open the essential problem: Is isomorphism of ergodic measure preserving transformations Borel?
This question was answered by Foreman, Rudolph and Weiss in [6], where they gave a negative answer. This answer can be interpreted as saying that determining isomorphism between ergodic transformations is inaccessible to countable methods that use countable amounts of information.
In the same foundational paper from 1932 where von Neumann formulated the isomorphism problem he expressed the likelihood that any abstract measure preserving transformation is isomorphic to a continuous measure preserving transformation and perhaps even to a differentiable one. This brief remark eventually gave rise to one of the outstanding problems in smooth dynamics, namely: Does every ergodic MPT have a smooth model? By a smooth model is meant an isomorphic copy of the transformation which is given by a smooth diffeomorphism of a compact manifold preserving a measure equivalent to the volume element. Soon after entropy was introduced, A. G. Kushnirenko showed that such a diffeomorphism must have finite entropy, and up to now this is the only restriction that is known. This paper is the second in a series of papers whose original purpose was to show that the variety of ergodic transformations that have smooth models is rich enough so that the abstract isomorphism relation, when restricted to these smooth systems, is as complicated as it is in general. We show this to be the case even when restricting to diffeomorphisms of the 2-torus that preserve Lebesgue measure. In the third paper we will complete the proof of the following theorem: Theorem (Anti-classification of Diffeomorphisms). If M is either the torus T 2 , the disk D or the annulus then the measure-isomorphism relation among pairs (S, T ) of measure preserving C ∞ -diffeomorphisms of M is not a Borel set with respect to the C ∞ -topology.
It was natural for us to try to adapt our earlier work to establish this result. However we were faced at first with the following difficulty. The transformations built in [6] were based on odometers (in the sense that the Kronecker factor was an odometer). It is a well known open problem whether it is possible to have any smooth transformation on a compact manifold that has a non-trivial odometer factor. Thus proving the anti-classification theorem in the smooth context required constructing a different collection of hard-to-classify transformations and then showing that this collection could be realized smoothly. This is our application of the main result of this paper.
The paper ( [5]) constructed a new collection of systems, the Circular Systems, which are defined as symbolic systems constructed using the Circular Operator, a formal operation on words. The main result in [5] has as a consequence that uniform circular systems can be realized as smooth models using the method developed by Anosov and Katok. The primary theorem of this paper allows us to transfer the general isomorphism structure for odometer based systems to the isomorphism structure for circular systems, at least up to automorphisms of the underlying odometer or rotation. Namely there remains the issue of preserving the timing mechanism. In the forthcoming [7] it is shown how to construct odometers so that for the resulting circular systems, up to a small correction factor, all isomorphisms preserve the underlying timing structure. This allows us to conclude the proof of the anti-classification theorem for diffeomorphisms.
Here is a more concrete description of the results in the paper. In the present paper we are concerned with the entire class OB of systems based on a fixed odometer and the relations between them. The odometer is determined by a sequence of positive integers greater than one, k_n : n ∈ N. The circular operator is determined by an additional sequence of integers l_n : n ∈ N. For this paper, the sequence of l_n's can be arbitrary subject to the requirement that Σ_n 1/l_n < ∞. However for realizing circular systems as diffeomorphisms there is a fixed growth rate, determined by the size of the alphabet of the odometer based system and k_n : n ∈ N, that the sequence of l_n's must eventually exceed.
We describe OB symbolically here, but show in a forthcoming paper that OB consists of representations of arbitrary ergodic systems with finite entropy that have the specific odometer as a factor. In the language of "cutting and stacking" constructions these are those constructions where no spacers are introduced. We fix l_n : n ∈ N, and hence a sequence of circular operators. Applying these to each of the elements of OB we obtain a second class, CB, of circular systems. This class consists of some of the extensions of a fixed irrational rotation which is determined by the circular operator. As remarked above, for suitably chosen coefficient sequences, this class can be characterized as those transformations realizable as diffeomorphisms using the Anosov-Katok technique. We consider the two classes as categories where the morphisms are graph joinings which are either the identity on the base or reverse it. These are called synchronous and anti-synchronous joinings respectively. Our main theorem then takes the form: Theorem 1. For a fixed circular coefficient sequence k_n, l_n : n ∈ N the categories OB and CB are isomorphic by a functor F that takes synchronous joinings to synchronous joinings, anti-synchronous joinings to anti-synchronous joinings, isomorphisms to isomorphisms and weakly mixing extensions to weakly mixing extensions. 3 It is natural to extend the collections of morphisms of OB and CB to general synchronous and anti-synchronous joinings. Because the ergodic joinings are not closed under composition, in extending Theorem 1 one is forced to consider at least some non-ergodic joinings. At the end of the paper we discuss how to extend Theorem 1 to expanded categories that have as morphisms arbitrary synchronous and anti-synchronous joinings. This involves expanding our analysis of generic sequences to non-ergodic joinings. We also describe some detailed analysis of the combinatorics behind the isomorphism F.
We have provided a detailed table of contents which enumerates the contents of the paper. Here is a brief summary. Much of the section following this one is standard, with the exception of §2.6, which exposes generic sequences for transformations and extends that notion to joinings. In §3, the reader will find an explanation of our two categories and a proof that circular systems contain a canonical rotation factor. Section 4 is primarily concerned with defining a map that is a symbolic analogue of complex conjugation on the unit circle. In sections 5 and 6 the mapping F is defined on morphisms, while §7 contains the proof of the main result. In §8 there is a more detailed analysis of the dynamical properties of our mapping F which may prove useful in the future, and in the final section we collect some problems that are left open.

Acknowledgements
This work was inspired by the pioneering work of our co-author Dan Rudolph, who passed away before this portion of the grand project was undertaken. We owe an inestimable debt to J.P. Thouvenot who suggested using the Anosov-Katok technique to produce our badly behaved transformations rather than directly attacking the "odometer obstacle." We would like to thank E. Glasner for showing that F preserves compact extensions. Finally the first author would like to thank Christian Rosendal for asking very useful questions about how general our results were.

Preliminaries
This section establishes some of the conventions we follow in this paper. There are many sources of background information on this including any standard text or [20], [15]. A small portion of the material in this section was presented in [5], but is repeated here in an attempt to be self-contained. The reader is referred to [5] for any missing definitions.

Measure Spaces
We will call separable non-atomic probability spaces measure spaces and denote them (X, B, µ) where B is the Boolean algebra of measurable subsets of X and µ is a countably additive, non-atomic measure defined on B. 4 We will often identify two members of B that differ by a set of µ-measure 0 and seldom distinguish between B and the σ-algebra of classes of measurable sets modulo measure zero unless we are making a pointwise definition and need to claim it is well defined on equivalence classes.
We will frequently use without explicit mention the Maharam-von Neumann result that every standard measure space is isomorphic to ([0, 1], B, λ) where λ is Lebesgue measure and B is the algebra of Lebesgue measurable sets.
If (X, B, µ) and (Y, C, ν) are measure spaces, an isomorphism between X and Y is a bijection φ : X → Y such that φ is measure preserving and both φ and φ −1 are measurable. We will ignore sets of measure zero when discussing isomorphisms; i.e. we allow the domain and range of φ to be subsets of X and Y (resp.) of measure one. A measure preserving system is an object (X, B, µ, T ) where T : X → X is a measure isomorphism. A factor map between two measure preserving systems (X, B, µ, T ) and (Y, C, ν, S) is a measurable, measure preserving function φ : X → Y such that S • φ = φ • T . A factor map is an isomorphism or conjugacy between systems iff φ is a measure isomorphism. Following common practice, we will use the word conjugacy interchangeably with isomorphism in this context.
For a fixed measure space (X, µ) we can consider the collection of measure preserving transformations T : X → X. These form a group that can be endowed with a Polish topology whose basic open sets are described as follows. We fix a finite measurable partition P of X and an ε > 0, and take as a neighborhood of T the collection of measure preserving S such that Σ_{A ∈ P} µ(TA Δ SA) < ε. Details about this topology can be found in many sources including [12], [20].

Joinings
We remind the readers of the definitions. Extensive treatments of joinings can be found in [11] or [16]. All of the definitions and basic results about joinings necessary for this paper occur in Chapter 6 of the latter reference.

Definition 2.
A joining between two measure preserving systems (X, B, µ, T) and (Y, C, ν, S) is a measure ρ on X × Y defined on the product σ-algebra B ⊗ C such that ρ(B × Y) = µ(B) and ρ(X × C) = ν(C) for all B ∈ B and C ∈ C, and ρ is invariant under T × S. The graphs of factor maps provide natural examples of joinings. We characterize these with a definition.

Definition 3.
A joining ρ is a graph joining between X and Y if and only if for all C ∈ C and all ε > 0, there is a B ∈ B such that
ρ((B × Y) Δ (X × C)) < ε.    (1)
A joining ρ between (X, B, µ, T) and (Y, C, ν, S) is an invertible graph joining if and only if for all B ∈ B and all ε > 0 there is a C ∈ C such that ρ((B × Y) Δ (X × C)) < ε, and vice versa: for all C ∈ C and ε > 0, there is a B ∈ B such that equation (1) holds.
Here are some standard facts (see [11]): Proposition 4. Let X = (X, B, µ, T) and Y = (Y, C, ν, S). Then 1. There is a canonical one-to-one correspondence between the collection of graph joinings of X and Y and the collection of factor maps from X to Y. A graph joining concentrates on the graph of the factor map. We can represent the graph joining corresponding to a measure preserving map φ : X → Y by ρ_φ(B × C) = µ(B ∩ φ⁻¹(C)). 2. There is a canonical one-to-one correspondence between the collection of invertible graph joinings of X and Y and the collection of conjugacies between X and Y.
3. Suppose that B₀ ⊆ B and C₀ ⊆ C are Boolean algebras that generate B and C respectively as σ-algebras. Let ρ be a joining of X with Y such that for all ε > 0 and all C ∈ C₀ there are B_1, . . . , B_n ∈ B₀ such that ρ((⋃_i B_i × Y) Δ (X × C)) < ε. Then ρ is a graph joining.
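As a toy illustration of item 1 of the proposition (the spaces, map and measures below are hypothetical finite examples, not part of the paper's setting), the graph joining determined by a measure preserving map can be tabulated directly:

```python
from fractions import Fraction

# Toy finite spaces: X = {0,1,2,3} with uniform measure, Y = {0,1}.
mu = {x: Fraction(1, 4) for x in range(4)}
phi = {0: 0, 1: 1, 2: 0, 3: 1}          # a measure preserving map X -> Y
nu = {y: Fraction(1, 2) for y in range(2)}

# The graph joining rho_phi(B x C) = mu(B intersect phi^{-1}(C)),
# written as a measure on the points of X x Y.
rho = {(x, y): (mu[x] if phi[x] == y else Fraction(0))
       for x in range(4) for y in range(2)}

# Check the two marginal conditions that make rho a joining.
assert all(sum(rho[(x, y)] for y in range(2)) == mu[x] for x in range(4))
assert all(sum(rho[(x, y)] for x in range(4)) == nu[y] for y in range(2))

# rho concentrates on the graph {(x, phi(x)) : x in X}.
assert all(rho[(x, y)] == 0 for x in range(4) for y in range(2) if y != phi[x])
```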
We note that perhaps a more proper term for an invertible graph joining is the earlier usage diagonal joining. In view of the results of this section we will often be careless and say that ρ is a factor map or ρ is a conjugacy/isomorphism to mean that ρ is a graph joining or ρ is an invertible graph joining.
To each joining ρ of X and Y we can associate its adjoint ρ*, the joining of Y with X defined for B ∈ B and C ∈ C by ρ*(C × B) = ρ(B × C). If ρ is a graph joining corresponding to a factor map π : X → Y, then ρ* concentrates on {(y, x) : π(x) = y}.
The following is immediate: Proposition 5. ρ is an invertible graph joining if and only if both ρ and ρ * are graph joinings.
Thus we can apply Proposition 4, item 3 to both ρ and ρ * to get a criterion for being the joining associated with a conjugacy.
A potential source of confusion. Proposition 4 allows us to identify graph joinings with factor maps and invertible graph joinings with conjugacies. These joinings are always ergodic as joinings. However, there are non-ergodic conjugacies between ergodic measure preserving transformations. More explicitly: there are ergodic systems (X, T ) and (X, S) and non-ergodic isomorphisms φ : (X, T ) → (X, S). 5 The associated joining ρ φ is, however, ergodic as a T × S-invariant measure.
Let (X, µ), (Y, ν) and (Z, µ') be measure spaces and π_X : X → Y and π_Z : Z → Y be factor maps. We can define a canonical joining of X and Z that reflects the factor structure as follows. We let {µ_y : y ∈ Y} and {µ'_y : y ∈ Y} be the disintegrations of X and Z over Y respectively. The relatively independent joining of X and Z over Y is the joining ρ = ∫_Y (µ_y × µ'_y) dν(y).
We will sometimes write this as X ×_Y Z. 5 The second author has given examples of isomorphic ergodic transformations where every conjugacy is non-ergodic.
We will be concerned with categories of measure preserving systems where the morphisms are joinings. For this we must describe the composition operation. Suppose we are given joinings ρ_XY between X and Y and ρ_YZ between Y and Z. Then (Y, ν) is a common factor of both (X × Y, ρ_XY) and (Y × Z, ρ_YZ) and we can consider the relatively independent joining of (X × Y, ρ_XY) and (Y × Z, ρ_YZ) over Y. We define the composition of ρ_XY and ρ_YZ to be the projection of this relatively independent joining to a measure on X × Z. Formally, if A ⊆ X × Z and ρ is the relatively independent joining, then: (ρ_XY ∘ ρ_YZ)(A) = ρ({((x, y), (y', z)) : (x, z) ∈ A}). Example 6. Suppose that π_0 : X → Y and π_1 : Y → Z are factor maps. If ρ_XY is the joining associated with π_0 and ρ_YZ is the joining associated with π_1, then (ρ*_YZ ∘ ρ*_XY)* is the joining associated with the factor map π_1 ∘ π_0 : X → Z. 6 The following are standard facts (e.g. in §6.2 of [11]): 1. The operation of composition of joinings is associative: if ρ_1, ρ_2 and ρ_3 are joinings, then (ρ_1 ∘ ρ_2) ∘ ρ_3 = ρ_1 ∘ (ρ_2 ∘ ρ_3). 2. Composition commutes with projections. Suppose that π_X : X → X' and π_Z : Z → Z' are factor maps. Let ρ_1 and ρ_2 be joinings of X, Y and Y, Z respectively. Let ρ_1^π be the projection of ρ_1 to a joining of X' and Y via π_X × id and ρ_2^π be defined similarly. Finally let (ρ_1 ∘ ρ_2)^π be the projection of the composition of ρ_1 and ρ_2 to a joining of X' with Z'. Then: (ρ_1 ∘ ρ_2)^π = ρ_1^π ∘ ρ_2^π.
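On finite spaces the composition operation can be sketched concretely: a joining is a probability array with prescribed marginals, and relative independence over Y becomes a sum normalized by ν. All spaces and joinings below are hypothetical toy data, not the paper's systems:

```python
from fractions import Fraction

# Hypothetical finite spaces X, Y, Z, each {0,1} with uniform measure nu on Y.
nu = {y: Fraction(1, 2) for y in (0, 1)}

# Two joinings given as point masses rho[(a, b)].
rho_xy = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2),
          (0, 1): Fraction(0), (1, 0): Fraction(0)}          # graph of identity
rho_yz = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
          (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}    # product joining

# Composition: project the relatively independent joining over Y to X x Z.
# Pointwise this reads: comp(x, z) = sum_y rho_xy(x, y) * rho_yz(y, z) / nu(y).
comp = {(x, z): sum(rho_xy[(x, y)] * rho_yz[(y, z)] / nu[y] for y in (0, 1))
        for x in (0, 1) for z in (0, 1)}

# The result is again a joining: both marginals are uniform.
assert all(sum(comp[(x, z)] for z in (0, 1)) == Fraction(1, 2) for x in (0, 1))
assert all(sum(comp[(x, z)] for x in (0, 1)) == Fraction(1, 2) for z in (0, 1))
```

Composing the identity graph joining with the product joining yields the product joining again, as one would expect.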

Symbolic Systems
Let Σ be a countable or finite alphabet endowed with the discrete topology. Then Σ Z can be given the product topology, which makes it into a separable, totally disconnected space that is compact if Σ is finite.
Notation: If u = σ_0, . . . , σ_{n−1} ∈ Σ^{<∞} is a finite sequence of elements of Σ, then we denote by ⟨u⟩_k the cylinder set based at k in Σ^Z. If k = 0 we abbreviate this and write ⟨u⟩. Explicitly: ⟨u⟩_k = {f ∈ Σ^Z : f(k + i) = σ_i for all 0 ≤ i < n}. The collection of cylinder sets forms a base for the product topology on Σ^Z.
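A minimal sketch of cylinder-set membership, with a hypothetical periodic point over a two-letter alphabet (bi-infinite words are modeled as functions from Z to the alphabet):

```python
# A bi-infinite word is a function from Z to the alphabet; here a periodic
# example over the illustrative alphabet {'a', 'b'}.
def x(n):
    return 'ab'[n % 2]

def in_cylinder(x, u, k):
    """Membership of x in <u>_k: x(k + i) == u[i] for all i < |u|."""
    return all(x(k + i) == u[i] for i in range(len(u)))

assert in_cylinder(x, 'abab', 0)      # x reads 'abab' starting at 0
assert in_cylinder(x, 'ba', 1)        # and 'ba' starting at 1
assert not in_cylinder(x, 'aa', 0)    # but never 'aa'
```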
Notation: For a word w ∈ Σ^{<N} we will write |w| for the length of w. We will write 1_w for the characteristic function of the cylinder ⟨w⟩_0 in Σ^Z.
The shift map sh : Σ^Z → Σ^Z, defined by setting sh(f)(n) = f(n + 1), is a homeomorphism. If µ is a shift invariant Borel measure then the resulting measure preserving system (Σ^Z, B, µ, sh) is called a symbolic system. The closed support of µ is a shift invariant closed subset of Σ^Z called a symbolic shift or sub-shift. Symbolic shifts are often described intrinsically by giving a collection of words that constitute a clopen basis for the support of an invariant measure. Fix a language Σ and a sequence of collections of words W_n : n ∈ N with the properties that:
1. for each n, all of the words in W_n have the same length q_n;
2. each w ∈ W_n occurs at least once as a subword of each w' ∈ W_{n+1};
3. there is a summable sequence ε_n : n ∈ N of positive numbers such that for each n, every word w ∈ W_{n+1} can be uniquely parsed into segments
u_0 w_0 u_1 w_1 . . . w_l u_{l+1}    (2)
such that each w_i ∈ W_n, each u_i ∈ Σ^{<N}, and for this parsing Σ_i |u_i| / q_{n+1} < ε_{n+1}.
The segments u_i in equation (2) are called the spacer or boundary portions of w.
Definition 8. A sequence W n : n ∈ N satisfying properties 1.)-3.) will be called a construction sequence.
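The first two conditions on a construction sequence can be illustrated with a toy "no spacer" sequence, in the spirit of the odometer based constructions described in the introduction. The words and the permutation scheme below are illustrative only; condition 3 (the spacer bound) is vacuous here since no spacers are inserted, and unique readability is not claimed for this toy data:

```python
from itertools import permutations

# Build W_{n+1} as concatenations of the words of W_n, arranged so that
# every W_n word occurs in every W_{n+1} word (hypothetical example).
def next_stage(W_n):
    return [''.join(p) for p in permutations(W_n)]

W = [['ab', 'ba']]                 # stage 0: words of equal length q_0 = 2
for _ in range(2):
    W.append(next_stage(W[-1]))

# Property 1: all words at stage n share a common length q_n.
for W_n in W:
    assert len({len(w) for w in W_n}) == 1

# Property 2: every stage-n word occurs in every stage-(n+1) word.
for W_n, W_next in zip(W, W[1:]):
    assert all(w in v for w in W_n for v in W_next)
```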
Associated with a construction sequence is a symbolic shift defined as follows. Let K be the collection of x ∈ Σ Z such that every finite contiguous subword of x occurs inside some w ∈ W n . Then K is a closed shift invariant subset of Σ Z that is compact if Σ is finite. 7 Formally, we have constructed a symbolic shift. To get a measure preserving system we find a shift invariant measure µ concentrating on K and write (K, µ). In [5] we define the notion of a uniform construction sequence and show that the resulting K are uniquely ergodic.
We want to be able to unambiguously parse elements of K. For this we will use construction sequences consisting of uniquely readable words.
Definition 9. Let Σ be a language and W be a collection of finite words in Σ. Then W is uniquely readable iff whenever u, v, w ∈ W and uv = pws then either p or s is the empty word.
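For a finite collection of words, Definition 9 can be checked by brute force: look for an occurrence of some w ∈ W strictly inside a concatenation uv. The word collections below are hypothetical:

```python
def uniquely_readable(W):
    """Brute force check of Definition 9: whenever u, v, w are in W and
    uv = p w s, either p or s must be the empty word."""
    for u in W:
        for v in W:
            uv = u + v
            for w in W:
                for i in range(len(uv) - len(w) + 1):
                    # An occurrence with 0 < i and i + |w| < |uv| is interior,
                    # i.e. both p and s are non-empty: a violation.
                    if uv[i:i + len(w)] == w and 0 < i and i + len(w) < len(uv):
                        return False
    return True

assert uniquely_readable(['aab', 'abb'])
# 'ab' + 'ab' = a + 'ba' + b, an interior occurrence of 'ba':
assert not uniquely_readable(['ab', 'ba'])
```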
In our constructions we will restrict our measures to a natural set: Definition 10. Suppose that W_n : n ∈ N is a construction sequence for a symbolic system K with each W_n uniquely readable. Let S be the collection of x ∈ K such that there are sequences of natural numbers a_m : m ∈ N, b_m : m ∈ N going to infinity such that for all m there is an n with x ↾ [−a_m, b_m) ∈ W_n.
Note that S is a dense shift invariant G_δ set. The following lemma is routine: Lemma 11. Fix a construction sequence W_n : n ∈ N for a symbolic system K in a finite language. Then: 1. K is the smallest shift invariant closed subset of Σ^Z such that for all n and all w ∈ W_n, K has non-empty intersection with the basic open set ⟨w⟩ ⊆ Σ^Z.
2. If there is a unique shift invariant measure ν concentrating on S ⊆ K, then ν is ergodic.
Item 1 is clear from the definitions. If X is a Polish space, T : X → X is a Borel automorphism and D is a T -invariant Borel set with a unique T -invariant measure on D, then that measure must be ergodic.
Let W n : n ∈ N be a uniquely readable construction sequence, and s ∈ S. By the unique readability, for each n either s(0) lies in a well-defined subword of s belonging to W n or in a spacer of a subword of s belonging to some W n+k .
Lemma 12. Suppose that K is built from W_n : n ∈ N and ν is a shift invariant measure on K concentrating on S. Then for ν-almost every s there is an N such that for all n > N there are a_n ≤ 0 < b_n with s ↾ [a_n, b_n) ∈ W_n.
Let B_n be the collection of s ∈ S such that for some a_n ≤ 0 < b_n, s ↾ [a_n, b_n) ∈ W_n but s(0) is in a boundary portion of s ↾ [a_n, b_n). By the Ergodic Theorem and clause 3.) of the definition of a construction sequence, Σ_n ν(B_n) < ∞. It follows from the Borel-Cantelli Lemma that for almost all s there is an N such that for all n ≥ N, s ∉ B_n. Fix an s ∈ S and such an N. From the definition of S there are arbitrarily large n* > N and a_{n*} ≤ 0 < b_{n*} such that s ↾ [a_{n*}, b_{n*}) ∈ W_{n*}. Using backwards induction from n* to N and the definition of B_n, this also holds for all n ∈ [N, n*).

Locations
By Lemma 12, for ν-almost all s and for all large enough n there is a unique k with 0 ≤ k < q_n such that s ↾ [−k, q_n − k) ∈ W_n. Definition 13. Let s ∈ S and suppose that for some 0 ≤ k < q_n, s ↾ [−k, q_n − k) ∈ W_n. We define r_n(s) to be the unique k with this property. We will call the interval [−k, q_n − k) the principal n-block of s, and s ↾ [−k, q_n − k) its principal n-subword. The sequence of r_n's will be called the location sequence of s.
We interpret r n (s) = k as saying that s(0) is the k th symbol in the principal n-subword of s containing 0. We can view the principal n-subword of s as being located on an interval I inside the principal n + 1-subword. Counting from the beginning of the principal n + 1-subword, the r n+1 (s) position is located at the r n (s) position in I.
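A small numerical sketch of locations, under simplifying assumptions that are not the paper's general setting: hypothetical lengths q_n with q_n dividing q_{n+1}, and no spacers, so that principal n-blocks simply tile each stage above them:

```python
# Hypothetical block lengths q_0 = 2, q_1 = 4, q_2 = 8 (each dividing the
# next), modeling a stretch of s as one stage-2 block with the origin of s
# placed at absolute index pos inside it.
q = [2, 4, 8]
pos = 5                        # s(0) sits at index 5 of the stage-2 block

# With no spacers, the principal n-block containing the origin starts at
# the largest multiple of q_n below pos, so r_n(s) = pos mod q_n.
r = [pos % qn for qn in q]
assert r == [1, 1, 5]

# Remark 14 in this tiling picture: since q_n divides q_{n+1}, the
# r_{n+1}-th position of the principal (n+1)-block lands at the r_n-th
# position inside its principal n-subblock.
for n in range(2):
    assert r[n + 1] % q[n] == r[n]
```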

Remark 14.
Suppose that s ∈ S has a principal n-block for all n ≥ N . Let N ≤ n < m. It follows immediately from the definitions that r n (s) and r m (s) are well defined and the r m (s) th position of the principal m-block of s is in the r n (s) th position inside the principal n-block of s.
The next lemma tells us that an element s of S is determined by knowing any tail of the sequence r_n(s) : n ≥ N together with a tail of the sequence of principal subwords of s.
Lemma 15. Suppose that s, s' ∈ S, r_n(s) : n ≥ N = r_n(s') : n ≥ N , and for all n ≥ N, s and s' have the same principal n-subwords. Then s = s'.
Since s, s' ∈ S there are sequences a_n, a'_n, b_n, b'_n : n ≥ N tending to infinity such that s ↾ [−a_n, b_n) ∈ W_n and s' ↾ [−a'_n, b'_n) ∈ W_n. Since r_n(s) = r_n(s') we know that a_n = a'_n and b_n = b'_n. Since s and s' have the same principal subwords, s ↾ [−a_n, b_n) = s' ↾ [−a_n, b_n). The lemma follows.
Remark 16. We record some consequences of Lemma 15: 1. Suppose that we are given a sequence u_n : M ≤ n with u_n ∈ W_n. If we specify which occurrence of u_n in u_{n+1} is the principal occurrence, and the distances of the principal occurrences to the beginnings of the u_{n+1} go to infinity, then u_n : M ≤ n determines an s ∈ S ⊆ K completely, up to a shift k with |k| ≤ q_M.
2. A sequence r_n : N ≤ n and a sequence of words w_n ∈ W_n come from an infinite word s ∈ S if both r_n and q_n − r_n go to infinity and the r_{n+1} position in w_{n+1} is in the r_n position in a subword of w_{n+1} identical to w_n.
Caveat: just because r n : N ≤ n is the location sequence of some s ∈ S and w n : N ≤ n is the sequence of principal subwords of some s ∈ S, it does not follow that there is an x ∈ S with location sequence r n : N ≤ n and sequence of subwords w n : N ≤ n .
3. If x, y ∈ S have the same principal n-subwords and r n (y) = r n (x) + 1 for all large enough n, then y = sh(x).

A note on inverses of symbolic shifts
We define operators, labeled rev(), and apply them in several contexts. Definition 17. If x is in K, we define the reverse of x by setting rev(x)(k) = x(−k). For A ⊆ K, define rev(A) = {rev(x) : x ∈ A}. If w is a word, we define rev(w) to be the reverse of w. If we are viewing w as sitting on an interval, we take rev(w) to sit on the same interval. Similarly, if W is a collection of words, rev(W) is the collection of reverses of the words in W.
If (K, sh) is an arbitrary symbolic shift then its inverse is (K, sh −1 ). It will be convenient to have all of our shifts go in the same direction, thus: Proposition 18. The map φ sending x to rev(x) is a canonical isomorphism between (K, sh −1 ) and (rev(K), sh). We will use the notation L −1 for the system (L, sh −1 ) and rev(L) for the system (rev(L), sh).
We will also use the following remark.

Remark 19.
Assume that there is a unique non-atomic measure ν on a shift invariant set S ⊆ K. Then there is also a unique non-atomic shift invariant measure on rev(S), and for this measure, which we denote ν⁻¹, we have ν(⟨w⟩) = ν⁻¹(⟨rev(w)⟩).

Generic points and sequences
Let T be a measure preserving transformation from (X, µ) to (X, µ), where X is a compact metric space. Let C(X) be the space of continuous real valued functions on X. Then a point x ∈ X is generic for T if and only if for all f ∈ C(X), lim_{N→∞} (1/N) Σ_{n=0}^{N−1} f(T^n x) = ∫ f dµ. The Ergodic Theorem tells us that for a given f and ergodic T the equation above holds for a set of µ-measure one. Intersecting over a countable dense set of f's gives a set of µ-measure one of generic points. For symbolic systems K ⊆ Σ^Z we can describe the generic points x as those x such that for every finite word u, the µ-measure of the basic open set ⟨u⟩_0 is equal to the density of the set of k such that u occurs in x at k.
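The density description of genericity can be sketched on a periodic toy example (the alphabet and the point below are hypothetical, chosen so that the orbit closure is uniquely ergodic and the frequencies are exact):

```python
from fractions import Fraction

# A long window of the periodic orbit of the word 'abb'.
x = 'abb' * 1000

def density(u, x, n):
    """Proportion of k < n at which u occurs in x starting at position k."""
    return Fraction(sum(x[k:k + len(u)] == u for k in range(n)), n)

# For the orbit closure of a periodic point, the unique invariant measure
# gives each cylinder the frequency of the pattern: mu(<a>) = 1/3 here.
assert density('a', x, 2997) == Fraction(1, 3)
assert density('bb', x, 2997) == Fraction(1, 3)
```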
The symbolic systems we consider will be built from construction sequences and are characterized by the limiting properties of finite information. We now describe how this works in greater detail. A more complete discussion of this can be found in [21].
Let µ be a shift invariant measure on a symbolic system K defined by a uniquely readable construction sequence W_n : n ∈ N in a finite language Σ. Assume that q_n is the length of the words in W_n. By µ_m we will denote the discrete measure on the finite set Σ^m given by µ_m(u) = µ(⟨u⟩). By μ̂_n(w) we will denote the discrete probability measure on W_n defined by
μ̂_n(w) = µ(⟨w⟩) / Σ_{w' ∈ W_n} µ(⟨w'⟩).
Thus μ̂_n(w) is the relative measure of ⟨w⟩ among all ⟨w'⟩, w' ∈ W_n. The denominator is a normalizing constant to account for spacers at stages m > n and for shifts of size less than q_n.
Explicitly, if A n = {s ∈ K : s(0) is the start of a word in W n }, then the sets {sh j (A n )} qn−1 j=0 are disjoint and their union has a measure that tends to one as n grows to infinity. The set A n is partitioned into |W n | many sets by the words w ∈ W n andμ n gives their relative size in A n . Since the measure of an arbitrary finite cylinder set can be calculated along the individual columns represented by a fixed w, it is clear that theμ n (w) determine uniquely the measure µ.
Using the unique readability of words in W_k, a word w ∈ Σ^{q_{k+l}} determines a unique sequence of words w_j ∈ W_k such that
w = u_0 w_0 u_1 w_1 . . . w_J u_{J+1},
where the u_j are elements of Σ^{<N}. When w ∈ W_{k+l}, each u_j is in the region of spacers added at some stage k + l' with l' ≤ l. We will denote the empirical distribution of W_k-words in w by EmpDist_k(w). Formally, for u ∈ W_k:
EmpDist_k(w)(u) = |{j : w_j = u}| / (J + 1).
Then EmpDist_k(w) extends to a measure on P(W_k) in the obvious way.
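A sketch of computing EmpDist_k on an illustrative word collection. The greedy left-to-right parse below is a simplification adequate for this toy data, not a general unique-readability parser:

```python
from fractions import Fraction

# Illustrative W_k; the letter 'b' plays the role of spacer material.
W_k = ['aab', 'abb']

def emp_dist(w, W):
    """Tabulate the proportion of each W-block among the blocks parsed
    out of w, skipping letters that start no block (spacers)."""
    counts, i, L = {}, 0, len(W[0])
    while i + L <= len(w):
        block = w[i:i + L]
        if block in W:                     # a W_k word starts here
            counts[block] = counts.get(block, 0) + 1
            i += L
        else:                              # inside a spacer: advance a letter
            i += 1
    total = sum(counts.values())
    return {u: Fraction(c, total) for u, c in counts.items()}

dist = emp_dist('aab' + 'b' + 'abb' + 'b' + 'aab', W_k)
assert dist == {'aab': Fraction(2, 3), 'abb': Fraction(1, 3)}
```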
To finitize the idea of a generic point in K we introduce the notion of a generic sequence of words.
Definition 20. A sequence $\langle v_n\in W_n : n\in\mathbb N\rangle$ is a generic sequence of words if and only if for all k and $\epsilon>0$ there is an N such that for all m, n > N,
$$\|\mathrm{EmpDist}_k(v_m)-\mathrm{EmpDist}_k(v_n)\|_{var}<\epsilon.$$
The sequence is generic for a measure µ if for all k,
$$\lim_{n\to\infty}\|\mathrm{EmpDist}_k(v_n)-\hat\mu_k\|_{var}=0,$$
where $\|\cdot\|_{var}$ is the variation norm on probability distributions.
It follows that if $\langle v_n : n\in\mathbb N\rangle$ is a generic sequence of words then it is generic for a unique measure µ. Even though Definition 20 involves only the measures $\hat\mu_k$, it is easy to see (using the Ergodic Theorem) that for any k and $u\in\Sigma^k$, if $\langle v_n : n\in\mathbb N\rangle$ is generic then the density of the occurrences of u in the $v_n$ will converge to $\mu(\langle u\rangle)$.
We can summarize the exact relationship between the empirical distributions and the $\mu_{q_k}$ by saying that the empirical distribution is the proportion of occurrences of $w\in W_k$ among the k-words that appear in $v_n$, whereas $\mu_{q_k}$ is approximately the density of the locations of the start of k-words in $v_n$. Letting $u\in W_k$, d be the density of the positions where an occurrence of u begins in $v_n$, and $d_s$ be the density of locations of letters in some spacer $u_i$, we see that these are related by:
$$\mathrm{EmpDist}_k(v_n)(u)=\frac{d\,q_k}{1-d_s}.$$
We record the following consequence of the Ergodic Theorem for future reference:

Proposition 21. Let K be an ergodic symbolic system with construction sequence $\langle W_n : n\in\mathbb N\rangle$ and measure µ. Then for any generic s the sequence of principal subwords of s, $\langle w_n : n\in\mathbb N\rangle$, is generic for µ. In particular, generic sequences for µ exist.
We will need a characterization of when a generic sequence of words w n : n ∈ N determines an ergodic measure.

Definition 22. A sequence $\langle v_n : n\in\mathbb N\rangle$ with $v_n\in W_n$ is an ergodic sequence if for any k and $\epsilon>0$ there are $n_0>k$ and $m_0$ such that for all $m>m_0$, if
$$v_m=u_0w_0u_1w_1\cdots w_Ju_{J+1}$$
is the parsing of $v_m$ into $W_{n_0}$-words $w_j$ and spacers $u_i$, then there is a subset $I\subseteq\{0,1,2,\dots,J\}$ with $|I|/J>1-\epsilon$ such that for all $j,j'\in I$,
$$\|\mathrm{EmpDist}_k(w_j)-\mathrm{EmpDist}_k(w_{j'})\|_{var}<\epsilon.\qquad(3)$$

Notice that in the definition of an ergodic sequence $\langle v_n\rangle$ we are not assuming that it is a generic sequence for a measure. This follows easily (see Lemma 24), but we have not made it part of the definition to emphasize its finitary nature. In the next lemma we use the fact that the language Σ is finite.
Lemma 23. Any generic sequence v n : n ∈ N for an ergodic measure µ is an ergodic sequence.
Proof. Suppose we are given k and $\epsilon>0$. For all $\delta>0$ we can apply the Ergodic Theorem to find an N much bigger than $q_k$ and a set B with $\mu(B)>1-\delta$ such that for all $s\in B$ and all $w\in W_k$, the density of the occurrences of w in $s\upharpoonright[0,N)$ is within δ of $\mu_{q_k}(\langle w\rangle)$. Fix a generic point s for µ. Let $I=\{i\ge0 : T^is\in B\}$, and define an infinite sequence of disjoint intervals of length N that cover I by inductively letting $i_0=\min(I)$ and $i_{j+1}=\min(\{i\in I : i\ge i_j+N\})$. We take the intervals to be the sequence $\langle[i_j,i_j+N) : j\in\mathbb N\rangle$. Notice that the complement of these intervals in $\mathbb Z^+$ has density less than δ, since their union clearly covers I.
Though this is an infinite sequence of intervals, the fact that our language is finite implies that only finitely many distinct words of length N occur as subwords of s on these intervals. For each such word $w^*$, the density of those i in the domain of $w^*$ such that an occurrence of a $w\in W_k$ starts at i is within δ of $\mu_{q_k}(\langle w\rangle)$.⁸ Next take $n_0$ large enough that $N/q_{n_0}<\delta$, and parse s into words from $W_{n_0}$ and the sections of s corresponding to spacers in words in $W_j$ for some $j\ge n_0+1$. By taking $n_0$ large enough we can take the density of locations in s occurring in spacers to be arbitrarily small. Let $\delta'$ be this density.
The words from $W_{n_0}$ have length much larger than N, and we can collect all those words $w\in W_{n_0}$ that are $(1-\sqrt\delta)$-covered by the N-intervals we chose above into a set $A\subseteq W_{n_0}$.
The proportion of $s\upharpoonright\mathbb Z^+$ not covered by words in A can be split into the spacer section and the portion inside words w in $B=W_{n_0}\setminus A$. For $w\in B$ the complement of the N-intervals has density at least $\sqrt\delta$. It follows that the density of the sections of s covered by elements of B is less than $\sqrt\delta$. Thus the fraction of s not covered by words in A is at most $\sqrt\delta+\delta'$. It is now clear that if $\delta,\delta'$ are chosen to be sufficiently small, then all $w\in A$ will have the property that $\|\mathrm{EmpDist}_k(w)-\hat\mu_k\|_{var}<\epsilon/2$, which implies inequality 3 for pairs of words in A. Using inequality 4 and the fact that $\langle v_n\rangle$ is generic for µ gives an $m_0$ so that for all $m\ge m_0$, when $v_m$ is parsed into $W_{n_0}$-words a $(1-\epsilon)$-fraction will lie in A, and this concludes the proof.
We will also need the converse to Lemma 23, namely that the limiting measure defined by an ergodic sequence is, in fact, ergodic.
Lemma 24. An ergodic sequence is generic, and the measure µ defined by an ergodic sequence $\langle v_n : n\in\mathbb N\rangle$ is ergodic.
Proof. Inequality 3 implies that for each k and $w\in W_k$, the limit of the density of occurrences of w in $v_n$ exists as n goes to infinity. It follows (since $W_k$ is finite) that $\langle v_n : n\in\mathbb N\rangle$ is a generic sequence and hence it defines a unique measure µ.

⁸By taking $N\gg q_k$, we can account for negligible "end effects"; we ignore end effects in the rest of the proof.
The ergodicity of µ is equivalent to the fact that the ergodic averages of all $L^2$ functions converge almost everywhere to a constant. Functions of the form $\mathbb 1_{\langle w\rangle}$, where $w\in\bigcup_nW_n$, together with their shifts, linearly span a dense set in $L^2$, from which it easily follows that if µ were not ergodic there would be some k and $w\in W_k$ with $\frac1N\sum_{i=0}^{N-1}\mathbb 1_{\langle w\rangle}(T^ix)$ converging µ-a.e. to a non-constant function. This means that there is a $\gamma>0$ and disjoint sets $B_0,B_1$ of positive measure in K such that for all large enough N, for all $x_0\in B_0$ and $x_1\in B_1$,
$$\left|\frac1N\sum_{i=0}^{N-1}\mathbb 1_{\langle w\rangle}(T^ix_0)-\frac1N\sum_{i=0}^{N-1}\mathbb 1_{\langle w\rangle}(T^ix_1)\right|>\gamma.\qquad(5)$$
Take $\epsilon$ small compared to γ and $\mu(B_0),\mu(B_1)$. Find $n_0,m_0$ as in the definition of an ergodic sequence for this k and $\epsilon$. Choose N large enough that inequality 5 holds and so that $q_{n_0}/N$ is negligible. Finally take $m\ge n_0$ so that $N/q_m$ is negligible.
Inequality 5 depends only on the initial $(N+q_k)$-block of $x_0$ and $x_1$. Thus for large enough m we can compute $\mu(B_0)$ and $\mu(B_1)$ by the empirical distributions of the $(N+q_k)$-blocks in $v_m$.
Since N is large compared to $q_{n_0}$, the frequency of occurrence of w in a block of length $N+q_k$ is determined by its frequencies in the words in $W_{n_0}$ in the $n_0$-parsing of $v_m$. We now get a contradiction to inequality 5, since, except for an $\epsilon$-fraction, these $W_{n_0}$-words have their k-words distributed very close to $\hat\mu_k$.
If S and T are symbolic systems, then a joining ρ of S and T will be a symbolic system, but may not have a well-defined construction sequence, even if S and T do. Accordingly we must generalize our definition of empirical distribution to take into account the relative locations of words in typical $(s,t)\in K\times L$. We express this by shifting one of the basic open sets and considering pairs $(w,\mathrm{sh}^{s}(v))$, which we view as starting at the locations (0, s).
Let W n : n ∈ N and V n : n ∈ N be uniquely readable construction sequences for K and L in the languages Σ, Λ respectively. Assume for simplicity that all words in W n and V n have the same length.
Let $n\le n'<n+l$. Then we can uniquely parse a word $w\in W_{n+l}$ as
$$w=u_0w_0u_1w_1\cdots w_Ju_{J+1},$$
where each $w_j\in W_n$ and each $u_j$ is in the region of spacers for words in $W_{n+l'}$, $l'\le l$. The similar statement holds for $v_k\in V_{n'}$ and $v\in V_{n+l}$:
$$v=u'_0v_0u'_1v_1\cdots v_Ku'_{K+1}.$$
The definition must take into account the relative shifts of w and v; the shifts of $(w_j,v_k)$ allow spacers to occur in different places and allow for the possibility that $J\ne K$.
Let $n\le n'<n+l$ be natural numbers, $s,s'\in\mathbb Z$, $(w',v')\in W_n\times V_{n'}$ and $(w,v)\in W_{n+l}\times V_{n+l}$. Write w and v in terms of n- and n'-words as above. For s, s', define an occurrence of $(w',\mathrm{sh}^{s'}(v'))$ in $(w,\mathrm{sh}^{s}(v))$ to be a $j\le J$ such that $w_j=w'$ and, if k is the location of $w_j$ in w, then $v'$ occurs at $k+s'$ in $\mathrm{sh}^{s}(v)$. We note the bijection between occurrences of $(w',\mathrm{sh}^{s'}(v'))$ in $(w,\mathrm{sh}^{s}(v))$ and occurrences of $(v',\mathrm{sh}^{-s'}(w'))$ in $(v,\mathrm{sh}^{-s}(w))$.
In defining empirical distributions for joinings we generalize Definition 20. The empirical distribution of a shifted pair is defined to be the proportion of times it occurs, relative to the proportion of times arbitrary pairs with the same shift occur.
Definition 25. Fix w, v, s and s'. Let A be the collection
$$A=\{j : \text{for some }(w^*,v^*)\in W_n\times V_{n'},\ (w^*,\mathrm{sh}^{s'}(v^*))\text{ occurs at }j\text{ in }(w,\mathrm{sh}^{s}(v))\}.$$
Assume that $A\ne\emptyset$. For $w'\in W_n$ and $v'\in V_{n'}$, we define:
$$\mathrm{EmpDist}_{n,n',s'}(w,\mathrm{sh}^{s}(v))(w',v')=\frac{|\{j\in A : (w',\mathrm{sh}^{s'}(v'))\text{ occurs at }j\}|}{|A|}.$$
As before, $\mathrm{EmpDist}_{n,n',s'}(w,\mathrm{sh}^{s}(v))$ extends uniquely to a probability measure on $\mathcal P(W_n\times V_{n'})$. Definition 25 facilitates a notion of a generic sequence for a joining.
Definition 26. A sequence $\langle(w_n,v_n,s_n)\in W_n\times V_n\times\mathbb Z : n\in\mathbb N\rangle$ is called generic iff:

1. $\sum_n\frac{|s_n|}{q_n}<\infty$, and

2. for all $n,n',s'$ and $\epsilon>0$ there is an N such that for all $m,m'>N$,
$$\|\mathrm{EmpDist}_{n,n',s'}(w_m,\mathrm{sh}^{s_m}(v_m))-\mathrm{EmpDist}_{n,n',s'}(w_{m'},\mathrm{sh}^{s_{m'}}(v_{m'}))\|_{var}<\epsilon.$$
The definition of an ergodic sequence of pairs is done analogously.
It is easy to check that $\langle(w_n,v_n,s_n) : n\in\mathbb N\rangle$ is generic/ergodic if and only if $\langle(v_n,w_n,-s_n) : n\in\mathbb N\rangle$ is generic/ergodic. For ergodic joinings the analogues of Proposition 21 and Lemmas 23 and 24 hold and are proved in exactly the same way.
We have given these definitions in the case of a product of two symbolic shifts, but they generalize immediately to products of three or more shifts. For example, to consider three shifts with construction sequences $\langle U_n\rangle_n,\langle V_n\rangle_n,\langle W_n\rangle_n$, we would consider a sequence of the form
$$\langle(u_n,v_n,w_n,s_n,t_n) : n\in\mathbb N\rangle,$$
where the words belong to the respective construction sequences and the $s_n$'s and $t_n$'s give the shifts relative to the first coordinate.
We will be concerned with compositions of joinings, which involve products of three shifts. To prepare for this we need the notion of a conditional empirical distribution.
Definition 27. Let $n,n'<n+l$. Given a fixed $w^*\in W_n$, a pair $(w,v)\in W_{n+l}\times V_{n+l}$ and $(s,s')$, we define the conditional empirical distribution $\mathrm{EmpDist}_{n,n',s,s'}(w,\mathrm{sh}^{s}(v)\,|\,w^*)$ by restricting the count in Definition 25 to those occurrences that lie in an occurrence of $w^*$. Using the same ideas we can define the empirical distribution conditioned on a $v^*\in V_k$ by looking at $(\mathrm{sh}^{-s}(w),v)$ and counting occurrences of $(\mathrm{sh}^{-s'}(w'),v^*)$ for the $w'\in W_k$. This definition generalizes to products of three or more systems. When working in three or more systems, there will be multiple s's playing the role of s' in Definition 27. They will refer to the positions of the sequences being counted, relative to the conditioning sequence. So for example, if K, L, M have construction sequences $\langle U_n\rangle_n,\langle V_n\rangle_n,\langle W_n\rangle_n$ and $\langle(u_n,v_n,w_n,s_n,t_n) : n\in\mathbb N\rangle$ is a generic sequence for a joining ρ of K, L and M, then $\mathrm{EmpDist}_{k,k',s,s'}(u_n,\mathrm{sh}^{s_n}(v_n),\mathrm{sh}^{t_n}(w_n)\,|\,v)$ counts pairs $(\mathrm{sh}^{s}(u),\mathrm{sh}^{s'}(w))$, where $(u,w)\in U_k\times W_{k'}$ have been shifted by s and s' relative to v.
Recall from Section 2.2 that the composition of $\rho_1$ and $\rho_2$ is defined to be the projection to a measure on X × Z of the relatively independent joining of $\rho_1$ and $\rho_2$ over the common factor Y. We now describe a method for detecting generic sequences for relatively independent joinings.
Suppose that the systems X and Z have a common factor Y.
Let $\rho=X\times_YZ$ be the relatively independent joining of X and Z over Y. Let $\mu_y,\hat\mu_y,\rho_y$ be the disintegrations of $\mu,\hat\mu$ and ρ respectively. Then the relatively independent joining ρ is characterized by the fact that for ν-a.e. y,
$$\rho_y=\mu_y\times\hat\mu_y.\qquad(6)$$
Let $\langle\mathcal A_n,\tilde{\mathcal A}_n,\mathcal A'_n : n\in\mathbb N\rangle$ be sequences of refining partitions that generate $\mathcal B$, $\mathcal D$ and $\mathcal C$ respectively. Since the sequence of partitions $\mathcal A_n\times\tilde{\mathcal A}_n$ generates $\mathcal B\otimes\mathcal D$, equation 6 is equivalent to the property that for all $A_k\in\mathcal A_k$, $\tilde A_k\in\tilde{\mathcal A}_k$ and ν-a.e. y,
$$\rho_y(A_k\times\tilde A_k)=\mu_y(A_k)\,\hat\mu_y(\tilde A_k).$$
To finitize this we approximate $\mu_y(A_k)$ by $\mu(A_k|\mathcal A'_m(y))$ for large m, where $\mathcal A'_m(y)$ is the atom of $\mathcal A'_m$ to which y belongs. We let $\mu_y(\mathcal A_k)$ be shorthand for the distribution $\langle\mu_y(A_k) : A_k\in\mathcal A_k\rangle$, and $\mu(\mathcal A_k|\mathcal A'_m)(y)$ stands for the conditional distribution $\langle\mu(A_k|\mathcal A'_m(y)) : A_k\in\mathcal A_k\rangle$. (We use similar notation in Lemma 28 for the conditional distributions given by ρ, µ and $\hat\mu$ on various partitions.) By Martingale convergence, for $\epsilon>0$ and fixed k, if m is sufficiently large then, except for a collection of atoms of $\mathcal A'_m$ whose union has ν-measure less than $\epsilon$, for a $(1-\epsilon)$ proportion of the $y'$ in the same atom as y,
$$\|\mu_{y'}(\mathcal A_k)-\mu(\mathcal A_k|\mathcal A'_m)(y)\|_{var}<\epsilon.$$
One can deal similarly with $\hat\mu_{y'}$ and $\rho_{y'}$. We have shown:

Lemma 28. In the notation above, ρ is the relatively independent joining of µ and $\hat\mu$ if and only if for all k and $\epsilon>0$, for all large enough m, there is a collection of atoms $A'_m\in\mathcal A'_m$ of total measure at least $1-\epsilon$ for which:
$$\|\rho(\mathcal A_k\times\tilde{\mathcal A}_k\,|\,A'_m)-\mu(\mathcal A_k\,|\,A'_m)\times\hat\mu(\tilde{\mathcal A}_k\,|\,A'_m)\|_{var}<\epsilon.$$

We now express Lemma 28 in terms of sequences of finite words. Suppose that $\langle U_n\rangle$, $\langle V_n\rangle$ and $\langle W_n\rangle$ are the uniquely readable construction sequences for X, Y and Z.
Proposition 29. Let $\langle(u_n,v_n,w_n,s_n,t_n)\in U_n\times V_n\times W_n\times\mathbb Z^2 : n\in\mathbb N\rangle$ be a sequence of words. Suppose that:

1. $\langle(u_n,v_n,s_n)\rangle_n$ is generic for $\rho_1$,

2. $\langle(v_n,w_n,t_n)\rangle_n$ is generic for $\rho_2$.
3. for all $\epsilon>0$, k and $s^*$, for all sufficiently large $k'$ there is an N and a set $G_{k'}\subseteq V_{k'}$ with (a) $\hat\nu_{k'}(G_{k'})>1-\epsilon$, and for each $v\in G_{k'}$ a set of indices $I_v\subseteq[0,q_{k'})$ with $|I_v|>(1-\epsilon)q_{k'}$, such that for all $n>N$ and $s\in I_v$:

(b) $\|\mathrm{EmpDist}_{k,k,s,s+s^*}(u_n,\mathrm{sh}^{s_n}(v_n),\mathrm{sh}^{t_n}(w_n)\,|\,v)-\mathrm{EmpDist}_{k,s}(u_n,\mathrm{sh}^{s_n}(v_n)\,|\,v)\times\mathrm{EmpDist}_{k,s+s^*}(v_n,\mathrm{sh}^{t_n-s_n}(w_n)\,|\,v)\|_{var}<\epsilon$.
If ρ is the relatively independent joining of ρ 1 , ρ 2 , then (u n , v n , w n , s n , t n ) : n ∈ N is a generic sequence for ρ.
Proof. Observe that hypothesis 3(b) implies a similar equation for any $k_1<k$ while the other parameters are fixed. Now use hypothesis 3(a) with a summable sequence of $\epsilon$'s; we can conclude by the Borel–Cantelli lemma that for ν-almost every $y\in Y$, for $k'$ sufficiently large, if $v_{k'}(y)$ is the principal $k'$-block of y with location $r_{k'}$, then the inequality in 3(b) will hold for $s=r_{k'}$ and $v=v_{k'}(y)$. Now by hypotheses 1 and 2, the single empirical distributions converge to $(\rho_1)_y$ and $(\rho_2)_y$ respectively (where $(\rho_i)_y$ is the disintegration of $\rho_i$ over y).
It then follows by integration that the sequence of (u n , v n , w n , s n , t n )'s is generic for a measure ρ on X × Y × Z, which is the relatively independent joining.
Remark 30. It follows immediately from hypothesis 3 of Proposition 29 that if we are given a finite set F of natural numbers then for all sufficiently large k we can find an N , G k and I v as in hypothesis 3 so that (a) and (b) hold simultaneously for all s * ∈ F .

An immediate corollary of this is:
Corollary 31. Suppose that $\langle(u_n,v_n,w_n,s_n,t_n) : n\in\mathbb N\rangle$ satisfies the hypotheses of Proposition 29. Then $\langle(u_n,\mathrm{sh}^{t_n}(w_n)) : n\in\mathbb N\rangle$ is generic for the composition of $\rho_1$ and $\rho_2$.

There is a converse to Proposition 29, namely that a generic sequence for the relatively independent joining of two odometer based systems satisfies conditions 1–3 of the Proposition. The first two are immediate, while the third simply expresses the fact that the generic sequence actually represents the relatively independent joining. For later use we record this as:

Lemma 32. Given joinings $\rho_1$ of $X\times Y$ and $\rho_2$ of $Y\times Z$, if $\langle(u_n,v_n,w_n,s_n,t_n) : n\in\mathbb N\rangle$ is generic for the relatively independent joining ρ then it satisfies the hypotheses of Proposition 29.

Unitary Operators
We will use spectral tools introduced by Koopman and studied by Halmos and von Neumann. We reprise the basic facts we will use. Readers unfamiliar with this material can find it in [20] or [11]. Let (X, B, µ, T ) and (Y, C, ν, S) be measure preserving systems.
If $T:X\to Y$ is a measure preserving transformation, then T induces a unitary isometry $U_T:L^2(Y)\to L^2(X)$ by setting
$$U_T(f)=f\circ T.$$
If $\pi:X\to Y$ is a factor map, then the map $f\mapsto f\circ\pi$ gives an injection of $L^2(Y)$ into $L^2(X)$ whose range is a closed $U_T$-invariant subspace. Conversely, if $M\subseteq L^2(X)$ is a closed $U_T$-invariant subspace containing 1 that is closed under taking complex conjugates, truncation and multiplication, then there is a factor map $\pi:X\to Y$ such that $M=\{f\circ\pi : f\in L^2(Y)\}$. For the rest of this discussion assume that T is ergodic. Then the eigenvalues of $U_T$ all have multiplicity one and form a subgroup $G_T\subseteq\mathbb T$. The group $G_T$ is an isomorphism invariant.
The eigenfunctions generate a closed subspace of $L^2(X)$ corresponding to a factor K of X. This factor is called the Kronecker factor. If H is any subgroup of $G_T$ then there is a further factor $K_H$ of K that is canonically determined by the eigenfunctions coming from eigenvalues in H.
Assume that φ is an isomorphism from (X, T) to (Y, S). Then $G_T=G_S$, and if $K^X_H,K^Y_H$ are the factors of X and Y determined by $H\subseteq G_T$, then $U_\phi$ determines a unique isomorphism between $K^X_H$ and $K^Y_H$. It follows from this that if $\alpha\in\mathbb T$ is an eigenvalue of $U_T$ then there are factors of X and Y isomorphic to the rotation $R_\alpha$ of $\mathbb T$ by α. Moreover there is a unique isomorphism $U^\pi_\phi:(\mathbb T,\mathcal B,\lambda,R_\alpha)\to(\mathbb T,\mathcal B,\lambda,R_\alpha)$ that intertwines $U_\phi$ and the projection maps of X and Y to $(\mathbb T,\mathcal B,\lambda,R_\alpha)$.
The analogous statement holds for odometers. If $G_T$ consists of eigenvalues of finite order and O is the corresponding odometer transformation, then there is a unique isomorphism $U^\pi_\phi:O\to O$ that intertwines $U_\phi$ and the projection maps of X and Y to O.

Stationary Codes and the d̄-Distance
In this section we briefly describe a standard idea, that of a stationary code, that we will use to understand the existence of factor maps and isomorphisms.
We review some standard facts here. A reader unfamiliar with this material who wants to see proofs should see [17].
A code of length $2N+1$ is a function $\Lambda:\Sigma^{[-N,N]}\to\Sigma'$ (for some alphabet Σ′), where $[-N,N]$ is the interval of integers starting at −N and ending at N. Given a code Λ and an $s\in\Sigma^{\mathbb Z}$, we define the stationary code determined by Λ to be $\bar\Lambda(s)$, where
$$\bar\Lambda(s)(k)=\Lambda(s\upharpoonright[k-N,k+N]),$$
viewing $s\upharpoonright[k-N,k+N]$ as an element of $\Sigma^{[-N,N]}$. Let $(\Sigma^{\mathbb Z},\mathcal B,\nu,\mathrm{sh})$ be a symbolic system. Suppose we have two codes $\Lambda_0$ and $\Lambda_1$ that are not necessarily of the same length.
Define
$$d(\Lambda_0,\Lambda_1)=\nu(\{s:\bar\Lambda_0(s)(0)\ne\bar\Lambda_1(s)(0)\}).$$
Then d is a semi-metric on the collection of codes. The following is a consequence of the Borel–Cantelli lemma: if $\langle\Lambda_n:n\in\mathbb N\rangle$ is a sequence of codes with $\sum_nd(\Lambda_n,\Lambda_{n+1})<\infty$, then for ν-a.e. s the sequence $\bar\Lambda_n(s)(k)$ is eventually constant for each k. Hence a convergent sequence of stationary codes determines a factor of $(\Sigma^{\mathbb Z},\mathcal B,\nu,\mathrm{sh})$.
Let $\Lambda_0$ and $\Lambda_1$ be codes. Define $\bar d(\bar\Lambda_0(s),\bar\Lambda_1(s))$ to be
$$\lim_{N\to\infty}\frac{1}{2N+1}\,|\{k\in[-N,N]:\bar\Lambda_0(s)(k)\ne\bar\Lambda_1(s)(k)\}|,$$
provided this limit exists. More generally we can define the $\bar d$ metric on $\Sigma^{[a,b]}$ by setting
$$\bar d(x,y)=\frac{|\{k\in[a,b]:x(k)\ne y(k)\}|}{b-a+1}.$$
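On finite intervals the $\bar d$ distance is just the normalized Hamming distance between two words. A minimal Python sketch (our own illustration, not from the paper):

```python
def dbar_finite(x, y):
    """The dbar distance on Sigma^[a,b]: the fraction of positions at
    which two words of the same length disagree."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y)) / len(x)

# Two words of length 4 that disagree in their last two letters:
# dbar_finite("abab", "abba") == 0.5
```

The infinite version is then the limit of this quantity over the initial intervals $[-N,N]$, when that limit exists.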
To compute distances between codes we will use the following application of the Ergodic Theorem.
Lemma 35. Suppose that $(\Sigma^{\mathbb Z},\mathrm{sh},\nu)$ is ergodic and that $\Lambda_0$ and $\Lambda_1$ are codes. Then for almost all s,
$$\bar d(\bar\Lambda_0(s),\bar\Lambda_1(s))=d(\Lambda_0,\Lambda_1).$$
We finish with a useful remark:

Remark 36. If $w_1$ and $w_2$ are words in a language Σ defined on an interval

Odometer based and Circular Symbolic Systems
Two types of symbolic shifts play central roles in the proofs of our main theorem. We dub them odometer based and circular systems. In this section we give some general facts about symbolic systems with uniquely readable construction sequences, define odometer and circular systems, and show that every circular system has a canonical rotation factor.

Odometer Based Systems
We recall the definition of an odometer transformation. Let $\langle k_n : n\in\mathbb N\rangle$ be a sequence of natural numbers greater than or equal to 2. Let
$$O=\prod_{n\in\mathbb N}\mathbb Z/k_n\mathbb Z$$
be the $k_n$-adic integers. Then O naturally has a compact abelian group structure and hence carries a Haar measure µ. We make O into a measure preserving system O by defining $T:O\to O$ to be addition by 1 in the $k_n$-adic integers. Concretely, this is the map that "adds one to $\mathbb Z/k_0\mathbb Z$ and carries right". Then T is an invertible transformation that preserves the Haar measure. The following results are standard:

Lemma 37. Let O be an odometer system. Then O is ergodic, and odometer maps are transformations with discrete spectrum; the eigenvalues of the associated unitary operator are the $K_n$-th roots of unity (n > 0), where $K_n=k_0k_1\cdots k_{n-1}$.

Any natural number a can be uniquely written as
$$a=\sum_{i=0}^{j}a_iK_i$$
for some sequence of natural numbers $a_0,a_1,\dots,a_j$ with $0\le a_i<k_i$ (where $K_0=1$).

Lemma 38. Suppose that $\langle r_n : n\in\mathbb N\rangle$ is a sequence of natural numbers with $0\le r_n<k_0k_1\cdots k_{n-1}$ and $r_n\equiv r_{n+1}\mod(K_n)$. Then there is a unique element $x\in O$ such that $x\equiv r_n\mod(K_n)$ for all n.

We now define the collection of symbolic systems that have odometer maps as their timing mechanism. This timing mechanism can be used to parse typical elements of the symbolic system.
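Concretely, the "add one and carry right" map can be sketched on a finite truncation of O; the following Python fragment is our own illustration (the function name and the truncation are not from the paper):

```python
def odometer_add_one(digits, ks):
    """One step of the odometer T on a finite truncation of
    O = prod_n Z/k_n Z, where digits[n] lies in [0, ks[n])."""
    digits = list(digits)
    for n, k in enumerate(ks):
        digits[n] += 1
        if digits[n] < k:
            break        # no carry needed
        digits[n] = 0    # carry right into the next coordinate
    return digits

# With k_0, k_1, k_2 = 2, 3, 2: adding 1 to (1, 2, 0) carries twice.
# odometer_add_one([1, 2, 0], [2, 3, 2]) == [0, 0, 1]
```

Iterating this map visits every point of the truncation before returning to the start, which reflects the minimality of the odometer.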
Definition 39. Let $\langle W_n : n\in\mathbb N\rangle$ be a uniquely readable construction sequence with the properties that $W_0=\Sigma$ and for all n, $W_{n+1}\subseteq(W_n)^{k_n}$ for some $k_n$. The associated symbolic system will be called an odometer based system.
Thus odometer based systems are those built from construction sequences $\langle W_n : n\in\mathbb N\rangle$ such that the words in $W_{n+1}$ are concatenations of a fixed number $k_n$ of words in $W_n$. The words in $W_n$ all have length $K_n$, and the words $u_i$ in equation 2 are all the empty word.
Equivalently, an odometer based transformation is one that can be built by a cut-and-stack construction using no spacers. An easy consequence of the definition is that for an odometer based system K, for all $s\in K$ and all $n\in\mathbb N$, $r_n(s)$ exists.
Proposition 40. Let K be an odometer based system and suppose that ν is a shift invariant measure. Then ν concentrates on S.
Proof. Let $B=K\setminus S$. Then B is shift invariant. Suppose that ν gives B positive measure. For $s\in B$ let $a_n(s)\le0\le b_n(s)$ be the left and right endpoints of the principal n-block of s. Then for all $s\in B$ there is an $N\in\mathbb N$ such that either:

1. for all n, $-N\le a_n$, or

2. for all n, $b_n\le N$.

We assume that ν gives positive measure to the collection $B^*$ of s for which there is an $N\in\mathbb N$ with $-N\le a_n$ for all n; the other case is similar. For $s\in B^*$ the sequence $\langle a_n(s)\rangle$ is non-increasing and bounded below, hence eventually equal to some $a(s)$. Since $a(\mathrm{sh}(s))=a(s)-1$, the sets $\{s\in B^* : a(s)=a\}$ for $a\in\mathbb Z$ are pairwise disjoint and are carried to one another by powers of the shift, so they all have the same ν-measure. As there are infinitely many of them, each must have measure zero, contradicting $\nu(B^*)>0$.

The next lemma justifies our terminology.
Lemma 41. Let K be an odometer based system with each $W_{n+1}\subseteq(W_n)^{k_n}$. Then there is a canonical factor map $\pi:K\to O$, where O is the odometer system determined by $\langle k_n : n\in\mathbb N\rangle$.
Proof. For each $s\in S$, we know that for all n, $r_n(s)$ is defined and both $r_n$ and $K_n-r_n$ go to infinity. By Lemma 38, the sequence $\langle r_n(s) : n\in\mathbb N\rangle$ defines a unique element π(s) in O. It is easily checked that π intertwines sh and T.
In the forthcoming paper [8] we show a strong converse to this result: if T has finite entropy and an odometer factor then T can be presented by an odometer based system.
Heuristically, the odometer transformation O parses the sequences s in $S\subseteq K$ by indicating where the words constituting s begin and end. Shifting s by one unit shifts this parsing by one. We can understand an element s as an element of the odometer with words in $W_n$ filled in inductively.
We will use the following remark about the canonical factor of the inverse of an odometer based system.
Remark 42. If $\pi:L\to O$ is the canonical factor map, then π is also a factor map from $(L,\mathrm{sh}^{-1})$ to $O^{-1}$ (i.e. O with the operation "−1"). If $\langle W_n : n\in\mathbb N\rangle$ is the construction sequence for L, then $\langle\mathrm{rev}(W_n) : n\in\mathbb N\rangle$ is a construction sequence for rev(L). If $\phi:L^{-1}\to\mathrm{rev}(L)$ is the canonical isomorphism given by Proposition 18, then Lemma 37 tells us that the projection of φ to a map $\phi_\pi:O\to O$ is given by $x\mapsto-x$.
From this remark we immediately see:

Lemma 43. Let $\rho\leftrightarrow\rho'$ be the canonical correspondence between joinings of (K, sh) and $(L,\mathrm{sh}^{-1})$ and joinings of (K, sh) and (rev(L), sh) given after Proposition 18. Then the joining $\rho'$ concentrates on the set of pairs (s, t) such that $\pi_K(t)=-\pi_L(s)$ if and only if ρ concentrates on the collection of (s, t) such that $\pi_K(s)=\pi_{L^{-1}}(t)$.

Circular systems
We now define and discuss circular systems. The paper [5] showed that the circular systems give symbolic characterizations of the smooth diffeomorphisms defined by the Anosov-Katok method of conjugacies. The construction sequences of circular systems have quite specific combinatorial properties that will be important to our understanding of the Anosov-Katok systems and their centralizers in the third paper in this series.
We call these systems circular because they are closely tied to the behavior of rotations by a convergent sequence of rationals $\alpha_n=p_n/q_n$. The rational rotation by p/q permutes the 1/q-intervals of the circle cyclically along a sequence determined by the numbers $j_i=_{\mathrm{def}}p^{-1}i\ (\mathrm{mod}\ q)$: the interval $[i/q,(i+1)/q)$ is the $j_i$-th interval in the sequence. The operation C, which we are about to describe, models the relationship between the rotations by p/q and p'/q' when p'/q' is very close to p/q.

Let k, l, p, q be positive natural numbers with p < q relatively prime. For $0\le i<q$ set
$$j_i\equiv p^{-1}i\ (\mathrm{mod}\ q)\qquad(9)$$
with $0\le j_i<q$. It is easy to verify that the map $i\mapsto j_i$ is a bijection of $\{0,1,\dots,q-1\}$. Let Σ be a non-empty set. We define an operation C, which depends on p, q, an integer l > 1, and on sequences $w_0,\dots,w_{k-1}$ of words in a language $\Sigma\cup\{b,e\}$, by setting
$$C(w_0,w_1,\dots,w_{k-1})=\prod_{i=0}^{q-1}\prod_{j=0}^{k-1}b^{q-j_i}w_j^{l-1}e^{j_i},\qquad(11)$$
where the products denote concatenation of words. To start our construction we frequently take $p_0=0$ and $q_0=1$. In this case we adopt the convention that $j_0=0$, hence
$$C(w_0,w_1,\dots,w_{k-1})=bw_0^{l-1}bw_1^{l-1}\cdots bw_{k-1}^{l-1}.$$

Remark 44. We remark:

• Suppose that each $w_i$ has length q; then the length of $C(w_0,w_1,\dots,w_{k-1})$ is $klq^2$.
• Every occurrence of an e in C(w 0 , . . . w k−1 ) has an occurrence of a b to the left of it. If p = 0 then every occurrence of a b has an e to the right of it.
• Suppose that n < m and b occurs at position n in C(w 0 , w 1 , . . . w k−1 ) and e occurs at m and neither occurrence is in a w i . Then there must be some w i occurring between n and m.
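As a sanity check on these counts, the C-operator of equation 11 can be implemented directly. This sketch is our own illustration (assuming each input word has length q and that p, q are relatively prime; `pow(p, -1, q)` computes $p^{-1}$ mod q in Python 3.8+):

```python
def C(words, p, q, l):
    """C(w_0,...,w_{k-1}) = prod_{i<q} prod_{j<k} b^{q-j_i} w_j^{l-1} e^{j_i},
    with j_i = p^{-1} i (mod q), products meaning concatenation."""
    p_inv = pow(p, -1, q) if q > 1 else 0  # convention j_0 = 0 when q = 1
    out = []
    for i in range(q):
        j_i = (p_inv * i) % q
        for w in words:
            out.append("b" * (q - j_i) + w * (l - 1) + "e" * j_i)
    return "".join(out)

# k = 2 words of length q = 2 and l = 3: the result has length klq^2 = 24,
# and every e has a b somewhere to its left, as in Remark 44.
w = C(["xy", "yx"], p=1, q=2, l=3)
```

Each inner factor $b^{q-j_i}w_j^{l-1}e^{j_i}$ has length $lq$, which gives the total length $klq^2$ of the first bullet above.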
The C operator automatically creates uniquely readable words, as the next lemma shows; however, we will need a stronger unique readability assumption for our definition of circular systems.
Lemma 45. Suppose that Σ is a language, $b,e\notin\Sigma$, $0<p<q$, and that $u_0,\dots,u_{k-1}$, $v_0,\dots,v_{k-1}$ and $w_0,\dots,w_{k-1}$ are words in the language $\Sigma\cup\{b,e\}$ of some fixed length q, with q < l/2. Let
$$u=C(u_0,\dots,u_{k-1}),\quad v=C(v_0,\dots,v_{k-1}),\quad w=C(w_0,\dots,w_{k-1}).$$
Suppose that uv is written as pws, where p and s are words in $\Sigma\cup\{b,e\}$. Then either p is the empty word, u = w and v = s, or s is the empty word, u = p and v = w.
Proof sketch. The map $i\mapsto j_i$ is one-to-one; hence each location in the word of length $klq^2$ is uniquely determined by the lengths of the nearby blocks of b's and e's. In fact something stronger is true: if $\sigma\in\Sigma$ occurs at place m in w, then m is uniquely determined by knowing $w_0,w_1,\dots,w_{k-1}$ and the $klq/2+1$ letters on either side of σ.
We now describe how to use the C operation to build a collection of symbolic shifts. Our systems will be defined using sequences of natural number parameters $k_n$ and $l_n$ that are fundamental to the version of the Anosov–Katok construction presented in [14].
Fix an arbitrary sequence of positive natural numbers $\langle k_n : n\in\mathbb N\rangle$. Let $\langle l_n : n\in\mathbb N\rangle$ be an increasing sequence of natural numbers such that $\sum_n1/l_n<\infty$. From the $k_n$ and $l_n$ we define sequences of numbers $\langle p_n,q_n,\alpha_n : n\in\mathbb N\rangle$. We begin by letting $p_0=0$ and $q_0=1$, and inductively set
$$q_{n+1}=k_nl_nq_n^2\qquad(12)$$
(thus $q_1=k_0l_0$) and take
$$p_{n+1}=p_nq_nk_nl_n+1.$$
Then clearly $p_{n+1}$ is relatively prime to $q_{n+1}$.

Definition 46. A sequence of integers $\langle k_n,l_n : n\in\mathbb N\rangle$ such that $k_n\ge2$ and $\sum_n1/l_n<\infty$ will be called a circular coefficient sequence.
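The recursion (12) is easy to tabulate; the following sketch (our own illustration, with a hypothetical function name) generates the pairs $(p_n,q_n)$ from a circular coefficient sequence and lets one check the coprimality claim on examples:

```python
from math import gcd

def circular_parameters(ks, ls):
    """Yield (p_n, q_n) from circular coefficients via
    q_{n+1} = k_n l_n q_n^2 and p_{n+1} = p_n q_n k_n l_n + 1."""
    p, q = 0, 1                      # p_0 = 0, q_0 = 1
    yield p, q
    for k, l in zip(ks, ls):
        p, q = p * q * k * l + 1, k * l * q * q
        yield p, q

params = list(circular_parameters([2, 3], [4, 5]))
# q_1 = k_0 l_0 = 8, and each p_{n+1} is relatively prime to q_{n+1}.
```

The ratios $\alpha_n=p_n/q_n$ computed this way converge very rapidly, since $q_{n+1}$ grows like $q_n^2$.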
Let Σ be a non-empty finite or countable alphabet. We will construct the systems we study by building collections of words $W_n$ in the alphabet $\Sigma\cup\{b,e\}$ by induction as follows:

• Fix a circular coefficient sequence $\langle k_n,l_n : n\in\mathbb N\rangle$.

• Set $W_0=\Sigma$.
• Having built $W_n$, we choose a set $P_{n+1}\subseteq(W_n)^{k_n}$ and form $W_{n+1}$ by taking all words of the form $C(w_0,w_1,\dots,w_{k_n-1})$ with $(w_0,\dots,w_{k_n-1})\in P_{n+1}$. In passing from $W_n$ to $W_{n+1}$ we use C with the parameters $k=k_n$, $l=l_n$, $p=p_n$, $q=q_n$, and take $j_i=(p_n)^{-1}i$ modulo $q_n$. (The fact that $p_n$ and $q_n$ are relatively prime for $n\ge1$ allows us to define the integers $j_i$ in equation 9; for $q_0=1$, $\mathbb Z/q_0\mathbb Z$ has one element, [0], so we set $p_0^{-1}=p_0=0$.) By Remark 44, the length of each of the words in $W_{n+1}$ is $q_{n+1}$.
We will call the elements of P n+1 prewords.
Strong Unique Readability Assumption: Let $n\in\mathbb N$, and view $W_n$ as a collection $\Lambda_n$ of letters. Then each element of $P_{n+1}$ can be viewed as a word with letters in $\Lambda_n$. We assume that, in the alphabet $\Lambda_n$, each $P_{n+1}$ is uniquely readable.
Definition 47. A construction sequence $\langle W_n : n\in\mathbb N\rangle$ will be called circular if it is built in this manner using the C-operators, a circular coefficient sequence and sets $P_{n+1}$ satisfying the strong unique readability assumption.
It follows from Lemma 45 that each W n in a circular construction sequence is uniquely readable.
Definition 48. A symbolic shift K built from a circular construction sequence will be called a circular system.
For emphasis we will often write circular construction sequences as $\langle W^c_n : n\in\mathbb N\rangle$ and the associated circular shift as $K^c$. We sometimes write $w^c$ to emphasize that a word is a circular word.
We will need to analyze the words constructed by C in detail. We start by describing the boundary and interior portions of the words.
Definition 49. Suppose that $w=C(w_0,w_1,\dots,w_{k-1})$. Then w consists of blocks of the $w_i$ repeated $l-1$ times, together with some b's and e's that are not in the $w_i$'s. The interior of w is the portion of w in the $w_i$'s. The remainder of w consists of blocks of the form $b^{q-j_i}$ and $e^{j_i}$; we call this portion the boundary of w.
In a block of the form $w_j^{l-1}$, the first and last occurrences of $w_j$ will be called the boundary occurrences of the block $w_j^{l-1}$; the other occurrences will be the interior occurrences.
While the boundary consists of the sections of w made up of b's and e's, not all b's and e's occurring in w are in the boundary, as they may be part of a power $w_i^{l-1}$. The boundary of w constitutes a small portion of the word:

Lemma 50. The proportion of the word w written in equation 11 that belongs to its boundary is 1/l. Moreover, the proportion of the word that is within q letters of the boundary of w is 3/l.

The next lemma was proved in [5] (Lemma 20).
Lemma 51. Let $K^c$ be a circular system and ν a shift invariant measure on $K^c$. Then the following are equivalent:

1. ν has no atoms.

2. ν concentrates on the collection of $s\in K^c$ such that $\{i : s(i)\in\Sigma\}$ is unbounded in both $\mathbb Z^-$ and $\mathbb Z^+$.

3. ν concentrates on S.
Remark 52. Let K c be a circular system.
1. There are only two invariant atomic measures, one concentrates on the constant "b" sequence, the other on the constant "e" sequence.
2. For $K^c$, Lemma 12 can be strengthened to say that for all $s\in S$, for all large enough n, the principal n-block of s exists.
3. The symbolic shift K c has zero topological entropy.
A direct inspection reveals that the only periodic points in K c are the two fixed points constant "b" and "e".
The second item follows because if s has a principal n-block at [a n , b n ) then it has a principal n + 1-block at some [a n+1 , a n+1 + q n+1 ) for an a n+1 with |a n+1 | ≤ |a n | + (q n+1 − q n ).
The fact that the topological entropy of K c is zero follows easily from the fact that the l n tend to infinity.

The structure of the words
The words used to form circular transformations have quite specific combinatorial properties. We begin with an important definition for our understanding of rotations: the three subscales at stage n + 1. Fix a sequence $\langle W^c_n : n\in\mathbb N\rangle$ defining a circular system. Using equation 11 we define the subscales of a word $w^*\in W^c_{n+1}$:

Subscale 0 is the scale of the individual powers of $w_j\in W^c_n$ of the form $w_j^{l-1}$; we call each such occurrence of a $w_j^{l-1}$ a 0-subsection.

Subscale 1 is the scale of each term in the product $\prod_{j=0}^{k-1}(b^{q-j_i}w_j^{l-1}e^{j_i})$ that has the form $(b^{q-j_i}w_j^{l-1}e^{j_i})$; we call these terms 1-subsections.

Subscale 2 is the scale of each term of the outer product in equation 11, i.e. each block of the form $\prod_{j=0}^{k-1}(b^{q-j_i}w_j^{l-1}e^{j_i})$ for a fixed i; we call these terms 2-subsections.

Summary
Whole Word: By contrast we will discuss n-subwords of a word w. These will be subwords that lie in W c n , the n th stage of the construction sequence. We will use n-block to mean the location of the n-subword.

The canonical circle factor K
We now define a canonical factor K of a circular system and show that this factor is isomorphic to a rotation of the circle by α, where α is the limit of $\alpha_n=p_n/q_n$ as n goes to infinity.
Definition 53. Let $\langle k_n,l_n : n\in\mathbb N\rangle$ be a circular coefficient sequence. Let $\Sigma_0=\{*\}$. We define a circular construction sequence such that each $W^c_n$ has a unique element as follows: let $W^c_0=\{*\}$ and, when $W^c_n=\{w_n\}$, let $W^c_{n+1}=\{C(w_n,w_n,\dots,w_n)\}$ (with $k_n$ copies of $w_n$). Let K be the resulting circular system.
It is easy to check that K has a unique ergodic non-atomic measure, since every $w_n$ occurs exactly $k_n(l_n-1)q_n$ many times in $w_{n+1}$.
Let K c be an arbitrary circular system with coefficients k n , l n . Then K c has a canonical factor isomorphic to K. This canonical factor plays a role for circular systems analogous to the role odometer transformations play for odometer based systems.
To see that K is a factor of K^c, we define π : K^c → K by letting π(s)(m) = s(m) if s(m) ∈ {b, e}, and π(s)(m) = * otherwise (equation 14). We record the following easy lemma that justifies the terminology of Definition 53.
Lemma 54. Let π be defined by equation 14. Then:
1. π(s) ∈ K for all s ∈ K^c,
2. π(sh^{±1}(x)) = sh^{±1}(π(x)), and thus
3. π is a factor map of K^c to K and of (K^c)^{−1} to K^{−1}.
A variant of item 3 is also true: π can be interpreted as a function from rev(K^c) to rev(K). With this interpretation π is also a factor map. We will call K the circle factor of any circular system with construction coefficients ⟨k_n, l_n : n ∈ N⟩.
Fix a circular coefficient sequence ⟨k_n, l_n : n ∈ N⟩, and let K and ⟨W^α_n : n ∈ N⟩ be as given in Definition 53. Let α_n = p_n/q_n and α = lim α_n.
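The convergence of the α_n is very fast. A small sketch with exact rational arithmetic, assuming the coefficient recursion q_{n+1} = k_n l_n q_n^2 (equation 12) together with p_{n+1} = k_n l_n q_n p_n + 1, the form consistent with equation 13; the initial values and coefficients below are hypothetical:

```python
from fractions import Fraction

p, q = 1, 2                              # hypothetical p_0, q_0 with gcd(p_0, q_0) = 1
coeffs = [(2, 4), (3, 4), (2, 5)]        # hypothetical pairs (k_n, l_n)
alphas = [Fraction(p, q)]
for k, l in coeffs:
    p, q = k * l * q * p + 1, k * l * q * q      # p_{n+1}, q_{n+1}
    alphas.append(Fraction(p, q))

# alpha_{n+1} - alpha_n = 1/q_{n+1}, so the alpha_n converge rapidly to a limit alpha
for a, b in zip(alphas, alphas[1:]):
    assert b - a == Fraction(1, b.denominator)
```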
If s ∈ S, then from r_n(s) we can determine the locations of the beginnings and ends of the words w^α_n that contain s(0). Since |W^α_n| = 1 for all n, for all s ∈ S the sequence ⟨r_n(s) : n ∈ N⟩ uniquely determines s.
Theorem 55. Let ν be the unique non-atomic shift invariant measure on K. Then (K, B, ν, sh) is isomorphic to (S^1, D, λ, R_α), where R_α is the rotation of the unit circle by α, λ is Lebesgue measure and B, D are the σ-algebras of measurable sets.
A more involved geometric proof of this fact is given in [5]. Here we present a simple algebraic proof. As usual we identify the unit circle S^1 with [0, 1) and use additive notation for the group operation.
By Lemma 12, the collection of s ∈ S such that for all large enough n the principal n-block of s exists has measure one. We define a map φ_0 : S → [0, 1) on this collection by a limiting process. For s such that r_n(s) exists, we let ρ_n(s) = p/q_n, where p ≡ p_n r_n(s) mod q_n and 0 ≤ p < q_n. By the following claim the ρ_n(s) form a Cauchy sequence (the bounds 2/q_n are summable), and we set φ_0(s) = lim_n ρ_n(s).
Claim 56. If ρ_{n+1}(s) is defined, then |ρ_{n+1}(s) − ρ_n(s)| < 2/q_n.
From equation 11, we see that the position of s(0) in an (n+1)-block is determined by the parameters i ∈ [0, q_n), j ∈ [0, k_n), l* ∈ [0, l_n − 1) and r_n(s), which determine its location among the 2-subsections, the 1-subsections and the 0-subsections, and inside the n-word w_n, respectively. Explicitly:

r_{n+1}(s) = i k_n l_n q_n + j l_n q_n + (q_n − j_i) + l* q_n + r_n(s),

where r_n(s) is the position of s(0) in its principal w_n-block.
From the definition of ρ_{n+1}, working mod 1 and expanding p_{n+1} r_{n+1}(s)/q_{n+1} using our formula for r_{n+1}(s) and the fact that all but two of the terms of r_{n+1}(s) are divisible by q_n, we get:

ρ_{n+1}(s) ≡ −(p_n j_i)/q_n + (p_n r_n(s))/q_n + i/q_n + δ (mod 1),     (15)

where δ = j/(k_n q_n) + 1/(k_n l_n q_n) + l*/(k_n l_n q_n) + (r_n(s) − j_i)/(k_n l_n q_n^2).
Since p_n j_i ≡ i mod q_n, the first and third terms of equation 15 cancel mod 1, and thus ρ_{n+1}(s) ≡ ρ_n(s) + δ (mod 1). Since |δ| < 2/q_n, the claim follows.
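Claim 56 can be checked exhaustively for small parameters with exact rational arithmetic. The following sketch assumes the reconstructed formulas above for r_{n+1}(s), ρ_n and δ; all parameter values are hypothetical:

```python
from fractions import Fraction
from itertools import product

k, l, q, p = 3, 4, 5, 2                      # hypothetical k_n, l_n, q_n, p_n
p1, q1 = k * l * q * p + 1, k * l * q * q    # p_{n+1}, q_{n+1}
pinv = pow(p, -1, q)

for i, j, lstar, r in product(range(q), range(k), range(l - 1), range(q)):
    ji = (pinv * i) % q
    r1 = i * k * l * q + j * l * q + (q - ji) + lstar * q + r     # r_{n+1}(s)
    rho_n  = Fraction(p * r % q, q)          # rho_n(s)  = p_n r_n(s)/q_n mod 1
    rho_n1 = Fraction(p1 * r1 % q1, q1)      # rho_{n+1}(s)
    delta = (Fraction(j, k * q) + Fraction(1, k * l * q)
             + Fraction(lstar, k * l * q) + Fraction(r - ji, k * l * q * q))
    assert (rho_n1 - rho_n - delta) % 1 == 0   # rho_{n+1} = rho_n + delta (mod 1)
    assert abs(delta) < Fraction(2, q)         # |delta| < 2/q_n
```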
It is easy to check that φ_0 is one-to-one. By the unique ergodicity of the rotation R_α, Theorem 55 will be proved when we establish:
Claim 57. The map φ_0 satisfies φ_0(sh(s)) = φ_0(s) + α (mod 1). In particular, if ν is the unique invariant measure on S, then φ_0 sends ν to Lebesgue measure.
To see this, suppose that r_n(s) and r_n(sh(s)) both exist. Then r_n(sh(s)) = r_n(s) + 1. It follows that ρ_n(sh(s)) ≡ ρ_n(s) + p_n/q_n (mod 1). Taking limits we see that φ_0(sh(s)) = φ_0(s) + α (mod 1). This finishes the proof of Theorem 55.

Kronecker Factors
Both odometer transformations and irrational rotations of the circle are ergodic transformations with discrete spectrum. Because the odometer transformation based on ⟨k_n : n ∈ N⟩ is a factor of any odometer based system T, and the rotation R_α is a factor of any circular system S, both are factors of the Kronecker factors of T and S respectively. In general neither is the whole Kronecker factor.
We make the following lemma explicit in the case of odometer based transformations; exactly analogous results hold for systems with a circle factor.
Lemma 58. Let (K, B, µ, T) and (L, C, ν, S) be measure preserving systems. Suppose that K has an odometer factor O, with factor map π_K : K → O, and that φ : K → L is an isomorphism. Then there is a unique odometer factor O* of L, with factor map π_L : L → O*, and an isomorphism φ_π : O → O* making the corresponding diagram commute: φ_π ∘ π_K = π_L ∘ φ. If each finite order eigenvalue of L has multiplicity 1 (e.g. if L is ergodic), then O* is the unique odometer factor of L isomorphic to O.
Since the unitary operator U_φ : L^2(K) → L^2(L) takes eigenfunctions to eigenfunctions, U_φ takes the subspace of L^2(K) corresponding to O to a subspace of L^2(L) corresponding to an isomorphic copy of O. The lemma follows.
An immediate corollary of Lemma 58 is that if K and L are ergodic odometer based systems over the same odometer O, with projections π K and π L , then φ π is an isomorphism between the canonical odometer factors.
We record the following consequences for later use.
Proposition 59. Suppose that K and L are both ergodic odometer based systems with coefficients ⟨k_n : n ∈ N⟩. Then any isomorphism φ : K → L takes the canonical odometer factor O_K of K to the canonical odometer factor O_L of L. Similarly, if K^c and L^c are both ergodic circular systems with the same coefficient sequence ⟨k_n, l_n : n ∈ N⟩, then any isomorphism between K^c and L^c takes the canonical rotation factor K_K to the canonical rotation factor K_L.
In the first case there is a unique factor of K and of L corresponding to the eigenvalues of O_K and O_L, and any isomorphism must preserve the factor corresponding to these eigenvalues. The same argument works for K, as it is isomorphic to the rotation by α = lim_n p_n/q_n.

Uniform Systems
In [5] it is established that the strongly uniform circular systems with sufficiently fast growing ⟨l_n : n ∈ N⟩ are realizable as measure preserving diffeomorphisms of the torus. Strongly uniform systems are those for which each word in W_n occurs the same number of times in each word of W_{n+1}. These systems carry unique non-atomic invariant measures, which simplifies much of what we do later in this paper. For example the correspondence between the measures ν on uniform odometer systems K and ν^c on their uniform circular counterparts K^c, given in equation 33, is automatic.
In the forthcoming [8] we show that arbitrary (i.e. non-uniform) circular systems are realizable as measure preserving diffeomorphisms of the torus, provided that the measures of the words in W n go to zero.

Details of Circular Systems
This section examines the circular systems defined in Section 3.2 in more detail. Initially we are given a circular coefficient sequence ⟨k_n, l_n : n ∈ N⟩ and ⟨q_n : n ∈ N⟩, where q_n satisfies the inductive definition in equation 12. When n is fixed, we again let j_i ≡ (p_n)^{−1} i modulo q_n with 0 ≤ j_i < q_n. Without significant loss of generality it is convenient to assume that 1/q_n < 1/10. To understand joinings of circular systems we will be comparing generic elements (s, t) of circular systems K^c and L^c, and their parsings into subwords. We will use the following terminology:
Definition 60. Let u, v be finite sequences of elements of Σ ∪ {b, e} having length q. Given intervals I and J in Z of length q, we can view u and v as functions having domains I and J respectively. We will say that u is shifted by k relative to v iff I is the shift of the interval J by k. We say that u is the k-shift of v iff u and v are the same words and I is the shift of the interval J by k.

Understanding the words
We elaborate on the descriptions given in Section 3.3. Our first combinatorial lemma is the following:
Lemma 61. Let w = C(w_0, . . . , w_{k_n−1}) for some n, and let q = q_n, k = k_n, l = l_n. View w as a word in the alphabet Σ ∪ {b, e} lying on the interval of integers [0, klq^2).
1. If m_0 and m_1 are the locations of the beginnings of two 0-subsections occurring in the same 2-subsection, then m_0 ≡ m_1 mod q.
2. If m_0 is the location of the beginning of a 0-subsection occurring in a 2-subsection ∏_{j=0}^{k−1} (b^{q−j_i} w_j^{l−1} e^{j_i}) and m_1 is the location of the beginning of a 0-subsection occurring in the next 2-subsection, then m_1 ≡ m_0 − j_1 mod q.
To see the first point, the indices of the beginnings of 0-subsections in the same 2-subsection differ by multiples of q, coming from powers of a w_j and intervals of w of the form e^{j_i} b^{q−j_i}.
To see the second point, let u and v be consecutive 2-subsections. In view of the first point it suffices to consider the last 0-subsection of u and the first 0-subsection of v. But these sit on either side of an interval of the form e^{j_i} b^{q−j_{i+1}}, whose length q + j_i − j_{i+1} is congruent to −j_1 mod q.
Now assume that u ∈ W^c_{n+1}, v ∈ W^c_{n+1} ∪ rev(W^c_{n+1}) and v is shifted with respect to u. On the overlap of u and v, the 2-subsections of u split each 2-subsection of v into either one or two pieces. Since all of the 2-subsections in both words have the same length, the number of pieces in the splitting and the size of each piece is constant across the overlap, except perhaps at the two ends of the overlap. If u splits a 2-subsection of v into two pieces, then we call the left piece of the pair the even piece and the right piece the odd piece.
If v is shifted only slightly, it can happen that either the even piece or the odd piece does not contain a 1-subsection. In this case we will say that the split is trivial on the left or trivial on the right.

This follows easily from Lemma 61
In the case where the split is trivial, we get Lemma 62 with just one coefficient, s or t.
A special case of Lemma 62 that we will use is:
Lemma 63. Suppose that the 2-subsections of u divide the 2-subsections of v into two pieces, and that some occurrence of an n-subword of v in an even (resp. odd) piece is lined up with an occurrence of some n-word in u. Then every occurrence of an n-word in an even (resp. odd) piece of v is either: a.) lined up with some n-subword of u, or b.) lined up with a portion of a 2-subsection that has the form e^{j_i} b^{q−j_i}.
Moreover, no n-subword in an odd (resp. even) piece of v is lined up with an n-subword of u.

Full measure sets for circular systems
Fix a summable sequence ⟨ε_n : n ∈ N⟩ of numbers in [0, 1) and a circular coefficient sequence ⟨k_n, l_n : n ∈ N⟩. As we argued in the proof of Lemma 50, the proportion of boundary that occurs in words of W^c_n is always summable, independently of the way we build W^c_n. Recall the set S ⊆ K^c given in Definition 10, where K^c is the symbolic shift defined from a construction sequence.
Definition 64. We define some sets that a typical generic point for a circular system eventually avoids. Let:
1. E_n be the collection of s ∈ S such that s does not have a principal n-block or s(0) is in the boundary of that n-block,
2. E^0_n = {s : s(0) is in the first or last ε_n l_n copies of w in a power of the form w^{l_n−1}, where w ∈ W^c_n},
and let E^1_n and E^2_n be the analogous collections at subscales 1 and 2.
Lemma 65. Assume that ∑_n 1/l_n < ∞. Let ν be a shift invariant measure on S ⊆ K^c, where K^c is a circular system. Then ∑_n ν(E_n) < ∞. If moreover ⟨ε_n⟩ is a summable sequence, then for i = 0, 1, 2 we have ∑_n ν(E^i_n) < ∞. This is an application of the Ergodic Theorem.
In particular we see:
Corollary 66. For ν-almost all s there is an N = N(s) such that for all n > N: 1.) s ∉ E_n, 2.) s ∉ E^0_n, 3.) s ∉ E^1_n, 4.) s ∉ E^2_n. This follows from the Borel-Cantelli Lemma.
The elements s of S such that some shift sh^k(s) fails one of the conclusions 1.)-4.) of Corollary 66 form a measure zero set. Consequently we work with those elements of S whose whole orbit satisfies the conclusions of Corollary 66. Note, however, that N(sh^k(s)) depends on the shift k.
Definition 67. We will call n mature for s (or say that s is mature at stage n) iff for all m ≥ n the principal m-block of s exists and s ∉ E_m ∪ E^0_m ∪ E^1_m ∪ E^2_m. Thus if s is mature at stage n, then for all m > n the principal m-block of s exists and conclusions 1-4 of Corollary 66 hold. Recall that in Section 3.2 we defined a canonical factor of a circular system, which we called the circle factor. Since the notion of maturity only involves the punctuation of the words involved, it is an easy remark that for all s ∈ S, n is mature for s just in case n is mature for π(s), where π is the canonical factor map.
For the following definition and lemma, we view s ∈ S as a function with domain Z, and s ∈ W n as a function with domain [0, q n ) or, sometimes, an interval [k, k + q n ). In each of these cases we use dom(s) to mean the domain of s.
Definition 68. We will use the symbol ∂_n in multiple equivalent ways. If s ∈ S or s ∈ W^c_m we define ∂_n = ∂_n(s) to be the collection of i such that sh^i(s)(0) is in the boundary portion of an n-subword of s. This is well-defined by our unique readability lemma. In the spatial context we will say that s ∈ ∂_n if s(0) is in the boundary of an n-subword of s.
In what follows we will be considering a generic point s and all of its shifts. We will use the fact that if s is mature at stage n, then we can detect locally those i for which the i-shifts of s are mature.
Lemma 69. Suppose that s ∈ S, n is mature for s and n < m.
2. For all but at most a (∑_{n<k≤m} 1/l_k) + (∑_{n≤k<m} 6 ε_k q_{k+1})/q_m portion of the i ∈ [−r_m(s), q_m − r_m(s)), the point sh^i(s) is mature for n.
In particular, if ε_{n−1} > sup_m (1/q_m) ∑_{k=n}^{m−1} 6 ε_k q_{k+1} and 1/l_{n−1} > ∑_{k=n}^{∞} 1/l_k, and n is mature for s, then the upper density of those i ∈ Z for which the i-shift of s is not mature for n is less than 1/l_{n−1} + ε_{n−1}.

Similarly:
Lemma 70. Suppose that s ∈ S and s has a principal n-block. Then n is mature for s provided that s ∉ E_m ∪ E^0_m ∪ E^1_m ∪ E^2_m for all m ≥ n. In particular, if n is mature for s and s is not in a boundary portion of its principal (n−1)-block or in E^0_{n−1} ∪ E^1_{n−1} ∪ E^2_{n−1}, then n − 1 is mature for s.

The map
Proposition 59 implies that any isomorphism φ between an ergodic (K^c, sh) and (K^c, sh^{−1}) induces an isomorphism φ_π between (K, sh) and (K, sh^{−1}), where K is the canonical circle factor. Because (K, sh^{−1}) is canonically isomorphic with (rev(K), sh) (Proposition 18) and (K, sh) is isomorphic to the rotation R_α of the circle, we see that (rev(K), sh) is isomorphic to the rotation R_{−α}. We use a specific isomorphism from (K, sh) to (rev(K), sh) as a benchmark for understanding potential maps φ : K^c → rev(K^c). If we view K as the rotation R_α of the unit circle by α, one can view this transformation as a symbolic analogue of complex conjugation z ↦ z̄ on the unit circle, which is an isomorphism between R_α and R_{−α}. Copying it over to a map on the unit circle gives an isomorphism φ between R_α and R_{−α}. Such an isomorphism must be of the form x ↦ β − x for some β. It follows immediately from this characterization that the map is an involution; however, for completeness we prove this directly (and symbolically) in Proposition 79.
As usual we find it more convenient to work on the unit interval I = [0, 1) rather than the unit circle. The complex conjugacy map z ↦ z̄ corresponds to the map x ↦ −x on [0, 1).
We begin by recalling from equation 11 the formula for a w ∈ W^c_{n+1} of the form C(w_0, . . . , w_{k_n−1}):

C(w_0, w_1, . . . , w_{k−1}) = ∏_{i=0}^{q−1} ∏_{j=0}^{k−1} (b^{q−j_i} w_j^{l−1} e^{j_i}),

where q = q_n, k = k_n, l = l_n and j_i ≡ (p_n)^{−1} i mod q_n with 0 ≤ j_i < q_n. By examining this formula we can write out rev(w) explicitly; applying the identity in formula 10, this can be rewritten, and reindexing gives another form of equation 17. We can now state the basic lemma about the way w lines up with a shift of rev(w).
Lemma 71. Let w ∈ W^c_{n+1} and view w as sitting at location [0, q_{n+1}) ⊆ Z. Let q = q_n and k = k_n. Consider sh^{−j_1}(rev(w)) as being the word rev(w) in location [j_1, q_{n+1} + j_1) ⊆ Z. For all but at most 2kq of the occurrences of an n-subword w_j of w starting at a location r ∈ [0, q_{n+1}), the reversed word rev(w_{k−j−1}) occurs in sh^{−j_1}(rev(w)) starting at r.
The word w starts with a block of q b's and then a block of l − 1 copies of w_0, whereas rev(w) starts with a block of q − j_1 e's followed by l − 1 copies of rev(w_{k−1}). Hence if we shift rev(w) to the right by j_1 (to get sh^{−j_1}(rev(w))), the first copy of rev(w_{k−1}) is aligned with the first copy of w_0 in w. Hence all of the copies of rev(w_{k−1}) in the first 1-subsection are aligned with the copies of w_0 in the first 1-subsection of w. Because the consecutive blocks of b's and e's (or e's and b's) in the 2-subsections add up to length q, we see that every copy of rev(w_{k−j−1}) in the first 2-subsection of sh^{−j_1}(rev(w)) is aligned with a copy of w_j.
We now argue as in Section 4.1. At the end of each 2-subsection, w has a block of e's of length j_i, followed, at the beginning of the next 2-subsection, by a block of b's of length q − j_{i+1}. Together the e's and b's form a block of length j_i + q − j_{i+1}, which is congruent mod q to −j_1. Similarly the combined length of a block of e's and b's finishing and starting consecutive 2-subsections of rev(w) is congruent to −j_1 mod q.
Both the beginning of the block of e's ending the k-th 2-subsection and the end of the block of b's starting the (k+1)-st 2-subsection are at distance less than q from the location of the end of the k-th 2-subsection. It follows from this and the comments in the previous paragraph that if S_1 and S_2 are consecutive 2-subsections of w, and S′_1 and S′_2 are the corresponding 2-subsections of rev(w), then the beginning of the first occurrence of rev(w_{k−1}) in S′_2 is within 2q of the first occurrence of w_0 in S_2 and their locations are congruent mod q. Hence inside the first 1-subsection, the 0-subsections are lined up except for at most 2 copies of w_0. This pattern continues through S_2, giving at most 2k locations of n-blocks that are not aligned in S_2.
Since there are fewer than q 2-subsections with potential misalignments, the Lemma is proved.
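Lemma 71 can be checked empirically for small parameters. The sketch below builds w = C(w_0, . . . , w_{k−1}) with distinct uniform letters, compares it with sh^{−j_1}(rev(w)) and counts misaligned n-subword occurrences. All parameters are hypothetical; content matching with uniform letters can only over-count alignments, so the bound 2kq remains a valid check:

```python
k, l, q, p = 3, 4, 5, 2                  # hypothetical k_n, l_n, q_n, p_n
pinv = pow(p, -1, q)                     # j_i = pinv * i mod q, so j_1 = pinv
words = [chr(ord('A') + j) * q for j in range(k)]
pieces, starts, pos = [], [], 0          # starts collects (position, j) of each copy of w_j
for i in range(q):
    ji = (pinv * i) % q
    for j in range(k):
        pieces.append('b' * (q - ji)); pos += q - ji
        for _ in range(l - 1):
            pieces.append(words[j]); starts.append((pos, j)); pos += q
        pieces.append('e' * ji); pos += ji
w = ''.join(pieces)
rw, j1 = w[::-1], pinv                   # rev(w), viewed as occupying [j1, len(w) + j1)
misaligned = sum(1 for r, j in starts
                 if not (r - j1 >= 0 and rw[r - j1 : r - j1 + q] == words[k - j - 1][::-1]))
assert misaligned <= 2 * k * q           # Lemma 71's bound
```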
The next proposition gives a somewhat more detailed view into the situation of Lemma 71.
Proposition 72. Let w, w′ ∈ W^c_{n+1}, with w = C(v_0, . . . , v_{k_n−1}) and w′ = C(v′_0, . . . , v′_{k_n−1}). We look at the relative positions of n-words in w and sh^{−j_1}(rev(w′)).
1. Each occurrence of v_i in w is either lined up with an occurrence of rev(v′_{k_n−i−1}) or entirely lined up with a section of ∂_n inside sh^{−j_1}(rev(w′)).
2. There is a number C such that for all i the number of occurrences of v_i lined up with an occurrence of rev(v′_{k_n−i−1}) is C.
The first part is clear from the proof of Lemma 71. The second part follows because all of the 1-subsections in a given 2-subsection of w have the same alignment relative to sh^{−j_1}(rev(w′)).
Since the total number of occurrences of n-subwords is klq, the proportion of n-subwords lined up with ∂_n in sh^{−j_1}(rev(w′)) is at most 2/l.
Suppose that K is given by the canonical construction sequence W α n : n ∈ N . We define a sequence of functions Λ n : n ∈ N and argue that they converge to an isomorphism from K to rev(K).
We begin by defining a sequence ⟨A_n : n ∈ N⟩. Recall the definition of the Anosov-Katok coefficients p_n and q_n given in equations 13 and 12. Since p_n and q_n are relatively prime we can define (p_n)^{−1} in Z/q_nZ. For the following definition we will view (p_n)^{−1} as a natural number with 0 ≤ (p_n)^{−1} < q_n. We let A_0 = 0 and A_{n+1} = A_n − (p_n)^{−1}.
Lemma 73. If A_n is defined as above, then |A_{n+1}| < 2q_n.
This is proved inductively using the fact that q n+1 > 2q n .
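Lemma 73 can be traced numerically, assuming the reconstructed recursion A_{n+1} = A_n − (p_n)^{−1} together with the coefficient recursions for p_n and q_n; the initial values and coefficients below are hypothetical:

```python
p, q = 1, 2                              # hypothetical p_0, q_0
coeffs = [(2, 4), (3, 4), (2, 5), (3, 5)]    # hypothetical pairs (k_n, l_n)
A, qs = 0, [q]                           # A_0 = 0
for k, l in coeffs:
    A -= pow(p, -1, q)                   # A_{n+1} = A_n - (p_n)^{-1}, inverse taken in [0, q_n)
    p, q = k * l * qs[-1] * p + 1, k * l * qs[-1] ** 2
    qs.append(q)
    assert abs(A) < 2 * qs[-2]           # Lemma 73: |A_{n+1}| < 2 q_n, using q_{n+1} > 2 q_n
```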
Let K be the circular system in the language Σ = {*}, as given in Definition 53. We now define a stationary code Λ_n with domain S that approximates elements of rev(K), by setting Λ_n(s) = sh^{2r_n(s)−(q_n−1)+A_n}(rev(s)) whenever r_n(s) is defined, and giving Λ_n(s) a fixed default value otherwise (equation 20). Since for all s ∈ S and all large enough n, r_n(s) is defined, the default value is only taken for finitely many n.
Lemma 74. Λ n is given by a finite code.
To check whether r_n(s) is defined one need only examine s on the interval [−q_n, q_n] ⊆ Z. The relevant portion of rev(s) necessary to compute Λ_n(s) is contained in s ↾ [−q_n − |A_n|, q_n + |A_n|]. Hence Λ_n is determined by a finite code.
The formula in equation 20 can be understood as follows. Suppose that s ∈ S and s has a principal n-block. Then the element s* defined as sh^{2r_n(s)−(q_n−1)}(rev(s)) belongs to rev(K), has a principal n-block that is the reverse of the principal n-block of s, and moreover the principal n-block of s* is exactly lined up with the principal n-block of s.
The reverse of the principal n-block of s begins with a block of q_{n−1} − (p_{n−1})^{−1} many e's, and hence if s′ = sh^{−(p_{n−1})^{−1}+2r_n(s)−(q_n−1)}(rev(s)), then the first (n−1)-subword of the principal n-block of s′ is lined up with the first (n−1)-subword of the principal n-block of s. The rest of the terms used to define A_n (coming from A_{n−1}) are used for lower order adjustments inside this principal n-block.
Thus, a qualitative description of Λ_n(s) can be given as follows: 1. It first reverses the principal n-block of s, leaving it exactly lined up. 2. It then shifts by −(p_{n−1})^{−1}, lining up the first (n−1)-subwords. 3. Finally it shifts by A_{n−1}, which is the cumulative adjustment made at earlier stages.
The next lemma follows from this description:
Lemma 75. Let n < m and suppose that s ∈ K has a principal m-block. Let s′ = sh^{2r_m−(q_m−1)+A_m−A_n}(rev(s)). Then at least a 1 − ∑_{k=n}^{m−1} 2/(l_k − 1) proportion of the n-blocks in the principal m-block of s are lined up with n-blocks in s′.
We first consider m = n + 1. By Lemma 71, all but 2k_n q_n of the n-blocks in w are aligned with the n-blocks in sh^{−j}(rev(w)). This is a proportion of 1 − 2k_n q_n/(k_n q_n(l_n − 1)) = 1 − 2/(l_n − 1). The general result follows by induction.
Theorem 76. Suppose that k n , l n : n ∈ N is a circular coefficient sequence. Then the sequence of stationary codes Λ n : n ∈ N converges to a shift invariant function : K → ({ * } ∪ {b, e}) Z that induces an isomorphism from K to rev(K).
We first show that the sequence ⟨Λ_n : n ∈ N⟩ converges, which will follow if we show that the code distances between Λ_n and Λ_{n+1} are summable. For notational simplicity, let q = q_n, k = k_n, l = l_n and j ≡ (p_n)^{−1} mod q with 0 ≤ j < q.
Claim: There is a summable sequence of positive numbers ⟨δ_n⟩ such that for almost all s, the d̄-distance between Λ_n(s) and Λ_{n+1}(s) is bounded by δ_n, and Λ_n(s) and Λ_{n+1}(s) agree on all but at most a δ_n proportion of the n-blocks of s.
We use Lemma 35, which tells us that for a typical s ∈ S, the code distance between Λ_n and Λ_{n+1} is d̄(Λ_n(s), Λ_{n+1}(s)), which is defined to be the density of the set D = {i ∈ Z : Λ_n(s)(i) ≠ Λ_{n+1}(s)(i)}. Because |W^α_n| = 1 for each n, there is only one possible n-subword at any location of any element of rev(K). Thus to compute the d̄-distance, it suffices to count positions where the Λ_m's disagree on the locations of the n-subwords.
By Lemma 69, for a typical s ∈ S ⊆ K and all n, the set I_n = {i : n is not mature for sh^i(s)} has density at most 1/l_{n−1} + ε_{n−1}; hence we can neglect these i when computing the density of D.
This allows us to assume that r_{n+1}(s) is defined. We compute the density of the difference between Λ_n and Λ_{n+1} as they pass across an (n+1)-block in s. If this number is d, then the distance between Λ_n and Λ_{n+1} is bounded by the sum of d and the density of I_n.
As Λ_{n+1} crosses an (n+1)-block it produces the reverse (n+1)-block shifted by A_{n+1}. Explicitly, if w is the (n+1)-block of s, as Λ_{n+1} crosses w it produces sh^{A_{n+1}}(rev(w)). As Λ_n passes across this same section, each time it crosses an n-block w′ it produces sh^{A_n}(rev(w′)). If w′ starts at r then the beginning of this copy of sh^{A_n}(rev(w′)) is at r − A_n.
We begin by rewriting sh^{A_{n+1}}(rev(w)) as sh^{A_n}(sh^{−j}(rev(w))), where j = (p_n)^{−1}. By Lemma 71, all but 2kq of the n-blocks in w are aligned with the n-blocks in sh^{−j}(rev(w)). Hence, relative to the complement of I_n, the portion of the principal (n+1)-block w of s that lies in an n-block aligned with an n-block of sh^{−j}(rev(w)) is at least 1 − 3/l. Because there is only one possible n-word, whenever sh^{A_n}(rev(w′)) is aligned with sh^{A_n}(sh^{−j}(rev(w))) they are equal.
Putting this all together, we see that Λ_n and Λ_{n+1} agree on all of the n-subwords of the principal (n+1)-block of s that are aligned with sh^{−j}(rev(w)). The disagreements are limited to the n-subwords that are not aligned and to the boundary. The total length of the disagreements is therefore bounded by 2kq · q + kq · q = 3kq^2, which has proportion 3kq^2/(klq^2) = 3/l.
Thus the distance between Λ_n and Λ_{n+1} is bounded by 1/l_{n−1} + ε_{n−1} + 3/l_n. In particular the distances are summable and the sequence ⟨Λ_n : n ∈ N⟩ converges almost everywhere to a function from K to ({*} ∪ {b, e})^Z.
We now show that the limit map is an isomorphism between K and rev(K). Since Λ_n takes an n-block to a shift of the reverse n-block, it makes sense to discuss the principal n-block of the limit. Since the r_n's cohere as in Remark 14, for n < m, r_m(Λ_m(s)) is in the r_n(Λ_m(s))-th position of the principal n-block of Λ_m(s) (provided both r_n and r_m are defined). An application of the Ergodic Theorem shows that if D_n is defined to be the collection of s such that r_n(Λ_n(s)) exists and the principal n-words of Λ_n(s) and Λ_{n+1}(s) disagree, then ∑_n ν(D_n) < ∞. From the Borel-Cantelli Lemma, it follows that for almost every s, for all large enough n, the principal n-blocks of Λ_n(s) and Λ_{n+1}(s) are the same, and thus that for s ∈ S the limit lies in rev(K).
We now argue that if s is typical and s* is the image of s under the limit map, then s* ∈ rev(S). It suffices to show that lim_{n→∞} −r_n(s*) = −∞ and lim_{n→∞} q_n − r_n(s*) = ∞.¹⁷ If n is mature for s and large enough that for m > n, Λ_m(s) and Λ_n(s) have the same principal n-blocks, then r_n(s*) = r_n(s) + A_n unless r_n(s) ∈ [0, |A_n|). Assuming that r_n(s) ≥ |A_n|, we know from Lemma 73 that r_n(s) − 2q_{n−1} < r_n(s*) < r_n(s).
Hence −r_n(s*) ≤ 2q_{n−1} − r_n(s) and q_n − r_n(s*) ≥ q_n − r_n(s). Applying Lemma 69 (using the fact that ∑_n q_{n−1}/q_n < ∞, and hence ∑_n |A_n|/q_n < ∞) we see that for large n, r_n(s) > |A_n| and that r_n(s) − 2q_{n−1} → ∞. Since q_n − r_n(s) → ∞ we have shown that s* ∈ rev(S).
As noted before Theorem 55, if s ∈ S then s is determined by any tail of the sequence ⟨r_n(s) : n ∈ N⟩. In particular, if we know a tail of ⟨r_n(s*) : n ∈ N⟩ we can determine s*. Since for large n, r_n(s*) = r_n(s) + A_n, the limit map is one-to-one on a set of measure one.
[¹⁷ We are adopting the convention that in defining r_n(s*) for s* ∈ rev(S) we count r_n from the left end of an n-block. Thus the position r in a word w ∈ W^α_n corresponds to the position q − 1 − r in rev(w).]
We can now conclude that the limit map is an isomorphism. It is shift invariant since it is a limit of stationary codes, it maps S to rev(S), and it is one-to-one on a set of ν-measure one. If we define a measure µ on the Borel sets of rev(K) by letting µ(A) be the ν-measure of the preimage of A, then µ is a shift invariant, non-atomic measure on rev(S). Since S is uniquely ergodic, rev(S) is as well, and thus µ must be equal to the unique invariant measure ν. We have shown that the limit map is an isomorphism between K and rev(K).
Definition 77. We denote the limit of Λ n : n ∈ N by : K → rev(K).
We describe the qualitative behavior of the limit map in a remark that we will use later.
Remark 78. There is a summable sequence ⟨δ_n⟩ such that for each n, for all s in a set of measure at least 1 − δ_n in S ⊆ K, there is an interval I containing 0 such that s ↾ I ∈ W^α_n, and moreover Λ_{n+1}(s) and Λ_n(s) agree on this interval. It follows from the Borel-Cantelli Lemma that for almost all s and large enough n, the limit agrees with Λ_n(s) on the principal n-block of s. Thus for a typical s and large enough n, the map reverses the principal n-block while keeping its location and then shifts it by A_n.
As noted at the beginning of this section, the next proposition follows immediately from Theorem 55, however we include a symbolic proof for completeness.

Proposition 79. The map is an involution.
It is immediate from the qualitative description of Λ_n given before Lemma 75 that each Λ_n is an involution. To see that the square of the limit map is the identity, let ε > 0. We can choose an m_0 large enough that for all m ≥ m_0, Λ_m and the limit map agree with Λ_{m_0} on all but an ε proportion of the m_0-blocks, and ⋃_{k=m_0+1}^∞ ∂_k has measure less than ε. Then the composition of the limit map with Λ_{m_0} is equal to the identity on a set of density at least 1 − ε. Letting ε → 0 and m_0 → ∞ completes the argument.

Synchronous and Anti-synchronous joinings
Every odometer based system has a built-in metronome: its odometer factor, defined in Lemma 41. Correspondingly, circular systems can be timed by their canonical rotation factor, defined in Lemma 54.
Joinings between odometer based and circular systems may induce nontrivial automorphisms of the underlying timing structure. To avoid this complication we restrict ourselves to synchronous and anti-synchronous joinings: those which preserve or exactly reverse the underlying timing. We now make this idea precise.
Both the odometer transformations and rotations of a circle have easily understood inverse transformations and the isomorphisms between transformations and their inverses are given by the maps x → −x and rev() • respectively. If K and L are either odometer based or circular systems let K π and L π be the corresponding odometer or rotation systems on which they are based.
Definition 80.
• Let K and L be odometer based systems with the same coefficient sequence, and ρ a joining between K and L^{±1}. Then ρ is synchronous if ρ joins K and L and the projection of ρ to a joining on K_π × L_π is the graph joining determined by the identity map (the diagonal joining of the odometer factors); ρ is anti-synchronous if ρ is a joining of K with L^{−1} and its projection to K_π × (L^{−1})_π is the graph joining determined by the map x ↦ −x.
• Let K c and L c be circular systems with the same coefficient sequence and ρ a joining between K c and (L c ) ±1 . Then ρ is synchronous if ρ joins K c and L c and the projection to a joining of (K c ) π with (L c ) π is the graph joining determined by the identity map of K with L, the underlying rotations; ρ is anti-synchronous if it is a joining of K c with (L c ) −1 and projects to the graph joining determined by rev() • on K × L −1 .
There is always a synchronous joining of odometer based systems with the same underlying timing factor O.
Definition 81. Suppose that K and L are based on O. The relatively independent joining of K and L over O is a synchronous joining, which we will call the synchronous product joining. The relatively independent joining of K and L^{−1} over the map x ↦ −x we will call the anti-synchronous product joining. We will use the same terminology for the relatively independent joinings of circular systems over the identity and over the reversal isomorphism of Definition 77.

Building the Functor F
The main result of this paper concerns two categories whose objects are odometer based systems and circular systems respectively. The morphisms in these categories will be graph joinings. We will show that there is a functor taking odometer based systems to circular systems that preserves the factor and conjugacy structure. In this section we focus on defining the function from odometer based systems to circular systems that underlies the functorial isomorphism between these categories. We begin by defining a function from the odometer based symbolic shifts K to the circular symbolic shifts K^c. After having done so we define F on the pairs (K, µ), where µ is an invariant measure on K. Finally we define F on synchronous and anti-synchronous graph joinings. We will use the notation K_n = ∏_{i<n} k_i. Then the K_n's are the lengths of the odometer based words in W_n and the q_n's are the lengths of the circular words in W^c_n. Except where otherwise stated we will assume that we are working with a fixed circular coefficient sequence ⟨k_n, l_n : n ∈ N⟩.
Let Σ be a language and ⟨W_n : n ∈ N⟩ be a construction sequence for an odometer based system with coefficients ⟨k_n : n ∈ N⟩. Then for each n the operation C_n is well-defined. We define a construction sequence ⟨W^c_n : n ∈ N⟩ and bijections c_n : W_n → W^c_n by induction as follows:
1. Let W^c_0 = Σ and let c_0 be the identity map.
2. Suppose that W_n, W^c_n and c_n have already been defined. For w = w_0 w_1 · · · w_{k_n−1} ∈ W_{n+1}, with each w_j ∈ W_n, let c_{n+1}(w) = C_n(c_n(w_0), c_n(w_1), . . . , c_n(w_{k_n−1})), and let W^c_{n+1} = {c_{n+1}(w) : w ∈ W_{n+1}}.
We note that in case 2 the prewords are the tuples (c_n(w_0), . . . , c_n(w_{k_n−1})) for w_0 w_1 · · · w_{k_n−1} ∈ W_{n+1}.
Definition 82. Define a map F from the set of odometer based systems (viewed as subshifts) to circular systems (viewed as subshifts) as follows. Suppose that K is built from a construction sequence ⟨W_n : n ∈ N⟩. Define F(K) = K^c, where K^c has construction sequence ⟨W^c_n : n ∈ N⟩. Conversely, suppose that K^c is a circular system with coefficients ⟨k_n, l_n : n ∈ N⟩. We can recursively build the functions c_n^{−1} from words in Σ ∪ {b, e} to words in Σ. The result is an odometer based construction sequence ⟨W_n : n ∈ N⟩ with coefficients ⟨k_n : n ∈ N⟩. If K is the resulting odometer based system then F(K) = K^c. Thus we see:
Proposition 83. The map F is a bijection between odometer based symbolic systems with coefficients ⟨k_n : n ∈ N⟩ and circular symbolic systems with coefficients ⟨k_n, l_n : n ∈ N⟩.
That F is one-to-one follows from the unique readability of words occurring in the construction sequence W n : n ∈ N .
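The word-level content of Proposition 83 can be sketched as follows, assuming that c_{n+1} applies C to the tuple of c_n-images; the coefficients are hypothetical, and W_{n+1} is taken to be all concatenations of k_n words from W_n:

```python
from itertools import product

def C(words, p, q, l):
    pinv = pow(p, -1, q)
    return ''.join('b' * (q - (pinv * i) % q) + w * (l - 1) + 'e' * ((pinv * i) % q)
                   for i in range(q) for w in words)

coeffs = [(2, 4), (3, 4)]                # hypothetical pairs (k_n, l_n)
p, q = 1, 1                              # q_0 = 1: the words of W^c_0 = Sigma are single letters
level = ['0', '1']                       # W_0 = Sigma; c_0 is the identity
for k, l in coeffs:
    # c_{n+1}(w_0 w_1 ... w_{k_n-1}) = C(c_n(w_0), ..., c_n(w_{k_n-1}))
    level = [C(t, p, q, l) for t in product(level, repeat=k)]
    p, q = k * l * q * p + 1, k * l * q * q
assert all(len(u) == q for u in level)   # every word of W^c_n has length q_n
assert len(set(level)) == len(level)     # distinct inputs give distinct outputs (F is one-to-one)
```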
Remark 84. It is clear from Definition 82 that F preserves uniformity and strong uniformity (see [5] for these notions). In fact it preserves much more: the simplex of non-atomic invariant measures, rank one transformations and so on. We verify much of this in this paper and more in the forthcoming [8].
To understand the correspondence between measures on K and K c we will have to understand the structure of basic open intervals. Recall that we write u L to mean the basic open interval of K determined by u sitting on the interval [L, L + |u|) ⊆ Z. Without the subscript L, u is shorthand for u 0 . We adopt the same conventions for K c , that the subscripts correspond to the beginning of the sequence and without a subscript the sequence begins at zero.

Genetic Markers
To see that F can be extended to a map from invariant measures on odometer based systems to invariant measures on circular systems, we begin by recalling how to identify elements of a symbolic system. Suppose that ⟨W_n : n ∈ N⟩ is a construction sequence for an odometer based transformation K. Let ⟨W^c_n : n ∈ N⟩ be the corresponding circular construction sequence for K^c. By Lemma 15, to specify a typical s ∈ K or s^c ∈ K^c, it suffices to give a tail of the sequence of principal n-blocks ⟨w_n(s) : N ≤ n⟩ or ⟨w^c_n(s^c) : N ≤ n⟩ along with the locations ⟨r_n(s) : N ≤ n⟩ or ⟨r_n(s^c) : N ≤ n⟩.
Definition 85. Suppose that u, v are words in W_n and W_{n+1} respectively and u occurs as an n-subword of v in a particular location. Viewing v as a concatenation w_0 w_1 . . . w_{k_n−1} of n-subwords, there is a j such that u = w_j. Let j*_n = j and call j*_n the genetic marker of u in v. Suppose that u ∈ W_n and v ∈ W_{n+k} and u is an n-subword of v occurring at a particular location. Then there is a sequence of words u_n = u, u_{n+1}, . . . , u_{n+k−1}, u_{n+k} = v such that u_{n+i} is an (n+i)-subword of v at a definite location and the location of u in v is inside u_{n+i}. Let j*_{n+i} be the genetic marker of u_{n+i} inside u_{n+i+1}. We call the sequence j* = (j*_n, j*_{n+1}, . . . , j*_{n+k−1}) the genetic marker of u in v. If j* is the genetic marker of some n-word inside an m-word, we will call it an (n, m)-genetic marker.
If u occurs as a subword of v then the genetic marker (j*_n, j*_{n+1}, . . . , j*_{n+k−1}) of that occurrence codes its location inside v. Suppose that s ∈ K has principal n-blocks ⟨w_n : n ∈ N⟩. Each w_{n+1} is a concatenation of words v_0 v_1 . . . v_{k_n−1}, and s(0) belongs to v_{j_n} for a unique index j_n; equivalently, r_{n+1}(s) = r_n(s) + j_n K_n. In particular, the genetic marker of w_n inside w_{n+k} is the sequence (j_n, j_{n+1}, . . . , j_{n+k−1}).
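Concretely, the genetic marker is just the base-⟨k_n⟩ digit expansion of a location. The following small sketch (our names, not the paper's) recovers (j_n, . . . , j_{m−1}) from a position r inside an m-word:

```python
def genetic_marker(r, ks, n, m):
    """(n, m)-genetic marker of the n-subword containing position r of an
    m-word, for odometer coefficients ks = (k_0, k_1, ...).  Uses
    K_0 = 1, K_{i+1} = k_i * K_i and the relation r_{i+1} = r_i + j_i * K_i."""
    K = 1
    for i in range(n):
        K *= ks[i]
    marker = []
    for i in range(n, m):
        marker.append((r // K) % ks[i])  # the slot j_i at level i
        K *= ks[i]
    return marker
```

For instance, with ks = (2, 3, 2) the position r = 7 of a 3-word decomposes as 7 = 1·1 + 0·2 + 1·6, i.e. it has (0, 3)-genetic marker (1, 0, 1).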
Genetic markers for regions of words in W^c_{n+k}: In circular words, genetic markers code regions rather than subwords. Given u and v as above, we can consider the construction of c_{n+k}(v) starting with the collection {c_n(u) : u is an n-subword of v}. Each of the genetic markers (j*_n, j*_{n+1}, . . . , j*_{n+k−1}) of a subword u of v determines a region of n-subwords of c_{n+k}(v). More explicitly, in the first step of the construction we put u into the (j*_n)th argument of C_n. At the next step we put the result into the (j*_{n+1})th argument of C_{n+1}, and so on. Thus we see that there are bijections between:
1. the occurrences of n-subwords u in v,
2. the (n, n+k)-genetic markers (j*_n, j*_{n+1}, . . . , j*_{n+k−1}), and
3. the regions of v^c occupied by the occurrences of powers (u^c)^{l_n−1}, where u^c is the element of W^c_n determined by (j*_n, j*_{n+1}, . . . , j*_{n+k−1}).
Thus genetic markers give the correspondence between the regions of c_{n+k}(v) that are not in ⋃_{n<m≤n+k} ∂_m and particular occurrences of an n-word u in v. The next lemma computes the number of occurrences of a c_n(u) with a given genetic marker (j*_n, j*_{n+1}, . . . , j*_{n+k−1}) in c_{n+k}(v).
Lemma 86. Suppose that u^c occurs in v^c with genetic marker (j*_n, j*_{n+1}, . . . , j*_{n+k−1}). Then the number of occurrences of u^c in v^c with the same genetic marker (j*_n, j*_{n+1}, . . . , j*_{n+k−1}) is
∏_{i=n}^{n+k−1} q_i(l_i − 1). (25)
Fix m and v^c ∈ W^c_m. We prove equation 25 for n = m − k by induction on k ≥ 1. If k = 1 then we have a single genetic marker j*_{m−1}. By formula 11 for C_{m−1} we see that the (j*_{m−1})th argument occurs in v^c exactly q_{m−1}(l_{m−1} − 1) times. Suppose now that we know that formula 25 holds for k − 1. We show it for k. Let n = m − k and u^c be the n-subword of v^c with genetic marker (j*_n, j*_{n+1}, . . . , j*_{n+k−1}). Let w^c be the (n+1)-subword of v^c with genetic marker (j*_{n+1}, . . . , j*_{n+k−1}). Then the number of occurrences of u^c in v^c with the given marker is the number of occurrences of u^c in w^c with marker j*_n times the number of occurrences of w^c in v^c with its marker, i.e. q_n(l_n − 1) · ∏_{i=n+1}^{n+k−1} q_i(l_i − 1). The lemma follows.
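Lemma 86 can be checked numerically by carrying markers through the construction. The sketch below is ours and again assumes the product form of the C-operator; each letter is tagged with the genetic marker of the word copies it sits inside.

```python
def C_marked(args, l, q, p):
    """Circular operator on annotated words: a word is a list whose cells
    are 'b', 'e', or (letter, marker); placing argument j appends j to the
    marker of every letter it contains."""
    p_inv = pow(p, -1, q)
    out = []
    for i in range(q):
        j_i = (p_inv * i) % q
        for j, w in enumerate(args):
            out += ['b'] * (q - j_i)
            for _ in range(l - 1):
                out += [c if c in ('b', 'e') else (c[0], c[1] + (j,))
                        for c in w]
            out += ['e'] * j_i
    return out

# Two stages: k_0 = k_1 = 2, (l_0, q_0, p_0) = (2, 1, 1) and
# (l_1, q_1, p_1) = (3, 4, 1), with q_1 = k_0 * l_0 * q_0**2 = 4.
W0 = [[('x', ())], [('y', ())]]
u1 = C_marked(W0, 2, 1, 1)        # a 1-word of length q_1 = 4
v2 = C_marked([u1, u1], 3, 4, 1)  # a 2-word of length k_1*l_1*q_1**2 = 96
# Lemma 86: each marker (j_0, j_1) occurs q_0(l_0-1) * q_1(l_1-1) = 8 times.
counts = {}
for c in v2:
    if c not in ('b', 'e'):
        counts[c[1]] = counts.get(c[1], 0) + 1
```

All four markers (j_0, j_1) occur with the same multiplicity, independently of the marker, which is exactly the uniformity exploited in the discussion that follows.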
Since particular (n, m)-genetic markers (j*_n, j*_{n+1}, . . . , j*_{m−1}) correspond to powers of u^c's that occur with the same multiplicity in v^c, independently of the marker, we see that for a given u and v the frequency of occurrences of u in v determines the frequency of occurrences of c_n(u) in v^c (equation 26). We can restate equation 26 in the language of section 2.6: it says that the empirical distribution of n-words in v agrees with the empirical distribution of the corresponding circular n-words in v^c. In particular, if we fix a set S* of genetic markers we can compare the number of occurrences of a word with genetic marker in S* in v ∈ W_{n+k} with the number of occurrences in the corresponding v^c ∈ W^c_{n+k}. Specifically, the number of occurrences of a word u^c in v^c at some genetic marker in S* is |S*| · ∏_{i=n}^{n+k−1} q_i(l_i − 1). The proportion of n-words occurring with a genetic marker in S* relative to all n-words occurring in v^c is the same as the proportion of n-words with genetic markers in S* occurring in v relative to the total number of genetic markers. The number of (n, m)-genetic markers is ∏_{i=n}^{n+k−1} k_i, so this proportion is equal to |S*| / ∏_{i=n}^{n+k−1} k_i. This is simply a restatement of our discussion involving empirical distributions in Section 2.6.
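The count of markers can be verified directly: enumerating the starts of n-words in a single m-word (our sketch below), each (n, m)-genetic marker occurs exactly once, and there are ∏_{i=n}^{m−1} k_i of them.

```python
from math import prod

def marker_of(r, ks, n, m):
    """(n, m)-genetic marker of the n-subword at position r of an m-word."""
    K = prod(ks[:n])
    out = []
    for i in range(n, m):
        out.append((r // K) % ks[i])
        K *= ks[i]
    return tuple(out)

ks, n, m = [2, 3, 2], 1, 3
K_n, K_m = prod(ks[:n]), prod(ks[:m])
# starts of n-words inside one m-word, and their markers
markers = [marker_of(r, ks, n, m) for r in range(0, K_m, K_n)]
```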
We introduce some notation that allows us to compare densities of various sets between odometer based and circular words. For sets A ⊆ [0, K_m) and A^c ⊆ [0, q_m) we denote their densities by d_m(A) = |A|/K_m and d^c_m(A^c) = |A^c|/q_m.
Lemma 87. Let n ≤ m, w ∈ W_m and w^c =def c_m(w) ∈ W^c_m. We view w as sitting on the interval [0, K_m) and w^c as sitting on [0, q_m). Let S* be a collection of (n, m)-genetic markers, g the total number of (n, m)-genetic markers and d = |S*|/g. If:
• A = {k ∈ [0, K_m) : some u ∈ W_n with genetic marker in S* begins at k in w}
• A^c = {k ∈ [0, q_m) : some u^c ∈ W^c_n with genetic marker in S* begins at k in w^c},
then the following equations hold:
d_m(A) = d/K_n (29)
d^c_m(A^c) = (d/q_n) ∏_{p=n}^{m−1} (1 − 1/l_p) (30)
d_m(A) = (q_n/K_n) (∏_{p=n}^{m−1} (1 − 1/l_p))^{−1} d^c_m(A^c) (31)
d^c_m(A^c) = (K_n/q_n) ∏_{p=n}^{m−1} (1 − 1/l_p) d_m(A). (32)
We prove equation 30. Equation 29 is similar but easier. The other two equations follow algebraically.
The union of the boundary regions ∂_p for p = n to m−1 consists exactly of the elements of [0, q_m) that are not part of any n-word. We denote the complement of ⋃_{p=n}^{m−1} ∂_p by (⋃_{p=n}^{m−1} ∂_p)∼. The various ∂_p are pairwise disjoint and, for each n*, (⋃_{p=n*}^{m−1} ∂_p)∼ consists of the locations of entire n*-words. Starting with p = m−1, iteratively deleting boundary sections as p decreases to n, and using Lemma 50, we see that the d^c_m-measure of (⋃_{p=n}^{m−1} ∂_p)∼ is ∏_{p=n}^{m−1} (1 − 1/l_p). Let B = {k ∈ [0, q_m) : k is at the beginning of an n-word}. Then B consists of a 1/q_n portion of the region made up of n-words, i.e. of (⋃_{p=n}^{m−1} ∂_p)∼. We note that A^c ⊆ B and B is disjoint from ⋃_{p=n}^{m−1} ∂_p. By Lemma 86, the number C_1 of n-words occurring in w^c with a given genetic marker does not depend on the marker. Let C_2 be the total number of n-words occurring in w^c. Then C_2 = g·C_1 and |A^c| = |S*|·C_1. We compute conditional expectations to get equation 30:
d^c_m(A^c) = (|A^c|/|B|) d^c_m(B) = (|S*|/g) (1/q_n) ∏_{p=n}^{m−1} (1 − 1/l_p) = (d/q_n) ∏_{p=n}^{m−1} (1 − 1/l_p).
Equation 29 is similar and 31, 32 follow from the first two equations by substitution.
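The algebra behind this density computation can be checked exactly with rational arithmetic. The sketch below is ours; it assumes the coefficient recursion q_{i+1} = k_i l_i q_i^2 (so that q_m/q_n = ∏_{i=n}^{m−1} k_i l_i q_i) and compares the count of Lemma 86, divided by q_m, with the closed form (d/q_n) ∏_{p=n}^{m−1} (1 − 1/l_p).

```python
from fractions import Fraction
from math import prod

def density_two_ways(ks, ls, q0, n, m, S_size):
    """Density of starts of n-words with marker in S* inside an m-word:
    once by dividing the occurrence count of Lemma 86 by q_m, once by
    the closed form of Lemma 87."""
    qs = [q0]
    for k, l in zip(ks, ls):
        qs.append(k * l * qs[-1] ** 2)  # assumed recursion q_{i+1} = k_i l_i q_i^2
    by_count = Fraction(S_size * prod(qs[i] * (ls[i] - 1) for i in range(n, m)),
                        qs[m])
    d = Fraction(S_size, prod(ks[n:m]))  # proportion of markers lying in S*
    closed_form = d * Fraction(1, qs[n]) * prod(Fraction(ls[i] - 1, ls[i])
                                                for i in range(n, m))
    return by_count, closed_form
```

The two expressions agree identically, since q_m = q_n ∏_{i=n}^{m−1} k_i l_i q_i cancels the q_i's in the numerator.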
The following relationship between pairs of measures ν on K and ν^c on K^c is the limit of equation 32 as m goes to infinity; this relationship will hold for the correspondence between measures that we build in forthcoming sections. For u ∈ W_n:
ν^c(⟨c_n(u)⟩) = (K_n/q_n) ∏_{p=n}^{∞} (1 − 1/l_p) ν(⟨u⟩). (33)
We note that since ∂_m has a density that depends only on the circular coefficient sequence, the measure of ∂_m is the same for all invariant measures. If we set d_{∂_m} to be this density, then we can rewrite the previous equation as:
ν^c(⟨c_n(u)⟩) = (K_n/q_n) (1 − Σ_{m≥n} d_{∂_m}) ν(⟨u⟩).
A consequence of equation 33 is that for all basic open sets ⟨u⟩, ν(⟨u⟩) determines ν^c(⟨c_n(u)⟩) and vice versa.
For counting arguments the following inequalities will be helpful.
Lemma 88. Let n be a number greater than 0. Then there are constants K^U_n, K^L_n between 0 and 1 such that for all k > 0, all w^c ∈ W^c_{n+k} and all collections S* of (n, n + k)-genetic markers, if A^c = {j ∈ [0, q_{n+k}) : some u^c ∈ W^c_n with genetic marker in S* begins at j in w^c}, then
K^L_n |S*| / (q_n ∏_{i=n}^{n+k−1} k_i) ≤ d^c_{n+k}(A^c) ≤ K^U_n |S*| / (q_n ∏_{i=n}^{n+k−1} k_i). (34)
By Lemma 87, d^c_{n+k}(A^c) = (d/q_n) ∏_{m=n}^{n+k−1} (1 − 1/l_m) with d = |S*| / ∏_{i=n}^{n+k−1} k_i. Since ⟨1/l_n⟩ is a summable sequence, ∏_{m=1}^{k−1} (1 − 1/l_{n+m}) converges as k goes to ∞. The inequality 34 follows.
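The convergence invoked here can be seen exactly in a toy case: with l_m = m² (so that Σ 1/l_m < ∞) the partial products telescope, ∏_{m=2}^{N} (1 − 1/m²) = (N+1)/(2N), and hence stay bounded away from 0. A quick check (our code):

```python
from fractions import Fraction

def tail_product(ls):
    """Partial product of (1 - 1/l) over the given coefficients l."""
    out = Fraction(1)
    for l in ls:
        out *= Fraction(l - 1, l)
    return out

# l_m = m^2 for m = 2..N: since 1 - 1/m^2 = (m-1)(m+1)/m^2, the
# product telescopes to (N+1)/(2N), which is >= 1/2 for every N.
partial = tail_product([m * m for m in range(2, 51)])
```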
Since K_{n+k} = ∏_{m=0}^{n+k−1} k_m, inequality 34 can be rewritten as:
K^L_n (K_n/q_n) |S*|/K_{n+k} ≤ d^c_{n+k}(A^c) ≤ K^U_n (K_n/q_n) |S*|/K_{n+k}.
Infinite genetic markers: Suppose that we are given a construction sequence ⟨W_n : n ∈ N⟩ for an odometer based or circular system K, s ∈ S and an occurrence of an n-word u in s. Then we can inductively define an infinite sequence of words ⟨u_m : n ≤ m⟩, letting u_n = u and u_{m+1} be the (m+1)-subword of s that contains u_m. For each n < m we get a genetic marker (j*_n, j*_{n+1}, . . . , j*_{m−1}), and these cohere as m goes to infinity. We define the infinite genetic marker to be j* = ⟨j*_m : n ≤ m⟩. If an n-word u occurs inside an occurrence of an m-word v in s, then v = u_m, and thus their infinite genetic markers agree on the tail ⟨j*_i : m ≤ i⟩. As in Remark 16, if we are given a sequence of words ⟨u_m : n ≤ m⟩, with u_m ∈ W_m, and an infinite sequence ⟨j_m : n ≤ m⟩ such that the genetic marker j_m denotes an instance of u_m in u_{m+1}, then we can find an s ∈ K with ⟨u_m : m ≥ n⟩ as a tail of its principal subwords. If K is odometer based then s is unique up to a shift of size less than or equal to K_n. A similar statement holds for circular systems.

TU and UT
To understand the relationships between K and K^c, we define maps TU : S → S^c and UT : S^c → S, where S ⊆ K and S^c ⊆ K^c are as in Definition 10. The map TU will be one-to-one, but UT will not; in general it is continuum-to-one. Nevertheless UT ∘ TU will be the identity map.
We begin by considering an element s ∈ S. Let u_n be the principal n-subword of s. The sequence ⟨u_n : n ∈ N⟩ determines a sequence of circular words ⟨u^c_n : n ∈ N⟩ which we assemble to define TU(s). Let j = ⟨j_n : n ∈ N⟩ be the infinite genetic marker of s(0). To describe TU(s) completely we need to define ⟨r^c_n : n ∈ N⟩. Set r^c_0 = 0, and inductively define r^c_{n+1} to be the (r^c_n)th position in the first occurrence of an n-word with genetic marker j_n in u^c_{n+1}. Set TU(s) to be the element of K^c with principal subwords ⟨u^c_n : n ∈ N⟩ and location sequence ⟨r^c_n : n ∈ N⟩. We define a map UT that associates an element of K to each element of S^c. Given such an s^c ∈ S^c, let ⟨u^c_n : n ≥ N⟩ be its sequence of principal n-subwords. For each n ≥ N, u^c_n occurs as u_{j*_n} in the preword corresponding to u^c_{n+1}. Let u_n = c_n^{−1}(u^c_n). Then the sequence of words ⟨u_n : n ≥ N⟩ and genetic markers ⟨j*_n : n ≥ N⟩ determine an element s ∈ K except for the location of 0 in the double ended sequence. (The sequence is double ended because s^c ∈ S^c.) We determine this location arbitrarily in a manner that makes the sequence of u_n's the principal n-blocks of s (n ≥ N) and the j*_n the sequence of genetic markers of these n-blocks. Let 0̄ be a sequence of zeros of length N. Then 0̄⌢⟨j*_n : n ≥ N⟩ is a well-defined member of the odometer O associated with K. From equation 24, 0̄⌢⟨j*_n : n ≥ N⟩ determines a sequence ⟨r_n : n ∈ N⟩. Thus by Lemma 15, the pair ⟨u_n : n ≥ N⟩ and 0̄⌢⟨j*_n : n ≥ N⟩ determines a unique element s of K, which we will denote UT(s^c). It is easy to check that UT ∘ TU = id and that for each s ∈ S, there is a perfect set of s^c with UT(s^c) = s.
We can get more precise information about correspondences between K and K^c by noting that if we are given a sequence ⟨u_n : n ∈ N⟩ of principal subwords of an s ∈ S, the genetic markers ⟨j_n : n ∈ N⟩ define an element s^c of K^c up to a choice of the location of zero. Specifically, suppose that s* ∈ K is such that the infinite genetic marker of s*(0) is ⟨j_n : n ∈ N⟩. Then there is an s^c ∈ K^c that has ⟨u^c_n : n ∈ N⟩ as a sequence of principal n-blocks. The following lemma will be useful for understanding joinings.
Lemma 89. Given an s ∈ S and a k, the shift sh^k(s) and s have a tail of the principal n-blocks ⟨u_n : N ≤ n⟩ in common. Moreover, the genetic markers associated with this tail are the same for both s and sh^k(s). It follows that TU(sh^k(s)) is a shift of TU(s).
We can describe the correspondence as follows. If u occurs in s at k, then u is the principal n-word of sh k (s). Choose an N so large that some N -word u * is the principal N -word of both s and sh k (s). Then (u * ) c is the principal N -block of s c . Let j be the genetic marker of the occurrence of u (at k) in u * . The region of s c corresponding to this occurrence of u is the collection of occurrences of u c with the genetic marker j in the principal N -block of s c .

Transferring measures up and down, I
In this section we develop the tool we need for lifting measures on K to measures on K^c. This will also allow us to establish a one-to-one correspondence between synchronous joinings on odometer based systems and synchronous joinings on the corresponding circular systems. Throughout this section we will use π to denote either the projection of an odometer based system to its canonical odometer factor or of a circular system to its canonical circular factor.
We begin with a proposition relating sequences of words in a construction sequence for an odometer based system to sequences of words in a construction sequence for a circular system.
Proposition 90. Let v n : n ∈ N be a sequence with v n ∈ W n . Let v c n = c n (v n ). Then: 1. v n : n ∈ N is an ergodic sequence iff v c n : n ∈ N is an ergodic sequence.
2. v n : n ∈ N is a generic sequence for a measure ν iff v c n : n ∈ N is a generic sequence for a measure ν c . In case either sequence is generic, the measures ν and ν c satisfy equation 33.
Both parts follow immediately from the definitions using equations 27 and 28 to relate the frequencies of k-words w ∈ W k in n-words u ∈ W n , for k < n to the frequencies of c k (w) in the corresponding c n (u). Equation 33 follows from the Ergodic Theorem and Lemma 87.
We endow the collection of invariant measures on a symbolic system (K, sh) with the weak* topology.
Theorem 91. Let ⟨W_n : n ∈ N⟩ be a uniquely readable construction sequence for an odometer based system K and ⟨W^c_n : n ∈ N⟩ be the associated circular construction sequence for K^c. Then there is a canonical affine homeomorphism ν ↦ ν^c between shift invariant measures ν concentrating on K and non-atomic shift invariant measures ν^c concentrating on K^c, such that equation 33 holds between ν and ν^c.
By Proposition 40 and Lemma 51 we can assume that ν and ν c concentrate on S and S c respectively.
We begin by defining the correspondence for ergodic measures. Suppose that we are given an ergodic measure ν and we want to associate a measure ν c . Let s ∈ S be a generic point for (K, ν). Let v n : n ∈ N be the sequence of principal n-blocks of s. By Proposition 21 this sequence is generic for ν. By Proposition 90, if we let v c n = c n (v n ), then v c n : n ∈ N is an ergodic sequence. Let ν c be the measure associated with v c n : n ∈ N . Then ν c is ergodic and equation 33 holds by Proposition 90.
The other direction is similar: let s^c ∈ S^c be generic for ν^c. Propositions 21 and 90 imply that if ⟨v^c_n : n ∈ N⟩ is the sequence of principal n-blocks of s^c and v_n = c_n^{−1}(v^c_n), then ⟨v_n : n ∈ N⟩ is ergodic and generic for a measure ν. Again equation 33 holds by Proposition 90.
Suppose now that ν is an arbitrary measure on K. Write the ergodic decomposition of ν as ν = ∫ ν_i dµ(i). We define ν^c by ν^c = ∫ ν^c_i dµ(i), which gives a corresponding measure on K^c. Since equation 33 holds between corresponding ergodic components ν_i and ν^c_i, it holds between ν and ν^c. By the ergodic decomposition theorem the map ν ↦ ν^c is a surjection. Since the map is invertible, it is a bijection. The map is affine by construction.
It remains to show that it is a homeomorphism. To see that ν ↦ ν^c is weak* continuous it suffices to show that for all ε > 0 and n ∈ N there is a δ and an m such that for all invariant µ, ν, if |µ(⟨u⟩) − ν(⟨u⟩)| < δ for all u ∈ W_m, then for all v ∈ W_n we have |µ^c(⟨c_n(v)⟩) − ν^c(⟨c_n(v)⟩)| < ε. But equation 33 easily implies this, taking m = n and a suitable δ determined by equation 33. The argument that the inverse is continuous is the same.
Definition 92. We will call a pair (ν, ν c ) constructed as in Theorem 91 corresponding measures.
Remark 93. It follows from Proposition 90 that if ν and ν c are corresponding measures on K and K c and s ∈ K is arbitrary then s is generic for ν iff T U (s) is generic for ν c . The point s is generic just in case its sequence of principal subwords is generic for ν. By item 2 of Proposition 90, this holds just in case the sequence of principal subwords of T U (s) is generic; i.e. T U (s) is generic.
We can use Theorem 91 to characterize the possible simplexes of invariant measures for circular systems. By a theorem of Downarowicz ([3], Theorem 5), every non-empty compact metrizable Choquet simplex is affinely homeomorphic to the simplex of invariant probability measures for a dyadic Toeplitz flow. Note that the space of invariant probability measures is always a compact Choquet simplex, hence this theorem is optimal.
Since Toeplitz flows are special cases of odometer based systems it follows immediately that every non-empty compact metrizable Choquet simplex is affinely homeomorphic to the simplex of invariant measures of a 2-symbol odometer based system.
Let 𝒦 be a non-empty compact metrizable Choquet simplex and K an odometer based system having its simplex of invariant probability measures affinely homeomorphic to 𝒦. Let K^c be the circular system corresponding to the odometer based system K. Then the non-atomic measures on K^c form a Choquet simplex affinely homeomorphic to 𝒦. There are two additional ergodic measures, the atomic measures concentrating on the constant "b" sequence and on the constant "e" sequence. These two atomic measures are isolated among the ergodic measures.
In the forthcoming [8] we discuss the question of invariant measures further and show that F preserves several other properties, such as being rank one.

P−, P, genetic markers and the ♮-map
Our goal is to understand the structure of synchronous and anti-synchronous joinings between pairs of ergodic systems (K, L ±1 ). We will use Theorem 91 to define a bijection between synchronous joinings of odometer based systems and synchronous joinings of circular systems. This is relatively easy: to a joining of K with L that projects to the identity we can directly associate an odometer system (K, L) × with a measure ν such that the corresponding measure ν c on ((K, L) × ) c can be identified with a measure on K c × L c that projects to the identity. We carry this construction out in detail in section 7 and show that the map ν → ν c given by Theorem 91 gives a bijection between synchronous joinings of the two kinds of systems.
The situation for anti-synchronous joinings of K and L−1 is more complicated. In Lemma 43, we remarked that the anti-synchronous joinings of K and L−1 can be identified with joinings of K and rev(L) that concentrate on {(s, t) : πs = πt}. Similarly we can identify the anti-synchronous joinings of K^c and (L^c)−1 with joinings of K^c with rev(L^c) that concentrate on {(s^c, t^c) : πt^c = ♮(πs^c)}. We give notation for these sets: 1. Let P− be the collection of anti-synchronous joinings ρ of K and L−1.
2. Let P be the collection of anti-synchronous joinings ρ c of K c and (L c ) −1 .
To understand the relationship between P − and P we need an analogue of Lemma 87, and the corresponding analogue of equation 33. We now describe the tools we use to do this.
Fix construction sequences ⟨U_n : n ∈ N⟩ and ⟨V_n : n ∈ N⟩ for K and L respectively, based on ⟨k_n : n ∈ N⟩, and let K^c, L^c be the corresponding circular systems based on ⟨k_n, l_n : n ∈ N⟩.
Let (s, t) be an arbitrary point in K × L with πt = −πs and s ∈ S_K, t ∈ S_L. Let ⟨u_n : n ∈ N⟩ and ⟨v_n : n ∈ N⟩ be the sequences of principal subwords of s and t respectively. If s^c = TU(s) and t^c = TU(t), then ⟨u^c_n : n ∈ N⟩ and ⟨v^c_n : n ∈ N⟩ are the sequences of principal subwords of s^c and t^c. Let x = ♮(πs^c). Then x ∈ rev(K) and we set r_n = r_n(x).
Definition 94. Define t̃ ∈ rev(L^c) by taking ⟨rev(v^c_n) : n ∈ N⟩ as its principal n-subword sequence and ⟨r_n : n ∈ N⟩ as its location sequence.
We will study the relationship between P− and P via the function taking (s, t) to (s^c, t̃).

Genetic Markers revisited
To understand the relationship between joinings ρ in P− and ρ^c in P we need to take into account the manner in which ♮ shifts the reverse of the second coordinate of the image of a generic pair (s, t) for K × L−1, and the interplay between the ♮-map and genetic markers. Let n < m. Suppose that (u′, rev(v′)) is a pair of n-words coming from U_n × rev(V_n) that occur aligned inside m-words (u, rev(v)) ∈ U_m × rev(V_m). If u′ and rev(v′) occur at the same location in (u, rev(v)), then j_{u′} determines j_{v′} in the following way: for n ≤ r < m we must have
j′_r = k_r − 1 − j_r, (36)
where j_{u′} = (j_n, j_{n+1}, . . . , j_{m−1}) and j_{v′} = (j′_n, j′_{n+1}, . . . , j′_{m−1}).
Definition 95. Let (u′, v′) ∈ U_n × V_n and (u, v) ∈ U_m × V_m. Define the (n, m)-genetic marker of an occurrence of the pair (u′, rev(v′)) in (u, rev(v)) to be (j_{u′}, j_{v′}), where j_{u′} is the genetic marker of u′ in u and j_{v′} is the genetic marker of v′ in v. We call j_{u′} and j_{v′} a conjugate pair.
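The conjugacy condition is forced by string reversal: if v is a concatenation of k blocks of equal length, then slot j of rev(v) carries the reversal of block k − 1 − j of v, which is the relation j′_r = k_r − 1 − j_r applied level by level. A sketch with our names:

```python
def conjugate(j_u, ks, n):
    """Conjugate of an (n, m)-genetic marker: j'_r = k_r - 1 - j_r."""
    return [ks[n + i] - 1 - j for i, j in enumerate(j_u)]

def slot(word, j, block_len):
    """Block j of a word viewed as a concatenation of equal-length blocks."""
    return word[j * block_len:(j + 1) * block_len]

# v is a concatenation of k = 4 blocks of length 3; slot j of rev(v)
# is the reversal of slot k - 1 - j of v.
blocks = ["abc", "def", "ghi", "jkl"]
v = "".join(blocks)
rv = v[::-1]
```

In particular, applying the conjugation twice returns the original marker, so either element of a conjugate pair determines the other.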
Being a conjugate pair is equivalent to satisfying the numerical relationship given in equation 36, and thus either element of a conjugate pair determines the other. Hence for purposes of counting conjugate pairs we need only use the first coordinates j_{u′}. Let (u, rev(v)) ∈ U_m × rev(V_m) be words that occur in a pair (s, rev(t)) ∈ K × rev(L). Then the relative alignment of u^c and rev(v^c) in (s^c, t̃) is determined by the ♮-map. This is approximated with a high degree of accuracy by the shift A_m describing where the code Λ_m sends intervals. Accordingly:
Definition 96. Define the pair (u, rev(v))^c to be (u^c, sh^{A_m}(rev(v^c))).

Thus (u, rev(v))^c_l determines a basic open interval in K^c × rev(L^c).
Alternatively we could write this as the product of the basic open intervals ⟨u^c⟩_l and ⟨rev(v^c)⟩_{l+A_m}. We now have a lemma extending Lemma 72, which says that if u and v belong to U_{n+1} and V_{n+1} then, relative to sh^{−j_1}(rev(v^c)), all occurrences of (u′)^c ∈ U^c_n in u^c are either lined up with an occurrence of a rev((v′)^c) for some (v′)^c ∈ V^c_n or with a boundary section of sh^{−j_1}(rev(v^c)). The lemma also says that if (u′)^c, rev((v′)^c) are lined up then j_{u′} and j_{v′} form a conjugate pair.
Proposition 97. Let n < m and u ∈ U_m, v ∈ V_m. Then for u′ ∈ U_n, v′ ∈ V_n we consider occurrences of (u′, rev(v′))^c in (u, rev(v))^c.
1. If (u′, rev(v′))^c occurs in (u, rev(v))^c, then j_{u′} and j_{v′} form a conjugate pair.
2. There is a constant C = C(n, m) such that all conjugate pairs occur C times.
3. Fix a conjugate pair (j_{u′}, j_{v′}) of genetic markers of (u′, rev(v′)). If k is a location of an occurrence of (u′)^c in u^c with genetic marker j_{u′}, but not a location of (u′, rev(v′))^c, then the section of sh^{A_m}(rev(v^c)) occupying [k + A_n, k + A_n + q_n) is contained in ⋃_{i=n+1}^{m} ∂_i.
Item 1 is immediate from the definitions. The latter items are asking about pairs of the form ((u′)^c, sh^{A_n}(rev((v′)^c))) occurring in (u^c, sh^{A_m}(rev(v^c))). Such a pair occurs at k if and only if the pair ((u′)^c, rev((v′)^c)) occurs aligned in (u^c, sh^{A_m−A_n}(rev(v^c))) at k. Item 3 is equivalent to saying that (u′)^c is lined up with a portion of sh^{A_m−A_n}(rev(v^c)) contained in ⋃_{i=n+1}^{m} ∂_i. We fix m and prove 2 and 3 by induction on m − n. The case that m = n + 1 is the content of Lemma 72. Suppose that the proposition is true for m and n + 1; we prove it for m and n.
A pair of (n+1)-circular words (w_0, w_1)^c lined up in the shifted pair (u^c, sh^{A_m−A_{n+1}}(rev(v^c))) must have conjugate genetic markers. Moreover, there is a number C_0 such that any pair with conjugate genetic markers occurs lined up C_0 many times.
Fix an occurrence k of an (n+1)-word w_0 so that no word in sh^{A_m}(rev(v^c)) occurs at [k + A_{n+1}, k + A_{n+1} + q_{n+1}), i.e. w_0 is not lined up with the reverse of an (n+1)-word in sh^{A_m−A_{n+1}}(rev(v^c)). Then w_0 is lined up with a segment of sh^{A_m−A_{n+1}}(rev(v^c)) that is a subset of ⋃_{i=n+2}^{m} ∂_i. To pass from A_m − A_{n+1} to A_m − A_n we shift by −j_1, where j_1 = p_n^{−1} mod q_n. Noting that each reversed (n+1)-word ends with a string of b's of length q_n, we see that after the additional shift there can be no n-subwords inside w_0 lined up with anything besides a portion of sh^{A_m−A_n}(rev(v^c)) contained in ⋃_{i=n+1}^{m} ∂_i. Suppose that u′ and v′ are n-words and we have an occurrence of (u′)^c and rev((v′)^c) lined up in the pair (u^c, sh^{A_m−A_n}(rev(v^c))). If j_{u′} = k_0⌢j*_{u′} and j_{v′} = k_1⌢j*_{v′}, we let (w_0, w_1) be the occurrence of (n+1)-subwords of (u, v) with genetic markers j*_{u′} and j*_{v′} that contain u′ and v′. It follows from the previous paragraph that the genetic markers of w_0 and w_1 are conjugate and w^c_0, rev(w^c_1) are aligned in (u^c, sh^{A_m−A_{n+1}}(rev(v^c))). By Lemma 72, k_0 and k_1 are conjugate and thus j_{u′} and j_{v′} are conjugate.
Further, each conjugate pair occurs aligned the same number C_1 of times in the pair (w^c_0, sh^{−j_1}(rev(w^c_1))); the number C_1 is independent of w_0, w_1 and of k_0 and k_1. It follows now that given a conjugate pair of genetic markers (j_{u′}, j_{v′}), the number of occurrences of a circular n-word with genetic marker j_{u′} in u^c aligned with an occurrence of a circular word with genetic marker j_{v′} is C_0 · C_1, which proves item 2. To finish we note that the unaligned n-words fall into two categories: those that are not aligned because the (n+1)-words that contain them are not aligned, and those that are not aligned because of the final shift −j_1. In each case, the unaligned n-words in u occur across from boundary sections in the word sh^{A_m−A_n}(rev(v^c)).
Thus, using the backwards C-operation to wrap words around the circle in opposite directions introduces some slippage, but the slippage is uniform and predictable.
Definition 98. Suppose that j and j′ are a conjugate pair of (n, m)-genetic markers and u^c ∈ U^c_m, v^c ∈ V^c_m. Let (u′)^c and (v′)^c have genetic markers j and j′ in u^c, v^c respectively. Then the set of locations k such that (u′)^c occurs in u^c starting at k with genetic marker j, but rev((v′)^c) does not occur starting at k + A_n in sh^{A_m}(rev(v^c)), is called the (n, m)-slippage of j.
A location k can belong to the slippage of j for two mutually exclusive reasons. Either, for some proper tail segment j* of j, k is part of the slippage of the subword of u^c with genetic marker j*, or k is part of the slippage of the j_n inside the (n+1)-word containing u′ caused by sh^{−j_1}.
Let SL_{n,m} stand for the (n, m)-slippage of n-subwords of u^c; i.e. the locations k in u^c of some n-word (u′)^c such that there is no n-word rev((v′)^c) at position k + A_n. Inside an m-word u^c we find multiple copies of SL_{n,n+1}, corresponding to the location of each (n+1)-word in u^c. Denote the union of these copies as SL^m_{n,n+1}. Then it follows that SL_{n,m} is the union of SL^m_{n,n+1} together with the set of locations of n-words lying inside (n+1)-words whose locations belong to SL_{n+1,m}, and moreover the union is disjoint. The slippage is the portion of the words that we have no control over when counting, so we want to be able to estimate the proportion of words in the slippage. Let ε^m_n be the proportion of locations of n-subwords of u^c that belong to SL_{n,m}. The next proposition allows us to control the (n, m)-slippage by controlling the successive (n, n + 1)-slippages.
We begin by noting that, for n* between n and m, all pairs (u*, rev(v*)) of n*-words have the same proportion of slippage of n-words in (u*, rev(v*))^c. Thus ε^{n*}_n is equal to the proportion of slippage among all of the n-words occurring in pairs (u*, rev(v*))^c of n*-subwords of (u, rev(v))^c.
The argument is similar to Lemma 87. Starting with n* = m − 2 and decreasing until n* = n + 1, using the fact that the union in equation 37 is disjoint, one inductively demonstrates that
1 − ε^m_n = ∏_{p=n}^{m−1} (1 − ε^{p+1}_p). (39)
We can combine item 3 of Proposition 97 with equation 39 to see that if k is in SL_{n,m}, then [k + A_n, k + A_n + q_n) is a subset of ⋃_{i=n+1}^{m} ∂^{v^c}_i. It thus follows from Lemma 75 that ε^m_n is controlled by the density of ⋃_{i=n+1}^{m} ∂_i.
Because the definition of m n was made entirely in terms of genetic markers, the whole discussion could have been carried out simply by considering K c × rev(K c ). The numerics depend only on the circular coefficient sequence, not on particular construction sequences U n , V n : n ∈ N .
Viewing the ♮ operator as the limit of the codes Λ_m, we can pass to infinity, define SL^∞_n similarly, and let ε^∞_n be the proportion of locations k of n-subwords of a typical s ∈ K^c such that no n-subword of ♮(rev(π(s))) occurs at k + A_n. Then ε^∞_n = lim_{m→∞} ε^m_n. We now formulate and prove the version of Lemma 87 involving the ♮ map. One might expect that this would require considering arbitrary pairs of genetic markers j and j′. However, by Proposition 97, if u′ occurs in u with (n, m)-genetic marker j, then the only genetic marker it can occur lined up with in rev(v) is its conjugate pair. Similarly, either of the genetic markers of aligned words (u′)^c occurring in u^c and sh^{A_n}(rev((v′)^c)) occurring in sh^{A_m}(rev(v^c)) determines the other member of the conjugate pair.
It follows that we need only consider pairs (u′, rev(v′)) whose genetic markers are conjugate in (u, rev(v)). Since the map j ↦ j′ is a bijection, we will refer to either of j or j′ as the genetic marker of a pair (u′, rev(v′)), or equivalently of (u′, rev(v′))^c.
We are reduced to considering sets S* ⊆ {(n, m)-genetic markers} rather than sets of pairs of genetic markers. Let n < m and let S* be a set of (n, m)-genetic markers of pairs of n-words in (u, rev(v)). Let
• A = {k ∈ [0, K_m) : some pair (u′, rev(v′)) with genetic marker in S* begins at k in (u, rev(v))}
• A^c = {k ∈ [0, q_m) : some pair (u′, rev(v′))^c with genetic marker in S* begins at k in (u, rev(v))^c}.
Lemma 100. Let n < m and (u, v) ∈ U_m × V_m. Let S* be a collection of (n, m)-genetic markers, g the total number of (n, m)-genetic markers and d = |S*|/g. Then (in the notation above):
d_m(A) = d/K_n (42)
d^c_m(A^c) = (d/q_n)(1 − ε^m_n) ∏_{p=n}^{m−1} (1 − 1/l_p) (43)
d_m(A) = (q_n/K_n)(1 − ε^m_n)^{−1} (∏_{p=n}^{m−1} (1 − 1/l_p))^{−1} d^c_m(A^c) (44)
d^c_m(A^c) = (K_n/q_n)(1 − ε^m_n) ∏_{p=n}^{m−1} (1 − 1/l_p) d_m(A). (45)
The proof is essentially the same as the proof of Lemma 87; indeed the proof of equation 42 is the same. Because all genetic markers occur with the same frequency, after allowing for the portions of u^c in boundary sections and in slippage (which are disjoint), d/q_n is the density of locations k of occurrences of words with genetic markers in S*. Once again equations 44 and 45 follow from 42 and 43 by substitution.
The equation relating ρ ∈ P− and ρ^c ∈ P that corresponds to equation 33 is:
ρ^c(⟨(u′, rev(v′))^c⟩) = (K_n/q_n)(1 − ε^∞_n) ∏_{p=n}^{∞} (1 − 1/l_p) ρ(⟨(u′, rev(v′))⟩). (46)
Once again ρ^c(∂_m) is independent of the choice of ρ^c. Setting d_{∂_m} = ρ^c(∂_m), we can write the previous equation as:
ρ^c(⟨(u′, rev(v′))^c⟩) = (K_n/q_n)(1 − ε^∞_n)(1 − Σ_{m≥n} d_{∂_m}) ρ(⟨(u′, rev(v′))⟩). (47)
(As before, it is easy to check that g = ∏_{i=n}^{m−1} k_i.)
Understanding empirical distributions of joinings along the natural map involves studying how the slippage affects each pair of n-words. Fix u′ ∈ U_n, v′ ∈ V_n and u ∈ U_m, v ∈ V_m where n < m. Let the conjugate pair (j, j′) be the genetic marker of (u′, rev(v′)) in (u, rev(v)). Then, as remarked earlier, j′ is determined by j, since they are a conjugate pair. Define SL_{n,m}(u′, rev(v′)) to be the collection of locations k ∈ SL_{n,m} of n-subwords of u^c that have genetic marker j. Item 2 of Proposition 97 implies that |SL_{n,m}(u′, rev(v′))| is the same for all choices of (u′, rev(v′)). Since SL_{n,m} is the union over all possible pairs of the sets SL_{n,m}(u′, rev(v′)), we see that these sets all have the same size and their sizes sum to |SL_{n,m}|. From the definition, the ratio
|{occurrences of (u′, rev(v′))^c in (u, rev(v))^c}| / |{k : for some (u*, v*) ∈ U_n × V_n, (u*, rev(v*))^c occurs in (u, rev(v))^c at k}|
is equal to
((1 − ε^m_n) |{n-subwords of u^c with genetic marker j}|) / ((1 − ε^m_n) |{n-subwords of u^c}|),
which in turn is equal to EmpDist_{n,n,0}(u, rev(v))(u′, rev(v′)).

Transferring measures up and down, II
In this section we describe the correspondence between joinings in P− and P. We do this by considering generic points for the joinings and transferring them up or down. For the reader's convenience we repeat a definition. Let (s, t) be an arbitrary point in K × L with πt = −πs and s ∈ S_K, t ∈ S_L. Let ⟨u_n : n ∈ N⟩ and ⟨v_n : n ∈ N⟩ be the sequences of principal subwords of s and t respectively. Then ⟨u^c_n : n ∈ N⟩ and ⟨v^c_n : n ∈ N⟩ are the sequences of principal subwords of s^c = TU(s) and t^c = TU(t). If x = ♮(πs^c), then x ∈ rev(K) and we can set r_n = r_n(x). Recall that we defined t̃ ∈ rev(L^c) by taking ⟨rev(v^c_n) : n ∈ N⟩ as its principal n-subword sequence and ⟨r_n : n ∈ N⟩ as its location sequence.
The following follows immediately from equation 27:
Lemma 101. The point t is generic for an invariant measure µ on L if and only if t̃ is generic for an invariant measure µ* on rev(L^c).
We will study the relationship between P− and P via the function taking (s, t) to (s^c, t̃). If [a_n, b_n] is the location of the principal n-block of s^c, we define w^c_n to be the word (u^c_n, t̃_n) (in the language Σ × Λ), where t̃_n = t̃ ↾ [A_n + a_n, A_n + b_n]. Rephrasing this, if (u_n, rev(v_n)) are the principal n-subwords of (s, t) then w^c_n = (u_n, rev(v_n))^c.
Proposition 102. The sequence (u n , rev(v n )) : n ∈ N is a generic sequence (resp. an ergodic sequence) if and only if w c n : n ∈ N is a generic sequence (resp. an ergodic sequence).

This follows immediately from equation 48.
It is worth remarking that Proposition 102 can be restated in the language of Definition 26 as saying that ⟨(u_n, rev(v_n), 0) : n ∈ N⟩ is a generic sequence if and only if ⟨(u^c_n, rev(v^c_n), A_n) : n ∈ N⟩ is a generic sequence.
The next theorem is the analogue of Theorem 91, adapted to lifting joinings of K with L^{-1} to joinings of K^c with (L^c)^{-1}. In the theorem the notation (ν, ν^c) and (µ, µ^c) refers to pairs of corresponding measures. We assume that K is built in the language Σ and L is built in the language Λ.
Theorem 103. Suppose that ⟨U_n : n ∈ N⟩ and ⟨V_n : n ∈ N⟩ are construction sequences for two ergodic odometer based systems (K, ν) and (L, µ) with the same sequence parameters ⟨k_n : n ∈ N⟩. Let (K^c, ν^c) and (L^c, µ^c) be the associated ergodic circular systems built with a circular coefficient sequence ⟨k_n, l_n : n ∈ N⟩. Then there is a canonical affine homeomorphism ρ → ρ^c between the simplex of anti-synchronous joinings ρ of (K, ν) and (L^{-1}, µ) and the simplex of anti-synchronous joinings of (K^c, ν^c) and ((L^c)^{-1}, µ^c) such that equation 46 holds between ρ and ρ^c.
Suppose that we are given an anti-synchronous ergodic joining ρ between K and L^{-1}. Let (s, t) be generic for ρ. By Lemma 23, the sequence of principal n-blocks ⟨(u_n, rev(v_n)) : n ∈ N⟩ is ergodic. By Proposition 102 the sequence ⟨w^c_n : n ∈ N⟩ defines an ergodic measure ρ^c. Since the (u_n, rev(v_n)) satisfy equation 45, the Ergodic Theorem implies that ρ^c and ρ satisfy equation 46. It is easy to check that the definition of ρ^c is independent of the choice of the generic pair (s, t).
For the other direction we can assume that we are given a generic pair (s^c, t̃) for an ergodic measure ρ^c on K^c × rev(L^c) that concentrates on pairs (s^c, rev(t^c)) ∈ K^c × rev(L^c) such that π(rev(t^c)) = (π(s^c)). Taking principal subwords gives us a generic sequence ⟨(u^c_n, t̃_n) : n ∈ N⟩. Each t̃_n is a well-defined word rev(v^c_n) in rev(V^c_n). As in the definition of U_T, the pair (s^c, rev(t̃)) gives a pair of sequences of genetic markers (⟨j_n : n ≥ N⟩, ⟨j̄_n : n ≥ N⟩) for some N. Letting u_n = c_n^{-1}(u^c_n) and v_n = c_n^{-1}(rev(t̃_n)), the sequences ⟨u_n, j_n⟩ and ⟨v_n, j̄_n⟩ determine a pair in K × L up to finite translations. These sequences are defined independently of the exact location of the zero of t̃; the small shifts used in the definition of U_T do not change the two sequences.
If we let (s, t) = (U_T(s^c), U_T(rev(t̃))), making small adjustments if necessary to make (s, t) anti-synchronous, we get an element of K × L^{-1}. Applying Proposition 102 again we see the theorem.
We can extend this correspondence to non-ergodic joinings ρ on K × L^{-1} and ρ^c on K^c × rev(L^c), exactly as in Theorem 91: to go up we take an ergodic decomposition ρ = ∫ ρ_i dµ(i) and set ρ^c = ∫ (ρ_i)^c dµ(i); to go down we use the ergodic decomposition theorem and the measure µ(i) to reverse this process.
Clearly the map ρ → ρ^c is an affine bijection. It remains to show that it is continuous. However, just as in Theorem 91, we see from equation 46 that for each n there is a constant C_n, independent of ρ, such that for all u ∈ U_n, v ∈ V_n,

ρ^c(⟨(u, rev(v))^c⟩) = C_n ρ(⟨(u, v)⟩).
This clearly implies that the map ρ → ρ c is a weak* homeomorphism.
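The displayed relation makes the continuity argument mechanical: each cylinder value of ρ^c is a fixed constant times a cylinder value of ρ, so weak* convergence (convergence on every cylinder) transfers in both directions. A toy sketch (representing a measure by a dictionary of its cylinder values is an assumption for illustration, not the paper's formalism):

```python
def lift_measure(rho, C):
    """Transfer cylinder values: for an n-cylinder keyed (n, name),
    the lifted measure assigns C[n] * rho(n, name), mirroring
    rho^c(<(u, rev(v))^c>) = C_n * rho(<(u, v)>)."""
    return {key: C[key[0]] * mass for key, mass in rho.items()}
```

Since the map multiplies each cylinder value by a constant that does not depend on ρ, if ρ_k → ρ on every cylinder then lift_measure(ρ_k, C) → lift_measure(ρ, C) on every cylinder, and conversely.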
The proof of Theorem 103 shows that (s, t) is generic for ρ if and only if the pair (s^c, t̃) is generic for ρ^c. Moreover, the proofs of Theorems 91 and 103 are quite robust. In particular the constructions of the corresponding measures are independent of the various choices of generic points s or s^c, (s, t) or (s^c, t̃).

The Main Result
We now turn to the main results of this paper. Fix an arbitrary circular coefficient sequence k n , l n : n ∈ N for the rest of the section. Let OB be the category whose objects are ergodic odometer based systems with coefficients k n : n ∈ N . The morphisms between objects (K, µ) and (L, ν) will be synchronous graph joinings of (K, µ) and (L, ν) or anti-synchronous graph joinings of (K, µ) and (L −1 , ν). We call this the category of odometer based systems.
Let CB be the category whose objects are all ergodic circular systems with coefficients ⟨k_n, l_n : n ∈ N⟩. The morphisms between objects (K^c, µ^c) and (L^c, ν^c) will be synchronous graph joinings of (K^c, µ^c) and (L^c, ν^c) or anti-synchronous graph joinings of (K^c, µ^c) and ((L^c)^{-1}, ν^c). We call this the category of circular systems.
Remark 104. Were we to be completely precise we would take objects in OB to be presentations of odometer based systems by construction sequences W n : n ∈ N without spacers together with suitable generic sequences and the objects in CB to be presentations by circular construction sequences and their generic sequences. This subtlety does not cause problems in the applications so we ignore it.
The main theorem of this paper is the following:

Theorem 105. For a fixed circular coefficient sequence ⟨k_n, l_n : n ∈ N⟩ the categories OB and CB are isomorphic by a function F that takes synchronous joinings to synchronous joinings, anti-synchronous joinings to anti-synchronous joinings, isomorphisms to isomorphisms and weakly mixing extensions to weakly mixing extensions.
Elaborating on Example 6:

Corollary 106. The map F preserves systems of factor maps (or alternatively, extensions). Explicitly: let ⟨I, ≤_I⟩ be a partial ordering, ⟨X_i : i ∈ I⟩ be a family of odometer based systems and ⟨π_{i,j} : j ≤ i⟩ be a commuting family of factor maps with π_{i,j} : X_i → X_j. Then ⟨F(π_{i,j}) : j ≤ i⟩ is a commuting family of factor maps among ⟨F(X_i) : i ∈ I⟩. Moreover the analogous statement holds for circular systems ⟨X^c_i : i ∈ I⟩, factor maps ⟨π_{i,j} : j ≤ i⟩ and F^{-1}.
Theorem 105 can be interpreted as saying that the whole isomorphism and factor structure of systems based on the odometer k n : n ∈ N is canonically isomorphic to the isomorphism and factor structure of circular systems based on k n , l n : n ∈ N . We call this a Global Structure Theorem.

The proof of the main theorem
Before we prove Theorem 105 we need the following lemma:

Lemma 107. Both OB and CB are categories; the composition of two synchronous joinings is synchronous, the composition of two anti-synchronous joinings is synchronous and the composition of a synchronous and an anti-synchronous joining (in either order) is anti-synchronous.
To see that OB and CB are categories we must see that the morphisms are closed under composition. This is equivalent to the statement that the composition of two synchronous or anti-synchronous joinings is synchronous or anti-synchronous, as appropriate. This, in turn, follows from Proposition 7 (item 2) applied to joinings of odometers or rotations.
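The composition rule in Lemma 107 is addition of parities mod 2 (synchronous = 0, anti-synchronous = 1). A toy sketch, with all names illustrative rather than the paper's:

```python
from dataclasses import dataclass

SYNC, ANTI = 0, 1  # parity: synchronous = 0, anti-synchronous = 1

@dataclass(frozen=True)
class Joining:
    """Toy morphism record: a joining from `source` to `target`."""
    source: str
    target: str
    parity: int

def compose(first: Joining, second: Joining) -> Joining:
    """Compose K -> L with L -> M; parities add mod 2, mirroring:
    sync∘sync = sync, anti∘anti = sync, mixed = anti."""
    assert first.target == second.source, "joinings must be composable"
    return Joining(first.source, second.target,
                   (first.parity + second.parity) % 2)
```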
We now prove Theorem 105. By Proposition 83 the map F gives a bijection between the objects of OB and CB and hence it remains to define the functor on the morphisms (i.e. joinings between systems (K, µ) and (L ±1 , ν)) and show that it preserves composition.

Defining F on morphisms
We split the definition of F(ρ) into two cases according to whether ρ is synchronous or anti-synchronous. In both cases we define F for arbitrary joinings even though the only joinings we use as morphisms in the categories are graph joinings; in particular the morphisms in each category are ergodic.

Case 1: ρ is synchronous:
Suppose that ρ is a synchronous joining of odometer based systems K and L with coefficient sequence ⟨k_n : n ∈ N⟩ that are constructed with symbols in Σ and Λ from construction sequences ⟨U_n : n ∈ N⟩ and ⟨V_n : n ∈ N⟩. We define a new construction sequence ⟨W_n : n ∈ N⟩ with the symbol set Σ × Λ.
Given n, we put the word w = ((σ_0, λ_0), . . . , (σ_{K_n−1}, λ_{K_n−1})) into W_n if and only if there are words u = (σ_0, . . . , σ_{K_n−1}) ∈ U_n and v = (λ_0, . . . , λ_{K_n−1}) ∈ V_n. It is easy to check that ⟨W_n : n ∈ N⟩ is an odometer based construction sequence with coefficients ⟨k_n : n ∈ N⟩. Let (K, L)^× be the associated odometer based system. Since ρ is synchronous, it concentrates on members of K × L that correspond to elements of (K, L)^×. We can canonically identify ρ with a shift invariant measure ν on (K, L)^×.
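Concretely, a word of W_n is a coordinatewise pairing of a U_n-word with a V_n-word of the same length K_n; a minimal sketch treating words as tuples of symbols:

```python
from itertools import product

def product_words(U_n, V_n):
    """Words of W_n over the alphabet Sigma x Lambda: pair each
    u in U_n with each v in V_n letter by letter. Assumes every
    word in U_n and V_n has the same length K_n."""
    return [tuple(zip(u, v)) for u, v in product(U_n, V_n)]

# with two U-words and one V-word of length 2, W_2 has 2 words
W_2 = product_words([("a", "b"), ("b", "a")], [("x", "y")])
```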
Let ((K, L)^×)^c be the circular system associated with (K, L)^×. We can apply Theorem 91 to find a shift invariant measure ν^c on ((K, L)^×)^c associated with ν that is ergodic just in case ν is ergodic. Shift invariant measures on ((K, L)^×)^c can be canonically identified with synchronous joinings of K^c × L^c. Let ρ^c be the joining of K^c × L^c corresponding to ν^c. We let F(ρ) = ρ^c.
Explicitly: a generic sequence ⟨(u_n, v_n, 0) : n ∈ N⟩ for the joining ρ can be viewed as a generic sequence ⟨(u_n, v_n) : n ∈ N⟩ for (K, L)^× and transformed into a generic sequence ⟨(u^c_n, v^c_n) : n ∈ N⟩ for ((K, L)^×)^c. The latter corresponds to a generic sequence of the form ⟨(u^c_n, v^c_n, 0) : n ∈ N⟩ for the joining ρ^c. This process is clearly reversible, so F is a bijection between the synchronous joinings of OB and the synchronous joinings of CB.
We must show that if ρ is a graph joining then so is ρ c . Once this is established it follows by symmetry that if ρ is an isomorphism then ρ c is an isomorphism. Namely if ρ * is the adjoint joining of L with K defined as ρ * (A) = ρ({(s, t) : (t, s) ∈ A}), then (ρ * ) c = (ρ c ) * . Hence ρ * is a graph joining iff (ρ c ) * is a graph joining.
Suppose that ρ is a graph joining. We apply Proposition 4, part 3. It suffices to show that for all basic open sets in K^c of the form ⟨u^c⟩_0, where u^c ∈ U^c_n, and all ε > 0, there are words v^c_1, v^c_2, . . . , v^c_{k*} that belong to ⋃_n V^c_n and locations l^c_1, . . . , l^c_{k*} such that inequality 49 holds:

ρ^c( (⟨u^c⟩_0 × L^c) Δ (K^c × ⋃_{j≤k*} ⟨v^c_j⟩_{l^c_j}) ) < ε.    (49)

Consider u such that c_n(u) = u^c. Because ρ is a graph joining, for all δ > 0 we can find words v_1, . . . , v_k and locations l_1, . . . , l_k such that

ρ( (⟨u⟩_0 × L) Δ (K × ⋃_{i≤k} ⟨v_i⟩_{l_i}) ) < δ.    (50)

Without loss of generality we can assume that for some m ≥ n each v_i is an m-word and that each l_i ≤ 0. Let (s, t) be generic for ρ and consider the pair s^c = T_U(s), t^c = T_U(t). Then by Remark 93 (s^c, t^c) is generic for ρ^c. We will choose words v^c_j and locations l^c_j and compute the measure in inequality 49 by computing the density of locations representing points in the symmetric difference. Let B_0 be the set of locations k along (s, t) at which u occurs in s but no v_i occurs at k + l_i in t, let B_1 be the set of k at which some v_i occurs at k + l_i in t but u does not occur at k in s, and let B^c_0, B^c_1 be the corresponding sets of locations along (s^c, t^c). For each i, if −l_i is not the location of the beginning of an n-word in v_i, then dropping ⟨v_i⟩_{l_i} reduces the measure of the symmetric difference in inequality 50. Thus, without loss of generality we can assume that for all i there is an (n, m)-genetic marker j(i) coding the location of the n-word in v_i that starts at −l_i. Since B_0 ∪ B_1 has density less than δ, the set of k such that either:

1. u occurs at k but for each i, k is not the position of the beginning of an n-word with genetic marker j(i) in an occurrence of v_i, or

2. for some i, k is the position of the beginning of an n-word with genetic marker j(i) in an occurrence of v_i, but u does not occur at k,

has density less than δ.
We are in a position to define the v^c_j and the l^c_j. For each i we define index sets J_i and a collection {l^c_j : j ∈ J_i}. We arrange the J_i's so that they are pairwise disjoint and, for some k*, their union is {j : 1 ≤ j ≤ k*}. Among all n-words the proportion d_p that begin with an element of B_0 ∪ B_1 is d_0 · K_n, where d_0 is the density of B_0 ∪ B_1. The density of k ∈ Z that start n-words in (s, t) is (1 − µ(⋃_{i≥n} ∂_i))/q_n. Letting d* be the density of k ∉ ⋃_{n≤i≤M} ∂_i, we see that d* is bounded away from 0 and 1 independently of M. Let d^c_p be the corresponding proportion of circular n-subwords of (w^c_0, w^c_1) that begin with an element of B^c_0 ∪ B^c_1, where w_0 and w_1 are the principal M-subwords of s and t, and w^c_0, w^c_1 are those of s^c and t^c. Since ρ concentrates on {(s, t) : π(s) = π(t)} and ρ^c concentrates on {(s^c, t^c) : π(s^c) = π(t^c)}, the n-words with a particular genetic marker in w_0 occupy the position of the same genetic marker in w_1, and similarly for w^c_0 and w^c_1. The (n, M)-genetic markers set up a one-to-one correspondence between n-subwords u* of w_0 and regions of w^c_0 that consist of occurrences of (u*)^c that have the same genetic marker. Each of the regions of w^c_0 with the same genetic marker has the same number of n-words in it.
Temporarily call an n-subword of (w^c_0, w^c_1) bad if it begins with a k in B^c_0 ∪ B^c_1, and similarly for n-subwords of (w_0, w_1) and B_0 ∪ B_1. Then the property of being bad is determined by the (n, M)-genetic marker of the n-word: if k is the beginning of an n-subword of w_0 with genetic marker j, and k′ is the beginning of an n-subword of w^c_0 with the same genetic marker in w^c_0, then k ∈ B_0 ∪ B_1 if and only if k′ ∈ B^c_0 ∪ B^c_1. It follows that the proportion of bad n-subwords of (w_0, w_1) is the same as the proportion of bad n-subwords of (w^c_0, w^c_1); in other words, d^c_p = d_p. It follows that the density d^c_0 of B^c_0 ∪ B^c_1 is controlled by d_0.
Thus by taking δ small enough and M large enough we can make d_0 as small as we want, and thus arrange that d^c_0 < ε, as desired. To finish showing that F is a bijection between graph joinings in each category and isomorphisms in each category, we must also show that if ρ^c is a graph joining then so is ρ. But this is very similar. Given a u^c ∈ U^c_n and an ε > 0, we can find v^c_1, . . . , v^c_{k*} and locations l^c_1, . . . , l^c_{k*} so that inequality 49 holds. Again we can assume that for some m, for all j, v^c_j ∈ V^c_m. The numbers |l^c_j| determine locations in v^c_j of beginnings of n-words. We can augment our collection of locations by adding more l^c_j's so that if l is the start of a location in v^c_j that has the same (n, m)-genetic marker as l^c_j, then for some j′ we have l^c_{j′} = −l and v^c_{j′} = v^c_j. In doing this we do not increase the density of B^c_0 ∪ B^c_1. Reversing the procedure above, this gives words v_j ∈ ⋃_n V_n and locations l_j such that the density of B_0 ∪ B_1 is less than ε. (Note that the lack of boundary in K × L makes the computation easier by reducing the density of B_0 ∪ B_1.)

Case 2: ρ is anti-synchronous
On the anti-synchronous joinings we take F to be the bijection between anti-synchronous joinings of (K, µ) with (L^{-1}, ν) and anti-synchronous joinings of the circular systems (K^c, µ^c) with ((L^c)^{-1}, ν^c) defined in Theorem 103. We show that F takes anti-synchronous graph joinings to anti-synchronous graph joinings and vice versa. Having done this, it will follow by a symmetry argument that F sends anti-synchronous isomorphisms to anti-synchronous isomorphisms.
Suppose that ρ is an anti-synchronous graph joining; i.e. ρ is a graph joining of K with L −1 that concentrates on {(s, t) : π(t) = −π(s)}. The map x → rev(x) projects to the odometer map π(x) → −π(x); in particular rev(L) is based on the same odometer that L is. By Lemma 43 we can view ρ as a graph joining of K with rev(L) that concentrates on {(s, t) : π(s) = π(t)}.
Similarly we view ρ c as concentrating on K c × rev(L c ).
We must show that for all basic open sets in K^c of the form ⟨u^c⟩_0, where u^c ∈ U^c_n, and all ε > 0, there are words v^c_1, v^c_2, . . . , v^c_{k*} that belong to ⋃_n V^c_n and locations l^c_1, . . . , l^c_{k*} such that

ρ^c( (⟨u^c⟩_0 × rev(L^c)) Δ (K^c × ⋃_{j≤k*} ⟨v^c_j⟩_{l^c_j}) ) < ε.

Consider u such that c_n(u) = u^c. Because ρ is a graph joining, for all δ > 0 and all large enough m we can find words v_1, . . . , v_k ∈ V_m and locations l_1, . . . , l_k such that inequality 52 holds:

ρ( (⟨u⟩_0 × rev(L)) Δ (K × ⋃_{i≤k} ⟨rev(v_i)⟩_{l_i}) ) < δ.    (52)

Without loss of generality we can assume that each l_i ≤ 0. We will take m sufficiently large according to a restriction we define later. Let (s, t) be generic for ρ and let t̃ be as in Definition 94. Then (s^c, t̃) is generic for ρ^c. We argue as before, considering the sets B^c_0 and B^c_1 of mistaken locations (and their odometer counterparts B_0 and B_1, defined as in the synchronous case); we must show that the density of B^c_0 ∪ B^c_1 is less than ε. As in the synchronous case, for each i we build index sets J_i so that the J_i's are pairwise disjoint and have union the interval {j : 1 ≤ j ≤ k*} for some k*. For all j ∈ J_i we take v^c_j = c_m(v_i). We need to find a collection of locations {l_j : j ∈ J_i}.
Fix an i ≤ k. Without loss of generality we can assume that l_i is the beginning of a reversed n-block rev(v′) in rev(v_i), since otherwise discarding ⟨rev(v_i)⟩_{l_i} makes inequality 52 sharper. If (s_0, rev(t_0)) ∈ K × rev(L) is an arbitrary pair with s_0 ∈ ⟨u⟩_0, rev(t_0) ∈ ⟨rev(v_i)⟩_{l_i} and π(s_0) = −π(t_0), then there is an m-word u* such that s_0 ∈ ⟨u*⟩_{l_i}. Let j(i) be the genetic marker of u in u*. We note that j(i) does not depend on s_0, since it is determined entirely by the location of u in u*, and u* must be aligned with rev(v_i).
The genetic marker j(i) defines a region of n-words in U^c_n inside an m-word in U^c_m. Let L_i be the collection of l that are at the beginning of an n-word in U^c_n with genetic marker j(i) in an m-word in U^c_m, and set {l^c_j : j ∈ J_i} = {−l : l ∈ L_i}. This determines the collection {v^c_j, l^c_j : 1 ≤ j ≤ k*}. We compute the density d^c_0 of elements of B^c_0 ∪ B^c_1 by separating them into two sources: those lying in the Slippage and those lying in the Mistakes. The proportion of n-subwords lying in the Slippage goes to zero as m goes to infinity, so we can make this term as small as desired by taking m large enough. Let [a_M, b_M) be the location of the principal M-block and let d_p be the proportion of m-subwords of s↾[a_M, b_M) that begin with a k ∈ B_0 ∪ B_1. Since every genetic marker is represented exactly the same number of times in the complement of the slippage (Proposition 97), the proportion of words that begin with a k in the Mistakes is computed by equation 58. If d_0 is the density of B_0 ∪ B_1 in [a_M, b_M) and d^c_0 is the density of the Mistakes, then equation 59 relates d^c_0 to d_0. Putting together equations 57, 58 and 59, we see that if we make d_0 sufficiently small we can make d^c_0 as small as desired.
Summarizing: by taking M large enough, the density of B^c_0 ∪ B^c_1 is bounded by the sum of the density of the (m, M)-slippage and the density of the Mistakes. We can make the density of the Slippage arbitrarily small by taking m large enough and the density of the Mistakes arbitrarily small by taking d_0 sufficiently small. This establishes the claim that if ρ is a graph joining then so is ρ^c.
We must show that if ρ^c is a graph joining then so is ρ. Suppose that we are given a u ∈ U_n; we must find {v_i, l_i : i ≤ k} so that inequality 52 holds. Let u^c = c_n(u) and approximate ⟨u^c⟩_0 × rev(L^c) using {v^c_j, l^c_j : j ≤ k*}. Again, we can assume that the collection of locations is saturated in the sense that if l is the start of a location in v^c_j that has the same (n, m)-genetic marker as l^c_j, then for some j′ we have l^c_{j′} = −l and v^c_{j′} = v^c_j. In doing this we do not increase the density of B^c_0 ∪ B^c_1. We can now use equations 57, 58 and 59 again to see that if d^c_0 is made sufficiently small then so is d_0.
Our next claim is that ρ is an isomorphism if and only if ρ^c is an isomorphism. Recall from Proposition 5 that ρ is an isomorphism iff both ρ and ρ* are graph joinings. Thus if ρ is an isomorphism, both ρ^c and (ρ*)^c are graph joinings. Since the adjoint operation is an involution and (ρ*)^c = (ρ^c)*, both ρ^c and (ρ^c)* are graph joinings. Thus if ρ is an isomorphism, so is ρ^c.
Reversing this line of reasoning shows that if ρ c is a graph joining then ρ is.

F preserves composition
To finish the proof that F is a functor we must show that F preserves composition. The argument splits into four natural cases: composing synchronous joinings, composing a synchronous joining with an anti-synchronous joining on either side, and composing two anti-synchronous joinings. We will carefully work out the case of compositions of synchronous joinings, and discuss the appropriate modifications in the cases involving at least one anti-synchronous joining after Lemma 108.
The cases differ only in that the shifts involved in the generic sequences have different forms. For ergodic synchronous joinings generic sequences can be taken to be of the form ⟨(u_n, v_n, 0) : n ∈ N⟩, whereas for anti-synchronous joinings of K^c and rev(L^c) a natural generic sequence is of the form ⟨(u^c_n, rev(v^c_n), A_n) : n ∈ N⟩.

Preparatory Remarks. In the characterization of the relatively independent joining ρ of ρ_1 and ρ_2 given in Lemma 28 and Proposition 29, the partitions A_k, A′_k and Ã_k are given by ⟨u_k⟩_{s_1}, ⟨v_k⟩_{s_2} and ⟨w_k⟩_{s_3} for s_1, s_2, s_3 ∈ Z. Formally the partitions A_k × A′_k, A′_k × Ã_k, A_k × Ã_k and A_k × A′_k × Ã_k consist of all possible products of these basic open sets. However, in the situation we are considering we have synchronous and anti-synchronous joinings. For synchronous joinings we can build a generating family for the relatively independent joining ρ of ρ_1 and ρ_2 by considering products of pairs of basic open intervals in the same locations, e.g. pairs of the form ⟨u_k⟩_s × ⟨w_k⟩_s. As a consequence, for verifying the hypotheses of Proposition 29 we can restrict our attention to the case where s* = 0.
In the case of anti-synchronous joinings we need to distinguish the odometer based systems from the circular systems. For anti-synchronous joinings of odometer based systems K with M^{-1} we can consider only intervals of the form ⟨u_k⟩_s × ⟨rev(w_k)⟩_{s+s*} where s* = 0. For anti-synchronous joinings of the circular systems K^c with M^c, asymptotically the empirical distributions concentrate on words of the form ⟨u^c_k⟩ × ⟨rev(w^c_k)⟩_{A_k} (where A_k is the amount of shift at scale k). Moreover, translations of sets of this form generate the measure algebra of the anti-synchronous joining.
Thus in the proof of the next lemma, to verify the hypothesis 3 of Proposition 29 we can take s * = 0 or s * = A k depending on whether ρ 1 • ρ 2 is synchronous or anti-synchronous.
Fix odometer based systems K, L and M with construction sequences U n : n ∈ N , V n : n ∈ N and W n : n ∈ N respectively. Let ρ 1 and ρ 2 be synchronous graph joinings of K and L, and L and M respectively and ρ their relatively independent joining over L.
Since ρ_1 and ρ_2 are graph joinings, so is their composition. Thus the relatively independent joining is ergodic. Hence by Lemma 32 we can find generic sequences for ρ_1, ρ_2 and ρ that satisfy the hypotheses of Proposition 29.
Lemma 108. Let ⟨(u_n, v_n, w_n, 0, 0) : n ∈ N⟩ be generic for ρ. Then the sequence ⟨(u^c_n, v^c_n, w^c_n, 0, 0) : n ∈ N⟩ is generic for the relatively independent joining ρ^c of ρ^c_1 with ρ^c_2.

Assuming the lemma, we show that F preserves compositions. Corollary 31 shows that ⟨(u_n, w_n, 0) : n ∈ N⟩ is generic for ρ_1 • ρ_2. From the way that F is constructed, if ν^c = F(ρ_1 • ρ_2), then ⟨(u^c_n, w^c_n) : n ∈ N⟩ is generic for ν^c (viewed as a measure on a circular system). From Lemma 108 and Corollary 31, we know that ⟨(u^c_n, w^c_n, 0) : n ∈ N⟩ is generic for ρ^c_1 • ρ^c_2; since a sequence is generic for a unique measure, ν^c = ρ^c_1 • ρ^c_2, i.e. F(ρ_1 • ρ_2) = F(ρ_1) • F(ρ_2). It remains to prove Lemma 108.
We claim that ⟨(u^c_n, v^c_n, w^c_n, 0, 0) : n ∈ N⟩ satisfies the hypotheses of Proposition 29 for the joinings ρ^c_1 and ρ^c_2. The first two hypotheses follow immediately: ρ^c_1 and ρ^c_2 are constructed by taking the generic sequences ⟨(u^c_n, v^c_n, 0) : n ∈ N⟩ and ⟨(v^c_n, w^c_n, 0) : n ∈ N⟩ determined by ⟨(u_n, v_n, 0) : n ∈ N⟩ and ⟨(v_n, w_n, 0) : n ∈ N⟩ respectively, and the measures do not depend on the precise generic sequence taken. Hypothesis 3 remains to be shown.
We are given ε > 0, k and s*, and need to find k′, G^c_{k′} and the I_{v^c}'s so that inequalities 3a and 3b hold. Since ρ^c_1 and ρ^c_2 are synchronous, so is the relatively independent joining. By the preparatory remarks we can take s*, the relative location of words in K and M, to be 0. Since the sequence of (u_n, v_n, w_n, 0, 0)'s is generic for the relatively independent product of ρ_1 and ρ_2, we can find k′, N, G_{k′} ⊆ V_{k′} and for each v ∈ G_{k′} a set I_v ⊆ [0, K_{k′}) such that the conditions in hypothesis 3 hold in the odometer context. Choose k′ so large that the density d_b of the boundary portions of circular k′-words is less than ε · 10^{-6}, so that for each v ∈ G_{k′} there is an I_v with |I_v| ≥ (1 − ε)K_{k′}, and so that each s ∈ I_v has a genetic marker j_s in v. We let I_{v^c} = {s^c : s^c has the same genetic marker in v^c as some s ∈ I_v does in v}. Equation 26 implies that |I_{v^c}| > (1 − ε)q_{k′}. Equation 27 implies that for v ∈ G_{k′} and all large n the corresponding empirical distributions agree, from which hypothesis 3a follows immediately. Fix a v^c_0 ∈ G^c_{k′} and an s^c ∈ I_{v^c_0}. Let v_0 ∈ G_{k′} correspond to v^c_0, and s ∈ I_{v_0} correspond to s^c. Let (u^c, w^c) ∈ U^c_{k′} × W^c_{k′}. To see hypothesis 3b, we need to compute the empirical distributions of (u^c, w^c), u^c and w^c conditioned on v^c_0. Let A^c be the collection of circular k′-words ((u′)^c, (v_0)^c, (w′)^c) in which u^c and w^c occur at position s^c.
As in the definition of F in Section 7.1.1, we can view the relatively independent joining ρ on K ×_L M as concentrating on a single odometer system (K, L, M)^× and ρ^c, the relatively independent joining of ρ^c_1 and ρ^c_2, as concentrating on ((K, L, M)^×)^c, which is canonically isomorphic to K^c ×_{L^c} M^c. In the odometer system (K, L, M)^×, consider the set A consisting of those k′-words (u′, v_0, w′) such that u′ and w′ have u and w in position s. Then A corresponds to A^c, and

EmpDist(u_n, v_n, w_n)(B) = EmpDist(u^c_n, v^c_n, w^c_n)(B^c)

for any such set B and its circular counterpart B^c.
Finally, noting that EmpDist_{k,k′,s,s}(u_n, v_n, w_n | v_0)(u, w) = EmpDist_{k′}(u_n, v_n, w_n)(A) and using equations 60 and 61, we see that

EmpDist_{k,k′,s^c,s^c}(u^c_n, v^c_n, w^c_n | v^c_0)(u^c, w^c) = EmpDist_{k,k′,s,s}(u_n, v_n, w_n | v_0)(u, w).    (64)
Arguing in the same manner we obtain the analogous identities, equations 65 and 66. Since for large n,

|EmpDist_{k,k′,s,s}(u_n, v_n, w_n | v_0) − EmpDist_{k,s}(u_n, v_n | v_0) × EmpDist_{k,s}(v_n, w_n | v_0)| < ε,

from equations 64, 65 and 66 we get the desired conclusion that the corresponding difference for the circular words is less than ε.
Lemma 108 holds where one or both of the joinings ρ_1 and ρ_2 are anti-synchronous as well; however, the shift coefficients for the circular systems are no longer all 0 but belong to {0, ±A_n}, depending on which joinings are anti-synchronous. Similarly s* ∈ {0, ±A_k}. The argument follows the same path until it reaches equation 61. This equation relies, in turn, on equation 27. The analogue of equation 27 for anti-synchronous joinings is equation 48, which in turn carries over to the relatively independent product. The upshot is that equations 64, 65 and 66 hold after applying the appropriate shifts of v^c_n and w^c_n relative to u^c_n.
This finishes the proof of Theorem 105.

Weakly-Mixing and Compact Extensions
We now show that F preserves weakly-mixing and compact extensions. The fact that compact extensions are preserved is due to E. Glasner and we reproduce the proof here with his kind permission.
Proposition 109. Let (K, µ) and (L, ν) be ergodic and suppose that ρ and ρ^c are corresponding synchronous joinings determining factor maps π : K → L and π^c : K^c → L^c. Then K is a weakly mixing extension of L (via π) if and only if K^c is a weakly mixing extension of L^c (via π^c).
Recall that if π : X → Y is a factor map from (X, B, µ, T ) to (Y, C, ν, S), then the extension is weakly-mixing if the relatively independent joining X × Y X of X with itself over Y is ergodic relative to Y . In case Y is ergodic, this simply means that the relatively independent joining is ergodic.
Suppose that K and L are odometer based systems with construction sequences ⟨W_n : n ∈ N⟩ and ⟨V_n : n ∈ N⟩ respectively. If ρ is a synchronous factor joining of K over L and the extension is weakly mixing, then we can find an ergodic sequence of words ⟨(u_n, v_n, w_n) ∈ W_n × V_n × W_n : n ∈ N⟩ that is generic for the relatively independent joining of ρ with itself over L, i.e. ρ ×_L ρ. This sequence will satisfy the hypotheses of Proposition 29. It follows that the sequence of (u^c_n, v^c_n, w^c_n)'s is also generic for an ergodic measure ν. As we argued in Lemma 108, the (u^c_n, v^c_n, w^c_n)'s also satisfy the hypotheses of Proposition 29. It follows that ν is the relatively independent joining ρ^c ×_{L^c} ρ^c. Since ν is ergodic, the extension determined by ρ^c is weakly mixing.
If, on the other hand, the sequence of (u_n, v_n, w_n)'s is not ergodic, then the sequence of (u^c_n, v^c_n, w^c_n)'s is also not ergodic. Hence if ρ^c determines a weakly mixing extension, then so does ρ.
It is immediate from the Furstenberg-Zimmer structure theorem ( [11], Chapter 10, Proposition 10.14) that X is a relatively distal extension of Y if and only if there is no intermediate extension Z of Y , with X being a nontrivial weakly-mixing extension of Z. Thus F takes measure-distal extensions to measure-distal extensions.
What requires more effort to establish is the following:

Theorem 110 (Glasner). The functor F takes compact extensions to compact extensions.
Glasner's proof uses a result proved in the forthcoming [8]: if (K, µ) is an ergodic odometer based system and X is a compact group extension of (K, µ), then there is a representation of X as an odometer based system with the same coefficients.
Since X is a compact extension of Y if and only if X is a factor of a compact group extension of Y , 25 it suffices to show that F takes compact group extensions to compact group extensions.
To prove that F takes compact group extensions to compact group extensions we use a remarkable theorem of Veech that characterizes group extensions π : X → Y of ergodic systems. The criterion is that every ergodic joining ρ of X with itself that is the identity on Y (i.e. ρ, as a measure, concentrates on those pairs (x_1, x_2) such that π(x_1) = π(x_2)) is a graph joining coming from an isomorphism of (X, B, µ, T) that projects to the identity map on Y. 26 Explicitly, Theorem 6.18 on page 136 of [11] shows that if, in the ergodic decomposition of the relatively independent product X ×_Y X, only graph joinings appear, then X is a compact group extension. The converse follows from Proposition 6.15, part 2 in [11]: if X is a compact group extension of Y, then every ergodic self-joining of X over Y which is the identity on Y is a graph joining.
The map F takes ergodic joinings to ergodic joinings, all graph joinings to graph joinings, and the identity joining to the identity joining. Thus we see it preserves compact group extensions.
Furstenberg [9] and Zimmer [22] independently showed that for every ergodic system X there is an ordinal α and a tower of extensions ⟨X_β : β ≤ α⟩ such that X_0 is the trivial system, X_α = X and for all β < α, X_{β+1} is a compact extension of X_β, unless α = β + 1, in which case X_α is either a compact or a weakly mixing extension of X_β. If there is no weakly mixing extension at the end of the tower, then X is measure-distal and ⟨X_β : β < α⟩ is a distal tower approximating X. The least ordinal α such that X can be represented this way is the distal height or distal order of X.
Let (K, µ) be an odometer based system and consider the odometer factor O. Let (K′, µ′) be the Kronecker factor of (K, µ). Then we have the tower of factor maps

K →^{π_1} K′ →^{π_2} O,

25 See [10] for an explicit statement and proof. 26 This first appears in [18].
where π_2 may or may not be a trivial factor map. This tower is carried by F to

K^c → (K′)^c → R_α.

If K′ is a non-trivial extension of O, then Glasner's result tells us that (K′)^c is a compact extension of R_α, but it is silent on the issue of whether (K′)^c has discrete spectrum; i.e. we do not know whether F takes the Kronecker factor of K to the Kronecker factor of K^c.
Suppose now that K is given by a finite tower of factors K = K_N → K_{N−1} → · · · → K_0, where K_0 is the Kronecker factor of K and for all i, K_{i+1} is the maximal compact extension of K_i in K. Then K is distal of height N. The map F carries this to a tower of compact extensions of circular systems. From this we see that the distal height of K^c is either N or 1 + N. We do not know an example where the height of K^c is 1 + N. However the ordinary skew product construction applied to odometers gives examples of distal height N where O is the Kronecker factor. Hence from our analysis we see that there are ergodic circular systems with distal height N for all finite N.
In [2], Beleznay and Foreman proved that for all countable ordinals α there is an ergodic measure preserving transformation T of distal height α. In that construction there are no eigenvalues of the operator U_T of finite order. Hence if we let O be an odometer with coefficient sequence ⟨k_n : n ∈ N⟩ going to infinity, T × O is an ergodic transformation with distal height α and zero entropy. In the forthcoming [8] we see that this implies that T × O can be presented as an odometer based transformation. By the analysis we just gave, (T × O)^c is a circular system with distal height 1 + α. In [8] we see that (T × O)^c can be realized as a smooth transformation. For infinite α, 1 + α = α; hence we have:

Theorem 111. Let N be a finite or countable ordinal. Then there is an ergodic measure distal diffeomorphism of T^2 of distal height N.
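The last step uses the ordinal identity 1 + α = α for every infinite α, which one can check directly in the first infinite case:

```latex
1 + \omega \;=\; \sup_{n<\omega}\,(1+n) \;=\; \omega,
\qquad\text{while }\ \omega + 1 \neq \omega .
```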

Continuity
Fix a measure space (X, µ). As noted in Section 2.3, we can identify symbolic shifts built from construction sequences with cut-and-stack constructions (whose levels generate X). By fixing a countable generating set in advance, we can make this association canonical. The levels in the cut-and-stack construction give the relationship with arbitrary partitions of X. In this way the usual weak topology on measure preserving transformations of X described in Section 2.1 determines a topology on the presentations of symbolic shifts as limits of construction sequences.
The finitary nature of the maps ⟨c_n : n ∈ N⟩ that give bijections between words in W_n and words in W^c_n easily shows that F is a continuous map from the presentations of odometer based systems to the presentations of circular systems. Thus we have:

Corollary 112. The functor F is a homeomorphism from the objects of OB to the objects of CB.
For the purposes of the complexity of the isomorphism relation we note:

Corollary 113. The map F is a continuous reduction of the conjugacy relation on odometer based systems to the conjugacy relation on circular systems.

Extending the main result
In the main result we restricted the morphisms to graph joinings, largely because compositions of graph joinings are ergodic joinings. Unfortunately a composition of ergodic joinings is not necessarily ergodic, and non-ergodic joinings also arise naturally as relatively independent joinings of ergodic joinings. In this section we indicate how to extend our results to the broader categories that include non-ergodic joinings as morphisms. For convenience, we will continue to require that our objects are ergodic measure preserving systems.
Let OB + and CB + be the categories that have the same objects as OB and CB, but where the collections of morphisms are expanded to include all synchronous and anti-synchronous joinings (rather than just graph joinings).
In Section 7.1.1, the definition of F included all such joinings (F(ρ) for a non-ergodic ρ was defined via an ergodic decomposition). Thus without modification we can view F as a map
$$F : OB^+ \to CB^+.$$
To show that F is a morphism between these categories, i.e. that it preserves composition for arbitrary morphisms, we develop a more combinatorial approach to lifting morphisms that coincides with the original definition.
We start by generalizing the notion of a generic sequence of words to include non-ergodic measures. Suppose K is a symbolic system with a construction sequence ⟨W_n : n ∈ N⟩. Let µ be a shift invariant measure which we assume is supported on the set S ⊆ K (where S is given in Definition 10). The ergodic decomposition theorem gives a representation of µ as ∫ µ_p dλ(p), where each µ_p is a shift invariant ergodic measure and λ is a probability measure on a set P parameterizing the ergodic components. For each p there is a generic sequence of words ⟨w^p_n : n ∈ N⟩ for the measure µ_p. The main observation is that the set of probability measures on words of a fixed length is compact. Thus for any fixed k and ε > 0, we can find a finite set P_k ⊆ P of parameters so that for all p there is some p' ∈ P_k with
$$\| \bar{µ}^p_k - \bar{µ}^{p'}_k \| < ε. \tag{67}$$
(The notions of EmpDist and \bar{µ}_k are given at the beginning of Section 2.6.)
This gives a partition of the parameter space into sets {E_{p'} : p' ∈ P_k} such that inequality (67) holds for all p ∈ E_{p'}. Now let n be sufficiently large that for each p' ∈ P_k we can find an element w^{p'}_n ∈ W_n with
$$\| \mathrm{EmpDist}_k(w^{p'}_n) - \bar{µ}^{p'}_k \| < ε. \tag{68}$$
If we denote λ(E_{p'}) by α(p'), then α(p') ≥ 0 and Σ_{p' ∈ P_k} α(p') = 1. It is clear that one can obtain \bar{µ}_k up to a small error from the finite data {(w^{p'}_n, α(p')) : p' ∈ P_k}, which is a weighted finite collection of words.
For the symbolic systems that we are interested in, such as the circular systems, the measure of the spacers is independent of the invariant measure µ (see Section 5.1). This means that for all n and p, the sum Σ_{w' ∈ W_n} µ_p(⟨w'⟩) is the same. In this context, using inequality (68) we can arrange the inequality
$$\Big\| \sum_{p' \in P_k} α(p')\, \mathrm{EmpDist}_k(w^{p'}_n) - \bar{µ}_k \Big\| < 2ε.$$
The measure λ is defined on the extreme points of the simplex of shift invariant probability measures, and if we choose the finite sets P_k to consist of points that lie in the closed support of λ, then we can easily ensure that when we go from (k, ε) to a (k', ε') with k' > k and ε' < ε we have P_{k'} ⊇ P_k. Taking a sequence k → ∞ and ε_k → 0 with Σ ε_k < ∞, we get a set {ν_1, ν_2, . . . } of ergodic measures and finite sets I_k ⊆ I_{k+1} of integers with probability measures α_k on I_k such that Σ_{i ∈ I_k} α_k(i)ν_i converges to µ in the weak* topology.
Definition 114. Let ⟨n_k⟩ go monotonically to infinity and let ⟨{(w^i_{n_k}, α_k(i)) : i ∈ I_k}⟩_k be a weighted sequence of words as above. Suppose that for each k and i ∈ I_k,
$$\| \mathrm{EmpDist}_k(w^i_{n_k}) - \bar{ν}_{i,k} \| < ε_k;$$
then we call ⟨{(w^i_{n_k}, α_k(i))}⟩ a generic sequence for µ.
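Definition 114 reduces a (possibly non-ergodic) µ to finite data: weighted words whose empirical distributions average out to \bar{µ}_k. The following minimal Python sketch illustrates that bookkeeping under our own naming conventions; `emp_dist` stands in for EmpDist_k, the weighted pairs model the data (w^i_{n_k}, α_k(i)), and the toy words are chosen only to represent two distinct "ergodic components".

```python
from collections import Counter

def emp_dist(word, k):
    """Empirical distribution of the k-subwords of `word`:
    the frequency with which each length-k block occurs."""
    total = len(word) - k + 1
    counts = Counter(word[i:i + k] for i in range(total))
    return {w: c / total for w, c in counts.items()}

def weighted_emp_dist(weighted_words, k):
    """Weighted average of empirical distributions, modeling the
    finite data {(w_i, alpha(i))} approximating a non-ergodic mu."""
    dist = Counter()
    for word, alpha in weighted_words:
        for w, p in emp_dist(word, k).items():
            dist[w] += alpha * p
    return dict(dist)

# Two toy "ergodic components": each word is generic for a distinct measure.
nu1_word = "01" * 50   # 2-block frequencies concentrated on "01" and "10"
nu2_word = "0" * 100   # 2-block frequencies concentrated on "00"
mixture = weighted_emp_dist([(nu1_word, 0.5), (nu2_word, 0.5)], 2)
```

The dictionary `mixture` is the finite approximation to the 2-block distribution of the mixture measure, exactly as the weighted collection {(w^{p'}_n, α(p'))} approximates \bar{µ}_k.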
We note that for a fixed i, as k varies, ⟨w^i_{n_k}⟩ is a generic sequence for ν_i, which is one of the ergodic measures in the support of λ.
In a manner exactly analogous to the analysis in Section 2.6, Definition 114 can be extended to products of symbolic systems, allowing for shifting of words in construction sequences.
Restricting our objects to ergodic systems (X, B, µ, T), (Y, C, ν, S) and (Z, D, µ̃, T̃) allows us to deal with the non-ergodic analogue of the material discussed between Definition 25 and Lemma 32 in a relatively straightforward way, which we now discuss.
For the analogue of Proposition 29 in the non-ergodic case, let us make the following observation. Fix a non-ergodic joining ρ of X and Y with ergodic decomposition ρ = ∫ ρ_p dλ(p), where, by the ergodicity of X and Y, each ρ_p is also a joining of X with Y. Fix k, an ε > 0 and a cylinder set determined by a word u ∈ W^X_k at a location s*, and let φ be its indicator function. For k' large, by the Martingale convergence theorem, there is a subset G of Y of measure close to one such that when we take the conditional expectation of φ with respect to the partition induced by the principal k'-words of y ∈ G and compare it to E(φ|D), the error is small.
The element of that partition that contains y is given by a word v_y ∈ W^Y_{k'} and a location parameter s_y, and the conditional expectation is approximately the empirical frequency
$$\mathrm{EmpDist}_{k, s^*}(u \mid v_y) \tag{69}$$
of occurrences of u at relative location s* inside sh^{s_y}(v_y). This easily gives a set G_{k'} ⊆ W_{k'} with \bar{ν}_{k'}(G_{k'}) > 1 - ε and, for each v ∈ G_{k'}, a set J_v ⊆ [0, q_{k'}) such that for v ∈ G_{k'} and j ∈ J_v, formula (69) gives a good approximation to ρ_y(sh^{s*}(⟨u⟩)) for most of the y ∈ sh^{s_y}(⟨v_y⟩). If we have a generic sequence of weighted words for ρ, then we can use it to calculate the expression in (69). This observation makes it possible for us to formulate Proposition 29 for non-ergodic joinings.
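The expression in (69) is, in effect, a relative frequency readable off from words. As an illustration, the following sketch (a toy model with hypothetical helper names, not the paper's machinery) computes, for strings x and y, the fraction of occurrences of v in y at whose locations u occurs at offset s* in x:

```python
def occurrences(s, u):
    """All locations l at which the word u occurs in the string s."""
    return [l for l in range(len(s) - len(u) + 1) if s[l:l + len(u)] == u]

def cond_freq(x, y, u, v, s_star):
    """Toy analogue of formula (69): among the occurrences of v in y,
    the fraction of locations l such that u occurs at l + s_star in x."""
    locs = occurrences(y, v)
    if not locs:
        return 0.0
    hits = sum(1 for l in locs
               if 0 <= l + s_star
               and x[l + s_star:l + s_star + len(u)] == u)
    return hits / len(locs)
```

For periodic strings this recovers the expected conditional frequencies; in the paper the same count is taken over the weighted words of a generic sequence rather than over a single pair of strings.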
We are given ergodic systems X, Y, Z together with construction sequences ⟨U_n⟩, ⟨V_n⟩, ⟨W_n : n ∈ N⟩ such that for each n, the words in U_n, V_n and W_n have the same length. Two joinings, ρ_1 of X and Y and ρ_2 of Y and Z, are given. The analogue of Proposition 29 is now:

Proposition 115. Let
$$\big\langle \{(u^i_{n_k}, v^i_{n_k}, w^i_{n_k}, s^i_{n_k}, t^i_{n_k}) : i \in I_k\},\ α_k \in \mathrm{Prob}(I_k) : k \in \mathbb{N} \big\rangle \tag{70}$$
be a sequence of weighted words and Σ ε_k < ∞. Suppose that the following hypotheses are satisfied:

1. ⟨{(u^i_{n_k}, v^i_{n_k}, s^i_{n_k}) : i ∈ I_k}, α_k⟩_k is generic for ρ_1,
2. ⟨{(v^i_{n_k}, w^i_{n_k}, t^i_{n_k}) : i ∈ I_k}, α_k⟩_k is generic for ρ_2,
3. For all ε, k_0, s* there are k', N and a set G_{k'} ⊆ W^Y_{k'}, and for each v ∈ G_{k'} there is a set J_v ⊆ [0, q_{k'}), such that
(c) for all v ∈ G_{k'} and s ∈ J_v, if n_k > N, then
$$\Big\| \sum_{i \in I_k} \mathrm{EmpDist}_{k_0,k_0,s,s+s^*}\big(u^i_{n_k}, sh^{s^i_{n_k}}(v^i_{n_k}), sh^{t^i_{n_k}}(w^i_{n_k}) \mid v\big)\, α_k(i)\ -\ \sum_{i \in I_k} \mathrm{EmpDist}_{k_0,s}\big(u^i_{n_k}, sh^{s^i_{n_k}}(v^i_{n_k}) \mid v\big)\, α_k(i) \,*\, \sum_{i \in I_k} \mathrm{EmpDist}_{k_0,s+s^*}\big(v^i_{n_k}, sh^{t^i_{n_k}-s^i_{n_k}}(w^i_{n_k}) \mid v\big)\, α_k(i) \Big\| < ε.$$

Then the weighted sequence given in (70) is generic for the relatively independent joining X ×_Y Z.
The analogues of Corollary 31 and Lemma 32 are easily verified, giving us a characterization of compositions of non-ergodic joinings and the existence of generic sequences satisfying the hypotheses of Proposition 115.
Verifying that F preserves composition is now straightforward in the manner of Section 7.1.2: the G^c_{k'} and J_{v^c} are constructed in exactly the same way. Checking the conditional distributions of short words relative to longer words (k vs. k') involves counting k'-words, and these are counted using Equation 27 for each component (u_k, v_k, w_k) separately. The weighted average is then preserved.

Lagniappe
In this section we explore the interplay of the geometric, arithmetic and combinatorial aspects of the manner in which F wraps the odometer based words around the circle. The map F does not preserve the dynamics of the odometer when transforming it into a rotation; indeed it cannot. The shift sh^k of the odometer corresponds to a shift sh^{k^c} of the rotation. The relationship between k and k^c is characterized combinatorially as an optimal wrapping property. The latter is defined in terms of the notion of a perfect match. The results in this section can be used to give an alternate proof of the fact that if (K, µ) is ergodic then so is (K^c, µ^c), one that does not use the notion of a generic sequence of words.
Central to our understanding of circular systems is the manner in which an s^c has its n-words aligned with n-words in sh^k(t^c). A word u occurs in s^c lined up with a word w in sh^k(t^c) if and only if u occurs at some location l in s^c and w occurs at k + l in t^c.
Definition 116. Let x, y be strings in the language Σ ∪ {b, e} and let u, v be words of the same length. A k-match of u and v in x and y is a location l in the domain of x such that u occurs at l in x and v occurs at l + k in y.
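To make Definition 116 concrete, here is a small sketch (a hypothetical helper of our own, not from the paper) that lists all k-matches of u and v in strings x and y:

```python
def k_matches(x, y, u, v, k):
    """All k-matches of u and v in x and y (Definition 116):
    locations l such that u occurs at l in x and v occurs at l + k in y."""
    return [l for l in range(len(x) - len(u) + 1)
            if x[l:l + len(u)] == u
            and 0 <= l + k
            and y[l + k:l + k + len(v)] == v]
```

For example, with x = "abcabc", y = "xxabcabc", u = v = "abc" and k = 2, both occurrences of u in x are 2-matched with copies of v in y, at locations 0 and 3.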
If w^c_0, w^c_1 are circular m-words, then a perfect match of u^c, v^c in w^c_0, w^c_1 is a k such that there are (n, m)-genetic markers j_u, j_v such that u^c occurs in w^c_0 and v^c occurs in w^c_1 with genetic markers j_u and j_v respectively, and k is a match between all occurrences of u^c and v^c with these genetic markers.
Thus k is a perfect match of u and v if and only if the occurrences of j_u in w^c_0 are exactly aligned with the occurrences of j_v in w^c_1. We will say that k is a match between u and v if there is a location l such that k is a match between u and v at l, and that every k-match is perfect when k has the property that for every occurrence of a pair of words u^c, v^c in w^c_0, w^c_1, if k is a match between u^c and v^c then k is a perfect match between u^c and v^c. The astute reader will have already recognized that being a match or a perfect match refers only to the genetic markers and the underlying circular factor; thus the actual identities of u^c, v^c, w^c_0 and w^c_1 are not material, only the locations of the genetic markers.
The notion of a perfect match is vacuous for odometer words; for if u, v are odometer n-words and w_0, w_1 are odometer m-words, then u, v are the unique pair with genetic markers j_u and j_v. Moreover, if k matches any pair of n-subwords, then k matches every pair of corresponding n-subwords in the overlap of w_0 and sh^k(w_1).
If k > 0, then the (n + 1)-subwords of w_1 in the overlap of w_0 and sh^k(w_1) are split into two pieces by the (n + 1)-subwords of w_0; the left portion of each of the (n + 1)-subwords of w_0 in the overlap coincides with the right portion of the corresponding (n + 1)-subword of w_1. Call the matches in the left portion of w_0 left-matches.
Discussion. Let u^c have genetic marker j_{u^c} = (j_n, j_{n+1}, . . . , j_{m-1}) in w^c_0 and suppose that u^c sits inside the (n + 1)-word (u')^c with genetic marker (j_{n+1}, . . . , j_{m-1}). Then words with genetic marker j_u sit inside every 2-subsection of u'. It follows that if k^c > 0 and k^c is a perfect match of u^c with v^c having genetic marker j_{v^c} = (j'_n, j_{n+1}, . . . , j_{m-1}) in w^c_1, then j_n ≤ j'_n. Thus the relative position of v^c in the (n + 1)-subword of w^c_1 with genetic marker (j_{n+1}, . . . , j_{m-1}) is to the right of the position of u^c in (u')^c; i.e. the relative shift is to the left to match u^c with v^c. For this reason, when k^c > 0 we need only consider left shifts.
It is also easy to see that the perfect matches between n-words with genetic markers j and j' inside m-words w^c_0, w^c_1 are those k^c that match the first occurrence of an n-word with genetic marker j in w^c_0 with the first occurrence of an n-word with genetic marker j' in w^c_1. The next lemma says that perfect matches can be viewed as the locations of shifts of odometer based words wrapped around the circle.
Lemma 117. Suppose that w_0, w_1 ∈ W_m and w^c_i = c_m(w_i). Let n < m and 0 ≤ k^c < q_m, and suppose that k^c is a perfect match between some pair of n-subwords of w^c_0 and w^c_1. Then there is a unique k such that for all genetic markers j, j':
• k^c is a perfect match between the n-subwords of the w^c_i with genetic markers j and j' iff
• k is a left match between the n-subwords of the w_i with genetic markers j and j'.
The Lemma has an obvious analogue for negative k c and right matches.
Proof. Suppose that k^c is a perfect match between j and j'. Call the subwords of w_0, w_1 with genetic markers j and j' u and v. Then u^c, v^c are perfectly matched by k^c. Let k be the distance between the locations of u and v. Since k^c ≥ 0 we have k ≥ 0. From our discussion we see that k is a left match of u, v. We claim that this k satisfies the lemma.
Let u', v' be the (n + 1)-subwords of w_0, w_1 inside which u, v occur. Suppose that u' = u_0 u_1 · · · u_{k_n-1} and v' = v_0 v_1 · · · v_{k_n-1}, so (u')^c = C((u_0)^c, . . . , (u_{k_n-1})^c) and (v')^c = C((v_0)^c, . . . , (v_{k_n-1})^c). If u_i, v_j are left matched by k in u', v', then the first occurrences of (u_i)^c and (v_j)^c are matched by k^c; hence inside (u')^c, (v')^c, k^c is a perfect match of (u_i)^c and (v_j)^c.
The relative position of (u')^c and (v')^c is duplicated over all (n + 1)-words with genetic markers j_u, j_v in w_0 and sh^k(w_1). It follows that k^c is a perfect match of u^c and v^c inside w^c_0, w^c_1. From the uniformity of the relative positions of (n + 1)-words it also follows that this holds for any two n-subwords of (n + 1)-subwords (u*)^c, (v*)^c in positions i, j.

Without loss of generality k ≥ 0 (otherwise we reverse the roles of u and v). Words in W^c_m with m > M(k) start with a block of b's of length at least q_{M(k)}. Hence if k matches n-words u, v inside w ∈ W^c_m, they both must occur in some M(k)-subword of w.
To see item 1, we need to show how to improve k to a k' that is a perfect match. Changing k will involve sacrificing some of the matches of pairs in I, but this will be compensated by the additional multiplicity of the remaining matches.
Because q_n does not divide p_n, given a 2-subsection s of w^j_0 there is a unique 2-subsection t of w^j_1 within which k can match n-words. Moreover, this does not depend on j, but rather on the underlying locations of the words.
We start by lining up blocks of the form u_i^{l_n-1} with blocks of the form v_i^{l_n-1}. To do this we classify the k-matches of a pair (u, v) = (u_i, v_i) into left block matches and right block matches, according to how u and sh^k(v) align. [Alignment pictures omitted: in each, the second row is a portion of sh^k(w^j_1) and B represents a boundary section; the pictures are independent of j.]
Note that by taking k' to be k + l q_n for some l < l_n - 1 we can turn all left block matches of all of the (u_i, v_i) into matches of entire u_i^{l_n-1} with sh^{k'}(v_i^{l_n-1}), but doing so completely destroys some of the right block matches. Similarly, we can shift to make all right block matches into matches of u_i^{l_n-1} with sh^{k'}(v_i^{l_n-1}) by destroying left block matches. If we examine a particular left block match of a pair (u_i, v_i) in some w^j_0 and a right block match of another pair (u_{i'}, v_{i'}) in w^j_1, and we change k to k' to make u_i^{l_n-1} match with sh^{k'}(v_i^{l_n-1}), then the sum of k'-matches between (u_i, v_i) and (u_{i'}, v_{i'}) goes up by one: we lose the right block matches but we gain left block matches, and we gain one more match from the boundary section.
Suppose that
$$\sum_j α_j\, |\{\text{left block matches in } (w^j_0, w^j_1)\}| \ \geq\ \sum_j α_j\, |\{\text{right block matches in } (w^j_0, w^j_1)\}|.$$
Then from the previous paragraph, if we take k' = k + l q_n for some l < l_n - 1, then we can make all left block matches have multiplicity l_n - 1 (while removing right block matches) and have
$$\sum_j \sum_i α_j\, |\{k'\text{-matches of a } u_i \text{ with a } v_i \text{ in some } (w^j_0, w^j_1)\}| \ \geq\ \sum_j \sum_i α_j\, |\{k\text{-matches of a } u_i \text{ with a } v_i \text{ in some } (w^j_0, w^j_1)\}|.$$
If, on the other hand, the weighted sum of the right block matches is greater than the weighted sum of the left block matches, we shift in the other direction to fix all right block matches and destroy all left block matches. Thus we can assume that we have a k' such that for all (u_i, v_i), sh^{k'} matches (l_n - 1)-powers of u_i with (l_n - 1)-powers of v_i. This k' would be a perfect match except that it matches n-words across 2-subsections. Writing each w^j_s = C_n(w_1, . . . , w_{k_n}), sh^{k'} matches blocks of the form w_s^{l_n-1} in one 1-subsection of w^j_0 with a block of the form w_{s'}^{l_n-1} in a (potentially) different 1-subsection of w^j_1. Moreover, s - s' is constant on all of these matches, since the differences between the starts of w^{l_n-1}-blocks are of length l_n q_n. Fix such a pair s, s'. By changing k' so that it lines up w_s^{l_n-1} with w_{s'}^{l_n-1} in the first 1-subsection, we create a perfect match of n = (m - 1)-words and increase the total number of matches of the form (u_i, v_i). This establishes the case d = 1.
We now do the induction step. Let d = m − n and assume the result holds for d − 1. Suppose that we are given {α j : j ∈ J}.
We can decompose a k-match between n-subwords of w^j_0 and w^j_1 as k_1 + k*, where k* ∈ [-q_{m-1} + 1, q_{m-1} - 1] and k_1 is a match of (m - 1)-subwords of w^j_0 and w^j_1. Here is a picture of a pair (u', v') ∈ W^c_{m-1} × W^c_{m-1}, comparing w_0 in the upper row with the k_1-shift of w_1 in the lower row. [Picture omitted.]
Here is a picture after the k = k_1 + k* shift of w_1. [Picture omitted.]
Let {(u', v')_i : i ∈ I'} be the collection of pairs (u', v') from W^c_{m-1} sitting inside a pair (w^j_0, w^j_1) that contain k-matches of words (u_i, v_i). Arguing as in the case d = 1, we can adjust k_1 to a k'_1 so that it is a perfect match of (m - 1)-words in I' and, summing over I' and J, the weighted sum of (k'_1 + k*)-matches of pairs in I does not decrease. This is how the (m - 1)-words look after shifting by k'_1 + k*. [Picture omitted.]
The offset of the copies of u' and v' is k*. Note that the boundary sections line up. We are now in the position of having shifted by k'_1 so that the powers of pairs {(u', v')_i : i ∈ I'} are lined up. The additional shift k* has absolute value less than q_{m-1}. Moreover, all of the words {(u', v')_i : i ∈ I'} are lined up the same way when shifted by k*. (We note that it is not enough to increase the weighted sum of the number of matches of pairs in I', because the various I'-matches may contain different numbers of I-matches. Nonetheless, arguing as in the case d = 1, one of the two possibilities for lining up the (m - 1)-subwords does not decrease the weighted sum of the number of (k'_1 + k*)-matches of I-words.)