FROM INVARIANCE TO SELF-SIMILARITY: THE WORK OF MICHAEL HOCHMAN ON FRACTAL DIMENSION AND ITS AFTERMATH

ABSTRACT. M. Hochman's work on the dimension of self-similar sets has given impetus to resolving other questions regarding fractal dimension. We describe Hochman's approach and its influence on the subsequent resolution by P. Shmerkin of the conjecture on the dimension of the intersection of $\times p$- and $\times q$-Cantor sets.

In the distant background of M. Hochman's work on self-similar sets [3] is a particular example of "multiparameter rigidity" appearing in [2]. The space in question is the 1-torus $\mathbb{T} = \mathbb{R}/\mathbb{Z}$, and the action is generated by two endomorphisms, $T_p$ and $T_q$, where for $n \in \mathbb{N}$, $T_n x = nx \bmod 1$, and $p, q$ are "multiplicatively independent", i.e., $p^k \neq q^l$ for all integers $k, l \geq 1$. (In our discussion, $p, q$ will always satisfy this condition.) The rigidity claim proved in [2] is that the only closed subsets of $\mathbb{T}$ invariant under both $T_p$ and $T_q$ are certain finite sets of rationals and $\mathbb{T}$ itself. This implies that for irrational $\alpha \in \mathbb{T}$, the orbit $\{p^m q^n \alpha : m, n \geq 0\}$ is dense in $\mathbb{T}$. This suggests that the individual orbit closures $\overline{\{p^n\alpha\}}$ and $\overline{\{q^n\alpha\}}$ cannot both be small. One is led to the following quantitative formulation in [1]:

CONJECTURE 1. For every irrational $\alpha \in \mathbb{T}$,
$$\dim \overline{\{p^n\alpha\}} + \dim \overline{\{q^n\alpha\}} \geq 1. \qquad (1)$$
Here dim refers to Hausdorff dimension.

An equivalent formulation, more in the spirit of the subsequent discussion, is:

CONJECTURE 2. If $A$ and $B$ are two closed subsets of $\mathbb{T}$ with $T_pA \subset A$, $T_qB \subset B$ such that $\dim A + \dim B < 1$, then the intersection $A \cap B$, if not empty, consists only of rationals.
The conclusion in this conjecture implies that under the given conditions A ∩ B is either empty or countable. One now knows:

THEOREM 1. If $A$ and $B$ are closed subsets of $\mathbb{T}$, invariant under $T_p$ and $T_q$ respectively, then
$$\dim(A \cap B) \leq \max(\dim A + \dim B - 1,\ 0). \qquad (2)$$
Here, the empty set is regarded as having negative dimension, so that the inequality in (2) is valid when $A \cap B = \emptyset$. If $A \cap B$ is countable, then its dimension is 0, so (2) is also verified. When $A = B$, (2) forces either $\dim A = 0$ or $\dim A = 1$, and for invariant sets the latter implies $A = \mathbb{T}$. This comes close to the statement in [2].
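The standing hypothesis that $p$ and $q$ are multiplicatively independent can be checked mechanically for given integers. The Python sketch below is our own illustration (the helper names are hypothetical): $p^k = q^l$ for some $k, l \geq 1$ exactly when $p$ and $q$ have the same prime divisors and proportional exponent vectors.

```python
def prime_exponents(n: int) -> dict:
    """Factor n >= 2 by trial division; return {prime: exponent}."""
    exps = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            exps[d] = exps.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:  # the leftover factor is prime
        exps[n] = exps.get(n, 0) + 1
    return exps


def multiplicatively_independent(p: int, q: int) -> bool:
    """True iff p**k != q**l for all integers k, l >= 1 (assumes p, q >= 2)."""
    ep, eq = prime_exponents(p), prime_exponents(q)
    if ep.keys() != eq.keys():
        return True  # some prime divides one number but not the other
    r0 = next(iter(ep))
    # p**k == q**l  <=>  k * ep[r] == l * eq[r] for every prime r,
    # which is solvable exactly when the exponent vectors are proportional.
    return any(ep[r] * eq[r0] != eq[r] * ep[r0] for r in ep)


print(multiplicatively_independent(2, 3))   # True
print(multiplicatively_independent(4, 8))   # False, since 4**3 == 8**2
```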
Theorem 1 was proved independently by P. Shmerkin ([4]) and M. Wu ([5]). Both in the proof of Theorem 1 and in the formulation of Conjecture 2, it suffices to assume that the sets $A$ and $B$ are "Cantor-like" sets. To be precise, we will call a subset $S \subset \mathbb{T}$ a $b$-Cantor set if for some subset $J \subset \{0, 1, \ldots, b-1\}$,
$$S = \Big\{x = \sum_{n=1}^{\infty} a_n b^{-n} : a_n \in J \text{ for all } n\Big\}.$$
Suppressing the base $b$, we call a set of this type a Cantor set. One can show that if $A \subset \mathbb{T}$ is a closed $T_p$-invariant set and $\varepsilon > 0$, there exists a $p$-Cantor set $A' \supset A$ with $\dim A' < \dim A + \varepsilon$. Using this, it follows that if (2) holds for Cantor sets, it holds for arbitrary invariant sets. Also, using the fact that the family of all Cantor sets is countable, it follows from Theorem 1 … We obtain the following result in the spirit of Conjecture 1: …

Cantor sets are a special case of invariant sets, but they are also examples of self-similar sets, which need not be invariant in the usual sense. It is in this more general setting that Shmerkin and Wu give proofs of Theorem 1, and Hochman's result in [3] is also established in this setting. We proceed to define self-similarity for subsets of the line. To define such a set, we suppose given a finite family $\Phi = \{\varphi_i\}_{i=1}^k$ of contracting similarity maps
$$\varphi_i(x) = \lambda_i(x + \alpha_i), \qquad 0 < \lambda_i < 1.$$
It is not hard to show that, given $\Phi$, there is a unique nonempty compact set $X \subset \mathbb{R}$ satisfying
$$X = \bigcup_{i=1}^k \varphi_i(X).$$
$X$ is called the attractor of the "iterated function system" determined by $\Phi$. (The expression "iterated function system", or IFS, derives from the fact that the points of $X$ are limits of sequences $\varphi_{i_1}\varphi_{i_2}\cdots\varphi_{i_n}(x_0)$, $n \to \infty$, for any fixed $x_0$.) When the $\lambda_i$ appearing in the definition of the $\varphi_i$ are all equal to $b^{-1}$ for an integer $b$, and the $\alpha_i$ range over a subset $J \subset \{0, 1, \ldots, b-1\}$, the corresponding attractor is exactly the $b$-Cantor set determined by $J$.
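To make $b$-Cantor sets and their IFS description concrete, here is a small Python sketch (function names are hypothetical). It lists the level-$n$ points $\sum_{j \le n} a_j b^{-j}$ and checks that they coincide with the images $\varphi_{a_1}\cdots\varphi_{a_n}(0)$ under the maps $\varphi_a(x) = (x + a)/b$, $a \in J$. Exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import product
import math


def cantor_points(b: int, J, n: int):
    """Sorted points sum_{j=1}^n a_j b^{-j} with all digits a_j in J."""
    return sorted(
        sum(Fraction(a, b ** (j + 1)) for j, a in enumerate(word))
        for word in product(J, repeat=n)
    )


def ifs_image(b: int, word, x0=Fraction(0)):
    """phi_{a_1} phi_{a_2} ... phi_{a_n}(x0) with phi_a(x) = (x + a) / b."""
    x = Fraction(x0)
    for a in reversed(word):  # the innermost map phi_{a_n} acts first
        x = (x + a) / b
    return x


b, J, n = 3, [0, 2], 3
pts = cantor_points(b, J, n)
assert pts == sorted(ifs_image(b, w) for w in product(J, repeat=n))
print(len(pts), math.log(len(J)) / math.log(b))  # 8 pieces; dimension log 2 / log 3
```

The second printed number is $\log|J|/\log b$, the Hausdorff dimension of the $b$-Cantor set determined by $J$.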
The theorem proved by P. Shmerkin and M. Wu which implies Theorem 1 is the following: …

As to the dimension of a self-similar set determined by the data $\{\lambda_i, \alpha_i : 1 \leq i \leq k\}$, there is no known algorithm covering all cases. An elementary result can be stated when the IFS satisfies a "separation condition".
Note that the condition in the theorem implies $s \leq 1$. Without this separation condition, the solution $s$ of (5), namely $\sum_{i=1}^k \lambda_i^s = 1$, may exceed 1. Since a subset of $\mathbb{R}$ always has dimension at most 1, it will be convenient to define more generally
$$\sigma(\Phi) = \min(s, 1),$$
where $s$ is the unique solution to (5).
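The number $s$ in (5) is rarely available in closed form. However, $s \mapsto \sum_i \lambda_i^s$ is strictly decreasing, so $s$ can be found by bisection. A minimal Python sketch, with hypothetical function names:

```python
import math


def similarity_exponent(ratios, tol=1e-13):
    """Unique s >= 0 with sum(lam**s for lam in ratios) == 1 (all 0 < lam < 1)."""
    f = lambda s: sum(lam ** s for lam in ratios) - 1.0  # strictly decreasing in s
    lo, hi = 0.0, 1.0
    while f(hi) > 0:  # grow the bracket until f changes sign
        hi *= 2.0
    while hi - lo > tol:  # bisection keeps f(lo) >= 0 >= f(hi)
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sigma(ratios):
    """sigma(Phi) = min(s, 1)."""
    return min(similarity_exponent(ratios), 1.0)


print(similarity_exponent([1/3, 1/3]))  # close to log 2 / log 3
print(sigma([1/2, 1/2, 1/2]))           # s = log 3 / log 2 > 1, so 1.0
```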
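In [3], Hochman gives a weaker condition on $\Phi$ under which the general statement $\dim X = \sigma(\Phi)$ is valid. This condition is called "exponential separation". To formulate it, we define the sets of similarity maps $\Phi_n = \{\varphi_{i_1}\circ\varphi_{i_2}\circ\cdots\circ\varphi_{i_n} : 1 \leq i_1, \ldots, i_n \leq k\}$ and denote by $\Delta_n$ the separation between distinct elements of $\Phi_n$.

DEFINITION 7. We say that $\Phi$ satisfies exponential separation if there exists $\delta > 0$ such that $\Delta_n > \delta^n$ for all $n$.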
(This condition is slightly stronger than that given by Hochman but is adequate for our purposes.)

THEOREM 8 ([3]). If the system $\Phi$ satisfies exponential separation, then the attractor $X$ of $\Phi$ satisfies $\dim X = \sigma(\Phi)$.

While the issue dealt with in Theorem 8 is quite unlike that of Theorem 4, Hochman's proof of Theorem 8 has had an impact on subsequent work on fractal dimension. We will illustrate this by highlighting features of Hochman's argument which appear in Shmerkin's proof of Theorem 4. We identify three components of Hochman's proof which have parallels in Shmerkin's work:
(i) reformulating questions dealing with the dimension of sets in terms of the dimension of probability measures, replacing self-similar sets with self-similar measures;
(ii) using the "Shannon entropy" of a probability measure to recover the dimension of a self-similar measure;
(iii) showing the smoothing effect of convolving one probability measure with another, which can be made precise in terms of entropy.
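For an IFS whose maps all have the same ratio, $\varphi_i(x) = \lambda(x + \alpha_i)$, every composition of length $n$ has ratio $\lambda^n$. For illustration we assume that the separation $\Delta_n$ of Definition 7 is then the smallest distance $|\varphi(0) - \psi(0)|$ between compositions indexed by distinct words; this is our reading, not a quotation of Hochman's definition. The Python sketch below (hypothetical names) computes this quantity exactly for rational data. A value of 0 means that two distinct words give the same map, which rules out exponential separation.

```python
from fractions import Fraction
from itertools import product


def composition_at_zero(lam, alphas, word):
    """phi_{i_1} ... phi_{i_n}(0) for phi_i(x) = lam * (x + alphas[i])."""
    x = Fraction(0)
    for i in reversed(word):  # the innermost map acts first
        x = lam * (x + alphas[i])
    return x


def delta_n(lam, alphas, n):
    """Minimal |phi(0) - psi(0)| over pairs of distinct index words of length n."""
    pts = sorted(composition_at_zero(lam, alphas, w)
                 for w in product(range(len(alphas)), repeat=n))
    return min(b - a for a, b in zip(pts, pts[1:]))


lam = Fraction(1, 3)
for n in range(1, 5):
    # here Delta_n = 2 * 3**-n, so Delta_n > delta**n holds with delta = 1/3
    print(n, delta_n(lam, [0, 2], n))
```

With $\lambda = 1/2$ and $\alpha = (0, 1, 2)$, the words $(1, 0)$ and $(0, 2)$ both give the point $1/2$, so `delta_n` returns 0 already for $n = 2$.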
The result actually established by Hochman concerns self-similar measures and implies the corresponding result for self-similar sets. A self-similar measure is determined by an IFS $\Phi = \{\varphi_i\}_{i=1}^k$ as before, together with a probability vector $\mathbf{p} = (p_1, \ldots, p_k)$. There is now a unique probability measure $\mu$ on $\mathbb{R}$ satisfying
$$\mu = \sum_{i=1}^k p_i\, \varphi_i\mu, \qquad (8)$$
where we denote by $\varphi\mu$ the "pushforward measure" under $\varphi$, $\varphi\mu(A) = \mu(\varphi^{-1}A)$. When all $\lambda_i = \lambda$, we can give an explicit expression for $\mu$ satisfying (8). We define the atomic measure
$$\rho = \sum_{i=1}^k p_i\, \delta_{\lambda\alpha_i},$$
and let the operator $T_\gamma$ for $\gamma \in \mathbb{R}$ act on measures by the dilation $T_\gamma x = \gamma x$. We then define $\mu^{(n)}$ as the convolution
$$\mu^{(n)} = \rho * T_\lambda\rho * T_{\lambda^2}\rho * \cdots * T_{\lambda^{n-1}}\rho. \qquad (10)$$
The limit $\mu = \lim \mu^{(n)}$ satisfies $\mu = \rho * T_\lambda\mu$, and this is equivalent to (8). Moreover, for each $n$ we have the equality
$$\mu = \mu^{(n)} * T_{\lambda^n}\mu, \qquad (11)$$
which will play an important role in the sequel. It is not hard to show that (when all $p_i > 0$) the support of the self-similar measure $\mu$ is exactly the self-similar set $X$ determined by $\Phi$, so information regarding $\mu$ will provide information about $X$. For this we need the notion of dimension for a probability measure on $\mathbb{R}$. The definition is …

We will encounter two other definitions of dimension for a measure, and these all coincide for self-similar measures. Note that for any measure $\nu$, $\dim\nu \leq \dim(\operatorname{support}\nu)$, so knowing the dimension of a self-similar measure gives a lower bound for the dimension of the self-similar set supporting it. To formulate Hochman's result, we set, for a system $\Phi$ and vector $\mathbf{p} = (p_1, p_2, \ldots, p_k)$,
$$t = \frac{\sum_{i=1}^k p_i \log p_i}{\sum_{i=1}^k p_i \log \lambda_i}, \qquad (13)$$
and define $\tau(\Phi, \mathbf{p}) = \min(t, 1)$. We then have

THEOREM 9. If the system $\Phi$ satisfies exponential separation, then for any $\mathbf{p}$, the self-similar measure $\mu$ determined by $(\Phi, \mathbf{p})$ satisfies
$$\dim\mu = \tau(\Phi, \mathbf{p}). \qquad (14)$$

One should observe that in Theorem 8 it is easy to prove the inequality $\dim X \leq \sigma(\Phi)$. This follows by considering, for each $n$, the covering of $X$ by the intervals $\varphi_{i_1}\varphi_{i_2}\cdots\varphi_{i_n}J$, where $J$ is an interval containing $X$, and letting $n \to \infty$. Thus one is left to prove the opposite inequality. This will follow from Theorem 9 if we can choose $\mathbf{p}$ so that $\tau(\Phi, \mathbf{p}) = \sigma(\Phi)$.
This can be done by setting $p_i = \lambda_i^s$ with $s$ given by (5): then $\sum_i p_i = 1$ and $t = s$ in (13). Thus Theorem 9 implies Theorem 8. We will give a proof of Theorem 9 in the special case $\lambda_i = \lambda$ for all $i$, $p_i = k^{-1}$ for all $i$, and $k\lambda \geq 1$. The proof will illustrate the main features of Hochman's proof of the general case. Our hypotheses imply that in (13), $t = \log k/\log\lambda^{-1} \geq 1$, so $\tau(\Phi, \mathbf{p}) = 1$, and we will want to show that $\dim\mu = 1$ for the self-similar measure $\mu$. We should note that if (14) is shown to be valid for some self-similar measure with $\Phi$ as given, this will imply the equality in Theorem 8. So, as regards Theorem 8, the extra hypothesis $p_i = k^{-1}$ is harmless.
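The construction (10), the identity (11), and the exponent (13) can all be checked on finite data. The Python sketch below represents finitely supported measures as dictionaries `{atom: weight}`. It assumes maps of the form $\varphi_i(x) = \lambda(x + \alpha_i)$, so that $\rho$ has atoms $\lambda\alpha_i$ with weights $p_i$, and all helper names are hypothetical. It verifies the finite analogue $\mu^{(n+m)} = \mu^{(n)} * T_{\lambda^n}\mu^{(m)}$ of (11). It also checks that the weights $p_i = \lambda_i^s$ give $t = s$ in (13) for $\lambda = (1/2, 1/4)$, where $2^{-s} = (\sqrt5 - 1)/2$.

```python
import math
from collections import defaultdict
from fractions import Fraction


def convolve(mu, nu):
    """Convolution of two finitely supported measures {atom: weight}."""
    out = defaultdict(Fraction)
    for x, w in mu.items():
        for y, v in nu.items():
            out[x + y] += w * v
    return dict(out)


def dilate(mu, gamma):
    """Push-forward of mu under T_gamma(x) = gamma * x."""
    out = defaultdict(Fraction)
    for x, w in mu.items():
        out[gamma * x] += w
    return dict(out)


def mu_n(lam, alphas, probs, n):
    """mu^(n) = rho * T_lam rho * ... * T_{lam^(n-1)} rho with rho = sum_i p_i delta_{lam alpha_i}."""
    rho = defaultdict(Fraction)
    for a, p in zip(alphas, probs):
        rho[lam * a] += p
    result = {Fraction(0): Fraction(1)}  # delta_0, the unit for convolution
    for j in range(n):
        result = convolve(result, dilate(rho, lam ** j))
    return result


def t_value(ratios, probs):
    """t = sum p_i log p_i / sum p_i log lambda_i, as in (13)."""
    num = sum(p * math.log(p) for p in probs if p > 0)
    den = sum(p * math.log(lam) for p, lam in zip(probs, ratios))
    return num / den


lam, alphas, probs = Fraction(1, 3), [0, 2], [Fraction(1, 2)] * 2
m3 = mu_n(lam, alphas, probs, 3)
# finite analogue of (11): mu^(3) = mu^(2) * T_{lam^2} mu^(1)
assert convolve(mu_n(lam, alphas, probs, 2), dilate(mu_n(lam, alphas, probs, 1), lam ** 2)) == m3

x = (math.sqrt(5) - 1) / 2            # 2**-s, since x + x**2 == 1
s = math.log(x) / math.log(0.5)
print(len(m3), t_value([0.5, 0.25], [x, x * x]), s)   # 8 atoms; the last two agree
```

In the special case of the proof ($\lambda_i = \lambda$, $p_i = 1/k$), `t_value` reduces to $\log k/\log\lambda^{-1}$, which is at least 1 exactly when $k\lambda \geq 1$.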
The main tool used by Hochman in his proof is the notion of entropy coming from Shannon's information theory. We start with the definition: for a probability measure $\mu$ on a space $X$ and a finite partition $\mathcal{P}$ of $X$, set
$$H(\mu, \mathcal{P}) = -\sum_{A \in \mathcal{P}} \mu(A)\log\mu(A),$$
and call this the Shannon entropy of the measure with respect to the partition. (We take the value of $x\log x$ to be 0 for $x = 0$.) Given a probability space $(X, \mu)$ and a subset $A \subset X$ with $\mu(A) > 0$, we can define the conditional measure $\mu_A$ supported on the set $A$ by
$$\mu_A(E) = \frac{\mu(E \cap A)}{\mu(A)}.$$
We will say a partition $\mathcal{P}'$ refines the partition $\mathcal{P}$ if each set of $\mathcal{P}$ is a union of sets in $\mathcal{P}'$. In this case, for a set $A$ of $\mathcal{P}$, the partition $\mathcal{P}'$ restricts to a partition of $A$, and we can speak of the corresponding entropy $H(\mu_A, \mathcal{P}'_A)$, where $\mathcal{P}'_A$ denotes the induced partition on $A$.
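For finitely supported measures both definitions are straightforward to compute. In the Python sketch below (names hypothetical), a measure is a dictionary `{point: weight}`, a partition is given by a labelling function `cell(x)`, and logarithms are to base 2 as in the text.

```python
import math
from collections import defaultdict


def entropy(mu, cell):
    """H(mu, P) = -sum_A mu(A) log2 mu(A); cell(x) labels the set of P containing x."""
    masses = defaultdict(float)
    for x, w in mu.items():
        masses[cell(x)] += w
    return -sum(m * math.log2(m) for m in masses.values() if m > 0)  # 0 log 0 = 0


def conditional(mu, in_A):
    """mu_A(E) = mu(E ∩ A) / mu(A), assuming mu(A) > 0."""
    mass = sum(w for x, w in mu.items() if in_A(x))
    return {x: w / mass for x, w in mu.items() if in_A(x)}


mu = {0.1: 0.25, 0.3: 0.25, 0.6: 0.25, 0.9: 0.25}
dyadic2 = lambda x: math.floor(4 * x)        # the partition D_2 of [0, 1)
print(entropy(mu, dyadic2))                  # 2.0: four cells of mass 1/4
print(conditional(mu, lambda x: x < 0.5))    # {0.1: 0.5, 0.3: 0.5}
```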
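We have the following two lemmas:

LEMMA 11. Let $\mathcal{P}$ and $\mathcal{P}'$ be two partitions of the space $X$ with $\mathcal{P}'$ refining $\mathcal{P}$. Then
$$H(\mu, \mathcal{P}') = H(\mu, \mathcal{P}) + \sum_{A \in \mathcal{P}} \mu(A)\, H(\mu_A, \mathcal{P}'_A).$$

LEMMA 12. Let $\mathcal{P} = \{A_1, \ldots, A_n\}$ be a partition of $X$, and $\mu$ a probability measure on $X$ with $H(\mu, \mathcal{P}) = h$. Fix $g > h$ and let $G$ be the set of indices $j$ for which $\mu(A_j) < n^{-g}$. Then …

We now turn to measures on $\mathbb{R}$ and make definitions for measures supported on the unit interval $I = [0, 1]$, but analogous definitions can be made for arbitrary intervals. Denote by $\mathcal{D}_n$ the partition of $[0, 1]$ into $2^n$ equal intervals. (The number 2 appears for convenience because our logarithms are taken to the base 2.) If we set
$$H_n(\mu) = \frac{1}{n}H(\mu, \mathcal{D}_n),$$
then we can show

PROPOSITION 13. If $\mu$ is a self-similar measure, then $\lim_{n\to\infty} H_n(\mu)$ exists and equals $\dim\mu$.

If $\mu$ is an atomic measure, $H_n(\mu)$ need not vanish for small $n$. However, when $2^{-n}$ is less than the minimal distance between atoms, each interval of $\mathcal{D}_n$ contains at most one atom, so the entropy $H(\mu, \mathcal{D}_n)$ no longer depends on $n$ and $H_n(\mu) \to 0$.

To prove Theorem 9 under our special hypotheses, we note that since all $\lambda_i$ are equal we have
(11 bis) $\mu = \mu^{(n)} * T_{\lambda^n}\mu$.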
We will find that we are led to a contradiction if $\dim T_{\lambda^n}\mu = \dim\mu < 1$. Roughly speaking, the contradiction comes from the fact that convolution has a smoothing effect on a measure, which should increase the entropy. We cannot apply this argument directly to (11), since the atomic measure $\mu^{(n)}$ has dimension 0 and convolving with it has no effect on dimension. However, it may have an effect on the approximating Shannon entropies.
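Lemma 11 and the remark that $H_n(\mu) \to 0$ for atomic measures can both be checked numerically for dyadic partitions. In this Python sketch (hypothetical names), `H(mu, n)` is $H(\mu, \mathcal{D}_n)$ for a finitely supported $\mu$, using the cells $[j2^{-n}, (j+1)2^{-n})$. Multiplying a float by a power of 2 is exact, so the cell assignments at different levels are consistent.

```python
import math
from collections import defaultdict


def H(mu, n):
    """H(mu, D_n) in bits, for mu = {x: weight}."""
    cells = defaultdict(float)
    for x, w in mu.items():
        cells[math.floor(x * 2 ** n)] += w
    return -sum(m * math.log2(m) for m in cells.values() if m > 0)


def chain_rule_rhs(mu, n, m):
    """H(mu, D_n) + sum over I in D_n of mu(I) * H(mu_I, D_m), for m >= n."""
    groups = defaultdict(dict)
    for x, w in mu.items():
        groups[math.floor(x * 2 ** n)][x] = w
    total = H(mu, n)
    for atoms in groups.values():
        mass = sum(atoms.values())
        total += mass * H({x: w / mass for x, w in atoms.items()}, m)
    return total


mu = {0.05: 0.1, 0.07: 0.2, 0.3: 0.3, 0.31: 0.15, 0.8: 0.25}
print(H(mu, 6), chain_rule_rhs(mu, 2, 6))      # equal up to rounding, as in Lemma 11

two_atoms = {0.0: 0.5, 0.5: 0.5}
print([H(two_atoms, n) / n for n in (1, 10, 100)])  # H_n = 1/n -> 0
```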
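To give a precise formulation, we consider the "dyadic" subintervals of $[0, 1]$:
$$I_{m,j} = [j2^{-m}, (j+1)2^{-m}), \qquad 0 \leq j < 2^m,$$
which for fixed $m$ constitute the partition $\mathcal{D}_m$. We say that $I_{m,j}$ is of level $m$. Hochman's smoothing theorem is the following:

THEOREM 14. For all $\varepsilon > 0$ and integers $m$, there exist $\eta = \eta(\varepsilon, m) > 0$ and $n_0(\varepsilon, m)$ such that if $\mu$ is a probability measure on $[0, 1]$, $n \geq n_0(\varepsilon, m)$, and the proportion of dyadic intervals $I$ of level $\leq n$ satisfying $H_m(\mu_I) < 1 - \varepsilon$ is greater than $1 - \varepsilon$, then for any $\nu$ with entropy $H_n(\nu) > \varepsilon$ we will have
$$H_n(\mu * \nu) \geq H_n(\mu) + \eta. \qquad (18)$$

We proceed to the proof of Theorem 9. We are given $\Delta_n > \delta^n$ for some $0 < \delta < 1$ and every $n$. For each $n$ let $\ell_n$ be the integer satisfying $2^{\ell_n} \leq \lambda^{-n} < 2^{\ell_n + 1}$, and choose $q$ sufficiently large that $2^{-q\ell_n} < \delta^n < \Delta_n$. Since $\mu^{(n)}$ is an atomic measure whose atoms are separated by $\Delta_n$, by this choice of $q$ no two atoms of $\mu^{(n)}$ are in the same dyadic interval of $\mathcal{D}_{q\ell_n}$. We fix $q$ and let $n \to \infty$, so that $\ell_n \to \infty$ as well. ($\ell_n/n$ is bounded from above and below.) We compare the entropies $H(\mu, \mathcal{D}_{\ell_n})$ and $H(\mu, \mathcal{D}_{q\ell_n})$. Since $\mathcal{D}_{q\ell_n}$ refines $\mathcal{D}_{\ell_n}$, Lemma 11 gives
$$H(\mu, \mathcal{D}_{q\ell_n}) = H(\mu, \mathcal{D}_{\ell_n}) + \sum_{I \in \mathcal{D}_{\ell_n}} \mu(I)\, H(\mu_I, \mathcal{D}^I_{q\ell_n}). \qquad (19)$$
By (11), this can be rewritten as an identity (20) whose summands involve the entropies $H(\mu^{(n)}_I * T_{\lambda^n}\mu, \mathcal{D}^I_{q\ell_n})$.

We want to estimate $H(\mu^{(n)}_I * T_{\lambda^n}\mu, \mathcal{D}^I_{q\ell_n})$. By (10), $\mu^{(n)}$ is a combination of $k^n$ atoms of equal weight. Inasmuch as $\lambda^n \leq 2^{-\ell_n}$, the convolution $\mu^{(n)} * T_{\lambda^n}\mu = \mu$ is close enough to $\mu^{(n)}$ with respect to $\mathcal{D}_{\ell_n}$ that $H_{\ell_n}(\mu^{(n)}) - H_{\ell_n}(\mu) \to 0$ as $n \to \infty$. In particular, $H_{\ell_n}(\mu^{(n)}) \to \dim\mu$. We set $\alpha = \dim\mu$, which we suppose, contrary to the assertion of the theorem, is strictly less than 1. We now apply Lemma 12 to the measure $\mu^{(n)}$ with respect to the partition $\mathcal{D}_{\ell_n}$. For each $I \in \mathcal{D}_{\ell_n}$, let $N(I)$ denote the number of atoms of $\mu^{(n)}$ that lie inside $I$. We have $\mu^{(n)}(I) = N(I)k^{-n}$. Now take $\beta$ with $\alpha < \beta < 1$. Lemma 12 implies that for a set of intervals $I$ with total mass bounded away from 0, $\mu^{(n)}(I) > 2^{-\beta\ell_n}$, or $N(I) > k^n 2^{-\beta\ell_n}$.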
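The choice of $\ell_n$ and $q$ in the proof can be illustrated on a concrete system satisfying the special hypotheses. In the Python sketch below we take, purely as an example, $\lambda = 2/5$, $k = 3$ (so $k\lambda = 6/5 \geq 1$) and translations $\alpha_i \in \{0, 1/2, 1\}$, with maps $\varphi_i(x) = \lambda(x + \alpha_i)$ (our convention). The atoms of $\mu^{(n)}$ are then the points $\sum_{j=1}^{n}\lambda^j\alpha_{i_j}$, and $\ell_n$ is the integer with $2^{\ell_n} \leq \lambda^{-n} < 2^{\ell_n+1}$. For this data, distinct words give atoms at distance at least $5^{-n}$. Consequently, for $n = 4$ (where $\ell_4 = 5$), $q = 2$ already puts all $3^4$ atoms of $\mu^{(4)}$ into distinct intervals of $\mathcal{D}_{q\ell_4}$, while $q = 1$ does not. The helper names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product


def ell(lam, n):
    """The integer l_n with 2**l_n <= lam**(-n) < 2**(l_n + 1)."""
    inv = 1 / Fraction(lam) ** n
    l = 0
    while 2 ** (l + 1) <= inv:
        l += 1
    return l


def atoms(lam, alphas, n):
    """Atoms sum_{j=1}^n lam**j * alpha_{i_j} of mu^(n), one per index word."""
    return [sum(lam ** (j + 1) * alphas[i] for j, i in enumerate(word))
            for word in product(range(len(alphas)), repeat=n)]


def cell_counts(points, level):
    """N(I) for the intervals I of D_level that contain at least one point."""
    counts = {}
    for x in points:
        j = math.floor(x * 2 ** level)
        counts[j] = counts.get(j, 0) + 1
    return counts


lam, alphas, n = Fraction(2, 5), [Fraction(0), Fraction(1, 2), Fraction(1)], 4
l = ell(lam, n)                        # 2**5 <= (5/2)**4 = 39.0625 < 2**6
pts = atoms(lam, alphas, n)
for q in (1, 2):
    fine = cell_counts(pts, q * l)
    print(q, len(pts), len(fine))      # all 81 atoms separated once q = 2
```

The atoms have equal weight $k^{-n}$ and are separated at level $q\ell_n$. Hence the conditional measure of $\mu^{(n)}$ on each $I \in \mathcal{D}_{\ell_n}$ has entropy exactly $\log_2 N(I)$ with respect to $\mathcal{D}_{q\ell_n}$, which is the identity used in (22).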
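Recall that $k\lambda \geq 1$ and that $\lambda^n \sim 2^{-\ell_n}$. Hence for some constant $c$, $N(I) > ck^n\lambda^{\beta n}$ and
$$\log N(I) > \log c + n(\log k\lambda + \log\lambda^{\beta-1}) \geq \log c + n\log\lambda^{\beta-1}.$$
Since $\lambda < 1$ and $\beta < 1$, we have $\log\lambda^{\beta-1} > 0$, so that finally
$$\log N(I) \geq \gamma n \qquad (21)$$
for some $\gamma > 0$. Now each interval $I$ of $\mathcal{D}_{\ell_n}$ is divided into $2^{(q-1)\ell_n}$ intervals of $\mathcal{D}_{q\ell_n}$. The atoms of $\mu^{(n)}$ have equal weights and lie in distinct intervals of $\mathcal{D}_{q\ell_n}$. Hence, when (21) is valid, we have
$$H(\mu^{(n)}_I, \mathcal{D}^I_{q\ell_n}) = \log N(I) \geq \gamma n. \qquad (22)$$
Dilating the measure and the partition $\mathcal{D}_{q\ell_n}$ by $T_{2^{\ell_n}}$ leaves the entropy intact, and we get
$$H(T_{2^{\ell_n}}\mu^{(n)}_I, \mathcal{D}_{(q-1)\ell_n}) > \gamma n. \qquad (23)$$
$T_{2^{\ell_n}}\mu^{(n)}_I$ is a probability measure on an interval $J$ of unit length, for which we now obtain
$$H_{(q-1)\ell_n}(T_{2^{\ell_n}}\mu^{(n)}_I) > \frac{\gamma n}{(q-1)\ell_n}, \qquad (24)$$
a bound that stays away from 0 as $n \to \infty$. We intend to apply Theorem 14 to the convolution
$$T_{2^{\ell_n}}\mu^{(n)}_I * T_{2^{\ell_n}\lambda^n}\mu = T_{2^{\ell_n}}\mu^{(n)}_I * T_{\theta_n}\mu, \qquad \tfrac{1}{2} < \theta_n \leq 1.$$
One more deep fact is needed to verify the hypothesis of Theorem 14. This is the theorem of Hochman to the effect that self-similar measures have uniform entropy dimension. In a certain sense, this means that most "dyadic" conditional measures have entropy dimension close to the dimension of the measure. In our case, assuming $\alpha < 1$, together with (24), we find that for some $\varepsilon > 0$ the hypotheses of Theorem 14 are fulfilled. The result is that for some $\eta > 0$,
$$H_{(q-1)\ell_n}\big(T_{2^{\ell_n}}(\mu^{(n)}_I * T_{\lambda^n}\mu)\big) > H_{(q-1)\ell_n}(T_{\theta_n}\mu) + \eta. \qquad (25)$$
Again by virtue of the uniform entropy dimension property of $\mu$, the right-hand side approximates $\alpha + \eta$. Thus for large $n$,
$$H(\mu^{(n)}_I * T_{\lambda^n}\mu, \mathcal{D}^I_{q\ell_n}) > (q-1)\ell_n(\alpha + \eta) + o(\ell_n). \qquad (26)$$
Recall that (22), and consequently (26), was valid for a set of intervals $I$ whose total $\mu$-measure was bounded away from 0. For the remaining intervals we still have
$$H(\mu^{(n)}_I * T_{\lambda^n}\mu, \mathcal{D}^I_{q\ell_n}) \geq H(T_{\lambda^n}\mu, \mathcal{D}^I_{q\ell_n}) = H(T_{\theta_n}\mu, \mathcal{D}_{(q-1)\ell_n})$$
up to a bounded error, because convolution cannot decrease entropy by more than a bounded amount. As a result, every summand in (20) exceeds $(q-1)\ell_n\alpha + o(\ell_n)$.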
The former intervals comprise a set of intervals with $\sum\mu(I) > c > 0$. Substituting in (20) and using $H(\mu, \mathcal{D}_m) = m\alpha + o(m)$ (Proposition 13), we obtain
$$q\ell_n\alpha + o(\ell_n) \geq q\ell_n\alpha + c(q-1)\eta\,\ell_n,$$
and this is impossible when $n$ is large. This shows that $\alpha = 1$ and establishes Hochman's theorem under the special hypotheses $\lambda_i = \lambda$, $p_i = k^{-1}$, $k\lambda \geq 1$.

We turn now to Shmerkin's proof of Theorem 1, in which the influence of Hochman's proof of Theorem 9 can be seen. On the face of it, the two theorems are of a different nature. Theorem 1 gives an upper bound for the dimension of the intersection of special sets whose individual dimensions are known. Theorem 9 gives a precise formula for the dimension of the measure in question, but, as has been pointed out, the non-trivial part of the proof is showing that this dimension is not less than the value indicated in the theorem. In fact, however, the first step in the proof of Theorem 1 is showing that an upper bound for the dimension of an intersection $A \cap B$ will, under certain conditions, be determined by a lower bound on the dimension of the difference set $A - B = \{x - y : x \in A, y \in B\}$. As in Hochman's paper, the lower bound will come from the dimension of a measure on $A - B$. A significant difference between the two presentations is that Hochman uses the entropy dimension of measures, whereas Shmerkin uses $L^q$-dimension.
We begin with an explanation of the connection between the dimensions of $A \cap B$ and $A - B$. For this we need the notion of Frostman exponents.

DEFINITION 15. Let $\mu$ be a non-negative measure on a metric space $X$. We say that $s$ is a Frostman exponent for $\mu$ if, for some $r_0 > 0$, all balls $B(x, r)$ of radius $r \leq r_0$ satisfy $\mu(B(x, r)) < r^s$.
One sees easily that the Hausdorff dimension of a set is never less than any Frostman exponent of a non-zero measure on the set. In the following, we denote by $\overline{\dim}_B(X)$ the upper box dimension of a metric space $X$.

PROPOSITION 16. Let $X$ be a compact metric space and $\pi: X \to \mathbb{R}$ a Lipschitz map to the reals. Suppose $\mu$ is a measure on $X$ satisfying $\mu(B(x, r)) \geq c_0 r^\beta$ for a constant $c_0 > 0$, all $x \in X$ and all $r < r_0$. Assume that the pushforward measure $\pi\mu$ on $\mathbb{R}$ has a Frostman exponent $\alpha$. Then for every $y \in \mathbb{R}$,
$$\overline{\dim}_B(\pi^{-1}(y)) \leq \beta - \alpha.$$

Given sets $A, B$ as in Theorem 1, we take $X = A \times B \subset \mathbb{R}^2$ and define $\pi(x, y) = x - y$. One finds measures $\mu_A$, $\mu_B$ on $A$ and $B$ respectively so that $\mu = \mu_A \times \mu_B$ satisfies the condition of the proposition with $\beta = \dim A + \dim B = \dim(A \times B)$. Since $\pi$ is a Lipschitz map, the dimension of $\pi(A \times B) = A - B$ cannot exceed $\dim A + \dim B$, nor can it be $> 1$. Suppose $\pi(\mu_A \times \mu_B)$ can be shown to have Frostman exponents arbitrarily close to $\min(\dim A + \dim B, 1)$. Then Proposition 16, applied to the fiber $\pi^{-1}(0)$ (a copy of $A \cap B$), yields (2).

We remarked earlier that to prove Theorem 1 for sets $A$ and $B$ invariant under $T_p$ and $T_q$ respectively, it suffices to treat the case where $A$ is a $p$-Cantor set and $B$ a $q$-Cantor set. In this case there are natural measures $\mu_A$ and $\mu_B$ whose dimensions are respectively $\dim A$ and $\dim B$. $A$ and $B$ are also self-similar, and $\mu_A$, $\mu_B$ can also be characterized as self-similar measures with equal weights: …

The desired result regarding $\pi(\mu_A \times \mu_B)$ is obtained by making use of the $L^q$-dimension of this measure, whose definition we recall momentarily. Evaluating this dimension for $\pi(\mu_A \times \mu_B)$ is at the core of Shmerkin's argument. The connection of Frostman exponents to the $L^q$-dimension is given by the following

PROPOSITION 17. If $\mu$ is a compactly supported probability measure on $\mathbb{R}$, and for some $q > 1$ the $L^q$-dimension of $\mu$ is greater than $s$, then there is $r_0 > 0$ with
$$\mu(B(x, r)) \leq r^{(1 - 1/q)s} \qquad (29)$$
for all $x$ and all $r \leq r_0$.
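At the level of dyadic intervals, the mechanism behind Proposition 17 is elementary: the largest term of a sum of $q$-th powers is at most the sum, so $\max_{I\in\mathcal{D}_n}\mu(I) \leq (\sum_I \mu(I)^q)^{1/q}$. If the sum is at most $2^{-(q-1)sn}$, every dyadic interval of level $n$ has mass at most $2^{-(1-1/q)sn}$, and a ball of radius $r \leq 2^{-n}$ meets at most three such intervals. The Python sketch below (hypothetical names) checks the dyadic inequality on an example. It is an illustration of this step, not Shmerkin's proof.

```python
import math
from collections import defaultdict


def dyadic_masses(mu, n):
    """The masses mu(I), I in D_n, for mu = {x: weight}."""
    cells = defaultdict(float)
    for x, w in mu.items():
        cells[math.floor(x * 2 ** n)] += w
    return list(cells.values())


def dyadic_frostman_exponent(mu, n, q):
    """e with max_I mu(I) <= 2**(-e*n), read off from the L^q sum at level n."""
    s_q = sum(m ** q for m in dyadic_masses(mu, n))
    return -math.log2(s_q) / (q * n)   # = (1 - 1/q) * (n-th L^q term)


mu = {0.1: 0.5, 0.12: 0.2, 0.7: 0.3}
for q in (2, 4, 8):
    e = dyadic_frostman_exponent(mu, 3, q)
    print(q, max(dyadic_masses(mu, 3)), 2 ** (-e * 3))   # first <= second
```

As $q$ grows, the bound $2^{-en}$ approaches the true maximal mass, which is why arbitrarily large $q$ are useful.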
The $L^q$-dimension of a measure $\mu$ was introduced by Rényi and, as with the entropy dimension, is the limit of expressions depending on the dyadic partitions $\mathcal{D}_n$ of an interval supporting the measure.

DEFINITION 18. The $L^q$-dimension of a probability measure $\mu$ on an interval $[a, b]$ is given by
$$D(\mu, q) = \lim_{n\to\infty}\frac{-\log\sum_{I\in\mathcal{D}_n}\mu(I)^q}{(q-1)n}, \qquad (30)$$
where $\mathcal{D}_n$ is the partition of $[a, b]$ into $2^n$ equal intervals.
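As a sanity check on Definition 18, take the natural measure on the $4$-Cantor set with digit set $J = \{0, 2\}$, whose dimension is $\log 2/\log 4 = 1/2$. At the even dyadic levels $n = 2m$, each of the $2^m$ occupied intervals carries mass $2^{-m}$, so the $n$-th term of the limit equals $1/2$ for every $q > 1$. The Python sketch below (hypothetical names) approximates the measure by its equal-weight level-$M$ atoms, which give the same masses at all levels $n \leq 2M$.

```python
import math
from collections import defaultdict
from itertools import product


def lq_term(mu, n, q):
    """-log2(sum_{I in D_n} mu(I)**q) / ((q - 1) * n): the n-th term in Definition 18."""
    cells = defaultdict(float)
    for x, w in mu.items():
        cells[math.floor(x * 2 ** n)] += w
    return -math.log2(sum(m ** q for m in cells.values())) / ((q - 1) * n)


def cantor_measure(b, J, M):
    """Equal weights on the points sum_{j<=M} a_j b**-j, a_j in J."""
    w = 1.0 / len(J) ** M
    return {sum(a * b ** -(j + 1) for j, a in enumerate(word)): w
            for word in product(J, repeat=M)}


mu = cantor_measure(4, [0, 2], 6)
for q in (2, 3, 10):
    print(q, [lq_term(mu, n, q) for n in (2, 6, 12)])   # 0.5 at these even levels
```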
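We might note that the individual terms $-\log\sum_{I\in\mathcal{D}_n}\mu(I)^q/((q-1)n)$ tend to $H_n(\mu)$ as $q \to 1$. For our use of the inequality (29), we will need arbitrarily large values of $q$. This flexibility is the advantage of the notion of $L^q$-dimension over that of the entropy dimension. For a fixed $n$ it is useful to introduce the notion of the $L^q$-norm:

DEFINITION 19. For a probability measure $\mu$ on an interval,
$$\|\mu\|_{q,n} = \Big(\sum_{I\in\mathcal{D}_n}\mu(I)^q\Big)^{1/q}.$$
Thus,
$$D(\mu, q) = \lim_{n\to\infty}\frac{-q\log\|\mu\|_{q,n}}{(q-1)n}.$$
As with entropy, there is a smoothing effect in convolving measures which affects the dimension. For $L^q$-norms it can be seen that convolving with a probability measure $\nu$ cannot increase the norm by more than a constant factor:
$$\|\mu * \nu\|_{q,n} \leq C\|\mu\|_{q,n}. \qquad (33)$$
In line with Hochman's Theorem 14 and inequality (18), Shmerkin shows how, under certain conditions, the inequality (33) can be improved to a gain of an exponentially small factor, $\|\mu * \nu\|_{q,n} \leq 2^{-\varepsilon n}\|\mu\|_{q,n}$ for some $\varepsilon > 0$. Two other points of contact between Shmerkin's argument and Hochman's discussion are (i) the importance of exponential separation, and (ii) expressing the measure under investigation as an infinite convolution, giving rise to a "self-similarity" property as in the equality (11).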
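For the dyadic $L^q$-norms of Definition 19, one admissible constant in (33) is $C = 2$. For a shift $y$, each interval $I - y$ is covered by two consecutive dyadic intervals of the same level, and Jensen's inequality (with $\nu$ a probability measure) then gives $\|\mu * \nu\|_{q,n} \leq 2\|\mu\|_{q,n}$. This elementary bound is our own illustration; Shmerkin's formulation of (33) may use a different constant. The Python sketch below (hypothetical names) tests it on random finitely supported measures, using dyadic cells on all of $\mathbb{R}$.

```python
import math
import random
from collections import defaultdict


def lq_norm(mu, n, q):
    """||mu||_{q,n} = (sum_{I in D_n} mu(I)**q)**(1/q), with D_n the dyadic cells of R."""
    cells = defaultdict(float)
    for x, w in mu.items():
        cells[math.floor(x * 2 ** n)] += w
    return sum(m ** q for m in cells.values()) ** (1.0 / q)


def convolve(mu, nu):
    """Convolution of two finitely supported measures {atom: weight}."""
    out = defaultdict(float)
    for x, w in mu.items():
        for y, v in nu.items():
            out[x + y] += w * v
    return dict(out)


def random_measure(rng, k):
    """k atoms at uniform random points of [0, 1) with normalised random weights."""
    ws = [rng.random() for _ in range(k)]
    total = sum(ws)
    return {rng.random(): w / total for w in ws}


rng = random.Random(0)
worst = 0.0
for _ in range(200):
    mu, nu = random_measure(rng, 15), random_measure(rng, 15)
    for n in (2, 5, 8):
        for q in (2.0, 3.0):
            worst = max(worst, lq_norm(convolve(mu, nu), n, q) / lq_norm(mu, n, q))
print(worst)   # stays below 2
```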
In all this, Shmerkin gives credit to Hochman's exposition. Shmerkin's treatment of the measure $\pi(\mu_A \times \mu_B)$ parts ways with Hochman's discussion of a self-similar measure in that this measure needs to be seen as one of a family of measures $\pi_t(\mu_A \times \mu_B)$, where $\pi_t(x, y) = x - ty$. The phenomenon of self-similarity of a single measure is replaced by a "dynamically driven self-similarity", in which each measure in the family is related by convolution to another.
We sketch some of the details. We begin with a description of $A \times B$ as a "dynamically driven" self-affine set. Assume $p < q$, where $A$ is $p$-Cantor and $B$ is $q$-Cantor. Underlying Shmerkin's analysis is a dynamical system on $[1, q]$ with the endpoints identified, and a transformation $S$ defined by
$$St = \begin{cases} pt & \text{if } pt < q,\\ pt/q & \text{if } pt \geq q.\end{cases}$$
Note that with $f(t) = \log t/\log q$, the interval $[1, q]$ is identified with the unit interval, and $S$ corresponds to the "irrational rotation" $y \mapsto y + \log p/\log q \pmod 1$. It is at this point that the multiplicative independence of $p, q$ enters: it makes $\log p/\log q$ irrational, as a result of which the system $([1, q], S)$ is uniquely ergodic and all orbits are equidistributed.
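The conjugacy between $S$ and the rotation by $\beta = \log p/\log q$ is easy to confirm numerically. The Python sketch below (hypothetical names) uses the formula $St = pt$ if $pt < q$ and $St = pt/q$ otherwise. This is the map on $[1, q)$ that $f(t) = \log t/\log q$ carries to $y \mapsto y + \beta \pmod 1$.

```python
import math


def S(t, p, q):
    """One step of S on [1, q): multiply by p and, if the result reaches q, divide by q."""
    u = p * t
    return u if u < q else u / q


def f(t, q):
    """Coordinate identifying [1, q) with [0, 1)."""
    return math.log(t) / math.log(q)


def circle_gap(a, b):
    """Distance between a and b on R/Z."""
    d = (a - b) % 1.0
    return min(d, 1.0 - d)


p, q = 2, 3
beta = math.log(p) / math.log(q)           # irrational because 2 and 3 are independent
for t in (1.0, 1.3, 2.0, 2.9):
    print(t, S(t, p, q), circle_gap(f(S(t, p, q), q), f(t, q) + beta))  # gap ~ 0
```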
We define a family of affine transformations of $A \times B$ depending on the parameter $t \in [1, q)$ and on $a \in J_A$, $b \in J_B$: … For any $t \in [1, q]$, any point $(x, y) \in A \times B$ can be represented as a limit … for some sequence $\{(a_n, b_n)\}$ in $(J_A \times J_B)^{\mathbb{N}}$. Moreover, any such limit gives a point in $A \times B$. Now define similarity transformations on $\mathbb{R}$: … We define, for all $t$, $\pi_t(x, y) = x - ty$, and we check the commutation relation … Combining (34) and (36), we find that … As a result, the set $A - tB$ consists of points of the form given by (39). We now define a family of atomic measures $\rho(t)$ by … and, setting $\lambda = p^{-1}$, we form the convolutions
$$\mu_{t,n} = T_\lambda\rho(t) * T_{\lambda^2}\rho(St) * \cdots * T_{\lambda^n}\rho(S^{n-1}t).$$
Now, in analogy with (11), writing $\mu_t$ for the limit of $\mu_{t,n}$ as $n \to \infty$, we have
$$\mu_t = \mu_{t,n} * T_{\lambda^n}\mu_{S^nt}.$$
One can verify that $\mu_t = \pi_t(\mu_A \times \mu_B)$, so that the measure $\pi(\mu_A \times \mu_B)$ discussed earlier is the measure $\mu_1$ defined in (40). The proof of Theorem 1 will follow from a general formula that enables us to evaluate the $L^q$-dimension of $\mu_t$, which turns out to be independent of $t$. Once we have an explicit expression for the $L^q$-dimension of $\mu_t$, we invoke Propositions 16 and 17 to give the proof of Theorem 1.
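As a consequence of the unique ergodicity of $([1, q], S)$, one has, for any piecewise continuous function $f(t)$ and any $t \in [1, q]$,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}f(S^jt) = \int f\,d\nu,$$
where $\nu$ is the unique invariant measure on $[1, q]$, i.e., $d\nu = \dfrac{dt}{t\ln q}$. We denote by $\sigma(t)$ the number of atoms in $\rho(t)$:
$$\sigma(t) = \begin{cases} r & \text{for } 1 \leq t < q/p,\\ rs & \text{for } q/p \leq t < q.\end{cases}$$
$\log\sigma(t)$ is piecewise continuous, and so
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\log\sigma(S^jt) = \int\log\sigma\,d\nu = \log r + \frac{\log p}{\log q}\log s.$$
Applied to $\mu_t$, Definition 18 reads
(30 bis) $\displaystyle D(\mu_t, q) = \lim_{n\to\infty}\frac{-\log\sum_{I\in\mathcal{D}_n}\mu_t(I)^q}{(q-1)n}$.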
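The ergodic average of $\log\sigma$ can be computed directly. The Python sketch below (hypothetical names) iterates the rotation coordinate $y = \log t/\log q$, in which $t < q/p$ corresponds to $y < 1 - \beta$ with $\beta = \log p/\log q$. We read $r$ and $s$ as the numbers of digits of the $p$-Cantor set $A$ and the $q$-Cantor set $B$, so that $\dim A = \log r/\log p$ and $\dim B = \log s/\log q$; dividing the average by $\log p$ should then give $\dim A + \dim B$.

```python
import math


def average_log_sigma(p, q, r, s, n_steps=200_000, y0=0.0):
    """(1/(n log p)) * sum_{j<n} log sigma(S^j t), computed in the coordinate y = log t / log q."""
    beta = math.log(p) / math.log(q)
    y, total = y0, 0.0
    for _ in range(n_steps):
        # sigma(t) = r for 1 <= t < q/p (y < 1 - beta) and r*s for q/p <= t < q
        total += math.log(r) if y < 1.0 - beta else math.log(r * s)
        y = (y + beta) % 1.0                  # one application of S
    return total / (n_steps * math.log(p))


p, q, r, s = 3, 4, 2, 2                        # A: 3-Cantor, 2 digits; B: 4-Cantor, 2 digits
predicted = math.log(r) / math.log(p) + math.log(s) / math.log(q)
print(average_log_sigma(p, q, r, s), predicted)   # both close to 1.1309
```

Convergence is fast in this case. Since $y_{j+1} = y_j + \beta - 1$ exactly when $y_j \geq 1 - \beta$, the number of $j < n$ with $y_j \geq 1 - \beta$ equals $n\beta + y_0 - y_n$, so the error in the average is of order $1/n$.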
Using the "self-similarity" equations (43) and a version of the subadditive ergodic theorem for uniquely ergodic systems, Shmerkin proves that for each $q > 1$ the limit in (30 bis) exists for a.e. $t \in [1, q]$ and is constant, say $D$. The most intricate section in Shmerkin's proof of Theorem 1 deals with the proof of the following statement:

THEOREM 20. For every $n \in \mathbb{N}$ let $\ell_n$ be defined by $2^{\ell_n} \leq \lambda^{-n} < 2^{\ell_n+1}$. Let $t$ be such that the limit in (30 bis) exists and equals $D$, and let $R \in \mathbb{N}$. Then
$$\lim_{n\to\infty}\frac{-\log\sum_{I\in\mathcal{D}_{R\ell_n}}\mu_{t,n}(I)^q}{(q-1)\,n\log\lambda^{-1}} = D.$$
It is for the proof of this theorem that Shmerkin proves a quantitative version of the smoothing phenomenon for convolutions, specifically for the equation (43). The proof compares $L^q$-norms relative to $\mathcal{D}_{R\ell_n}$ with those relative to $\mathcal{D}_{\ell_n}$. The argument can be seen as motivated by Hochman's treatment of $H(\mu, \mathcal{D}_{q\ell_n})$ vis-à-vis $H(\mu, \mathcal{D}_{\ell_n})$ in equation (19).
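Given Theorem 20, we now make use of exponential separation, which can be proved for the atoms in the measures $\rho(t) * \rho(St) * \cdots * \rho(S^nt)$. This implies that for a large enough $R$, the atoms of $\mu_{t,n}$ appear in different cells of $\mathcal{D}_{R\ell_n}$. Assume $D < 1$. Now, each atom of $\rho(t)$ carries the weight $\sigma(t)^{-1}$, so that, because of the separation of the atoms in $\mathcal{D}_{R\ell_n}$,
$$\sum_{I\in\mathcal{D}_{R\ell_n}}\mu_{t,n}(I)^q = \prod_{j=0}^{n-1}\sigma(S^jt)^{1-q}.$$
Recall that $\lambda = p^{-1}$. By Theorem 20, $D$ is therefore the limit of $\frac{1}{n\log p}\sum_{j=0}^{n-1}\log\sigma(S^jt)$. By (44) this limit is
$$\frac{\log r}{\log p} + \frac{\log s}{\log q} = \dim A + \dim B.$$
This is a sketch of Shmerkin's argument, which shows that for every $t$ and for the measure $\mu_t$ as defined, $D(\mu_t, q) = \min(\dim A + \dim B, 1)$.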
In general, had we chosen $\mu_A$ and $\mu_B$ such that the measures $\rho(t)$ are not equidistributed on their supports, the dimension $D(\mu_t, q)$ would depend on $q$, and $\dim A + \dim B$ would be replaced by $\dim\mu_A + \dim\mu_B$.