Polynomial approximation of self-similar measures and the spectrum of the transfer operator

We consider self-similar measures on $\mathbb R$. The Hutchinson operator $H$ acts on measures and is the dual of the transfer operator $T$ which acts on continuous functions. We determine polynomial eigenfunctions of $T$. As a consequence, we obtain eigenvalues of $H$ and natural polynomial approximations of the self-similar measure. Bernoulli convolutions are studied as an example.


Introduction
We consider self-similar measures $\nu$ on $\mathbb R$, given by affine maps $f_i(x) = t_i x + v_i$ with $0 < t_i < 1$, and probabilities $p_i$ for $i = 1, \dots, m$. A simple example are Bernoulli convolutions, where we have two mappings on an interval, $t_1 = t_2 = t$ and $p_1 = p_2 = \frac12$. The measure $\nu$ has compact support $X$, which is the self-similar set defined by the $f_i$. It is the unique probability measure which is fixed by the Hutchinson operator $H$:
$$\nu(B) = H\nu(B) = \sum_{i=1}^m p_i\, \nu(f_i^{-1}(B)) \quad\text{for Borel sets } B \subseteq X. \qquad (1)$$
$H$ acts on the space of all finite Borel measures on $X$. Hutchinson [10] proved that for any initial probability measure $\mu_0$ the sequence $\mu_n = H\mu_{n-1}$, $n = 1, 2, \dots$ converges to $\nu$ geometrically, with factor $t = \max_i t_i$, with respect to a certain metric. Thus 1 is an eigenvalue of $H$, with eigenvector $\nu$, and there should be no other eigenvalues of modulus larger than $t$. The situation is a bit more complicated and will be discussed in Section 2.
For every positive integer $k$, there are examples where $\nu$ has a density which is $k$ times differentiable. All Bernoulli convolutions with sufficiently large parameter $t < 1$, outside an exceptional set of dimension zero, belong to this class [12]. In this case, the first $k$ derivatives of the density are eigenfunctions of $H$ corresponding to the eigenvalues $t, t^2, \dots, t^k$.
Numerical experiments indicate that this holds even if the density has only $k-1$ derivatives, and that these values are the leading eigenvalues of the operator (Figure 1). We shall give an explanation for this behavior.
Hutchinson's theorem has led to well-known iteration algorithms which generate pictures of the self-similar measure $\nu$. These methods are fast and often give a nice impression of fractal structures. In cases where $\nu$ has a density, however, they are not very accurate. It is hard to decide from such approximations whether the function is smooth or monotone, since one cannot distinguish genuine fractal structure from noise. Moreover, an approximation given by an iteration algorithm is just a set of data, not a mathematical object. We shall present an approximation by polynomials as an alternative. See Figure 2.
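The random iteration just mentioned is easy to sketch. The following is a minimal Python/NumPy illustration for the Bernoulli convolution maps used later in the paper, $f_{1,2}(x) = tx \mp (1-t)$ with $p_1 = p_2 = \frac12$; the parameter $t = 0.8$ and the sample size are illustrative choices, not values from the text.

```python
import numpy as np

def chaos_game(t=0.8, n_points=2**17, seed=0):
    """Sample points of the Bernoulli convolution nu_t on [-1, 1]
    by random iteration of f_1(x) = t*x - (1-t), f_2(x) = t*x + (1-t)."""
    rng = np.random.default_rng(seed)
    # random signs +-1 choose one of the two maps with p_1 = p_2 = 1/2
    signs = rng.choice([-1.0, 1.0], size=n_points)
    x = 0.0
    points = np.empty(n_points)
    for k in range(n_points):
        x = t * x + signs[k] * (1.0 - t)
        points[k] = x
    return points

pts = chaos_game()
# a histogram of the orbit approximates the measure, but only as raw data
hist, edges = np.histogram(pts, bins=200, range=(-1, 1), density=True)
```

The histogram is exactly the kind of noisy, non-analytic approximation the paragraph above criticizes, which motivates the polynomial alternative.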
The key to our study is the transfer operator $T$ acting on continuous functions $h \in C(X)$ as follows:
$$Th(x) = \sum_{i=1}^m p_i\, h(f_i(x)) . \qquad (2)$$
The Hutchinson operator is the dual operator of $T$, so their spectra coincide, see for instance [11, Theorem 6.22]. Our first result says that $T$ has polynomial eigenfunctions which are easy to determine.

Theorem 1. (Polynomial eigenfunctions of the transfer operator, real case)
The transfer operator $T$ has eigenvalues $\lambda_n = \sum_{i=1}^m p_i t_i^n$ and corresponding polynomial eigenfunctions $q_n$ of degree $n$ for $n = 0, 1, 2, \dots$ If the mappings $f_i$ have equal contraction factors $t_i = t$, these eigenvalues are $\lambda_n = t^n$ for $n = 0, 1, 2, \dots$

Different versions of Theorem 1 will be discussed in Section 3. The proof provides a construction of the $q_n$. Then we turn to the calculation of polynomial approximations of the self-similar measure $\nu$. In the case that the support $X$ of $\nu$ is an interval, the following theorem holds, with the usual notation
$$\langle g, h\rangle = \int_X g(x)\, h(x)\, dx . \qquad (3)$$

Theorem 2. (Polynomial approximation of the self-similar measure)
Given the polynomial eigenfunctions $q_0 = 1, q_1, q_2, \dots$ of the transfer operator $T$, let $v_n$ denote the polynomial on $X$ of degree $n$ which satisfies the equations
$$\langle v_n, q_0\rangle = 1 , \qquad \langle v_n, q_k\rangle = 0 \ \text{ for } k = 1, \dots, n . \qquad (4)$$
Considered as a density function, $v_n$ is the best approximation of the measure $\nu$ among polynomials of degree at most $n$, in the sense that
$$\frac{\nu(q)}{\|q\|} < \frac{\nu(v_n)}{\|v_n\|}$$
for all polynomials $q$ of degree $\le n$ which are not multiples of $v_n$. If $\nu$ has an $L^2$ density $v$, then $v_n$ converges to $v$ in the $L^2$ norm.

Figure 2.
Polynomial approximations $v_{t,n}$ of the Bernoulli convolution measure $\nu_t$ in the smooth case $t = 0.8$ (left) and the more fractal case $t = 0.6$ (right). In the left picture, one cannot distinguish at this scale between the approximations of degree $n \ge 8$. The gray line shows a histogram with 2000 bins based on $2^{20}$ points generated by the 'chaos game' algorithm.

Figure 2 shows approximations of a smooth and a more fractal self-similar measure. In Section 4 we prove Theorem 2 and discuss modifications. In Section 5 we show that the problem of finding the approximating polynomials $v_n$ can be regarded as a moment problem, as considered in [4, 1]. As a consequence, the construction of the approximation $v_n$ amounts to solving a system of linear equations with a Hilbert matrix of coefficients. Section 6 deals with the case of Bernoulli convolutions, and Theorem 5 will give an explicit analytical representation of the $q_n$ for this case. In the final section we indicate how singular measures can be studied with this approach.

Experiments with the spectrum of Hutchinson's operator
Before we go into proofs, let us discuss some experiments which have motivated this work. All Bernoulli convolutions will be considered on the interval $I = [-1, 1]$. The parameter $t$ varies between 0.5 and 1; the figures refer to $t = 0.8$. The mappings are
$$f_1(x) = tx - (1-t), \qquad f_2(x) = tx + (1-t), \qquad p_1 = p_2 = \tfrac12 .$$
One way to think about self-similar measures is to consider $\nu$ as the stationary distribution with respect to random iteration of the functions, popularized as the 'chaos game'. We have a discrete-time Markov process $(X_0, X_1, \dots)$ with kernel
$$P(x, B) = \sum_{i=1}^2 p_i\, 1_B(f_i(x)) .$$
To approximate it by a Markov chain $(Z_0, Z_1, \dots)$ with $N$ states, we divide $I$ into $N$ disjoint intervals $I_k$ and introduce the transition probabilities
$$p_{kj} = \sum_{i=1}^2 p_i\, \frac{|I_k \cap f_i^{-1}(I_j)|}{|I_k|} ,$$
where $|J|$ denotes the length of the interval $J$. The matrix $T_N = (p_{kj})_{k,j=1,\dots,N}$ represents the transfer operator $T$ when multiplied by column vectors $(v_1, \dots, v_N)$ which describe functions, and the Hutchinson operator $H$ when multiplied from the left by row vectors $(u_1, \dots, u_N)$ which describe measures. This matrix is always irreducible and aperiodic. (This follows from $p_{11} > 0$, $p_{NN} > 0$, and from the fact that an interval $J$ is mapped either by $f_1^{-1}$ or by $f_2^{-1}$ to an interval which is $1/t$ times longer, unless $J$ contains the points $t$ and $1-t$ and thus will be mapped to states 1 and $N$.) So there is a unique left eigenvector of $T_N$ for the eigenvalue 1, which represents the self-similar measure, and a corresponding right eigenvector which is constant, since we have a Markov matrix. The convergence is fast: even if we take $N = 10^6$ and start with the uniform distribution, 60 iterations yield sufficient accuracy. Calculations were performed with MATLAB. To determine eigenvalues and eigenvectors, we took $N$ between 400 and 2000. For each parameter $t$, different $N$ were tested, with partitions into intervals of equal lengths and with randomly perturbed partitions.
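The discretization described above can be sketched in a few lines. The following Python/NumPy code stands in for the MATLAB code actually used; the uniform partition and the value $N = 400$ are illustrative choices within the range the text reports. It builds the row-stochastic matrix $T_N$, power-iterates to the stationary (left) vector, and lists the leading absolute eigenvalues.

```python
import numpy as np

def transfer_matrix(t=0.8, N=400):
    """Row-stochastic matrix T_N = (p_kj) for the chaos-game Markov chain
    of the Bernoulli convolution on I = [-1, 1], uniform partition."""
    h = 2.0 / N                                   # width of each interval I_k
    P = np.zeros((N, N))
    edges = -1.0 + h * np.arange(N + 1)
    for sign in (-1.0, 1.0):                      # the two maps f_1, f_2
        for k in range(N):
            lo = t * edges[k] + sign * (1.0 - t)  # image interval f_i(I_k)
            hi = t * edges[k + 1] + sign * (1.0 - t)
            j0 = max(int((lo + 1.0) / h), 0)
            j1 = min(int((hi + 1.0) / h), N - 1)
            for j in range(j0, j1 + 1):
                overlap = min(hi, edges[j + 1]) - max(lo, edges[j])
                if overlap > 0:
                    # p_i = 1/2, mass spread uniformly over the image of I_k
                    P[k, j] += 0.5 * overlap / (t * h)
    return P

P = transfer_matrix()
# stationary row vector (the discretized self-similar measure): power iteration
pi = np.full(P.shape[0], 1.0 / P.shape[0])
for _ in range(300):
    pi = pi @ P
ev = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]  # leading |eigenvalues|
```

For $t = 0.8$ the leading entries of `ev` should reproduce the pattern reported below: values near $1, t, t^2, t^3$, then a cloud inside $|\lambda| < \frac12$.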
The pattern of eigenvalues was always the same, as shown in Figure 1. The values $1, t, t^2, \dots, t^k$ were eigenvalues, as long as $t^k > \frac12$. The remaining eigenvalues were spread inside the circle $|\lambda| < \frac12$. In a few experiments, one or two percent of the points were slightly outside that circle. The distribution inside the circle could be more or less uniform, it could be more of a ring, especially when $t$ is near $\frac12$, or zero could be a multiple eigenvalue, but these properties usually changed with $N$, as indicated in Figure 1.
The leading eigenvalues $t^k > \frac12$, $k = 0, 1, \dots$ never changed, and their left eigenvectors were also very stable. If a highly oscillating function is drawn for very different $N$, there are natural differences, but for the smooth eigenfunctions the deviations were very small. In Figure 3, the function which represents the measure $\nu$ looks like a normal distribution, and this is true whenever $t > 0.8$ (cf. [14]).
In Figure 3, the eigenvector $e_1$ represents the negative of the derivative of $e_0$. The third eigenvector seems irregular, but when it is integrated (just taking cumulative sums) we get $e_1$, multiplied by a constant. This indicates that we have the second derivative of $\nu$. And even $e_3$, integrated twice, gives the same result, up to a constant. Thus we have a third derivative of $\nu$, in spite of the fact that the second derivative seems to be a non-differentiable function.
For the case that $\nu$ has a $k$ times differentiable density $\rho(x)$, it is easy to check that the derivatives are eigenvectors. For equal contraction factors $t_i = t$, the self-similarity condition (1) can be reformulated as
$$\rho(x) = \frac1t \sum_{i=1}^m p_i\, \rho(f_i^{-1}(x)) , \quad\text{that is,}\quad H\rho = \rho .$$
Taking derivatives we get $\rho' = \frac1t\, H\rho'$, that is, $H\rho' = t\rho'$, and $H\rho'' = t^2 \rho''$ etc. We cannot prove, however, that there are no other eigenfunctions with $|\lambda| > \frac12$. We also cannot explain the fact that we get one more derivative than expected. As shown in Figure 4, this even holds when the density function is not differentiable at all. For the non-fractal case $t = \frac12$ this can be checked: $\nu$ has density equal to $\frac12$, and the distributional derivative $\frac12(\delta_{-1} - \delta_1)$ is an eigenfunction of $H$ with eigenvalue $\frac12$. Pisot parameters play a large role in the study of Bernoulli convolutions since Erdős proved in 1939 that $\nu$ has no density in this case [15, 14]. In our numerical study, they did not look very special, although the pattern of eigenvalues seemed somewhat different, and for the golden mean a discontinuous eigenfunction with $\lambda = -\frac12$ was identified. Figure 4, for example, is very near to the so-called Tribonacci parameter, and still we have the 'derivative' as second eigenvector.
All eigenvalues with $|\lambda| < \frac12$ have extremely noisy left eigenvectors, although these are symmetric: odd or even functions. Since they change with $N$ and with slight changes of the partition, they can be interpreted as noise. This is highly correlated noise, however: if these functions are integrated twice, they still have a fractal appearance, whereas white noise integrated twice yields a smooth function.
The right eigenvectors of $T_N$ look much more regular. Those which correspond to $\lambda = t^k > \frac12$ are polynomials, as shown in Figure 5. This will be proved below. The other right eigenvectors represent noise. They are all odd or even, and rather smooth functions, smoother than left eigenvectors after two integrations. The duality between polynomials and special functions somewhat resembles the classical Sturm-Liouville operators which lead to the special functions and associated polynomials of Legendre, Hermite etc., although our operator $T$ is not self-adjoint and we have only a finite set of 'special functions'.
What about the circle $|\lambda| = \frac12$? For $t = \frac12$ there is an explanation. In this case $\nu$ is the uniform distribution on $I$, and $T$ has all points $\lambda$ of the open unit disk as eigenvalues. An eigenfunction for $\lambda$ is
$$h(x) = \sum_{n=0}^\infty \lambda^n \exp(\pi k 2^n i x) ,$$
where $k$ is a fixed odd number. See Driebe [7] and the literature quoted there. $h(x)$ is a Weierstrass function. It is well known that $h(x)$ is differentiable only for $|\lambda| < \frac12$, see [3] and its references. Apparently, numerical experiments only detect eigenvalues with differentiable eigenfunctions. This point seems worth further study.

Eigenpolynomials of the transfer operator
We now prove Theorem 1, which concerns the existence of polynomial eigenfunctions of the transfer operator for affine IFS on $\mathbb R$. We consider affine maps $f_i(x) = t_i x + v_i$ on $\mathbb R$ with $0 < t_i < 1$ for $i = 1, \dots, m$. The self-similar set $X$ given by the $f_i$ is the unique compact set which fulfills the equation $X = \bigcup_{i=1}^m f_i(X)$. Moreover, we have probabilities $p_i > 0$ with $\sum_{i=1}^m p_i = 1$, and the self-similar measure $\nu$ on $X$ is defined by equation (1). The only assumption which we need is that $X$ consists of more than one point. Then $X$ has infinitely many points, and each polynomial on $\mathbb R$ is determined by its values on $X$.
Proof of Theorem 1. The transfer operator $T$ defined in (2) maps the monomial $b_k(x) = x^k$ to
$$Tb_k(x) = \sum_{i=1}^m p_i\, (t_i x + v_i)^k = \sum_{l=0}^k \Big( \sum_{i=1}^m p_i \binom{k}{l} t_i^l\, v_i^{k-l} \Big) x^l ,$$
which is a linear combination of $b_0(x), b_1(x), \dots, b_k(x)$ and is thus a polynomial of degree $k$. This shows that the space of polynomials of degree $\le k$ is an invariant subspace of $T$. The action of $T$ on the polynomials of degree $\le n$ is then represented, with respect to the basis polynomials $b_0, b_1, b_2, \dots, b_n$, by the matrix $T^{(n)} = (t_{lk})$ with entries
$$t_{lk} = \sum_{i=1}^m p_i \binom{k}{l} t_i^l\, v_i^{k-l} \quad\text{for } 0 \le l \le k \le n , \qquad (8)$$
and 0 else. This matrix is upper triangular, so its eigenvalues are the diagonal entries $\lambda_k = t_{kk} = \sum_{i=1}^m p_i t_i^k$ for $k = 0, 1, \dots, n$. Since the $t_i$ are smaller than 1, the $\lambda_k$ are strictly decreasing, and in particular all different. The eigenvector of $T^{(n)}$ for $\lambda_n$ can be determined by back substitution, normalizing the leading coefficient to 1; it yields an eigenpolynomial $q_n$ of exact degree $n$. This proves Theorem 1.
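The construction in this proof is directly computable. The sketch below (with arbitrarily chosen illustrative parameters, not an example from the text) builds the matrix $T^{(n)}$ from the formula for its entries and recovers the eigenpolynomial $q_n$ by back substitution.

```python
import numpy as np
from math import comb

def T_matrix(n, t, v, p):
    """Upper triangular matrix of T on polynomials of degree <= n,
    with entries t_lk = sum_i p_i * C(k,l) * t_i^l * v_i^(k-l)."""
    T = np.zeros((n + 1, n + 1))
    for l in range(n + 1):
        for k in range(l, n + 1):
            T[l, k] = sum(pi * comb(k, l) * ti**l * vi**(k - l)
                          for pi, ti, vi in zip(p, t, v))
    return T

def eigenpolynomial(n, t, v, p):
    """Coefficients (a_0, ..., a_n) of q_n, normalized to a_n = 1,
    found by back substitution in (T^(n) - lambda_n I) a = 0."""
    T = T_matrix(n, t, v, p)
    lam = T[n, n]                     # lambda_n = sum_i p_i t_i^n
    a = np.zeros(n + 1)
    a[n] = 1.0
    for k in range(n - 1, -1, -1):
        a[k] = T[k, k + 1:] @ a[k + 1:] / (lam - T[k, k])
    return a, lam

# illustrative IFS with two maps of different contraction factors
t, v, p = [0.5, 0.3], [-0.5, 0.7], [0.4, 0.6]
a, lam = eigenpolynomial(3, t, v, p)
```

Back substitution works because the diagonal entries $\lambda_k$ are all different, exactly as used in the proof.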
Remark 1. (Complex case)
Theorem 1 also holds for affine maps $f_i(z) = t_i z + v_i$ on $\mathbb C$ with complex factors $t_i$, $0 < |t_i| < 1$, $i = 1, \dots, m$, and probabilities $p_1, \dots, p_m$. In this case the self-similar set $X$ is a subset of $\mathbb C$. The operator $T$ defined in equation (2) acts on the complex vector space $C(X)$ of continuous functions on $X$ and has eigenfunctions and eigenvalues as in the real case. But now we have complex polynomials, and we will add their conjugates. The proof of Theorem 1 goes through when we regard the $b_k(z) = z^k$ as complex polynomials. With the additional assumption that the complex eigenvalues $\lambda_0 = 1, \lambda_1, \lambda_2, \dots$ are all different, we see that $T$ has complex eigenpolynomials $q_0 = 1, q_1, q_2, \dots$ The 'conjugate polynomials' $1, \bar z, \bar z^2, \dots$ yield other invariant subspaces of $T$. Namely, $\bar z^k$ is mapped into a linear combination of $1, \bar z, \dots, \bar z^k$. The action of $T$ on $1, \bar z, \dots, \bar z^n$ is then represented by the matrix $\overline{T^{(n)}}$, which yields the conjugate eigenpolynomials $\bar q_1, \bar q_2, \dots$ for the eigenvalues $\bar\lambda_1, \bar\lambda_2, \dots$

Remark 2. (d-dimensional case)
On $\mathbb R^d$ the polynomials $q(x) = q(x_1, \dots, x_d)$ of degree $n$ are linear combinations of the monomials $b_k(x) = x_1^{k_1} \cdots x_d^{k_d}$, where $k = (k_1, \dots, k_d)$ runs through the vectors of nonnegative integers with $k_1 + \dots + k_d \le n$. We consider an affine IFS $f_i(x) = L_i x + v_i$ with contracting linear maps $L_i$, for which the transfer operator (2) reads
$$Tb_k(x) = \sum_{i=1}^m p_i\, b_k(L_i x + v_i) . \qquad (7)$$
Then it is easy to check that the space of polynomials of degree $\le n$ is an invariant subspace of the transfer operator $T$ for each $n$.
To get a triangular matrix $T^{(n)}$, one has to order the basis vectors $b_k$ as above with respect to ascending degree $n = 0, 1, 2, \dots$ Moreover, one has to impose conditions on the maps. We assume diagonal matrices, as used for Bedford-McMullen carpets and Gatzouras-Lalley carpets: $L_i$ has diagonal vector $\ell_i = (\ell_{i1}, \dots, \ell_{id})$ and zeros outside the diagonal. Then $b_k(L_i x) = b_k(\ell_i)\, b_k(x)$, and equation (7) obtains the form
$$Tb_k(x) = \Big( \sum_{i=1}^m p_i\, b_k(\ell_i) \Big) b_k(x) + \text{terms of lower degree} .$$
So we get an upper triangular matrix $T^{(n)}$ with diagonal entries $\lambda_k = \sum_{i=1}^m p_i\, b_k(\ell_i)$, which are the eigenvalues. If all $f_i$ have the same matrix $L$ with diagonal $\ell$, then $\lambda_k = b_k(\ell)$. In case that the $\lambda_k$ are all different, the eigenpolynomials $q_k$ form a basis of the space of polynomials of degree $\le n$. A similar remark holds if the $L_i$ are upper triangular matrices.

Approximation of the self-similar measure
We proved that the $(n+1)$-dimensional vector space $P_n$ of polynomials of degree at most $n$ is invariant under $T$ and contains $n$ linearly independent non-constant eigenpolynomials. These eigenpolynomials are all orthogonal to the self-similar measure $\nu$ in the sense that
$$\nu(q_k) = \int_X q_k\, d\nu = 0 \quad\text{for } k = 1, \dots, n . \qquad (9)$$
This follows from the duality of the operators $T$ and $H$, and the fact that the eigenvalue of $q_k$ fulfils $\lambda_k \ne 1$ for $k \ge 1$, by a standard argument:
$$\nu(q_k) = (H\nu)(q_k) = \nu(Tq_k) = \lambda_k\, \nu(q_k) .$$
So there is one dimension left in the space, and Theorem 2 says that the vector $v_n$ which is orthogonal to all eigenpolynomials, considered as a density function, is the best possible approximation of $\nu$ in $P_n$. Moreover, the space of all polynomials is dense in $C(X)$, so that the sequence of density functions $v_n$ is in some respect the best approximation of $\nu$ by continuous density functions. We now prove Theorem 2, assuming that the self-similar set $X$ is an interval. We use the scalar product (3) and the norm $\|q\| = \sqrt{\langle q, q\rangle}$.
Proof of Theorem 2. Let $\mathcal H$ be the $n$-dimensional hyperplane generated by the eigenpolynomials $q_1, \dots, q_n$ in the $(n+1)$-dimensional space $P_n$ of polynomials of degree $\le n$. For $h \in \mathcal H$ we have $\nu(h) = 0$ and $\langle v_n, h\rangle = 0$. A polynomial $q \in P_n \setminus \mathcal H$ can be written as $q = \alpha v_n + h$ with $\alpha \ne 0$ and $h \in \mathcal H$. This implies
$$\nu(q) = \alpha\, \nu(v_n) \quad\text{and}\quad \langle v_n, q\rangle = \alpha\, \langle v_n, v_n\rangle .$$
We see that $\nu(q)/\langle v_n, q\rangle = \nu(v_n)/\langle v_n, v_n\rangle$ is a constant. The assumption $\nu(1) = \langle v_n, 1\rangle = 1$ shows that this constant equals one, and that $\nu(v_n) > 0$. The equality $\nu(q) = \langle v_n, q\rangle$ says that $v_n$ is the generating vector of the linear form $\nu$ on the Hilbert space $P_n$. The Cauchy-Schwarz inequality $\nu(q) = \langle v_n, q\rangle \le \|q\| \cdot \|v_n\|$, with equality only for multiples of $v_n$, directly implies the inequality of Theorem 2. Now we show $L^2$ convergence of the $v_n$ for the case that $\nu$ has an $L^2$ density $v$. In this case the above proof implies that $v_n$ is the orthogonal projection of $v$ onto the space $P_n$. Thus $\|v_n - v\|$ equals the distance from $v$ to $P_n$, which converges to zero by the Weierstrass theorem and the fact that continuous functions are dense in $L^2$, see for example [16, Theorem 1.10.18, Proposition 1.10.4].
The case that $\nu$ does not possess an $L^2$ density will be discussed in Section 7. $L^2$ convergence applies in particular to the Bernoulli convolutions treated in Section 6, since their self-similar measures belong to $L^2$ for almost every parameter, cf. [15, 14]. We assumed that $X$ is an interval, but Theorem 2 also holds under different assumptions. Based on Remark 2, Theorem 2 can be extended to the multidimensional case, which is work in progress. For complex maps, as mentioned in Remark 1, there is the problem that the algebra generated by complex polynomials and their conjugates, needed for the Stone-Weierstrass theorem, is much larger than the generated vector space.

Moments and construction of the best approximation
The $j$-th moment of the probability measure $\nu$ is the integral $m_j = \nu(b_j) = \int_X x^j\, d\nu$, that is, the value of $\nu$ on the $j$-th basis vector in the polynomials, for $j = 0, 1, 2, \dots$ The characterization of measures by moments is a classical topic [13]; for self-similar measures see [4, 1]. The eigenpolynomials of $T$ provide a recursive method to compute the moments of $\nu$.
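The moments can also be computed without the eigenpolynomials, directly from the invariance $\nu = H\nu$: inserting $x^j$ gives $m_j = \sum_i p_i \int (t_i x + v_i)^j\, d\nu$, so $(1 - \lambda_j) m_j$ is a combination of lower moments. The sketch below implements this standard direct recursion (an assumption standing in for the exact form of Proposition 3, which may be organized differently).

```python
import numpy as np
from math import comb

def moments(n, t, v, p):
    """Moments m_0, ..., m_n of the self-similar measure of the affine IFS
    f_i(x) = t_i x + v_i, via (1 - lambda_j) m_j = sum of lower moments."""
    m = np.zeros(n + 1)
    m[0] = 1.0                        # nu is a probability measure
    for j in range(1, n + 1):
        lam = sum(pi * ti**j for pi, ti in zip(p, t))   # lambda_j < 1
        s = sum(comb(j, k) * sum(pi * ti**k * vi**(j - k)
                                 for pi, ti, vi in zip(p, t, v)) * m[k]
                for k in range(j))
        m[j] = s / (1.0 - lam)
    return m

# sanity check: Bernoulli convolution with t = 1/2 is uniform on [-1, 1],
# whose moments are 1/(j+1) for even j and 0 for odd j
m = moments(4, [0.5, 0.5], [-0.5, 0.5], [0.5, 0.5])
```

Since $\lambda_j < 1$ for $j \ge 1$, the recursion never divides by zero and determines all moments uniquely.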
Now we have the moments, and we show how to compute the approximating polynomials $v_n$ by a system of linear equations with a well-known coefficient matrix.
Proposition 4. (Hilbert system for the coefficients)
The coefficient vector $u = (u_0, u_1, \dots, u_n)$ of $v_n(x) = u_0 + u_1 x + \dots + u_n x^n$ satisfies the linear system of equations
$$Gu = m , \qquad (10)$$
where $m = (m_0, m_1, \dots, m_n)$ and $G$ is the Hilbert matrix with entries $g_{jk} = \langle b_j, b_k\rangle = \int_X x^{j+k}\, dx$ for $j, k = 0, \dots, n$.

Proof. The first assertion can be derived recursively from Proposition 3, or from the following argument. Because of (9), condition (4) says that $v_n$ and $\nu$ generate identical linear forms on the space of polynomials generated by $q_0 = 1, q_1, \dots, q_n$. Equation (10) expresses the same fact, referring to the standard basis $b_0, \dots, b_n$. So the two conditions are equivalent. The explicit form of the equations (10) is easily determined:
$$\langle v_n, b_j\rangle = \sum_{k=0}^n u_k \int_X x^{j+k}\, dx = \sum_{k=0}^n g_{jk}\, u_k .$$
This is the $j$-th term of $Gu$.
Of course, given the $q_k$, condition (4) can directly be considered as a system of equations. The above reformulation establishes a connection with classical matrices [9] which have been thoroughly studied by many authors, mainly for their notoriously bad condition number, see e.g. [6], [5, Example 3.3]. Moreover, the system $Gu = m$ can be solved without determining the eigenfunctions $q_k$, when the moments are directly calculated in a recursive way.
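A sketch of the resulting computation on $X = [-1, 1]$, where $g_{jk} = \int_{-1}^1 x^{j+k}\, dx$. The moment routine here is the direct recursion from the invariance of $\nu$ (an assumption standing in for Proposition 3), specialized to Bernoulli convolutions. For the test case $t = \frac12$ the density is the constant $\frac12$, so the solution should be $u = (\frac12, 0, \dots, 0)$.

```python
import numpy as np
from math import comb

def bernoulli_moments(n, t):
    """Moments of nu_t for f_i(x) = t*x -+ (1-t), p_i = 1/2 (direct recursion)."""
    m = [1.0]
    for j in range(1, n + 1):
        # odd powers of (1-t) cancel between the two maps
        s = sum(comb(j, k) * t**k * (1 - t)**(j - k) * m[k]
                for k in range(j) if (j - k) % 2 == 0)
        m.append(s / (1.0 - t**j))
    return np.array(m)

def best_approximation(n, t):
    """Coefficients u of v_n, solving G u = m on X = [-1, 1]."""
    j = np.arange(n + 1)
    s = j[:, None] + j[None, :]
    # g_jk = 2/(j+k+1) for even j+k, and 0 for odd j+k
    G = np.where(s % 2 == 0, 2.0 / (s + 1), 0.0)
    return np.linalg.solve(G, bernoulli_moments(n, t))

u = best_approximation(6, 0.5)
```

For larger $n$ the bad conditioning of $G$ mentioned above becomes visible; `np.linalg.solve` is adequate for moderate degrees but loses digits roughly in proportion to the condition number.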

Bernoulli convolutions
The IFS for the Bernoulli convolutions with parameter $t \in (0, 1)$ consists of the mappings $f_1(x) = tx + v_1$ and $f_2(x) = tx + v_2$ with $p_1 = p_2 = \frac12$. Since there is only one conjugacy class for each $t$, we can choose $v_1$ and $v_2$ arbitrarily. We work with the mappings
$$f_1(x) = tx - 1 + t , \qquad f_2(x) = tx + 1 - t . \qquad (11)$$
We denote the corresponding self-similar measure by $\nu_t$. In case $t \in (0, \frac12)$ the measure $\nu_t$ is supported on a Cantor set and is necessarily singular. For all $t \in [\frac12, 1)$ the support is the same interval $X = [-1, 1]$, which was a reason for choosing the mappings (11). In this case a density for $\nu_t$ might exist, and does exist for almost all $t > \frac12$. The question for which $t$ exactly we have a density has been studied by many authors for more than 75 years. See the surveys of Peres, Schlag, and Solomyak [15], Solomyak [14], and the recent work of Shmerkin [12].
We will now prove a formula for the eigenpolynomials of the transfer operator for the Bernoulli convolutions. Then we compute the moments of $\nu_t$ with Proposition 3. Finally, we use these moments to compute approximations of $\nu_t$ with Proposition 4.

Theorem 5. (Explicit formula for Bernoulli convolutions)
For the Bernoulli convolution with mappings $f_1(x) = tx - 1 + t$ and $f_2(x) = tx + 1 - t$, the polynomial eigenfunctions $q_n$ of the transfer operator correspond to the eigenvalues $\lambda_n = t^n$ and can be written as
$$q_n(x) = \sum_{k=0}^n a_{n,k}\, x^k , \qquad n = 0, 1, 2, \dots ,$$
with $a_{n,n} = 1$, where the coefficients $a_{n,k}$ are determined by the recursion given in the proof and $a_{n,k} = 0$ whenever $n - k$ is odd.
Thus $q_n$ is an even function for even $n$ and an odd function for odd $n$.
Proof. The action of the transfer operator on polynomials of degree $n$ is represented by the matrix $T^{(n)}$ defined in the proof of Theorem 1. Substituting the Bernoulli convolution parameters $t_1 = t_2 = t$, $p_1 = p_2 = \frac12$ and $v_1 = -1 + t$, $v_2 = 1 - t$ in (8) yields the entries
$$t_{lk} = \tfrac12 \binom{k}{l} t^l \Big( (t-1)^{k-l} + (1-t)^{k-l} \Big) \quad\text{for } l \le k \le n ,$$
and 0 else. Non-zero entries appear only if $k - l$ is non-negative and even, in which case $t_{lk} = \binom{k}{l} t^l (1-t)^{k-l}$. The diagonal entries $\lambda_0 = 1, \lambda_1 = t, \dots, \lambda_n = t^n$ are eigenvalues of $T^{(n)}$ and thus of $T$ too. Let $q_n(x) = a_{n,0} + a_{n,1} x + a_{n,2} x^2 + \dots + a_{n,n-1} x^{n-1} + a_{n,n} x^n$ with $a_{n,n} = 1$ be the eigenpolynomial corresponding to the eigenvalue $\lambda_n = t^n$. The eigenvalue equation $Tq_n = t^n q_n$ can be rewritten as
$$T^{(n)} a = t^n a \qquad (13)$$
in terms of the coefficient vector $a = (a_{n,0}, a_{n,1}, a_{n,2}, \dots, a_{n,n-1}, 1)$. Suppose $a_{n,k+1}, \dots, a_{n,n}$ are known for some $k \in \{0, 1, \dots, n-1\}$. Equating the $k$-th coordinate in (13) yields
$$a_{n,k}\, t^k + \sum_{s=1}^{\lfloor (n-k)/2 \rfloor} \binom{k+2s}{k}\, t^k (1-t)^{2s}\, a_{n,k+2s} = t^n a_{n,k} .$$
So we have the recursion
$$a_{n,k} = \frac{t^k}{t^n - t^k} \sum_{s=1}^{\lfloor (n-k)/2 \rfloor} \binom{k+2s}{k} (1-t)^{2s}\, a_{n,k+2s} .$$
Setting $k = n-1$ gives an empty sum, which shows that $a_{n,n-1} = 0$; proceeding downwards, $a_{n,k} = 0$ whenever $n - k$ is odd. Thus $q_n$ is a polynomial of degree $n$ with either only odd or only even powers of $x$.
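The recursion is straightforward to implement, and it can be cross-checked against the matrix $T^{(n)}$ from the proof of Theorem 1 (a sketch; the parameter values are illustrative):

```python
import numpy as np
from math import comb

def q_coeffs(n, t):
    """Coefficients a_{n,0}, ..., a_{n,n} of the eigenpolynomial q_n for
    lambda_n = t^n, computed downwards from a_{n,n} = 1 via
    a_{n,k} = t^k/(t^n - t^k) * sum_s C(k+2s,k) (1-t)^(2s) a_{n,k+2s}."""
    a = np.zeros(n + 1)
    a[n] = 1.0
    for k in range(n - 1, -1, -1):
        s_max = (n - k) // 2
        total = sum(comb(k + 2 * s, k) * (1 - t)**(2 * s) * a[k + 2 * s]
                    for s in range(1, s_max + 1))
        a[k] = t**k * total / (t**n - t**k)
    return a

t, n = 0.7, 4
a = q_coeffs(n, t)

# cross-check: entries t_lk = C(k,l) t^l (1-t)^(k-l) for even k-l, 0 else
T = np.zeros((n + 1, n + 1))
for l in range(n + 1):
    for k in range(l, n + 1, 2):      # step 2 keeps k-l even
        T[l, k] = comb(k, l) * t**l * (1 - t)**(k - l)
```

The parity statement of the recursion shows up automatically: all coefficients $a_{n,k}$ with $n-k$ odd come out as zero.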
With the aid of the eigenpolynomials we compute the moments of $\nu_t$, as shown in Proposition 3. Then we turn to the computation of the approximations $v_{t,n}(x) = u_0 + u_1 x + \dots + u_n x^n$ defined in Theorem 2. Let $m = (m_0, m_1, \dots, m_n)$ be the first $n+1$ moments of $\nu_t$. By Proposition 4, the coefficients $u = (u_0, u_1, \dots, u_n)$ satisfy the equation $Gu = m$ with
$$g_{jk} = \int_{-1}^1 x^{j+k}\, dx = \frac{2}{j+k+1} \ \text{ for even } j+k , \qquad g_{jk} = 0 \ \text{ for odd } j+k ,$$
$j, k = 0, \dots, n$. Because of the checkerboard pattern of zeros, the system splits into two independent subsystems for the even and the odd coefficients, and each coefficient matrix $G^*$ is, up to the factor 2, a Hilbert-type matrix studied in [9] and shown to be invertible. Note that the entries do not depend on $t$. This is good for numerical computations, since one has to store the inverse only once for all $t$. Figure 2 shows approximations of Bernoulli convolutions for $t = 0.6$ and $t = 0.8$.

Remarks on singular measures
What happens if $\nu$ is a singular measure? We tried to prove that our approximating polynomials $v_n$, taken as measures $\nu_n(B) = \int_B v_n(x)\, dx$, will converge weakly to $\nu$, in the sense that $\nu_n(f) \to \nu(f)$ for continuous functions $f$. Fortunately, a referee found a gap in our proof, and an error in an attempt to repair the proof. We are very grateful for the referee's careful work! Let us briefly discuss the approximation of singular measures.

Proposition 6. (Properties of the approximating polynomials)
(1) $\langle v_n, q\rangle = \nu(q)$ for all polynomials $q$ of degree $\le n$.
(2) $\langle v_n, q\rangle \ge 0$ for all polynomials $q$ of degree $\le n$ with $q(x) \ge 0$ for $x \in X$.
(3) $\|v_n\|^2 \le \max_{x \in X} v_n(x)$.
(4) If $\nu$ does not possess an $L^2$ density, then $\|v_n\| \to \infty$.
Proof. (1) was proved in Section 4, and (2) is an immediate consequence. For (3), let $M = \max_{x \in X} v_n(x)$. Then $q = M - v_n$ is a nonnegative polynomial, so (2) implies $\langle v_n, M - v_n\rangle \ge 0$; since $\langle v_n, 1\rangle = 1$, this gives $\|v_n\|^2 \le M$.
For (4), we note that $\|v_n\|$ is an increasing sequence, since $v_n$ is the orthogonal projection of $v_{n+1}$ onto $P_n$. If the $\|v_n\|$ are bounded, the $v_n$ form a weakly relatively compact set in $L^2$, and a subsequence of the $v_n$ converges weakly to some $v \in L^2$. Since (1) implies $\langle v, q\rangle = \nu(q)$ for all polynomials $q$, which are dense in $L^2$ as well as in $C(X)$, this $v$ must be an $L^2$ density of $\nu$. So for singular $\nu$ the sequence $\|v_n\|$ tends to infinity. This kind of argument is known as the uniform boundedness principle, cf. [16, Corollary 1.7.7]. The proposition indicates that the approximation of singular measures requires unbounded and highly oscillating polynomials. For instance, (3) shows that the set $S = \{x \mid v_n(x) \ge \frac M2\}$ fulfils
$$M \ge \|v_n\|^2 \ge \int_S v_n^2 \ge \Big(\frac M2\Big)^2 |S|$$
and thus has Lebesgue measure $|S| \le \frac4M$, which is very small for very large $M$. Such polynomials do not seem helpful for the study of specific singular measures.
It seems appropriate, however, to study singular measures and polynomials within the context of a parametric family where most measures have an $L^2$ density. Due to their construction, the $v_n$ depend continuously on the parameter for each $n$. We have good polynomial approximations in $L^2$ for all regular parameter values. For the singular values, we can study the singularities of the two-variable function which is obtained in this way. Figure 6 shows a two-variable polynomial approximating the family of Bernoulli convolutions for $0.65 \le t \le 0.8$. There are three Pisot numbers $\beta = 1.3247, 1.3803, 1.4656$ where $\nu$ is known to be singular [15]. The figure shows a very smooth function: a polynomial of degree 28 in $t$ and $x$. The singularities at $\beta = 1.4656$ are obvious; those at the other parameters are barely visible. A study of Bernoulli convolutions as a two-parameter $L^2$ function was initiated in [2] and will be continued elsewhere.