A random cocycle with non Hölder Lyapunov exponent

We provide an example of a Schrödinger cocycle over a mixing Markov shift for which the integrated density of states has a very weak modulus of continuity, close to the log-Hölder lower bound established by W. Craig and B. Simon. This model is based upon a classical example due to Y. Kifer of a random Bernoulli cocycle with zero Lyapunov exponents which is not strongly irreducible. It follows that the Lyapunov exponent of a Bernoulli cocycle near this Kifer example cannot be Hölder or weak-Hölder continuous, thus providing a limitation on the modulus of continuity of the Lyapunov exponent of random cocycles.


Introduction
This paper is concerned with providing limitations on the modulus of continuity of the (maximal) Lyapunov exponent (LE) of random linear cocycles. By a random linear cocycle we understand the skew-product dynamical system defined by a Bernoulli or a Markov shift on the base and a locally constant linear fiber map. We fix the base dynamics and vary the fiber map relative to the uniform norm, thus continuity is with respect to the fiber map.
Continuity of the LE in a generic setting (i.e. assuming irreducibility and contraction, which in particular imply simplicity of the LE) was first established by H. Furstenberg and Y. Kifer [12]. Recently, the genericity assumption was removed by C. Bocker-Neto and M. Viana [4] (in the two-dimensional Bernoulli setting) and by E. Malheiro and M. Viana [18] (in a certain two-dimensional Markov setting). A higher dimensional version of the result in [4] was announced by A. Avila, A. Eskin and M. Viana [23,Note 10.7]. All of these results are not quantitative, i.e. they do not provide a modulus of continuity for the LE.
The first quantitative result, namely Hölder continuity of the LE, was obtained by E. Le Page [17] in the generic, Bernoulli setting. This result refers to a one-parameter family of random linear cocycles; as such, it has been widely used in the theory of discrete, random, one-dimensional or strip Schrödinger operators (which give rise to such one-parameter families of cocycles). Still in the generic setting, extensions of this result were obtained by P. Duarte and S. Klein [8] and by A. Baraviera and P. Duarte [3]. See also P. Duarte and S. Klein [9] for a simpler approach in the two-dimensional setting. More recently, the first two authors of this paper considered the problem¹ of obtaining a modulus of continuity of the LE for two-dimensional random Bernoulli linear cocycles in the absence of any genericity assumption (see [10]), but assuming the simplicity of the maximal LE. Under this assumption, the results in [10] establish local weak-Hölder continuity of the maximal LE in the most "degenerate" situation (i.e. in the vicinity of a diagonalizable cocycle) and local Hölder continuity elsewhere. There is work in progress establishing similar results for Markov cocycles.
A natural question arising from these developments is how weak the modulus of continuity of the LE can be. An example of B. Halperin, made rigorous by B. Simon and M. Taylor in [20], shows that at this level of generality the LE cannot be more regular than Hölder, and in fact the Hölder exponent may be arbitrarily close to zero. When the LE is simple (that is, positive, in the $\mathrm{SL}_2(\mathbb{R})$ setting), by [10] it is at least weak-Hölder continuous. Can it be much weaker than this when the LE is not simple?

Y. Kifer [15] considered the random Bernoulli cocycle $(C, D; p, 1-p)$ generated by the matrices
\[ C := \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad D := \begin{pmatrix} e & 0 \\ 0 & e^{-1} \end{pmatrix} \tag{1.1} \]
with probabilities $(p, 1-p)$. A simple calculation shows that if $p > 0$ then the corresponding Lyapunov exponent is 0, while when $p = 0$ the Lyapunov exponent is 1, thus implying the discontinuity of the Lyapunov exponent as a function of the probability vector $(p, 1-p)$ at the boundary of the simplex. In this work we provide an upper bound for the regularity of the LE as a function of the matrices at $(C, D; \frac{1}{2}, \frac{1}{2})$. A similar upper bound should hold for any probability $0 < p < 1$.

So far the only available method for proving limitations on the regularity of the LE for random cocycles is that of Halperin. This method relies on the Thouless formula, which relates the LE to another quantity called the integrated density of states (IDS). The Thouless formula is only available for Schrödinger (and Jacobi) cocycles, which is not the case with Kifer's example. Our idea was then to embed Kifer's example into a family of Schrödinger cocycles (thus making the Thouless formula applicable) but over a finite type mixing Markov shift.
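The dichotomy in Kifer's example can be illustrated numerically. The following Python sketch estimates the finite-time exponent $\frac{1}{n}\log\|A_{n-1}\cdots A_0\|$ for the cocycle $(C, D; p, 1-p)$, assuming that $C$ is the rotation by a quarter turn and $D = \mathrm{diag}(e, e^{-1})$. The function name `finite_time_exponent`, the sample sizes and the seed are illustrative choices made here, and a Monte Carlo estimate of this kind only suggests, and does not prove, the value of the LE.

```python
import math
import random

# Assumed form of Kifer's matrices: C is the rotation by 90 degrees,
# D is the hyperbolic diagonal matrix diag(e, 1/e).
C = ((0.0, -1.0), (1.0, 0.0))
D = ((math.e, 0.0), (0.0, 1.0 / math.e))


def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def frobenius(m):
    """Frobenius norm (equivalent to the operator norm up to a factor sqrt(2))."""
    return math.sqrt(sum(x * x for row in m for x in row))


def finite_time_exponent(p, n, seed=0):
    """Return (1/n) log ||A_{n-1} ... A_0||, where each A_j is C with probability p."""
    rng = random.Random(seed)
    m = ((1.0, 0.0), (0.0, 1.0))
    log_norm = 0.0
    for _ in range(n):
        m = matmul(C if rng.random() < p else D, m)
        scale = frobenius(m)  # renormalise at every step to avoid overflow
        m = tuple(tuple(x / scale for x in row) for row in m)
        log_norm += math.log(scale)
    return log_norm / n


if __name__ == "__main__":
    print("p = 0  :", finite_time_exponent(0.0, 1000))
    print("p = 0.5:", finite_time_exponent(0.5, 40000))
```

For $p = 0$ the product is $D^n$ and the estimate is essentially 1, whereas for $p = \frac{1}{2}$ the estimate is small and decays roughly like $n^{-1/2}$, in line with a zero Lyapunov exponent.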
The example in this paper shows a huge breakdown in the regularity of the IDS (Theorem 1), which implies a similar breakdown for the LE in Kifer's example (Theorem 2). In this example the two assumptions of the classical result of Le Page (and of its extensions) fail: the cocycle is not strongly irreducible and it has zero Lyapunov exponent.
Furthermore, Proposition 11 shows that given a cocycle with zero LE, if it is strongly irreducible, then the LE must be pointwise Lipschitz at that cocycle.
Therefore, in some sense it is the simultaneous failure of the two assumptions that produces the break in regularity.

¹ Independently and with different methods, the same problem has also been studied by E. Y. Tall and M. Viana.

Basic concepts
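Linear cocycles. Consider a probability space $(X, \mu)$ and an ergodic measure preserving transformation $T : X \to X$ on $(X, \mu)$. An $\mathrm{SL}(2, \mathbb{R})$-linear cocycle over $T$ is any map $F_A : X \times \mathbb{R}^2 \to X \times \mathbb{R}^2$ defined by a measurable function $A : X \to \mathrm{SL}(2, \mathbb{R})$ through the expression
\[ F_A(x, v) := (T x, A(x)\, v). \]
When the base map $T$ is fixed we identify $F_A$ with $A$.
The forward iterates $F_A^n$ are given by $F_A^n(x, v) = (T^n x, A^{(n)}(x)\, v)$, where
\[ A^{(n)}(x) := A(T^{n-1} x) \cdots A(T x)\, A(x). \]
The Lyapunov exponent (LE) of $F_A$ is defined as the $\mu$-almost sure limit
\[ L(A) := \lim_{n \to \infty} \frac{1}{n} \log \| A^{(n)}(x) \|, \]
whose existence follows by Furstenberg-Kesten's theorem [11].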
Schrödinger operators and cocycles. Let $T : X \to X$ be an ergodic transformation over a probability space $(X, \mu)$. Denote by $\ell^2(\mathbb{Z})$ the usual Hilbert space of square summable sequences of real numbers $(\psi_n)_{n \in \mathbb{Z}}$. Note that $\lim_{n \to \pm\infty} \psi_n = 0$ for all $\psi \in \ell^2(\mathbb{Z})$. Given some bounded measurable function $\upsilon : X \to \mathbb{R}$, at every site $n$ on the integer lattice $\mathbb{Z}$ we define the potential $v_n(x) := \upsilon(T^n x)$.
The discrete ergodic Schrödinger operator with potential $n \mapsto v_n(x)$ is the operator $H_x$ defined on $\ell^2(\mathbb{Z}) \ni \psi = \{\psi_n\}_{n \in \mathbb{Z}}$ as follows:
\[ [H_x \psi]_n := -(\psi_{n+1} + \psi_{n-1}) + v_n(x)\, \psi_n. \tag{2.1} \]
Due to the ergodicity of the system, the spectral properties of the family of operators $\{H_x : x \in X\}$ are independent of $x$ $\mu$-almost surely.
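Consider the Schrödinger eigenvalue equation
\[ H_x \psi = E\, \psi \tag{2.2} \]
for some eigenvalue $E \in \mathbb{R}$ and eigenvector $\psi = \{\psi_n\}_{n \in \mathbb{Z}}$. The associated Schrödinger cocycle is the cocycle $A_E$ defined by
\[ A_E(x) := \begin{pmatrix} \upsilon(x) - E & -1 \\ 1 & 0 \end{pmatrix} \in \mathrm{SL}(2, \mathbb{R}). \]
Note that the Schrödinger equation (2.2) is a second order finite difference equation. An easy calculation shows that its formal solutions are given by
\[ \begin{pmatrix} \psi_{n+1} \\ \psi_n \end{pmatrix} = A_E^{(n+1)}(x) \begin{pmatrix} \psi_0 \\ \psi_{-1} \end{pmatrix} \quad \text{for all } n \ge 0. \]
Denote by $P_n : \ell^2(\mathbb{Z}) \to \mathbb{C}^{n+1}$ the coordinate projection to $\{0, 1, 2, \dots, n\} \subset \mathbb{Z}$, by $P_n^*$ its adjoint and let
\[ H_x^{(n)} := P_n H_x P_n^*. \tag{2.4} \]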
This finite rank operator is called the $n$-truncation of $H_x$. By ergodicity, the following limit exists for $\mu$-a.e. $x \in X$:
\[ N(E) := \lim_{n \to \infty} \frac{1}{n+1}\, \# \big( (-\infty, E] \cap \mathrm{Spec}(H_x^{(n)}) \big). \]
The function $E \mapsto N(E)$ is called the integrated density of states (IDS) of the family of ergodic operators $\{H_x : x \in X\}$ (see [7]). The LE and the IDS are related via the Thouless formula:
\[ L(A_E) = \int_{\mathbb{R}} \log |E - E'| \, dN(E'). \]

Random cocycles. Let $\Sigma = \{1, \dots, s\}$ be a finite alphabet, let $X = \Sigma^{\mathbb{Z}}$ be the compact product space of bi-infinite sequences of symbols in the set $\Sigma$ and let $T : X \to X$ be the full shift map, $T \{x_n\}_{n \in \mathbb{Z}} := \{x_{n+1}\}_{n \in \mathbb{Z}}$. Given a probability vector $q = (q_1, \dots, q_s)$ on $\Sigma$, consider the product probability measure $\mathbb{P}_q = q^{\mathbb{Z}}$ on $X$. The map $T$ determines an ergodic transformation on $(X, \mathbb{P}_q)$ called the two-sided Bernoulli shift.
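For a potential sampled along a Bernoulli shift, the finite-volume approximation of the IDS can be sketched in a few lines of Python. The sketch below counts the eigenvalues of the truncated tridiagonal matrix (diagonal $v_0, \dots, v_n$, off-diagonal entries $-1$) lying strictly below an energy $E$ through the signs of the pivots of $H - E$, a standard Sturm-sequence argument. The function names, the potential values 0 and 1, and the tie-break for a vanishing pivot are assumptions made here for illustration only.

```python
import random


def count_eigenvalues_below(potential, energy):
    """Number of eigenvalues of the truncated operator strictly below `energy`.

    The truncation is the tridiagonal matrix with diagonal `potential` and
    off-diagonal entries -1. The count equals the number of negative pivots
    in the LDL^T factorisation of (H - energy * I), by Sylvester's law of inertia.
    """
    count = 0
    pivot = 1.0
    for j, v in enumerate(potential):
        # The off-diagonal entries are -1, so their square is 1.
        pivot = (v - energy) - (1.0 / pivot if j > 0 else 0.0)
        if pivot == 0.0:
            pivot = -1e-300  # illustrative tie-break: treat an exact zero as tiny negative
        if pivot < 0.0:
            count += 1
    return count


def ids_estimate(potential, energy):
    """Finite-volume approximation of N(energy): fraction of eigenvalues below it."""
    return count_eigenvalues_below(potential, energy) / len(potential)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical i.i.d. potential taking the values 0 and 1 with equal probability.
    sample = [rng.choice((0.0, 1.0)) for _ in range(2000)]
    for energy in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
        print(energy, ids_estimate(sample, energy))
```

Counting with a strict inequality instead of $\le E$ changes the finite-volume count only when $E$ is itself an eigenvalue of the truncation.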
Next we introduce the broader class of Markov shifts. Recall that a stochastic matrix is any square matrix $P = (p_{ij}) \in \mathrm{Mat}_s(\mathbb{R})$ such that: (1) $p_{ij} \ge 0$ for all $i, j = 1, \dots, s$, (2) $\sum_{i=1}^{s} p_{ij} = 1$ for all $j = 1, \dots, s$. A $P$-stationary vector is any probability vector $q \in \mathbb{R}^s$ such that $q = P q$, that is, $q_i = \sum_{j=1}^{s} p_{ij}\, q_j$ for all $i = 1, \dots, s$. Each power $P^n$ is itself a stochastic matrix. Given a pair $(P, q)$ where $P$ is a stochastic matrix and $q$ is a $P$-stationary probability vector there exists a unique probability measure $\mathbb{P} = \mathbb{P}_{(P,q)}$ on $X = \Sigma^{\mathbb{Z}}$ such that the stochastic process $\{e_n : X \to \Sigma\}_{n \in \mathbb{Z}}$, $e_n(x) := x_n$, has constant distribution $q$ and transition probability matrix $P$, i.e., for all $i, j = 1, \dots, s$,
\[ \mathbb{P}[\, e_n = i \,] = q_i \quad \text{and} \quad \mathbb{P}[\, e_n = i \mid e_{n-1} = j \,] = p_{ij}. \]
The support of $\mathbb{P}_{(P,q)}$ is the space of admissible sequences $B(P) := \{ x \in X : p_{x_n x_{n-1}} > 0 \ \forall n \in \mathbb{Z} \}$, commonly referred to as the sub-shift of finite type defined by $P$. The stochastic matrix $P$ is called primitive if $P^n > 0$ for some $n \ge 1$ (that is, all the entries of $P^n$ are positive). If $P$ is primitive then the two-sided shift $T : X \to X$ is a mixing measure preserving transformation on $(X, \mathbb{P}_{(P,q)})$, called a mixing Markov shift.
A (locally constant) random cocycle is any cocycle $A : X \to \mathrm{SL}(2, \mathbb{R})$ over a Bernoulli or Markov shift $T$ such that $A(\{x_n\})$ depends only on the zeroth coordinate $x_0 \in \Sigma$. Once the base dynamics given by the full shift is fixed, a random cocycle is completely determined by a list of $s$ matrices, $A_1, \dots, A_s \in \mathrm{SL}(2, \mathbb{R})$, such that $A(x) = A_{x_0}$ for all $x \in X$.

Modulus of continuity. Any continuous and strictly increasing function $\omega : [0, +\infty) \to [0, +\infty)$ with $\omega(0) = 0$ will be referred to as a modulus of continuity. Given a metric space $(X, d)$, we say that a function $f : X \to \mathbb{R}$ has modulus of continuity $\omega$ if $|f(x) - f(y)| \le \omega(d(x, y))$ for all $x, y \in X$.
Let us recall some common moduli of continuity. A function $f : X \to \mathbb{R}$ is Hölder continuous if it has modulus of continuity $\omega(r) = C\, r^{\alpha} = C\, e^{-\alpha \log \frac{1}{r}}$ for some pair of constants $C < \infty$ and $0 < \alpha \le 1$. When $\alpha = 1$ this corresponds to Lipschitz continuity.
A function $f$ is log-Hölder continuous if it has modulus of continuity $\omega(r) = C \left( \log \frac{1}{r} \right)^{-1}$ for some constant $C < \infty$. Additionally, we define a stronger modulus of continuity than log-Hölder.
Note that when γ = 1 and β = 1, this corresponds to log-Hölder continuity, while as these parameters increase, the modulus of continuity improves.
M. Goldstein and W. Schlag [14, Lemma 10.3] showed that any singular integral operator on a space of functions preserves the modulus of continuity, as long as it is sharp enough. This applies to (γ, β)-log-Hölder continuity with γ > 1, β ≥ 1 (and so to weak-Hölder and Hölder as well) but not to log-Hölder (or to a slightly stronger modulus of continuity).
Since the Thouless formula relates the IDS and the LE via such a singular integral operator (essentially the Hilbert transform), we conclude the following.
However, the mere log-Hölder continuity of the IDS has no implications on the regularity of the LE (which in general may even be discontinuous).

Main results
Consider $\Sigma = \{0, a, b, c\}$ and the following Markov chain. Let $X = \Sigma^{\mathbb{Z}} = \{0, a, b, c\}^{\mathbb{Z}}$, let $T : X \to X$ be the shift map and let $q$ be a probability vector on $\Sigma$. The transition probability matrix $P$ is primitive since $P^5 > 0$ and the vector $q$ is $P$-stationary. Therefore the pair $(P, q)$ determines a unique probability measure $\mathbb{P} = \mathbb{P}_{(P,q)}$ on $X$, and with this measure the map $T : X \to X$ is a mixing Markov shift.
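Primitivity and stationarity are finite checks that can be carried out in exact arithmetic. The Python sketch below does so for a hypothetical chain on $\{0, a, b, c\}$ with transitions $0 \to 0$, $0 \to a$, $a \to b$, $b \to c$, $c \to 0$, $c \to a$, where both branching points use probability $\frac{1}{2}$ and $q$ is uniform. These numerical values are assumptions made here (the transition structure is inferred from the later description of allowable words), so the block illustrates the verification rather than reproducing the matrix of the text.

```python
from fractions import Fraction

STATES = ("0", "a", "b", "c")
H = Fraction(1, 2)

# Assumed column-stochastic matrix: P[i][j] is the probability of moving
# from state j to state i, so that columns sum to 1 and stationarity is q = P q.
P = (
    (H, 0, 0, H),  # to 0: from 0 or from c
    (H, 0, 0, H),  # to a: from 0 or from c
    (0, 1, 0, 0),  # to b: from a
    (0, 0, 1, 0),  # to c: from b
)
Q = (Fraction(1, 4),) * 4  # assumed stationary vector


def matmul(a, b):
    """Product of two square matrices given as nested tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def power(m, n):
    """n-th power of m for n >= 1."""
    result = m
    for _ in range(n - 1):
        result = matmul(result, m)
    return result


def is_positive(m):
    """True if every entry of m is strictly positive."""
    return all(x > 0 for row in m for x in row)


def apply(m, v):
    """Matrix-vector product m v."""
    return tuple(sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m)))


if __name__ == "__main__":
    print("columns sum to 1:", all(sum(P[i][j] for i in range(4)) == 1 for j in range(4)))
    print("q is stationary :", apply(P, Q) == Q)
    print("P^4 > 0:", is_positive(power(P, 4)), " P^5 > 0:", is_positive(power(P, 5)))
```

For this assumed chain the fifth power is the first positive one, since no path of four steps leads from $a$ to $c$.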
Consider now the function υ : Σ → R defined by This function determines the locally constant random potential υ : X → R defined by υ(x) = υ(x 0 ) for all x ∈ X, which in turn determines the family of Schrödinger cocycles depending on the parameter E ∈ R, over the Markov shift above.
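The way a Schrödinger cocycle over this alphabet can reproduce the matrices $C$ and $D$ at energy $E = 0$ may be checked numerically. The Python sketch below assumes the transfer matrix form $\begin{pmatrix} \upsilon - E & -1 \\ 1 & 0 \end{pmatrix}$ and the potential values $\upsilon(0) = 0$, $\upsilon(a) = \upsilon(c) = -e$, $\upsilon(b) = -e^{-1}$. These values are a reconstruction: they are the ones forced by requiring the product over a block 'abc' to equal $\mathrm{diag}(e, e^{-1})$, and they are not quoted from the text.

```python
import math

# Assumed potential values (a reconstruction, see the paragraph above).
POTENTIAL = {"0": 0.0, "a": -math.e, "b": -1.0 / math.e, "c": -math.e}


def schrodinger_matrix(symbol, energy):
    """Transfer matrix [[v - E, -1], [1, 0]] of a symbol at the given energy."""
    return ((POTENTIAL[symbol] - energy, -1.0), (1.0, 0.0))


def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def block_product(word, energy):
    """Return A_E(w_{n-1}) ... A_E(w_1) A_E(w_0) for the finite word w."""
    m = ((1.0, 0.0), (0.0, 1.0))
    for symbol in word:
        m = matmul(schrodinger_matrix(symbol, energy), m)
    return m


if __name__ == "__main__":
    print("block '0'  :", block_product("0", 0.0))
    print("block 'abc':", block_product("abc", 0.0))
```

Under these assumptions the single letter '0' contributes the rotation $C$ and a full block 'abc' contributes $D$, up to rounding.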
Consider the corresponding discrete Schrödinger operator $H_x$, defined as in (2.1) with this potential. The following is the first main result of this paper.
Theorem 1. For any β > 2, the integrated density of states N (E) of the discrete Schrödinger operator corresponding to the random Markov shift defined above is not (1, β)-log-Hölder continuous at E = 0.
Recall that W. Craig and B. Simon [6] established log-Hölder continuity of the IDS in the general setting of ergodic Schrödinger operators. W. Craig [5], J. Pöschel [19] and more recently H. Krüger and Z. Gan [16] constructed examples showing that this result is optimal in the setting of Schrödinger operators with limit periodic potentials. By a result of A. Avila [1], (non periodic) limit periodic potentials can be obtained by sampling a continuous function along the orbits of a minimal translation of a Cantor group. Finally, we were made aware of the work in progress [2] by A. Avila, Y. Last, M. Shamis and Q. Zhou, where log-Hölder is proven to be the optimal modulus of continuity for cocycles over a torus translation.
Theorem 1 shows that the log-Hölder continuity of the IDS obtained by Craig and Simon is nearly optimal at the other end of ergodic behavior, namely for Schrödinger operators with random potentials.
Using the above considerations regarding the transfer of a modulus of continuity via the Thouless formula, we derive the following about the regularity of the LE for random Bernoulli cocycles.

A Probabilistic Lemma
The purpose of this section is to prove the following key lemma.
Lemma 1. Consider the matrices $C$ and $D$ defined in (1.1). There exists $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$, the event that a random i.
$\sqrt{n}$ has probability $> \frac{4}{10}$. This will be proved at the end of this section.
C} has the form (a) $\pm \begin{pmatrix} e^{\kappa} & 0 \\ 0 & e^{-\kappa} \end{pmatrix} = \pm D^{\kappa}$ when the number of factors $A_j = C$ is even, (b) $\pm C D^{\kappa}$ when the number of factors $A_j = C$ is odd.
Proof. $C^2 = -I$, so $C^{2n} = \pm I$ depending on whether $n$ is even or odd, which proves (1). Item (2) follows from the fact that $DCDC = -I$. Let us prove (3). Given a product $A_n \cdots A_1 A_0$, substitute any even length list of consecutive $C$'s in it by a sign $\pm 1$, to get a product of the form: Since an even number of $C$'s was cancelled, the number of factors $A_j = C$ has the same parity as the number of $C$'s in (4.1). Items (1) and (2) imply that $C D^{l} = D^{-l} C$ for all $l \in \mathbb{Z}$.
Making use of these commutation relations combined with the identity $C^2 = -I$, we can transform (4.1) into either $\pm D^{\kappa}$ or $\pm C D^{\kappa}$, for some $\kappa \in \mathbb{Z}$. Since the number of cancelled $C$'s is even, the product (4.1) is equal to $\pm D^{\kappa}$ if and only if the number of $C$'s in it is even. This proves item (3).
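The algebra used in this proof can be tested mechanically. The Python sketch below checks the identities $C^2 = -I$ and $DCDC = -I$, and then verifies on random words that the product is diagonal (of the form $\pm D^{\kappa}$) exactly when the number of factors $C$ is even, and anti-diagonal (of the form $\pm C D^{\kappa}$) otherwise. To keep the arithmetic exact it replaces $e$ by the rational number 2, an assumption made here that does not affect the identities, which hold for any $D = \mathrm{diag}(t, t^{-1})$; all function names are illustrative.

```python
from fractions import Fraction
import random

T = Fraction(2)  # exact stand-in for e (assumption made to avoid rounding)
I2 = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))
C = ((Fraction(0), Fraction(-1)), (Fraction(1), Fraction(0)))
D = ((T, Fraction(0)), (Fraction(0), 1 / T))


def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def product(word):
    """Return A_n ... A_1 A_0 for a word A_0 A_1 ... A_n over the letters 'C' and 'D'."""
    m = I2
    for letter in word:
        m = matmul(C if letter == "C" else D, m)
    return m


def is_power_of_t(x):
    """True if |x| equals T**k for some integer k."""
    x = abs(x)
    if x == 0:
        return False
    while x > 1:
        x /= T
    while x < 1:
        x *= T
    return x == 1


def shape(m):
    """Return 'diagonal' for +-D^k, 'antidiagonal' for +-C D^k, and None otherwise."""
    (p, q), (r, s) = m
    if q == 0 and r == 0 and p * s == 1 and is_power_of_t(p):
        return "diagonal"
    if p == 0 and s == 0 and q * r == -1 and is_power_of_t(r):
        return "antidiagonal"
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        word = "".join(rng.choice("CD") for _ in range(12))
        print(word, word.count("C") % 2, shape(product(word)))
```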
Consider a random i.i.d. process {A n } n≥0 , such that for all n ≥ 0, Define the product process By Proposition 2, the process M n takes value in the union of the following two disjoint classes of matrices.
By the same proposition we have:

Consider the sign valued process $\{\eta_n\}_n$ as well as the real valued process $\{S_n\}_n$ characterized by:

With the notation introduced, from Proposition 3 we get:

where the symbol $\sqcup$ stands for disjoint union. These identities imply that:

From the initial conditions we see by induction in $n$ that $a^{-}(n, i) = 0$ and $a(n, i) = a^{+}(n, i)$ when $n + i$ is even, while $a^{+}(n, i) = 0$ and $a(n, i) = a^{-}(n, i)$ when $n + i$ is odd. From (4.6) and (4.7) we get that:

when $n + i$ is odd, and $a^{-}(n, i) = 0 = a^{+}(n, i + 1)$ otherwise. Therefore, because of these equalities, if $n + i$ is even then:

Similarly, if $n + i$ is odd then:

This establishes identity (4.2). Table 1 presents the calculation of the first five rows of $a(n, i)$.

Table 1. The first five rows of $a(n, i)$, with columns indexed by $i = \dots, -4, -3, \dots, +4, +5, \dots$

The recursive relations (4.3) and (4.2) show that both sequences $\{a(n, 2i) : -n + 1 \le 2i \le n\}$ and $\{a(n, 2i+1) : -n + 1 \le 2i + 1 \le n\}$ have exactly $n$ entries matching the binomial numbers $\{\binom{n-1}{k} : 0 \le k \le n - 1\}$. Formulas (4.4) and (4.5) hold because as $2i$ ranges from $-n + 1$ to $n$ the variable $k = \frac{n-1}{2} + i$ ranges from 0 to $n$, while as $2i + 1$ ranges from $-n + 1$ to $n$ the variable $k = \frac{n}{2} + i$ ranges from 0 to $n$.

Proof of Lemma 1. Consider the event $E_n = [\, \eta_n = +,\ S_n \ge \frac{1}{10} \sqrt{n} \,]$ whose probability we want to estimate, which can be identified with the following set of words if $n$ is even:

We now use the Central Limit Theorem (CLT) to estimate these sums. Consider an i.i.d. process $\{Y_n\}$ where each $Y_n$ is a Bernoulli random variable with probabilities $(\frac{1}{2}, \frac{1}{2})$, that is, $\mathbb{P}(Y_n = 0) = \mathbb{P}(Y_n = 1) = \frac{1}{2}$. All moments of $Y_n$ are equal to $\frac{1}{2}$ and so is its standard deviation $\sigma(Y_n) = \frac{1}{2}$. Next consider the normalized sum process
\[ T_n := \frac{(Y_1 + \cdots + Y_n) - \frac{n}{2}}{\frac{1}{2} \sqrt{n}}. \]
The CLT says that $T_n$ converges in distribution to the standard normal $N(0, 1)$, whose cumulative distribution is given by
\[ F(u) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-t^2/2} \, dt. \]
More precisely this means that for all $u \in \mathbb{R}$, $\lim_{n \to \infty} \mathbb{P}(T_n \le u) = F(u)$.
On the other hand, because the random variables $Y_n$ are Bernoulli, the Berry-Esseen theorem (see [22]) implies that there exists $C < \infty$ such that for all $n \in \mathbb{N}$ and all $u \in \mathbb{R}$, $|\mathbb{P}(T_n \le u) - F(u)| \le \frac{C}{\sqrt{n}}$.
Using this fact, the threshold after which P(E n ) > 0.4 holds can be explicitly computed.
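The size of the Berry-Esseen error for fair Bernoulli variables can be computed exactly, which is what makes such thresholds explicit. The Python sketch below evaluates $\sup_u |\mathbb{P}(T_n \le u) - F(u)|$ for the sum of $n$ fair Bernoulli variables, centred at $\frac{n}{2}$ and scaled by $\frac{1}{2}\sqrt{n}$, and compares it with $C / \sqrt{n}$. The value $C = 0.56$ is used as an assumed admissible constant (the text only asserts that some $C < \infty$ exists), and the function names are illustrative.

```python
import math

ASSUMED_CONSTANT = 0.56  # assumed admissible Berry-Esseen constant, not taken from the text


def normal_cdf(u):
    """Cumulative distribution function F of the standard normal law."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def kolmogorov_distance(n):
    """Exact sup over u of |P(T_n <= u) - F(u)| for n fair Bernoulli variables.

    The distribution function of T_n is a step function, so the supremum is
    attained at a jump point, either at the jump itself or as a left limit.
    """
    cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        u = (k - n / 2) / (math.sqrt(n) / 2)
        phi = normal_cdf(u)
        worst = max(worst, abs(cdf - phi))  # left limit at the jump
        cdf += math.comb(n, k) / 2**n
        worst = max(worst, abs(cdf - phi))  # value at the jump
    return worst


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, kolmogorov_distance(n), ASSUMED_CONSTANT / math.sqrt(n))
```

For even $n$ the largest discrepancy sits at the central jump and is half the probability of the central binomial term, roughly $1 / \sqrt{2 \pi n}$.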

Proof of Theorem 1
B. Halperin gave an example of a random Schrödinger cocycle where the IDS (hence also the LE), as a function of the energy E, cannot be better than Hölder continuous, with some explicitly given Hölder exponent. Our argument follows closely the proof of this result given by B. Simon and M. Taylor in [20].
Lemma 2 (Temple's Inequality). Let $A$ be a self-adjoint operator in some Hilbert space. Assume $\{f_j\}_{j=1}^{k}$ is an orthonormal family such that: for some $\varepsilon > 0$ and $E_0 \in \mathbb{R}$. Then $A$ has at least $k$ eigenvalues (counted with multiplicity) in the range
Proof. Straightforward calculation.

Clearly the set of allowable sequences has full probability.
A word $w \in \Sigma^n$ is called allowable if it is a path in the graph of Figure 1. We denote by $B(n)$ the set of all allowable words $w \in \Sigma^n$. A word $w \in \Sigma^n$ is called admissible if it is allowable and moreover it only contains full 'abc' blocks. For instance the word (00abc0ab) is allowable but not admissible. We denote by $A(n)$ the set of all admissible words $w \in \Sigma^n$. We write $b(n) = \# B(n)$ and $a(n) = \# A(n)$. Given $w \in X = \Sigma^{\mathbb{Z}}$ such that the finite word $(w_0, w_1, \dots, w_{n-1})$ is admissible, we can decompose the iterate $A_0^{(n)}(w)$ (at the energy level $E = 0$) as a product of matrices with factors $C$ and $D$.
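The two notions of word can be made concrete by brute-force enumeration. The Python sketch below assumes that the graph of Figure 1 has the edges $0 \to 0$, $0 \to a$, $a \to b$, $b \to c$, $c \to 0$ and $c \to a$ (an inference from the description of the words, since the figure itself is not reproduced here), lists the allowable words of a given length, and filters the admissible ones, i.e. the concatenations of the letter '0' and the block 'abc'. The function names are illustrative.

```python
# Assumed edge set of the transition graph: EDGES[s] lists the symbols that may follow s.
EDGES = {"0": "0a", "a": "b", "b": "c", "c": "0a"}


def allowable_words(n):
    """All words of length n >= 1 that are paths in the assumed graph."""
    words = list(EDGES)
    for _ in range(n - 1):
        words = [w + t for w in words for t in EDGES[w[-1]]]
    return words


def is_allowable(word):
    """True if every pair of consecutive symbols is an edge of the assumed graph."""
    return all(t in EDGES[s] for s, t in zip(word, word[1:]))


def is_admissible(word):
    """True if the word is a concatenation of letters '0' and complete blocks 'abc'."""
    i = 0
    while i < len(word):
        if word[i] == "0":
            i += 1
        elif word.startswith("abc", i):
            i += 3
        else:
            return False
    return True


def counts(n):
    """Return the pair (a(n), b(n)) by direct enumeration."""
    words = allowable_words(n)
    return sum(map(is_admissible, words)), len(words)


if __name__ == "__main__":
    print(is_allowable("00abc0ab"), is_admissible("00abc0ab"))
    for n in range(1, 9):
        print(n, counts(n))
```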
Consider any $w \in X$ in the cylinder determined by $w^*$. The associated $n$-truncation of the Schrödinger operator, defined in (2.4), is given by:

Let $\psi = (\psi_j)_{j \in \mathbb{Z}} \in \ell^2(\mathbb{Z})$ be the sequence defined recursively by:

Proof. We only prove items (1) and (2). The proof of the third item is similar.
Recall that a(n) and b(n) count, respectively, the admissible and allowable words of length n.
For the second sequence, note that Looking at the first terms of a(n) in Table 2, since a(n) and b(n) satisfy the same recursive relation, we get that b(n) = a(n + 4) for all n ≥ 1.
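The relation $b(n) = a(n+4)$ and the exponential growth rate of these counts can be checked with a short computation. The Python sketch below assumes the recursion $a(n) = a(n-1) + a(n-3)$ with $a(0) = a(1) = a(2) = 1$ (the convention $a(0) = 1$ for the empty word is a choice made here), counts allowable words as paths in the assumed graph with edges $0 \to 0$, $0 \to a$, $a \to b$, $b \to c$, $c \to 0$, $c \to a$, and locates the real root of $-x^3 + x^2 + 1$ by bisection. All names are illustrative.

```python
# Assumed edge set of the transition graph: EDGES[s] lists the symbols that may follow s.
EDGES = {"0": "0a", "a": "b", "b": "c", "c": "0a"}


def admissible_counts(n_max):
    """Return [a(0), ..., a(n_max)] using a(n) = a(n-1) + a(n-3), a(0) = a(1) = a(2) = 1."""
    a = [1, 1, 1]
    for n in range(3, n_max + 1):
        a.append(a[n - 1] + a[n - 3])
    return a[: n_max + 1]


def allowable_count(n):
    """b(n): number of paths with n >= 1 vertices in the assumed graph."""
    ending_at = {s: 1 for s in EDGES}
    for _ in range(n - 1):
        following = {s: 0 for s in EDGES}
        for s, count in ending_at.items():
            for t in EDGES[s]:
                following[t] += count
        ending_at = following
    return sum(ending_at.values())


def real_root(lo=1.0, hi=2.0, steps=60):
    """Bisection for the real root of -x^3 + x^2 + 1, which is decreasing on [1, 2]."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if -mid**3 + mid**2 + 1 > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    a = admissible_counts(70)
    print("b(n) == a(n+4):", all(allowable_count(n) == a[n + 4] for n in range(1, 30)))
    print("real root     :", real_root())
    print("a(61) / a(60) :", a[61] / a[60])
```

The ratio of consecutive terms approaches the real root geometrically fast, because the two remaining roots of the polynomial have modulus smaller than 1.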
Now the characteristic equation of the linear recursive equation for $a(n)$ is the polynomial equation $-x^3 + x^2 + 1 = 0$. This polynomial has 3 roots, the Pisot number $\lambda = 1.46557\ldots$ and two more complex roots $\sigma, \bar{\sigma}$ inside the unit circle. Hence there are constants $c_1 \in \mathbb{R}$ and $c_2 \in \mathbb{C}$ such that $a(n) = c_1 \lambda^n + c_2 \sigma^n + \bar{c}_2 \bar{\sigma}^n$ for all $n \in \mathbb{N}$.
has probability $> \frac{8}{100}$.
Proof. Consider the set $A_{l+1} = A(l + 1)$ of all admissible words in $\Sigma^{l+1}$. By Corollary 7, if $n_0$ is large enough and $l \ge n_0$, $\mathbb{P}(A_{l+1}) > 0.216$.
To finish we now derive the lower bound for the probability of the word set Applying the Law of Total Probabilities we have In the first step we have used that n ≥ l/3. Also, by Lemma 1 we get This concludes the proof.
as the j-th block of the word w.
Moreover, the block of length 2l + 1, , obtained by removing the first and last symbols from the j-th block of w, is called the inner j-th block of w. Proof. Given a word w ∈ X, for each 1 ≤ j ≤ m such that the inner j-th block of w lies in C l we take the quasi-eigenfunction of Proposition 5 and shift it to become supported on I • j . Let f 1 , f 2 , . . . , f n l,m (w) ∈ l 2 (Z) be the list of functions thus obtained. Since each f i vanishes outside some I • j i , by Proposition 5 the truncated function P L f i satisfies H w . By construction P L f i vanishes at the endpoints of the block I j i . It follows that H (L) w (P L f i ) is also supported on the block I j i . Because these blocks are pairwise disjoint, assumptions (2) of Lemma 2 are automatically satisfied. The conclusion follows then by Temple's inequality.
Hence by Lemma 4:

Given $\varepsilon > 0$, consider the $(2 + \varepsilon)$-log-Hölder modulus of continuity:

For sufficiently large $l$, by Lemma 4 one has $K_l \approx l^{1/2}$, so:

which means that the IDS is not $(2 + \varepsilon)$-log-Hölder continuous.

Proof of Theorem 2
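Recall that $\Sigma = \{0, a, b, c\}$, $X = \Sigma^{\mathbb{Z}}$ and $T : X \to X$ denotes the two-sided shift. Let $A = C(0) \cup C(a)$ be the union of cylinders determined by the one letter words '0' and 'a'. Let $N : A \to \mathbb{N}$ be the first return time to $A$ and $T_A : A \to A$ be the induced (first return) map on $A$. The function $N : A \to \mathbb{N}$ takes two values, namely $N(x) = 1$ if $x_0 = 0$ and $N(x) = 3$ if $x_0 = a$, and hence the induced map on $A$ is given by
\[ T_A x = \begin{cases} T x & \text{if } x_0 = 0, \\ T^3 x & \text{if } x_0 = a. \end{cases} \]
The family of Schrödinger cocycles $A_E : X \to \mathrm{SL}_2(\mathbb{R})$ also induces a family of cocycles $\tilde{A}_E$ over $T_A$.

Consider now the sum process $S_n : A \to \mathbb{N}$, defined by $S_n(x) := \sum_{j=0}^{n-1} N(T_A^j x)$. By the ergodicity of $(T, \mathbb{P})$ and $(T_A, \mathbb{P}_A)$, for $\mathbb{P}$-almost every $x \in A$:

This proves the proposition.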
Consider the map $h : A \to \{0, 1\}^{\mathbb{Z}}$ that to each admissible sequence $x \in A$ associates the sequence $y \in \{0, 1\}^{\mathbb{Z}}$ obtained from $x$ by replacing each block 'abc' by the single letter '1'. This map conjugates the return map $T_A : A \to A$ with the full Bernoulli shift $T : \{0, 1\}^{\mathbb{Z}} \to \{0, 1\}^{\mathbb{Z}}$. It also determines a conjugation between the family of cocycles $\tilde{A}_E$ over $T_A : A \to A$ and the family of random Bernoulli cocycles $\hat{A}_E = (C(E), D(E))$ over the full shift $T : \{0, 1\}^{\mathbb{Z}} \to \{0, 1\}^{\mathbb{Z}}$ defined by the following matrices:

Therefore, by Theorem 1 and Proposition 1 the function $E \mapsto L(\hat{A}_E)$ is not $(\gamma, \beta)$-log-Hölder continuous at $E = 0$ for any $\gamma > 1$. This proves Theorem 2.

Irreducible cocycles
In this section we consider random $\mathrm{SL}_2$-cocycles over a finite Bernoulli shift. Let $\Sigma = \{1, \dots, s\}$ be a finite alphabet and fix some Bernoulli measure $\mathbb{P}_q = q^{\mathbb{Z}}$, where $q$ is a probability vector on $\Sigma$. Let $T : X \to X$ denote the two-sided shift on the space of sequences $X = \Sigma^{\mathbb{Z}}$ endowed with the probability measure $\mathbb{P}_q$.
Recall that a random cocycle over the Bernoulli shift $T$ is defined by a locally constant measurable function $A : X \to \mathrm{SL}_2(\mathbb{R})$, i.e., a function which depends only on the 0-th coordinate. This implies that $A$ is determined by a function $A : \Sigma \to \mathrm{SL}_2(\mathbb{R})$, or, in other words, by a list of $s$ matrices $A_1, \dots, A_s \in \mathrm{SL}_2(\mathbb{R})$.
Definition 7.1. A random cocycle $A : \Sigma \to \mathrm{SL}_2(\mathbb{R})$ is said to be irreducible if there is no point $L \in \mathbb{P}(\mathbb{R}^2)$ such that $A(x) L = L$ for all $x \in \Sigma$.
Definition 7.2. A random cocycle $A : \Sigma \to \mathrm{SL}_2(\mathbb{R})$ is said to be strongly irreducible if there is no finite subset $L \subset \mathbb{P}(\mathbb{R}^2)$, $L \neq \emptyset$, such that $A(x) L = L$ for all $x \in \Sigma$.
Clearly, strongly irreducible cocycles are also irreducible. Irreducible cocycles which are not strongly irreducible will be referred to as simply irreducible cocycles.
The following statement is a classical theorem of H. Furstenberg [13].
Proof. By Theorem 3, the sub-semigroup $S \subset \mathrm{SL}_2(\mathbb{R})$ generated by the matrices $A_1, \dots, A_s$ of the cocycle $A$ must be compact. This implies that the group generated by $S$ is also compact, and that all matrices $A_j$ are orthogonal w.r.t. some inner product. Denoting by $\|\cdot\|'$ the corresponding operator norm (on the space of matrices) one has $\|A_j\|' = 1$ for all $j = 1, \dots, s$. Note that $\mathbb{E}_q[\log \|A\|'] = 0$ for the cocycle $A$. Hence for any constant $C < \infty$ such that $\|M\|' \le C\, \|M\|$ for all $M \in \mathrm{Mat}_2(\mathbb{R})$, where $\|\cdot\|$ stands for the canonical operator norm on $\mathrm{Mat}_2(\mathbb{R})$.
Remark 7.2. Proposition 11 provides a modulus of Lipschitz continuity at A, but it does not imply that the LE is always Lipschitz in a neighborhood of A.
We conclude this paper with the following question. Given a random $\mathrm{SL}_2(\mathbb{R})$-cocycle under the assumptions of Proposition 11, is the LE always uniformly Hölder continuous in a neighborhood of that cocycle?