A functional CLT for nonconventional polynomial arrays

In this paper we prove a functional central limit theorem for "nonconventional" sums indexed by polynomial arrays.


Introduction
Since the ergodic theory proof of Szemerédi's theorem due to Furstenberg (see [13]), ergodic theorems for "nonconventional" averages $\frac{1}{N}\sum_{n=1}^{N} T^{q_1(n)} f_1 \cdots T^{q_\ell(n)} f_\ell$ have become a well-established field of research (the term "nonconventional" comes from [14]). Here $T$ is a measure preserving transformation, $f_1, \dots, f_\ell$ are bounded measurable functions and the $q_i$'s are functions taking positive integer values on the set of positive integers. General polynomial $q_i$'s in this setup were first considered in [3]. Taking the $f_i$'s to be indicators of measurable sets, one obtains asymptotic results on the numbers of multiple recurrences, which was the original motivation for this study.
From a probabilistic point of view, ergodic theorems are laws of large numbers, and so the question about other probabilistic limit theorems is natural. In [15] and [17] central limit theorems for random functions of the form $F(\xi_{q_1(n)}, \xi_{q_2(n)}, \dots, \xi_{q_\ell(n)})$ (1.1) were obtained. Here $F = F(x_1, \dots, x_\ell)$ is a locally Hölder continuous function, $\{\xi_n\}$ is a sequence of random variables satisfying some mixing and moment conditions and the $q_i$'s are functions satisfying certain growth conditions, which take positive integer values on the set of positive integers. Considering polynomial $q_i$'s, the growth conditions from [15] and [17] exclude the case when some of the non-linear polynomials among $q_1, \dots, q_\ell$ have the same degree. In [8] we extended the above functional CLT's to the case when all of the $q_i$'s are polynomials, with no restrictions on their degrees. In this paper we will consider more general random functions of the form $F(\xi_{q_1(n,N)}, \xi_{q_2(n,N)}, \dots, \xi_{q_\ell(n,N)})$ where each $q_i$ is a bivariate polynomial with integer coefficients. The motivation for considering this case, where the $q_i$'s depend also on $N$, comes from applications to multiple recurrence (see the last two paragraphs of this section), and a relatively simple but still very interesting combinatorial number theoretic example can be described as follows. For each point $\omega \in [0, 1)$ consider its base $m$ or continued fraction expansion with digits $\xi_k(\omega)$, $k = 1, 2, \dots$. Next, count the number $S_N(\omega)$ of those $\ell$-tuples $q_1(n, N), \dots, q_\ell(n, N)$, $n \le N$, for which, say, $\xi_{q_j(n,N)}(\omega) = a_j$, $j = 1, \dots, \ell$, for some fixed integers $a_1, \dots, a_\ell$. We have $S_N(\omega) = \sum_{n=1}^{N} \prod_{j=1}^{\ell} \delta_{a_j \xi_{q_j(n,N)}(\omega)}$ where $\delta_{ks} = 1$ if $k = s$ and $\delta_{ks} = 0$ otherwise. To make the $\xi_k$'s random variables, we supply the unit interval with an appropriate probability measure, such as the Lebesgue measure for base $m$ expansions and the Gauss measure for continued fraction expansions (see Section 8).
Now we can view S N as a random variable and set S N (t) = S [N t] .
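The counting example above can be made concrete with a minimal sketch, assuming base $m$ digits and exact rational arithmetic. The function names `base_m_digits` and `count_tuples`, the two polynomials and the point $\omega = 1/3$ are hypothetical choices made only for illustration; they do not come from the paper.

```python
from fractions import Fraction

def base_m_digits(omega, m, count):
    """Return the first `count` base-m digits xi_1, ..., xi_count of omega in [0, 1)."""
    digits = []
    x = Fraction(omega)
    for _ in range(count):
        x *= m
        d = int(x)            # the integer part is the next digit
        digits.append(d)
        x -= d
    return digits

def count_tuples(omega, m, polys, targets, N):
    """S_N(omega): number of n <= N with xi_{q_j(n, N)}(omega) == a_j for every j."""
    max_index = max(q(n, N) for q in polys for n in range(1, N + 1))
    xi = base_m_digits(omega, m, max_index)
    return sum(
        all(xi[q(n, N) - 1] == a for q, a in zip(polys, targets))
        for n in range(1, N + 1)
    )

# Hypothetical example: q_1(n, N) = n + N, q_2(n, N) = n * N, binary digits of 1/3.
polys = [lambda n, N: n + N, lambda n, N: n * N]
s = count_tuples(Fraction(1, 3), 2, polys, targets=[0, 1], N=6)
```

Since $1/3 = 0.010101\dots$ in base 2, the digit $\xi_k$ is 0 for odd $k$ and 1 for even $k$, so with $N = 6$ exactly the odd $n$ are counted.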
In [18] several $L^2$ ergodic theorems were proved for the averages $N^{-\frac12} S_N(1)$, when $F(x_1, \dots, x_\ell)$ has the form $F(x_1, \dots, x_\ell) = \prod_{i=1}^{\ell} f_i(x_i)$. In [16] a strong law of large numbers (SLLN) was proved for $N^{-\frac12} S_N(1)$ when the $q_i$'s depend only on $n$, while in Chapter 3 of [10], under certain mixing conditions, we proved an SLLN and a CLT for $S_N(1)$ for linear functions $q_i(n, N) = a_i n + b_i N$. The above results from [16] and [10] hold true for functions $F$ which do not necessarily have the above product form. In this paper we will prove an SLLN for general polynomials $q_i(n, N)$ and a functional CLT for certain classes of polynomials $q_i(n, N)$, and our results generalize both [8] and the above results from Chapter 3 of [10]. A crucial step in proving a functional central limit theorem (regardless of the method of proof) is to show that the asymptotic covariances exist. Using some mixing conditions, we will show that these limits exist for several classes of bivariate polynomials $q_i(n, N)$. One of the main difficulties arising here is to understand the asymptotic behaviour as $N \to \infty$ of certain sequences of sets $A_N \subset [1, N]$ which are related to approximation properties of bivariate polynomial differences of the form $|q(m, N) - p(n, N)|$. This type of behaviour is investigated independently in Section 3. In Section 6.4 we address the question of the positivity of $D^2 = \lim_{N \to \infty} \mathrm{Var}(S_N(1))$, which is important since $S_N$ converges towards the process which equals identically 0 when $D^2 = 0$. When any two non-linear polynomials q i and q j do not satisfy that q i (n, N ) − q j (cn + r, N ) = z, for some rational c, r, z, c = 1, and all n and N , or when the polynomials $q_i(n, N)$ are ordered so that $q_1(n, N) \le q_2(n, N) \le \dots \le q_\ell(n, N)$ for any sufficiently large $n$ and $N$, we give a complete characterization of the positivity of $D^2$. In more general circumstances we provide sufficient conditions for this positivity.
The CLT's from [8] and Chapter 3 of [10] rely on classical martingale approximation techniques, which were shown in [17] to be effective in the nonconventional setup. In the past decades Stein's method has become one of the main tools for proving central limit theorems. In [9] this method was applied successfully to nonconventional sums of the form (1.1), and in Chapter 1 of [10] we generalized [9], and, in particular, showed that Stein's method yields a functional CLT in the case when the $q_i$'s depend only on $n$. Our proof of the CLT for the random functions $S_N$ defined above will also rely on an appropriate functional version of Stein's method. When the appropriate limiting covariances exist, Stein's method in the functional setup can be applied successfully to random functions of the form $\sum_{n=1}^{[Nt]} X_{n,N}$ when the triangular array $\{X_{n,N} : 1 \le n \le N\}$ satisfies a certain type of strong local dependence condition, and our arguments will be based on showing that the summands in $S_N$ have such a local dependence structure. In fact, we will use this structure also to control the growth rate of the first four moments of $S_N(1)$, which is the key to the proof of the SLLN for $N^{-\frac12} S_N(1)$. Our results hold true when, for instance, $\xi_n = T^n f$ where $f = (f_1, \dots, f_d)$ and $T$ is a topologically mixing subshift of finite type, a hyperbolic diffeomorphism or an expanding transformation taken with a Gibbs invariant measure, as well as in the case when $\xi_n = f(\Upsilon_n)$, $f = (f_1, \dots, f_d)$, where $\Upsilon_n$ is a Markov chain satisfying the Doeblin condition, considered as a stationary process with respect to its invariant measure. In the dynamical systems case each $f_i$ should be either Hölder continuous or piecewise constant on elements of Markov partitions. We can also extend our results to certain classes of dynamical systems $T$ which can be modelled by a Young tower (even though the conditions specified in Section 2 do not seem to hold true, see Section 8.3).
As a consequence, our results hold true for a variety of nonuniformly hyperbolic or distance expanding dynamical systems T , as well. We refer the readers to Section 8 for a detailed description of the sequences {ξ n } mentioned above.
As an application we can consider F (x 1 , ..., x ℓ ) = x ℓ , x j = (x (1) j , ..., x (ℓ) j ), ξ n = (X 1 (n), ..., X ℓ (n)), X j (n) = I Aj (T n x) in the dynamical systems case and X j (n) = I Aj (Υ n ) in the Markov chain case where I A is the indicator of a set A.
Let M (N ) be the number of l's between 0 and N for which T qj (l,N ) x ∈ A j for j = 0, 1, ..., ℓ (or Υ qj (l,N ) ∈ A j in the Markov chains case), where we set q 0 = 0, namely the number of ℓ−tuples of return times to A j 's (either by T qj(l,N ) or by Υ qj (l,N ) ). Then our results yield a functional central limit theorem for the number M ([N t]) and also an SLLN for M (N ).
Acknowledgement. I would like to thank Professor Yuri Kifer for suggesting the problem to me and for several helpful discussions.
2. Limit theorems for nonconventional polynomial arrays
2.1. Preliminaries. Our setup consists of a $\wp$-dimensional stochastic process $\{\xi_n, n \ge 0\}$ on a probability space $(\Omega, \mathcal{F}, P)$ and a family of sub-$\sigma$-algebras $\mathcal{F}_{k,l}$, $-\infty \le k \le l \le \infty$ such that $\mathcal{F}_{k,l} \subset \mathcal{F}_{k',l'} \subset \mathcal{F}$ if $k' \le k$ and $l' \ge l$. We will impose restrictions on the mixing coefficients $\phi(n) = \sup\{\phi(\mathcal{F}_{-\infty,k}, \mathcal{F}_{k+n,\infty}) : k \in \mathbb{Z}\}$ (2.1) where for any two sub-$\sigma$-algebras $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$, $\phi(\mathcal{G}, \mathcal{H}) = \sup\left\{\left|\frac{P(\Gamma \cap \Delta)}{P(\Gamma)} - P(\Delta)\right| : \Gamma \in \mathcal{G},\ \Delta \in \mathcal{H},\ P(\Gamma) > 0\right\}$. (2.2) The quantity $\phi(\mathcal{G}, \mathcal{H})$ measures the dependence between the $\sigma$-algebras $\mathcal{G}$ and $\mathcal{H}$; it vanishes if and only if $\mathcal{G}$ and $\mathcal{H}$ are independent. In order to allow some applications, in particular to dynamical systems, we will not assume that $\xi_n$ is measurable with respect to $\mathcal{F}_{n,n}$ but instead impose restrictions on the approximation rate $\beta_q(r) = \sup_{k \ge 0} \|\xi_k - E[\xi_k | \mathcal{F}_{k-r,k+r}]\|_{L^q}$. (2.3) Our results will be obtained under the assumption that $\beta_q(n)$ and $\phi(n)$ converge sufficiently fast to 0 (see (2.8) and Assumption 2.1.3 below), for a certain choice of $q$'s. We note that when the sequence $\{\xi_n, n \ge 0\}$ itself satisfies some strong quantitative mixing conditions (e.g. when it forms a geometrically ergodic Markov chain or when $\xi_n$ depends only on the $n$-th coordinate of a topologically mixing subshift of finite type, see Section 8), then our results hold true when $\mathcal{F}_{k,l}$ is the $\sigma$-algebra generated by $\xi_{\max(0,k)}, \dots, \xi_{\max(0,l)}$. In this case we have $\beta_q(r) = 0$ for any $q$ and $r$. When $\xi_n = f(T^n x)$ for several types of dynamical systems $T$ and functions $f$, the sequence $\{\xi_n, n \ge 0\}$ will only be mixing in an ergodic theoretic sense, but when the function $f$ is Hölder continuous we will have that $\beta_\infty(r) \le c\delta^r$ for some $c > 0$ and $\delta \in (0, 1)$ (see Section 8), where $\beta_\infty(r)$ is the approximation rate corresponding to some families of $\sigma$-algebras $\mathcal{F}_{k,l}$ for which $\phi(n) \le c\delta^n$.
For instance, when $T$ is a topologically mixing subshift of finite type then we can take $\mathcal{F}_{k,l}$ to be the $\sigma$-algebra generated by the projections of the coordinates indexed by $\max(k, 0), \dots, \max(l, 0)$, while in the case when $T$ is some hyperbolic map we can take $\mathcal{F}_{k,l} = \bigvee_{j=k}^{l} T^{-j}\mathcal{M}$, where $\mathcal{M}$ is a Markov partition with sufficiently small diameter (see Section 8).
Next, we will discuss our stationarity assumptions. We do not require the usual stationarity of the process $\{\xi_n, n \ge 0\}$, and instead we only assume that the distribution of $\xi_n$ does not depend on $n$ and that the joint distribution of $(\xi_n, \xi_m)$ depends only on $n - m$, which we write for further reference as $\xi_n \overset{d}{\sim} \mu$ and $(\xi_n, \xi_m) \overset{d}{\sim} \mu_{m-n}$ (2.4) where $Y \overset{d}{\sim} \mu$ means that $Y$ has $\mu$ for its distribution. Let $F = F(x_1, \dots, x_\ell)$, $x_j \in \mathbb{R}^\wp$ be a function on $(\mathbb{R}^\wp)^\ell$ which satisfies the following smoothness and growth conditions: for some $K > 0$, $\iota \ge 0$, $\kappa \in (0, 1]$ and all and where $x = (x_1, \dots, x_\ell)$ and $z = (z_1, \dots, z_\ell)$. We note that the role of (2.5) is to ensure that we can approximate $F(\xi_{n_1}, \xi_{n_2}, \dots, \xi_{n_\ell})$ by $F(\xi_{n_1,r}, \xi_{n_2,r}, \dots, \xi_{n_\ell,r})$ where $n_1, \dots, n_\ell$ are arbitrary nonnegative integers, $\xi_{n,r} = E[\xi_n | \mathcal{F}_{n-r,n+r}]$ and $r$ is sufficiently large, while the role of (2.6) is to ensure that $F(\xi_{n_1}, \xi_{n_2}, \dots, \xi_{n_\ell})$ will satisfy some moment conditions (see the discussion after Assumption 2.1.1). Such an approximation is possible, of course, only when $\lim_{r \to \infty} \beta_q(r) = 0$ (for some $q$) and the $\xi_i$'s satisfy certain moment conditions (such conditions are imposed in Assumptions 2.1.1 and 2.1.3 below). We also remark that in the case when $\xi_n$ is measurable with respect to $\mathcal{F}_{n,n}$ our results will follow for any Borel function $F$ satisfying (2.6) without imposing (2.5), since then our proofs do not require any approximations of the above form. Our moment conditions on the $\xi_n$'s are summarized in the following 2.1.1. Assumption. There exist $w > 4$, $q \ge 1$ and $v > 0$ such that where $\iota$ and $\kappa$ come from (2.5) and (2.6).
We note that Assumption 2.1.1 ensures that $\|F(\xi_{n_1}, \xi_{n_2}, \dots, \xi_{n_\ell})\|_{L^w} \le C$ for some $C > 0$ and all nonnegative integers $n_1, \dots, n_\ell$. When $\phi(n)$ and $\beta^{\kappa}_{q_0}(n)$ (for some $q_0$) converge exponentially fast to 0 as $n \to \infty$ then in order to get the strong law of large numbers (Theorem 2.1.2) and the functional central limit theorem (Theorem 2.3.3) we can take any $w > 4$, $q = q_0$ and any $v > 0$ so that $\frac{1}{w} > \frac{\iota}{v} + \frac{\kappa}{q_0}$. We also remark that we have presented the moment conditions with some dependence on the regularity of $F$ (i.e. on $\kappa$ and $\iota$), but, of course, we could have first assumed that the properties described in Assumption 2.1.1 hold true with some $w$, $q$, $v$, $\kappa \in (0, 1]$ and $\iota \ge 0$, and then considered only functions satisfying (2.5) and (2.6) with these $\kappa$ and $\iota$.
Next, let $q_1(n, N), \dots, q_\ell(n, N)$ be polynomials with nonnegative integer coefficients which do not depend only on $N$. We assume here, for the sake of convenience, that $\deg q_i \le \deg q_{i+1}$ for any $i = 1, 2, \dots, \ell - 1$, where the degree of a bivariate polynomial $p(x, y)$ is the degree of the univariate polynomial $p(x, x)$, and that the differences $q_i - q_j$ are not constants (the case of constant difference can be treated as in Section 3 of [8]). For each $N$ set $S_N = \sum_{n=1}^{N} F(\xi_{q_1(n,N)}, \xi_{q_2(n,N)}, \dots, \xi_{q_\ell(n,N)})$.
The first result we will prove in this paper is the following strong law of large numbers: 2.1.2. Theorem. Suppose that Assumption 2.1.1 holds true with numbers $w$ and $q$ so that where $\kappa$ comes from (2.5). Then, $P$-almost surely we have The reason that the mixing and approximation coefficients $\phi(l)$ and $\beta_q(l)$ appear additively in (2.8) is that in the proof of Theorem 2.1.2 we approximate each summand $F(\xi_{q_1(n,N)}, \xi_{q_2(n,N)}, \dots, \xi_{q_\ell(n,N)})$ by $F(\xi_{q_1(n,N),r(n)}, \xi_{q_2(n,N),r(n)}, \dots, \xi_{q_\ell(n,N),r(n)})$, where $r(n)$ is some number which depends on $n$ and $\xi_{m,r} = E[\xi_m | \mathcal{F}_{m-r,m+r}]$ for any $m$ and $r \ge 0$ (of course, we will require that $\lim_{n \to \infty} r(n) = \infty$ at a certain rate).
The main result in this section is a functional central limit theorem for the sequence of random functions $S_N : [0, 1] \to \mathbb{R}$, $N \in \mathbb{N}$ given by To simplify formulas we assume the (asymptotic) centering condition which is not really a restriction since we can always replace $F$ by $F - \bar{F}$. We note that $\bar{F}$ is not the expectation of $F(\xi_{q_1(n,N)}, \xi_{q_2(n,N)}, \dots, \xi_{q_\ell(n,N)})$, but the arguments in our proofs show that, in the circumstances of Theorem 2.1.2, the expectation of $\frac{1}{N} S_N$ converges to $\bar{F}$ as $N \to \infty$ (this is what is expected to happen, in view of Theorem 2.1.2).
Our main mixing and approximation condition for the central limit theorem is the following 2.1.3. Assumption. There exist $d \ge 1$ and $\theta > 2$ such that for any $n \in \mathbb{N}$, In contrast to the proof of Theorem 2.1.2, in the proof of the CLT we will approximate each summand $F(\xi_{q_1(n,N)}, \xi_{q_2(n,N)}, \dots, \xi_{q_\ell(n,N)})$ by $F(\xi_{q_1(n,N),r(N)}, \xi_{q_2(n,N),r(N)}, \dots, \xi_{q_\ell(n,N),r(N)})$, where $r(N)$ depends on $N$. This is the reason that $\phi(n)$ and $\beta_q(n)$ do not appear additively in (2.10).

2.2. Classes of polynomials. We describe here several classes of polynomials for which we can derive the weak invariance principle for the random functions $S_N(\cdot)$.
First, we assume here that the linear polynomials among the $q_i$'s have the form $q_i(n, N) = a_i n + b_i N$ (2.11) for some integers $a_i$ and $b_i$, namely that $q_i(0, 0) = 0$. Our additional requirements from the linear polynomials are described in the following 2.2.1. Assumption. For any linear $q_i$ and $q_j$ the difference $a_i - a_j$ is divisible by the greatest common divisor of $b_i$ and $b_j$, where the $a_i$'s and $b_i$'s are the same as in (2.11).
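Assumption 2.2.1 is a finite divisibility check on the coefficients of the linear polynomials, and can be sketched as follows. The function name and the treatment of the edge case $\gcd(0, 0) = 0$ are assumptions made for this illustration only.

```python
from math import gcd
from itertools import combinations

def satisfies_assumption_2_2_1(linear_polys):
    """linear_polys: list of pairs (a_i, b_i) encoding q_i(n, N) = a_i*n + b_i*N.

    Returns True when a_i - a_j is divisible by gcd(b_i, b_j) for every pair.
    """
    for (a_i, b_i), (a_j, b_j) in combinations(linear_polys, 2):
        g = gcd(b_i, b_j)
        if g == 0:
            # gcd(0, 0) = 0; assumed convention: only 0 is divisible by 0
            if a_i != a_j:
                return False
        elif (a_i - a_j) % g != 0:
            return False
    return True
```

For instance, the pair $n + 2N$, $3n + 4N$ passes the check, since $\gcd(2, 4) = 2$ divides $1 - 3$, while the pair $n + 2N$, $2n + 4N$ does not.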
Next, in order to describe our conditions regarding the non-linear polynomials among the $q_i$'s, we need the following definitions. Let $q(n, N)$ and $p(n, N)$ be two bivariate polynomials with nonnegative integer coefficients. We will say that $q$ and $p$ have exploding differences if for any $\delta \in (0, 1)$ there exist constants $C_\delta > 0$ and $N_\delta$ and sets $\Gamma_{N,\delta} \subset [1, N]$, whose cardinality does not exceed $\delta N$, so that for any It is clear that any two polynomials $q$ and $p$ with different degrees have exploding differences and that two linear polynomials do not have exploding differences. In Section 3 we will give several classes of examples of polynomials $q$ and $p$ with the same non-linear degree which have exploding differences. Next, for any $1 \le i \le \ell$ such that $\deg q_i = k > 1$ write $q_i(n, N) = \sum_{u=0}^{k} N^u Q_{i,u}(y)$ where $y = n/N$ and each $Q_{i,u}$ is a polynomial with non-negative integer coefficients whose degree does not exceed $u$. For any distinct $1 \le i, j \le \ell$ such that $\deg q_i = \deg q_j = k > 1$, we will say that $q_i$ and $q_j$ are linearly related if $Q_{i,k}$ and $Q_{j,k}$ are not constants and there exist constants $c_{i,j}, r_{i,j} \in \mathbb{R}$, $c_{i,j} > 0$ so that $Q_{j,k}(c_{i,j} y) = Q_{i,k}(y)$ and $Q_{i,k-1}(y) - Q_{j,k-1}(c_{i,j} y) = r_{i,j} Q'_{j,k}(c_{i,j} y)$ for any $y \in [0, 1]$. Then, any two polynomials $q_i$ and $q_j$ which do not depend on $N$ and have the same non-linear degree $k$ are linearly related. Indeed, in this case we have $Q_{i,u}(y) = a_{i,u} y^u$ and $Q_{j,u}(y) = a_{j,u} y^u$ for some integers $a_{i,u}$ and $a_{j,u}$ so that $a_{i,k}, a_{j,k} > 0$, and so we can take $c_{i,j} = (a_{i,k}/a_{j,k})^{1/k}$ and $r_{i,j} = \frac{a_{i,k-1} - a_{j,k-1} c_{i,j}^{k-1}}{k\, a_{j,k}\, c_{i,j}^{k-1}}$. This means that all the results obtained in this paper generalize the results from [8], in which a nonconventional polynomial CLT was obtained in the case when all the $q_i$'s are polynomial functions of the variable $n$.
Observe also that the linear relation condition involves only the Q i,k 's and Q i,k−1 's and note that q i and q j (with the same non-linear degree k) are linearly related if and q j (n, N ) = α j n s N k−s + β j n s−1 N k−s + G j (n, N ) for some 0 < s ≤ k, polynomials G i and G j whose degree does not exceed k − 2 and positive integers α i , α j , β i and β j . We refer the readers to Corollary 3.3.1 for a characterization when two linearly related polynomials have exploding differences (see also Remark 2.3.2 below).
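For polynomials which depend only on $n$, the two defining identities of a linear relation can be solved explicitly by comparing coefficients of $y^k$ and $y^{k-1}$. The sketch below does this numerically; the function name and the coefficient-list encoding are hypothetical, and the closed formulas in the code are obtained from the identities $Q_{j,k}(cy) = Q_{i,k}(y)$ and $Q_{i,k-1}(y) - Q_{j,k-1}(cy) = r\,Q'_{j,k}(cy)$ with $Q_{i,u}(y) = a_{i,u}y^u$.

```python
def linear_relation_constants(q_i, q_j):
    """q_i, q_j: coefficient lists [a_0, a_1, ..., a_k] of univariate polynomials in n
    of the same degree k > 1 with positive leading coefficients.

    Returns (c, r) with a_{j,k} c^k = a_{i,k} and
    a_{i,k-1} - a_{j,k-1} c^(k-1) = r * k * a_{j,k} * c^(k-1).
    """
    k = len(q_i) - 1
    assert k > 1 and len(q_j) - 1 == k
    c = (q_i[k] / q_j[k]) ** (1.0 / k)
    r = (q_i[k - 1] - q_j[k - 1] * c ** (k - 1)) / (k * q_j[k] * c ** (k - 1))
    return c, r

# Hypothetical example: q_i(n) = 4n^2 + 4n + 1 = (2n + 1)^2 and q_j(n) = n^2.
c, r = linear_relation_constants([1, 4, 4], [0, 0, 1])
```

In this example $c = 2$ and $r = 1$, which matches the identity $q_i(n) = q_j(2n + 1)$.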
We will obtain our results under the following 2.2.2. Assumption. Any two non-linear polynomials $q_i$ and $q_j$ are either linearly related, or the differences of $q_i$ and $q_j$ explode.
exist, where $0 \le t, s \le 1$. In particular, the limit Note that in Section 6 we will also provide several formulas for the limits $b(t, s)$, as well as some conditions for the positivity of $D^2$ (when $D^2 = 0$ then $S_N$ converges to the process which equals 0 identically). We refer the readers to Section 6.3 for a discussion about the existence of $b(t, s)$ (or just $D^2$) for (more general) polynomials $q_i$ satisfying certain number theory related conditions.

2.3.2. Remark. The property of being linearly related is, in fact, an equivalence relation. Indeed, if both pairs $(q_i, q_j)$ and $(q_j, q_l)$ are linearly related then we can always take $c_{i,l} = c_{i,j} \cdot c_{j,l}$ and $r_{i,l} = r_{j,l} + r_{i,j} \cdot c_{j,l}$.
We will say that the polynomials $q_i$ and $q_j$ are Q-equivalent if there exist rational $c$ and $r$ so that the difference $q_i(n, N) - q_j(cn + r, N)$ does not depend on $n$ and $N$. Then any two Q-equivalent polynomials are linearly related. In Corollary 3.3.1 we will show that any two linearly related polynomials which are not Q-equivalent have exploding differences. Therefore Assumption 2.2.2 means that any two non-linear polynomials among the $q_i$'s are either Q-equivalent or have exploding differences, and under Assumption 2.2.2 having exploding differences is a symmetric relation.
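The composition rule for the constants in Remark 2.3.2 can be checked on a simple case of polynomials in $n$ only, where $q_i(n) = q_j(c_{i,j} n + r_{i,j})$ and $q_j(n) = q_l(c_{j,l} n + r_{j,l})$. The following sketch uses hypothetical polynomials and a hypothetical helper `compose`; it only illustrates the rule and is not part of the proofs.

```python
def compose(rel_ij, rel_jl):
    """Combine (c_ij, r_ij) and (c_jl, r_jl) into (c_il, r_il):
    c_il = c_ij * c_jl and r_il = r_jl + r_ij * c_jl."""
    c_ij, r_ij = rel_ij
    c_jl, r_jl = rel_jl
    return c_ij * c_jl, r_jl + r_ij * c_jl

def q_l(n):
    return n ** 2

def q_j(n):
    return q_l(3 * n + 1)      # (c_jl, r_jl) = (3, 1)

def q_i(n):
    return q_j(2 * n + 1)      # (c_ij, r_ij) = (2, 1)

c_il, r_il = compose((2, 1), (3, 1))
```

Here the composed constants are $c_{i,l} = 6$ and $r_{i,l} = 4$, in agreement with $q_i(n) = (6n + 4)^2 = q_l(6n + 4)$.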
When the asymptotic covariances $b(t, s)$ exist then, using a functional version of Stein's method due to A.D. Barbour, we derive the following 2.3.3. Theorem. Suppose that Assumptions 2.1.1 and 2.1.3 are satisfied with numbers $w$ and $\theta$ so that $\theta > \frac{4w}{w-2}$. Assume, in addition, that the limiting covariances $b(t, s)$ exist. Then, the random functions $S_N : [0, 1] \to \mathbb{R}$ converge in distribution as $N \to \infty$ towards a centered Gaussian process $\eta(t)$ whose covariances are given by $E[\eta(t)\eta(s)] = b(t, s)$.
The arguments in the proof of Theorem 2.3.3 together with the arguments in Chapter 1 of [10] show that Stein's method also yields almost optimal convergence rate in the CLT for the sequence of random variables S N = S N (1), when D 2 > 0. These results are not included here in order not to overload this paper.
Fix some $N$. Since the $q_i$'s are polynomials, each $F_{n,N}$ can depend on at most $d_0 \ell^2$ of the random variables $F_{m,N}$, $m \ge 1$, where $d_0$ is the maximal degree of the polynomials $q_i$. We see then that the triangular array $\{F_{n,N}, n = 1, 2, \dots, N\}$ satisfies a "local dependence" type condition (here we view the indexes $m$ of the random variables $F_{m,N}$ which depend on $F_{n,N}$ as a certain "neighborhood of dependence"). When the sequence $\{\xi_n\}$ is only weakly dependent then we get a certain version of the above local dependence, but now, roughly speaking, the dependence is replaced with a certain type of "local strong dependence", which essentially means that each $F_{n,N}$ can "strongly depend" only on a number of $F_{m,N}$'s whose magnitude is smaller than $N$. Giving a precise meaning to this "strong local dependence" is one of the main ideas behind the proof of Theorem 2.3.3 (see the beginning of Section 7 for the precise definition of the underlying graph in the weakly dependent case). This was done in Chapter 1 of [10] in the case when $q_i$ depends only on $n$ (in fact, the full details were given there only when $q_i(n) = in$ for each $i$), but here the dependence on $N$ causes additional difficulties. The resulting local dependence structure is a classical situation where we can use the functional version of Stein's method for Gaussian approximation due to A.D. Barbour (see Theorem 7.0.1 and [1]).
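The bound $d_0 \ell^2$ on the size of the dependence neighborhoods can be observed numerically in the case of independent $\xi_n$'s, where two summands can be dependent only if they share an index. The sketch below is an illustration under that assumption; the three polynomials and the function name are hypothetical.

```python
from collections import defaultdict

def dependence_neighborhoods(polys, N):
    """For each n in 1..N return the set of m in 1..N such that
    q_i(n, N) == q_j(m, N) for some i and j (a shared index of xi)."""
    by_index = defaultdict(set)          # index value -> set of m using that index
    for m in range(1, N + 1):
        for q in polys:
            by_index[q(m, N)].add(m)
    return {
        n: set().union(*(by_index[q(n, N)] for q in polys))
        for n in range(1, N + 1)
    }

# Hypothetical polynomials of maximal degree d0 = 2.
polys = [lambda n, N: n * n + N, lambda n, N: 2 * n * N, lambda n, N: n + 3 * N]
N = 200
nbrs = dependence_neighborhoods(polys, N)
d0, ell = 2, len(polys)
largest = max(len(s) for s in nbrs.values())
```

For fixed $n$, $i$ and $j$ the equation $q_j(m, N) = q_i(n, N)$ has at most $d_0$ solutions $m$, which is why `largest` cannot exceed $d_0 \ell^2$ regardless of $N$.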
Using the above strong local dependence structure together with some counting arguments, we get that the $L^4$ norms of the sums $N^{-\frac12}(S_N - N\bar{F})$ are bounded in $N$. This is the main idea of the proof of Theorem 2.1.2. Indeed, assume for the sake of convenience that $\bar{F} = 0$. Then by the Markov inequality, for any $\varepsilon > 0$ we have $P(|N^{-1} S_N| \ge \varepsilon) \le \varepsilon^{-4} N^{-4} E[S_N^4] \le C \varepsilon^{-4} N^{-2}$ for some constant $C > 0$, which together with the Borel-Cantelli lemma yields that with probability one we have $\lim_{N \to \infty} \frac{1}{N} S_N = 0 = \bar{F}$. The proof of Theorem 2.3.1 is somewhat less transparent even in the case of independent $\xi_n$'s (and it does not rely only on strong local dependence), but let us describe some of its key ingredients. In the case when the $q_i$'s depend only on $n$, we can order them and just assume that $q_1(n) < \dots < q_\ell(n)$ for any sufficiently large $n$. When the polynomials $q_i$ depend on $n$ and $N$ then, in general, it is impossible to order them. The first step in the proof of Theorem 2.3.1 is to show that, after omitting a "small" number of $n$'s (in comparison to $N$) between 1 and $N$, we can order the polynomials $q_i(n, N)$ when $n \in \Gamma_i$, $i = 1, 2, \dots, d$, where $\{\Gamma_i\}$ is a partition of the remaining $n$'s. After this is established, roughly speaking, our proof scheme requires studying certain combinatorial number theory problems which are related to asymptotic densities of sets of the form where $a, b$ are positive numbers between 0 and 1 and $q$ and $p$ are bivariate polynomials with integer coefficients. Such sets were investigated in [8] when the polynomials $q(x, y)$ and $p(x, y)$ depend only on $x$, but when they depend also on $y$ many additional number theoretic difficulties arise.

Differences of bivariate non-linear polynomials
Let $q(n, N) = q_N(n)$ and $p(n, N) = p_N(n)$ be two polynomials in the variables $n, N$ with nonnegative integer coefficients so that $\deg q = \deg p = k$ for some $k > 1$. We will also assume here that the polynomials $q$ and $p$ do not depend only on $N$. In particular the functions q −1 The goal in this section is to investigate the asymptotic behaviour of the differences $|q_N(m) - p_N(n)|$. In Sections 3.1 and 3.2 we will prove some general results, which will be applied in Sections 3.3 and 3.4 in more specific situations. for some non-constant polynomial $H$, polynomials $Q$ and $P$ with non-negative integer coefficients and polynomials $r$ and $s$ so that $\max(\deg s, \deg r) < \deg H$.
are the inverse functions of the univariate functions P N (·) and Q N (·), respectively. In particular, for any δ > 0 there exists a constant R δ > 0 so that for any sufficiently large N and δN ≤ n, m ≤ N so that Q N (m) = P N (n), As a consequence, q and p have exploding differences when Q and P have exploding differences or when the degrees of r and s are different.
The polynomials $Q$ and $P$ have exploding differences when they are linearly related, but not Q-equivalent (see Remark 2.3.2 and Corollary 3.3.1). They also have exploding differences in the circumstances of Corollary 3.4.3. Set where $|\Gamma|$ stands for the cardinality of a finite set $\Gamma$. Then, it follows from Proposition 3.1.1 that the polynomials $q$ and $p$ have exploding differences also when $\bar{d} = 0$. The upper limit $\bar{d}$ equals 0 when $Q$ and $P$ have exploding differences, but also when, for instance, $P(n, N)$ and $Q(m, N)$ take values in disjoint sets (e.g. when $P(n, N)$ is odd and $Q(m, N)$ is even, etc.).
Proof of Proposition 3.1.1. It is clearly enough to prove (3.2) in the case when r ≡ 0 and s ≡ 0. Since p N (n) > q N (0) and q N (n) > p N (0), the numbers t n,N = q −1 N p N (n) = Q −1 N P N (n) and s m,N = p −1 N q N (m) = P −1 N Q N (m) are well defined. Suppose first that q N (m) > p N (n). Then Q N (m) > P N (n) and we can write m = t n,N + x, where here x ≥ 0 is considered as a parameter. Define the function D n,N (y) by D n,N (y) = Q N (t n,N + y) − P N (n) = Q N (t n,N + y) − Q N (t n,N ). Then D n,N (0) = 0. Applying the mean value theorem with the function D n,N , taking into account that the derivative of Q N is increasing and that x ≥ 0, we obtain that N )·|m−t n,N |. Next, we define the function g = g N,n (·) by g(t) = Q N (t)−P N (n). Then g(t n,N ) = 0. By the mean value theorem, there exists ξ between m and t n,N so that , which together with the previous estimates implies that .
In the case when Q N (m) < P N (n) we obtain (3.2) by reversing the roles of Q and P and the above arguments.
We refer the readers to Corollary 3.4.3, in which we give a class of examples of polynomials Q and P with exploding differences.
In the case when $r$ and $s$ are polynomials of the same degree, we can check whether Proposition 3.1.1 can be applied with $s$ and $r$ in place of $q$ and $p$. Still, $r$ and $s$ (or even $p$ and $q$) may, for instance, contain a monomial which does not depend on $N$. In the next section we will estimate $|q_N(m) - p_N(n)|$ under a somewhat different type of conditions, which will have applications beyond the case considered in Proposition 3.1.1.

3.2. Estimates using decompositions into homogeneous polynomials. Set $y_1 = \frac{n}{N}$, $y_2 = \frac{m}{N}$ and write $p_N(n) = \sum_{j=0}^{k} N^j P_j(y_1)$ and $q_N(m) = \sum_{j=0}^{k} N^j Q_j(y_2)$, where $P_j$ and $Q_j$ are polynomials whose degree does not exceed $j$. We will also assume here that $Q_k$ and $P_k$ are not constant polynomials and that $Q_k(0) \le P_k(0)$. In the above circumstances, the function Next, for any $y \in (0, 1]$, let the polynomial $H_{N,y}$ be given by ). Note that when $k = 2$ then we set , it is clear that there exists a constant $A_1$ which depends only on the polynomials $q$ and $p$ so that for any $y \in (0, 1]$, . Therefore, there exists a constant $N_0$ so that if $Q'_k(\gamma_k(y)) > \frac{A_1}{N}$ and $N > N_0$ then the function $H_{N,y}$ is strictly increasing on $[-1, 1]$ and there exists a unique root $x_N(y)$ of $H_{N,y}$ in $[-1, 1]$, which, by the mean value theorem, satisfies that H N, .

(3.6)
Observe that $H_{N,y}(0)$ is at most of order $N^{-1}$. When all of the functions $C_2, \dots, C_{k-1}$ are identically 0 then $H_{N,y}(0) = x_N(y) = 0$. In general, we have the following 3.2.1. Lemma. Suppose that not all the $C_u$'s are identically zero. Let $s_0 \le k - 1$ denote the first index $u$ so that the function $C_u(\cdot)$ does not equal 0 identically. Then for any $\delta \in (0, 1)$ there exist positive constants $B_1(\delta)$ and $B_2(\delta)$ and a set $\Gamma_{N,\delta} \subset [1, N]$, whose cardinality does not exceed $\delta N$, so that for any sufficiently large $N$ and $n \in [\delta N, N] \setminus \Gamma_{N,\delta}$, Proof. The function $C_{s_0}$ can have only a finite number of roots $y_1, \dots, y_t$ in the interval $[0, 1]$ (since $C_{s_0}$ can be extended to an analytic function in a complex neighborhood of $[0, 1]$). Let $\delta > 0$. Then there exists a constant $C_\delta > 0$ so that for any $y \in [0, 1]$ which satisfies we have $|C_{s_0}(y)| \ge C_\delta$. Let $B$ be an upper bound of the absolute values of the functions $C_u$: .
and the proof of the lemma is complete.
Our next result is the following 3.2.2. Lemma. For any natural n, m and N , with y = n N , we have .
In particular, for any constants s < k and 0 < B 1 (y) < B 2 (y) < ∞ so that for some K > 0. The constant K(y) depends only on B 1 (y), B 2 (y), K and s.
Let δ ∈ (0, 1). When y = n/N ∈ [δ, 1] then Q ′ k (γ k (y)) ≥ C δ for some C δ > 0 which depends only on δ, and so the magnitude of Q ′ k (γ k (y))N k−1 is N k−1 and the inequality Q ′ k (γ k (y)) > 2A1 N holds true, assuming that N is sufficiently large. Observe also that for such n's we have and that D δ depends only on δ (and on the polynomials q and p). Therefore, when for any n and m satisfying (3.8), for some constant K δ > 0 (and so, the problem of verifying that q and p have exploding differences is reduced to the study of (3.8)-see Corollary 3.4.2 for an application).
Proof of Lemma 3.2.2. Write y = n N and m = N γ k (y) + x, where x is considered here as a parameter. Then, by considering the Taylor expansion of the polynomials Q j around the point γ k (y) we have where we used that Q k (γ k (y)) = P k (y). By considering the above expression as a (polynomial) function of x (where y is considered as a parameter), and then considering its Taylor polynomials around the point R k (y) we arrive at where H N was defined in the statement of the lemma q s,s is the coefficient of monomial y s in the polynomial Q s (y).
In the following sections we will apply Lemma 3.2.2 in several situations, where γ k and R k are assumed to have certain structure. In Section 6.2 we will use the results from Sections 3.3 and 3.4 in order to prove Theorem 2.3.1. For some abstract application of Lemma 3.2.2, we refer the readers to Section 6.3.

3.3. Application I: linearly related polynomials. We begin with the following consequence of Lemma 3.2.2.
3.3.1. Corollary. Suppose that γ k (y) = cy, R k (y) = r for some constants c and r, namely that q and p are linearly related. Then either q and p have exploding differences, or q and p are Q-equivalent (in the terminology of Remark 2.3.2) and then for any δ ∈ (0, 1) there exist constants W δ and N δ so that for any N > N δ , δN ≤ n ≤ N and m ∈ N either m = cn + r (which happens on a finite union of arithmetic progressions) and where B(c, r, δ) is the set of all natural numbers n so that |m − cn − r| < δ for some integer m, and |Γ| stands for the cardinality of a finite set Γ. Set Note that there exists a constant B δ > 0 so that for any sufficiently large N and n ∈ [δN, N ] we have |x N (n/N )| ≤ B δ N −1 where X N (n/N ) was defined in Section 3.2. Relying now on Lemma 3.2.2, we obtain that for any n ∈ [δN, N ] ∩ B(c, r, δ) and m ∈ N we have Therefore, for any sufficiently large N we have where A δ is a constant which depends only on δ, and thus the difference of q and p explode. Next, suppose that c = u/v is rational and that r is irrational. Then for any n and m we have which as in the previous case is enough in order to derive that the differences of q and p explode. Note that when c and r are not both rational then the polynomials q and p are not Q-equivalent.
Next, suppose that $c = u/v$ and $r = w/t$ are rational and that $C_u$ does not equal identically 0 for some $2 \le u \le k - 1$. Then $q$ and $p$ are not Q-equivalent. Since $c$ and $r$ are rational, either $m = cn + r$ or $|m - cn - r| = |tv|^{-1}|mtv - utn - vw| \ge |tv|^{-1} := \delta_1 > 0$. In the case when $|m - cn - r| \ge \delta_1$ and $n \in [N\delta, N]$, as in the first part of this proof, we have Let s + 1 < k and $\Gamma_{N,\delta}$ be as in Lemma 3.2.1. Then for any n ∈ Γ N,δ we have for some constant $A_\delta > 0$, which completes the proof that the differences of $p$ and $q$ explode in the case considered above. Finally, suppose that $c = u/v$ and $r = w/t$ are rational and that $C_u \equiv 0$ for any $u$ (i.e. that $p$ and $q$ are Q-equivalent). Then $H_{N,y}(0) = \sum_{u=2}^{k-1} N^{-(u-1)} C_u(y) = 0$ for any $y$. As in the previous cases covered in this proof, when $m \ne cn + r$ then $|m - cn - r| \ge \delta_1 > 0$ and so, by
3.4. Application II: "fractionally related" polynomials. In this section we will give several classes of examples of polynomials with exploding differences for which $\gamma_k(y)$ is not a linear function of $y$. We will rely on the following 3.4.1. Lemma. Let $a$ and $b$ be positive coprime integers so that $a > b$ and α b , α b+1 , ..., α b be integers so that $|\alpha_b| = 1$. For each fixed $N$, consider the equation over the positive integers. Then for any $\delta > 0$ there exists a constant $N_\delta$ and a set $\Gamma_{N,\delta} \subset [1, N]$ whose cardinality does not exceed $\delta N$ so that for any $N > N_\delta$ and $n \in [1, N] \setminus \Gamma_{N,\delta}$ there is no natural number $m$ such that (3.9) holds true.
Proof. Fix a natural number N . For any v that divides N set N v = N/v and and note that the cardinality Denote by U ′ v,N the set of all members of U v,N for which there exists m satisfying (3.9). In order to find sets Γ N,δ with the properties described in the statement of the lemma, it is enough to show that where |U ′ v,N | is the cardinality of U ′ v,N , since the expression inside the limit is just N −1 times the number of n's in [1, N ] for which (3.9) holds with some m.
Let n ∈ U ′ v,N for some v. Then (3.9) holds true for some m, and v must divide this m. Therefore (3.9) also holds true with n v = n/v and m v = m/v and N v in place of n, m and N , respectively. Let p be a prime number that divides n v and let e be the largest power of p so that p e divides n v . Then eb is the largest power of p that divides n b v . Write eb = ka + w for some k and 0 ≤ w < a. Then and the second factor of the right hand side is not divisible by p (since gcd(n v , N v ) = 1 and |α b | = 1). This is clearly a contradiction since ak + a > eb. Therefore, w = 0 and so eb = ka for some k. But a and b are coprime, and hence k must have the form k = k ′ b for some integer k ′ . Therefore we can write e = k ′ a, namely n must have the form n = vz a for some integer z. Next, recall the following inequality (see where γ is Euler's constant. The equality n v = z a clearly implies that gcd(z, N v ) = 1 and therefore vz is a member of U v,N . The map n → z from U ′ v,N to U v,N , where n = vz a , is one to one, and its image is contained in the interval [ which together with (3.10) yields (3.11).
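The counting idea in the proof above can be checked numerically. The following is only a sketch: equation (3.9) is not displayed in this extract, so the code assumes (by analogy with the polynomials P and Q of Corollary 3.4.3) that it reads m^a = Σ_{j=b}^{a} α_j n^j N^{a−j}. Under that assumption every solvable n has the form n = v z^a with v = gcd(n, N), so the number of solvable n's is at most Σ_{v | N} ⌊(N/v)^{1/a}⌋. All function names are hypothetical.

```python
from math import gcd


def integer_root(x, a):
    """Largest integer r >= 0 with r**a <= x (for x >= 0, a >= 1)."""
    lo, hi = 0, 1
    while hi ** a <= x:
        hi *= 2
    # Invariant: lo**a <= x < hi**a
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** a <= x:
            lo = mid
        else:
            hi = mid
    return lo


def is_perfect_power(x, a):
    """True when x is the a-th power of a nonnegative integer."""
    return x >= 0 and integer_root(x, a) ** a == x


def solvable_n(N, a, b, alphas):
    """All n in [1, N] for which the ASSUMED form of (3.9),
    m**a == sum_{j=b}^{a} alphas[j-b] * n**j * N**(a-j),
    has a positive integer solution m."""
    out = []
    for n in range(1, N + 1):
        rhs = sum(alphas[j - b] * n ** j * N ** (a - j) for j in range(b, a + 1))
        if rhs > 0 and is_perfect_power(rhs, a):
            out.append(n)
    return out


def counting_bound(N, a):
    """Sum over the divisors v of N of floor((N/v)**(1/a))."""
    return sum(integer_root(N // v, a) for v in range(1, N + 1) if N % v == 0)


if __name__ == "__main__":
    N = 2000
    sols = solvable_n(N, a=3, b=2, alphas=[1, 1])
    print(len(sols), "solvable n's out of", N, "; bound:", counting_bound(N, 3))
```

The bound grows much slower than N, which is the mechanism behind the exceptional sets Γ N,δ of cardinality at most δN.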
The following result now follows from Lemmas 3.2.2 and 3.4.1.

3.4.2. Corollary. Suppose that R k ≡ 0 and that γ k has the form where a, b and α j satisfy the conditions of Lemma 3.4.1. Let s ≤ k − 1 be so that C u ≡ 0 for any 2 ≤ u ≤ s. Suppose, in addition, that a < s. Then the polynomials q and p have exploding differences.
Proof. Observe that m = N γ k (n/N ) if and only if the equation (3.9) holds true. Therefore, by Lemma 3.4.1, in order to complete the proof of Corollary 3.4.2, it is sufficient to show that for any n and m so that m = N γ N (n/N ) we have where y = n/N and c is some constant. Since where B > 0 is some constant, then by Lemma 3.2.2 in order to show that (3.13) holds true, it is enough to show that for any n and m so that . In order to prove the latter inequality, we define the polynomial W (x) by Then W (N γ k (y)) = 0 and since we have assumed that m = N γ k (y). Therefore, by the mean value theorem, for some ξ between m and N γ k (y), ) and the proof of the corollary is complete.
The following result also follows.
3.4.3. Corollary. Suppose that the conditions of Proposition 3.1.1 hold and that $P(n, N) = \sum_{j=b}^{a} \alpha_j n^j N^{a-j}$ and $Q(m, N) = m^a$ for some positive coprime integers a > b and integers α b , ..., α a so that |α b | = 1 (see Lemma 3.4.1). Then the polynomials q and p have exploding differences.

3.4.4. Remark. Suppose that R k ≡ 0 and let s be as in Corollary 3.4.2. Assume also that γ k (y) has the form γ k (y) = K −1 E(y) for some polynomials K and E with integer coefficients whose degrees do not exceed d, for some d < s. Consider the polynomial W (x) = K(x/N ) − E(n/N ). Then W (N γ k (n/N )) = 0 and therefore, applying the mean value theorem yields that for some constant C > 0 (when 1 ≤ m ≤ N ). We conclude that for any δ > 0 there exists a constant C δ > 0 so that for any n, m ∈ [N δ, N ] we either have The difficulty in using the above estimates in order to determine whether the polynomials q and p have exploding differences lies in determining for which n's there exists a solution m to the equation m = N K −1 E(n/N ). When γ k satisfies the conditions of Corollary 3.4.2 we used Lemma 3.4.1, but for general polynomials K and E it does not seem possible to show that the equation m = N K −1 E(n/N ) does not have a solution for any n ∈ [δN, N ] \ Γ N,δ for sets Γ N,δ whose cardinality does not exceed δN .

Expectation estimates
In this section we describe two results from [10] which are key ingredients in the proofs of Theorems 2.1.2, 2.3.1 and 2.3.3.
Let U i , i = 1, 2, ..., k be d i -dimensional random vectors defined on the probability space (Ω, F , P ) from Section 2, and {C j : 1 ≤ j ≤ s} be a partition of {1, 2, ..., k}. Consider the random vectors U (C j ) = {U i : i ∈ C j }, j = 1, ..., s, and let .., s} be the unique index such that i ∈ C ai , and for any Borel function H : Of course, the above quantity is defined only when the expectations inside the absolute value exist.
The first result appears in [10] as Corollary 1.3.14, and its formulation is as follows: 4.0.1. Proposition. Let H : R d1+d2+...+d k → R be a function satisfying (2.6) and (2.5) with H in place of F and with u i 's in place of x i 's. Let q, w > 1 and v > 0 be such that Here the σ-algebras F k,l are the ones specified in Section 2.
Next, recall that the α-dependence coefficients of any two sub-σ-algebras G and H of F is given by Another result we will need is the following proposition, which is stated in [10] as Corollary 1.3.11: where sup |H| is the supremum of |H|. In particular, when s = 2 then where σ{X} stands for the σ-algebra generated by a random variable X and α(·, ·) is given by (4.3).
In fact, Proposition 4.0.1 follows from Proposition 4.0.2 using standard approximations which involve the Hölder and the Markov inequalities (note the function H in Proposition 4.0.1 is locally Hölder continuous). The proof of Proposition 4.0.1 is based on the following lemma (see Lemma 1.3.10 in [10] or Lemma 3.11 in [9]).
where h(v) = EH(v, V 2 ) and a.s. stands for almost surely.
For the sake of completeness, let us explain the idea behind the proof of Proposition 4.0.2. In the circumstances of Lemma 4.0.3, it follows by induction on k that where ν i is the distribution of U i . The proof of Proposition 4.0.2 is carried out by induction on the number of blocks s, where the main ingredient in the induction step is (4.5) (applied with various functions appearing in the proof).
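To make the α-dependence coefficient recalled above concrete, here is a small sketch for two finite-valued random variables. The display (4.3) is missing from this extract, so the code assumes the classical definition α(G, H) = sup{|P(A ∩ B) − P(A)P(B)| : A ∈ G, B ∈ H}, evaluated for G = σ{X} and H = σ{Y} by brute force over all events. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations


def all_events(values):
    """All subsets of a finite set of values, as tuples."""
    vals = list(values)
    return chain.from_iterable(combinations(vals, k) for k in range(len(vals) + 1))


def alpha_coefficient(joint):
    """Classical alpha-coefficient between sigma{X} and sigma{Y}.

    joint maps pairs (x, y) to their probabilities (ASSUMED to sum to 1).
    Returns the maximum over events A, B of |P(X in A, Y in B) - P(X in A) P(Y in B)|.
    """
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    best = Fraction(0)
    for A in all_events(xs):
        for B in all_events(ys):
            p_ab = sum(p for (x, y), p in joint.items() if x in A and y in B)
            p_a = sum(p for (x, _), p in joint.items() if x in A)
            p_b = sum(p for (_, y), p in joint.items() if y in B)
            best = max(best, abs(p_ab - p_a * p_b))
    return best


if __name__ == "__main__":
    quarter, half = Fraction(1, 4), Fraction(1, 2)
    independent = {(x, y): quarter for x in (0, 1) for y in (0, 1)}
    identical = {(0, 0): half, (1, 1): half}
    print(alpha_coefficient(independent), alpha_coefficient(identical))
```

Independent variables give the value 0, while two copies of the same fair coin give 1/4, the largest value the coefficient can take.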
). Applying Proposition 4.0.1 with the function H and the random vectors U i , when δ N (n, n) = u then where R 0 and R 1 are some constants and we also used (2.9). Note that, in the terminology of Now we will show that the variance of S N grows at most linearly fast in N . In fact, we will prove the following 5.1.2. Lemma. Suppose that Assumption 2.1.1 holds true and that ∞ l=0 a(l) < ∞. Then there exists a constant C > 0 so that for any positive integers n 1 < n 2 < ... < n M and N ∈ N we have Proof. Fix some N and let n 1 < ... < n M be positive integers. Using Lemma 5.1.1, it is enough to show that the variance of the sum Let n, m ∈ N and l ≥ 0 be so that d N (n, m) = l, and consider the sets Then, there exist disjoint sets Q 1 , ..., Q L , L ≤ 2ℓ + 1 so that each one of the Q i 's is contained in either Γ 1 or Γ 2 and for any q i ∈ Q i and q i+1 ∈ Q i+1 , i = 1, 2, .., L − 1 we have Consider now the random vectors U 1 , ..., U L given by Consider the partition {C 1 , C 2 } of the index set {1, ..., L} given by Then F n,N is a function of {U j : j ∈ C 1 } and F m,N is a function of {U j : j ∈ C 2 }. Applying Proposition 4.0.1 with the function H(x, y) = F (x)F (y) we obtain that for any n = n k , l ≥ 0 and m = n s ∈ Γ n k ,N,l , where R 0 > 0 is some constant. Therefore, using (5.3), we obtain that and the proof of the lemma is complete.

5.2. Fourth moments and the strong law of large numbers. We will prove here the following
5.2.1. Lemma. Set $b(l) = (\phi(l))^{1-\frac{4}{w}} + (\beta_q(l))^{\kappa}$ and assume that $\sum_{l=1}^{\infty} l\,b(l) < \infty$. Then there exists a constant C > 0 so that for any positive integers n 1 < n 2 < ... < n M and N ∈ N we have Relying on Lemma 5.2.1 and using the Markov inequality, we obtain that For each u and N set

For any two sets
Since the q i 's are polynomials, there exists a constant A > 0, which does not depend on N , so that for each nonnegative integers k 1 and k 2 and positive integers u 1 and u 2 , there exist at most A pairs (v 1 , v 2 ) of positive integers such that and, if k 1 = dist(Γ u1,N ∪ Γ u2,N , Γ v1,N ) then while when k 1 > dist(Γ u1,N ∪ Γ u2,N , Γ v1,N ) (and so k 1 = dist(Γ u1,N ∪ Γ u2,N , Γ v2,N )) then Let us denote the set of all these indexes (v 1 , v 2 ) by ∆(u 1 , u 2 , k 1 , k 2 , N ). Note that given N, u 1 , u 2 , v 1 , v 2 there exist k 1 and k 2 so that (v 1 , v 2 ) ∈ ∆(u 1 , u 2 , k 1 , k 2 , N ). Next, we claim that for any u 1 , u 2 and (v 1 , v 2 ) ∈ ∆(u 1 , u 2 , k 1 , k 2 , N ) we have where τ (l) = b(l/3), R 0 is some constant and b(·) was defined in the statement of the lemma. Indeed, first consider the case when k 1 ≥ k 2 , and set ∆ 1 = Γ u1,N ∪ Γ u2,N and ∆ 2 = Γ v1,N ∪ Γ v2,N . Then we can write where each Q i is subsets of either ∆ 1 or ∆ 2 , and the Q i 's satisfy all the conditions appearing right after (5.4) with k 1 in place of l. Consider the random vectors U 1 , ..., U L given by U i = {ξ j : j ∈ Q i } and the partition {C 1 , C 2 } of the index set {1, ..., L} given by Then F u1,N · F u2,N is a function of {U j : j ∈ C 1 } and F v1,N · F v2,N is a function of {U j : j ∈ C 2 }. Applying Proposition 4.0.1 with the function H(x, y, z, w) = F (x)F (y)F (z)F (w) we obtain that for some R 0 , which completes the proof of (5.5) in the above case. Next, consider the case when k 1 < k 2 and k 2 = dist(Γ u1,N ∪Γ u2,N ∪Γ v1,N , Γ v2,N ). Set ∆ 1 = Γ u1,N ∪ Γ u2,N ∪ Γ v1,N and ∆ 2 = Γ v2,N . Then dist(∆ 1 , ∆ 2 ) = k 2 . As in the previous case, applying Proposition 4.0.1 yields that for some constant R 1 . An additional application of this corollary yields where a(·) was defined at the beginning of this section and R 2 is some constant. 
Taking into account that E[X] = 0 for any random variable X, combining the above estimates with (5.1) we obtain that for some constant R 0 > 0. The proof of (5.5) in the case when k 1 < k 2 and k 2 = dist(Γ u1,N ∪ Γ u2,N , Γ v1,N ) proceeds in exactly the same way, with the roles of v 1 and v 2 exchanged.
Finally, applying (5.5) we obtain that where C is some constant, and the proof of the lemma is complete.

5.2.2. Remark. Relying on conditional expectation type estimates, it is possible to prove Lemma 5.2.1 similarly to Chapter 3 in [10], using the functions F ε,i defined in Section 6. Moreover, it is also possible to derive this lemma using the method of cumulants, similarly to Section 6 in [11]. In fact, the arguments leading to the cumulant estimates obtained in [11] can be adapted to the setup of this paper, which means that we can also obtain moderate deviations theorems and some concentration inequalities for the sums S N .
6. Limiting covariances
6.1. Ordering and decomposition. Consider the homogeneous decomposition where for each i and j the polynomial Q i,j is of degree not exceeding j. Let r 0 be such that for any 1 ≤ i, j ≤ ℓ with deg q i > deg q j , for any sufficiently large N we have min N r 0 ≤n,m≤N We will first prove the following
6.1.1. Proposition. There exist constants r 1 ∈ (0, 1), c > 0 and A, B > 0, sets B N , N ≥ 1 containing [1, N r0 ] so that |B N | ≤ AN r1 and disjoint sets I ε (N ) of the form where ε ranges over all the permutations of {1, ..., ℓ} and the sets (a j,ε N, b j,ε N )'s are disjoint, so that for any sufficiently large N and n ∈ I ε (N ) with n ≥ N r we have and when q ε(i+1) and q ε(i) have the same non-linear degree then
Proof. We will prove the proposition by induction on the maximal degree of the polynomials q 1 , ..., q ℓ . When the maximal degree is 1, i.e. when all the polynomials are linear, then exactly as in Chapter 3 of [10] there exists a finite collection of intervals of the form (a ε N, b ε N ), ε ∈ E ℓ whose union covers [1, N ]\B N , for some set B N whose cardinality does not exceed 2ℓ 2 , so that (6.4) holds true for any n ∈ (a ε N, b ε N ), and q ε(i+1) (n, N ) − q ε(i) (n, N ) ≥ min(n − a ε N − 1, b ε N − n − 1) for any n ∈ (a ε N, b ε N ) and 1 ≤ i < ℓ. Now we can just take r 1 = 1/2 (when all of the polynomials are linear, (6.2) is meaningless). For readers' convenience, let us repeat the details from [10]. Write q i (n, N ) = p i n + q i N, i = 1, ..., ℓ, and assume without loss of generality that p 1 < p 2 < ... < p ℓ . Since p i n + q i N ≥ 0 always, we have q i ≥ 0 for i = 1, ..., ℓ. Let N N be the set of n ∈ {1, 2, ..., N } such that all p i n + q i N, i = 1, ..., ℓ are distinct. For each n ∈ N N we define distinct integers ε i (n, N ), i = 1, 2, ..., ℓ such that p εi(n,N ) n + q εi(n,N ) N < p εi+1(n,N ) n + q εi+1(n,N ) N for all i = 1, 2, ..., ℓ − 1. 
(6.6) For each ε = (ε 1 , ..., ε ℓ ) ∈ E ℓ set N ε,N = {n ∈ {1, 2, ..., N } : ε j (n, N ) = ε j for each j = 1, ..., ℓ}.
Some of the sets N ε,N can be empty, and each nonempty N ε,N has the form N ε,N = {n : a ε N < n < b ε N } for some (not unique) a ε ≥ 0 and b ε ≤ 1. The sets N ε,N are disjoint for different ε ∈ E ℓ and, clearly, There is always ε = (ε 1 , ..., ε ℓ ) with a ε = 0 and then ε 1 = min{i : q i = min 1≤j≤ℓ q j } and ε ℓ = max{i : q i = max 1≤j≤ℓ q j }. Set B N = {1, 2, ..., N } \ N N . Then and so the cardinality of B N does not exceed ℓ 2 (since the polynomials are linear). In Lemma 3.2.6 in [10] it was proved that if a ε N < n < b ε N then The proof of (6.8) is as follows. Let m 1 , m 2 ≥ 1 be integers satisfying a ε N < n−m 1 and n + m 2 < b ε N . Then p εi (n − m 1 ) + q εi N < p εi+1 (n − m 1 ) + q εi+1 N and p εi (n + m 2 ) + q εi N < p εi+1 (n + m 2 ) + q εi+1 N. Hence, and so the assertion follows. Now we will make the induction step. Suppose that the proposition holds true when the maximal degree does not exceed d. Let q 1 , ..., q ℓ be polynomials so that the maximal degree equals d + 1. Let k < ℓ be so that q 1 , ..., q k are of degree strictly less than d + 1, and deg q i = d + 1 for any i > k. By the induction hypothesis, there exist constants r 1 = r 1 (H) ∈ (0, 1) and A = A H , c = c H > 0 and sets B N = B H,N , N ≥ 1 and I ε ′ (N ) satisfying all the properties described in the statement of the proposition with the polynomials q 1 , ..., q k , where ε ′ ranges over all the permutations of the set {1, ..., k}.
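For the linear case treated above, the structure of the sets N ε,N and B N can be illustrated with a short brute-force sketch. It enumerates n = 1, ..., N for given coefficient pairs (p_i, q_i), groups the n's by the strict ordering of the values p_i n + q_i N, and collects the n's at which two values coincide. The function names and the sample coefficients are hypothetical; indices in the code are zero-based.

```python
def ordering_classes(N, coeffs):
    """Group n in 1..N by the strict ordering of the values p*n + q*N.

    coeffs is a list of pairs (p_i, q_i). Returns (classes, bad): classes maps
    an ordering (tuple of zero-based indices, smallest value first) to the
    increasing list of n's realizing it; bad lists the n's where two values coincide.
    """
    classes, bad = {}, []
    for n in range(1, N + 1):
        values = [p * n + q * N for p, q in coeffs]
        if len(set(values)) < len(values):
            bad.append(n)
            continue
        eps = tuple(sorted(range(len(coeffs)), key=lambda i: values[i]))
        classes.setdefault(eps, []).append(n)
    return classes, bad


def is_block_of_consecutive_integers(ns):
    """True when the increasing list ns has no gaps."""
    return ns == list(range(ns[0], ns[-1] + 1))


if __name__ == "__main__":
    # Hypothetical example: q_1 = n + N, q_2 = 2n, q_3 = 3n with N = 10.
    classes, bad = ordering_classes(10, [(1, 1), (2, 0), (3, 0)])
    for eps, ns in classes.items():
        print(eps, ns, is_block_of_consecutive_integers(ns))
    print("coincidences:", bad)
```

Each ordering class is a block of consecutive integers because every pairwise inequality between two linear functions of n cuts out a half-line, and two distinct lines meet in at most one point, so the number of coincidences is at most ℓ(ℓ − 1)/2 ≤ ℓ².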
In order to complete the induction step, it is enough to show that all the results stated in the proposition hold true for the family of polynomials q k+1 , ..., q ℓ (because of (6.2)). Indeed, assume that there exist constants r 1 = r 1,d+1 ∈ (0, 1) and c = c d+1 and sets B N = B N,d+1 and I ε ′′ (N ) with the properties described in the statement of Proposition 6.1.1 for the polynomials q k+1 , ..., q ℓ , where ε ′′ ranges over all the permutations of the set {k + 1, ..., ℓ}. Take 1 > r 1 > max(r 1 (H), r 1,d+1 , r 0 ), where r 0 comes from (6.2). Consider all the endpoints of the intervals defining the sets I ε ′ (N ) and I ε ′′ (N ) which are larger than N r1 . For any endpoint a, consider the set E a of all endpoints b so that |a − b| is sublinear in N . Then we can partition the set of all endpoints into disjoint sets of the form E a . By omitting all the endpoints from each partition set E a except a, and then considering all the intervals generated by two consecutive (remaining) endpoints we get sets I ε (N ) with the desired properties (for the polynomials q 1 , ..., q ℓ ). Note that for any permutation of {1, ..., ℓ} which does not have the form ε ′ ⊗ ε ′′ we have I ε (N ) = ∅.
Next, consider the decompositions (6.1) of the polynomials q k+1 , ..., q ℓ (whose degree is d + 1). Let Q i1,d+1 , ..., Q iu,d+1 be the distinct polynomials among the polynomials Q i,d+1 , i = k + 1, ..., ℓ. Let 0 ≤ y 1 < y 2 < ... < y s ≤ 1 be the set of all points y in [0, 1] so that Q ij ,d+1 (y) = Q i j ′ ,d+1 (y) for some j ≠ j ′ . On each interval of the form (y a , y a+1 ) we can order the polynomials Q ij ,d+1 . In fact, since the degree of each of the polynomials Q ij ,d+1 is at most d + 1, there exists a permutation σ a of {i 1 , ..., i u } so that for any y ∈ [y a + N^{−1/(2d+2)} , y a+1 − N^{−1/(2d+2)} ] =: J a,N and 1 ≤ j < u we have for some constant C > 0. Since the degree of q i , i > k is d + 1, it follows that for any j and n with n/N ∈ J a,N we have where C 1 > 0 is some constant. In the case when all of the Q i,d+1 's are distinct, we have completed the induction step. Otherwise, set Γ j = {i : Then, for any a and n/N ∈ J a,N , j < j ′ and i ∈ Γ σa(j) , i ′ ∈ Γ σa(j ′ ) we have for some constant C ′ > 0. Finally, for each j and i ∈ Γ j we define Then the degrees of the polynomials {q i : i ∈ Γ j } do not exceed d, and so we can apply the induction hypothesis with them. By intersecting the resulting sets I ε (j) (N ) with each interval J a,N , where ε (j) ranges over all the permutations of the set Γ j , omitting the intervals whose length is a sublinear function of N and taking a sufficiently large r ∈ (0, 1) we get disjoint sets I ε ′′ (N ) which have the properties described in the statement of the proposition for the polynomials q k+1 , ..., q ℓ (the ones with degree d + 1), where ε ′′ now ranges over the permutations of the set {k + 1, ..., ℓ}. As we have explained before, this is enough to complete the induction step. The proof of the proposition is complete.
Since the cardinality of the set B N = N \ G N does not exceed AN r for some A > 0 and r ∈ (0, 1), using Lemma 5.
and b j,ε and a j,ε are defined in Proposition 6.1.1.
(iii) If q ε(i) and q τ (j) have exploding differences then the limit D ε,τ,i,j (t, s) exists and equals 0.
(iv) If q ε(i) and q τ (j) are linearly related, deg q ε(i) = deg q τ (j) = k > 1 and the differences of q ε(i) and q τ (j) do not explode, then the limit D ε,τ,i,j (t, s) exists and has the form where M is defined by (6.19) and the number c ε,i,τ,j (s, t) can be recovered from the proof.
Proof. As we have explained before, the limits b(t, s) exist if the limits exist, for any σ, τ ∈ E ′ ℓ and 1 ≤ i, j ≤ ℓ. Let ε, τ ∈ E ′ ℓ and 1 ≤ i, j ≤ ℓ. When deg q ε(i) = deg q τ (j) = 1, then existence of the limits D ε,τ,i,j (t, s) is obtained exactly as in Chapter 3 of [10], taking into account that ε, τ have the tensor product structure (6.9). For readers' convenience, let us explain the main idea of the proof in [10]. First, it is enough to show that the limit lim exists for each integer u (where the sum over the empty set is considered to be zero), since then we can just sum over all u's. The idea behind proving (6.13) is to show converges to a limit. After that, using Assumption 2.2.1 we estimated the number of solutions of the equation p εi m − pε j n + N (q εi − qε j ) = u (6.14) in m ∈ N ε,N and n ∈ Nε ,N , divided this number by N and showed that this ratio converges to a limit which will give the limit in (6.13). We refer the reader to Section 3.3.2 in Chapter 3 of [10] for the precise details. Next, suppose that max deg q ε(i) , deg q τ (j) > 1. In what follows, we will always assume that deg q ε(i) ≥ deg q τ (j) . We first write and therefore by (6.15), and the proof of Proposition 6.2.2 (ii) is completed. Suppose next that the differences of q ε(i) and q τ (j) explode. Fix some δ ∈ (0, 1). Then there exist constants C δ > 0, N δ > 0 and sets Γ N,δ ⊂ [1, N ] of integers whose cardinality does not exceed δN so that for any N > N δ and n ∈ We will show that the upper limit as N → ∞ of each one of the J i 's does not exceed δ, which by taking δ → 0 will complete the proof of Proposition 6.2.2 (iii). First by Lemma 6.2.1, for some constant C ′ δ > 0, and so by (6.11) we have lim N →∞ J 1 = 0. Next, N ) is the set of all n's so that |q ε(i) (n, N ) − q τ (j) (m, N )| = k. Since the q i 's are polynomials, |A k (m, N )| ≤ C 2 for some constant C 2 which does not depend on m and k. It follows from Lemma 6.2.1 that where Υ was defined in (6.12). 
Since the cardinality of the set Γ N,δ does not exceed δN , similar arguments show that for any sufficiently large N , Next, suppose that q ε(i) and q τ (j) are linearly related and that their differences do not explode. Then, by Corollary 3.3.1 they are equivalent, namely there exist rational c and r so that for any natural n. If m = cn + r then by Corollary 3.3.1 we have for some continuous function Q which is strictly positive on (0, 1], and so by Lemma Therefore, taking into account (6.11), in order to prove that the limit D ε,τ,i,j (t, s) exists, it is enough to show that the limit is the indicator function of a set A. Since the sets I ε (N ) and I τ (N ) are unions of intervals whose length is proportional to N (i.e. they have the form (6.3)), and the set {n ∈ N : cn + r ∈ N} is a finite union of arithmetic progressions, in order to show that the above limit exists it is enough to show that there exist sets ∆ N,δ whose cardinalities do not exceed δN so that for any δ ∈ (0, 1) the limit lim N →∞, n∈Iε(N )\∆ N,δ , cn+r∈Iτ (N ) b ε,τ,i,j,N (n, cn + r) (6.18) exists, and that this limit does not depend on δ. Fix some δ ∈ (0, 1). Suppose .., L 1 be the indexes so that q ε(uj ) and q τ (vj ) are linear, ca ε(uj ) = a τ (vj ) and b ε(uj ) = b τ (vj) . Let q ε(1) , ..., q ε(U) and q τ (1) , ..., q τ (V ) be the linear polynomials among the q ε(u) 's and the q τ (v) 's, respectively, where 1 ≤ u < i and 1 ≤ v < j. We conclude that for any n ∈ L1 j=1 J(w uj ,vj ,N , δ, N ) := Θ N,δ we can order the numbers q ε(u) (n, N ) and q τ (v) (cn + r, N ), where 1 ≤ u ≤ U and 1 ≤ v ≤ V , so that the difference between any two consecutive numbers in this ordering is either a constant which does not depend on n and N , or it is not less than B δ N^{1/2} for some B δ > 0, where we also used (6.5). 
Combining this with Assumption 2.2.2, we deduce from Corollary 3.3.1 that the terms q ε(l) (n, N ) − q τ (z) (cn + r, N ) where 1 ≤ l ≤ i and 1 ≤ z ≤ j, either have absolute values bounded from below by some E δ N , E δ > 0 or they are constants. Set ∆ N,δ = [δN, N ] \ Θ N,δ . Using (6.2) and (6.5), applying Proposition 4.0.1 we derive that there exist constants d 1 , ..., d L (L ≥ 0) and indexes (l e , z e ), e = 1, 2, ..., L so that (1) , ..., y τ (j) ) and M has the form The indexes (l e , z e ) are exactly the ones for which q ε(le) (n, N ) − q τ (ze) (cn + r, N ) = d e = q ε(le) (r, 0) − q τ (ze) (0, 0) where d e does not depend on n and N .
Fix some δ ∈ (0, 1). When |u| ≤ √ N and y ≥ δ then, since H N,y is one to one on intervals of the form [−a, a], a > 0 (when N is large enough), we obtain that for any sufficiently large N there exists a unique solution x = x n,N,v to the equation .
Using Lemma 6.2.1 the following proposition follows similarly to the proof of Proposition 6.2.2.
6.3.1. Proposition. The limit D ε,τ,i,j (t, s) exists if for any v and sufficiently large n and m so that q τ (j) (m, N ) − q ε(i) (n, N ) = v the differences are either constants or they converge to ∞ (in absolute value) as n → ∞ and the limit exists, where |Γ| stands for the cardinality of a finite set Γ.
Note that when the above conditions hold true only with s = t = 1 then we obtain that the limit D 2 exists, which is enough in order to derive that S N (1) converges in distribution towards a centered normal random variable whose variance is D 2 .
6.4. Positivity of the asymptotic variance. We assume here that the conditions of Theorem 2.3.1 are satisfied. Set S N = S N (1) and We will say that the polynomials q i (x, N ) and q j (x, N ) are equivalent if there exist rational c and r so that q i (n, N ) − q j (cn + r, N ) is a constant function of n and N (in Remark 2.3.2 we called this equivalence relation Q-equivalence). Then any two equivalent polynomials have the same degree. Let q s+1 , ..., q ℓ be the non-linear polynomials among the q i 's and consider the decomposition of {q i : i > s} into equivalence classes A 1 , A 2 , ..., A w , ordered so that the degree of each member of A i does not exceed the degree of each member of A i+1 , i = 1, ..., w − 1. For any ε, τ, i, j so that q ε(i) and q τ (j) are not both linear and not equivalent we have D ε,τ,i,j := D ε,τ,i,j (1, 1) = 0.
Recall next the following result from Chapter 3 in [10], which was stated there as Theorem 3.3.4, and is reformulated here for the function G: For readers' convenience, let us give some of the ideas behind the proof of Theorem 6.4.1. Let us first explain (6.20). Since the above copies are independent and the distribution of (ξ n , ξ m ) depends only on n − m we have Observe next that the sequence G(ξ (1) a1n , ..., ξ (s) asn ) is centered and stationary in the wide sense. We recall now the following result, which is a combination of Proposition 8.3 and Theorem 8.6 of [5] (modified for a one sided sequence): let {U n } be a centered sequence of weakly stationary random variables with correlations b n = Cor(U n , U 0 ) satisfying $\sum_{n=0}^{\infty} (n+1)|b_n| < \infty$. Then the limit $v^2 = \lim_{n\to\infty} \frac{1}{n} E\big(\sum_{j=0}^{n-1} U_j\big)^2$ exists, and it vanishes if and only if there exists a weakly stationary process Z n so that U n = Z n+1 − Z n for any n ≥ 0. We conclude that σ 2 = 0 if and only if there exists a stationary in the wide sense sequence {Z n } of random variables so that for any n, We note that the correlations of {U n } indeed satisfy the above conditions because of the definition of G, (2.5), (2.9) and Assumption 2.1.3. We also remark that U n is indeed centered because of our assumption that F̄ = 0. Next, for readers' convenience we will give the idea behind the equivalence σ 2 = 0 ⇐⇒ v 2 = 0 in the case when s = 2, namely when there are only two linear polynomials q 1 (n, N ) = a 1 n + b 1 N and q 2 (n, N ) = a 2 n + b 2 N . In the case when and therefore the whole problem is reduced to the case when q 1 and q 2 do not depend on N , and this case was covered in Section 4 of [7]. We assume without loss of generality that b 2 > b 1 . The proof in the case when a 1 ≤ a 2 proceeds exactly as in [7]. Indeed, we can use the same block partitions which were constructed in [7] in the case when q 1 (n) = (a 1 + b 1 )n and q 2 (n) = (a 2 + b 2 )n. 
The distance between (a 2 + b 2 )n and (a 1 + b 1 )n is smaller than the distance between a 2 n + b 2 N and a 1 n + b 1 N , and so the proof in [7] proceeds similarly. We note that only the case when q i (n) = in was considered in [7], but the general case when all q i 's have the form q i (n) = l i n, i = 1, 2, ..., s, l i ∈ N follows from this case by considering functions F (x 1 , ..., x ls ) which depend only on the variables x li .
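The criterion recalled above from [5] (the limiting variance degenerates exactly for coboundaries U n = Z n+1 − Z n ) can be illustrated with an exact computation. The sketch below works with covariances rather than correlations and uses the standard identity Var(U_0 + ... + U_{n−1}) = n c(0) + 2 Σ_{k=1}^{n−1} (n − k) c(k) for a weakly stationary sequence with autocovariance c. The two sample autocovariance functions are assumptions chosen for illustration: an i.i.d. unit-variance sequence, and the coboundary built from i.i.d. unit-variance Z n .

```python
from fractions import Fraction


def normalized_variance(n, autocov):
    """(1/n) * Var(U_0 + ... + U_{n-1}) for a weakly stationary sequence
    with integer-valued autocovariance function autocov(k) = Cov(U_k, U_0)."""
    total = n * autocov(0) + 2 * sum((n - k) * autocov(k) for k in range(1, n))
    return Fraction(total, n)


def coboundary_autocov(k):
    """Autocovariance of U_n = Z_{n+1} - Z_n with i.i.d. unit-variance Z_n."""
    return {0: 2, 1: -1}.get(k, 0)


def iid_autocov(k):
    """Autocovariance of an i.i.d. unit-variance sequence."""
    return 1 if k == 0 else 0


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, normalized_variance(n, coboundary_autocov), normalized_variance(n, iid_autocov))
```

For the coboundary the partial sums telescope to Z n − Z 0 , so the normalized variance equals 2/n and tends to 0, whereas for the i.i.d. sequence it stays equal to 1.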
In view of the above, we will focus here on the case when a 1 > a 2 . We will show that, in fact, in this case we have v 2 = σ 2 . Consider the functions G 1 and G 2 given by $G_1(x) = \int G(x, y)\,d\mu(y)$ and $G_2(x, y) = G(x, y) - G_1(x)$.
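The decomposition of G into G 1 and G 2 can be checked on a finite example. The sketch below assumes the reading G 1 (x) = ∫ G(x, y) dµ(y) and G 2 (x, y) = G(x, y) − G 1 (x), replaces µ by a finitely supported probability measure, and verifies that G 2 integrates to zero in its second variable for every fixed x. The function name `split` and the sample G and µ are hypothetical.

```python
from fractions import Fraction


def split(G, mu, xs):
    """Return (G1, G2) as dictionaries.

    G  : function of two arguments (x, y)
    mu : dict mapping each y to its probability (ASSUMED to sum to 1)
    xs : iterable of x values at which to evaluate
    G1[x] is the mu-average of G(x, .), and G2[(x, y)] = G(x, y) - G1[x].
    """
    xs = list(xs)
    G1 = {x: sum(G(x, y) * w for y, w in mu.items()) for x in xs}
    G2 = {(x, y): G(x, y) - G1[x] for x in xs for y in mu}
    return G1, G2


if __name__ == "__main__":
    mu = {0: Fraction(1, 3), 1: Fraction(2, 3)}
    G1, G2 = split(lambda x, y: x * y + y * y, mu, range(4))
    for x in range(4):
        # The mu-average of G2(x, .) vanishes for every x.
        print(x, G1[x], sum(G2[(x, y)] * w for y, w in mu.items()))
```

This vanishing of the inner average is what makes the cross terms disappear in the independent case discussed next in the text.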
Using (6.22) we get that for i = 1, 2  for some constant C > 0. In the case when ξ n 's were independent for any n < αN and m > αN we have a2m+b2N , ξ a1m+b1N )|ξ a1n+b1N ] = 0 (6.25) since the conditional expectation inside the expectation vanishes due to (6.22). This shows that we can take C = 0 in the independent case when i = 1, and similar idea yields the same with i = 2. When ξ n 's are not independent we can use our mixing and approximation conditions from Assumption 2.1.3 in order to derive (6.23) (applying Lemma 6.2.1). Similar arguments yield that |E[S

|E[S
where c N is some bounded sequence. The above equality also holds true when we replace ξ a1n+b1N with ξ̃ a1n+b1N , where ξ̃ is an independent copy of ξ. It is enough to show that E S (i) 2 2 behaves the same as in the case when the left and right coordinates are independent. We will explain why this is true for j = 1. Write a1n+b1N , ξ a2n+b2N ) and where ε > 0 is sufficiently small (we can just take M = βN for an appropriate β < α). This means that the largest index appearing in the left coordinate of G 2 in the sum A N is much smaller than the minimal index appearing in the right coordinate. Therefore, where c N is some bounded sequence and Ã N is defined similarly to A N but with ξ̃ a1n+b1N in place of ξ a1n+b1N . Using the same reasoning which led to (6.23), we obtain that |EA N b N | is bounded in N . Therefore, N . Now we decompose B N into two blocks. On the first block the maximal index in the left coordinate is smaller than the minimal index in the right coordinate, and the expectation of the product of the two blocks is bounded in N . Proceeding this way we derive that σ 2 = v 2 .
The next result we have is the following
6.4.2. Theorem. Suppose that any two non-linear polynomials q i and q j are not equivalent. Then D 2 = 0 if and only if (6.21) holds true and for any ε and i so that deg q ε(i) > 0 we have F 2 ε,i (x 1 , ...x i ) = 0 for µ i = µ × · · · × µ almost any (x 1 , ..., x i ).
Proof. Let q ε(i) and q τ (j) be two non-linear polynomials. If they are not equivalent then by Proposition 6.2.2 we have D ε,τ,i,j (s, t) = 0 for any s and t. Suppose that they are equivalent. Then, using the assumptions of Theorem 6.4.2, we must have q ε(i) = q τ (j) . We claim that in this case ε = τ . Indeed, since q ε(i) = q τ (j) then we have c = 1 in (6.18). The sets I ε (N ) and I τ (N ) are disjoint when ε ≠ τ , and in this case the sum in (6.17) is over a set of n's whose asymptotic density is 0. Therefore, either D ε,τ,i,j (t, s) = 0 for any t, s or ε = τ , i = j and D ε,τ,i,j (1, 1) = D ε,ε,i,i (1, 1) = s ε F 2 ε,i µ i . We conclude that and the proof of the theorem is completed using Theorem 6.4.1.

6.4.3. Remark. The proof of Theorem 6.4.2 shows that if there exists one nonlinear polynomial q i0 which is not equivalent to any other of the q j 's (i.e. there exists a non-linear equivalence class which is a singleton), then where δ 2 > 0 is the part of the variance which comes from the non-linear polynomials q ε(i) and q τ (j) which do not equal q i0 . Therefore, we derive that D 2 > 0 if one of the functions F ε,ε −1 (i0) is not identically 0 (µ ε −1 (i0) -almost surely). Moreover, when there exists one class of equivalent non-linear polynomials for which c = 1 in (6.18) for any q ε(i) , q τ (j) in this class, then the proof of Theorem 6.4.2 shows i ) vanish with respect to an appropriate probability measure κ A,ε , where A is the underlying class and {u (ε) i } is a set of variables which depends on ε. Indeed, when q ε(i) and q τ (j) are equivalent and non-linear and c = 1 in (6.18) we get that D ε,τ,i,j = I(τ = ε)s ε F ε,i F ε,j dM ε,i,j,A for some probability measure M ε,i,j,A (which was constructed in the proof of Proposition 6.2.2). It remains to show that the measures M ε,i,j are appropriate marginals of a measure M ε,A which does not depend on i and j (this is done as in Lemma 7.1 in [8], where we recall that ε is fixed). We conclude that if any two equivalent non-linear polynomials q i and q j satisfy q i (x, y) = q j (x + r, y) + z for some r, z ∈ Z, then D 2 = 0 if and only if (6.21) holds true and the functions i:q ε(i) ∈A F ε,i (u (ε) i ) vanish with respect to M ε,A for any ε and a non-linear class A.

6.4.4. Remark. When the polynomials q i are ordered so that q 1 (n, N ) ≤ q 2 (n, N ) ≤ ... ≤ q ℓ (n, N ) for any sufficiently large n and N , then Theorem 2.3 in [8] is proved in exactly the same way. This means that there exists a family of functions G α and a family of probability measures κ α so that D 2 = 0 if and only if (6.21) holds true and G 2 α = 0, κ α almost surely, for any α. The measures and the functions can be constructed explicitly, see Section 7 in [8].

A functional CLT via Stein's method
Let (Ω, F , P ), F n,m , {ξ n : n ≥ 0} and F be as described in Section 2 and Consider the random function S N given by The proceeding arguments will be true for any ζ 1 satisfying ( We also set and that the weak convergence of S N (·) follows from the weak convergence of W N (·). We refer the readers to the beginning of the proof of Theorem 1.6.2 from [10] for the precise details.
In the rest of this section we will prove the weak invariance principle for W N (·) using a functional version of Stein's method. Hence m solves one of the equations Each equation has at most d j = deg q j solutions, m g,1 , ..., m g,ug , where u g ≤ d j . Therefore, the unit ball around n is contained in the set {n} ∪ g∈γ {m g,1 , ..., m g,ug } whose cardinality does not exceed 2d * (l(N )+1)+1, where d * is the maximal degree of the polynomials q 1 , ..., q ℓ . Now, for each N and n ∈ V N set X n,N = F (Ξ n,N,r ) − EF (Ξ n,N,r(N ) ) √ N .
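The counting argument above (each polynomial equation in m has at most deg q j solutions) can be illustrated by brute force. The sketch below assumes that m is a neighbour of n when |q_i(n, N) − q_j(m, N)| ≤ l for some pair (i, j), which is one reading of the graph structure used here, and compares the number of neighbours with the crude bound ℓ²(2l + 1)d*, where d* is the maximal degree in m. This bound is derived in the comment below and is not the constant displayed in the text. The sample polynomials and all names are hypothetical.

```python
def neighbors(n, N, polys, l, m_max):
    """All m in 1..m_max, m != n, such that |q_i(n, N) - q_j(m, N)| <= l
    for some polynomials q_i, q_j in polys (ASSUMED notion of neighbour)."""
    targets = [q(n, N) for q in polys]
    return [
        m
        for m in range(1, m_max + 1)
        if m != n and any(abs(t - q(m, N)) <= l for t in targets for q in polys)
    ]


def crude_bound(num_polys, l, max_degree):
    """For each pair (i, j) and each shift g in [-l, l], the equation
    q_j(m, N) = q_i(n, N) + g has at most max_degree solutions m,
    provided q_j is nonconstant in m."""
    return num_polys ** 2 * (2 * l + 1) * max_degree


if __name__ == "__main__":
    # Hypothetical polynomials: q_1(n, N) = n + N and q_2(n, N) = n^2.
    polys = [lambda n, N: n + N, lambda n, N: n * n]
    found = neighbors(10, 50, polys, l=1, m_max=200)
    print(found, "bound:", crude_bound(len(polys), 1, 2))
```

The point of the estimate is that the neighbourhood size is controlled by l and the degrees only, and not by N.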
Next, for any two sub-σ-algebras G, H of F we will measure the dependence between G and H via the quantities where in the definition of ε p,q (G, H) we consider functions g and h so that g is measurable with respect to G and h is measurable with respect to H. We recall the following classical relations between α(G, H) (defined in 4.3) and ε p,q (G, H) (see, for instance, Theorem A.5, Corollary A.1 and Corollary A.2 in [6]): for any p > 1, we have for any q > 1 such that $\frac{1}{p} + \frac{1}{q} < 1$. Now, for any A ⊂ V N , let G A be the σ-algebra generated by the random vector X A = {X n,N : n ∈ A}. For any A, B ⊂ V N and 1 ≤ p, q ≤ ∞ we will measure the dependence between X A and X B via the quantities Then by (7.3) for any A, B ⊂ V N and p > 1, for any q > 1 such that $\frac{1}{p} + \frac{1}{q} < 1$. Now, for the reader's convenience, we restate Theorem 1.5.1 from [10]:
7.0.1. Theorem. Let p 0 , q 0 ≥ 1 and set where with X n = X n,N , σ n,m = EX n X m , N n = {n} ∪ {m : (n, m) ∈ E N } and N c n = V N \ N n , Suppose that there exists Γ > 0 such that for any N ∈ N and 0 ≤ t ≤ s ≤ 1. Furthermore assume that the limits lim N →∞ b N (t, s) = b(t, s), s, t ∈ [0, 1] exist and that lim N →∞ τ N ln 2 N = 0. (7.10) Then W N converges weakly in the Skorokhod space D to a continuous centered Gaussian process G with covariance function b(·, ·).
Theorem 7.0.1 is essentially due to A.D. Barbour, and it follows from the arguments in [1] and [2] (see the proof of Theorem 1.5.1 in [10]).
In the rest of this section we will show that all the conditions of Theorem 7.0.1 are satisfied with p 0 = 2q 0 = w where w comes from Assumption 2.1.1. Then there exists a constant C > 0 so that for any n we have $\|X_n\|_w \le CN^{-1/2}$. (7.11) That the covariances converge we have already shown. We claim next that there exists C > 0 such that (7.12) for any N ∈ N and 0 ≤ t ≤ s ≤ 1. The arguments preceding (7.2) yield that it is enough to prove that (7.12) holds true with S N in place of W N . As in (7.2), we refer the readers to the proof of Theorem 1.6.2 in [10] for the exact technical details. By Proposition 4.0.1 there exists a constant c 0 > 0 such that for any n, m ∈ N, Let d * be the maximal degree of the polynomials q i . Then for any k ≥ 0 and n ∈ N there exist at most $2\ell^2 d^*$ natural m's such that d N (n, m) = k. Hence, for any natural n we have $\sum_{m=1}^{\infty} |\mathrm{Cov}(F(\Xi_{n,N}), F(\Xi_{m,N}))| \le 6\ell^2 d^* c_0 \sum_{s=0}^{\infty} \tau(s) := A < \infty$, which implies that implying (7.12) with S N in place of W N . In the above arguments we have used Assumption 2.1.3 (and that $\theta(1 - \frac{1}{w}) > 1$) in order to derive that A is indeed finite. It remains to show that lim N →∞ τ N ln 2 N = 0. We begin with estimating d 1 and d 4 . Then for each γ 1 ∈ Γ 1 and γ 2 ∈ Γ 2 we have |γ 1 − γ 2 | > l and therefore, there exist disjoint sets Q 1 , ..., Q L , L ≤ 2ℓ + 1 so that each one of the Q i 's is contained either in Γ 1 or in Γ 2 and for any 1 ≤ i ≤ L − 1, q i ∈ Q i and q i+1 ∈ Q i+1 . Consider the random vector U = (U 1 , ..., U L ) where for each i, Let {C 1 , C 2 } be the partition of Γ 1 ∪ Γ 2 given by Then α(A, B) = α(σ{U (C 1 )}, σ{U (C 2 )}) and by Proposition 4.0.2 we have where U (C i ) = {U j : j ∈ C i } and σ{U (C i )} is the σ-algebra generated by U (C i ), i = 1, 2. We conclude from (7.3), (7.5), (7.8) and (7.11) that there exist constants C 1 and C 4 so that Next, we will estimate d 2 . Let n, m ∈ V N be so that m ∈ N n .
Set Then for any γ 1 ∈ Γ 1 and γ 2 ∈ Γ 2 we have |γ 1 − γ 2 | > l and therefore, there exist disjoint sets Q 1 , ..., Q L , L ≤ 4ℓ + 1 so that each one of the Q i 's is contained either in Γ 1 or in Γ 2 and for any 1 ≤ i ≤ L − 1, q i ∈ Q i and q i+1 ∈ Q i+1 . Using this partition, exactly as in the estimates of d 1 and d 4 we obtain that α(A, B) ≤ Lφ(r(N )). (7.14) We conclude from (7.3), (7.6) and (7.11) that there exists a constant C 2 so that Finally, by (7.11) and the Hölder inequality, each one of the summands in d 3 does not exceed CN − 3 2 for some C > 0 and therefore where we used that |N n | ≤ K 1 l(N ) for some K 1 > 0 and any n and N . Relying on (7.1), (7.13), (7.15), (7.16) and on the inequality where d and θ come from Assumption 2.1.3, we conclude that (7.10) holds true and the proof of Theorem 2.3.3 is complete.

Applications
In this section we will describe several types of processes {ξ n } for which all the results stated in Section 2 hold true.
8.1. Hidden Markov chains and related processes. Let X be a topological space and let B be the space of all bounded measurable functions on X , equipped with the supremum norm · ∞ . Let R : B → B be a positive operator so that R1 = 1, where 1 is the function which takes the constant value 1 (i.e. R is a Markov-Feller operator). We assume here that R has a stationary probability measure µ so that for any n ≥ 1 and g ∈ B we have for some sequence τ (n) which converges to 0 as n → ∞. Let {Υ n } be the stationary Markov chain with initial distribution µ, whose transition operator is R. Then the inequality (8.1) holds true with τ (n) of the form τ (n) = Ce −cn , c, C > 0 for an aperiodic Markov chain {Υ n } if, for instance, a version of the Doeblin condition holds true (see, for instance, Section 21.23 in [5]). Next, for any 0 ≤ n ≤ m let F n,m = σ{Υ n , ..., Υ m } be the σ-algebra generated by the random variables Υ n , Υ n+1 , ..., Υ m . When n is negative we set F n,m = F 0,max(0,m) . The following result is well known (see [5]), but it has a short proof which is given here for readers' convenience.
Proof. First (see [5], Ch. 4), for any two sub-σ−algebras G, H ⊂ F we have where φ(G, H) is defined in 2.2 (so φ(n) is given by (2.1)). Let k and n be nonnegative integers, and set where we used that σ{Υ 0 , Υ 1 , ..., Υ n+k } is finer than G (and the tower property of conditional expectations). Using (8.1) and taking into account that |H n+k | ≤ 1 and that Eh = EH n+k (Υ n+k ) = µ n+k (H n+k ) we obtain that Taking the supremum over all the above functions h and using (8.2) we obtain that φ(G, H) ≤ 2τ (n).
Taking the supremum over all choices of k completes the proof of the lemma.
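To make the role of the sequence τ (n) concrete, the following minimal sketch computes the deviation of R^n g from µ(g) in the supremum norm for a two-state Markov chain. It assumes that (8.1) is a bound of the form ‖R^n g − µ(g)‖∞ ≤ ‖g‖∞ τ (n); the chain, the parameters a and b, and all function names are hypothetical choices made only for illustration. For this chain the deviation decays exactly like |1 − a − b|^n, which is the exponential case τ (n) = Ce^{−cn} mentioned above.

```python
def apply_operator(P, g):
    """Return R g, where (R g)(x) = sum_y P[x][y] * g(y)."""
    return [sum(p * gy for p, gy in zip(row, g)) for row in P]


def stationary_two_state(a, b):
    """Stationary law of the chain with P = [[1 - a, a], [b, 1 - b]]."""
    return [b / (a + b), a / (a + b)]


def deviation(P, mu, g, n):
    """Return sup_x |R^n g(x) - mu(g)| (assumed left-hand side of (8.1))."""
    mu_g = sum(m * gx for m, gx in zip(mu, g))
    h = list(g)
    for _ in range(n):
        h = apply_operator(P, h)
    return max(abs(hx - mu_g) for hx in h)


# Hypothetical parameters: an aperiodic, irreducible two-state chain.
a, b = 0.3, 0.2
P = [[1 - a, a], [b, 1 - b]]
mu = stationary_two_state(a, b)
g = [1.0, -1.0]

# The deviations shrink geometrically with ratio |1 - a - b| = 0.5.
deviations = [deviation(P, mu, g, n) for n in range(6)]
```

In this example any bounded f gives a sequence ξ n = f (Υ n ) of the kind considered in this subsection.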
Let f = (f 1 , ..., f d ) be a bounded measurable function and for each n ≥ 0 set ξ n = f (Υ n ). Then, in the notation of Section 2 we have β q (r) = 0 for any r ≥ 0 and q. Therefore, all the results stated there hold true with the stationary sequence {ξ n } when τ (n) satisfies $\tau(n) \le dn^{-\theta}$ for some d > 0 and θ > 0. When the random variables ξ n above only belong to $L^w$ for some w > 2, our results hold true when $\tau(n) \le dn^{-\theta}$ for some d > 0 and $\theta > \frac{4w}{w-2}$. In particular we can consider the case when τ (n) converges exponentially fast to 0. Suppose now that (X , ρ) is a metric space. Let ρ ∞ be the metric on X N given by Let f : X N → R d be a Hölder continuous function with respect to the metric ρ ∞ , and for each n ≥ 0 set ξ n = f (Υ n , Υ n+1 , Υ n+2 , ...). In these circumstances, it is clear that for some constants A > 0 and δ ∈ (0, 1). Hence, if $\tau(n) \le dn^{-\theta}$, then all the results stated in Section 2 hold true for the stationary sequence {ξ n } defined above. Here $\theta > \frac{4w}{w-2}$, d > 0 and w > 2 satisfies $\|\xi_n\|_{L^w} < \infty$ (assuming that such a w exists).
We can also consider several types of linear Markov processes, described in what follows. Let {Υ n : n ∈ Z} be a two sided stationary Markov chain with transition operator R and stationary distribution µ. Then Lemma 8.1.1 also holds true with the σ−algebras F n,m = σ{Υ n , ..., Υ m }. Let (a n ) be a two sided sequence such that $\sum_{n \in \mathbb{Z}} |a_n| < \infty$ and let f : X → R be a bounded function so that $\int f\, d\mu = 0$. For each i set $\xi_i = \sum_{n \in \mathbb{Z}} a_n f(\Upsilon_{n+i})$.
Then {ξ i : i ∈ Z} is a bounded stationary sequence of random variables. Observe that for each n and k ≥ 0, where in the last inequality we used (8.1). Suppose that $\sum_{n=0}^{\infty} \tau(n) < \infty$. Then, using (8.3), a direct calculation shows that for any r > 0, where C is some constant. Therefore, the approximation coefficients β 2 (r) defined in Section 2 satisfy Thus, when $\sum_{|n|>r} |a_n|$ and τ (r) converge to 0 sufficiently fast as r → ∞, all of the results stated in Section 2 (with q = 2) hold true with the stationary sequence {ξ i : i ≥ 0}.
8.2. Subshifts of finite type (and uniformly hyperbolic and distance expanding maps) and continued fraction expansions. Next, we recall the definition of a (topologically mixing) subshift of finite type. Let d > 1 be a positive integer and set A = {1, 2, ..., d}. We consider here A as a discrete topological space, and let $X = A^{\mathbb{N} \cup \{0\}}$ be the product (topological) space. We define a metric on X by $d(x, y) = 2^{-\inf\{n:\, x_n \neq y_n\}}$ where we set inf ∅ = ∞ and $2^{-\infty} = 0$. Then the product topology is generated by this metric. Let A = (A i,j ) be a d × d matrix with 0 − 1 entries. Suppose that $(A^M)_{i,j} > 0$ for some M and all 1 ≤ i, j ≤ d, and set $\Sigma(A) = \{x = (x_i)_{i \ge 0} \in X : A_{x_i, x_{i+1}} = 1 \text{ for all } i \ge 0\}$. Let T : Σ(A) → Σ(A) be the left shift given by $(Tx)_i = x_{i+1}$. Then Σ(A) is T invariant. Let µ be any invariant Gibbs measure (see [4]). For each finite word (a 0 , ..., a r ) ∈ A r+1 we define its corresponding cylinder set [a 0 , ..., a r ] to be the set of all x ∈ Σ(A) so that x i = a i for any i = 0, 1, ..., r. The length of such a set is defined to be r + 1. Let F 0,n be the σ-algebra generated by all cylinder sets of length n and for each 0 ≤ n ≤ m set $F_{n,m} = T^{-n} F_{0,m-n}$. When n is negative we set F n,m = F 0,max(0,m) . Then (see [4]), these σ-algebras satisfy $\phi(n) \le A\delta^n$ for some A > 0 and δ ∈ (0, 1) (in fact, we also have exponentially fast ψ-mixing).
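The objects in the definition of a subshift of finite type can be explored on finite words. The sketch below assumes the standard conventions: a sequence is admissible when A[x_i][x_{i+1}] = 1 for every consecutive pair, the left shift drops the first symbol, and the distance between two sequences is 2 to the power minus the first index at which they differ. Finite tuples stand in for infinite sequences, so a distance of 0 here only means "no difference within the observed prefix"; the function names and the example matrix are hypothetical.

```python
def is_admissible(word, A):
    """Check A[x_i - 1][x_{i+1} - 1] == 1 for every consecutive pair of symbols."""
    return all(A[s - 1][t - 1] == 1 for s, t in zip(word, word[1:]))


def shift(word):
    """Left shift: (T x)_i = x_{i+1}, applied to a finite prefix."""
    return word[1:]


def distance(x, y):
    """2^(-n), where n is the first index with x_n != y_n (0.0 if none is seen)."""
    for n, (xn, yn) in enumerate(zip(x, y)):
        if xn != yn:
            return 2.0 ** (-n)
    return 0.0


def mat_mult(A, B):
    """Product of two square integer matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def primitive_power(A, max_power=50):
    """Smallest M <= max_power with all entries of A^M positive, else None."""
    power = A
    for M in range(1, max_power + 1):
        if all(entry > 0 for row in power for entry in row):
            return M
        power = mat_mult(power, A)
    return None


# Hypothetical example: the "golden mean" shift on the alphabet {1, 2},
# in which the word 2 2 is forbidden.
A = [[1, 1], [1, 0]]
x = (1, 2, 1, 1)
y = (1, 2, 2, 1)
```

Here primitive_power(A) finds the exponent M required in the mixing assumption, and applying shift to both words doubles their distance, which reflects the expanding nature of T.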
For each n ≥ 0 set ξ n (x) = T n f (x), where f = (f 1 , ..., f d ) is an R d -valued function, each f i is a Hölder continuous function, and x is distributed according to µ. Then $\beta_\infty(r) = \sup_{n \ge 0} \|\xi_n - \mathbb{E}[\xi_n | F_{n-r,n+r}]\|_{L^\infty} \le Ca^r$ for some C > 0 and a ∈ (0, 1). Note that when f is constant on cylinder sets (and hence Hölder continuous) then β ∞ (r) = 0 for any sufficiently large r. We conclude that all the results stated in Section 2 hold true for the sequences {ξ n } defined above. Using [4], we obtain that the results from Section 2 also hold in the case when ξ n = T n f , where f = (f 1 , ..., f d ), T is a hyperbolic diffeomorphism or an expanding transformation taken with a Gibbs invariant measure, and each f i is either Hölder continuous or piecewise constant on elements of Markov partitions.
Next, set X = (0, 1) \ Q, let T : X → X be the Gauss map which is given by $Tx = \frac{1}{x} - \left[\frac{1}{x}\right]$, where [·] denotes the integer part, and let µ be the unique absolutely continuous T -invariant probability measure given by $\mu(A) = \frac{1}{\ln 2} \int_A \frac{1}{1+x}\, dx$. Let A be the partition of X into the intervals $I_n = (\frac{1}{n+1}, \frac{1}{n})$, where n = 1, 2, .... For each i = 0, 1, 2, ... let n i (x) be the unique positive integer so that $T^i x \in I_{n_i(x)}$. Then the map x → (n 0 (x), n 1 (x), ...) represents the continued fraction expansion of x. Set $F_{n,m} = \bigvee_{j=n_0}^{m_0} T^{-j} A$, where n 0 = max(0, n) and m 0 = max(0, m). Then these σ−algebras are exponentially fast ψ-mixing, and, in particular $\phi(n) \le A\delta^n$ for some A > 0, δ ∈ (0, 1) and all nonnegative integers n. Moreover, the partition F 0,n is a partition into intervals whose lengths do not exceed $Ce^{-cn}$ for some constants c, C > 0. Therefore, all the conditions from Section 2 also hold true for stationary sequences of the form ξ n = f • T n (x), where x is distributed according to µ and f : [0, 1] → R d is either a Hölder continuous function or a function which is constant on the elements of the partition F 0,r for some fixed r.
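The digits n i (x) can be computed by iterating the Gauss map, which in its standard form is T x = 1/x − [1/x], so that n i (x) = [1/T^i x]. The following minimal sketch uses exact rational arithmetic; this is an assumption made for reproducibility only, since the text works with irrational x, and for a rational starting point the expansion terminates once the orbit reaches 0. All function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def gauss_map(x):
    """Gauss map T x = 1/x - floor(1/x) for x in (0, 1)."""
    inverse = 1 / x
    return inverse - floor(inverse)


def continued_fraction_digits(x, count):
    """Return up to `count` digits n_i(x) = floor(1 / T^i x)."""
    digits = []
    for _ in range(count):
        if x == 0:  # the orbit of a rational point ends at 0
            break
        digits.append(floor(1 / x))
        x = gauss_map(x)
    return digits


def from_digits(digits):
    """Rebuild x = 1 / (n_0 + 1 / (n_1 + ...)) from finitely many digits."""
    x = Fraction(0)
    for digit in reversed(digits):
        x = 1 / (digit + x)
    return x


# Hypothetical example point: 7/30 = 1 / (4 + 1 / (3 + 1/2)).
example_digits = continued_fraction_digits(Fraction(7, 30), 10)
```

With ξ k (x) taken to be the k-th digit, counting the indices n ≤ N at which prescribed digits occur along q 1 (n, N ), ..., q ℓ (n, N ) gives the kind of nonconventional sum studied in this paper.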
8.3. Extension to Young towers. Let (∆, ν, T ) be the noninvertible and mixing Young tower considered in [23] (or the projected tower considered in Section 3 of [22]). Let ∆ 0 be the base of the tower, R : ∆ 0 → ∆ 0 be the return time function and let $d(x, y) = \beta^{s(x,y)}$, β ∈ (0, 1) be the dynamical distance defined by the separation time s(x, y) from [23] (or the one on the projected tower in [22]). In this section we will denote the levels of the tower by ∆ ℓ , ℓ ≥ 0 where ∆ 0 is identified with ∆ 0 × {0} and for each ℓ > 0, Let L be the transfer operator associated with the tower T (so ν is its conformal measure, that is, the eigen-measure of L) and h be the eigenfunction of L (see [22] and [23]). Let us denote by A n the σ-algebra generated by all cylinder sets of length n, and let the σ−algebras F n,m be given by $F_{n,m} = T^{-n} A_{m-n}$, 0 ≤ n ≤ m while when n < 0 we set F n,m = F 0,max(0,m) .
We will consider here processes of the form ξ n (x) = f • T n (x) where f : ∆ → R d is a Hölder continuous function so that $\|f\|_{L^{2q}(\nu)} < \infty$ for some q ≥ 1, and x is distributed according to the absolutely continuous invariant measure µ given by dµ = hdν. In this case, it is clear that $\beta_\infty(r) = \sup_{n \ge 0} \|\xi_n - \mathbb{E}[\xi_n | F_{n-r,n+r}]\|_{L^\infty} \le Ce^{-cr}$ (8.4) for some C, c > 0, since we can approximate f uniformly by functions which depend only on elements of the partition A r , and E[ξ n |F n−r,n+r ] = E[f |A r ] • T n . The family of σ-algebras F n,m does not seem to be φ-mixing in the sense of Section 2, and so we cannot apply the results from Section 2 directly. Note that when the tails µ{R > j} decay polynomially sufficiently fast to 0 then the map T R : ∆ 0 → ∆ 0 is φ-mixing (see Lemma 2.4 (b) in [19]), in the sense that its corresponding family of σ−algebras F n,m is (left) exponentially φ-mixing. Relying on this we could probably extend the results from Section 2 under certain restrictions on the behaviour of the nonconventional sums between two consecutive returns to the base ∆ 0 . Still, we claim that all the results stated in Section 2 hold true for the above sequence {ξ n } when $\nu\{R > n\} \le An^{-d}$ for some A > 0 and a sufficiently large d > 0, without restrictions of that kind.
Using tower extensions, our results also hold true in the case when ξ n = f •T n , for several classes of maps which satisfy certain hyperbolicity or expansion conditions only on some pieces of the underlying manifold (or, for any non-invertible dynamical system that can be modeled by a Young tower). Results in the invertible case also follow (in the exponential tails case), by considering first the projected tower (see Section 3 in [22]) and then proceeding essentially as in Section 4 in [22] in order to derive Theorems 2.3.1 and 2.3.3 for the original system from the corresponding limit theorems on the projected tower. We refer the readers to [22], [23], [19] and [12] for examples of maps T which can be modelled by towers.
In the rest of the section we will explain how to obtain the results from Section 2 in the above Young tower setup. First, let v > 0 be a function which is constant on the levels ∆ ℓ of the tower and define a transfer operator L by Then the measure ν L given by dν L = vdν is conformal with respect to L and the function h L = h/v is preserved under L. For any measurable set A, let us denote by I A its indicator function. We will rely on 8.3.1. Lemma. There exists a constant C > 0 so that for any n ≥ 0, A ∈ A n and an arbitrary measurable set B, Here $\|\cdot\|_\infty$ stands for the supremum norm and $\|g\| := \|g\|_\infty + K_g$, where K g is the infimum of the set of values K so that |g(x) − g(y)| ≤ Kd(x, y) for any x, y in the same level of the tower.
Before proving this lemma we will first explain how it will be used. In the setup of Section 3 of [22] (the projected tower setup), for some v so that $v_\ell := v|_{\Delta_\ell} = e^{\varepsilon\ell}$ (where ∆ ℓ is the ℓ-th floor and ε > 0 is some constant), it was proved in Section 3 of [22] (Proposition A) that $\sup_{g:\, \|g\|=1} \|L^k(g)/v - \nu(g)h/v\| \le A\delta^k$ for some A > 0 and δ ∈ (0, 1). In the setup of [23], when $\nu\{R > j\} \le Cj^{-d}$ for some C > 0 and d > 2 then by taking $v_\ell = \ell^{d-2}$ (so that $\sum_\ell v_\ell\, \nu(R > \ell) < \infty$) we obtain from Proposition 3.13 in [21] that $\sup_{g:\, \|g\|=1} \|L^n(g/v) - \nu_L(g/v)h_L\|_\infty \le An^{-(d-2)}$ for some constant A > 0. In fact, the exponential case is also considered in [21] and it is also possible to get the same estimates with the norm $\|\cdot\|$ in place of the supremum norm.
When $v|_{\Delta_\ell} = e^{\varepsilon\ell}$ and B is contained in the union of the first j floors we get that $\nu_L(B) = \nu(v \cdot I_B) \le e^{\varepsilon j}\nu(B)$.
Hence, there exists a constant c > 0 so that for any j and k satisfying j ≤ ck we have φ(T −(n+k) G j , A n ) ≤ Ae −ak (8.5) where A, a > 0 and G j is the induced σ-algebra on the union of the first j floors. Similarly, when v ℓ = ℓ d−2 , then for any j, k and α ∈ (0, 1) so that j ≤ k α we have for some constant A > 0. We will show after the proof of Lemma 8.3.1 how to use (8.6) in order to derive all the results stated in Section 2 in the Young tower case.
Proof of Lemma 8.3.1. Let n, A and B be as in the statement of the lemma. Write Observe that $\|vf_n\|_\infty \le \|L^n h\|_\infty = \|h\|_\infty < \infty$. Moreover, by Lemma 4 in [12], for any x, y in the same floor we have where c 1 is some constant which does not depend on n and A (note that similar estimates appear in the Sublemma at the beginning of Section 4.2 in [22]). We conclude that
Next, we will explain how to use Lemma 8.3.1 in order to obtain functional central limit theorems for nonconventional polynomial arrays in the case when ξ n (x) = f • T n (x) discussed at the beginning of this section. For any j ≥ 0, let ∆ (j) be the union of the first j floors, and let χ j be its indicator function. Then there exists a constant C > 0 so that for any r ≥ 0, where we assumed that the tails decay at least as fast as $j^{-d}$, that $\|f\|_{2q} < \infty$ and we write $\|\cdot\|_q = \|\cdot\|_{L^q(\Delta,\mu)}$. Taking into account (8.4) with n = 0 we conclude that β q (r, j) := sup for some constant C 1 > 0. Using the approximation coefficients β q (r, j) in place of β q (r) from Section 2, and Lemma 8.3.1, the proofs of all the results stated in Section 2 proceed essentially in the same way when d/2q is sufficiently large. Indeed, all the results from Sections 5 and 6 rely only on Proposition 4.0.1 together with several combinatorial arguments. This proposition has an appropriate version which involves the approximation coefficients β q (r, j) (instead of β q (r)), since Proposition 4.0.2 can be derived also using the right φ-mixing coefficients. Using this version of Proposition 4.0.1, we obtain all the results from Sections 5 and 6 also in the Young tower case. Relying on the above version of Proposition 4.0.2, Theorem 7.0.1 can be applied successfully also in the Young tower case similarly to Section 7, using the sequence β q (r(N ), j(N )) instead of β q (r(N )), where r(N ) = [l(N )/3] is the same as in Section 7 and $j(N) = [(r(N))^\alpha]$ for a sufficiently small α ∈ (0, 1).
Note that in the appropriate applications of (8.6) we have to take k = r(N ) and so, j = j(N ) will indeed satisfy j ≤ k α .