Ergodic Theorems for Nonconventional Arrays and an Extension of the Szemerédi Theorem

The paper is primarily concerned with the asymptotic behavior as $N\to\infty$ of averages of nonconventional arrays having the form $N^{-1}\sum_{n=1}^N\prod_{j=1}^\ell T^{P_j(n,N)}f_j$ where $f_j$'s are bounded measurable functions, $T$ is an invertible measure preserving transformation and $P_j$'s are polynomials of $n$ and $N$ taking on integer values on integers. It turns out that when $T$ is weakly mixing and $P_j(n,N)=p_jn+q_jN$ are linear or, more generally, have the form $P_j(n,N)=P_j(n)+Q_j(N)$ for some integer valued polynomials $P_j$ and $Q_j$ then the above averages converge in $L^2$ but for general polynomials $P_j$ the $L^2$ convergence can be ensured even in the case $\ell=1$ only when $T$ is strongly mixing. Studying also weakly mixing and compact extensions and relying on Furstenberg's structure theorem we derive an extension of Szemer\' edi's theorem saying that for any subset of integers $\Lambda$ with positive upper density there exists a subset $\mathcal N_\Lambda$ of positive integers having uniformly bounded gaps such that for $N\in\mathcal N_\Lambda$ and at least $\varepsilon N,\,\varepsilon>0$ of $n$'s all numbers $p_jn+q_jN,\, j=1,...,\ell$ belong to $\Lambda$. We obtain also a version of these results for several commuting transformations which yields a corresponding extension of the multidimensional Szemer\' edi theorem.


1. Introduction. In 1975 Szemerédi proved the conjecture of Erdős and Turán saying that any set of integers with positive upper density contains arbitrarily long arithmetic progressions. In 1977 Furstenberg [9] published an ergodic theory proof of this result, which turned out to be a corollary of a multiple recurrence statement for measure preserving transformations.
Namely, let $(X,\mathcal B,\mu)$ be a probability space, $T:X\to X$ be an invertible $\mu$-preserving transformation and $A\in\mathcal B$ be a set of positive $\mu$-measure. Furstenberg proved that in these circumstances for any positive integer $\ell$,
$$\liminf_{N\to\infty}\frac 1N\sum_{n=1}^N\mu\Big(\bigcap_{j=0}^{\ell}T^{-jn}A\Big)>0,\tag{1.1}$$
which, in fact, implies the existence of infinitely many arithmetic progressions in any set of integers having positive upper density (for a nice exposition of this result see [13]).
An important part of the proof of (1.1) was to show that
$$\lim_{N\to\infty}\frac 1N\sum_{n=1}^N\prod_{j=1}^\ell T^{jn}f_j=\prod_{j=1}^\ell\int f_j\,d\mu\quad\text{in }L^2\tag{1.2}$$
(where $T^mf(x)=f(T^mx)$) provided $T$ is a measure preserving weakly mixing invertible transformation and the $f_j$'s are bounded measurable functions. In fact, (1.1) required more general results concerning weak mixing and compact extensions together with a structure theorem describing all possible extensions. Observe that in [2] the $L^2$ convergence (1.2) for weakly mixing transformations was extended from powers $jn$ to arbitrary essentially distinct polynomials $P_j(n)$ (i.e. having nonconstant pairwise differences) taking on integer values on integers.
In this paper we consider averages of the form
$$\frac 1N\sum_{n=1}^N\prod_{j=1}^\ell T^{P_j(n,N)}f_j\tag{1.3}$$
where the $f_j$'s are bounded measurable functions, $T$ is an invertible measure preserving transformation and $P_j(n,N)$, $j=1,\ldots,\ell$, are essentially distinct polynomials of $n$ and $N$ taking on integer values on integers. It is customary in probability to call sums whose summands depend on the number $N$ of summands (triangular) arrays, and it seems appropriate to use the same name for the sums in (1.3) too, while the term "nonconventional" comes from [11].
First, we study the linear case $P_j(n,N)=p_jn+q_jN$ where the $p_j$'s are distinct and the $q_j$'s are arbitrary integers. It turns out that under the weak mixing assumption on $T$,
$$\lim_{N\to\infty}\frac 1N\sum_{n=1}^N\prod_{j=1}^\ell T^{p_jn+q_jN}f_j=\prod_{j=1}^\ell\int f_j\,d\mu\quad\text{in }L^2.\tag{1.4}$$
In particular, when $\ell=2k$, $q_i=-p_i=k-i+1$ for $i=1,\ldots,k$ and $p_i=i-k$, $q_i=0$ for $i=k+1,\ldots,2k$, the left hand side of (1.4) takes on the following symmetric form
$$\lim_{N\to\infty}\frac 1N\sum_{n=1}^N\prod_{i=1}^{k}T^{(k-i+1)(N-n)}f_i\prod_{i=k+1}^{2k}T^{(i-k)n}f_i.\tag{1.5}$$
It is known by [18] that when $q_j=0$ for all $j=1,\ldots,\ell$ the left hand side of (1.4) still converges in $L^2$ even without the weak mixing assumption on $T$, but not necessarily to the right hand side of (1.4). On the other hand, a simple example shows that for arbitrary $q_j$'s the left hand side of (1.4) need not converge if $T$ is not weakly mixing. Indeed, take $k=1$ in (1.5) and let $T$ be the rotation of the unit circle by one half of it, while $f_1=f_2=f=I_A$ is the indicator of an arc $A$ having length less than one half of the circle. Then $(T^nf)(T^{N-n}f)=I_{T^{-n}A\cap T^{-(N-n)}A}$, and this expression equals the indicator $I_A$ of $A$ or the indicator $I_{TA}$ of $TA$ (depending on the parity of $n$) if $N$ is even, while it equals zero otherwise. Thus, the averages in (1.5) are equal to $\frac 12(I_A+I_{TA})$ for each even $N$ and to $0$ for each odd $N$.
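The rotation example can be checked by direct computation. The following Python sketch is only an illustration under assumptions made here: the circle is identified with $[0,1)$, the arc is taken to be $A=[0,3/10)$, and the helper names `rotate`, `indicator` and `symmetric_average` are hypothetical. Exact rational arithmetic is used so that membership in the arc is not affected by rounding.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
ARC_LENGTH = Fraction(3, 10)  # assumed arc A = [0, 3/10); any length < 1/2 works


def rotate(x, m):
    """Return T^m x for the half-turn rotation T x = x + 1/2 (mod 1)."""
    return (x + m * HALF) % 1


def indicator(x):
    """Indicator I_A of the arc A = [0, ARC_LENGTH), for x in [0, 1)."""
    return 1 if x < ARC_LENGTH else 0


def symmetric_average(x, N):
    """Evaluate (1/N) * sum_{n=1}^{N} f(T^n x) * f(T^(N-n) x) with f = I_A."""
    total = sum(indicator(rotate(x, n)) * indicator(rotate(x, N - n))
                for n in range(1, N + 1))
    return Fraction(total, N)
```

For a point of $A$ or of $TA$ the function returns $1/2$ for every even $N$ and $0$ for every odd $N$, in agreement with the values $\frac12(I_A+I_{TA})$ and $0$ described in the text, so the averages oscillate and have no limit as $N\to\infty$.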
We complement the above study by considering weak mixing and compact extensions, and relying on the structure theorem from [9] and [10] we conclude that for any invertible measure preserving transformation $T$, numbers $p_j,q_j$ as above and a set $A$ of positive measure there exists a subset $\mathcal N_A\subset\mathbb N$ of positive integers with uniformly bounded gaps, called a syndetic set, such that
$$\liminf_{N\to\infty,\,N\in\mathcal N_A}\frac 1N\sum_{n=1}^N\mu\Big(\bigcap_{j=0}^{\ell}T^{-(p_jn+q_jN)}A\Big)>0\tag{1.6}$$
(with $p_0=q_0=0$), while this does not hold true, in general, if we take the limit over all positive integers, which can be seen from the above example. In fact, we also show that (1.6) follows by a shorter argument relying on recent advanced results from [5] and [1] concerning convergence along Følner sequences in multidimensional multiple recurrence theorems, but our direct proof still has value since, in particular, it concentrates attention on nonconventional arrays, and convergence results like (1.4) cannot be derived from the above references. By Furstenberg's standard argument, (1.6) implies an extended version of Szemerédi's theorem saying that for any subset of integers $\Lambda$ with positive upper density there exists a syndetic subset $\mathcal N_\Lambda\subset\mathbb N$ such that for all $N\in\mathcal N_\Lambda$ and at least $\varepsilon N$, $\varepsilon>0$, of $n$'s, all numbers $p_jn+q_jN$, $j=1,\ldots,\ell$, belong to $\Lambda$. We also obtain more general results concerning families of commuting transformations $T_j,\hat T_j$, $j=1,2,\ldots,\ell$, studying the limits of
$$\frac 1N\sum_{n=1}^N\mu\Big(\bigcap_{j=0}^{\ell}(T_j^n\hat T_j^N)^{-1}A\Big)\quad\text{and}\quad\frac 1N\sum_{n=1}^N\prod_{j=1}^\ell T_j^n\hat T_j^Nf_j,$$
where $T_0=\hat T_0=\mathrm{id}$ and $A$ and $f_j$, $j=1,\ldots,\ell$, are as above.
If we consider $P_j(n,N)=P_j(n)+Q_j(N)$ in (1.3), where the $P_j$'s are essentially distinct and the $Q_j$'s are arbitrary polynomials, then the convergence in $L^2$ of the nonconventional averages (1.3) to the product of integrals can be established under the weak mixing assumption. On the other hand, already for $\ell=1$ and $P_1(n,N)=nN$ weak mixing is not sufficient, in general, for the $L^2$ convergence of the averages (1.3), though strong mixing suffices here. For $\ell>1$ and general polynomials $P_j(n,N)$ we show $L^2$ convergence of the expression (1.3) assuming strong $2\ell$-mixing of $T$.
2. Preliminaries and main results. Let $(X,\mathcal B,\mu)$ be a separable probability space and $T:X\to X$ be an invertible $\mu$-preserving transformation. In studying polynomial nonconventional averages (1.3) we start with the linear case $P_j(n,N)=p_jn+q_jN$. Let $f_i$, $i=0,1,\ldots,\ell$, be bounded measurable functions on $X$. The example described in the Introduction shows that, in general, the limit (2.1) of the corresponding averages (1.3) as $N\to\infty$ does not exist in the $L^2$ sense. Still, we will see that the limit (2.1) exists in the $L^2$ sense if $T$ is weakly mixing, which means that the product transformation $T\times T$ on $(X\times X,\mathcal B\times\mathcal B,\mu\times\mu)$ is ergodic (see, for instance, [10]). Thus we have the following $L^2$ ergodic theorem for nonconventional arrays.
Theorem 2.1. Suppose that an invertible transformation $T$ is weakly mixing, $f_j$, $j=1,\ldots,\ell$, are bounded measurable functions and $p_j,q_j$, $j=1,\ldots,\ell$, are integers such that the $p_j$'s are distinct (ordered without loss of generality as $p_1 < p_2 < \cdots < p_\ell$) and the $q_j$'s are arbitrary. Then
$$\lim_{N\to\infty}\frac 1N\sum_{n=1}^N\prod_{j=1}^\ell T^{p_jn+q_jN}f_j=\prod_{j=1}^\ell\int f_j\,d\mu\quad\text{in }L^2.\tag{2.2}$$
The condition of Theorem 2.1 that $p_j$, $j=1,\ldots,\ell$, are distinct is important for (2.2). Indeed, let $\ell=2$, $p_1=p_2=p$ and $\int f_1\,d\mu=0$. Then
$$\frac 1N\sum_{n=1}^NT^{pn+q_1N}f_1\,T^{pn+q_2N}f_2=\frac 1N\sum_{n=1}^NT^{pn+q_1N}\big(f_1\,T^{(q_2-q_1)N}f_2\big),$$
which does not converge to zero as $N\to\infty$, in general, unless $T$ is (strongly) mixing (and $q_1\ne q_2$), while under weak mixing only convergence outside of a set of $N$'s having zero density can be ensured. We observe (as pointed out by the referee) that Theorem 2.1 actually follows from Theorem 3.2 in the recent paper [19], though the motivation and goals of the latter paper seem to be different from ours. In fact, we will study convergence in the more general situation of weak mixing extensions and, in addition, will also consider compact extensions, which together with the structure theorem from [9] and [10] will produce the following result.
Theorem 2.2. Let $p_j,q_j$, $j=0,1,\ldots,\ell$, be integers such that $p_0=q_0=0$, $p_j\ne 0$ if $j\ne 0$ and $p_1 < p_2 < \cdots < p_\ell$. Then for any $A\in\mathcal B$ with $\mu(A)>0$ there exists an infinite subset of positive integers $\mathcal N_A\subset\mathbb N$ with uniformly bounded gaps such that
$$\liminf_{N\to\infty,\,N\in\mathcal N_A}\frac 1N\sum_{n=1}^N\mu\Big(\bigcap_{j=0}^{\ell}T^{-(p_jn+q_jN)}A\Big)>0.\tag{2.3}$$
As the example in the Introduction shows, this statement does not hold true, in general, if we take $\liminf$ over all $N\to\infty$. On the other hand, if $q_j=0$ for all $j$ then (2.3) was proved in [9] (see also [13]) with $\liminf$ over all $N\to\infty$, and it was shown there how such a result yields the Szemerédi type theorem. Recall briefly the latter argument. Let $\{0,1\}^{\mathbb Z}=\{\omega=(\omega_i):\ \omega_i\in\{0,1\},\ -\infty<i<\infty\}$ be the space of sequences, $(T\omega)_i=\omega_{i+1}$ be the left shift, and consider the special sequence $\bar\omega=(\bar\omega_i)_{i=-\infty}^{\infty}$ (where $\bar\omega_i=1$ if and only if $i\in\Lambda$) with $\Lambda\subset\mathbb Z$ being a subset of integers with positive upper density (also called the upper Banach density), i.e., for some sequence of intervals $[a_n,b_n]$ with $b_n-a_n\to\infty$ as $n\to\infty$,
$$\lim_{n\to\infty}\frac{|\Lambda\cap[a_n,b_n]|}{b_n-a_n}>0,\tag{2.4}$$
denoting by $|\Gamma|$ the number of elements in a set $\Gamma$. Take $X$ to be the closure in $\{0,1\}^{\mathbb Z}$ of $\{T^n\bar\omega\}_{n=-\infty}^{\infty}$; then any weak limit $\mu$ of the sequence of measures $\mu_n=(b_n-a_n)^{-1}\sum_{j=a_n}^{b_n}\delta_{T^j\bar\omega}$ (where $\delta_\omega$ is the unit mass at $\omega$) is a $T$-invariant probability measure on $X$, and if $A=\{\omega\in X:\ \omega_0=1\}$ then $\mu(A)>0$ by (2.4). It is easy to see that $\Lambda$ contains an arithmetic progression of length $\ell$ if and only if $\bigcap_{j=0}^{\ell-1}T^{-jb}A$ is nonempty for some $b\ne 0$. More generally, $\Lambda$ contains all numbers $a+p_jn+q_jN$, $j=0,1,\ldots,\ell$, for some $a\in\Lambda$ if and only if $\bigcap_{j=0}^{\ell}T^{-(p_jn+q_jN)}A$ is nonempty. Thus, Theorem 2.2 yields the following result.
Corollary 2.3. Let $\Lambda$ be a subset of nonnegative integers with positive upper density and $p_j,q_j$, $j=0,1,\ldots,\ell$, be integers satisfying the conditions of Theorem 2.2. Then there exist $\varepsilon>0$ and an infinite set of positive integers $\mathcal N_\Lambda$ with uniformly bounded gaps such that for any $N\in\mathcal N_\Lambda$ the interval $[0,N]$ contains not less than $\varepsilon N$ integers $n$ with the property that for some $a_n$,
$$a_n+p_jn+q_jN\in\Lambda\quad\text{for all }j=0,1,\ldots,\ell.\tag{2.5}$$
In particular, if $\ell=2k$, $q_j=-p_j=k-j+1$ for $j=1,\ldots,k$ and $p_j=j-k$, $q_j=0$ for $j=k+1,\ldots,2k$, then for at least $\varepsilon N$, $N\in\mathcal N_\Lambda$, integers $n$ in the interval $[0,N]$ the set $\Lambda$ contains arithmetic progressions of length $k+1$ both of step $n$ and of step $N-n$.
Clearly, the above corollary does not hold true, in general, if we replace N Λ by all positive integers. Indeed, let Λ be the set of all even numbers then a + n and a + (N − n) cannot both belong to Λ if N is odd since then a + n and a + (N − n) cannot be both even.
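The parity obstruction in this counterexample can be verified by brute force. The sketch below is an illustration only; the function names, the search window for $a$, and the choice $k=1$ (so that the three numbers are $a$, $a+(N-n)$ and $a+n$) are assumptions made here for the check.

```python
def count_good_n(in_lambda, N, shifts, a_values):
    """Count n in [0, N] such that, for some a in a_values, every number
    a + p*n + q*N with (p, q) in shifts belongs to the set Lambda."""
    count = 0
    for n in range(N + 1):
        if any(all(in_lambda(a + p * n + q * N) for p, q in shifts)
               for a in a_values):
            count += 1
    return count


def is_even(m):
    """Membership test for Lambda = the set of even integers."""
    return m % 2 == 0


# (p_0, q_0) = (0, 0), (p_1, q_1) = (-1, 1), (p_2, q_2) = (1, 0):
# the three numbers are a, a + (N - n) and a + n.
SHIFTS = [(0, 0), (-1, 1), (1, 0)]
```

With $\Lambda$ the even numbers, `count_good_n(is_even, N, SHIFTS, range(20))` is zero for every odd $N$, while for even $N$ it counts exactly the even $n$ in $[0,N]$; this is why the set $\mathcal N_\Lambda$ cannot, in general, be replaced by all positive integers.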
Next, we will discuss an extension of the above results to families of commuting transformations, which will also yield a multidimensional version of Corollary 2.3. Let $G$ be a multiplicative free finitely generated abelian group acting on $X$ by $\mu$-preserving transformations, which are necessarily invertible. Any such group is isomorphic to a $d$-dimensional integer lattice group $\mathbb Z^d$. Let $f_i$, $i=0,1,\ldots,\ell$, be bounded measurable functions on $X$. As in the case of one transformation, in general, the limit does not exist if $N\to\infty$ over all $N$. Nevertheless, we will see that the limit (2.6) exists in the $L^2$ sense if the abelian group $G$ is totally weak mixing, i.e. it consists of weakly mixing transformations with the only exception of the identity.
Theorem 2.4. Suppose that distinct and different from the identity ( id) transformations T 1 , ..., T belong to a totally weak mixing free finitely generated abelian group G acting on (X, B, µ) by measure preserving transformations. LetT 1 , ...,T be invertible µ-preserving transformations of X, which commute with each other and with T 1 , ..., T . Then for any bounded measurable functions f j , j = 0, 1, ..., , where T 0 =T 0 = id.
Considering weak mixing and primitive extensions we will obtain the following generalization of Theorem 2.2.
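Theorem 2.5. Let $T_j,\hat T_j\in G$, $j=1,\ldots,\ell$, where $T_1,\ldots,T_\ell$ are distinct and different from the identity $\mathrm{id}$ of $G$ while $\hat T_1,\ldots,\hat T_\ell$ are any transformations from $G$. Then for any $A\in\mathcal B$ with $\mu(A)>0$ there exists an infinite subset of positive integers $\mathcal N_A\subset\mathbb N$ with uniformly bounded gaps such that
$$\liminf_{N\to\infty,\,N\in\mathcal N_A}\frac 1N\sum_{n=1}^N\mu\Big(\bigcap_{j=0}^{\ell}(T_j^n\hat T_j^N)^{-1}A\Big)>0,\tag{2.8}$$
where $T_0=\hat T_0=\mathrm{id}$.
Clearly, if we set $T_j=T^{p_j}$ and $\hat T_j=T^{q_j}$ then we arrive back at the setup of Theorem 2.2. For $\hat T_j$, $j=1,2,\ldots,\ell$, equal to the identity, (2.8) was proved in [12] with $\mathcal N_A=\mathbb N$, but our proof will follow more closely Chapter 7 of [10]. Similarly to the one transformation case, Theorem 2.5 yields an extension of a multidimensional version of the Szemerédi theorem. Recall the notion of the upper (Banach) density of a set $\Lambda\subset\mathbb Z^d$. For any two vectors $\bar a=(a_1,\ldots,a_d)$ and $\bar b=(b_1,\ldots,b_d)$ in $\mathbb Z^d$ with $a_i\le b_i$ for all $i$, let $B(\bar a,\bar b)=\{z=(z_1,\ldots,z_d)\in\mathbb Z^d:\ a_i\le z_i\le b_i,\ i=1,\ldots,d\}$. The set $\Lambda$ has positive upper (Banach) density if for some sequence of boxes $B(\bar a(n),\bar b(n))$ with $\min_i(b_i(n)-a_i(n))\to\infty$ as $n\to\infty$,
$$\lim_{n\to\infty}\frac{|\Lambda\cap B(\bar a(n),\bar b(n))|}{|B(\bar a(n),\bar b(n))|}>0,\tag{2.9}$$
where, again, $|\Gamma|$ denotes the number of points in a set $\Gamma$. Since the group in Theorem 2.5 is isomorphic to $\mathbb Z^d$ we can identify the actions of $T_j$ and $\hat T_j$ with additions of some vectors $z_i\in\mathbb Z^d$ and $\hat z_i\in\mathbb Z^d$. For any ordered finite set $\Gamma=\{z_1,\ldots,z_\ell\}$, $z_i\in\mathbb Z^d$, $n\in\mathbb Z$ and $a\in\mathbb Z^d$ we set $n\Gamma=\{nz_1,\ldots,nz_\ell\}$ and $a+\Gamma=\{a+z_1,\ldots,a+z_\ell\}$. Next, if $\Gamma=\{z_1,\ldots,z_\ell\}$ and $\hat\Gamma=\{\hat z_1,\ldots,\hat z_\ell\}$, $z_i,\hat z_i\in\mathbb Z^d$, are two ordered finite sets then we write $\Gamma+\hat\Gamma=\{z_1+\hat z_1,\ldots,z_\ell+\hat z_\ell\}$. Now Theorem 2.5 yields the following extension of the multidimensional Szemerédi theorem.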
Corollary 2.6. Let $\Lambda$ be a subset of $\mathbb Z^d$ with positive upper (Banach) density and let $\Gamma=\{z_1,\ldots,z_\ell\}$, $\hat\Gamma=\{\hat z_1,\ldots,\hat z_\ell\}$ be two ordered sets of vectors from $\mathbb Z^d$ such that $z_1,z_2,\ldots,z_\ell$ are all distinct and nonzero. Then there exist $\varepsilon>0$ and an infinite set of positive integers $\mathcal N_\Lambda$ with uniformly bounded gaps such that for any $N\in\mathcal N_\Lambda$ the interval $[0,N]$ contains not less than $\varepsilon N$ integers $n$ such that for some $a_n\in\Lambda$,
$$a_n+n\Gamma+N\hat\Gamma\subset\Lambda.\tag{2.10}$$
Corollary 2.6 follows from Theorem 2.5 similarly to the one transformation case. Namely, we consider the action of $\mathbb Z^d$ on $\{0,1\}^{\mathbb Z^d}$ by shifts. Again, we take $X$ to be the closure in $\{0,1\}^{\mathbb Z^d}$ of the orbit $\mathbb Z^d\bar\omega$ of the special sequence $\bar\omega=(\bar\omega_v)$ (with $\bar\omega_v=1$ if and only if $v\in\Lambda$), and a $\mathbb Z^d$-invariant measure $\mu$ comes as a weak limit as $n\to\infty$ of the measures $|B(\bar a(n),\bar b(n))|^{-1}\sum_{z\in B(\bar a(n),\bar b(n))}\delta_{z\bar\omega}$, where $B(\bar a(n),\bar b(n))$, $n=1,2,\ldots$, are the same as in (2.9).
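The ordered-set operations $n\Gamma$, $a+\Gamma$ and $\Gamma+\hat\Gamma$ entering (2.10) are componentwise and can be made concrete in a few lines. This is a minimal sketch for illustration only: vectors of $\mathbb Z^d$ are represented as Python tuples, ordered sets as lists, and all function names are hypothetical.

```python
def scale(n, gamma):
    """n*Gamma = [n*z_1, ..., n*z_l] for an ordered list of integer vectors."""
    return [tuple(n * c for c in z) for z in gamma]


def add_ordered(gamma, gamma_hat):
    """Gamma + Gamma_hat = [z_1 + zhat_1, ..., z_l + zhat_l] (same length)."""
    if len(gamma) != len(gamma_hat):
        raise ValueError("ordered sets must have the same number of vectors")
    return [tuple(c + d for c, d in zip(z, zh))
            for z, zh in zip(gamma, gamma_hat)]


def translate(a, gamma):
    """a + Gamma = [a + z_1, ..., a + z_l]."""
    return [tuple(c + d for c, d in zip(a, z)) for z in gamma]


def configuration(a, n, N, gamma, gamma_hat):
    """The ordered set a + n*Gamma + N*Gamma_hat appearing in (2.10)."""
    return translate(a, add_ordered(scale(n, gamma), scale(N, gamma_hat)))
```

For example, with $\Gamma=\{(1,0),(0,1)\}$, $\hat\Gamma=\{(0,0),(1,1)\}$, $a=(2,3)$, $n=2$ and $N=3$ the configuration is $\{(4,3),(5,8)\}$; the corollary asserts that such configurations lie inside $\Lambda$ for many $n$ whenever $N\in\mathcal N_\Lambda$.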
The proofs of the above results proceed similarly to [13] and [10], and so we will be trying to make a compromise between keeping the paper relatively self-contained and still avoiding too many repetitions of arguments from [13] and [10]. Though, of course, Theorem 2.2 is a particular case of Theorem 2.5, in order to facilitate the reading, we will consider first the one transformation case and then pass to the case of commuting transformations.
As we mentioned in the Introduction, it is possible to give a shorter argument yielding Theorems 2.2 and 2.5, which will be presented in Section 5. This argument relies on quite general results from the recent paper [1]. In fact, this argument together with [5] yields Theorem 2.2 with the linear terms $p_in+q_iN$ replaced by arbitrary polynomials $p_i(n,N)$, $i=1,\ldots,\ell$, taking on integer values for integer pairs $n,N$ and such that for any integer $k$ there exist $n$ and $N$ with $p_i(n,N)$ divisible by $k$ for each $i=1,\ldots,\ell$. The results in [5] and [1] rely on advanced machinery developed in order to derive convergence of nonconventional averages in various situations. The direct proof presented here, which proceeds along the lines of the original proofs in [13] and [10], still seems to be useful, in particular, for focusing attention on the limiting behavior of nonconventional arrays, which is a somewhat different point of view in comparison to other research on multiple recurrence problems, and since Theorems 2.1 and 2.4 do not follow from [5] and [1].
Remark 2.7. As we have seen, the limit in Theorem 2.1 does not exist, in general, without the weak mixing assumption, but it is plausible that the limit may exist over syndetic subsequences of $N$'s. It would also be interesting to obtain some uniform versions of Theorems 2.2 and 2.5 in the spirit of [3]. It would also be natural to find the most general conditions which ensure almost everywhere convergence of averages of nonconventional arrays, though this question is not completely settled even for standard nonconventional averages (i.e. without dependence of the summands on $N$). Finally, we observe that it may be interesting to obtain a result of the type of Corollary 2.3 for the set of primes in place of a set of positive upper density, extending to this situation the main result of [14]. In this case the relevant sets of $N$'s will probably have gaps containing only a bounded number of primes.
Next, we consider averages of nonconventional arrays (1.3) with higher degree polynomials P j (n, N ), j = 1, ..., . When we can separate dependencies on n and N , then applying the "PET-induction" from [4] for polynomials in n and, essentially, treating N as a constant there, we will obtain in Section 6 the following result.
Theorem 2.8. Let $T_1,\ldots,T_k$ be transformations different from the identity belonging to a totally weak mixing finitely generated free abelian group $G$ acting on $(X,\mathcal B,\mu)$ by measure preserving transformations, and let $\hat T_1,\ldots,\hat T_k$ be invertible $\mu$-preserving transformations of $X$ which commute with each other and with $T_1,\ldots,T_k$. Furthermore, let $P_{ij}(n)$, $i=1,\ldots,\ell$, $j=1,\ldots,k$, be polynomials taking on integer values on integers and suppose that the expressions
$$\prod_{j=1}^kT_j^{P_{ij}(n)},\quad i=1,\ldots,\ell,$$
and the expressions
$$\prod_{j=1}^kT_j^{P_{ij}(n)-P_{i'j}(n)},\quad i\ne i',$$
depend nontrivially on $n$ (i.e. that they are nonconstant maps from $\mathbb Z$ to $G$). In addition, let $Q_{ij}(N)$, $i=1,\ldots,\ell$, $j=1,\ldots,k$, be arbitrary functions of $N$ taking on integer values on integers. Then, for any bounded measurable functions $f_i$, $i=1,\ldots,\ell$,
$$\lim_{N\to\infty}\frac 1N\sum_{n=1}^N\prod_{i=1}^\ell\Big(\prod_{j=1}^kT_j^{P_{ij}(n)}\hat T_j^{Q_{ij}(N)}\Big)f_i=\prod_{i=1}^\ell\int f_i\,d\mu\quad\text{in }L^2.\tag{2.11}$$
If all $T_i$'s and $\hat T_i$'s coincide with one transformation $T$ then (2.11) becomes
$$\lim_{N\to\infty}\frac 1N\sum_{n=1}^N\prod_{j=1}^\ell T^{P_j(n,N)}f_j=\prod_{j=1}^\ell\int f_j\,d\mu\quad\text{in }L^2\tag{2.12}$$
with $P_j(n,N)$'s taking the form $P_j(n,N)=P_j(n)+Q_j(N)$, where the $P_j(n)$'s are nonconstant essentially distinct polynomials of $n$ and the $Q_j(N)$'s are functions of $N$, both taking on integer values on integers. It turns out that for general polynomials of $n$ and $N$ weak mixing may not be enough for the $L^2$ convergence in (1.3). In Section 6 we will show, employing a version of a spectral argument suggested to us by Benji Weiss, that already the averages $\frac 1N\sum_{n=1}^NT^{nN}f$ do not converge in $L^2$ as $N\to\infty$, in general, if $T$ is only weak mixing. Still, strong mixing of $T$ ensures convergence in $L^2$ for this example. More generally, we will prove the following result, where we rely on the notion of strong $m$-mixing, which means that for any $A_1,\ldots,A_m\in\mathcal B$,
$$\mu\Big(\bigcap_{i=1}^mT^{-n_i}A_i\Big)\to\prod_{i=1}^m\mu(A_i)\quad\text{as }\min_{i\ne j}|n_i-n_j|\to\infty.$$
Theorem 2.9. Let $P_j(n,N)$, $j=1,\ldots,\ell$, be nonconstant essentially distinct polynomials of $n$ and $N$ (i.e. $P_i(n,N)-P_j(n,N)$, $i\ne j$, is not identically a constant) taking on integer values on integers and nontrivially depending on $n$ (i.e. $P_i(n,N)$ is not just a polynomial of $N$). If $T$ is a strongly $2\ell$-mixing invertible transformation of $(X,\mathcal B,\mu)$ then (2.12) holds true for any bounded measurable functions $f_j$, $j=1,\ldots,\ell$.
Observe that both conditions, that the polynomials $P_j$ are essentially distinct and that they depend nontrivially on $n$, are important for Theorem 2.9 to hold true. As to the first condition, consider $\frac 1N\sum_{n=1}^NT^n(f\,Tg)$, which by the $L^2$ ergodic theorem converges as $N\to\infty$ to $\int f\,Tg\,d\mu$, which usually differs from the product of the integrals of $f$ and $g$. As to the second condition, we can consider, for instance, $\ell=1$ and $P_1(n,N)=N$, when the averages reduce to $T^Nf$, which does not converge in $L^2$ as $N\to\infty$ unless $f$ is constant. It would be natural to try to show that for $\ell\ge 2$ strong mixing (i.e. 2-mixing) is not enough, in general, for Theorem 2.9 to hold true, but this is not easy since then we would have to construct an example of a 2-mixing but not $2\ell$-mixing transformation, which is a version of the old open problem attributed to Rokhlin.
We observe that such dynamical systems as topologically mixing subshifts of finite type, Axiom A diffeomorphisms and expanding transformations considered with an invariant Gibbs measure constructed by a Hölder continuous function (potential) are strong mixing of all orders, so the above theorem is applicable to them. This is also true for the Gauss map $Tx=1/x \bmod 1$, $x\in(0,1)$, $T0=0$, considered with its Gauss invariant measure $\mu(\Gamma)=\frac{1}{\ln 2}\int_\Gamma\frac{dx}{1+x}$, as well as for some other maps of the interval. Actually, mixing of all orders follows from the property called in probability $\alpha$-mixing (or strong mixing), and the above dynamical systems have this property (and even the stronger property called $\psi$-mixing with exponential speed, see, for instance, [6], [17] and [7]).
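The invariance of the Gauss measure $\mu(\Gamma)=\frac{1}{\ln 2}\int_\Gamma\frac{dx}{1+x}$ under $Tx=1/x \bmod 1$ can be checked numerically on intervals $[0,t]$, using the fact that $T^{-1}[0,t]$ is, up to a null set, the union over $k\ge 1$ of the intervals $[1/(k+t),1/k]$. The sketch below is an illustration only: the function names and the truncation level `terms` are assumptions made here, and the truncated sum falls short of the exact value by roughly $t/(\text{terms}\cdot\ln 2)$.

```python
import math


def gauss_measure(a, b):
    """Gauss measure of [a, b] in [0, 1]: (1/ln 2) * integral of dx/(1+x)."""
    return (math.log1p(b) - math.log1p(a)) / math.log(2)


def preimage_measure(t, terms=100_000):
    """Gauss measure of T^{-1}[0, t], truncated to the first `terms` branches.

    For T x = 1/x mod 1 the preimage of [0, t] is the union over k >= 1
    of the intervals [1/(k + t), 1/k].
    """
    return math.fsum(gauss_measure(1 / (k + t), 1 / k)
                     for k in range(1, terms + 1))
```

Comparing `preimage_measure(t)` with `gauss_measure(0, t)` for a few values of $t$ shows agreement up to the truncation error, which reflects the telescoping product $\prod_{k\ge 1}\frac{(k+1)(k+t)}{k(k+t+1)}=1+t$.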
These notions are defined via two parameter families of σ-algebras F mn ⊂ F, −∞ < m ≤ n < ∞ on a probability space (X, F, P ) such that F mn ⊂ F m n if m ≤ m ≤ n ≤ n . We define also F mn for m = −∞ and n < ∞, for m > −∞ and n = ∞ or for m = −∞ and n = ∞ as minimal σ-algebras containing F kn for all k > −∞, containing F ml for all l < ∞ or containing F kl for all −∞ < k ≤ l < ∞, respectively. Such family of σ-algebras is called α-mixing if Now, we have the following result which is probably well known but for readers' convenience we provide details here. (2.14) then applying the definition of the mixing coefficient α subsequently we obtain that and since ε > 0 is arbitrary (2.14) follows.
Observe, that a typical application of the above setup is in the symbolic framework where X is a sequence space, T is the left shift and the σ-algebras F mn are generated by the cylinder sets for which the sequence elements on places from m to n are fixed. This can be extended to dynamical systems having appropriate symbolic representations via, for instance, Markov partitions.
3. One transformation case. In this section we will establish Theorems 2.1, 2.2 and Corollary 2.3.

3.1. Factors and extensions.
The strategy of our proof is the same as in [13]. It is based on the notions of factors and extensions. Recall that if $T$ is a measure preserving transformation of a probability space $(X,\mathcal B,\mu)$ and $T^{-1}\mathcal B_1\subset\mathcal B_1\subset\mathcal B$ then $(X,\mathcal B_1,\mu,T)$ is called a factor of $(X,\mathcal B,\mu,T)$ while the latter is called an extension of $(X,\mathcal B_1,\mu,T)$. This factor is said to be nontrivial if $\mathcal B_1$ contains sets of measure strictly between 0 and 1. It is often more convenient to view factors in the following equivalent way (see [13] for more details). Namely, the factor $(X,\mathcal B_1,\mu,T)$ is identified with a system $(Y,\mathcal D,\nu,S)$ such that for some measurable onto map $\pi:X\to Y$ we have $\pi\mu=\nu$, $\pi T=S\pi$ and $\mathcal B_1=\pi^{-1}\mathcal D$. Furthermore, $\mu$ disintegrates into $\mu_y$, $y\in Y$, so that $\mu=\int\mu_y\,d\nu(y)$ and $T\mu_y=\mu_{Sy}$ $\nu$-almost everywhere (a.e.).
Next, let g ∈ L 2 (X, B, µ) and let Y = (Y, D, ν, S) be a factor of (X, B, µ, T ). Following [13] we set This is essentially the conditional expectation E(g|B 1 ) provided (Y, D, ν, S) is identified with (X, B 1 , µ, T ). Since B 1 = π −1 D and Y = πX then E(g|B 1 ) is constant on π −1 y for ν-almost all y, and so this conditional expectation can be viewed as a function on Y . Since we refer often to [13] we will keep the notations from there though they differ slightly from the way conditional expectations with respect to σ-algebras are written in probability. We will use also the following well known formulas provided f and f g are integrable.
Fix a measure preserving system $(X,\mathcal B,\mu,T)$ and let $\mathcal B_1\subset\mathcal B$ be a $T$-invariant $\sigma$-subalgebra. If (2.3) holds true for any $A\in\mathcal B_1$ with $\mu(A)>0$ and any $p_j,q_j$, $j=0,1,\ldots,\ell$, satisfying the conditions of Theorem 2.2, then we say that the action of $T$ on the factor $(X,\mathcal B_1,\mu)$ is generalized Szemerédi (GSZ). For brevity we will also say in this case that the action of $T$ on $\mathcal B_1$ is GSZ, and if $(X,\mathcal B_1,\mu,T)$ is identified with $(Y,\mathcal D,\nu,S)$ then this is equivalent to saying that the action of $S$ on $\mathcal D$ is GSZ.
Similarly to [13] we can see that the set of factors for which T is GSZ contains a maximal element and that no proper factor can be maximal. The proof of Theorem 2.2 is based on the notions of relative weak mixing and relative compact extensions of other factors, which will be defined below. We will show that if the action T is GSZ for smaller factor then it is also GSZ for a larger factor which is either relative mixing or relative compact with respect to the smaller factor. Considered together with two following facts this will yield our result. First, similarly to [13] we see that if T is GSZ for a totally ordered (by inclusion) family of factors {B α } (i.e. factors (X, B α , µ)) then T is GSZ for sup α B α (i.e. for (X, sup α B α , µ)) where the latter is the minimal σ-algebra containing each B α . Secondly, we rely on the general result from [13] saying that if X = (X, B, µ, T ) is an extension of Y = (Y, D, ν, S), which is not relative weak mixing, then there exists an intermediate factor X * between Y and X so that X * is a (relative) compact extension of Y.
Proof. The proof proceeds similarly to Theorem 8.3 in [13]. Recall, that the conditional expectations E(f j |Y) can be viewed as functions in both L ∞ (X, B, µ) and in L ∞ (Y, D, ν), which is identified with L ∞ (X, B 1 , µ), and so this conditional expectation is B 1 -measurable. Denote the assertions (3.2) and (3.3) by A m and B m , respectively, where both mean that they hold true for all relatively weak mixing extensions of (Y, D, ν, S) and all L ∞ functions on corresponding spaces. First, observe that A 0 is obvious and B 0 will not play a role here so we can denote by B 0 any correct assertion. Next, we proceed by induction in m showing that (cf. [13]), (i) A m−1 implies B m and (ii) B m for (X,B,μ,T ) (which is also a relative weak mixing extension of (Y, D, ν, S)) implies A m for (X, B, µ, T ).
We start with (ii) which is easier. If f 0 is measurable with respect to B 1 = π −1 (D), the integrals in (3.2) have the form where g ⊗ g(y, z, z ) = g(y, z)g(y, z ) is a function onX whenever g is a function on X (see (6.6) in [13]). By B m for (X,B,μ,T ) the above limit equals Since the sum here is B 1 -measurable we can insert the conditional expectation inside of the integral concluding that the latter limit is zero since completing the proof of (ii). In order to prove (i) we observe that It follows that it suffices to prove B m under the additional condition that for some where H will be chosen large but much smaller than N . By the convexity of the function ϕ(x) = x 2 we have (up to O(H/N )), By integration and the fact that T is measure preserving, Set r = k − n and observe that a pair (n, k) appears in the above sums only if |r| = |k − n| < H and then for H − |r| values of j we rewrite the above estimate as Tp Inserting conditional expectation inside the integral and using A m−1 for a fixed H, every r such that |r| < H and N large enough we can replace the integral term in the above inequality by Hence, we obtain Next, we estimate the integrals appearing in (3.4) by Since E(f j0 |Y) = 0 we obtain from A 1 for the case when q 1 = 0, which is proved as Lemma 8.1 in [13] (where the ergodicity ofT by the definition of weak mixing extensions is used), that Hence, most of the terms in the right hand side of (3.4) are small provided that H is large enough. Since all terms in the right hand side of (3. Now Theorem 2.1 is a particular case of (3.3) considering a trivial factor Y, i.e. such that the corresponding σ-algebra B 1 contains only sets of zero or full measure. As to Theorem 2.2 we will need the following corollary of Proposition 3.1. Proof. The result follows immediately from (3.3) in the same way as in Theorem 8.4 from [13].
We observe that Proposition 3.1 implies also that if (X, B, µ, T ) is a relative weak mixing extension of (X, B 1 , µ, T ) and (2.3) holds true for any A ∈ B 1 , µ(A) > 0 with lim inf taken over all N → ∞ then the same is true for any A ∈ B, µ(A) > 0, and so the restriction of lim inf to N ∈ N A comes not from relative weak extensions but from relative compact extensions which will be studied below.
3.3. Relative compact extensions. For brevity and following [13] we will drop here the word "relative" and will speak about compact extensions. Recall, that (X, B, µ, T ) is said to be a compact extension of (Y, D, ν, S) if there exists a set R ⊂ L 2 (X, B, µ) dense in L 2 (X, B, µ) and such that for every δ > 0 there exist functions g 1 , ..., g m ∈ L 2 (X, B, µ) satisfying where, again, µ = µ y dν(y).
As explained in Section 3.1 above the proof of Theorem 2.2 will be complete after we establish the following result. Proof. We will follow the proof of Theorem 9.1 from [13] with a modification at the end. For an arbitrary A ∈ B with µ(A) > 0 we have to show that (2.3) holds true. First, similarly to [13] we conclude that without loss of generality the indicator function f = I A of A can be assumed to belong to the set R appearing in the above definition of compact extensions. We will assume for convenience that T is ergodic, otherwise pass to an ergodic decomposition. Then S is also ergodic. The condition f ∈ R is equivalent to saying that the sequence {T k f } k∈Z is totally bounded, or relatively compact, in L 2 (µ y ) for almost all y. Since T µ y = µ Sy we conclude that the total boundedness of {T k f } k∈Z in L 2 (µ y ) for y in a set of positive measure already implies for an ergodic S that {T k f } k∈Z is totally bounded in a uniform manner in L 2 (µ y ) for almost all y.
Denote by ⊕ j=0 L 2 (µ y ) the direct sum of + 1 copies of L 2 (µ y ) endowed with the norm (f 0 , f 1 , ..., f ) y = max j f j L 2 (µy) . It is clear that if f ∈ R then the set where (·, ..., ·) y means that the vector function is considered on a fiber above y ∈ Y and, recall, f = I A ∈ R. Throwing away ν-measure zero set of y's we can assume that uniform estimates hold true on the whole Y .
Set ). Thus, we can assume without loss of generality that µ y (A) = 0 for all y ∈ A 1 . We consider only y ∈ A 1 for which the corresponding elements of L( , f, y) have all nonzero components, and so these elements have norm ≥ 1 2 µ(A) in L 2 (µ y ). The corresponding subset of L(m, f, y) is denoted by L * ( , f, y) and it is still uniformly totally bounded. For each y ∈ A 1 and ε > 0 let M (ε, y) denote the maximum cardinality of ε-separated sets in L * ( , f, y), which is a finite monotone decreasing piece-wise constant function of ε with at most countably many jumps. Since M (ε, y) is measurable as a function of y there exist ε 0 < µ(A)/10 , η > 0 and A 2 ⊂ A 1 with ν(A 2 ) > 0 so that M (ε, y) equals a constant M for ε 0 − η ≤ εε 0 and y ∈ A 2 .
Take y 0 ∈ A 2 and find integers n 1 , ..., n M and N 1 , ..., N M so that {(f, T p1nj +q1Nj f, ..., T p nj +q Nj f )}, j = 1, 2, ..., M, is a maximal ε 0 -separated set in L * ( , f, y 0 ). Next, T p l ni+q l Ni f − T p l nj +q l Nj f L 2 (µy) , 1 ≤ i < j ≤ M , l = 0, 1, ..., , as functions on Y are measurable and y 0 can be chosen so that each neighborhood of values of these functions at y 0 occurs with positive measure in the set A 2 . Let now A 3 be the subset of A 2 of points y such that for any i, j, l with 1 ≤ i ≤ j ≤ M and 0 ≤ l ≤ . Then ν(A 3 ) > 0 by the choice of y 0 . Now we use the assumption that the action of S on (Y, D, ν) is GSZ, applying it to A 3 . Let n, N ∈ Z, n ≤ N be such that and let y ∈ l=0 S −(p l n+q l N ) A 3 . Since S p l n+q l N y ∈ A 3 for l = 0, 1, ..., , and A 3 ⊂ l=0 S −(p l nj +q l Nj ) A 1 for j = 1, ..., M by the definition of L * ( , f, y) (together with (3.6)) then S p l (nj +n)+q l (Nj +N ) y ∈ A 1 for l = 0, 1, ..., and j = 1, ..., M .
Similarly to [13] we conclude that the vectors {(f, T p1(n+nj )+q1(N +Nj ) f, T p2(n+nj )+q2(N +Nj ) f, ..., T p (n+nj )+q (N +Nj ) f ), j = 1, ..., M, } are ε 0 − η separated in L * ( , f, y) for y ∈ l=0 S −(p l n+q l N ) A 3 , and so these vectors form a maximal such set which must be then ε 0 − η dense in L * ( , f, y). Since (f, f, ..., f ) ∈ L * ( , f, y) there exists j such that {(f, T p1(n+nj )+q1(N +Nj ) f, ..., T p (n+nj )+q (N +Nj ) f )} is ε 0close to it. By the choice of ε 0 this implies µ y l=0 T −(p l (n+nj )+q l (N +Nj )) A = l=0 T p l (n+nj )+q l (N +Nj ) f dµ y ≥ 9 10 µ y (A) > 1 3 µ(A). The index j depends on y, so now we sum over j to obtain that for each y ∈ Now we sum in n, 1 ≤ n ≤ N and multiply by 1 N , Now we use the assumption that the action of S on (Y, D, ν) is GSZ which implies that where N is an infinite set of positive integers with bounded gaps. Define N j = N + N j = {N + N j : N ∈ N }, j = 1, ..., M, which are also sets with bounded gaps. Clearly, (3.9) implies that there exists ε > 0 such that for any N ∈ N large enough Then by (3.7) and (3.8) we obtain that for any N ∈ N large enough Then by (3.10) for any N ∈ N large enough there exists j such that N + N j ∈ N A . Hence, the gaps in N A are bounded by the bound on gaps of N plus 2 max 1≤j≤M N j and, clearly, This completes the proof of Proposition 3.3, as well, as of Theorem 2.2.
4. Commuting transformations. In this section we will obtain Theorems 2.4, 2.5 and Corollary 2.6.

4.1. Factors and extensions with respect to an abelian group of transformations. Let $G$ be a commutative group of transformations acting on $(X,\mathcal B)$ so that all $T\in G$ preserve a probability measure $\mu$ on $(X,\mathcal B)$. A probability space $(Y,\mathcal D,\nu)$ is called a factor of $(X,\mathcal B,\mu)$ if there exists an onto map $\pi:X\to Y$ such that $\pi\mu=\nu$ and $\pi^{-1}\mathcal D\subset\mathcal B$. Define the action of $G$ on $(Y,\mathcal D,\nu)$ by $T\pi x=\pi Tx$ for each $T\in G$ and $x\in X$. This action preserves the measure $\nu$ and we say that the system $(X,\mathcal B,\mu,G)$ is an extension of $(Y,\mathcal D,\nu,G)$ and the latter is called a factor of the former. Clearly, this definition is compatible with the one given for one transformation in Section 3.1.
Next, $(X,\mathcal B,\mu,G)$ is called a relative weak mixing extension of $(Y,\mathcal D,\nu,G)$ if $(X,\mathcal B,\mu,T)$ is a relative weak mixing extension of $(Y,\mathcal D,\nu,T)$, as defined in Section 3.2, for each $T\in G$, $T\ne\mathrm{id}$. Furthermore, $(X,\mathcal B,\mu,G)$ is called a (relative) compact extension of $(Y,\mathcal D,\nu,G)$ if (3.5) holds true simultaneously for all $T\in G$ (with the same $R$, $\delta$ and $g_1,...,g_m$) for $\nu$-almost all $y\in Y$. Finally, following [10] we call an extension $\alpha:(X,\mathcal B,\mu,G)\to(Y,\mathcal D,\nu,G)$ primitive if $G$ is the direct product of two subgroups $G=G_c\times G_w$ where $(X,\mathcal B,\mu,G_c)$ is a compact extension of $(Y,\mathcal D,\nu,G_c)$ and $(X,\mathcal B,\mu,G_w)$ is a relative weak mixing extension of $(Y,\mathcal D,\nu,G_w)$.
Next, X = (X, B, µ, G) as above will be called GSZ if (2.8) holds true for any A ∈ B with µ(A) > 0 and all T j ,T j ∈ G, j = 0, 1, ..., , where the set N A depends on A and T j ,T j 's, T 0 =T 0 = id and T 1 , ..., T are distinct and different from the identity. Next, we rely on the Theorem 6.17 in [10] describing the structure of extensions and show similarly to Proposition 7.1 in [10] that if each (X, B β , µ, G) is GSZ for totally ordered (by inclusion) family of σ-algebras then (X, sup β B β , µ, G) is also GSZ. It follows that in order to establish Theorem 2.5 it suffices to show that any primitive extension (X, B, µ, G) of (Y, D, ν, G) is GSZ provided (Y, D, ν, G) is GSZ itself.  First, observe that A 0 is obvious and B 0 does not play role here so we can denote by it any valid assertion. The proof proceeds essentially in the same way as for one transformation. We start with (ii) which is easier. As in the one transformation case we assume first that f 0 is B 1 = π −1 (D)-measurable. Then the integrals in (4.1) have the form Hence, as in Section 3.2 we can assume that E(f 0 |Y) = 0. Then the left hand side of (4.1) takes the form By B m for (X,B,μ) andT j ,T j , j = 1, ..., m, the above limit equals Since the sum here is B 1 -measurable we can insert the conditional expectation inside of the integral concluding as in Section 3.2 that the latter limit is zero completing the proof of (ii). In order to prove (i) we observe that This enables us to prove B m under the additional condition that for some where H will be chosen large but much smaller than N . By convexity of the function ϕ(x) = x 2 we have (up to O(H/N )), Integrating the above inequality we obtain and we observe thatT i , i = 1, ..., m − 1, remain distinct and different from the identity. 
Writing r = k − n we conclude similarly to Section 3.2 that this inequality implies that Inserting conditional expectation inside the integral in the right hand side of (4.3) and using A m−1 for a fixed H, every r such that |r| < H and N large enough we can replace the integral term in the above inequality by N ). Next, we estimate the integrals appearing in (4.4) by Since we assume that E(f j0 |Y) = 0 then by A 1 for the case whenT 1 = id which is proved as Lemma 8.1 in [13] (where ergodicity ofT j0 is used which we know from the definition of relative weak mixing), The concluding argument is the same as in Proposition 3.1 which yields A m and completes the proof of Proposition 4.1.

Primitive extensions. Let
are relative compact and weak mixing extensions, respectively. Here $G$ is supposed to be a finitely generated free abelian group and $\mu=\int\mu_y\,d\nu(y)$. It follows from Proposition 4.1 that denoting by $\#\Gamma$ the cardinality of a set $\Gamma$.
The implications of compactness which will be needed below are summarized in the following lemma (see Lemma 7.10 in [10]).
Lemma 4.3. Let $A\in\mathcal B$ with $\mu(A)>0$. Then we can find a measurable set $A'\subset A$ with $\mu(A')$ as close to $\mu(A)$ as we like and such that for any $\varepsilon>0$ there exist a finite set of functions $g_1,...,g_K\in H=L^2(X,\mathcal B,\mu)$ and a measurable function $k:Y\times G_c\to\{1,...,K\}$ with the property that $\|RI_{A'}-g_{k(y,R)}\|_y<\varepsilon$ for $\nu$-almost all $y\in Y$ and every $R\in G_c$.
We will need also the following consequence of the multidimensional van der Waerden theorem. Proof. The assertion (i) is Lemma 7.11 in [10]. In order to prove (ii) we apply (i) withk and S i = T iTi , i = 1, ..., H, in place of k and T 1 , ..., T H , respectively, there.
With Ψ and T given by (i) for suchk and S i 's we obtain The following is the main result of this section which, as explained in Section 4.1, yields Theorem 2.5.
Proposition 4.5. Let α : X = (X, B, µ, G) → Y = (Y, D, ν, G) be a primitive extension and Y be a GSZ system. Then X is also a GSZ system.
Proof. We proceed similarly to Proposition 7.12 in [10] adapting the proof there to our situation. Let A ∈ B with µ(A) > 0 and let T 1 , ..., T ,T 1 , ...,T ∈ G. Replacing A by a slightly smaller set, we can assume that I A has the compactness property described in Lemma 4.3. Writing µ(A) = µ y (A)dν(y), we see that there exists a measurable subset B ⊂ Y, ν(B) > 0 with µ y (A) > a = µ(A)/2 for all y ∈ B. We express T j ,T j as products of elements in G c and in G w and assume without loss of generality that for all n ≤ N , where .., r, S j ,Ŝ j ∈ G w , j = 1, ..., s, and S 1 , ..., S s are distinct. Since the set of transformations in the right hand side of (4.6) is at least as large as the one in the left hand side of (4.6) then (2.8) will follow if we prove that for an infinite syndetic set N A ⊂ N, Let a 1 < a s . We will show that there exist an infinite syndetic set N A ⊂ N and ε > 0 such that for each N ∈ N A there exist a subset P N ⊂ {1, 2, ..., N } with #P N ≥ εN and η > 0 such that for every n ∈ P N we can find a set B n,N ⊂ Y , B n,N ∈ D with ν(B n,N ) > η satisfying Integrating the inequality (4.8) over B n,N and taking into account (4.6) we obtain that for any N ∈ N A , and both (4.7) and (2.8) will follow.
The set B n,N will be determined by two requirements. For a 1 < a 2 < a s we will require that µ y ( whenever n ∈ P N and y ∈ B n,N . Choose ε 1 > 0 such that if (where denotes the symmetric difference) then (4.9) implies (4.8). Then we require that (4.10) holds true for any n ∈ P N and y ∈ B n,N .
Suppose now that P N and {B n,N , n ∈ P N , N ∈ N A } have been found so that (4.10) is satisfied for all n ∈ P N , y ∈ B n,N and, in addition, Now, applying Lemma 4.2 with f = I A , ε < a s − a 2 and δ < 1 2 η we obtain with ψ defined in Lemma 4.2, for all y ∈ B n,N except for a setB n,N of y's of measure ν less than 1 2 η and for n ∈ N ε,δ,N . SetP N = P N \ N ε,δ,N andB n,N = B n,N \B n,N then considering new P n =P N and B n,N =B n,N we obtain (4.9). The problem is reduced to finding P N and B n,N such that (4.10) and (4.11) are satisfied.
Next, we replace (4.10) by the requirement that there exists g ∈ H y = L 2 (X, B, µ y ) such that (where · y = · L 2 (X,µy) ) with ε 2 < 1 2 √ ε 1 . Since R 1 =R 1 = id we will have which gives (4.10) since Now recall that A was chosen to comply with conditions of Lemma 4.3. We can therefore find g 1 , g 2 , ..., g K ∈ L 2 (X, µ) and a function k : Y × G c → {1, 2, ..., K} so that RI A − g k(y,R) y < ε 2 for every R ∈ G c and ν-almost all y. We define now a sequence of functions k q,Q : Y × G × G → {1, 2, ..., K} by k q,Q (y, SR,ŜR) = k(S qŜQ y, R qRQ ) for integers 1 ≤ q ≤ Q and transformations R,R ∈ G c , S,Ŝ ∈ G w . This is well defined since G = G c × G w is a direct product. Then for ν-almost all y, S q R qŜQRQ I A − S qŜQ g k q,Q (y,RS,RŜ) y = R nRN I A − g k(S qŜQ y,R qRQ ) S qŜQ y < ε 2 . (4.13) Fix q ≤ Q and y for which (4.13) holds true and apply Lemma 4.4(ii) to the function k(·, ·) = k q,Q (y, ·, ·) on G × G. Independently of q, Q and y there is a finite set Ψ ⊂ G and a number M such that k q,Q (y, T R m i S m j ,R m iŜ m i ) takes on the same value k for 1 ≤ i ≤ r, 1 ≤ j ≤ s, for some T ∈ Ψ and some m with 1 ≤ m ≤ M . Then if T = R S and g (q,y) is the corresponding g k we obtain from (4.13) for Qm j ((R ) −q g (q,y) ) (T ) q y (4.14) where we took into account that g (q,y) = g k = g k q,Q (y,(R R m i )(S S m j ),R m iŜ m i ) . We have shown that for every q = 1, ..., Q, Q ∈ N and ν-almost all y ∈ Y there exist m and T , both having a finite range of possibilities, such that (4.12) is satisfied with n = qm and N = Qm for (T ) q y in place of y.
Next, we will produce the set P N and the sets B n,N , n ∈ P N such that both (4.11) and (4.12) are satisfied for (y, n), y ∈ B n,N . For each q form the set where the intersection is taken over j, m, T with 1 ≤ j ≤ s, 1 ≤ m ≤ M, T ∈ Ψ. Using the fact that (Y, D, ν) is a GSZ system we conclude that for each Q from an infinite syndetic set N ⊂ N there exists P Q ⊂ {1, ..., Q} with #P Q ≥ εQ for some ε > 0 independent of Q and such that ν(C q ) > η for some η > 0 and all q ∈ P Q . Now let y ∈ C q for q ∈ P Q . There exist m = m(q, y) and T = T (q, y) such that (T ) q y (in place of y) satisfies (4.12) for n = qm, N = Qm and q ≤ Q. In addition, (T ) q y also satisfies (4.11) for these T and m taking into account that by the definition of C q this condition is satisfied with n = mq, N = mQ by all (T ) q y and all m such that T ∈ Ψ, 1 ≤ m ≤ M since (T ) q y ∈ j,m S −mq jŜ −mQ j B whenever y ∈ C q , and so S mq jŜ mQ j (T ) q y ∈ B. Let J be the total number of possibilities for (m, T ). Then for a subset D q ⊂ C q with ν(D q ) > η /J, m(q, y) and T (q, y) take a constant value, say, m(q) and T (q), respectively. We now define n(q) = qm(q) and set P Q = {n(q) ≤ Q : q ∈ P Q } and B n(q) = (T (q)) q D q . Then ν(B n(q) ) = ν(D q ) > η /J, S n(q) jŜ m(q)Q j B n(q) ∈ B, j = 1, ..., s, and for y ∈ B n(q) , 1 ≤ i ≤ r, 1 ≤ j ≤ s for an appropriately defined g (q,y) . Finally, #{n(q), q ≤ Q} ≥ ε/M and the gaps of the set {m(q)Q, Q ∈ N } are bounded by M times the maximal gap of N . This completes the proof of Proposition 4.5, as well as of Theorem 2.5.
5. Short proofs of Theorems 2.2 and 2.5. Recall that $F_k\subset\mathbb Z^d$, $k=1,2,...$ is called a Følner sequence if the cardinality of the symmetric difference $(n+F_k)\,\triangle\,F_k$ is $o(|F_k|)$ as $k\to\infty$ for any $n\in\mathbb Z^d$. Now, suppose that for any Følner sequence $F_k\subset\mathbb Z^2$, $k=1,2,...$, $\liminf_{k\to\infty}\frac{1}{|F_k|}\sum_{(n,m)\in F_k}a_{n,m}>0$ (in fact, we will need this only when $F_k$'s are squares). Then there exist $\varepsilon>0$ and an integer $M\ge1$ such that in any square $R\subset\mathbb Z^2$ with the side of length $M$ we can find $(n,m)\in R$ such that $a_{n,m}>\varepsilon$. Indeed, if this were not true then we could find a sequence of squares $R_j\subset\mathbb Z^2$ with sides of length $M_j\to\infty$ as $j\to\infty$ and a sequence $\varepsilon_j\to0$ as $j\to\infty$ such that $a_{n,m}\le\varepsilon_j$ for all $(n,m)\in R_j$. Then, of course, $\liminf_{j\to\infty}\frac{1}{|R_j|}\sum_{(n,m)\in R_j}a_{n,m}\le\liminf_{j\to\infty}\varepsilon_j=0$, which contradicts our assumption since $\{R_j\}_{j=1}^\infty$ is a Følner sequence. Clearly, this argument remains true for any $\mathbb Z^d$ replacing squares by $d$-dimensional boxes but we will not need this here. Now, let $M,\varepsilon>0$ be the numbers whose existence was established above and assume that $a_{n,m}\ge0$ for all integers $n$ and $m$. Set $Q_j=\{(n,m): j(M+1)\le m<(j+1)(M+1)$ and $0<n\le j(M+1)\}$. Then $Q_j$ contains $j$ disjoint squares with the side of length $M$, and so $\sum_{(n,m)\in Q_j}a_{n,m}\ge\varepsilon j$.
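The counting step that follows can be made explicit. The lines below are a sketch under the assumptions stated in the text ($a_{n,m}\ge0$, and each of the $j$ disjoint squares inside $Q_j$ contributes a term larger than $\varepsilon$); the final constant is one possible choice made here for illustration and is not taken from the original.

```latex
% Q_j consists of the M+1 rows m = j(M+1), ..., (j+1)(M+1)-1,
% each row running over 0 < n <= j(M+1).
\[
  \sum_{m=j(M+1)}^{(j+1)(M+1)-1}\ \sum_{n=1}^{j(M+1)} a_{n,m}
  \;=\; \sum_{(n,m)\in Q_j} a_{n,m} \;\ge\; \varepsilon j ,
\]
% so by the pigeonhole principle some row m = N_j satisfies
\[
  \sum_{n=1}^{j(M+1)} a_{n,N_j} \;\ge\; \frac{\varepsilon j}{M+1}.
\]
% Since a_{n,m} >= 0 and j(M+1) <= N_j < (j+1)(M+1),
\[
  \frac1{N_j}\sum_{n=1}^{N_j} a_{n,N_j}
  \;\ge\; \frac{\varepsilon j}{(M+1)N_j}
  \;>\; \frac{\varepsilon j}{(j+1)(M+1)^2}
  \;\ge\; \frac{\varepsilon}{2(M+1)^2}\qquad (j\ge1).
\]
```

Because $j(M+1)\le N_j<(j+1)(M+1)$, the values $N_j$ are strictly increasing in $j$ and consecutive ones differ by less than $2(M+1)$, which is the source of the bounded gaps.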
Hence, there exists $j(M+1)\le N_j<(j+1)(M+1)$ such that Next, we will apply the above arguments to the situation of Theorem 2.5. Let $T_j,\hat T_j$, $j=1,...,\ell$, be as in Theorem 2.5 commuting measure preserving transformations of a measure space $(X,\mathcal B,\mu)$ and set S i.e. the limit exists and it is positive. Taking $a_{n,m}=\mu\big(\bigcap_{j=0}^\ell(T_j^n\hat T_j^m)^{-1}A\big)$ we obtain by the above arguments that there exists an infinite set with bounded gaps $\mathcal N_A$ such that (2.8) holds true, completing the proof of Theorem 2.5.
Next, we derive a polynomial version of Theorem 2.
6. Nonconventional polynomial arrays.
6.1. Proof of Theorem 2.8. We start with the proof of Theorem 2.8 which closely follows the proof of Theorem D in [4]. First, by changing the functions $f_j$ we can always assume without loss of generality that $P_{ij}(0)=0$, $Q_{ij}(0)=0$, $i=1,...,\ell$, $j=1,...,k$. Next, we consider the particular case which will serve as an initial step of the "PET" induction which will be described later on. Let $P_{11}(n)=pn$ for some integer $p$, $P_{ij}(n)\equiv0$ if either $i>1$ or $j>1$, $Q_{ij}(N)$, $i=1,...,\ell$, $j=1,...,k$ be functions taking on integer values on integers and $f_i$, $i=1,...,\ell$ be any bounded measurable functions such that $\int f_1\,d\mu=0$. Then since $T$ is weakly mixing, whence $T^p$ is weakly mixing and, in particular, ergodic, and so the last equality follows from the $L^2$ ergodic theorem.
In order to deal with the general case of Theorem 2.8 we will need the following version of the van der Corput theorem whose proof is the same as of Theorem 1.4 in [2] (see also Theorem 1.5 there), and so we refer the reader there. This follows also from uniform versions of the van der Corput theorem (see, for instance, [20]). where ·, · is the inner product and D − lim h denotes the limit as h → ∞ outside a set of integers having zero upper density. Then where · is the Hilbert space norm.
Next, we will describe the "PET induction" in our circumstances where we closely follow [4] and refer the reader there for more details. Let $P_j$, $j=1,...,k$, be any polynomials and $Q_j$, $j=1,...,k$ be any functions taking on integer values on integers and such that $P_j(0)=Q_j(0)=0$. Similarly to [4] we will call and $\Phi(n,N)=\varphi(n)\psi(N)$ P-polynomial expressions where P indicates the fact that $Q_i$'s are not necessarily polynomials. Products of P-polynomial expressions and their inverses are P-polynomial expressions, and so they form a group $PE$. Clearly, if $\Phi(n,N)=\varphi(n)\psi(N)\in PE$ then $\Phi^{-1}(n_0,N)\Phi(n+n_0,N)=\varphi^{-1}(n_0)\varphi(n+n_0)\in PE$. The degree $\deg(\varphi(n))$ of $\varphi(n)=T_1^{P_1(n)}\cdots T_k^{P_k(n)}$ is the maximal degree of the polynomials $P_j$, $j=1,...,k$, and the degree $\deg(\Phi(n,N))$ of a P-polynomial expression $\Phi(n,N)=\varphi(n)\psi(N)$ is defined as the degree of $\varphi$. Again, following [4] we define the weight of a P-polynomial expression $\Phi(n,N)=\varphi(n)\psi(N)$ with $\varphi$ as above as the pair $(r,d)$ such that $\deg P_{r+1}=...=\deg P_k=0$, $\deg P_r=d\ge1$. The weight $(r,c)$ is greater than $(s,d)$ if $r>s$ or if $r=s$ and $c>d$.
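As a small illustration of the weight and of its ordering, take $k=2$ and assume, as the notation of this section suggests, that $\varphi(n)=T_1^{P_1(n)}T_2^{P_2(n)}$ with $P_j(0)=0$. The two expressions below are hypothetical examples chosen here and are not taken from [4]; the factor $\psi(N)$ is omitted since it does not affect the weight.

```latex
\[
  \varphi_1(n) = T_1^{\,n^3}, \qquad
  \varphi_2(n) = T_1^{\,n^2}\,T_2^{\,3n}.
\]
% phi_1: P_1 = n^3, P_2 = 0, so deg P_2 = 0 and deg P_1 = 3 >= 1:
\[
  w(\varphi_1) = (1,3), \qquad \deg \varphi_1 = 3 .
\]
% phi_2: P_1 = n^2, P_2 = 3n, so deg P_2 = 1 >= 1 and r = k = 2:
\[
  w(\varphi_2) = (2,1), \qquad \deg \varphi_2 = 2 .
\]
% The first coordinate is compared first, hence
\[
  (2,1) > (1,3),
\]
% although phi_1 has the larger degree.
```

The example shows that the ordering of weights is lexicographic in $(r,d)$ and is not the same as ordering by degree.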
Two P -polynomial expressions Φ 1 (n, N ) = ϕ 1 (n)ψ 1 (N ) and Φ 2 (n, N ) = ϕ 2 (n)ψ 2 (N ) are called equivalent if they have the same weight (r, d) and the leading coefficients of the polynomials P (1) r and P (2) r coincide, as well. Any finite subset of P E is called a system and the degree of a system is the maximal degree of its elements. To every system a weight matrix (N rd , 1 ≤ r ≤ k, 1 ≤ d ≤ D) is associated where N rd is the number of equivalence classes formed by the elements of the system whose weights are (r, d) and D is the maximal degree of the polynomials P ij appearing in Theorem 2.8. As in [4] we say that the weight matrix M = (N rd , 1 ≤ r ≤ k, 1 ≤ d ≤ D) precedes the weight matrix M = (N r,d , 1 ≤ r ≤ k, 1 ≤ d ≤ D) if for some (r 0 , d 0 ), N r0d0 = N r0d0 −1, N rd = N rd when r ≥ r 0 and d ≥ d 0 except for r = r 0 and d = d 0 , N rd = 0 and N rd are arbitrary nonnegative integers when r ≤ r 0 and d ≤ d 0 except for r = r 0 and d = d 0 (for a picture explanation see [4]). Now observe that the system appearing in (6.1) has the weight matrix M 0 = (N rd ) where N 11 = 1 and N rd = 0 if (r, d) ≠ (1, 1). Thus, (6.1) proves Theorem 2.8 for any system with the weight matrix M 0 . Next, we proceed step by step considering systems with weight matrices M 0 , M 1 , M 2 , ..., M K such that each M i precedes M i+1 , i = 0, 1, ..., K − 1 arriving finally at the matrix M K with arbitrary predefined weights N rd , 1 ≤ r ≤ k, 1 ≤ d ≤ D (for a graphical explanation of this see [4]). Our goal is to show that if Theorem 2.8 is valid for any system with the weight matrix M i then it is valid for any system with the weight matrix M i+1 which by induction will yield Theorem 2.8.
Next, we remark that without loss of generality we can assume that f i dµ = 0 for any i = 1, ..., which is the result of the equality transform the left hand side of (2.11) into a sum of similar product expressions where all functions have zero integrals and the result to be proved now is that all corresponding limits are zero. Thus, writing As in [4] we can assume without loss of generality that T 1 , ..., T k are linearly independent elements of the basis of the finitely generated free abelian group G. Then ϕ(n) = T P1(n) 1 · · · T P k k = id for some polynomials P 1 , ..., P k implies P 1 = · · · = P k = 0. By Lemma 6.1, (6.4) would follow if Next, we will need the following result.
Lemma 6.2. Let nonconstant polynomials $P_1(n,N),P_2(n,N),...,P_k(n,N)$ of $n$ and $N$ be essentially distinct and depend nontrivially on $n$. Then for each sufficiently large $h$ the polynomials $P_1(n,N),P_2(n,N),...,P_k(n,N),P_1(n+h,N),...,P_k(n+h,N)$ are pairwise essentially distinct (where $h$ is viewed as a constant) except for pairs $P_i(n,N)$, $P_i(n+h,N)$ where $P_i(n,N)=p_in+Q_i(N)$ and then $P_i(n+h,N)-P_i(n,N)=p_ih$.
Proof. Clearly, $P_1(n+h,N),...,P_k(n+h,N)$ are essentially distinct since this was true for $P_1(n,N),P_2(n,N),...,P_k(n,N)$. It remains to show that $P_i(n,N)$ and $P_j(n+h,N)$ are essentially distinct for any $i,j=1,...,k$ provided $h$ is large enough and either $i\ne j$ or $i=j$ and $P_i(n,N)$ does not have the form $P_i(n,N)=p_in+Q_i(N)$. Clearly, this is true if $P_i$ and $P_j$ have different degrees in $n$, and so we can assume that they have the same degree $d$ in $n$. Then we can write $P_i(n,N)=n^dV_i(N)+n^{d-1}W_i(N)+r_i(n,N)$ and $P_j(n,N)=n^dV_j(N)+n^{d-1}W_j(N)+r_j(n,N)$ where $V_i(N),V_j(N)$ are nonzero while $W_i(N),W_j(N)$ are arbitrary polynomials in $N$ only and $r_i(n,N),r_j(n,N)$ are polynomials of degree less than $d-1$ in $n$. Then $P_j(n+h,N)=n^dV_j(N)+n^{d-1}(W_j(N)+dhV_j(N))+r_{j,h}(n,N)$ where $r_{j,h}(n,N)$ is a polynomial whose degree in $n$ is less than $d-1$ having coefficients depending on $h$. Since $V_i(N)$ is a nonzero polynomial then for any $h$ large enough $W_i(N)+dhV_i(N)\ne W_j(N)$ and if $d>1$ then $P_j(n+h,N)$ and $P_i(n,N)$ are essentially distinct provided $h$ is large enough. The case $d=0$ is ruled out by our assumptions. If $d=1$ and $i\ne j$ then either $V_i\ne V_j$ or $W_i\ne W_j$ and either $W_i$ or $W_j$ is nonconstant. In both of these cases $P_j(n+h,N)$ and $P_i(n,N)$ are essentially distinct. Next, if $d=1$ and $i=j$ then $P_i(n+h,N)-P_i(n,N)=hV_i(N)$, and so $P_i(n+h,N)$ and $P_i(n,N)$ are essentially distinct if and only if $V_i$ is nonconstant concluding the proof of the lemma (where, in fact, we did not use that $P_i$'s depend polynomially on $N$).
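A concrete pair of polynomials may help to see both cases of the lemma. The example below is an illustration chosen here (the polynomials do not come from the paper), reading "essentially distinct" as "the difference is not constant".

```latex
% Two essentially distinct polynomials depending nontrivially on n:
\[
  P_1(n,N) = n^2 N + n, \qquad P_2(n,N) = 3n + N^2 .
\]
% Shifting n by a constant h:
\[
  P_1(n+h,N) = n^2 N + n\,(2hN + 1) + h^2 N + h, \qquad
  P_2(n+h,N) = 3n + N^2 + 3h .
\]
% Degree d = 2 in n: the shift changes the coefficient of n^{d-1} = n,
\[
  P_1(n+h,N) - P_1(n,N) = 2hN\,n + h^2 N + h ,
\]
% which is nonconstant for every h \ne 0.
% Linear case P_2 = p_2 n + Q_2(N) with p_2 = 3, Q_2(N) = N^2:
\[
  P_2(n+h,N) - P_2(n,N) = 3h ,
\]
% a constant, so this is exactly the exceptional pair in the lemma.
```

The mixed pair is also essentially distinct: $P_1(n,N)-P_2(n+h,N)=n^2N-2n-N^2-3h$ depends on $n$ for every $h$.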
Observe, that if deg(ϕ i (n)) ≥ 2, ϕ i (n) = T Pi1(n) 1 · · · T P ik (n) k then max 1≤j≤k deg(P ij (n)) ≥ 2, and it follows from Lemma 6.2 that ϕ i (n + h)ϕ −1 i (h) depends nontrivially on n provided h is large enough. Rearranging P -polynomial expressions if needed, we can assume that deg(ϕ i (n)) = 1 for i = 1, ..., q and deg(ϕ i (n)) ≥ 2 for i = q + 1, ..., k. The condition deg(ϕ i (n)) = 1 means that P ij (n) = p ij n for some integers p ij , j = 1, ..., k. Hence, in this case for some l between 1 and k andΦ l (n, N ) is either Φ l (n, N ) for some l between 1 and k or it is Φ l (n+h, N )ϕ −1 i (h) for some l between q + 1 and k. Consider .., k} and suppose, without loss of generality, thatΦ 1 (n, N ) has the minimal weight inÃ h . Since all ϕ i (n) = id then w(Φ 1 (n, N ) is measure preserving and we can write N ). It follows from the assumptions of Theorem 2.8 that ϕ i (n) ≡ ϕ l (n) and ϕ i (n + h) ≡ ϕ l (n + h) for i, l = 1, ..., k, i = l. Writing Φ i (n, N ) =φ i (n)ψ i (N ) we see from here and Lemma 6.2 thatφ i (n) ≡φ l (n) for i = l and large enough h. WritingΦ i (n, N ) =φ i (n)ψ i (N ) we conclude from here thatφ i (n) ≡ id andφ i (n) ≡φ l (n) for i, l = 2, ..., k , i = l for all h large enough. Introduce the new system A h = {Φ i (n, N ), i = 2, ..., k }. In the same way as in [4] (referring the reader for more explanations there) we conclude that the weight matrix of A h precedes that of A. In order to invoke PET-induction we assume that Theorem 2.8 holds true for all systems whose weight matrices precede that of A. Hence, we have for A h , as N → ∞. Then by the Cauchy inequality Hence, by (6.6)-(6.8), If one of P ij (n), j = 1, ..., k is not linear then deg(Φ i (n, N )) =deg(ϕ i (n)) ≥ 2 andf l = f k for some l ≤ k , and so the last product in (6.6) equals zero yielding which together with (6.9) yields again (6.10) concluding the proof of Theorem 2.8 since the initial step of the induction is given by (6.1).
6.2. Nonconvergence under weak mixing. Next, we will show that, in general, weak mixing of $T$ is not enough to ensure $L^2$-convergence in (1.3) for general polynomials $P_j(n,N)$, $j=1,...,\ell$, taking on integer values on integers even in the "conventional" case $\ell=1$. Consider the sum where $T$ is a measure preserving transformation of a separable probability space $(X,\mathcal B,\mu)$ and $f$ is a bounded measurable function. Recall that the Koopman operator $U_Tf(x)=f(Tx)$ is unitary and it has a spectral representation in the form where $\{e^{2\pi iu}, u\in\Gamma\}$ is the spectrum of $U_T$ and $E$ is the corresponding projection operator valued spectral measure (see, for instance, [16] or [21]). Then Observe that if $u\in\Gamma_{\varepsilon,N}$ then $nu\in\Gamma_{n\varepsilon,N}$ and $\Gamma_{\varepsilon,N}\subset\Gamma_{n\varepsilon,nN}$. Define inductively $N_0=1$ and $N_{k+1}=[5N_k^2/\varepsilon]$, $k=0,1,2,...$ where $[a]$ is the integral part of $a$. Set also $\varepsilon_k=\varepsilon/N_k$, $k=0,1,...$. Then $\Gamma_{\varepsilon,nN_k}\subset\Gamma_{\varepsilon_k,N_k}$ for all $n=1,2,...,N_k$ (6.14) and $\Gamma_\varepsilon=\bigcap_{k=1}^\infty\Gamma_{\varepsilon_k,N_k}$ is a Cantor like set, in particular, it is a perfect set and for any $k$, max Let $\nu_\varepsilon$ be a continuous (non-atomic) probability measure on $\Gamma_\varepsilon$, say, constructed in the same way as the Cantor distribution on the standard Cantor set. Next, we introduce a spectral measure $E^{(\varepsilon)}$ concentrated on $\Gamma_\varepsilon$ by the standard formula $E^{(\varepsilon)}_Ug=I_Ug$ for each measurable function $g$ on $\Gamma_\varepsilon$ and a measurable set $U\subset\Gamma_\varepsilon$ where $I_U$ is the indicator of $U$. The spectral measure $E^{(\varepsilon)}$ is continuous considering it on the probability space $(\Gamma_\varepsilon,\nu_\varepsilon)$ since for each $u\in\Gamma_\varepsilon$ any function $I_{\{u\}}g$ is zero $\nu_\varepsilon$-almost everywhere. Next, we can find a transformation $T$ such that its Koopman operator $U_T\varphi=T\varphi$ has the spectral representation $U_T=\int_{\Gamma_\varepsilon}e^{2\pi iu}\,dE^{(\varepsilon)}_u$ (see, for instance, Ch. 4 in [8]) and since $E^{(\varepsilon)}$ is a continuous spectral measure then $T$ is weakly mixing (see, for instance, [15] or [22]). By (6.15), $\|T^{nN_k}f-f\|_{L^2}\le2\pi\varepsilon\|f\|_{L^2}$ (6.16) for any $n=1,2,...,N_k$, and all $k=1,2,...$.
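The way (6.16) rules out convergence can be spelled out as follows. This is a sketch under the assumption that the sum considered in this subsection is $S_N=\sum_{n=1}^NT^{nN}f$; that formula is inferred here from the appearance of $T^{nN_k}$ in (6.16) and is not visible in the text above.

```latex
% Assumption: S_N = \sum_{n=1}^{N} T^{nN} f.
\[
  \Big\| \frac1{N_k} S_{N_k} - f \Big\|_{L^2}
  \;\le\; \frac1{N_k}\sum_{n=1}^{N_k} \big\| T^{nN_k} f - f \big\|_{L^2}
  \;\le\; 2\pi\varepsilon\,\|f\|_{L^2},
\]
% by the triangle inequality and (6.16); hence
\[
  \Big\| \frac1{N_k} S_{N_k} \Big\|_{L^2}
  \;\ge\; (1 - 2\pi\varepsilon)\,\|f\|_{L^2},
\]
% which stays bounded away from zero when eps < 1/(2 pi) and f \ne 0.
```

In particular, for a nonzero $f$ with zero integral the averages along $N_k$ cannot tend to zero in $L^2$ once $\varepsilon<\frac1{2\pi}$.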
Hence Now, choose a function $f$ such that $\int f\,d\mu=0$ and $\int|f|\,d\mu>0$. If the $L^2$ ergodic theorem holds true for the averages $\frac1NS_N$ then $\|\frac1{N_k}S_{N_k}\|_{L^2}\to0$ as $k\to\infty$ which leads to a contradiction with the above inequality if $\varepsilon<\frac1{2\pi}$.
6.3. Proof of Theorem 2.9. For the proof of Theorem 2.9 we will need the following result. Next we can prove Theorem 2.9. As before, without loss of generality we can assume that at least one of the functions $f_j$ has zero integral with respect to $\mu$. Set x n,N L 2 = 0. (6.18) which according to Lemma 6.1 will follow if (6.2) holds true. Without loss of generality assume that $1,2,...,k$, $k\le\ell$ are all indexes $j$ such that $P_j(n,N)=p_jn+Q_j(N)$ for some nonzero integers $p_j$ and polynomials $Q_j$ in $N$ taking on integer values on integers. Then $\langle x_{n,N},x_{n+h,N}\rangle=\int\prod_{j=1}^\ell T^{P_j(n,N)}f_j\prod_{j=1}^\ell T^{P_j(n+h,N)}f_j\,d\mu=\int\prod_{j=1}^kT^{p_jn+Q_j(N)}(f_jT^{p_jh}f_j)\prod_{j=k+1}^\ell T^{P_j(n,N)}f_j\prod_{j=k+1}^\ell T^{P_j(n+h,N)}f_j\,d\mu$. By Lemma 6.2, $P_1(n,N),...,P_\ell(n,N);P_{k+1}(n+h,N),...,P_\ell(n+h,N)$ are essentially distinct polynomials, and so their pairwise differences $p^{(1)}_{ij}(n,N)=P_i(n,N)-P_j(n,N)$, $p^{(2)}_{ij}(n+h,N)=P_i(n+h,N)-P_j(n+h,N)$, $i,j=1,...,\ell$, $i\ne j$ and $p^{(3)}_{ij}(n,N)=P_i(n,N)-P_j(n+h,N)$, $i=1,...,\ell$, $j=k+1,...,\ell$ are nonconstant polynomials of $n$ and $N$. Since $T$ is strongly $2\ell$-mixing then for any $\varepsilon>0$ and any bounded measurable functions $g_1,...,g_L$ with $L\le2\ell$ there exists $K_\varepsilon>0$ such that | L j=1 T mj g j dµ −