Local correlation entropy

Local correlation entropy, introduced by Takens in 1983, represents the exponential decay rate of the relative frequency of recurrences in the trajectory of a point, as the embedding dimension grows to infinity. In this paper we study the relationship between the supremum of local correlation entropies and the topological entropy. For dynamical systems on graphs we prove that the two quantities coincide. Moreover, there is an uncountable set of points with local correlation entropy arbitrarily close to the topological entropy. On the other hand, we construct a strictly ergodic subshift with positive topological entropy having all local correlation entropies equal to zero. As a necessary tool, we derive the expected relationship between the local correlation entropies of a system and those of its iterates.


Introduction
A (topological) dynamical system is a pair $(X, f)$ where $X$ is a compact metric space and $f\colon X \to X$ is a continuous map. A point $x \in X$ is recurrent when its trajectory $(f^n(x))_{n=0}^{\infty}$ returns repeatedly to every neighborhood of $x$. The topological version of the famous Poincaré recurrence theorem states that, with respect to every invariant Borel measure, almost every point is recurrent. So if we look at the trajectory of a typical point $x$, we see infinitely many indices $n$ such that $f^n(x)$ is close to $x$. Moreover, continuity of $f$ implies that we see infinitely many pairs of indices $i \ne j$ such that $f^i(x)$ is close to $f^j(x)$. Such pairs are called recurrences.
Recurrences can be effectively visualized via recurrence plots, introduced by Eckmann, Kamphorst, and Ruelle in [6]. In its basic form, a recurrence plot is a black-and-white square image with black pixels representing recurrences. Quantitative study of patterns occurring in recurrence plots is the subject of recurrence quantification analysis initiated by Zbilut and Webber [31]; for surveys see [16,30].
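A minimal sketch of a basic recurrence plot, assuming the logistic map $x \mapsto 4x(1-x)$ on $[0,1]$ with the absolute-value metric as the example system (both are illustrative choices, not taken from the text). Entry $(i, j)$ of the matrix is $1$ (a black pixel) exactly when the $i$-th and $j$-th points of the trajectory are within distance $\varepsilon$; the fraction of black pixels, diagonal included, is the correlation sum (1.1).

```python
# Minimal recurrence-plot sketch (illustrative; the logistic map is our choice).


def logistic(x):
    """Logistic map x -> 4x(1 - x) on [0, 1]."""
    return 4.0 * x * (1.0 - x)


def trajectory(f, x, n):
    """The first n points x, f(x), ..., f^(n-1)(x)."""
    points = []
    for _ in range(n):
        points.append(x)
        x = f(x)
    return points


def recurrence_matrix(points, eps):
    """R[i][j] = 1 (black pixel) iff |points[i] - points[j]| <= eps."""
    return [[1 if abs(p - q) <= eps else 0 for q in points] for p in points]


def render(matrix):
    """Text rendering of the plot: '#' for a recurrence, '.' otherwise."""
    return "\n".join("".join("#" if v else "." for v in row) for row in matrix)


if __name__ == "__main__":
    R = recurrence_matrix(trajectory(logistic, 0.3, 40), 0.05)
    print(render(R))
    # Fraction of black pixels (diagonal included) = correlation sum (1.1).
    print(sum(map(sum, R)) / len(R) ** 2)
```

The matrix is symmetric with a black main diagonal, which is why pairs $(i,i)$ always count as recurrences.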
In connection with correlation dimension [9,10] and correlation entropy [25], introduced in the early 1980s, the so-called correlation sums were studied. Recall that the correlation sum $C_\varrho(x, n, \varepsilon)$ of (the beginning of) the trajectory of a point $x$ is
$$C_\varrho(x,n,\varepsilon) = \frac{1}{n^2}\,\operatorname{card}\bigl\{(i,j) : 0 \le i, j < n,\ \varrho\bigl(f^i(x), f^j(x)\bigr) \le \varepsilon\bigr\}, \tag{1.1}$$
where $\varrho$ is the metric of $X$, $n \in \mathbb{N}$, and $\varepsilon > 0$. It is the relative frequency of recurrences seen in the initial segment of the trajectory of $x$, with closeness defined by the metric $\varrho$ and the distance threshold $\varepsilon$ (with pairs $(i,i)$ counted as recurrences). Correlation sums appear naturally in different contexts. They are used in the estimation of correlation dimension and correlation entropy. In recurrence quantification analysis, several of the basic quantitative characteristics can be expressed in terms of correlation sums [11]. Also note that, after removing the diagonal pairs $(i,i)$, the correlation sum becomes a $U$-statistic [5,1]. One of the fundamental results states that, with respect to any $f$-ergodic measure $\mu$, the correlation sums of $\mu$-almost every point $x$ converge to the correlation integral
$$c_\varrho(\mu,\varepsilon) = \int_X \mu\bigl(B_\varrho(y,\varepsilon)\bigr)\,d\mu(y), \tag{1.2}$$
where $B_\varrho(y, \varepsilon)$ denotes the closed ball with center $y$ and radius $\varepsilon$. This result, proved (by different methods and under different conditions) in [20,21,1,23,15], justifies the use of correlation sums in estimating the correlation dimension, as suggested by [9,10]. The correlation entropy, introduced by Takens [25], is a quantitative characteristic based on correlation sums and integrals. To define it, in (1.1) and (1.2) replace the metric $\varrho$ by Bowen's metric
$$\varrho^f_m(x,y) = \max_{0 \le i < m} \varrho\bigl(f^i(x), f^i(y)\bigr) \qquad (m \in \mathbb{N}). \tag{1.3}$$
The obtained quantities are the correlation sum $C^f_m(x,n,\varepsilon)$ and the correlation integral $c^f_m(\mu,\varepsilon)$ corresponding to the trajectory of $x$ embedded into $X^m$. The upper and lower correlation entropies of an $f$-invariant measure $\mu$ [2, p. 361] quantify the exponential decay rate of correlation integrals as $m$ grows to infinity:
$$\overline{h}_{cor}(f,\mu) = \lim_{\varepsilon \to 0}\,\limsup_{m\to\infty} -\frac{1}{m}\log c^f_m(\mu,\varepsilon), \qquad \underline{h}_{cor}(f,\mu) = \lim_{\varepsilon \to 0}\,\liminf_{m\to\infty} -\frac{1}{m}\log c^f_m(\mu,\varepsilon). \tag{1.4}$$
Correlation entropy is a member of a one-parameter family of entropies [26,27]. The definition above, which is the one currently used in the literature, differs from the original one [25] by using correlation integrals instead of correlation sums. Consequently, it depends on an invariant measure $\mu$ instead of a point $x$. To distinguish the original definition from the currently used one, the correlation entropy of $f$ at a point $x$ will be called local. So, following [25], the upper and lower local correlation entropies $\overline{h}_{cor}(f,x)$ and $\underline{h}_{cor}(f,x)$ of $f$ at $x$ are defined by formulas (1.5), analogous to (1.4), in which the correlation integral $c^f_m(\mu,\varepsilon)$ is replaced by the upper and lower limits, as $n \to \infty$, of the correlation sums $C^f_m(x,n,\varepsilon)$. (Note that, in [25], the author considered the lower entropy only.) Of course, due to the convergence of correlation sums to the correlation integral, these local correlation entropies are often equal to the correlation entropy of a measure $\mu$. Nevertheless, we believe that local correlation entropies deserve to be studied, for several reasons. First, the ergodic results (usually) hold only for almost every point, whereas, from the topological point of view, the local correlation entropy at every point should be considered. Second, since the local correlation entropy depends solely on the trajectory of a selected point, it is computationally more tractable than the correlation entropy of a measure. In fact, when estimating the correlation entropy of an invariant measure $\mu$, correlation sums are often used, and thus it is the local correlation entropy that is actually being estimated; see e.g. [2, §7.7]. Finally, the study of local correlation entropies can yield new results which have not yet been obtained for the correlation entropy of a measure. Let us now briefly outline the main results of this paper. We start by summarizing basic properties of the local correlation entropy. One of them is the relationship between the local correlation entropies of $f$ and those of its iterates $f^k$.
Since we were not able to find a corresponding result in the literature, we have included a proof of it in this paper. The proof is based on a combinatorial lemma (see §3.2), which gives a relationship between the correlation sum of $f$ at a point $x$ and the correlation sums of $f^k$ at the points $f^h(x)$ ($0 \le h < k$); see Lemma 18.
Theorem A. Let $(X, f)$ be a dynamical system. Then, for every $k \in \mathbb{N}$ and $x \in X$,
$$\overline{h}_{cor}(f^k, x) = k\,\overline{h}_{cor}(f, x) \qquad\text{and}\qquad \underline{h}_{cor}(f^k, x) = k\,\underline{h}_{cor}(f, x).$$

The basic motivation of the paper comes from studying the relationship between the local correlation entropies and the topological entropy of the system $(X, f)$. Already Takens [25] proved that the lower local correlation entropy is bounded from above by the topological entropy of $f$ restricted to the orbit closure of $x$. In Proposition 21 we prove that this is true also for the upper local correlation entropy, which yields that
$$\sup_{x\in X}\underline{h}_{cor}(f,x) \le \sup_{x \in X}\overline{h}_{cor}(f,x) \le h_{top}(f).$$
We will show that, for dynamical systems on topological graphs, the above inequalities are in fact equalities. Recall that a topological graph is a continuum which can be written as the union of finitely many arcs, any two of which are either disjoint or intersect only in one or both of their end points.
Theorem B. Let X be a topological graph and f : X → X be a continuous map.

Moreover, for every
The conclusion of Theorem B clearly also holds for any (uncountable) system with zero topological entropy, and for any full shift (see Corollary 7). However, for general dynamical systems the supremum of local correlation entropies can be strictly smaller than the topological entropy. We prove this by constructing a strictly ergodic subshift with positive entropy and with all local correlation entropies equal to zero; our construction is a modification of that of Grillenberger [12].
Theorem C. There is a subshift $(X, \sigma)$ such that (a) $(X, \sigma)$ is strictly ergodic; (b) $(X, \sigma)$ has positive topological entropy; (c) the local correlation entropy $h_{cor}(\sigma, y)$ at every $y \in X$ is zero; (d) the correlation entropy $h_{cor}(\sigma, \mu)$ of the unique invariant measure $\mu$ is zero.
For some other results which are worth mentioning and are not covered by Theorems A-C, see Corollary 13 and Propositions 3 and 23.
The paper is organized as follows. In §2 we recall definitions and known facts which will be required later. In §§3 and 4 we prove Theorems A and B. A technical lemma concerning strictly ergodic subshifts is given in §5. Finally, in §6 we prove Theorem C.

Preliminaries
We write $\mathbb{N}$ ($\mathbb{N}_0$) for the set of positive (nonnegative) integers. If no confusion can arise, segments of integers $\{n, n+1, \dots, m-1\}$ ($n < m$) will be denoted by $[n, m)$. For $x \in \mathbb{R}$, $\lceil x\rceil$ and $\lfloor x\rfloor$ denote the ceiling and the floor of $x$, that is, the smallest integer greater than or equal to $x$ and the largest integer smaller than or equal to $x$, respectively. The cardinality of a set $A$ is denoted by $|A|$ or by $\operatorname{card} A$. By $\log$ we mean the natural logarithm.
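Let $X = (X, \varrho)$ be a metric space and $A$ be a subset of it. The diameter of $A$ is denoted by $\operatorname{diam}_\varrho(A)$. By $B_\varrho(x, \varepsilon)$ we mean the closed ball with center $x$ and radius $\varepsilon$, and by $B_\varrho(A, \varepsilon)$ we mean the union of all balls $B_\varrho(a, \varepsilon)$ with $a \in A$. A set $A \subseteq X$ is called $\varepsilon$-spanning if $B_\varrho(A, \varepsilon) = X$, and $\varepsilon$-separated if $\varrho(a, b) > \varepsilon$ for all distinct $a, b \in A$. The smallest cardinality of an $\varepsilon$-spanning subset of $X$ is denoted by $r_\varrho(\varepsilon, X)$, and the largest cardinality of an $\varepsilon$-separated subset of $X$ is denoted by $s_\varrho(\varepsilon, X)$. If $X$ is compact, both $r_\varrho(\varepsilon, X)$ and $s_\varrho(\varepsilon, X)$ are always finite, and we can define the upper and lower box dimensions of $X$ by [8, §2.1]
$$\overline{\dim}_B X = \limsup_{\varepsilon\to 0}\frac{\log r_\varrho(\varepsilon,X)}{-\log\varepsilon}, \qquad \underline{\dim}_B X = \liminf_{\varepsilon\to 0}\frac{\log r_\varrho(\varepsilon,X)}{-\log\varepsilon}.$$
A measure-theoretical dynamical system is a quadruple $(X, \mathcal{F}, \mu, f)$, where $X$ is a nonempty set, $\mathcal{F}$ is a $\sigma$-algebra of subsets of $X$, $\mu$ is a probability measure on $(X, \mathcal{F})$, and $f\colon X \to X$ is an $\mathcal{F}$-measurable map preserving $\mu$ (that is, $\mu(f^{-1}(A)) = \mu(A)$ for every $A \in \mathcal{F}$). The system $(X, \mathcal{F}, \mu, f)$ is called ergodic if $\mu(A) \in \{0, 1\}$ for every $A \in \mathcal{F}$ such that $f^{-1}(A) = A$.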
A (topological) dynamical system is a pair $(X, f)$ where $X = (X, \varrho)$ is a compact metric space and $f\colon X \to X$ is a continuous map.
Every point of a minimal system $(X, f)$ is almost periodic: for every $x \in X$ and every neighborhood $U$ of $x$, the return time set $N(x, U) = \{n \in \mathbb{N}_0 : f^n(x) \in U\}$ is syndetic (that is, it has bounded gaps).
An $f$-invariant measure of $(X, f)$ is any Borel probability measure $\mu$ such that $(X, \mathcal{B}, \mu, f)$, with $\mathcal{B}$ denoting the Borel $\sigma$-algebra on $X$, is a measure-theoretical dynamical system. If $(X, \mathcal{B}, \mu, f)$ is ergodic, we say that $\mu$ is $f$-ergodic. A system $(X, f)$ is called uniquely ergodic if it has a unique invariant measure; if it is also minimal, it is called strictly ergodic.
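Let $(X, f)$ be a (topological) dynamical system and $\varrho$ be the metric of $X$. For $m \in \mathbb{N}$ define the (equivalent) Bowen's metric $\varrho^f_m$ on $X$ as in the Introduction. We write $r_m(\varepsilon, X) = r_{\varrho^f_m}(\varepsilon, X)$ and $s_m(\varepsilon, X) = s_{\varrho^f_m}(\varepsilon, X)$. A subset $A$ of $X$ is called $(m, \varepsilon)$-spanning or $(m, \varepsilon)$-separated if it is $\varepsilon$-spanning or $\varepsilon$-separated with respect to $\varrho^f_m$. By Bowen's definition of the topological entropy,
$$h_{top}(f) = \lim_{\varepsilon\to 0}\,\limsup_{m\to\infty}\frac{1}{m}\log r_m(\varepsilon,X) = \lim_{\varepsilon\to 0}\,\limsup_{m\to\infty}\frac{1}{m}\log s_m(\varepsilon,X).$$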

2.1. Local correlation entropy. Let $X = (X, \varrho)$ be a compact metric space and $f\colon X \to X$ be a continuous map. For $m \in \mathbb{N}$, $x \in X$, $\varepsilon > 0$, and $n \in \mathbb{N}$ define the correlation sum $C^f_m(x, n, \varepsilon)$ by
$$C^f_m(x,n,\varepsilon) = \frac{1}{n^2}\operatorname{card}\bigl\{(i,j) : 0 \le i,j < n,\ \varrho^f_m\bigl(f^i(x), f^j(x)\bigr) \le \varepsilon\bigr\}.$$
Recall the definition (1.5) of the upper and lower local correlation entropies $\overline{h}_{cor}(f,x)$ and $\underline{h}_{cor}(f,x)$ of $f$ at $x$. If $\overline{h}_{cor}(f,x) = \underline{h}_{cor}(f,x)$, then we say that the local correlation entropy $h_{cor}(f,x)$ of $f$ at $x$ exists and we put $h_{cor}(f,x) = \overline{h}_{cor}(f,x) = \underline{h}_{cor}(f,x)$. If $\mu$ is an $f$-invariant probability measure, the upper and lower (measure-theoretic) correlation entropies (of order 2) of $f$ with respect to $\mu$ are defined by (1.4); see e.g. [2, p. 361]. Notice that in this paper we deal solely with correlation entropies of order $q = 2$; for the definition and properties of (measure-theoretic) correlation entropies of arbitrary order $q$ see e.g. [26,28,2]. In the following we summarize some of the known results which will be used later. The first one was in fact proved in [25, p. 355]; see also [28, Lemma 2.14].
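The following result was first proved by Pesin [20]; see also [21,1,23,15]. (There, the space $X$ can be any complete separable metric space.) Below, $\overline{c}^f_m(x,\varepsilon)$ and $\underline{c}^f_m(x,\varepsilon)$ denote the upper and lower limits of $C^f_m(x,n,\varepsilon)$ as $n \to \infty$.

Proposition 2 ([20]). Let $(X, f)$ be a dynamical system. Then, for every $f$-ergodic measure $\mu$ and every $m \in \mathbb{N}$, $\overline{c}^f_m(x, \varepsilon) = \underline{c}^f_m(x, \varepsilon) = c^f_m(\mu, \varepsilon)$ for $\mu$-a.e. $x \in X$ and every $\varepsilon > 0$ which is a continuity point of $c^f_m(\mu, \cdot)$.

As a consequence of Proposition 2 we obtain that, for ergodic $\mu$,
$$\overline{h}_{cor}(f,x) = \overline{h}_{cor}(f,\mu) \qquad\text{and}\qquad \underline{h}_{cor}(f,x) = \underline{h}_{cor}(f,\mu) \tag{2.1}$$
for $\mu$-a.e. $x \in X$. For uniquely ergodic systems one can strengthen Proposition 2 and obtain convergence of correlation sums to the correlation integral at every point.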

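Proposition 3. Let $(X, f)$ be a uniquely ergodic dynamical system and $\mu$ be the unique $f$-invariant measure. Then, for every $x \in X$, every $m \in \mathbb{N}$, and every $\varepsilon > 0$ which is a continuity point of $c^f_m(\mu, \cdot)$,
$$\lim_{n\to\infty} C^f_m(x,n,\varepsilon) = c^f_m(\mu,\varepsilon).$$
Proof. For any $y \in X$, the Dirac measure at $y$ is denoted by $\delta_y$. Fix $x \in X$, $m \in \mathbb{N}$, and $\varepsilon > 0$. Unique ergodicity of $(X, f)$ implies that the measures $\mu_n = (1/n)\sum_{i=0}^{n-1}\delta_{f^i(x)}$ converge to $\mu$ in the weak* topology.

The correlation dimension [9,10] is another widely used characteristic based on the correlation integral. Recall that the upper and lower correlation dimensions (of order 2) of a measure $\mu$ are defined by
$$\overline{d}_{cor}(\mu) = \limsup_{\varepsilon\to 0}\frac{\log c_\varrho(\mu,\varepsilon)}{\log\varepsilon}, \qquad \underline{d}_{cor}(\mu) = \liminf_{\varepsilon\to 0}\frac{\log c_\varrho(\mu,\varepsilon)}{\log\varepsilon}.$$
One can analogously define the upper and lower local correlation dimensions $\overline{d}_{cor}(f,x)$ and $\underline{d}_{cor}(f,x)$ of $f$ at $x$, using correlation sums instead of the correlation integral.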
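Proposition 3 and the correlation dimension can be illustrated on a standard uniquely ergodic example that is our choice, not the text's: the rotation $x \mapsto x + \alpha \pmod 1$ of the circle with irrational $\alpha$, whose unique invariant measure is Lebesgue measure. For $\varepsilon < 1/2$ every closed ball has measure $2\varepsilon$, so $c_\varrho(\mu,\varepsilon) = 2\varepsilon$ is continuous in $\varepsilon$, and Proposition 3 predicts $C_\varrho(x,n,\varepsilon) \to 2\varepsilon$ from every starting point; the slope of $\log C$ against $\log\varepsilon$ should then be close to the correlation dimension $1$. Since a rotation is an isometry, $\varrho^f_m = \varrho$ for every $m$, so the same numbers serve for all $m$ and all local correlation entropies vanish. The sketch below is a finite-$n$ check, not a proof.

```python
import math


def circle_dist(a, b):
    """Distance on the circle R/Z (points represented in [0, 1))."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def rotation_orbit(x, alpha, n):
    """First n points of the orbit of x under the rotation x -> x + alpha (mod 1)."""
    return [(x + i * alpha) % 1.0 for i in range(n)]


def correlation_sum(points, eps):
    """Fraction of pairs (i, j), diagonal included, at circle distance <= eps."""
    n = len(points)
    hits = sum(1 for p in points for q in points if circle_dist(p, q) <= eps)
    return hits / n ** 2


if __name__ == "__main__":
    alpha = (math.sqrt(5) - 1) / 2  # an irrational rotation number
    orbit = rotation_orbit(0.123, alpha, 1000)
    c1, c2 = correlation_sum(orbit, 0.1), correlation_sum(orbit, 0.01)
    print(c1, c2)  # compare with 2 * eps
    print(math.log(c1 / c2) / math.log(10))  # slope estimate for the dimension
```

Computing the orbit as $(x + i\alpha) \bmod 1$, rather than by repeated addition, avoids accumulating rounding errors along the trajectory.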

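Shifts and subshifts. Let $p \ge 2$ be an integer and $A_p = \{0, 1, \dots, p-1\}$. Let $\Sigma_p = A_p^{\mathbb{N}_0}$ be the set of all sequences $x = x_0x_1x_2\dots$ over $A_p$, equipped with the metric $\varrho(x, y) = 2^{-k}$, where $k = \min\{i \ge 0 : x_i \ne y_i\}$, for $x \ne y$, and $\varrho(x, y) = 0$ for $x = y$; thus $\varrho(x, y) \le 1/2$ if and only if $x_0 = y_0$. Then $(\Sigma_p, \varrho)$ is a compact metric space homeomorphic to the Cantor ternary set. The shift $\sigma\colon \Sigma_p \to \Sigma_p$ is defined by $\sigma(x) = y$, where $y_i = x_{i+1}$ for every $i$.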
The dynamical system $(\Sigma_p, \sigma)$ is called the (one-sided) full shift on $p$ symbols. If $X \subseteq \Sigma_p$ is a nonempty closed $\sigma$-invariant set, then the restriction $\sigma|_X\colon X \to X$ is called a subshift; since no confusion can arise, the restriction $\sigma|_X$ will be denoted by $\sigma$.
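For subshifts, Bowen's formula for $h_{top}$ reduces to counting words. With the metric above, $\varrho^\sigma_m(x,y) \le 1/2$ exactly when $x$ and $y$ agree on $[0,m)$, so a maximal $(m,1/2)$-separated subset of a subshift $X$ contains one point per $m$-word occurring in $X$, and $s_m(1/2, X)$ is the number of such words; a smaller $\varepsilon = 2^{-k}$ only lengthens the words by $k-1$ and does not change the growth rate. The hypothetical example below, not taken from the text, counts the $m$-words of the golden mean shift (binary sequences with no two consecutive 1s), whose topological entropy is known to be $\log\frac{1+\sqrt5}{2}$.

```python
import math
from itertools import product


def golden_mean_words(m):
    """All binary m-words with no two consecutive 1s (the m-words of the golden mean shift)."""
    words = ("".join(w) for w in product("01", repeat=m))
    return [w for w in words if "11" not in w]


def entropy_estimate(m):
    """(1/m) log s_m(1/2, X), with s_m(1/2, X) = number of m-words of X."""
    return math.log(len(golden_mean_words(m))) / m


if __name__ == "__main__":
    golden = (1 + math.sqrt(5)) / 2
    for m in (5, 10, 15):
        print(m, len(golden_mean_words(m)), entropy_estimate(m), math.log(golden))
```

The word counts are Fibonacci numbers, so the estimate converges to $\log\frac{1+\sqrt5}{2} \approx 0.481$ at rate $O(1/m)$.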
The members of $A_p^* = \bigcup_{k \ge 0} A_p^k$ are called words. Let $k \ge 0$ and $w = w_0 \dots w_{k-1} \in A_p^k$. Then we say that $w$ is a $k$-word and that its length is $|w| = k$. The cylinder $[w]$ is the set of all $x \in \Sigma_p$ such that $x_0 \dots x_{k-1} = w$. For a $\sigma$-invariant measure $\mu$ put The next two lemmas (for the second one see e.g. [26, p. 774])

Lemma 4. Let $(X, \sigma)$ be a subshift and $\varepsilon \in (0, 1]$. Let $k \ge 0$ be an integer such that $\varepsilon \in [2^{-k}, 2^{-(k-1)})$. Then, for every $x \in X$ and $m, n \in \mathbb{N}$, and so

If $\pi = (\pi_0, \dots, \pi_{p-1})$ is a probability vector (that is, $\pi_i \ge 0$ and $\sum_i \pi_i = 1$), then the ($\sigma$-invariant Borel probability) measure $\mu$ on $(\Sigma_p, \mathcal{B}(\Sigma_p))$ such that $\mu([w]) = \prod_{i<k}\pi_{w_i}$ for every $k \ge 1$ and $w \in A_p^k$ is called the Bernoulli measure generated by $\pi$. An easy consequence of Lemma 5 is the following result; see [26, p. 773].

Lemma 6. Let $(\Sigma_p, \sigma)$ be the full shift, $\pi = (\pi_0, \dots, \pi_{p-1})$ be a probability vector, and $\mu$ be the Bernoulli measure generated by $\pi$. Then
$$h_{cor}(\sigma,\mu) = -\log\sum_{i=0}^{p-1}\pi_i^2.$$
Proof. Since $h \in [0, \log p]$, there is a probability vector $\pi = (\pi_0, \dots, \pi_{p-1})$ such that $-\log\sum_i \pi_i^2 = h$. Let $\mu$ be the Bernoulli measure generated by $\pi$; note that $\mu$ is $\sigma$-ergodic. By (2.1) and Lemma 6, there is a Borel subset $Y_h$ of $\Sigma_p$ such that $\mu(Y_h) = 1$ and $h_{cor}(\sigma, x) = h$ for every $x \in Y_h$. Since $\mu$ is non-atomic, $Y_h$ is uncountable and hence it contains a Cantor set (see e.g. [24, Theorem 3.2.7]).
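The Bernoulli formula lends itself to a quick numerical check. For the shift with the metric above and $\varepsilon = 1/2$, two trajectory points $\sigma^i(x)$, $\sigma^j(x)$ are $\varrho^\sigma_m$-close exactly when $x[i,i+m) = x[j,j+m)$, so $C^\sigma_m(x,n,1/2) = n^{-2}\sum_w N(w)^2$, where $N(w)$ counts the $i < n$ with $x[i,i+m) = w$. For a sequence sampled from the Bernoulli measure generated by $\pi$, the quantity $-(1/m)\log C^\sigma_m(x,n,1/2)$ should then be close to $-\log\sum_i\pi_i^2$. The sketch samples with Python's `random.choices`; the probability vector, sample size, seed and $m$ are arbitrary choices.

```python
import math
import random
from collections import Counter


def bernoulli_sequence(pi, length, seed=0):
    """Sample `length` i.i.d. symbols with P(symbol = i) = pi[i]."""
    rng = random.Random(seed)
    return rng.choices(range(len(pi)), weights=pi, k=length)


def correlation_sum_half(x, m, n):
    """C_m(x, n, 1/2) for the shift: (1/n^2) * #{(i, j) : x[i, i+m) == x[j, j+m)}."""
    counts = Counter(tuple(x[i:i + m]) for i in range(n))
    return sum(c * c for c in counts.values()) / n ** 2


def bernoulli_correlation_entropy(pi):
    """-log(sum_i pi_i^2): the correlation entropy of the Bernoulli measure."""
    return -math.log(sum(q * q for q in pi))


if __name__ == "__main__":
    pi, m, n = (0.5, 0.3, 0.2), 6, 40000
    x = bernoulli_sequence(pi, n + m, seed=1)
    estimate = -math.log(correlation_sum_half(x, m, n)) / m
    print(estimate, bernoulli_correlation_entropy(pi))
```

The two printed values should be close; counting words with a `Counter` keeps the cost linear in $n$ instead of quadratic.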

Proof of Theorem A
Lemma 8. Let $X$ be a compact metric space and $\varepsilon > 0$. Put $\eta = r(\varepsilon/2, X)^{-1}$. Then for every continuous map $f\colon X \to X$, $x \in X$, and $m, n \in \mathbb{N}$,
$$C^f_m(x,n,\varepsilon) \ge \eta^m.$$
Proof. Put $p = r(\varepsilon/2, X)$, $\eta = 1/p$, and take a finite subset $\{y_0, \dots, y_{p-1}\}$ of $X$ which $(\varepsilon/2)$-spans $X$. Fix arbitrary continuous $f\colon X \to X$, $x \in X$, and $m, n \in \mathbb{N}$; Since $\sum_w n_w = n$, the arithmetic-quadratic mean inequality yields

The easy proof of the following lemma is skipped.
The next lemma states that in the limits from (1.5) and (1.6) one can use any sublacunary sequences $(n_j)_{j\ge 1}$ and $(m_j)_{j\ge 1}$ of integers.
Proof. If $n_j \le n < n_{j+1}$ then Since correlation sums are bounded, $|C^f_m(x, n, \varepsilon) - C^f_m(x, n_j, \varepsilon)|$ is arbitrarily small for $j$ large enough. Now the first part of the lemma follows. For whenever $m_j \le m < m_{j+1}$. Using this and the fact that $a_m/m \le r(\varepsilon/2, X)$ for every $m$ by Lemma 8, we easily obtain that This proves the second part of the lemma.
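The lower bound of Lemma 8, read here as $C^f_m(x,n,\varepsilon) \ge r(\varepsilon/2,X)^{-m}$, can be checked numerically. On $X = [0,1]$ the $\lceil 1/\varepsilon\rceil$ points $(k + 1/2)\varepsilon$ (clipped to $1$) form an $(\varepsilon/2)$-spanning set, so $r(\varepsilon/2,[0,1]) \le \lceil 1/\varepsilon\rceil$ and the bound implies $C^f_m(x,n,\varepsilon) \ge \lceil 1/\varepsilon\rceil^{-m}$. The sketch computes correlation sums with Bowen's metric for the logistic map $x \mapsto 4x(1-x)$, a hypothetical choice of $f$; the helper names are ours.

```python
import math


def logistic(x):
    """Hypothetical example map f(x) = 4x(1 - x) on X = [0, 1]."""
    return 4.0 * x * (1.0 - x)


def orbit(f, x, length):
    """The points x, f(x), ..., f^(length-1)(x)."""
    points = []
    for _ in range(length):
        points.append(x)
        x = f(x)
    return points


def bowen_correlation_sum(f, x, m, n, eps):
    """C_m^f(x, n, eps): fraction of pairs (i, j) in [0, n)^2 whose Bowen distance
    max_{0 <= t < m} |f^(i+t)(x) - f^(j+t)(x)| is at most eps."""
    y = orbit(f, x, n + m - 1)
    close = 0
    for i in range(n):
        for j in range(n):
            if max(abs(y[i + t] - y[j + t]) for t in range(m)) <= eps:
                close += 1
    return close / n ** 2


def lemma8_bound(eps, m):
    """ceil(1/eps)^(-m); since r(eps/2, [0, 1]) <= ceil(1/eps), Lemma 8
    (read as C_m^f >= r(eps/2, X)^(-m)) implies C_m^f(x, n, eps) >= this value."""
    return math.ceil(1 / eps) ** (-m)


if __name__ == "__main__":
    eps = 0.25
    for m in range(1, 5):
        c = bowen_correlation_sum(logistic, 0.2, m, 200, eps)
        print(m, c, lemma8_bound(eps, m), c >= lemma8_bound(eps, m))
```

The inequality holds for any finite sequence in $[0,1]$, so the check does not depend on floating-point accuracy of the orbit.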

3.1. Local correlation entropy of $f^k$: The lower bound.
Lemma 11. Let (X, f ) be a dynamical system, m, h ∈ N, x ∈ X, and ε > 0. Then Proof. For every n ∈ N we easily have from which the lemma immediately follows.
Let (X, f ) be a dynamical system and k, h ∈ N. Then for every ε > 0 there are 0 < γ < δ < ε such that for every x ∈ X and m ∈ N.
Proof. Applying Lemma 11 to $f^k$ allows us to assume that $h < k$.
An analogous application of uniform continuity of $f^h$ gives that there is $\gamma \in (0, \delta)$ such that for every $n$.

Corollary 13. Let $(X, f)$ be a dynamical system, $k, h \in \mathbb{N}$, and $x \in X$. Then

Lemma 14. Let $(X, f)$ be a dynamical system and $k \in \mathbb{N}$. Then for every $\varepsilon > 0$ there is $\delta \in (0, \varepsilon)$ such that for every $y, z \in X$ with $\varrho^{f^k}_m(y, z) \le \delta$. This gives, for every $x \in X$ and $m, n \in \mathbb{N}$, Now the lemma immediately follows.

3.2. Local correlation entropy of $f^k$: A combinatorial lemma. Fix a finite set $V$ consisting of $n$ points, and a partition $\mathcal{V} = (V_0, V_1, \dots, V_{k-1})$ of it into $k \ge 2$ nonempty subsets. Consider an undirected simple (not necessarily connected) graph $G$ with the set of vertices $V$. The number of edges of $G$ is denoted by $m(G)$. For $0 \le a, b < k$, an edge $\{i, j\}$ of $G$ is called an $ab$-edge if $i \in V_a$ and $j \in V_b$, or vice versa. We say that a graph $G$ is $\mathcal{V}$-admissible if the following hold: The number of all $ab$-edges of $G$ is denoted by $m_{ab}(G)$. Put Our aim is to find an upper bound for $\kappa(G)$ depending only on $n$ and $k$. To this end, we say that a $\mathcal{V}$-admissible graph $G$ is $\mathcal{V}$-optimal if $\kappa(G') \le \kappa(G)$ for every $\mathcal{V}$-admissible graph $G'$. Further, if $G$ is $\mathcal{V}$-optimal and the number of edges of every $\mathcal{V}$-optimal graph $G'$ is greater than or equal to that of $G$, we say that $G$ is a minimal $\mathcal{V}$-optimal graph. The following lemma gives a characterization of minimal $\mathcal{V}$-optimal graphs. Proof. We start by proving that
$$\kappa(G) \le \sum_{a<b}\min\{|V_a|, |V_b|\} \tag{3.7}$$
for (every $V$, $\mathcal{V}$, and) every $\mathcal{V}$-admissible graph $G$; clearly, it suffices to prove (3.7) for minimal $\mathcal{V}$-optimal graphs $G$. Assume first that $k = 2$, i.e., $\mathcal{V} = \{V_0, V_1\}$. Fix a minimal $\mathcal{V}$-optimal graph $G$ and take any $a \ne b$ from $\{0, 1\}$ (i.e., $a = 0$ and $b = 1$, or vice versa). For $i \in V_a$ define Assume that $A_{ib} \ne \emptyset$. Take the ($\mathcal{V}$-admissible) graph $\tilde{G}$ created from $G$ by removing all $ab$-edges $\{i, j\}$ (with $j \in A_{ib}$) as well as all $aa$-edges $\{i, i'\}$ (with $i' \in B_{ib} \setminus \{i\}$).
Thus, in both cases, Assume again that $A_{ib} \ne \emptyset$. Take any $j \in A_{ib}$ and define $A_{ja}$, $B_{ja}$ analogously. Then $B_{ja} \supseteq A_{ib}$ and $B_{ib} \supseteq A_{ja}$. Inequality (3.8), applied also to $j$ and $a$, yields $\mathcal{V}$-admissibility of $G$ now gives that $A_{ib} \cup B_{ib}$ is a clique of $G$ (that is, the induced subgraph is complete). Since $G$ is minimal, this easily implies that $A_{ib}$ is a singleton. (3.11) Observe that $\kappa(G)$ can be written in the form This, together with (3.11) applied to every $a \ne b$, yields for every $\mathcal{V}$-admissible graph $G$. Thus, the proof of (3.7) is finished. Now take any graph $H$ with the set of vertices $V$ which satisfies (a) and (b); such a graph obviously exists. By (b), $H$ is $\mathcal{V}$-admissible (indeed, the condition (3.5) is trivially satisfied). By (a), $\kappa(H) = \sum_{a<b}\min\{|V_a|, |V_b|\}$. Thus, by (3.7), $H$ is $\mathcal{V}$-optimal. For every graph $G$ (with the set of vertices $V$) having fewer edges than $H$ we have $\kappa(G) \le m(G) < m(H) = \kappa(H)$; hence $G$ is not $\mathcal{V}$-optimal. So $H$ is a minimal $\mathcal{V}$-optimal graph.
On the other hand, let $G$ be any minimal $\mathcal{V}$-optimal graph. By the previous part of the proof, $\kappa(G) = \sum_{a<b}\min\{|V_a|, |V_b|\}$. This, together with (3.12) and Proof. We first prove that for every $x \in K = \bigl\{(x_0, \dots, x_{k-1}) \in \mathbb{R}^k : \sum_h x_h = 1,\ x_0 \ge \dots \ge x_{k-1} \ge 0\bigr\}$. To this end, define a map $f\colon$ , a contradiction. Thus $\tilde{x}_h = 1/k$ for every $h$ and (3.14) follows. Now we can prove Lemma 17. Put $n_h = |V_h|$ for $h = 0, \dots, k-1$; we may assume that $n_0 \ge n_1 \ge \dots \ge n_{k-1}$. Let $G$ be a $\mathcal{V}$-admissible graph. By Lemma 16 and (3.14) with $x_h = n_h/n$,

3.3. Local correlation entropy of $f^k$: The upper bound.
Lemma 18. Let $(X, f)$ be a dynamical system, $k \ge 2$, $\varepsilon > 0$, $x \in X$, and $m, n \in \mathbb{N}$. Then Proof. Put $\tilde{n} = kn$, $V = \{0, 1, \dots, \tilde{n} - 1\}$ and, for $0 \le a < k$, $V_a = \{i \in V : i \equiv a \pmod k\}$. Let $G$ be an undirected simple graph with the set of vertices $V$ such that, for any $i \ne j$ from $V$, $\{i, j\}$ is an edge of $G$ if and only if Notice that the number $m(G)$ of edges of $G$ satisfies Since $\varrho^{f^k}_m \le \varrho^f_{km}$, for every $0 \le a < k$ the definition of $G$ gives $n^2\, C^{f^k}_m(f^a(x), n, 2\varepsilon) \ge 2m_{aa}(G) + n$. This, together with (3.15) and (3.16), yields Now a simple computation gives the desired inequality.
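The inequality $\varrho^{f^k}_m \le \varrho^f_{km}$ used above says that the maximum of $\varrho(f^t(y), f^t(z))$ over the times $t = 0, k, \dots, (m-1)k$ is at most the maximum over all times $t = 0, 1, \dots, km-1$. The sketch below evaluates both sides on a grid of pairs, with the logistic map and the absolute-value metric as hypothetical choices.

```python
def logistic(x):
    """Hypothetical example map on [0, 1]."""
    return 4.0 * x * (1.0 - x)


def iterate(f, k):
    """Return the k-th iterate f^k as a function."""
    def f_k(x):
        for _ in range(k):
            x = f(x)
        return x
    return f_k


def bowen_distance(f, m, x, y):
    """rho_m^f(x, y) = max_{0 <= i < m} |f^i(x) - f^i(y)|."""
    largest = 0.0
    for _ in range(m):
        largest = max(largest, abs(x - y))
        x, y = f(x), f(y)
    return largest


if __name__ == "__main__":
    k, m = 3, 4
    f_k = iterate(logistic, k)
    for x, y in [(0.1, 0.11), (0.3, 0.7), (0.42, 0.43)]:
        # left side: rho_m^{f^k}; right side: rho_{km}^f
        print(bowen_distance(f_k, m, x, y), bowen_distance(logistic, k * m, x, y))
```

Both sides evaluate the same floating-point points $f^{ki}(y)$, so the inequality also holds exactly in the computation.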

Lemma 19. Let $(X, f)$ be a dynamical system and $0 \le h < k$ be integers. Then for every $\varepsilon > 0$ there is $\eta(\varepsilon) > 0$ such that for every $x \in X$ and $m, n \in \mathbb{N}$.
Proof of Theorem A. We may assume that $k \ge 2$. Lemma 19, applied to every $h \in \{0, \dots, k-1\}$, gives that for every $\varepsilon > 0$ there is $\eta(\varepsilon) > 0$ such that $\lim_{\varepsilon\to 0}\eta(\varepsilon) = 0$ and for every $0 \le h < k$, $x \in X$, and $m, n \in \mathbb{N}$. Now, by Lemma 18, By taking the limit as $n$ approaches infinity and using Lemma 10, we obtain Consequently, again using Lemma 10, Since the opposite inequalities were shown in Corollary 15, Theorem A is proved.
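A concrete case of the relationship between $f$ and $f^k$: for the full shift on two symbols and a fair-coin sequence $x$, consecutive $k$-blocks of $x$ are again independent and uniformly distributed, so one expects the local correlation entropy of $\sigma^k$ at $x$ to be $k\log 2$, that is, $k$ times that of $\sigma$. With $\varepsilon = 2^{-k}$, two points are $\varrho$-close exactly when their first $k$ symbols agree, so $C^{\sigma^k}_m(x,n,2^{-k})$ is the proportion of pairs $(i,j)$ with $x[ki,ki+mk) = x[kj,kj+mk)$. The sketch compares finite-$(m,n)$ estimates for $k = 1$ and $k = 2$; sample sizes and the seed are arbitrary choices.

```python
import math
import random
from collections import Counter


def fair_coin_sequence(length, seed=0):
    """`length` independent fair bits: a typical point of the full 2-shift."""
    rng = random.Random(seed)
    return [rng.randrange(2) for _ in range(length)]


def shift_power_correlation_sum(x, k, m, n):
    """C_m^{sigma^k}(x, n, 2^-k) = (1/n^2) #{(i, j) : x[ki, ki+mk) == x[kj, kj+mk)}."""
    counts = Counter(tuple(x[k * i:k * i + m * k]) for i in range(n))
    return sum(c * c for c in counts.values()) / n ** 2


def entropy_estimate(x, k, m, n):
    """-(1/m) log C_m^{sigma^k}(x, n, 2^-k), a finite-(m, n) proxy for the local entropy of sigma^k."""
    return -math.log(shift_power_correlation_sum(x, k, m, n)) / m


if __name__ == "__main__":
    x = fair_coin_sequence(50000, seed=2)
    h1 = entropy_estimate(x, 1, 8, 20000)  # sigma, words of length 8
    h2 = entropy_estimate(x, 2, 4, 20000)  # sigma^2, also words of length 8
    print(h1, h2, h2 / h1)
```

The two estimates should be close to $\log 2$ and $2\log 2$; both use words of length $8$, and the factor $2$ comes only from dividing by $m = 4$ instead of $m = 8$.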

Proof of Theorem B
Lemma 20. Let $(X, f)$ be a dynamical system, $x \in X$, $\varepsilon > 0$, and $m, n \in \mathbb{N}$. Then
$$C^f_m(x,n,\varepsilon) \ge \frac{1}{r_m(\varepsilon/2, X)}.$$
Proof. The proof is quite similar to that of Lemma 8; the only difference is that instead of $(\varepsilon/2)$-spanning sets we use $(m, \varepsilon/2)$-spanning sets. For completeness, the details follow. Let $\{y_0, \dots, y_{p-1}\}$ be an $(m, \varepsilon/2)$-spanning subset of minimal cardinality $p = r_m(\varepsilon/2, X)$. Hence for every $i \ge 0$ and Then, by the arithmetic-quadratic mean inequality,
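The counting step in the proof of Lemma 20 can be written out as follows. This is a sketch of the standard pigeonhole and arithmetic-quadratic mean argument; the index map $a(\cdot)$ and the counts $n_a$ are notation introduced here.

```latex
% Fix an (m, eps/2)-spanning set {y_0, ..., y_{p-1}} with p = r_m(eps/2, X).
For each $i \in [0,n)$ choose $a(i) \in [0,p)$ with
$\varrho^f_m\bigl(f^i(x), y_{a(i)}\bigr) \le \varepsilon/2$, and put
$n_a = \operatorname{card}\{i \in [0,n) : a(i) = a\}$, so that $\sum_{a<p} n_a = n$.
If $a(i) = a(j)$, the triangle inequality gives
$\varrho^f_m\bigl(f^i(x), f^j(x)\bigr) \le \varepsilon$. Hence
\[
  C^f_m(x,n,\varepsilon)
  \;\ge\; \frac{1}{n^2}\sum_{a<p} n_a^2
  \;\ge\; \frac{1}{n^2}\cdot\frac{1}{p}\Bigl(\sum_{a<p} n_a\Bigr)^2
  \;=\; \frac{1}{r_m(\varepsilon/2, X)} ,
\]
where the second inequality is the Cauchy-Schwarz
(arithmetic-quadratic mean) inequality.
```

Replacing the $(m,\varepsilon/2)$-spanning set by the $p^m$ cells indexed by $m$-words over an $(\varepsilon/2)$-spanning subset of $X$ gives, in the same way, the bound $r(\varepsilon/2,X)^{-m}$, which is how we read Lemma 8.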
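Proposition 21. Let $(X, f)$ be a dynamical system and $x \in X$. Then
$$\underline{h}_{cor}(f,x) \le \overline{h}_{cor}(f,x) \le h_{top}\bigl(f|_{\overline{\mathrm{Orb}}_f(x)}\bigr).$$
The part corresponding to the lower local correlation entropy was proved in [25, p. 354]. The proof used the fact that if $x$ is a quasi-generic point [4, (4.4)] of an invariant measure $\mu$, then [25, p. 355] $h_{cor}(f, x) \le h_{cor}(f, \mu) \le h_\mu(f)$.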

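Proof. By Lemma 20 and Bowen's definition of topological entropy, $\overline{h}_{cor}(f, x) \le h_{top}(f)$ for every dynamical system $(X, f)$ and every $x \in X$.
Applying this to $X' = \overline{\mathrm{Orb}}_f(x)$ and $f' = f|_{X'}$ yields the required inequality.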
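Spelled out, the passage from the bound $C^f_m(x,n,\varepsilon) \ge 1/r_m(\varepsilon/2,X)$ established in the proof of Lemma 20 to the inequality $\overline{h}_{cor}(f,x) \le h_{top}(f)$ runs as follows. Since the bound is uniform in $n$, it survives whichever upper or lower limit in $n$ is taken in (1.5); this is a sketch of the limit argument only.

```latex
For all $m, n \in \mathbb{N}$,
\[
  -\frac{1}{m}\log C^f_m(x,n,\varepsilon) \;\le\; \frac{1}{m}\log r_m(\varepsilon/2, X),
\]
and the same holds after taking the upper or lower limit as $n \to \infty$.
Taking $\limsup_{m\to\infty}$ and then letting $\varepsilon \to 0$ (so that also
$\varepsilon/2 \to 0$), Bowen's formula
$h_{top}(f) = \lim_{\varepsilon\to 0}\limsup_{m\to\infty}\frac{1}{m}\log r_m(\varepsilon, X)$
yields
\[
  \overline{h}_{cor}(f,x) \;\le\; h_{top}(f).
\]
```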
Now we embark on the proof of the fact that, for dynamical systems on topological graphs, local correlation entropies can be arbitrarily close to the topological entropy.
Recall that subsets $X_0, \dots, X_{p-1}$ of $X$ form a strict $p$-horseshoe of a dynamical system $(X, f)$ if the sets $X_i$ are nonempty, closed, pairwise disjoint, and $f(X_i) \supseteq \bigcup_j X_j$ for every $0 \le i < p$.

Lemma 24. Let $(X, f)$ be a dynamical system containing a strict $p$-horseshoe $X_0, \dots, X_{p-1}$ for some $p \ge 2$. Then $(X, f)$ has a subsystem $(Y, f)$ which is a topological extension of the full shift $(\Sigma_p, \sigma)$.
Proof. This is standard. Since the sets $X_0, \dots, X_{p-1}$ form a strict $p$-horseshoe, in the usual way, for every $k \ge 2$ we can construct disjoint nonempty compact subsets $X_a$ ($a \in A_p^k$) such that $f(X_{a_0a_1\dots a_{k-1}}) = X_{a_1a_2\dots a_{k-1}}$ and is a topological extension of the full shift $(\Sigma_p, \sigma)$.

Now we are ready to prove Theorem B.
Proof of Theorem B. By Proposition 21 it suffices to prove the second part of the theorem. We may assume that $h_{top}(f) > 0$. Take arbitrary $0 < h < h_{top}(f)$. By [14] there are integers $p, k$ with $(1/k)\log p \ge h$ such that $f^k$ has a strict $p$-horseshoe. By Corollary 7, Lemma 24, and Proposition 23, there is a Cantor set $X_h$ such that $\underline{h}_{cor}(f^k, x) \ge \log p$ for every $x \in X_h$. Hence, by Theorem A, $\underline{h}_{cor}(f, x) \ge (1/k)\log p \ge h$ for every $x \in X_h$.

This follows from Proposition 21 and from the fact that positive entropy maps of topological graphs have (dense) periodic points.
The following two examples show that it can happen that the local correlation entropy at every point is strictly smaller than the topological entropy of f and that, in positive entropy systems on topological graphs, the set of those x with positive local correlation entropy can be negligible from the measure-theoretic point of view.
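Example 26. Take $\lambda \in (0, \infty]$. For $n \in \mathbb{N}$ let $I_n = [1/(n+1), 1/n]$ and let $f_n\colon I_n \to I_n$ be continuous maps which fix the end points of $I_n$ and satisfy $h_{top}(f_n) < \lambda$ and $\sup_n h_{top}(f_n) = \lambda$.
Define a map $f\colon I \to I$, where $I = [0,1]$, by $f(0) = 0$ and $f(x) = f_n(x)$ for $x \in I_n$, $n \in \mathbb{N}$. Then $f$ is continuous and $h_{top}(f) = \lambda$ (see e.g. [19, Theorem 11.2]). On the other hand, $\overline{h}_{cor}(f, x) < \lambda$ for every $x \in I$. In fact, if $x = 0$ then $h_{cor}(f, x) = 0$ since $x$ is fixed, and if $x \in I_n$ then $\overline{h}_{cor}(f, x) \le h_{top}(f_n) < \lambda$ by Proposition 21.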
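One concrete instance of Example 26 with $\lambda = \infty$, a choice of ours rather than the text's: let $f_n$ be the affine copy on $I_n$ of the piecewise linear zigzag of $[0,1]$ with $2n+1$ full laps. An odd number of laps makes $f_n$ fix both end points of $I_n$, and a map with $N$ full laps has topological entropy $\log N$, so $h_{top}(f_n) = \log(2n+1)$ is finite while $\sup_n h_{top}(f_n) = \infty$. The sketch evaluates the resulting map $f$; the function names are hypothetical.

```python
import math


def zigzag(t, laps):
    """Piecewise linear map of [0, 1] with `laps` full laps; odd `laps` fixes 0 and 1."""
    t = min(max(t, 0.0), 1.0)
    k = min(int(t * laps), laps - 1)  # index of the lap containing t
    s = t * laps - k                  # position inside that lap, in [0, 1]
    return s if k % 2 == 0 else 1.0 - s


def example26_map(x):
    """f(0) = 0 and, on I_n = [1/(n+1), 1/n], an affine copy of the (2n+1)-lap zigzag."""
    if x <= 0.0:
        return 0.0
    n = max(1, math.floor(1.0 / x))
    a, b = 1.0 / (n + 1), 1.0 / n
    return a + (b - a) * zigzag((x - a) / (b - a), 2 * n + 1)


if __name__ == "__main__":
    for x in (1.0, 0.7, 0.5, 0.3, 0.01, 0.0):
        print(x, example26_map(x))
```

Because $f(I_n) \subseteq I_n$, we get $0 \le f(x) \le 1/n \le 2x$ on $I_n$, which is exactly the continuity of $f$ at the fixed point $0$.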

Uniquely ergodic systems
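In this section we summarize facts on uniquely ergodic systems which will be used in Section 6. Following [12], we say that a set $A \subseteq \mathbb{N}_0$ is uniform Cesàro with density $\alpha \ge 0$ if for every $\varepsilon > 0$ there is $n_0 \in \mathbb{N}$ such that, for every $n \ge n_0$ and $j \in \mathbb{N}_0$,
$$\Bigl|\frac{1}{n}\,\bigl|A \cap [j, j+n)\bigr| - \alpha\Bigr| < \varepsilon.$$
In such a case the density $\alpha$ of $A$ will be denoted by $d(A)$. It is easy to check that $A \subseteq \mathbb{N}_0$ is uniform Cesàro with density $\alpha$ if and only if there is $l \in \mathbb{N}$ such that for every $\varepsilon > 0$ there is $n_0 \in \mathbb{N}$ with $\bigl|\frac{1}{ln}\,|A \cap [lj, lj + ln)| - \alpha\bigr| < \varepsilon$ for every $n \ge n_0$ and $j \in \mathbb{N}_0$.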
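Uniform Cesàro density constrains all windows $[j, j+n)$ at once, not only initial segments. The sketch below, a finite check that can suggest but never prove the limit statement, computes the window densities $\frac{1}{n}|A \cap [j,j+n)|$ exactly for a set given by its indicator function. The multiples of $3$ pass with $\alpha = 1/3$, since every window of length $n$ contains $\lfloor n/3\rfloor$ or $\lceil n/3\rceil$ of them; the set $\bigcup_k [4^k, 2\cdot 4^k)$, a hypothetical counterexample, contains arbitrarily long windows of density $1$ and of density $0$, so it is not uniform Cesàro for any $\alpha$.

```python
from fractions import Fraction


def window_densities(indicator, n, horizon):
    """Exact densities |A ∩ [j, j+n)| / n for j = 0, ..., horizon - 1."""
    prefix = [0]  # prefix[t] = |A ∩ [0, t)|
    for i in range(horizon + n):
        prefix.append(prefix[-1] + (1 if indicator(i) else 0))
    return [Fraction(prefix[j + n] - prefix[j], n) for j in range(horizon)]


def max_deviation(indicator, alpha, n, horizon):
    """max over the first `horizon` windows of | |A ∩ [j, j+n)| / n - alpha |."""
    return max(abs(d - alpha) for d in window_densities(indicator, n, horizon))


def multiples_of_3(i):
    return i % 3 == 0


def doubling_blocks(i):
    """Indicator of the union of the blocks [4^k, 2 * 4^k), k >= 0."""
    return i.bit_length() % 2 == 1


if __name__ == "__main__":
    for n in (5, 10, 30):
        print(n, max_deviation(multiples_of_3, Fraction(1, 3), n, 500))
    d = window_densities(doubling_blocks, 50, 2000)
    print(min(d), max(d))
```

Exact `Fraction` arithmetic avoids rounding noise when comparing deviations with bounds such as $1/n$.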
Let $p \ge 2$. For words $u, v \in A_p^*$ with $|u| \le |v|$ and an integer $l \ge 1$ put (In the definition of $\tau^{(l)}_v(u)$ we should divide by $1 + \lfloor(|v| - |u|)/l\rfloor$; the difference is, of course, asymptotically negligible.) If $u \in A_p^*$ and $x \in \Sigma_p$, define $N^{(l)}_x(u)$ analogously. For abbreviation, we often write $N_x$, $\tau_x$ and $N_v$, $\tau_v$ instead of $N^{(1)}_x$, $\tau^{(1)}_x$ and $N^{(1)}_v$, $\tau^{(1)}_v$. is uniform Cesàro if and only if for every $u \in A_p^*$ the limit $\lim_n \tau^{(l)}_{x[jl,(j+n)l)}(u)$ exists uniformly in $j$ and does not depend on $j$; in such a case we have for every $j$.

Lemma 28. Let $x \in \Sigma_p$ be almost periodic. Assume that $N_x(u)$ is uniform Cesàro for every $u \in A_p^*$. Then the subshift $(\overline{\mathrm{Orb}}_\sigma(x), \sigma)$ is strictly ergodic. Moreover, for every $u \in A_p^*$ and $j \in \mathbb{N}_0$, where $\mu$ is the unique invariant measure of $(\overline{\mathrm{Orb}}_\sigma(x), \sigma)$.
The following lemma gives a condition on x implying strict ergodicity of its orbit closure.
Lemma 29. Let $x \in \Sigma_p$ be almost periodic and let $(l_j)_{j\ge 1}$ be an increasing sequence of positive integers with every $l_{j+1}$ being a multiple of $l_j$. Assume that, for every $j \ge 1$ and every $l_j$-word $v$, the set $N^{(l_j)}_x(v)$ is uniform Cesàro. Then the subshift $(\overline{\mathrm{Orb}}_\sigma(x), \sigma)$ is strictly ergodic.

Proof. The proof is inspired by that of [12, Lemma 1.9]. Fix any nonempty word $u \in A_p^*$; we want to prove that $N_x(u)$ is uniform Cesàro. Take $j$ such that $l = l_j > |u|$. Further, take arbitrary integers $1 \le r < t$ and $0 \le s$; for abbreviation, write $N^{(\cdot)}_{st}$ and $\tau^{(\cdot)}_{st}$ instead of $N^{(\cdot)}_{x[sl,(s+t)l)}$ and $\tau^{(\cdot)}_{x[sl,(s+t)l)}$. We first prove that To this end, for $i \in sl + N^{(1)}_{st}(u)$ and $h \in B_i$. But every such pair $(i, h)$ corresponds (in a one-to-one way) to a triple $(v$, Further, by (5.5), $0 \le |B_i| \le r$ for every $i$ and, provided $(s + r)l \le i \le (s + t - r)l$, for $v \in A_p^{rl}$, dividing (5.7) by $trl$ and using (5.6) gives (5.4). Now take any $\varepsilon > 0$. Let $j' \ge j$ be such that $l_{j'}/l > 1/\varepsilon$; put $r = l_{j'}/l$ and $\varepsilon' = \varepsilon/|A_p^{rl}|$. By the assumption, for every word $v \in A_p^{rl}$ the set $N^{(rl)}_x(v)$ is uniform Cesàro. Thus, by (5.1), we can find $j'' > j'$ such that $|\tau^{(l)}_{st}(v) - d_v| < \varepsilon'$ for every $t \ge l_{j''}/l$ and every $v \in A_p^{rl}$. We may assume that $j''$ is so large that $2r/t < \varepsilon$. Put $d = \sum_{v \in A_p^{rl}}$ and (5.4) gives This is true for every sufficiently large $t$ and so, by (5.2), the set $N^{(1)}_x(u)$ is uniform Cesàro. Since $u$ was arbitrary, Lemma 28 yields strict ergodicity of the subshift $(\overline{\mathrm{Orb}}_\sigma(x), \sigma)$.

Proof of Theorem C
In this section we show that Theorem B cannot be generalized to arbitrary dynamical systems. We construct a strictly ergodic system for which the local correlation entropy of every point is zero, but the topological entropy is positive. The construction is a modification of that from [12, pp. 327-329].
Fix an integer $p \ge 3$ and take the alphabet $A = A_p = \{0, \dots, p-1\}$. Recall that $A^* = \bigcup_{m \ge 0} A^m$ denotes the set of all words over $A$. If $w, v$ are words, their concatenation is denoted by $wv$. Further, for a word $w$ and a positive integer $n$, the concatenation $ww\dots w$ ($n$ times) is denoted by $w^n$.
Let $M = \{w_1 < w_2 < \dots < w_n\}$ be an ordered set of words over $A$ (the order of $M$ need not be lexicographical) such that the lengths $|w_i|$ are the same; denote their common value by $l(M)$. For $r \ge 0$ let $M^{(r)}$ be the ordered set where
$$w^{(r)}_j = w_1^r\,\pi^n_j(w_1, w_2, \dots, w_n). \tag{6.1}$$
Note that the words Let $x \in \Sigma_p = A^{\mathbb{N}_0}$ be the unique sequence such that
$$x[0, l_j) = \bar{w}_j \qquad (j \ge 1); \tag{6.6}$$
such $x$ exists since $\bar{w}_{j+1}$ starts with ($r_j + 1$ copies of) $\bar{w}_j$; $x$ is unique since $l_j = |\bar{w}_j| \nearrow \infty$ by Lemma 30(c) below. Put $X = \overline{\mathrm{Orb}}_\sigma(x)$. The proof of Theorem C goes as follows. First, in Lemmas 31 and 33 we show that the system $(X, \sigma)$ is strictly ergodic, which will prove (a) of the theorem. The fact that the topological entropy is positive is given in Lemma 35. Finally, correlation entropies of the system are described in Lemmas 37 and 38. We start by summarizing some of the properties of the constructed sets $M_j$.

Lemma 30. The following hold: (a) $m_j/l_j$ is an even integer provided $j \ge 2$, and so $r_j = m_j/l_j$ (that is, the ceiling in (6.2) is unnecessary); (b) $r_j > p$ provided $j \ge 3$; (c) $\lim_j m_j = \lim_j l_j = \lim_j r_j = \infty$; (d) $l_{j+1} > p\,l_j^2$ provided $j \ge 3$; (e) $\sum_{j \ge 4} 1/l_j < 1/(p\,l_3^2 - 1)$ and $l_3 = (p+1)!$.
Remark 36. From the beginning of this section we have excluded the case $p = 2$. Nevertheless, the construction can be carried out also for $p = 2$. The obtained subshift will be strictly ergodic (by the same reasoning as in Lemmas 31-33). However, its topological entropy will be zero. In fact, by (6.5), $l_j = 2 \cdot 3^{j-2}$, $m_j = 2$, and $r_j = 1$ for every $j \ge 2$.
Proof. Recall that, by Lemma 28 and the choice of $x$, for every $v \in A^*$, (6.14) We start the proof by showing that
$$\mu\bigl([\bar{w}_j^k]\bigr) \ge \frac{r_j - k + 1}{2m_j l_j} \qquad\text{for every } j \ge 1 \text{ and } 1 \le k \le r_j. \tag{6.15}$$
To this end, fix any $j \ge 1$ and $1 \le k \le r_j$. By the construction, every word $u$ from $M_{j+1}$ begins with $r_j$ copies of $\bar{w}_j$. Hence $u[il_j, (i+k)l_j) = \bar{w}_j^k$ for every $0 \le i \le r_j - k$. By (6.5) and Lemma 30(a), $\tau_u(\bar{w}_j^k) \ge (r_j - k + 1)/l_{j+1} \ge (r_j - k + 1)/(2m_j l_j)$ for every $u \in M_{j+1}$. Now take any $t > j$. Since $\bar{w}_t$ is a concatenation of words from $M_{j+1}$, we have $\tau_{\bar{w}_t}(\bar{w}_j^k) \ge (r_j - k + 1)/(2m_j l_j)$ for every $u \in M_t$. By (6.14), this yields (6.15). Recall the definition (2.4). Take any $n \in \mathbb{N}$, put $w = x[0, n)$, and find $j$ such that $l_j \le n < l_{j+1}$; we may assume that $j \ge 2$. Assume first that $n < (r_j/2)l_j$;