A formula of conditional entropy and some applications

In this paper we establish a formula of conditional entropy 
and give two examples of applications of the formula.

where $h_\mu(T|\mathcal{E})$ denotes the conditional entropy with respect to $\mathcal{E}$ (see Section 2 for details). A natural question is whether an analogous conclusion holds for an arbitrary $T$-invariant sub-σ-algebra of $\mathcal{B}_\mu$ (which may contain non-invariant sets). More precisely, let $\mathcal{A}$ be a $T$-invariant sub-σ-algebra of $\mathcal{B}_\mu$ (i.e. $T^{-1}\mathcal{A} = \mathcal{A} \pmod{\mu}$) and let $\mu = \int_X \mu_x^{\mathcal{A}}\,d\mu(x)$ be the disintegration of $\mu$ over $\mathcal{A}$. How can formula (1) be extended to the general conditional entropy $h_\mu(T|\mathcal{A})$? The difficulty is that the conditional measure $\mu_x^{\mathcal{A}}$ is only a Borel probability measure and need not be $T$-invariant for $\mu$-a.e. $x \in X$. We therefore need the measure-theoretic entropy for general Borel probability measures defined by Feng and Huang in [9].
Given $\epsilon > 0$, let $B_{d_n}(x, \epsilon) = \{y \in X : d_n(x, y) < \epsilon\}$ be the $d_n$-ball about $x$ of radius $\epsilon$. We also write $B_n(x, \epsilon)$ for convenience when there is no confusion. Following the idea of Brin and Katok [4], Feng and Huang [9] introduced the measure-theoretic lower and upper entropies of Borel probability measures. Brin and Katok [4] proved that for any $T$-invariant Borel probability measure $\mu$, $\underline{h}_\mu(T, x) = \overline{h}_\mu(T, x)$ for $\mu$-a.e. $x \in X$, and $\underline{h}_\mu(T) = \overline{h}_\mu(T) = h_\mu(T)$.
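The following Python sketch illustrates the $d_n$-ball on a toy system. It assumes the usual Bowen metric $d_n(x, y) = \max_{0 \le i < n} d(T^i x, T^i y)$ (the definition of $d_n$ is not reproduced in this section), and it uses the doubling map on the circle purely as a hypothetical example; the function names are illustrative and do not come from the paper.

```python
from fractions import Fraction

def circle_dist(x, y):
    """Distance on the circle R/Z between two points of [0, 1)."""
    d = abs(x - y) % 1
    return min(d, 1 - d)

def doubling(x):
    """Hypothetical example map T(x) = 2x mod 1 (not from the paper)."""
    return (2 * x) % 1

def bowen_metric(x, y, n, T=doubling, d=circle_dist):
    """Assumed definition: d_n(x, y) = max over 0 <= i < n of d(T^i x, T^i y)."""
    best = d(x, y)
    for _ in range(n - 1):
        x, y = T(x), T(y)
        best = max(best, d(x, y))
    return best

def in_bowen_ball(y, x, n, eps):
    """True if y lies in B_n(x, eps), i.e. d_n(x, y) < eps."""
    return bowen_metric(x, y, n) < eps
```

Exact rational arithmetic is used so that repeated doubling does not accumulate floating-point error. Since $d_n$ is non-decreasing in $n$, the balls $B_n(x, \epsilon)$ shrink as $n$ grows.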
The main result of this paper is as follows.

Theorem 1.2. Let $(X, T)$ be a TDS, $\mu \in M(X, T)$ and $\mathcal{B}_\mu$ be the completion of $\mathcal{B}_X$ under $\mu$. If $\mathcal{A}$ is a $T$-invariant sub-σ-algebra of $\mathcal{B}_\mu$ and $\mu = \int_X \mu_x^{\mathcal{A}}\,d\mu(x)$ is the measure disintegration of $\mu$ over $\mathcal{A}$, then
$$h_\mu(T|\mathcal{A}) = \int_X h_{\mu_x^{\mathcal{A}}}(T)\,d\mu(x).$$
In particular, if $\mu$ is ergodic then $h_\mu(T|\mathcal{A}) = h_{\mu_x^{\mathcal{A}}}(T)$ for $\mu$-a.e. $x \in X$.

As applications of Theorem 1.2, we give below two examples. Let $(X, T)$ and $(Y, S)$ be two TDSs. Suppose that $(Y, S)$ is a factor of $(X, T)$ in the sense that there exists a continuous surjective map $\pi : (X, T) \to (Y, S)$ such that $\pi \circ T = S \circ \pi$. The map $\pi$ is called a factor map from $X$ to $Y$. We always write $h_{top}(T|\pi)$ for the conditional topological entropy relative to $\pi$ (see Section 2 for details).

Corollary 1.3. Let $\pi : (X, T) \to (Y, S)$ be a factor map between two TDSs, where $S$ is a homeomorphism. The following hold.

We remark that it is shown in [16, Theorem 1.1] that two semi-conjugate random dynamical systems (RDSs) on Polish spaces have the same entropy if the cardinality of the pre-image of a point under the semi-conjugacy is finite almost everywhere. Corollary 1.3 extends this result to the case of countable fibers when $X$ is compact.
To state the next corollary we need some notions. Let $(X, T)$ be a TDS with a homeomorphism $T$, and $\mu \in M(X, T)$. The Pinsker σ-algebra $P_\mu(T)$ is defined as the smallest sub-σ-algebra of $\mathcal{B}_\mu$ containing $\{B \in \mathcal{B}_X : h_\mu(T, \{B, X \setminus B\}) = 0\}$. It is easy to see that $P_\mu(T)$ is a $T$-invariant sub-σ-algebra of $\mathcal{B}_\mu$. The stable set of a point $x \in X$ is defined as (see [14])
$$W^s(x, T) = \{y \in X : \lim_{n \to +\infty} d(T^n x, T^n y) = 0\},$$
and the unstable set of $x$ is defined as
$$W^u(x, T) = \{y \in X : \lim_{n \to +\infty} d(T^{-n} x, T^{-n} y) = 0\}.$$
The Bowen topological entropy $h^B_{top}(T, Z)$ for any set $Z$ in $(X, T)$ was introduced by Bowen in [3] (see Section 4 for details).
Corollary 1.4. Let $(X, T)$ be a TDS with a homeomorphism $T$ from $X$ onto itself. Then
1. if $\mu \in M^e(X, T)$ and $\mu = \int_X \mu_x^{P}\,d\mu(x)$ is the measure decomposition of $\mu$ over the Pinsker σ-algebra $P_\mu(T)$, then

We remark that it is shown in [8, Theorem 1.2] that if $(X, T)$ is a finite entropy TDS with a homeomorphism $T$ from $X$ onto itself, and $\mu$ is an ergodic invariant measure of positive entropy $h_\mu(T) > 0$, then for $\mu$-a.e. $x \in X$ there is a lower bound on the Bowen dimension entropy of the closure of the stable and unstable sets of $x$. Part (2) of Corollary 1.4 gives a slight improvement because it does not require the system to have finite entropy.

This paper is organized as follows. In Section 2 we give the definitions and some basic properties of the measure-theoretic conditional entropy and measure decomposition, and also introduce a conditional version of the Shannon-McMillan-Breiman theorem. In Section 3, we prove Theorem 1.2. In Section 4, we give proofs of Corollary 1.3 and Corollary 1.4.

2. Preliminaries. Let $(X, T)$ be a TDS. A partition of $X$ is a family of pairwise disjoint subsets of $X$ whose union is $X$. Denote by $\mathcal{P}_X$ the collection of all finite Borel partitions of $X$. For any $\alpha, \beta \in \mathcal{P}_X$, $\alpha$ is said to be finer than $\beta$ (write $\beta \preceq \alpha$) if each atom of $\alpha$ is contained in some atom of $\beta$. Given a partition $\alpha$ of $X$ and $x \in X$, denote by $\alpha(x)$ the atom of $\alpha$ containing $x$. If $\{\alpha_i\}_{i \in I}$ is a countable family of finite Borel partitions of $X$, the partition $\alpha = \bigvee_{i \in I} \alpha_i$ is called a measurable partition. For a measurable partition $\alpha$, put $\alpha_0^{n-1} = \bigvee_{i=0}^{n-1} T^{-i}\alpha$. It is known that $M(X)$ and $M(X, T)$ (defined before) are convex, compact metric spaces when endowed with the weak$^*$-topology.
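As a small illustration of the refinement relation and the join of partitions described above, here is a Python sketch on a finite set. Representing partitions as sets of `frozenset` atoms is an assumption made only for this example; the names `join` and `is_finer` are hypothetical.

```python
def join(alpha, beta):
    """Common refinement: all nonempty intersections of an atom of alpha with an atom of beta."""
    return {a & b for a in alpha for b in beta if a & b}

def is_finer(alpha, beta):
    """True if each atom of alpha is contained in some atom of beta."""
    return all(any(a <= b for b in beta) for a in alpha)

# Two partitions of {0, ..., 5}
alpha = {frozenset({0, 1, 2}), frozenset({3, 4, 5})}
beta = {frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})}
gamma = join(alpha, beta)  # atoms: {0,1}, {2}, {3}, {4,5}
```

The join is finer than both of its factors, while neither `alpha` nor `beta` is finer than the other.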
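Let $\mu \in M(X, T)$. Given a $T$-invariant sub-σ-algebra $\mathcal{A}$ of $\mathcal{B}_\mu$ and $\alpha \in \mathcal{P}_X$, the conditional information function of $\alpha$ with respect to $\mathcal{A}$ is defined by
$$I_\mu(\alpha|\mathcal{A})(x) = -\sum_{B \in \alpha} 1_B(x) \log \mathbb{E}_\mu(1_B|\mathcal{A})(x),$$
where $\mathbb{E}_\mu(1_B|\mathcal{A})$ is the conditional expectation of $1_B$ with respect to $\mathcal{A}$. Moreover, define the conditional entropy with respect to $\mathcal{A}$ by
$$h_\mu(T, \alpha|\mathcal{A}) = \lim_{n \to \infty} \frac{1}{n} \int_X I_\mu(\alpha_0^{n-1}|\mathcal{A})\,d\mu, \qquad h_\mu(T|\mathcal{A}) = \sup_{\alpha \in \mathcal{P}_X} h_\mu(T, \alpha|\mathcal{A}).$$
If $\{\alpha_i\}_{i \ge 1}$ is a family of finite Borel partitions with $\alpha_1 \preceq \alpha_2 \preceq \alpha_3 \preceq \cdots$ and $\mathrm{diam}(\alpha_i) \to 0$ as $i \to \infty$, we can compute the conditional entropy by
$$h_\mu(T|\mathcal{A}) = \lim_{i \to \infty} h_\mu(T, \alpha_i|\mathcal{A}).$$

A cover of $X$ is a finite family of subsets of $X$ whose union is $X$. Let $\mathcal{C}_X$ denote the collection of all finite open covers of $X$. For $\mathcal{U}, \mathcal{V} \in \mathcal{C}_X$, we say that $\mathcal{U}$ is finer than $\mathcal{V}$ (write $\mathcal{V} \preceq \mathcal{U}$) if each element of $\mathcal{U}$ is contained in some element of $\mathcal{V}$. Let $\pi : (X, T) \to (Y, S)$ be a factor map between two TDSs and let $\mathcal{U} \in \mathcal{C}_X$. For any $E \subset X$, denote by $N(\mathcal{U}, E)$ the minimal cardinality of any subcover of $\mathcal{U}$ that covers $E$. Let $N(\mathcal{U}|\pi) = \sup_{y \in Y} N(\mathcal{U}, \pi^{-1}(y))$. Clearly, $N(\mathcal{U}|\pi) \le N(\mathcal{U}, X)$. It is not hard to see that $a_n = \log N(\mathcal{U}_0^{n-1}|\pi)$ is a subadditive sequence, so we can define the conditional topological entropy of $\mathcal{U}$ relative to $\pi$ by
$$h_{top}(T, \mathcal{U}|\pi) = \lim_{n \to \infty} \frac{1}{n} \log N(\mathcal{U}_0^{n-1}|\pi).$$
Similarly, we can define the conditional topological entropy relative to $\pi$ by
$$h_{top}(T|\pi) = \sup_{\mathcal{U} \in \mathcal{C}_X} h_{top}(T, \mathcal{U}|\pi).$$
There is a well-known result which characterizes the relation between the measure-theoretic conditional entropy and the conditional topological entropy relative to a factor map (see [6, 15]).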
The support of $\mu \in M(X, T)$ is defined to be the set of all points $x$ in $X$ for which every open neighborhood $U$ of $x$ has positive measure, that is,
$$\mathrm{supp}(\mu) = \{x \in X : \mu(U) > 0 \text{ for every open neighborhood } U \text{ of } x\}.$$
In the following we give the definitions and some properties of measure disintegration and conditional measures (see [7, Section 5]).
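Let $(X, T)$ be a TDS, $\mu \in M(X, T)$ and $\mathcal{B}_\mu$ be the completion of $\mathcal{B}_X$ under the measure $\mu$. Then $(X, \mathcal{B}_\mu, \mu, T)$ is a Lebesgue system. The sets $A \in \mathcal{B}_\mu$ which are unions of atoms of a measurable partition $\alpha$ form a sub-σ-algebra of $\mathcal{B}_\mu$, denoted by $\hat{\alpha}$, or by $\alpha$ if there is no ambiguity. Every sub-σ-algebra of $\mathcal{B}_\mu$ coincides with a σ-algebra constructed in this way (mod $\mu$). Let $\mathcal{A}$ be a sub-σ-algebra of $\mathcal{B}_\mu$ and $\alpha$ be a measurable partition of $X$ with $\hat{\alpha} = \mathcal{A}$ (mod $\mu$). Then $\mu$ can be disintegrated over $\mathcal{A}$ as
$$\mu = \int_X \mu_x^{\mathcal{A}}\,d\mu(x),$$
where $\mu_x^{\mathcal{A}}$ is a Borel probability measure on $X$ and $\mu_x^{\mathcal{A}}(\alpha(x)) = 1$ for $\mu$-almost every $x \in X$. This disintegration is characterized by (2) and (3) below:

(2) for every $f \in L^1(X, \mathcal{B}_X, \mu)$, $f \in L^1(X, \mathcal{B}_X, \mu_x^{\mathcal{A}})$ for $\mu$-a.e. $x \in X$, and the map $x \mapsto \int_X f\,d\mu_x^{\mathcal{A}}$ is in $L^1(X, \mathcal{A}, \mu)$;

(3) for every $f \in L^1(X, \mathcal{B}_X, \mu)$, $\mathbb{E}_\mu(f|\mathcal{A})(x) = \int_X f\,d\mu_x^{\mathcal{A}}$ for $\mu$-a.e. $x \in X$.

Then, for any $f \in L^1(X, \mathcal{B}_X, \mu)$, the following holds:
$$\int_X \Big( \int_X f\,d\mu_x^{\mathcal{A}} \Big)\,d\mu(x) = \int_X f\,d\mu.$$
Hence given any $f \in L^1(X, \mathcal{B}_X, \mu)$, for $\mu$-almost every $x \in X$, one has

To prepare the proof of the main result in the next section we also need the following result, which is a conditional version of the Shannon-McMillan-Breiman theorem. Its proof is completely similar to the proof of the Shannon-McMillan-Breiman theorem (see for example [2, Theorem 4.2], [10] or [18]).

$$\lim_{n \to \infty} \frac{1}{n} I_\mu(\alpha_0^{n-1}|\mathcal{A})(x) = h_\mu(\alpha|\mathcal{A}, x)$$
for $\mu$-a.e. $x \in X$ and in $L^1(\mu)$. Moreover, if $\mu$ is ergodic then $h_\mu(\alpha|\mathcal{A}, x) = h_\mu(T, \alpha|\mathcal{A})$ for $\mu$-a.e. $x \in X$.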
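To make the disintegration concrete, the following Python sketch works on a finite probability space, where the σ-algebra is generated by a finite partition and the conditional measure at $x$ is simply $\mu$ restricted to the atom containing $x$ and normalised. This finite setting, the helper names, and the assumption that every atom has positive mass are illustrative choices, not part of the paper. The sketch also uses the identity $I_\mu(\alpha|\mathcal{A})(x) = -\log \mu_x^{\mathcal{A}}(\alpha(x))$, which holds when conditional expectations are given by integration against the conditional measures.

```python
import math

def atom_of(partition, x):
    """The atom of the partition that contains x."""
    return next(a for a in partition if x in a)

def conditional_measure(mu, partition, x):
    """mu restricted to the atom containing x, normalised (assumes positive mass)."""
    atom = atom_of(partition, x)
    mass = sum(mu[p] for p in atom)
    return {p: (mu[p] / mass if p in atom else 0.0) for p in mu}

def integrate(f, measure):
    """Integral of f against a measure given as a dict point -> weight."""
    return sum(f(p) * w for p, w in measure.items())

def conditional_information(mu, alpha, partition, x):
    """-log of the conditional measure (at x) of the alpha-atom containing x."""
    mu_x = conditional_measure(mu, partition, x)
    return -math.log(sum(mu_x[p] for p in atom_of(alpha, x)))

mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
sigma_partition = [frozenset({0, 1}), frozenset({2, 3})]  # generates the sub-sigma-algebra
alpha = [frozenset({0, 2}), frozenset({1, 3})]

def f(p):
    return p * p

# Integrating the fibre integrals recovers the integral of f against mu.
lhs = sum(mu[x] * integrate(f, conditional_measure(mu, sigma_partition, x)) for x in mu)
rhs = integrate(f, mu)
```

Here `lhs` and `rhs` agree up to floating-point rounding, which is the finite analogue of integrating the fibre integrals against $\mu$.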
3. Proof of main theorem. In this section, we prove Theorem 1.2. The following lemma plays a key role in our proof.
where $\mu = \int_X \mu_x^{\mathcal{A}}\,d\mu(x)$ is the measure disintegration of $\mu$ over $\mathcal{A}$ and $h_\mu(\alpha|\mathcal{A}, x)$ is the function obtained in Theorem 2.1.
Proof. Note that $\lim_{n \to \infty} I_\mu(\alpha_0^{n-1}|\mathcal{A})(x)/n = h_\mu(\alpha|\mathcal{A}, x)$ for $\mu$-a.e. $x \in X$ and the function $h_\mu(\alpha|\mathcal{A}, x)$ is $\mathcal{A}$-measurable. Using (3) for the characteristic function $1_B$, $B \in \alpha_0^{n-1}$, and (4) for $h_\mu(\alpha|\mathcal{A}, x)$, there exists $X_1 \in \mathcal{B}_X$ with $\mu(X_1) = 1$ such that for each $x \in X_1$ one can find $W_x \in \mathcal{B}_X$ with $\mu_x^{\mathcal{A}}(W_x) = 1$, and for $y \in W_x$ we have Moreover, for any $x \in X_1$ and $y \in W_x$ one has This ends the proof of the lemma.
We now prove Theorem 1.2.
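Proof of Theorem 1.2. Let $\{\alpha_i\}_{i=1}^{\infty}$ be a family of finite Borel partitions of $X$ with $\alpha_1 \preceq \alpha_2 \preceq \alpha_3 \preceq \cdots$, $\mathrm{diam}(\alpha_i) \to 0$ as $i \to \infty$ and $\mu(\partial \alpha_i) = 0$ for $i \in \mathbb{N}$. By the Monotone Convergence Theorem it is sufficient to show that the following equation holds for $\mu$-a.e. $x \in X$:
$$h_{\mu_x^{\mathcal{A}}}(T) = \sup_{i \ge 1} h_\mu(\alpha_i|\mathcal{A}, x),$$
where $\mu = \int_X \mu_x^{\mathcal{A}}\,d\mu(x)$ is the measure disintegration of $\mu$ over $\mathcal{A}$ and $h_\mu(\alpha|\mathcal{A}, x)$ is the function obtained in Theorem 2.1.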
First we show that $h_{\mu_x^{\mathcal{A}}}(T) \le \sup_{i \ge 1} h_\mu(\alpha_i|\mathcal{A}, x)$ for $\mu$-a.e. $x \in X$. For any $\epsilon > 0$, since $\mathrm{diam}(\alpha_i) \to 0$, there exists $N = N(\epsilon) \in \mathbb{N}$ such that $\mathrm{diam}(\alpha_i) \le \epsilon$ when $i \ge N$. So, for every $x, y \in X$, we have $(\alpha_i)_0^{n-1}(y) \subset B_n(y, \epsilon)$ for $n \ge 1$ and Thus Therefore by Lemma 3.1 and (6), for $\mu$-a.e. $x \in X$

Next we are going to show that for $\mu$-a.e. $x \in X$ It is sufficient to show the following property (#): for any finite Borel partition $\alpha$ of $X$ satisfying $\mu(\partial \alpha) = 0$ and any $\epsilon > 0$, we can find a $\delta > 0$ and a measurable subset $I$ of $X$ satisfying $\mu(I) > 1 - \epsilon^{1/4}$ such that for $x \in I$ there exists a measurable subset $D$ of $X$ such that ) and $M$ is the number of elements in $\alpha$.
Let $\alpha$ be a finite Borel partition of $X$ with $\mu(\partial \alpha) = 0$ and let $\epsilon > 0$. Let $M$ be the number of elements in $\alpha$. If $M = 1$ then it is clear that property (#) holds for $\alpha$ and $\epsilon$. Now assume that $M \ge 2$. Without loss of generality, we additionally require $\epsilon < \left(\frac{M-1}{2M}\right)^2$. We divide the remaining proof into the following three steps.
Step 1. We find $\delta$ and $I$ satisfying property (#) for $\alpha$ and $\epsilon$.
For $n \in \mathbb{N}$ we define the set $A_n$; we are going to show that the measure of $A_n$ is large enough for all $n \ge L$, for some $L$.
By the Birkhoff ergodic theorem, the averages $\frac{1}{n} \sum_{i=0}^{n-1} 1_{U_\delta(\alpha)}(T^i y)$ converge almost everywhere and in $L^1(\mu)$ as $n \to \infty$ to a $T$-invariant function $f^*$ with $\int f^*(y)\,d\mu(y) = \mu(U_\delta(\alpha)) < \epsilon$, where $1_{U_\delta(\alpha)}$ is the characteristic function of the set $U_\delta(\alpha)$. Thus by Egoroff's Theorem we can find a large natural number $L$ such that By Chebyshev's Inequality we have For $n \ge L$, Therefore we obtain that We have $\mu(Q_n) > 1 - \epsilon^{1/4}$ for $n \ge L$. Obviously, the sets $A_n$ are nested, i.e. $A_{n-1} \subseteq A_n$ for $n \ge 1$. Thus there exists $l_0 > L$ such that for $x \in Q_{l_0}$, for any $l \ge l_0$. By Lemma 3.1 there exists $X_1 \subseteq X$ with $\mu(X_1) = 1$ such that for $x \in X_1$, there for each $y \in W_x$. Let $I = X_1 \cap Q_{l_0}$. Obviously $\mu(I) > 1 - \epsilon^{1/4}$.

Step 2. For $y \in X$ and $n \in \mathbb{N}$, we call the collection $C(n, y) = (\alpha(y), \alpha(T(y)), \cdots, \alpha(T^{n-1}(y)))$ the $(\alpha, n)$-name of $y$. Since each point in one atom $V$ of $\alpha_0^{n-1}$ has the same $(\alpha, n)$-name, we define $C(n, V) := C(n, x)$ for any $x \in V$, which is called the $(\alpha, n)$-name of $V$. We give a metric $d_n^\alpha$ between the $(\alpha, n)$-names of $y$ and $z$ as follows: Now if $z \in B_n(y, \delta)$ then for any $0 \le i \le n - 1$ either $T^i y$ and $T^i z$ belong to the same element of $\alpha$ or $T^i y \in U_\delta(\alpha)$. Hence when $y \in E$, $n \ge l$ and $z \in B_n(y, \delta)$, the $d_n^\alpha$ distance between the $(\alpha, n)$-names of $y$ and $z$ is less than $2\sqrt{\epsilon}$. Furthermore, Bowen's ball $B_n(y, \delta)$ is contained in the set of points $z$ whose $(\alpha, n)$-names are $2\sqrt{\epsilon}$-close to the $(\alpha, n)$-name of $y$. Let $B \in \alpha_0^{n-1}$ and let $L_n(B)$ be the total number of $V \in \alpha_0^{n-1}$ such that $C(n, V)$ is $2\sqrt{\epsilon}$-close to $C(n, B)$. Stirling's formula and a combinatorial argument give the following estimate: for each $n \ge l_2$ for some $l_2 \in \mathbb{N}$, where $M$ is the number of elements in $\alpha$ and (see for example [17, Page 144]). More precisely, we have shown that for any $y \in E$, $n \ge \max\{l, l_2\}$ and $B \in \alpha_0^{n-1}$,
$$B_n(y, \delta) \subseteq \{z \in X : C(n, z) \text{ is } 2\sqrt{\epsilon}\text{-close to } C(n, y)\} = \bigcup \{V \in \alpha_0^{n-1} : C(n, V) \text{ is } 2\sqrt{\epsilon}\text{-close to } C(n, y)\}$$
and
$$\#\{V \in \alpha_0^{n-1} : C(n, V) \text{ is } 2\sqrt{\epsilon}\text{-close to } C(n, B)\} \le \exp\{(\epsilon + \Delta)n\}.$$
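The name-counting idea in Step 2 can be sketched in Python. The explicit formula for $d_n^\alpha$ is not reproduced above, so the sketch assumes the normalised Hamming distance (the fraction of times at which two names disagree), which is the usual choice in arguments of this type. The doubling map, the two-set partition and all function names are hypothetical examples, and the count below uses distance at most $r$ for simplicity.

```python
from fractions import Fraction
from math import comb

def doubling(x):
    """Hypothetical example map T(x) = 2x mod 1."""
    return (2 * x) % 1

def label(x):
    """Index of the atom of the partition {[0, 1/2), [1/2, 1)} containing x."""
    return 0 if x < Fraction(1, 2) else 1

def name(y, n, T=doubling, atom=label):
    """(alpha, n)-name of y: the atoms visited by y, T y, ..., T^(n-1) y."""
    out = []
    for _ in range(n):
        out.append(atom(y))
        y = T(y)
    return tuple(out)

def hamming(c1, c2):
    """Assumed metric: fraction of coordinates where two names of equal length differ."""
    return Fraction(sum(a != b for a, b in zip(c1, c2)), len(c1))

def names_within(n, M, r):
    """Number of length-n words over M symbols at normalised Hamming distance <= r
    from a fixed word: sum over j <= r*n of C(n, j) * (M - 1)^j."""
    k = int(r * n)  # floor, since r * n >= 0
    return sum(comb(n, j) * (M - 1) ** j for j in range(k + 1))
```

For small $r$ the count returned by `names_within` grows at a small exponential rate in $n$, which is the kind of bound that Stirling's formula provides in the text.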
Step 3. We now find a suitable subset $D$ of $E$ satisfying property (8).

XIAOMIN ZHOU
On the other hand, by the definition of $E_n$, there exists $V \in F_n$ whose $(\alpha, n)$-name is $2\sqrt{\epsilon}$-close to the $(\alpha, n)$-name of $\alpha_0^{n-1}(y)$. Summing up, we have shown that where $G_n$ is the set of all elements $B \in \alpha_0^{n-1}$ whose $(\alpha, n)$-name is $2\sqrt{\epsilon}$-close to the $(\alpha, n)$-name of $V$ for some $V \in F_n$. Thus by (11), $\# G_n \le \exp\{(\Delta + \epsilon)n\} \cdot \# F_n$. Moreover

Let $y \in D$ and $n \ge l_3$ be given. Since $y \in E \setminus E_n$, it is clear that for each $V \in \alpha_0^{n-1}$ whose $(\alpha, n)$-name is $2\sqrt{\epsilon}$-close to the $(\alpha, n)$-name of $y$, one has $\mu_x^{\mathcal{A}}(V) \le \exp\{(-h_\mu(\alpha|\mathcal{A}, x) + 2(\Delta + \epsilon))n\}$. Moreover, using (10) and (11), we have Thus for any $y \in D$ and $n \ge l_3$, Summing up, we obtain property (#). This finishes the proof of the theorem.

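4. Applications of Theorem 1.2. In this section, we prove Corollary 1.3 and Corollary 1.4 using the formula of conditional entropy given by Theorem 1.2. To prepare the proofs we state some notions and useful lemmas. Let $(X, T)$ be a TDS. Recall that $d_n$ and $B_n(x, \epsilon)$ are defined in the Introduction. The Bowen topological entropy of subsets was first introduced by Bowen [3] in a way resembling Hausdorff dimension and can be defined as follows. For $Z \subseteq X$, $s \ge 0$, $N \in \mathbb{N}$ and $\epsilon > 0$, define
$$M^s_{N,\epsilon}(Z) = \inf \sum_i \exp(-s n_i),$$
where the infimum is taken over all finite or countable families $\{B_{n_i}(x_i, \epsilon)\}$ such that $x_i \in X$, $n_i \ge N$ and $\bigcup_i B_{n_i}(x_i, \epsilon) \supseteq Z$. The quantity $M^s_{N,\epsilon}(Z)$ does not decrease as $N$ increases and $\epsilon$ decreases, hence the following limits exist:
$$M^s_\epsilon(Z) = \lim_{N \to \infty} M^s_{N,\epsilon}(Z), \qquad M^s(Z) = \lim_{\epsilon \to 0} M^s_\epsilon(Z).$$
The Bowen topological entropy $h^B_{top}(T, Z)$ is the critical value of the parameter $s$ at which $M^s(Z)$ jumps from $\infty$ to $0$, i.e.
$$h^B_{top}(T, Z) = \inf\{s : M^s(Z) = 0\} = \sup\{s : M^s(Z) = \infty\}.$$
The following results are elementary (see for example [3, Propositions 1 and 2]).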
3. In particular, $h^B_{top}(T, X) = h_{top}(T)$.

Feng and Huang [9] established the variational principles for Bowen topological entropy of subsets. We restate their result as follows.
Proof of Corollary 1.4. Let $\mu \in M^e(X, T)$ and $\mathcal{B}_\mu$ be the completion of $\mathcal{B}_X$ under $\mu$. If $\mu = \int_X \mu_x^{P}\,d\mu(x)$ is the disintegration of $\mu$ over $P_\mu(T)$, then applying Theorem 1.2 we have $h_\mu(T|P_\mu(T)) = \int_X h_{\mu_x^{P}}(T)\,d\mu(x)$. Suppose that $h_\mu(T) = 0$; then for $\mu$-a.e. $x \in X$,
$$h_{\mu_x^{P}}(T) = 0 \le h^B_{top}\big(T, \overline{W^s(x, T)} \cap \overline{W^u(x, T)}\big).$$
Hence in the following we assume that $h_\mu(T) > 0$. By Lemma 4.2, for $\mu$-a.e. $x \in X$, $\mathrm{supp}(\mu_x^{P}) \subseteq \overline{W^s(x, T)} \cap \overline{W^u(x, T)}$. Applying Theorem 4.1 and Proposition 1, we have Hence we obtain Therefore by the variational principle of entropy, This completes the proof.