STOCHASTIC DYNAMICS: MARKOV CHAINS AND RANDOM TRANSFORMATIONS

This article outlines an attempt to lay the groundwork for understanding stochastic dynamical descriptions of biological processes in terms of a discrete-state-space, discrete-time random dynamical system (RDS), or random transformation, approach. Such mathematics is not new for continuous systems, but the discrete-state-space formulation significantly reduces the technical requirements for its introduction to a much broader audience. In particular, we establish some elementary contradistinctions between Markov chain (MC) and RDS descriptions of a stochastic dynamics. It is shown that a given MC is compatible with many possible RDS, and we study in particular the corresponding RDS with maximum metric entropy. Specifically, we show an emergent behavior of an MC with a unique absorbing and aperiodic communicating class, after all the trajectories of the RDS synchronize. In biological modeling, it is now widely acknowledged that stochastic dynamics is a more complete description of biological reality than deterministic equations; here we further suggest that the RDS description could be a more refined description of stochastic dynamics than a Markov process. Possible applications of discrete-state RDS are systems with a fluctuating law of motion, or environment, rather than inherent stochastic movements of individuals.


1.
Introduction. Mathematical theories of biological dynamics are usually divided into "deterministic" and "stochastic" models. Hodgkin-Huxley's theory of membrane action potential belongs to the former; the Wright-Fisher model for random mating belongs to the latter [33,1,11]. Characterizing the solution of a dynamic model with given initial conditions, either analytically or computationally, is one of the main tasks in mathematical biology.
The modern theory of nonlinear dynamical systems [45,23], however, articulates a qualitative, yet global understanding of dynamics by "looking at the evolution of the whole phase space of initial conditions" [22]. Dynamical systems with discrete or continuous time are usually referred to as iterative maps or flows, respectively [15]. Phase portrait and local linear stability analysis of ordinary differential equations (ODEs) are now routine tools for analyzing biological dynamics. Important concepts emerging from this type of study are the notions of attractors, phase portraits, local and global vector field bifurcations, topological equivalence and canonical forms, and Lyapunov exponents, among others [33,11]. One is interested in the simultaneous trajectories from different initial conditions. Most stochastic models in biology are based on the theory of stochastic processes. A large class of biological models is based on discrete-state, discrete-time Markov chains (MC), with either finite or countable state space [1]. The stochastic counterpart of dynamical systems theory is known as random transformation [21], or discrete-time random dynamical system (RDS) [2]. It has a mathematical setup that is rather different from the theory of Markov processes, and the literature is somewhat inaccessible to a nonspecialist. The RDS approach has found interesting applications in recent years in studies of synchronization of neural firing [26], a concept inherited from the theory of non-autonomous ODEs; it will have wide applications in studying cellular automata [51] within a fluctuating environment, as well as many other complex dynamics, such as in finance [46]. With a fixed environment, a cellular automaton is a discrete-state, discrete-time deterministic dynamical system in which an unambiguously defined response follows an unambiguous stimulus [49].
The goal of the present paper is to initiate an applied mathematical study of RDS with discrete state space. In particular, we would like to establish the relation, as intuitively as possible, between the theory of MC and the theory of random transformations. Much of the material presented here is contained in highly abstract mathematical theorems. We believe, however, that reducing the disciplinary barriers and developing a unified applied mathematics of stochastic dynamics in a discrete setting is critical for wider applications of this new frontier of dynamic modeling in biology.
The paper is organized as follows. In Sec. 2, we establish the connection between the discrete i.i.d. RDS and the finite-state Markov chain. For a given finite i.i.d. RDS, the transition probability on the state space can be defined, and it induces a finite-state Markov chain. Conversely, for a given MC, it is shown that there exists a representation by means of i.i.d. random transformations under some weak conditions [7]. For a finite state space, this is equivalent to decomposing the MC transition matrix into a convex combination of deterministic transition matrices, with each coefficient being the probability of the corresponding deterministic transformation in the RDS. This RDS representation of an MC, however, is not unique. Different RDS representations of the same MC yield different behaviors, such as synchronization among trajectories with different initial states. From the viewpoint of random dynamical systems, we can also study some properties of a given MC, such as invariant measure and reversibility.
In Sec. 3, we introduce the metric entropy of an RDS, and use it to characterize the different RDS representations of an MC. The upper and lower bounds of the metric entropy associated with an MC are analyzed. In fact, the metric entropy of an RDS is no less than the Shannon-Khinchin entropy [20] of the induced MC [21]. We discover that when equality holds, the transformations with positive probability have no "common dynamics". The upper bound of the metric entropy among all the RDS representations, on the other hand, can be obtained explicitly; we call it the maximum-metric-entropy representation. In this RDS, movements from distinct current positions are statistically independent.
In Sec. 4, we introduce the notion of synchronization in an RDS. It is shown that the necessary and sufficient condition for the maximum-metric-entropy RDS representation to synchronize is that the Markov chain has a unique absorbing and aperiodic communicating class. This concept turns out to be intimately related to Doeblin's coupling in the theory of Markov processes. We also present some numerical results that characterize the synchronization phenomenon in different RDS representations of an MC. In Sec. 5, we explore the relation between RDS and another body of literature: coupled stochastic processes. We show that the latter can be viewed as a generalization of the RDS to random Markov systems. The mathematical model for biological motor proteins in terms of a coupled (or switching) diffusion process, known as the Brownian ratchet, is in fact one such example [38]. This subject is too broad to be discussed comprehensively; we shall illustrate one issue: reversibility and the Kolmogorov cycle condition. The paper concludes with Sec. 6, which contains a summary as well as some suggestions for future work.
Some of the mathematical materials can be found in the Appendices.
2. Discrete i.i.d. Random dynamical systems. The theory of random dynamical systems (RDS) studies the action of random maps, drawn from a collection with prescribed probability, on a state space. Heuristically, the difference between an MC and an RDS is that the randomness in the former arises in a particular dynamics, while in the latter it is embedded in the "law of motion". In this section, we present a brief overview of some essential concepts of RDS that are particularly relevant; we refer the reader to [21] for more mathematical discussion. The presentation is pedagogically self-contained.
2.1. RDS with discrete state space. We shall adopt the notations from [5], and focus on the case that the state space is finite. An i.i.d random dynamical system is described by a triplet (S, Γ, Q), where S is the state space, Γ is a family of maps from S into itself and Q is the probability measure on the σ-field of Γ, denoted as F. So (Γ, F, Q) forms a probability space. The set Γ is interpreted as the set of all admissible laws of dynamics.
In what follows, while all the mathematical definitions are general enough for a continuous state space S, we will give explicit examples in terms of a finite state space S = {s_1, s_2, ..., s_n}. Any α ∈ Γ is called a deterministic transformation on S. Each such transformation has an n × n matrix representation, called a deterministic transition matrix. A deterministic transition matrix is a binary matrix with all entries from {0, 1}, having exactly one entry 1 in each row and 0's otherwise. Therefore, the cardinality |Γ| ≤ n^n. For convenience, later in this paper, n^n is denoted by N. Among all deterministic transition matrices, there are n! permutation matrices, which correspond to all invertible mappings of the finite state space S to itself; all other N − n! matrices correspond to non-invertible maps, which necessarily have at least one column of 0's.
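For concreteness, a deterministic transformation and its matrix representation can be encoded in a few lines. The following sketch is ours (the function name is not from the text); states are indexed 0, ..., n−1:

```python
import numpy as np

def det_matrix(alpha):
    """Deterministic transition matrix of a map alpha on states 0..n-1:
    row i has a single 1 in column alpha[i], and 0's elsewhere."""
    n = len(alpha)
    P = np.zeros((n, n), dtype=int)
    P[np.arange(n), alpha] = 1
    return P

# The non-invertible map 0 -> 2, 1 -> 2, 2 -> 0 on S = {0, 1, 2}:
P = det_matrix([2, 2, 0])
```

Each row of such a matrix sums to 1; a permutation (invertible map) yields a permutation matrix, while a non-invertible map, like the one above, leaves at least one column all zero.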

Eigenvalues and singular values of a deterministic transition matrix.
If a deterministic transition matrix A is invertible, and we denote its transpose by A^T, then A A^T is the identity matrix. In fact, A^T = A^{-1} is the inverse of the corresponding one-to-one transformation. Therefore, an invertible deterministic transition matrix has all its eigenvalues on the unit circle and all singular values equal to 1. If A is a non-invertible deterministic transition matrix, then its eigenvalues are either on the unit circle or zero, and it necessarily has at least one column of 0's. As an illustration, consider two 4 × 4 deterministic transition matrices A_1 and A_2. State 3 in A_2 is a transient state; in fact, state 4 is also transient. The deterministic dynamics represented by A_2 has a global "attractor" with a single state 1; thus its zero eigenvalue has a multiplicity of 3. In contrast, the deterministic transformation represented by A_1 is 2 → 3 → 4 → 1 → 3; its global "attractor" is a 3-state cycle. This is the dynamical reason why the zero eigenvalue of A_1 has multiplicity only 1. One can rearrange the rows and columns of A_1 to make it block upper triangular, after which the above statement is clear.
The stochastic dynamics of an i.i.d. RDS goes as follows [5]: initially, the system is in some state x_0 in S; a map α_1 in Γ is chosen according to the probability measure Q, and the system moves to the state x_1 = α_1(x_0) in step 1. Then, independently of previous maps, another map α_2 is chosen according to the probability measure Q, and the system moves to the state x_2 = α_2(x_1). The procedure repeats. The initial state x_0 can be a fixed state or an S-valued random variable X_0 independent of all maps α_n. The random variable X_n is constructed by composition of independent random maps, X_n = α_n ∘ α_{n-1} ∘ ... ∘ α_1(X_0). Clearly, X_n is a Markov chain (MC), with transition probability P(x, G) = Q{α : α(x) ∈ G} for any x ∈ S and any measurable set G ∈ B(S).
In terms of the n × n deterministic transition matrix P α corresponding to α ∈ Γ, the transition probability matrix for the MC can be simply expressed as M = E Q P α , where the expectation is taken with respect to probability measure Q.
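The relation M = E_Q P_α can be made concrete as a weighted sum of deterministic transition matrices. A minimal sketch (names ours), assuming maps are given as index lists on states 0, ..., n−1:

```python
import numpy as np

def induced_markov_matrix(maps, weights):
    """Transition matrix M = E_Q[P_alpha] of the Markov chain induced
    by an i.i.d. RDS: the Q-weighted average of the deterministic
    transition matrices P_alpha."""
    n = len(maps[0])
    M = np.zeros((n, n))
    for alpha, q in zip(maps, weights):
        P = np.zeros((n, n))
        P[np.arange(n), alpha] = 1   # deterministic transition matrix of alpha
        M += q * P
    return M

# Two maps on S = {0, 1}: the identity and the swap, each with probability 0.5
M = induced_markov_matrix([[0, 1], [1, 0]], [0.5, 0.5])
```

Here the induced chain jumps to either state with probability 1/2 from anywhere, even though each individual map is deterministic.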
From a stochastic dynamics perspective, therefore, there is no difference between an i.i.d. RDS and its induced MC if only a single sample trajectory is simulated. In other words, the one-point motion of this RDS is the Markov chain with transition probability P(x, G). The difference between them comes out when one is interested in two simultaneous sample trajectories with different initial states (but the same {α_n}), since the two sequences {X_n} in the RDS are not independent. This is sometimes called "identical noise realizations" [8] or "two-point motion" [4]. The difference between the two theories has been described as "two cultures" in [3].
For a given RDS, Eq. 2 uniquely defines an induced Markov chain. In the present work, we are interested in the reverse question: Can, and how can, a Markov chain be represented by compositions of i.i.d. random transformations? In the world of stochastic modeling, this provides a more refined stochastic description of dynamics that is consistent with a Markov model.
In more precise mathematical terms: Given the transition probability P(x, ·), x ∈ S, does there exist a probability measure on Γ such that Q{α : α(x) ∈ G} = P(x, G), for all x ∈ S and any measurable set G ∈ B(S)? A proof of the "can" for general continuous transformations is given in [7] and [21]. In the finite-state situation, it follows from Theorem 2.1, which is an analog of the Birkhoff-von Neumann theorem in the theory of doubly stochastic matrices [6,31].
Theorem 2.1. The set of n × n Markov transition matrices, Ω n , forms a convex polyhedron with deterministic transition matrices as its vertices.
Proof. See Appendix A.
In plainer words, if a matrix M ∈ Ω_n, then M = Σ_{γ=1}^{N} a_γ P_γ, (3) where the P_γ are deterministic transition matrices and the a_γ are nonnegative numbers, called weights, satisfying Σ_{γ=1}^{N} a_γ = 1. This implies that there always exists at least one RDS representation for any finite Markov chain, with probability mass function {a_γ}_{γ=1}^{N} on the set Γ. The proof is based on a min-max algorithm. Here is a sketch of the algorithm: at each step, the weight a is the minimum over the rows of the maximum entry in each row of M, and the corresponding deterministic matrix P has its 1's at the positions of those row maxima; then replace M by M − aP, and iterate until M becomes a zero matrix.
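The min-max algorithm sketched above can be implemented directly. The following is our own sketch of it (not the paper's code), returning the weights a_γ and the maps γ as index arrays:

```python
import numpy as np

def minmax_decompose(M, tol=1e-12):
    """Min-max algorithm: peel off deterministic transition matrices from a
    Markov matrix M until the remainder vanishes. Returns lists of weights
    a_gamma and of maps gamma (each map given as an array of column indices)."""
    M = np.array(M, dtype=float)
    n = M.shape[0]
    weights, maps = [], []
    while M.max() > tol:
        cols = M.argmax(axis=1)           # column of the max entry in each row
        a = M[np.arange(n), cols].min()   # minimum of the row maxima
        weights.append(a)
        maps.append(cols)
        M[np.arange(n), cols] -= a        # M <- M - a * P
    return weights, maps

w, g = minmax_decompose([[0.7, 0.3], [0.4, 0.6]])
```

For this 2 × 2 example the algorithm terminates after three steps, consistent with the bound κ*(M) ≤ n² − n + 1 = 3 discussed below; the weighted sum of the peeled-off deterministic matrices reproduces M.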
Such a representation in general is not unique. This gives rise to the question of "how": which representation is reasonable under some prior information or requirement?
Here is an example that illustrates the existence and proves non-uniqueness.
This gives rise to the question of how many ways a given Markov transition probability matrix can be expressed in the form (3). In fact, one can ask the following combinatorial question: among the representations of a Markov transition matrix in the form (3), what is the least possible number of deterministic transition matrices, κ*(M)? Some upper bounds for κ*(M) can be obtained. Unlike a stochastic differential equation, which has a well-defined deterministic counterpart, a Markov chain has never had an unambiguous deterministic reference. The "least", therefore, could be interpreted as the "closest to a deterministic dynamics". The following theorem gives a loose estimate.
Theorem 2.2. For any M ∈ Ω_n, κ*(M) ≤ n² − n + 1.
Figure 1. Map diagram illustrating the transitions between maps, with their probabilities.
Proof. There are n linear conditions on the row sums of an n × n Markov transition matrix. Therefore, dim Ω_n = n² − n. Carathéodory's theorem shows that every matrix M ∈ Ω_n is in the convex hull of n² − n + 1 deterministic transition matrices. Hence κ*(M) ≤ n² − n + 1.
Notice that in Example 1 this upper bound is attained. For irreducible periodic Markov transition matrices, the upper bound can be further improved by using the period of the matrix [29].
The sequence of maps in the RDS can be generated by processes other than i.i.d. For example, another common choice is to generate the random transformations via a Markov process [35,12]: the probability of which map is chosen at each step depends on the previous map. Such an RDS also induces a stochastic process on the state space S, but its Markovian property is lost in general. Here is an example in which, no matter how many steps of memory one keeps, the induced process is not a Markov chain.

Example 2. Consider a set of four deterministic transition matrices A, B, C, D.
Define the transition probabilities between the different maps as illustrated in Figure 1. Choose the initial map to be A or C with equal probability (0.5). Possible map sequences are then AAABAABA... or CDCCDCCC.... Consider the RDS trajectory starting from the state X_0 = 1; the corresponding state sequences are 111131131... and 112112111.... Notice that 2 and 3 cannot appear in the same sequence. Assume the process is a Markov chain with memory of the previous n digits; comparing the conditional probabilities of the next state, given the same length-n history, in the two sequences yields a contradiction. Thus the stochastic process {X_n} is not a Markov chain with any finite length of memory.
2.4. Subshift of finite type. In an i.i.d. RDS, each map is chosen independently of the previously chosen maps. The concept of independence in probability has to be defined on a product space. In our case, this product space is the infinite product of copies of Γ, Ω_1 = Γ^{Z+}, the set of all possible one-sided infinite sequences of deterministic transformations [21].
A shift map θ acting on each sequence shifts all the symbols to the left, i.e., θ(α_0 α_1 α_2 ... α_k ...) = α_1 α_2 ... α_k .... At each step, the first transformation in the one-sided sequence is the one applied to the state space. The sequence space together with the shift map, (Ω_1, θ), is called a subshift of finite type (SFT). Very insightfully, in terms of the SFT one can show that the shift map of an RDS is a deterministic dynamics with chaos, while in the state space the dynamics is nevertheless stochastic [27]. The probability measure P_1 on the product space Ω_1 is defined on cylinder sets, P_1{ω : α_0 = β_0, α_1 = β_1, ..., α_k = β_k} = Q(β_0) Q(β_1) ... Q(β_k). Clearly, maps at different steps are chosen independently with the same distribution Q.
The concept of an SFT can also be defined for an MC. It is interesting to discuss the relationship between the SFT of an i.i.d. RDS and the SFT of its induced MC. Recall that the induced Markov chain is the one-point motion of the i.i.d. RDS, with a finite state space of n alphabets, S, a transition probability matrix M = (p_xy | x, y ∈ S), p_xy = Q{α : α(x) = y}, and an initial distribution {p_z, z ∈ S}. The sequence space Ω_2 for the MC is the set of all possible one-sided infinite sequences of states, Ω_2 = S^{Z+}. Again, the shift map ν is ν(x_0 x_1 x_2 ...) = (x_1 x_2 x_3 ...), and the probability measure on a cylinder set is P_2{ω : x_0 = i_0, x_1 = i_1, ..., x_k = i_k} = p_{i_0} p_{i_0 i_1} ... p_{i_{k-1} i_k}. In the study of dynamical systems, especially in ergodic theory, a measure-preserving dynamical system is an important concept. For the shift map, it is defined as follows: ν is measure preserving if P_2(ν^{-1}(A)) = P_2(A) for all cylinder sets A. Equivalently, we say that the measure P_2 is invariant under the shift map ν when ν is measure preserving [23]. One can show that P_2 is invariant if and only if the initial distribution p_z is the invariant density π of the MC: π_y = Σ_x π_x p_xy. In an i.i.d. RDS, θ is measure-preserving.
Figure 2. A one-sided sequence of transformations α applied to three different initial conditions s_1, s_2, s_3 induces three one-sided sequences of states a, b, c; a different sequence β can induce exactly the same state sequences. The shift map θ in the product space Ω_1 maps α to γ; it induces the shift map ν in the product space Ω_2, mapping c to d.
Each element ω_1 = α_0 α_1 α_2 ... α_k ... in the product space Ω_1 is a sequence of deterministic transformations. If this sequence is applied to n different initial conditions, i.e., s_1, s_2, ..., s_n, it produces n different sequences in the product space Ω_2. Meanwhile, multiple elements of Ω_1 applied to the same initial condition may produce exactly the same sequence of states. This implies that knowing one sample trajectory of the MC may not be enough to fully determine which transformation was picked in the i.i.d. RDS view. We use "may" because in some special situations this can actually be uniquely determined; we shall call such RDS's having no common dynamics. This will be explored further in Sec. 3.3.
Questions concerning two simultaneous trajectories, e.g., the joint probability of two sequences of states ω ∈ Ω_2 and ω′ ∈ Ω_2, cannot be asked; they are not defined in this model. Such probabilities have to be defined on the new product space Ω_2 ⊗ Ω_2. We discuss this further in Sec. 4.2, in the context of synchronization.
3. The metric entropy. The concept of metric entropy was first introduced by Kolmogorov and further improved by Sinai [50,34,10]. It has been very successfully used in solving the isomorphism, or conjugacy, of dynamical systems. The metric entropy measures the maximal rate of information production a system is capable of generating [19]. It is a well-developed subject but can be technical in nature.
In the present work, we only give some very intuitive notions, and then focus on several relevant results for the metric entropy. Some mathematical derivations, in a heuristic fashion, can be found in Appendix B.
3.1. Metric and topological entropies of Markov chain. The notions of metric and topological entropies are very much motivated by the Gibbs' and Shannon's entropy, which is based on the large deviation of probability [48], and Boltzmann's entropy, which is based on simple counting [13].
In this section, we focus only on irreducible MCs, which have a unique invariant density π. The irreducibility of an MC is equivalent to the dynamics being ergodic. Aperiodicity is not required for the entropy theory of the MC.
The metric entropy h of a Markov chain with transition probability p_ij, i, j ∈ S, is the asymptotic exponent of the vanishing probability of a single stochastic trajectory i_0 i_1 i_2 ... i_ℓ with increasing ℓ, behaving as e^{−hℓ} [20]:

Pr{i_0 i_1 ... i_ℓ} = p_{i_0 i_1} p_{i_1 i_2} ... p_{i_{ℓ−1} i_ℓ} ≍ e^{−hℓ},   (12)

and

h = − lim_{ℓ→∞} (1/ℓ) Σ_{k=0}^{ℓ−1} ln p_{i_k i_{k+1}} = − Σ_{i,j∈S} π_i p_ij ln p_ij.   (13)

Here we stipulate that 0 ln 0 = 0. The last step, replacing the time average by the expectation under the invariant distribution, is based on the ergodicity of the MC: as ℓ → ∞, the frequency of the state i in the sequence goes to π_i, and the frequency of the pair ij goes to π_i p_ij.
Note that for a deterministic transformation, with probability 1 for each transition i_k → i_{k+1}, the probability in (12) is a constant. Therefore, its metric entropy is h = 0.
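The formula (13) is straightforward to evaluate numerically. Below is a minimal sketch (function name ours), obtaining π as the left eigenvector of M for the eigenvalue 1:

```python
import numpy as np

def metric_entropy(M):
    """h = -sum_i pi_i sum_j p_ij ln p_ij for an irreducible Markov matrix M.
    The stationary distribution pi is the normalized left eigenvector of M
    belonging to the largest (Perron) eigenvalue, which is 1."""
    M = np.asarray(M, dtype=float)
    vals, vecs = np.linalg.eig(M.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = pi / pi.sum()
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = np.where(M > 0, M * np.log(M), 0.0)   # 0 ln 0 = 0 convention
    return -np.sum(pi[:, None] * terms)

# For p_ij = 1/n the entropy attains ln n:
h = metric_entropy(np.full((3, 3), 1/3))
```

A deterministic transformation, e.g. the 2-cycle with matrix ((0,1),(1,0)), indeed gives h = 0.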
The topological entropy η of a Markov chain characterizes what is possible and what is not; it is independent of the actual values of the transition probabilities. Two Markov chains with transition probabilities p_ij and q_ij, i, j ∈ S, have the same topological entropy when p_ij = 0 if and only if q_ij = 0. They induce the same SFT, and the topological entropy η simply counts the number of possible trajectories generated by the Markov process. Consider the n × n irreducible binary matrix A, n = |S|, satisfying A_ij = 1 when p_ij > 0 and A_ij = 0 when p_ij = 0. We call it the "topological skeletal matrix" of this MC. The number of possible trajectories of length ℓ increases asymptotically as

λ_A^ℓ,   (14)

in which λ_A is the largest positive eigenvalue of the matrix A. The topological entropy η, following Boltzmann's notion of entropy, is η = ln λ_A. It is easy to show that η ≥ h. For a Markov chain with all p_ij = 1/n, the equality η = h = ln n is attained.
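The count in (14) can be checked numerically: η depends only on the skeletal matrix A. A short sketch of ours:

```python
import numpy as np

def topological_entropy(M):
    """eta = ln(lambda_A), where lambda_A is the largest eigenvalue of the
    topological skeletal matrix A, with A_ij = 1 iff p_ij > 0."""
    A = (np.asarray(M) > 0).astype(float)
    lam = max(np.real(np.linalg.eigvals(A)))
    return np.log(lam)

# All p_ij > 0 on 3 states: every length-l word is admissible, eta = ln 3
eta = topological_entropy(np.full((3, 3), 1/3))
```

Note that any 2 × 2 chain with strictly positive entries has the same skeleton A = ((1,1),(1,1)) and hence the same η = ln 2, whatever its transition probabilities, illustrating that η ignores the actual probability values.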
The definition of topological entropy according to (14) also clearly indicates that η is the same for a stochastic process with "increasing number of distinct trajectories" and an endomorphism with "increasing number of distinct pre-image" [41].
For an MC, the topological entropy η is intimately related to how many connected neighbours a state has, or the "dimensions" in a lattice system [27]. For an MC with exactly ν non-zero elements in each and every row of its transition matrix, ν < n, the largest eigenvalue of its topological skeletal matrix A is λ_A = ν; the corresponding right eigenvector is clearly (1, 1, ..., 1). Thus η = ln ν is determined by the "topology" of the MC, and it is also the metric entropy of the Markov chain with p_ij = A_ij/ν, since every row of that chain has entropy ln ν. We then have the following result, which could be very useful for sparse Markov networks:

h ≤ Σ_{i∈S} π_i ln ν_i,

where ν_i is the cardinality of {j : p_ij > 0} and {π_i} is the stationary distribution. In particular, if the MC has the maximal number of neighboring states being ν, then h ≤ ln ν.
Proof. Consider all the possible Markov chains with state space S which have the same "topological skeletal matrix" A. For each i, the distribution p_i· has entropy no greater than that of the uniform distribution A_ij/ν_i, whose entropy is simply ln ν_i. The metric entropy of the Markov chain is therefore h = − Σ_i π_i Σ_j p_ij ln p_ij ≤ Σ_i π_i ln ν_i [27].

3.2. Metric entropy of an i.i.d. RDS. In an i.i.d. RDS, each map is chosen independently of the previously chosen maps. Moreover, a deterministic transformation on a finite state space has zero metric entropy once it is chosen. Therefore, the randomness of this system is generated solely by the i.i.d. process.
The metric entropy of this RDS is given by the entropy rate of the i.i.d. process of maps: h_RDS = − Σ_{γ∈Γ} a_γ ln a_γ, where a_γ = Q(γ).
It is worth mentioning that the metric entropy of an RDS can in many cases be infinite. This is mainly because, when there are countably infinitely many transformations in Γ, the infinite sum in the entropy may not converge. A different notion of entropy of an i.i.d. RDS that remedies this difficulty is defined as the weighted mean of the metric entropies of all the deterministic transformations, with their probability masses as weights [21]. We need not be concerned with this, since our S is always finite.
We are now in a position to address the question of which RDS representation of a given MC is reasonable according to a certain requirement. In fact, the metric entropy of the corresponding RDS is a good characterization of the different representations. It is natural to ask for its lower and upper bounds for a given MC.

3.3.
Lower bound of metric entropy. Kifer noted this question and gave the result that h_RDS ≥ h_MC [21]. In a finite i.i.d. RDS, different sequences of deterministic transformations applied to the same initial condition might induce the same MC trajectory, as discussed in Sec. 2.4. That means more information is required to determine which deterministic transformation is chosen at each step. In other words, the RDS generates more information than the MC at each step. With this intuition, Kifer's result becomes clear; a general proof can be found in his book [21]. We provide an elementary proof here for the finite i.i.d. RDS and illustrate the condition under which equality is attained.
Denote a deterministic transition matrix P ∈ Γ by P_{t_1,t_2,...,t_n} if P_{i,t_i} = 1, and denote its corresponding probability by a_{t_1,t_2,...,t_n}. For example, P_{1,2,1} represents the 3 × 3 matrix with rows e_1, e_2, e_1, i.e., the transformation 1 → 1, 2 → 2, 3 → 1. Two deterministic transformations are said to have common dynamics if they agree on at least one state. In terms of the deterministic transition matrices P_{i_1,i_2,...,i_n} and Q_{j_1,j_2,...,j_n}, this is equivalent to P − Q having at least one row of all zeroes; in our notation, ∃k such that i_k = j_k. If two deterministic transformations have no common dynamics, then P − Q has no all-zero rows. This definition can be extended to multiple deterministic transformations: if no pair of them has common dynamics, we say that all these deterministic transformations have no common dynamics. When the equality is attained and a trajectory in the state space S is simulated with such an RDS, the deterministic transformation is uniquely identified at each step. The following corollaries say more about the RDS and its induced Markov chain when the equality is attained. For a given Markov transition probability matrix, it is not necessarily true that there exists an RDS representation whose metric entropy equals that of the MC; in fact, for most Markov transition matrices it is not attainable, and when it exists, such an RDS need not be unique. Nevertheless, it provides a possible lower bound on the metric entropy. To obtain an attainable lower bound, one usually has to solve a non-convex problem, and no satisfactory solution is available.
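The no-common-dynamics condition is easy to test on the index representation of maps. A small sketch of ours, with transformations written as tuples (t_1, ..., t_n) of images:

```python
def have_common_dynamics(alpha, beta):
    """Two deterministic transformations (tuples of images t_1..t_n) have
    common dynamics iff they agree on at least one state, i.e. there
    exists k with i_k = j_k; equivalently P - Q has an all-zero row."""
    return any(a == b for a, b in zip(alpha, beta))

# On S = {1, 2, 3}: (1,2,1) and (2,3,2) disagree everywhere,
# so they have no common dynamics; (1,2,1) and (1,3,2) agree at state 1.
```

Checking a whole family for no common dynamics then amounts to testing all pairs.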

3.4. Upper bound of metric entropy.
It is natural to ask for the upper bound of the metric entropy over all RDS representations of a given finite MC. It turns out that an analytic expression for the maximizing decomposition can be found: the weights a_{i_1,i_2,...,i_n} = Π_k M_{k i_k} maximize the metric entropy h_RDS. The Gibbs inequality is applicable since both a_{i_1,i_2,...,i_n} and Π_k M_{k i_k} are probability mass functions; the equality holds if and only if the two functions are identical, i.e., a_{i_1,i_2,...,i_n} = Π_k M_{k i_k}.
The example in (6) is the maximum-metric-entropy representation for the given Markov transition matrix M. It is easy to see that if all entries of the transition matrix M are positive, then all n^n deterministic matrices have positive probability. Furthermore, the most probable deterministic transition matrix corresponds to the deterministic transformation that maps each state to its most probable successor, i.e., its matrix has the entry 1 at the position of the maximum in each row of the transition matrix M. This is a very insightful result, since this is the most reasonable "deterministic counterpart" of a given MC with transition probability matrix M.
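The product formula for the maximum-metric-entropy weights is easy to tabulate for small n. A sketch of ours, enumerating all n^n maps on states 0, ..., n−1:

```python
import numpy as np
from itertools import product

def max_entropy_weights(M):
    """Maximum-metric-entropy RDS representation: the map (i -> t_i)
    receives weight a_{t_1,...,t_n} = prod_k M[k, t_k]."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    return {t: float(np.prod(M[np.arange(n), t]))
            for t in product(range(n), repeat=n)}

a = max_entropy_weights([[0.7, 0.3], [0.4, 0.6]])
```

For this 2 × 2 example the weights sum to 1, the marginals reproduce M, and the most probable map is (0, 1), which sends each state to the maximum of its row, exactly the "deterministic counterpart" described above.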

4. Synchronization.
Each realization of an RDS is a non-autonomous dynamical system [2]. Synchronization is a well-developed concept in the theory of non-autonomous ODEs [26]. Through studying synchronization in an RDS, one appreciates the fact that the RDS formulation is a more refined model of stochastic dynamics than an MC. The same phenomenon has also been termed a random sink [4,26] in applications to neural dynamics. It is deeply related to the concept of coupling in the theory of MCs [25,47].

4.1. Defining synchronization.
Recall the SFT (Ω_1, θ) derived from the finite i.i.d. RDS defined in Sec. 2.4. For each element ω_1 of the product space, we apply this sequence of deterministic transformations to multiple initial conditions simultaneously, inducing multiple sequences of states. These sequences are not independent; in fact, once they collide at some instant, they stay together forever. This phenomenon is called synchronization. It is easy to see that if any pair of sequences synchronizes in finitely many steps almost surely, then any collection of sequences synchronizes to one single sequence almost surely. So we can reduce the problem to the study of the two-point motion [4].
There always exist sequences that never synchronize, but we claim that the probability of such sequences is 0, so they are insignificant.

4.2.
Maximum metric entropy representation and Doeblin's coupling. We note, however, that not every RDS possesses this property. Given a Markov chain with transition probability matrix M, we would like to survey the synchronization of its RDS representations. We are particularly interested in the maximum-metric-entropy representation, i.e., a_{t_1,t_2,...,t_n} = M_{1t_1} M_{2t_2} ... M_{nt_n}.
In studying synchronization, one needs the simultaneous construction of two infinite sequences, obtained by applying the sequence ω_1 to the two initial states as a pair (x_0, y_0); the product space is then Ω_2 ⊗ Ω_2, and the shift map is induced from θ in the RDS. This describes a two-point motion.
In terms of this RDS, the pair (X, Y) is a new Markov chain whose state space is S × S. Its transition probability matrix W is defined as

W_{(i,j),(k,l)} = M_ik M_jl if i ≠ j;  W_{(i,i),(k,l)} = M_ik if k = l, and 0 otherwise.   (18)

In probability theory, Eq. 18 is the transition probability of a Markov chain that is the most basic example of a coupling of two Markov chains, first used by Wolfgang Doeblin. The Markov chain (X, Y) behaves as follows: if X_n ≠ Y_n, the two components make independent movements according to the transition matrix M at each step; if X_n = Y_n, they make the same movement. Thus, the set {(1,1), (2,2), ..., (n,n)} is absorbing.
Actually, in terms of Eq. 19 below, it is very clear why Doeblin's coupling differs from two independent Markov chains: the transition probability matrix W̃ for the two-point motion of two independent Markov chains is

W̃_{(i,j),(k,l)} = M_ik M_jl for all i, j, k, l.   (19)

W is the same as W̃ except at the n rows (1,1), (2,2), ..., (n,n).
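The coupling matrix W is mechanical to build from M. A minimal sketch (names ours), with the pair (i, j) flattened to index i·n + j on states 0, ..., n−1:

```python
import numpy as np

def doeblin_coupling(M):
    """Transition matrix W on S x S for Doeblin's coupling: off the diagonal
    the two components move independently; on the diagonal they move
    together, so {(i, i)} is absorbing as a set."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    W = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    if i != j:
                        W[i * n + j, k * n + l] = M[i, k] * M[j, l]
                    elif k == l:
                        W[i * n + j, k * n + l] = M[i, k]
    return W

W = doeblin_coupling([[0.7, 0.3], [0.4, 0.6]])
```

Every row of W sums to 1, and from a diagonal state (i, i) the chain can only move to another diagonal state, in contrast to the independent-motion matrix W̃ of Eq. 19.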
With the introduction of W, we can find the precise condition under which the RDS synchronizes. We have the following theorem.

Theorem 4.1. The maximum-metric-entropy RDS representation of a Markov chain synchronizes if and only if the Markov chain has a unique absorbing communicating class which is aperiodic.

Remark. From this theorem, it is easy to see that if a Markov chain is irreducible and aperiodic, then the maximum-metric-entropy RDS synchronizes.
Proof. ⇒: Recall that once a trajectory falls into an absorbing communicating class, it never gets out again. If the Markov chain has at least two absorbing communicating classes, then states from these two classes will never synchronize.
If the MC restricted to an absorbing communicating class is k-periodic, then this communicating class can be divided into k subclasses, and at each step the MC jumps from one subclass to the next. States in different subclasses will never synchronize.
⇐: Denote the unique absorbing communicating class by C. Since the absorbing communicating class is unique, there exists a positive integer n_1 such that for any state i ∈ S \ C there exists a state j ∈ C with p^{n_1}(i, j) > 0. Since the Markov chain restricted to C is irreducible and aperiodic, there exists a positive integer n_2 such that p^{n_2}(i, j) > 0 for any two states i, j ∈ C. Set n = n_1 + n_2; then p^n(i, j) > 0 for any i ∈ S and j ∈ C. Now for any two initial states i_0, j_0 ∈ S, we can find admissible sequences reaching the same state k_0 ∈ C after the nth step. If these two sequences collide before step n, i.e., m = min{k : i_k = j_k} < n, we truncate them after m steps; this yields two admissible sequences that reach the same state for the first time at step m. In either case, every transition along both sequences has strictly positive probability in the MC.
Now we calculate the probability p(i_0, j_0) that both sequences are realized, starting from the initial states i_0 and j_0, in the maximum-metric-entropy RDS representation.
This probability is strictly positive, and p(i_0, j_0) gives a lower bound on the probability that sequences starting from i_0 and j_0 synchronize within n steps. By convention, p(i_0, j_0) = 1 if i_0 = j_0. Among all pairs of initial states, define p = min_{i,j} p(i, j), which is a lower bound on the probability that any two initial states synchronize within n steps. The probability that two sequences have not synchronized within n steps is

P_non-sync = Pr{ω : f^{(n)}(ω)s_1 ≠ f^{(n)}(ω)s_2} ≤ 1 − p,

so the probability that two sequences have not synchronized within kn steps satisfies P^k_non-sync ≤ (1 − p)^k. Hence the event that two sequences never synchronize in finitely many steps has probability 0. Thus, the finite i.i.d. RDS synchronizes by definition.
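The probability-1 conclusion of the proof can be checked by simulation: drive two copies of the chain with the same random maps and record the first meeting time. A hedged sketch, using an illustrative irreducible, aperiodic matrix rather than one from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

# An illustrative irreducible, aperiodic transition matrix.
M = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

def sync_time(M, x, y, rng, max_steps=10_000):
    """First step at which two copies driven by the SAME random maps
    (maximum-metric-entropy RDS) coincide; None if no meeting occurs."""
    m = M.shape[0]
    for t in range(1, max_steps + 1):
        f = [rng.choice(m, p=M[i]) for i in range(m)]  # one random map
        x, y = f[x], f[y]
        if x == y:
            return t
    return None

times = [sync_time(M, 0, 2, rng) for _ in range(2000)]
all_synced = all(t is not None for t in times)
mean_time = sum(times) / len(times)
```

Every run meets in a short time, consistent with the geometric bound (1 − p)^k on the non-synchronization probability.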

4.3.
Examples and numerical results. We shall show several examples of synchronization through numerical simulations. We consider Markov chains with 4 × 4 transition matrices, so |Γ| ≤ 256 (there are 4^4 = 256 deterministic maps on four states). We simulate four sample trajectories of the RDS, started from four different initial conditions, and stop once all four collide into one trajectory, counting the number of steps required. We are interested in two questions. First, given a Markov transition matrix, which RDS representations synchronize and which do not? Second, if a representation synchronizes, what is the probability distribution of the number of synchronization steps (or coupling time)? More rigorously, the random variable N_s, the synchronization step, is

N_s = min{n : f^{(n)}(ω)s_1 = f^{(n)}(ω)s_2 = f^{(n)}(ω)s_3 = f^{(n)}(ω)s_4},

where s_1, s_2, s_3, s_4 are four different initial conditions.
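The simulation just described can be sketched directly; the 4 × 4 matrix below is an illustrative irreducible, aperiodic choice, not one of the matrices studied in the text.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)

# An illustrative irreducible, aperiodic 4x4 transition matrix.
M = np.array([[0.4, 0.3, 0.2, 0.1],
              [0.1, 0.4, 0.3, 0.2],
              [0.2, 0.1, 0.4, 0.3],
              [0.3, 0.2, 0.1, 0.4]])

def N_s(M, rng, starts=(0, 1, 2, 3), max_steps=10_000):
    """Synchronization step: first n at which the four trajectories, driven by
    the same random maps of the max-entropy RDS, have collapsed into one."""
    m = M.shape[0]
    xs = list(starts)
    for t in range(1, max_steps + 1):
        f = [rng.choice(m, p=M[i]) for i in range(m)]
        xs = [f[x] for x in xs]
        if len(set(xs)) == 1:
            return t
    return None

samples = [N_s(M, rng) for _ in range(1000)]
hist = Counter(samples)                    # empirical distribution of N_s
mean_Ns = sum(samples) / len(samples)
```

Repeating this over many runs yields the empirical distribution of the coupling time that the histograms in this section report.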
For the first question, we have shown the sufficient and necessary condition for the maximum-metric-entropy representation. It is possible that an irreducible and aperiodic MC fails to synchronize in other RDS representations. A trivial example is an irreducible and aperiodic doubly stochastic Markov matrix decomposed into a convex combination of permutation matrices. Even if Γ contains non-permutation matrices, however, the RDS may still fail to synchronize. For instance, Γ may consist of five deterministic matrices of which the first four are permutation matrices and the last one is not. In such an RDS representation, it is easy to see that two trajectories starting from state 1 (or state 2) and state 3 (or state 4) will never merge into one, even though the Markov transition matrix is irreducible and aperiodic.
For the second question, the exact distribution of the synchronization step is very hard to calculate because the deterministic transition matrices do not commute. We therefore simulate various Markov transition matrices in the maximum-metric-entropy representation and the min-max representation and histogram the frequency of synchronization steps; the results are shown in Figure 3. Both matrices used there are irreducible and aperiodic, so the maximum-metric-entropy representation synchronizes. We also verified that the min-max representations synchronize, by finding the absorbing states after constructing the 16 × 16 matrices W in (17). From Figure 3, the maximum-metric-entropy representation synchronizes faster than the min-max representation on average for the first Markov transition matrix; however, the min-max one is faster on average for the second. The tail frequency for both representations shows exponential decay, as illustrated by the semi-log linear fits of the tail probability p(x) ≈ a exp(−bx). The slope b is related to the second-largest eigenvalue of the transition matrix W of the two-point motion.
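The exponential tail and its rate can be checked exactly, without simulation, by iterating the two-point transition matrix: the decay rate of the non-synchronization probability is the spectral radius of the substochastic block of W on the non-diagonal pair states, an eigenvalue of W strictly below 1. A sketch with an illustrative 3-state matrix (not one from the text):

```python
import numpy as np

# Illustrative irreducible, aperiodic 3-state transition matrix.
M = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
n = M.shape[0]
pairs = [(i, j) for i in range(n) for j in range(n)]

# Doeblin-coupling transition matrix W of the two-point motion.
W = np.zeros((n * n, n * n))
for a, (i, j) in enumerate(pairs):
    for b, (k, l) in enumerate(pairs):
        if i != j:
            W[a, b] = M[i, k] * M[j, l]
        else:
            W[a, b] = M[i, k] if k == l else 0.0

off = [a for a, (i, j) in enumerate(pairs) if i != j]  # not-yet-synchronized states
start = pairs.index((0, 2))

def p_nonsync(k):
    """Exact probability that the two copies have not met after k steps."""
    return np.linalg.matrix_power(W, k)[start, off].sum()

# Tail decay rate = spectral radius of the substochastic off-diagonal block.
W_off = W[np.ix_(off, off)]
rho = max(abs(np.linalg.eigvals(W_off)))
ratio = p_nonsync(21) / p_nonsync(20)   # converges to rho as k grows
```

The ratio of successive tail probabilities matches the computed spectral radius, which is the slope b = −ln(rho) of the semi-log fit in this small example.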

Random Markov systems.
In the present work, an RDS is defined as a sequence of independent, identically distributed deterministic transformations on a discrete state space S. If we replace the collection of all possible deterministic transformations by a set of stochastic matrices, then we obtain a generalization of RDS, which we shall term a random Markov system (RMS). As far as the one-point motion is concerned, the description of an RMS is similar to that of an RDS. There already exists a large body of literature on this subject, under various names, ranging from coupled (or switching) diffusion processes [42,28,52,54] to theories of random evolution and random media [36,32]. A key motivation of this class of models is to distinguish intrinsic noise, which causes the stochastic transitions themselves, from extrinsic noise, which is the origin of time-dependent parameters of a dynamical system.
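The one-point motion of an RMS is easy to sketch: at each step, first the environment draws a stochastic matrix (extrinsic noise), then the state makes a random transition under that matrix (intrinsic noise). The two 2-state matrices below are assumed for illustration only.

```python
import random

# Two illustrative stochastic matrices on states {0, 1} (assumed, not from the text).
P1 = [[0.9, 0.1], [0.1, 0.9]]
P2 = [[0.2, 0.8], [0.8, 0.2]]
weights = [0.5, 0.5]   # i.i.d. environment: which law of motion acts at each step

random.seed(3)

def rms_step(x):
    P = random.choices([P1, P2], weights)[0]  # extrinsic noise: random law of motion
    return random.choices([0, 1], P[x])[0]    # intrinsic noise: random transition

x, visits = 0, [0, 0]
for _ in range(50_000):
    x = rms_step(x)
    visits[x] += 1
freq = [v / 50_000 for v in visits]
# The one-point motion is the MC with the averaged matrix 0.5*P1 + 0.5*P2 =
# [[0.55, 0.45], [0.45, 0.55]], whose stationary distribution is (1/2, 1/2).
```

The long-run occupation frequencies agree with the stationary distribution of the averaged (composite) chain, illustrating that the one-point motion alone cannot distinguish the RMS from that composite MC.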
In biophysical chemistry, one of the central issues for this class of problems is the nature of detailed balance, or reversibility [9,16]. See [39] for an extensive discussion of how extrinsic noise can be a source of free energy that sustains a biochemical system out of equilibrium. In mathematical terms, a random i.i.d. sequence of reversible stochastic matrices, in general, defines a Markov process that is not necessarily reversible. This can be easily shown with a pair of 3 × 3 stochastic matrices: the first has a detail-balanced stationary probability (1/13)(1, 2, 10), and the second has a detail-balanced stationary probability (1/3)(1, 1, 1); yet the mixed stochastic matrix on the right-hand side violates the Kolmogorov cycle criterion [53,14], i.e., it has a nonzero entropy production per cycle [40]. The first assertion is immediate from the definition of detailed balance,

π_i p_{ij} = π_j p_{ji} for all i, j,

which yields p_{i_1 i_2} p_{i_2 i_3} ⋯ p_{i_k i_1} = p_{i_1 i_k} p_{i_k i_{k−1}} ⋯ p_{i_2 i_1} for every cycle (i_1, i_2, …, i_k). Treating P as a matrix-valued random variable with probability measure {a_k; 1 ≤ k ≤ K}, the pair (P(t), X(t)) is also a Markov chain, with transition probability

Pr{(P(t+1), X(t+1)) = (P^{(l)}, j) | (P(t), X(t)) = (P^{(k)}, i)} = a_l p^{(k)}_{ij};

its stationary entropy production is given in [54]. Suppose the P^{(k)} are not detailed balanced, but they all still have a common stationary distribution {μ_i}. Then:

Lemma 5.2. Assuming the same invariant distribution, the stationary entropy production rate (epr) of a full RMS is never less than the epr of the composite Markov chain with transition matrix m_{ij} = Σ_k a_k p^{(k)}_{ij}.

Proof. We note that the left-hand side and the right-hand side of (24) are, respectively, the sums over i, j ∈ S of (25) and (26). The proof is based on the log-sum inequality, which states that for two sets of non-negative numbers (α_1, α_2, ···, α_n) and (β_1, β_2, ···, β_n),

Σ_i α_i ln(α_i/β_i) ≥ (Σ_i α_i) ln((Σ_i α_i)/(Σ_i β_i)).

Taking α_k = a_k μ_i p^{(k)}_{ij} and β_k = a_k μ_j p^{(k)}_{ji}, so that Σ_k α_k = μ_i m_{ij} and Σ_k β_k = μ_j m_{ji}, and summing the two sides over all i, j leads to (25) and (26).
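The claim that a mixture of reversible chains need not be reversible can be verified numerically. The two reversible matrices below are our own illustrative choices (the text's displayed example is not reproduced here), constructed to have the stationary distributions (1/13)(1, 2, 10) and (1/3)(1, 1, 1) quoted above; the Kolmogorov cycle criterion then detects the irreversibility of their mixture.

```python
import numpy as np

# Two reversible stochastic matrices (illustrative choices, NOT the text's example),
# with stationary distributions (1, 2, 10)/13 and the uniform distribution.
P1 = np.array([[0.00, 0.50, 0.50],
               [0.25, 0.25, 0.50],
               [0.05, 0.10, 0.85]])
pi1 = np.array([1.0, 2.0, 10.0]) / 13
P2 = np.array([[0.6, 0.3, 0.1],
               [0.3, 0.4, 0.3],
               [0.1, 0.3, 0.6]])
pi2 = np.ones(3) / 3

def detailed_balance(P, pi):
    F = pi[:, None] * P            # probability flux pi_i * p_ij
    return np.allclose(F, F.T)     # reversible iff the flux matrix is symmetric

def kolmogorov_affinity(P):
    """ln of the ratio of the two products around the cycle 0 -> 1 -> 2 -> 0;
    it vanishes on every cycle iff the chain is reversible."""
    return np.log((P[0, 1] * P[1, 2] * P[2, 0]) / (P[0, 2] * P[2, 1] * P[1, 0]))

M = 0.5 * P1 + 0.5 * P2            # i.i.d. mixture of the two laws of motion
db1, db2 = detailed_balance(P1, pi1), detailed_balance(P2, pi2)
affinity = kolmogorov_affinity(M)  # nonzero: the mixture is NOT reversible
```

Both components pass the detailed-balance test, yet the mixture carries a nonzero cycle affinity, i.e., a positive entropy production per cycle.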
The equality in (27) holds if and only if the ratio α_k/β_k = μ_i p^{(k)}_{ij}/(μ_j p^{(k)}_{ji}) is the same for all k. Therefore, if not all the Markov systems are detail balanced, then the total stationary entropy production of the RMS, in terms of the pair (P(t), X(t)), is greater than the entropy production of the composite MC X(t), with transition probability matrix M = E_Q[P_α] defined similarly as in Sec. 2.3. As discussed in [44], the dynamical detail retained in these three descriptions decreases as a form of coarse graining, and so does the entropy production.
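The log-sum step of the lemma can be checked numerically. The sketch below uses two doubly stochastic (hence sharing the uniform stationary distribution), non-reversible matrices of our own choosing, and compares the weighted average of the individual entropy production rates against the epr of the composite chain; by the log-sum inequality the gap is nonnegative, and here it is strictly positive.

```python
import numpy as np

# Two doubly stochastic, non-symmetric (hence non-reversible) matrices with the
# common uniform stationary distribution; illustrative choices, not the text's.
P1 = np.array([[0.1, 0.6, 0.3],
               [0.3, 0.1, 0.6],
               [0.6, 0.3, 0.1]])
P2 = np.array([[0.2, 0.5, 0.3],
               [0.3, 0.2, 0.5],
               [0.5, 0.3, 0.2]])
mu = np.ones(3) / 3
a = 0.4                        # weight of P1; P2 has weight 1 - a
M = a * P1 + (1 - a) * P2      # composite chain m_ij = sum_k a_k p^(k)_ij

def epr(P, mu):
    """Stationary entropy production rate:
    sum_ij mu_i p_ij ln( mu_i p_ij / (mu_j p_ji) )."""
    return sum(mu[i] * P[i, j] * np.log((mu[i] * P[i, j]) / (mu[j] * P[j, i]))
               for i in range(3) for j in range(3) if P[i, j] > 0)

epr_mean = a * epr(P1, mu) + (1 - a) * epr(P2, mu)  # average over the set of chains
epr_mc = epr(M, mu)                                  # composite Markov chain
gap = epr_mean - epr_mc                              # log-sum inequality: gap >= 0
```

The strictly positive gap illustrates the coarse-graining hierarchy: averaging the laws of motion before computing entropy production loses dissipation.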
Interestingly, the mean entropy production rate of the set of Markov chains in an RMS can actually be greater than the total stationary entropy production rate of the RMS.

6.
Conclusions. In the era of BIG DATA, a quote from [8], which has partially inspired our work, is still very relevant: "[I]n the study of deterministic dynamical systems, environmental noise tends to be suppressed or, at most, plays a secondary role, whereas in the study of statistics the deterministic dynamic kernel of the random generating mechanism tends to give way to the more macroscopic characterization such as the mean functions, the covariance functions, the spectral functions and so on." In the present work, we have seen that environmental noise and the dynamic kernel of the random generating mechanism, which we have termed extrinsic and intrinsic noise respectively, actually play very different roles in stochastic dynamics. A random dynamical system (RDS) is a more refined stochastic dynamic description of a random process. For the single sample paths generated by a Markov chain, there are many compatible i.i.d. RDS; in the present work, we have termed them RDS representations of the Markov chain. Since the representation is not unique, one can characterize an RDS representation in terms of the metric entropy of the i.i.d. system. It can be shown that the Shannon-Khinchin entropy of the Markov chain is a lower bound on the metric entropies of all its RDS representations. On the other hand, the RDS representation with the maximum metric entropy has a very special feature: if synchronization has not yet occurred between two trajectories, then their next transitions are independent.
One of the particularly attractive features of the discrete-state formulation of RDS and MC is the possibility of various in-depth investigations of complex biological dynamics using a broad spectrum of mathematical tools that are accessible to mathematical biologists, for example the theory of random matrices [30,35]. In essence, by focusing on a discrete state space, the approach described here returns to the original motivation of Oseledec's multiplicative ergodic theorem [3], which is at the heart of RDS. Other possible directions include the theory of irreversible Markov processes and their entropy production [53,14], convex analysis [43], and perhaps even group-theoretic investigations. Also, understanding chaotic dynamics in a discrete setting [27] could have implications for the uncertainty quantification of numerical algorithms [17]. Last but not least, recent developments in stochastic thermodynamics have suggested that the fundamental entities in a nonequilibrium stochastic dynamics are cycles rather than states [40,18]. The discrete formulation of RDS will provide ample opportunity to explore this insight.
where the P_j are deterministic matrices, the β_j are nonnegative, and Σ_j β_j = 1. Therefore, with P_{N+1} = P, a_{N+1} = a, and a_i = (1 − a)β_i for i = 1, 2, ···, N, we obtain a convex combination with nonnegative coefficients a_i. Moreover, Σ_{i=1}^{N+1} a_i = a + (1 − a) Σ_j β_j = 1. It is easy to see that a deterministic matrix cannot be expressed as a convex combination of other deterministic matrices. Thus the deterministic matrices are the vertices of this polyhedron.
Appendix B. Metric entropy. In this section, we review some important concepts in connection with metric entropy. This material can be found in standard textbooks [21,50,19,10]; it is presented for the convenience of the readers.
Let T be a measure-preserving transformation of a probability space (S, F, μ). A good motivation for the notion of the metric entropy h_T associated with T is in terms of measurements. A measurement is a finite partition P = {P_1, ···, P_k} of the space S: the measurable sets P_i are disjoint and their union is S. Now consider a finite portion of an orbit of length n generated by T, starting from an initial condition s ∈ S: s, T(s), T²(s), ···, T^{n−1}(s).
Each of the points T i (s) belongs to exactly one of the sets of the partition P, s ∈ P k0 , T (s) ∈ P k1 , · · · , T n−1 (s) ∈ P kn−1 .
We call k = (k_0, k_1, ···, k_{n−1}) the address of s with respect to the partition P. Another orbit may have the same address k with respect to P, so we can collect all initial points whose orbits share the address k:

P^n(k) = {s ∈ S : address of s is k}.  (B1)

In fact, P^n = {P^n(k) : k is any address of length n} is also a partition of S. Moreover, it can be shown that P^n is the join of the partitions P, T^{−1}P, ···, T^{−(n−1)}P. One would like to quantify the amount of information in an address of an orbit of length n. This is given by the Gibbs-Shannon entropy H(P^n) = −Σ_{j=1}^m μ(P^n_j) ln μ(P^n_j), where m is the number of elements of P^n. The amount of information per unit time is then (1/n)H(P^n). Therefore, the information rate of a measurement P is defined as H(P, T) = lim_{n→∞} (1/n)H(P^n), and the metric entropy is the supremum of this value over all possible finite measurements: h_T = sup_P H(P, T). A discovery due to Sinai helps the computation: the supremum is attained by generators. A generator is a partition P such that any two different points of S have different addresses.
Therefore the metric entropy of a Bernoulli trial, i.e., a sequence of i.i.d. random variables with distribution (p_1, ···, p_m), is h(T) = −Σ_{i=1}^m p_i ln p_i. One can similarly derive the metric entropy of a Markov chain: for a Markov chain with transition probability p_ij and invariant distribution π_i, h = −Σ_{i,j=1}^m π_i p_ij ln p_ij. See also Eq. C9.
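The Markov-chain entropy formula is easy to evaluate numerically; a minimal sketch with an illustrative 2-state chain:

```python
import numpy as np

# An illustrative 2-state transition matrix.
M = np.array([[0.50, 0.50],
              [0.25, 0.75]])

# Invariant distribution: the left eigenvector of M for eigenvalue 1.
evals, evecs = np.linalg.eig(M.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()            # normalize (and fix the sign) of the Perron vector

# Metric entropy h = -sum_ij pi_i p_ij ln p_ij
h = -sum(pi[i] * M[i, j] * np.log(M[i, j])
         for i in range(2) for j in range(2) if M[i, j] > 0)
```

For this matrix the invariant distribution is (1/3, 2/3) and h ≈ 0.606 nats per step.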
Appendix C. Sequence space and thermodynamic formalism. Consider an aperiodic, irreducible, and recurrent Markov chain with transition probability p_ij and stationary distribution {π_i}, i, j ∈ S with |S| = n. If one considers the space of all the possible sequences generated by the Markov process, then there is an isomorphism between the problem of stochastic dynamical systems and the techniques used in statistical thermodynamics. This is known as the thermodynamic formalism [24]. Both metric entropy and topological entropy emerge in this formalism, as follows.
One calls E_ij := −ln p_ij the "interaction energy" between nearest neighbors in states i and j in a given sequence σ = i_0 i_1 ··· i_{ℓ−1} of length ℓ. The total energy of the sequence σ is then

E(σ) = Σ_{k=0}^{ℓ−2} E_{i_k i_{k+1}};

Ξ(β, ℓ) := Σ_σ e^{−βE(σ)}  (C2)

is called a "partition function", and Λ(β) := lim_{ℓ→∞} (1/ℓ) ln Ξ(β, ℓ).
To compute the partition function Ξ in (C2), one can use a technique based on matrix multiplication. In the molecular biophysics of proteins and DNA, this has been applied to the theory of helix-coil transition [37]. It can be easily shown that

Ξ(β, ℓ) = (0, ···, 1_{i_0}, ···, 0) Q^{ℓ−1} (1, 1, ···, 1)^T,

in which the matrix Q has elements Q_ij = e^{−βE_ij} = e^{β ln p_ij} = p_ij^β.
Entropy in thermodynamics is defined as S(β) = Λ(β) + βU(β), where U(β) = −∂Λ(β)/∂β is called the "mean internal energy"; U − β^{−1}S = −β^{−1}Λ is then the free energy. We note that when β = 1, Q_ij = p_ij and Ξ(1, ℓ) = 1. Therefore, when β = 1, Λ = 0 and U = S for all ℓ.

C.1. Thermodynamic limit. Taking ℓ → ∞ is called the "thermodynamic limit", in which one has Λ(β) = ln λ_Q(β), and hence the free energy −β^{−1} ln λ_Q(β), where λ_Q(β) is the largest eigenvalue of Q. Furthermore, S(0) = Λ(0) is the topological entropy of the Markov chain, and S(∞) = 0. At β = 1, an eigenvalue perturbation calculation yields S(1) = −Σ_{i,j∈S} π_i p_ij ln p_ij, which is the metric entropy.
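The checkpoints of the formalism, Λ(1) = 0, S(0) = topological entropy, and S(1) = metric entropy, can be verified numerically. A sketch with an illustrative 2-state chain, assuming the conventions above (Λ(β) = ln λ_Q(β), U = −∂Λ/∂β, S = Λ + βU), with the derivative taken by central differences:

```python
import numpy as np

# An illustrative 2-state transition matrix with stationary distribution (1/3, 2/3).
P = np.array([[0.50, 0.50],
              [0.25, 0.75]])
pi = np.array([1/3, 2/3])

def Lam(beta):
    """Lambda(beta) = ln of the largest eigenvalue of Q_ij = p_ij^beta."""
    Q = P ** beta
    return np.log(max(np.real(np.linalg.eigvals(Q))))

# Lambda(1) = 0 because Q = P is stochastic (largest eigenvalue 1).
lam1 = Lam(1.0)

# S(beta) = Lambda(beta) + beta * U(beta), with U = -dLambda/dbeta.
eps = 1e-5
U1 = -(Lam(1 + eps) - Lam(1 - eps)) / (2 * eps)
S1 = lam1 + 1.0 * U1

# Metric entropy for comparison.
h = -sum(pi[i] * P[i, j] * np.log(P[i, j]) for i in range(2) for j in range(2))

# S(0) = Lambda(0): ln of the largest eigenvalue of the adjacency matrix; here
# all entries of P are positive, so the adjacency matrix is all ones and S(0) = ln 2.
S0 = Lam(0.0)
```

All three checkpoints come out as stated: Λ(1) vanishes, S(1) reproduces the metric entropy, and S(0) reproduces the topological entropy ln 2.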