ON SIR-MODELS WITH MARKOV-MODULATED EVENTS: LENGTH OF AN OUTBREAK, TOTAL SIZE OF THE EPIDEMIC AND NUMBER OF SECONDARY INFECTIONS

. A Markov-modulated framework is presented to incorporate correlated inter-event times into the stochastic susceptible-infectious-recovered (SIR) epidemic model for a closed ﬁnite community. The resulting process allows us to deal with non-exponential distributional assumptions on the con- tact process between the compartment of infectives and the compartment of susceptible individuals, and the recovery process of infected individuals, but keeping the dimensionality of the underlying Markov chain model tractable. The variability between SIR-models with distinct level of correlation is dis- cussed in terms of extinction times, the ﬁnal size of the epidemic, and the basic reproduction number, which is deﬁned here as a random variable rather than an expected value.


1.
Introduction. The SIR-model is a well-studied epidemic model that, together with its generalizations, has been widely applied to infectious diseases such as measles, chickenpox, or mumps, among other situations where infection confers (typically lifelong) immunity; a good discussion about SIR-models analyzed from deterministic and stochastic perspectives can be found in the lucid texts by Allen [1,2], Andersson and Britton [8], Bailey [16] and Keeling and Rohani [36]. The SIR-model is first analyzed in depth by Kermack and McKendrick [37] in 1927 in order to study the evolution of a disease in a closed community of finite size where, at time t, individuals are classified into three categories: S(t) susceptibles, I(t) infectives, and R(t) removed individuals. The general description in [37] is related to an homogeneously mixed population (i.e., any infective can infect any susceptible with equal probability), where the infection and recovery rates of a given infective depend on the total time that this individual has been infected for. The analytical difficulties of addressing this time-dependent assumption lead Kermack and McKendrick [37] to the consideration of a number of special cases. Specifically, Special Case B in [37] amounts to the particular assumption that infection and recovery rates are constant, becoming the origin of the basic SIR-model; see e.g. [37,Eq. (29)]. Therefore, in the basic SIR-model the rate of new infections is proportional to the current numbers of susceptibles and infectives in the population; as time passes, some infectives recover and go into the removed category and, at the same time, infectives contact susceptibles so that S(t) decreases and I(t) increases.
In recent years, there has been a rapidly growing interest in SIR-models and their applications; a comprehensive review of the existing literature and some results can be found in the survey papers by Britton [21], and Isham [34,35]. In the stochastic setting, three important quantities of the SIR-model include the final size distribution, the expected duration of an epidemic, and the basic reproduction number R 0 , which are analyzed by using a variety of tools in applied probability; see, for example, the articles by Allen [3], Allen and Burgin [4], Artalejo et al. [9], and Artalejo and López-Herrero [12], among others. Gani and Purdue [27] outline a matrix-geometric method for determining the total size distribution in a recursive manner, and El Maroufy et al. [25] explore how this method can be used to study SIR-models with a generalized mechanism of infection. A recursive algorithm used to compute the distribution of the epidemic size, under the assumption of a large finite initial population of susceptibles, is described by Gordillo et al. [30], who also derive a diffusion limit in the near-critical case. Unlike Neuts and Li [40] who derive algorithms to compute the time to extinction of the infectious disease, Grasman [31] deals with a diffusion approximation for continuous states. More concretely, Grasman [31] presents a method to approximate the long-term stochastic dynamics of an epidemic modeled by state variables denoting the various classes of individuals within the population such as in the SIR-model and a variant with spatial structure, including epidemics in populations distributed among different locations with migration of individuals between these locations. The concepts of local infectious clump and local susceptibility set are used by Ball and Neal [18] to develop a unified approach to the threshold behavior of SIR epidemics with two levels of mixing. The objective of Artalejo and López-Herrero [12] is to investigate quasi-stationarity and the ratio of expectations as two conceptually different approaches for understanding the dynamics of the SIR-model and its variant with demography before the extinction of an epidemic. New descriptors, including the time to reach specific numbers of infectives and susceptible individuals [15,Section 3.1], and the time to reach a critical number of infections [15,Section 3.2] are also investigated by using efficient numerical tools; for a related work, see the paper [5] where the interest is in the time to reach a critical number of infections in the SIR-model with infective and susceptible immigrants. Results based on the maximum entropy (ME) formalism are first derived analytically in [13] for the SIR-model, and they are then applied to outbreaks of extended spectrum beta lactamase organisms in intensive care units of hospitals. Recently, Clancy [23], and Gómez-Corral and López-García [29] show how SIR epidemics can be naturally modeled as piecewise deterministic Markov processes when individuals' infectious periods are not necessarily exponentially distributed; note that the epidemic model in [23,29] is previously analyzed using special constructions by Ball [17], and Picard and Lefèvre [41]. Non-exponential tolerance to infection in SIR-and SEIR-models is analyzed by Streftaris and Gibson [45], who explore the use of a non-exponential model for individuals' thresholds to infection (see Sellke [44]) as an alternative to the traditionally employed Poisson streams for occurrence of infectious contacts.
With the exception of the ME principle [13], SIR-models with general infectious period distribution [23,29] and non-exponential tolerance to infection [45], the key for methods used in the above papers lies in the fact that the events are generated by Poisson streams and so new infections and recoveries, being exponentially distributed, possess the Markov property but do not incorporate correlations. Our major achievement in this article is the presentation of an SIR-model with correlated inter-event times. Specifically, we define a Markov-modulated process that allows us to reflect non-exponential distributional assumptions on the contact process between the compartment of infectives and the compartment of susceptible individuals, and the recovery process affecting the compartment of infectives, but keeping the dimensionality of the underlying Markov chain tractable. This means that inter-event times in our Markov-modulated SIR-model are scheduled at a compartmental level, rather than according to the individual level, which is the case of the extensions studied by Clancy [23], Gómez-Corral and López-García [29], and Streftaris and Gibson [45]. Thus, instead of an extension of the SIR-model, the resulting Markov-modulated model should be considered as an approximating model.
Up till now, relatively little work has been reported on epidemics exhibiting correlation in the process generating events. The use of a state-dependent (SD) Markov-modulated mechanism -including the Markovian arrival process (MAP, [10, Section 2]), and block-structured state events (BSDE, [10, Section 3.5]) -, as a point process characterized by a small number of matrices, provides us with a highly tractable benchmark for the study of SIR-models with versatile qualities. From a conceptual perspective, by using a Markov-modulated framework one deals with a weakened version of the lack-of-memory property allowing a Markovian dependence on one of a finite number of phases and, consequently, incorporating correlated and non-exponentially generated events. The BSDE approach (Artalejo and Gómez-Corral [10]) and related SD formulations can be used to extend stochastic systems that use Markov chains to model a biological population. For example, a SD-SIS model is first formulated in [11] as an augmented process, and then analyzed according to the time until the extinction, and the coefficient of correlation between successive events; as a related work, the article [28] deals with non-exponential distributional assumptions on a host birth, a parasite death, and parasitism in bivariate host-parasitoid models. Amador and Artalejo [6] use the BSDE formalism to study the number of cases of infection in a computer network, with an application to the propagation of the CodeRed II virus.
In the following sections we consider stochastic SIR-models of increasing complexity. In Section 2, we present the basic SIR-model as a bivariate stochastic model with the susceptible and infective compartments as state variables. In Section 3, we first construct SIR-models with correlated inter-event times by applying a Markovmodulated approach that provides a methodological tool to model SD Markovian transitions operating in the presence of phases. Next, we focus on a concrete specification of the underlying phase process such that relevant properties of the basic SIR-model are appropriately satisfied by the resulting Markov-modulated model. We analyze the ultimate extinction of the epidemic, the length of an outbreak and the final size of the epidemic, as well as two random variants of the basic reproduction number R 0 . Numerical results are presented in Section 4 to illustrate the effects of the correlation structure on the dynamics of Markov-modulated versions.

Transitions
Events Rates Table 1. Stochastic transitions, events and rates in the basic SIR-model Finally, some conclusions are given in Section 5. For ease of reference, Appendix A contains the proof of certain properties of the Markov-modulated SIR-model; and those Markovian arrival streams used in Section 4 are specified in Appendix B.
Glossary of notation Throughout the paper, vectors and matrices are denoted by bold letters; in particular, vectors are assumed to be column vectors and are represented by lowercase letters, like a, and matrices are denoted in uppercase, like A, with the transpose of A denoted by A T . The identity matrix of order a is given by I a , and 0 a×b is the null matrix of dimension a × b. The null and unit vectors of order a are denoted by 0 a and e a , respectively, and e a (b) is a column vector of order a such that all entries equal 0, except for the bth one which is equal to 1.
The symbols ⊗ and ⊕ stand for the Kronecker product and the Kronecker sum of two matrices, respectively, and the exponential of a square matrix W is defined

2.
The basic SIR-model. In the SIR-model, individuals develop an immunity to the disease. At time t, the population consists of I(t) = i infectives, S(t) = s susceptibles, and R(t) = r removed individuals, and the only possible events (Table 1) correspond to contacts between an infective and a susceptible, and the removal of an infective. The rates λ i,s and µ i can be specified in infinitely many ways. For example, the choice λ i,s = βis and µ i = ηi yields the general stochastic epidemic (Bailey [16,Chapter 6], Gani and Purdue [27]), and Models 1 and 2 analyzed by Neuts and Li [40] are related to the respective infection rates λ i,s = βi α s and λ i,s = βi min{s, n}, where the parameter α ∈ (0, 1) quantifies the degree of interaction between infectives and susceptibles, and the value specifies the fraction of susceptible population that is exposed to each infective; the transmission of myxomatosis among rabbits is analyzed by Saunders [43] by selecting the infection rates λ i,s = (i + s) −1/2 is.
In the case of a closed population that is homogeneously mixed, two state variables suffice and, under the assumption of initial numbers of m > 0 infectives and n > 0 susceptibles, the SIR-model is formulated as a time-homogeneous continuoustime Markov chain (CTMC) X = {X(t) = (I(t), S(t)) : t ≥ 0} defined on a finite state space S(m, The process X , which is from now on termed basic, is uniquely specified by the transition rates λ i,s and µ i in such a way that states in C 0 (n) = {(0, 0), (0, 1), ..., (0, n)} are absorbing, and the class C(m, n) = {(i, s) ∈ N 0 × N 0 : 1 ≤ i ≤ m + n, 0 ≤ s ≤ min{n, m + n − i}} consists of transient states. States in C 0 (n) are associated with the ultimate extinction of the epidemic, and the class C(m, n) can be expressed in terms of levels as ∪ n s=0 l(s), where the sth level corresponds to states with s susceptibles present, that is, l(s) = {(i, s) ∈ N 0 × N 0 : 1 ≤ i ≤ m + n − s}, for 0 ≤ s ≤ n.
For later use, we recall the structured form of the infinitesimal generator of X , which is seen to be the keystone to derive efficient algorithms for the distributions of the total size and the maximum size of the epidemic (Neuts and Li [40,), and the joint distribution of the maximum and the time instant of its occurrence (Neuts and Li [40,Section 5]). Specifically, states in S(m, n) are assumed to be labeled so that transient states precede absorbing states, in such a way that where states (i, s) ∈ l(s), with s ∈ {0, 1, ..., n}, are ordered as and absorbing states are ordered as This labeling of states results in the infinitesimal generator Q of X where J(m, n) = 2 −1 (n + 1)(2m + n) is the cardinality of the class C(m, n) of transient states, and In Eqs. (2)-(3), the sub-matrices A(s) and B(s) record transition rates related to jumps of X from states of the sth level to states of l(s) and l(s−1), respectively, and diagonal elements of A(s) are given by −q i,s = −(λ i,s + µ i ), for 1 ≤ i ≤ m + n − s; similarly, the column vector t(s) contains transition rates related to jumps of X from states of the sth level to the absorbing state (0, s). They are given by t(s) = µ 1 e m+n−s (m + n − s), and 3. Markov-modulated events in the SIR epidemic model. The dynamics in the SIR-model (Table 1) can be explained in terms of scheduled events. Let us assume that the process X enters state (i, s) ∈ C(m, n) at time t. The next transition (i, s) → (i , s ) is triggered by two scheduled events E (i,s) (1, −1) and E (i,s) (−1, 0), where the former amounts to the infection of a susceptible and the latter is related to the removal of an infective. The occurrence instants of E (i,s) (1, −1) and E (i,s) (−1, 0) are independent and exponentially distributed with mean values λ −1 i,s and µ −1 i , respectively, and they are also independent of the history of X up to time t. The basic process X spends in (i, s) an exponentially distributed period of time, with expected length (λ i,s +µ i ) −1 , and it then moves to state (i , s , the removal of an infective). Therefore, the basic state (i, s) ∈ C(m, n) is updated in the light of the observed value (z 1 , z 2 ) of a bivariate vector (Z 1 , Z 2 ) that, as a function of (i, s), records scheduled events taking place when the sojourn time that X spends in (i, s) expires. More concretely, we may express the resulting basic state are inherently linked to the superposition of two independent Poisson streams with arrival rates λ i,s and µ i .
In what follows, we replace the basic process X by an augmented version (X , Y) allowing a Markovian dependence on a finite number of phases. To construct the augmented process (X , Y), every basic state (i, s) ∈ C(m, n) is first replaced by a set of augmented states (i, s, y 1 , y 2 ) with 1 ≤ y k ≤ L k for predetermined values L k ∈ N and k ∈ {1, 2}. Poisson assumptions for (Z 1 , Z 2 ) are then generalized to incorporate correlated inter-event times into the SIR-model; specifically, we construct the superposition from two independent MAPs of orders L 1 and L 2 (Artalejo et al. [11]) instead of two Poisson processes, and we introduce a third pair (z 1 , z 2 ) = (0, 0) that reflects a transition between phases, but it does not imply any transition in the basic state (i, s).
We briefly recall that an MAP (He [33, Section 2.2]) is related to a bivariate CTMC (N , K) = {(N (t), K(t)) : t ≥ 0}, which is defined on a denumerable state space S (N ,K) = {(n, k) : n ∈ N 0 , k ∈ {1, ..., L E }} and has infinitesimal generator The matrices E 0 and E 1 are square matrices of order for technical reasons, it is assumed that the resulting infinitesimal generator E 0 + E 1 is irreducible. In the CTMC (N , K), the counting process N = {N (t) : t ≥ 0} describes the number of births (or occurrences of a predetermined event) during the interval (0, t], provided that the CTMC starts from state (0, k) at time 0, and is modulated by the phase process K = {K(t) : t ≥ 0}, whose phase k dictates the instantaneous rates for births 1 . By partitioning the .., L E }}, for integers n ∈ N 0 , the process (N , K) allows for single births so that a transition is possible in one step from each subset l (N ,K) (n) to states of the subset l (N ,K) (n + 1), and it may allow phase changes at birth epochs to occur without any restriction; more particularly, the birth rate may depend on the phases, but not on the current subset. A generalization of the MAP that accommodates processes with different types of births is the marked Markovian arrival process (MMAP, He and Neuts [32], He [33, Section 2.5]). The main idea is to assign different interpretations to the birth rates, which can be done by considering a set Z of indices, a family {E z : z ∈ Z} of non-negative matrices and a matrix E 0 with non-negative offdiagonal elements and negative diagonal elements verifying that E 0 + z∈Z E z is an irreducible infinitesimal generator of order L Z . The MMAP with characteristic is the marginal counting process of type-z births, for z ∈ Z, and K = {K(t) : t ≥ 0} is the underlying phase process. This means that, for z ∈ Z, the matrix E z consists of infinitesimal rates for the increase of one unit in N z (t); needless to say, MAPs are obtained as special cases of MMAPs with a single type of births. The above construction leads to a regular, time- can be thought of as a SD MMAP with two categories of marked events: a type-(z 1 , z 2 ) event with either (z 1 , z 2 ) = (1, −1) if the infection of a susceptible individual is observed, or (z 1 , z 2 ) = (−1, 0) if the removal of an infective is registered. Similarly to X , the state space S * of (X , Y) can be partitioned into the set C 0 (n) of absorbing states, and a class C * (m, n) of transient states, where C * (m, n) = ∪ n s=0 l * (s) and the sth augmented level consists of states (i, s, y 1 , y 2 ) with 1 ≤ i ≤ m+n−s and 1 ≤ y k ≤ L k for k ∈ {1, 2}.
In our approach, we use two scaled MAPs that, for a basic state (i, s) ∈ C(m, n), are specified by characteristic matrices (C * 0,(i,s) , C * 1,(i,s) ) for the occurrence of an infection, and (D * 0,(i,s) , D * 1,(i,s) ) for the removal of an infective, with for k ∈ {0, 1}, where (C 0 , C 1 ) and (D 0 , D 1 ) are the characteristic matrices of two independent MAPs of orders L 1 and L 2 , respectively, and fundamental arrival rateŝ λ(1) andλ (2). This means that the infinitesimal generator Q * of (X , Y) has the structured form of Q in (1) with A(s), B(s), and t(s) in (2)-(3) replaced by suitably defined sub-matrices A * (s) and B * (s), and vectors t * (s). They are specified as follows: (i) The sub-matrix A * (0) has the structured form in (4) with −q m+n−s ,s and µ m+n−s replaced by ( (iii) For 1 ≤ s ≤ n, the sub-matrices B * (s) are obtained from (5) by replacing the rates λ m+n−s−s ,s by the matrices .
It is worth noting that the basic process X and its augmented counterpart (X , Y) possess identical structural properties, but only (X , Y) exhibits correlation between events. In particular, for every basic state (i, s) ∈ C(m, n), the fundamental arrival rate of type-(z 1 , z 2 ) marks is given bȳ and the fundamental arrival rateλ (i,s) of the SD MMAP is given bȳ regardless of the initial MAPs with matrices (C 0 , C 1 ) and (D 0 , D 1 ). Moreover, the correlation structures of the initial MAPs and their scaled variants are identical; that is, if ρ k = ρ(X 1 , X k ) denotes the correlation coefficient between the first and the kth inter-arrival times X 1 and X k in the MAP with characteristic matrices (C 0 , C 1 ), and ρ k,(i,s) is its counterpart for the scaled MAP with matrices (C * 0,(i,s) , C * 1,(i,s) ), then it can be verified that regardless of the basic state (i, s) ∈ C(m, n). A similar result holds for the MAPs with characteristic matrices (D 0 , D 1 ), and (D * 0,(i,s) , D * 1,(i,s) ). We address mathematically inclined readers to Appendix A for a proof of Eqs. (6)-(8).
3.1. Length of an outbreak and total size of the epidemic. An outbreak begins when a single individual chosen appropriately from the population becomes infected. The disease spreads from this infective to susceptible individuals at a certain rate, in such a way that new infectives attempt to infect other susceptibles and then recover in accordance to the model description in Section 2. The outbreak ends when no infectives remain.
The distributional assumptions in the basic SIR-model imply that the process X will reach states (0, s ) with 0 ≤ s ≤ n, starting from any numbers of m > 0 infectives and n > 0 susceptibles, and the epidemic will always die out. This means that the epidemic extinction is always certain and the expected times till extinction are all finite, regardless of the initial basic state. Let T (i,s) be the time until the epidemic dies out, provided that the population currently contains i infectives and s susceptibles, for (i, s) ∈ C(m, n). Then, the cumulative distribution function of the length T (i,s) of the residual outbreak 2 has the form (9) whereē J(m,n) (i, s) denotes the initial distribution on the class C(m, n) of transient states, and the joint distribution of (T (i,s) , S(T (i,s) )) can be expressed as is the currently visited state, we may specifyē J(m,n) (i, s) as a row vector of order J(m, n) such that all entries equal zero, with the exception of the entry for the initial state (i, s) which is equal to one.
The proof of (9)-(10) is based on the structured form of Q given in (1). More concretely, the transition function P(t) of X with elements P ((I(t), S(t)) = (i , s )|(I(0), S(0)) = (i, s)), for states (i, s), (i , s ) ∈ S(m, n), is given by P(t) = exp{Qt}, and it is easy to verify that with sub-matrices P C(m,n),C(m,n) (t) = exp{Q C(m,n),C(m,n) t} and P C(m,n),C0(n) (t) = (I J(m,n) − exp{Q C(m,n),C(m,n) t})(−Q −1 C(m,n),C(m,n) )Q C(m,n),C0(n) . Then, the term e J(m,n) (i, s) exp{Q C(m,n),C(m,n) t}e J(m,n) on the right side of (9) amounts to the conditional probability of being in a non-absorbing state at time t, provided that (i, s) is the current state of X , and P (T (i,s) ≤ t, S(T (i,s) ) = s ) corresponds to the entry of P C(m,n),C0(n) (t) associated with the initial transient state (i, s) and the absorbing state (0, s ) visited by the process X at time t; note that this entry is given byē J(m,n) (i, s)P C(m,n),C0(n) (t)e n+1 (n + 1 − s )), and (10) is then proved from the identity Q C(m,n),C0(n) e n+1 (n + 1 − s ) = µ 1 e J(m,n) (J (m, n; s )). It is readily seen that, in Eq. (10), the matrix Q C(m,n),C(m,n) is non-singular since the absorption into states of C 0 (n) occurs with probability one. Then, by using the jargon of the labeling of states yielding Eqs. (1)-(5), the entry of −Q −1 C(m,n),C(m,n) located at the (i, s)th row and the (i , s )th column is the expected total time spent in state (i , s ) -which is linked to a population consisting of i infectives and s susceptible individuals -during an outbreak, given that the outbreak starts with i infectives and s susceptibles.
In the case of Markov-modulated events, the length T * (i,s,y1,y2) of the resulting residual outbreak, provided that (i, s, y 1 , y 2 ) ∈ C * (m, n) is the current state of (X , Y), follows a continuous PH law of order J * (m, n) = L 1 L 2 J(m, n) and a representation in terms of the vectorē J * (m,n) (i, s, y 1 , y 2 ) of initial probabilities and the sub-matrix Q * C * (m,n),C * (m,n) recording transition rates related to jumps of (X , Y) from states in the class C * (m, n) of transient states to states in C * (m, n). Note that Q * C * (m,n),C * (m,n) has the structured form of Q C(m,n),C(m,n) in (2) with A(s) and B(s) replaced by the sub-matrices A * (s) and B * (s) defined for the augmented process in Section 3. Then, it is a matter of routine to derive expressions for P (T * (i,s,y1,y2) ≤ t) and P (T * (i,s,y1,y2) ≤ t, S(t) = s ) from Eqs. (9)-(10) by replacing the matrices I J(m,n) and Q C(m,n),C(m,n) by I J * (m,n) and Q * C * (m,n),C * (m,n) , respectively, and the vectors e J(m,n) (i, s), e J(m,n) and e J(m,n) (J (m, n; s )) byē J * (m,n) (i, s, y 1 , y 2 ), e J * (m,n) and e J * (m,n) (J (m, n; s )), respectively, with J (m, n; s ) = L 1 L 2 J (m, n; s ).
The final size of the epidemic in the basic SIR-model can be readily determined from the mass function P (S(T (i,s) ) = s ) of the number of survivors, for integers 0 ≤ s ≤ s and states (i, s) ∈ C(m, n). To evaluate the probability P (S(T (i,s) ) = s ), we may first observe that the number of jumps of the basic process X till absorption into C 0 (n) follows a discrete PH law (Latouche and Ramaswami [38, Section 2.5]) of order J(m, n), and representation in terms of the vectorē J(m,n) (i, s) of initial probabilities and the sub-matrix . . . . . .
which contains one-step transition probabilities among transient states in the underlying jump embedded chain; specifically, non-null entries in the kth row of A(s) and B(s) are obtained from the sub-matrices A(s) and B(s) by dividing their infinitesimal counterparts in (4) and (5) by the sojourn rate q m+n−s+1−k , for 1 ≤ k ≤ m + n − s and 0 ≤ s ≤ n, and diagonal elements of A(s) are all null. Then, we may notice that the event {S(T (i,s) ) = s − j} amounts to the fact that the number of jumps till absorption into C 0 (n) equals i + 2j (i.e., j new infections and i + j recoveries), for integers 0 ≤ j ≤ s and states (i, s) ∈ C(m, n). Hence, we may express for integers 0 ≤ j ≤ s, where the column vector p C0(n) consists of one-step transition probabilities from states in C(m, n) to absorbing states, and it takes the form . . .
for states (i, s) ∈ C(m, n). More concretely, the expected number E[S(T (i,s) )] of surviving susceptibles can be evaluated from (11) as where q C0(n) = P i−1 C(m,n),C(m,n) p C0(n) . Eq. (12) is then derived by noting that the first term on the right side equals s + 1 since s j=0ē J(m,n) (i, s)P i+2j−1 C(m,n),C(m,n) p C0(n) = 1 by (11), and the second term Similarly to the above arguments, the mass function of the number S(T * (i,s,y1,y2) ) of survivors in the Markov-modulated SIR-model and its expectation can be expressed from Eqs. (11)-(12) by usingē J * (m,n) (i, s, y 1 , y 2 ) and e J * (m,n) instead of the vectorsē J(m,n) (i, s) and e J(m,n) , respectively, and by replacing the matrices I J(m,n) and P C(m,n),C(m,n) , and the vector p C0(n) by I J * (m,n) , P * C * (m,n),C * (m,n) and p * C0(n) , where the matrix P * C * (m,n),C * (m,n) and the column vector p * C0(n) are suitably defined from the jump embedded chain of the augmented process (X , Y).
Eqs. (11)-(12) are easy to implement and one may expect them to be numerically well behaved in the case of small numbers (m, n) of initial infectives and susceptibles. Large values of (m, n) imply large cardinalities of the class C(m, n) (respectively, C * (m, n)) of transient states in the basic process X (respectively, its augmented counterpart (X , Y)), which often turns the computation of powers of P C(m,n),C(m,n) (respectively, P * C * (m,n),C * (m,n) ) in (11)-(12) into instable. As an alternative to (11)-(12), Black and Ross [20] recently derive, for a variety of epidemic models including the basic SIR-model, an iterative method for computing the final size distribution, which is mainly based on the replacement of the basic process X by a bivariate representation (I (t), R(t)) recording the numbers of infection and recovery events at time t, and a co-lexicographic labeling of states for the resulting process. Similarly, El Maroufy et al. [26] give an explicit formula of the final outcome probabilities in the SIR-model with generalized infection rates, establish two fast and efficient recursive methods to evaluate the final size of the epidemic, and show how those methods are numerically stable for large numbers (m, n) of initial infectives and susceptibles. However, the extension of the approaches in [20,26] to the augmented process (X , Y) for Markov-modulated SIR-models is not immediate.

3.2.
The exact reproduction number R exact,0 . In the SIR-model defined by the rates λ i,s = βis and µ i = ηi with β = (m + n) −1 β , the basic reproductive ratio R 0 = η −1 β separates the growing and shrinking behaviors and, consequently, it marks the epidemic threshold between regimes in which the disease either increases or dies out in the long run. Despite its simplicity and robustness, it is known that, due to the occurrence of repeated contacts taking place between a focal infective and other individuals already infected before, the ratio R 0 = η −1 β overestimates the average number of secondary cases in the basic SIR-model; see Artalejo and López-Herrero [14]. We thus present two alternative measures, namely, the exact reproduction number R exact,0 (Section 3.2) and the population transmission number R p (Section 3.3), that overcome this difficulty and provide valuable insight in the Markov-modulated SIR-model under study.
In this section, we focus on an exact variant of R 0 first introduced by Ross [42] in the setting of SIS and SIR epidemics for a homogeneously mixed population of finite size and then presented by Artalejo and López-Herrero [14] in a more general setting, which is defined as a random variable instead of an expected value. To be concrete, based on [14, Section 3], we define R exact,0 as the exact number of secondary cases generated by a focal infective during its infectious period. An important property of R exact,0 is that, unlike R 0 which is related to the time of invasion (i.e., for a community with (I(0), S(0)) = (1, n), R(0) = 0 and n ≥ 1), the exact version R exact,0 can be appropriately defined at every time instant. Nevertheless, we from now on set the initial state (1, n), with n ≥ 1, for comparison purposes.
For every basic state (i, s) ∈ C(1, n) and values 0 ≤ j ≤ s, we let x (i,s) (j) be the column vector of order L 1 L 2 with entries (x (i,s) (j)) (y1−1)L2+y2 = x (i,s,y1,y2) (j) for 1 ≤ y k ≤ L k and k ∈ {1, 2}. Then, by conditioning on the first transition of the augmented process (X , Y), we notice that entries of x (i,s) (j) satisfy for values 0 ≤ j ≤ s and (i, s) ∈ C(1, n), where the rate λ * i,s gives the individual rate at which a focal infective contacts with susceptibles, andλ i−1,s is the contact rate of the remaining i − 1 non-focal infectives 3 . This means that the factor − λ−1 (1)λ i,s C 0 ⊕ λ−1 (2)µ i D 0 on the left side of (13) gives the total transition rate given the current state, whereas the terms on the right side correspond to the case j = 0 and the focal infective recovers without giving rise to a secondary infection; a non-focal infective recovers; the case j > 0 and the focal infective infects another individual; and the case j < s and a non-focal infective infects another individual.
Starting from x (i,0) (0) = e L1L2 for 1 ≤ i ≤ 1 + n, the triangular system of equations (13) can be efficiently solved in the natural order j ≥ 0, j ≤ s ≤ n and 1 ≤ i ≤ 1+n−s, and moments of R exact,0 are then derived in an straightforward manner. At an invasion time (i.e., when a focal infective is introduced into a completely susceptible population of size n at time t = 0), the role of R 0 may be replaced by the unconditional expected value where η * is the vector of initial probabilities of the phase process Y at the invasion time; see Appendix A for further details about the underlying MMAP governing Y. It should be noted that an important difference between R 0 and R exact,0 is that R 0 can take arbitrarily large values, whereas R exact,0 < n if the random variable R exact,0 is defined at the invasion time; in addition, the values of R 0 overestimate the expected number of secondary cases generated by a focal infective before its infectious period expires.

3.3.
The population transmission number R p . The population transmission number R p , as discussed in [14,Section 4], is an alternative measure of disease spread, and it counts the exact number of secondary cases produced by all currently infective individuals prior to the first removal. In a similar manner to R exact,0 , the probability distribution of R p depends on the augmented state at time t = 0. Therefore, for a fixed augmented state (i, s, y 1 , y 2 ) ∈ C * (1, n), we define the conditional probabilities y (i,s,y1,y2) (j) by from which we introduce the vectors y (i,s) (j) with entries (y (i,s) (j)) (y1−1)L2+y2 = y (i,s,y1,y2) (j) for 1 ≤ y k ≤ L k and k ∈ {1, 2}.
An appeal to a first-step argument yields for values 0 ≤ j ≤ s and basic states (i, s) ∈ C(1, n), which can be routinely solved by starting from y (i,0) (0) = e L1L2 . The unconditional expected value of R p is then evaluated as when the interest is in an invasion time. 4. Numerical experiments and discussion. In our numerical examples, we focus on the general stochastic epidemic (Gani and Purdue [27]) at an invasion time, and we assume rates λ i,s = βis and µ i = ηi with β = (1 + n) −1 β , β > 0, η = 1.0 and initial size 1+n ∈ {10, 50, 100, 150} of infectives and susceptibles. The aim is to study the dynamics of Markov-modulated versions of the basic SIR-model when the exponential regime on the underlying processes -governing a new infection and the removal of an infective -is replaced by other more general distributional assumptions. Nine scenarios ( Table 2) are defined in terms of standard Poisson processes, and positively and negatively correlated Markovian events, which are appropriately specified from two predetermined MAPs (Appendix B) of order L 1 = L 2 = 3 with values ρ 1 = 0.35016 and −0.35016 for their correlation coefficients; in this setting, the basic SIR-model corresponds to scenario I. In Figures 1-7, we illustrate the effects of the correlation structure in the underlying Markov-modulated streams on the mean number E[S(T (1,n) )] of surviving susceptibles, the expected values of the exact reproduction number R exact,0 and the population transmission number R p . For illustrative purposes, we let the vector η * of initial probabilities be the stationary probability vector η(1) ⊗ η(2) of the phase process Y, regardless of the initial basic state (i, s) ∈ C(1, n). It is important to note that, though there seems to be no reason for expecting the phase process to attain stationary before the disease dies out, the selection η * = η(1) ⊗ η(2) has to be seen as a natural choice of initial phases, since it is inherently linked to the underlying characteristic matrices (C 0 , C 1 ) and (D 0 , D 1 ) and does not necessarily yield the uniform law of initial phases.
We display the expected values E[S(T (1,n) )] (Figure 1), R exact,0 (Figures 2-4) and R p (Figures 5-7) as a function of the basic reproduction number R 0 = η −1 β . It should be pointed out that, when a deterministic approach is adopted, the transition

Scenario
Occurrence of an infection Occurrence of a removal (Matrices C * 1,(i,s) ) (Matrices D * 1,(i,s) ) Table 2. Nine scenarios defined in terms of the matrices C *

1,(i,s)
and D * 1,(i,s) for the occurrence of an infection and the removal of an infective, respectively, which are related to MAPs with positive and negative correlation (with characteristic matrices (E + 0 , E + 1 ) and (E − 0 , E − 1 ), respectively), and Poisson streams (with rates λ i,s and µ i ) between the epidemic and non-epidemic regimes happens at the point R 0 = 1 (equivalently β = η); specifically, the size of the epidemic goes continuously to zero as R 0 approaches one from above and for R 0 ≤ 1 there is no epidemic at all. We may explain this result by noting that if β ≤ η, the infectives recover faster than susceptible individuals become infected, so the disease will not get a toehold in the population and the number of infected individuals, which starts being small, decreases and the disease dies out instead of spreading out. Values R 0 > 1 reflect that each individual catching the disease would pass it on to R 0 others on average, each of them would pass it on to R 0 more and so forth, so that the number of new cases of the disease would be multiplied by the factor R 0 at each round, thus growing exponentially. Therefore, increasing values of R 0 lead to decreasing values of the mean number E[S (T (1,n) )] of survivors, which is graphically shown in Figure 1 for the basic SIR-model (scenario I), and two Markov-modulated versions with positively correlated MAPs governing new infections (scenario A) and recoveries (scenario C). However, it is observed that, for every fixed value of R 0 , more susceptible individuals will survive if new infections are positively correlated, whereas positively correlated recoveries will imply smaller mean numbers of survivors. This means that the number of infections taking place during an outbreak is greater when recoveries are positively correlated (scenario C) than when a positively correlated MAP governs new infections (scenario A), being the basic SIR-model the intermediate case; more particularly, the basic SIR-model behaves closer to the dynamics of scenarios A and C when, in the deterministic setting, the epidemic dies out (R 0 < 1) and the disease is more virulent (R 0 > 1) in the long run, respectively.
Figures 2 and 5 are related to a positively correlated process governing new infections, and they allow us to carry out a comparative analysis between R 0 and the resulting values of R exact,0 ( Figure 2) and R p ( Figure 5) under three assumptions for the removal of an infective: the exponential case (scenario A); the negatively correlated case (scenario E); and the positively correlated case (scenario G). We first observe that, as intuition tells us, R exact,0 and R p increase with increasing values of R 0 (i.e., when the contact rate β increases, whereas the removal rate η is constant) since susceptible individuals become infected faster than infectives recover. Magnitudes in Figures 2 and 5 show an important difference between the exact measurement provided by R exact,0 and the population version R p ; more concretely, differences in magnitude become more apparent with increasing values of the initial size n of susceptible individuals. We also notice that, for each fixed value of R 0 , higher and smaller expected values of R exact,0 and R p are always associated with scenarios G and A, respectively, regardless of n. In quantifying the spread of a disease when it is just starting out, this means that, for positively correlated infections, the maximum reproductive potential for an infectious disease is related to a positively correlated removal process, while the exponential assumption yields the minimum reproductive potential.
In Figures 3 and 6, the interest is in new infections generated by a Markovian stream with negative correlation, and we compare the value of R 0 with the expected values     Numerical results (not reported here) with n no greater than 10 3 suggest that increasing sizes n result in higher values of R exact,0 and R p , which can be reasonably explained by noting that removals occur at a constant rate and, on the contrary, infectious contacts take place more frequently when the initial number n of susceptible individuals increases. More particularly, it is numerically seen that lim n→∞ R exact,0 is finite -with lim n→∞ R exact,0 < R 0 -and lim n→∞ R p = R 0 in the case of Markov-modulated infectious contacts and exponential removals (scenarios A and B), which is closely related to the limiting results lim n→∞ R exact,0 = R 0 and lim n→∞ R p = R 0 suggested by Artalejo and López-Herrero [14,Sections 3 and 4] in the basic SIR-model (scenario I). Under the assumption of Markov-modulated removals (scenarios C-H), it appears that lim n→∞ R exact,0 and lim n→∞ R p are always finite, but with lim n→∞ R exact,0 > R 0 and lim n→∞ R p > R 0 . These limiting results for Markov-modulated SIR-models illustrate that, similarly to the SIR-model [14], the existence of repeated contacts between infectives and any other individual becomes negligible when the initial number n of susceptibles increases.

5.
Conclusions. In this paper, we develop a stochastic modeling framework that incorporates correlation between events into the dynamics of the SIR-model (El Maroufy et al. [25], Gani and Purdue [27], Kermack and McKendrick [37], Neuts and Li [40]). We construct Markov-modulated versions of the basic SIR-model X by allowing one to deal with non-exponential distributional assumptions on the contacts between an infective and a susceptible individual, and the random duration of an infectious period. We may view the resulting augmented Markovian process (X , Y) in Section 3 as consisting of basic states (i, s) ∈ C(m, n), describing the numbers of infectives and susceptibles in the basic SIR-model, and an auxiliary variable called phase, which is a useful artifice in creating Markovian structures that enable easy computations from analytical and numerical perspectives. In our approach, we construct a Markov-modulated model by scheduling events at a compartmental level, rather than according to the individual behavior of infectives and susceptibles. Specifically, we use two scaled streams of Markovian arrivals with suitably chosen level of correlation, in such a way that the augmented process (X , Y) is a regular, time-homogeneous CTMC exhibiting correlation between events, and the phase process Y amounts to a state-dependent version of an MMAP where the types of marked events correspond to either a new infection (i.e., (z 1 , z 2 ) = (1, −1)) or the removal of an infective (i.e., (z 1 , z 2 ) = (−1, 0)) in the basic SIR-model. The process X and its augmented version (X , Y) possess identical structural properties regarding the infinitesimal rates λ i,s and µ i , and the fundamental arrival rates λ (i,s) (z 1 , z 2 ) of type-(z 1 , z 2 ) marked events; see Eq. (6). The variability between SIR-models with distinct level of correlation is discussed in Section 3 in terms of extinction times, the final size of the epidemic, and the basic reproduction number R 0 . More particularly, the role of R 0 , which overestimates the reproductive potential in the SIR-model (Artalejo and López-Herrero [14]), is replaced by two alternative random measures -the exact reproduction number R exact,0 and the population transmission number R p -that overcome this difficulty and provide valuable insight. We emphasize that R exact,0 and R p are random variables, rather than an expected value, and they can be defined either at the invasion time or at any time after the invasion. The key feature of R exact,0 (respectively, R p ) is that, during the infection period of a focal infective (respectively, the elapsed time until the first recovery occurs), it eliminates the effect of repeated contacts since R exact,0 (respectively, R p ) does not count the number of contacts taking place between the focal infective (respectively, the population of infectives) and any contacted individual which has previously been infected. Consequently, important differences between R 0 and the exact measurements provided by R exact,0 and R p are observed (Artalejo and López-Herrero [14], Diekman et al. [24]) when the initial size n of susceptibles is small. The expected values R exact,0 and R p may thus contribute to the design of control strategies for preventing major outbreaks, by means of vaccination, representing exact counterparts of the classical vaccination coverage levels; see [14,Section 5].
For illustrative purposes, in Section 4 we compare R 0 with the expected values E[S(T (1,n) )], R exact,0 and R p at an invasion time. It is worth noting that, irrespectively of concrete specifications for the correlation structure of the infectious process, exponential removals (scenarios A, B, I) yield lower bounds for the expectations of the exact number R exact,0 of secondary cases and its population counterpart R p when removals are negatively correlated (scenarios D, E, H), while upper bounds are linked to removal processes governed by Markovian streams with positive correlation (scenarios C, F, G). Therefore, if the basic reproduction number R 0 -in the stochastic setting, its random versions R exact,0 and R p -is commonly the first descriptor to be estimated during the emergence of a new infectious disease, our examples in Section 4 reveal that we must supplement their meaning with the correlation structure of the underlying Markovian streams for a new infection and the removal of an infective.
It is clear that, similarly to the extensions in [23,29,45], Markov-modulated events in the SIR-model can be scheduled at an individual level. For instance, in the general stochastic epidemic (see e.g. Bailey [16,Chapter 6], Gani and Purdue [27]), we may let the Poisson process of rate β > 0 -governing infectious contacts between a concrete infective and each susceptible individual-, and the exponential law with expected value η −1 -governing the infectious period of this infective-be replaced by two independent MAPs with characteristic matrices (C 0 ,C 1 ) of orderL 1 , and (D 0 ,D 1 ) of orderL 2 , respectively. Instead of states (i, s, y 1 , y 2 ) with 1 ≤ y k ≤ L k and k ∈ {1, 2} (Section 3), this results in augmented states (i, s,ỹ 1 ,ỹ 2 ) with vectors y k = (ỹ k,1 , ...,ỹ k,L k ) containing the numbersỹ k,k of contact processes (k = 1) and removal processes (k = 2) at phases 1 ≤ k ≤L k , for k ∈ {1, 2}, at an arbitrary time. Therefore, a set of is−1 L1−1 i−1 L2−1 augmented states is obtained for each basic state (i, s) ∈ C(m, n) -whereas a constant number L 1 L 2 of augmented states is used in Section 3 for the process (X , Y)-, when Markov-modulated events in the SIR-model are scheduled at an individual level. We are not aware of any reported numerical experiments for this extension of the SIR-model, but we can point out that, based on preliminary numerical examples (not reported here) for numbersL 1 =L 2 = 3 of phases and small initial numbers (m, n) of infectives and susceptibles, the storage cost in floating-point numbers becomes so high that numerical computations of E[S(T (i,s) ], R exact,0 and R p are rapidly prone to numerical instability.
Modeling the spread of a disease by using correlated inter-event times is interesting in its own right, but another important problem not discussed in this paper is that of estimation of parameters; in drawing inferences, a few of the main references on statistical methods for epidemics are Anderson and May [7], Andersson and Britton [8], and Becker [19], which should be appropriately supplemented with the recent monograph by Buchholz et al. [22] on parameter fitting of Markovian arrival processes. In using the augmented process (X , Y) in order to approximate the above Markov-modulated SIR-model with events scheduled at an individual level, the two phase approach (Buchholz et al. [22,Section 5.3]) can be generalized to jointly fit parameters of the characteristic matrices (C 0 , C 1 ) of order L 1 , and (D 0 , D 1 ) of order L 2 . In the two phase approach, a PH distribution of order L k with representation (η(1), C 0 ) if k = 1, and (η(2), D 0 ) if k = 2, is first fitted (Buchholz et al. [22,Chapter 3]), and the matrices C 1 and D 1 are then constructed in such a way that the inter-event time distribution of the resulting MAPs remains unchanged. Current research on these statistical aspects will be reported shortly when the data trace of the epidemic process consists of inter-event times or, alternatively, the numbers of events in finite intervals are available and not the detailed inter-event times.
Appendix B. Markovian arrivals with positive and negative correlation. In our numerical experiments in Section 4, eight scenarios (Table 2)  Then, we have that ρ 1 = −0.35016.