On bounding exact models of epidemic spread on networks

In this paper we use comparison theorems from classical ODE theory in order to rigorously show that the N-Intertwined Mean-Field Approximation (NIMFA) model provides an upper estimate on the exact stochastic process. The proof of the results relies on the observation that the epidemic process is negatively correlated (in the sense that the probability of an edge being in the susceptible-infected state is smaller than the product of the probabilities of the nodes being in the susceptible and infected states, respectively), which we also prove rigorously. The results in the paper hold for arbitrary weighted and directed networks. We cast the results in a more general framework where alternative closures, other than that assuming the independence of nodes connected by an edge, are possible and provide a succinct summary of the stability analysis of the resulting more general mean-field models.


Introduction
Modelling transmission processes on networks, such as epidemics and rumours, has led to many mathematical challenges [21,11]. This is mainly due to the high dimensionality of the exact model, which is often a continuous time Markov chain where the size of the state space scales as m N , where m is the number of states a node can be in and N is the number of nodes in the network [26,29]. While theoretically the master equations can be given, their rigorous analysis is out of reach due to the high dimensionality. One approach to deal with this challenge is to consider some 'clever' averaging, at node or at population level, and proceed to derive evolution equations for some newly defined average quantities. These however, more often than not, depend on other new average quantities which are of higher order, e.g. the expected number of nodes in a certain state usually depends on the expected number of links/edges in certain states. As a rule of thumb, the dependency between moments is broken by making some closure assumptions where higher order moments are approximated by lower-order ones. This then leads to a low-dimensional system of ordinary differential equations (integro-differential and delay differential equations are also possible) or mean-field model.
The approach above has led to a myriad of mean-field models for SIS (susceptibleinfected-susceptible) and SIR (susceptible-infected-recovered) epidemics which are able to capture the average behaviour of epidemics on certain network types (e.g. tree networks and networks build using the configuration model). The most well-known such models are: (a) heterogenous degree-based [22,6], (b) pairwise [23,10,4,26,29,13], (c) effectivedegree [16], (d) edge-based compartmental [17,18], (e) pair-based [24,25,12] and (f) N-Intertwined Mean-Field Approximation (NIMFA) [31,30]. When such models perform well, i.e. output from these agrees well with results from the exact or simulation model, one can proceed to analyse them and to derive analytical results concerning the epidemic threshold and final size, or quasi-equilibrium for SIS epidemics. Such explicit relations between network characteristics and spreading dynamics allow us to better understand how these factors interact and will ultimately lead to more efficient prevention and control measures.
In many cases, mean-field models are validated by simply comparing the results of explicit stochastic network simulation (which stands for the exact model) to output from mean-field models. Such tests are usually performed for a limited number of network types and combination of parameter values. While such heuristic arguments are useful, it is desirable that where possible the difference between the exact and mean-field models is rigorously established using sound mathematical arguments. This endeavour has already led to results proving that under some mild conditions on the degree distribution the edge-based compartmental model is exact for SIR epidemics in the limit of large networks built based on the configuration model [19]. In the case of SIS propagation on a complete graph or on a regular random network the model is a density dependent Markov chain and functional analytic tools can be used to prove that the difference between the output of the mean-field and the exact system scales as 1/N for large system size N [5,14]. Besides estimates on the difference, upper and lower bounds have also been derived for the prevalence obtained from the exact model when SIS or SIR propagation is considered on a complete graph, see [1,2,3] These result are valid for graphs with very simple structure, motivating research for finding upper and lower bounds for more complex models.
In this paper, we focus on the NIMFA model for SIS epidemics and we will show that this model provides and upper bound on the exact model on arbitrary weighted and directed networks. In [31] it is claimed that the NIMFA model overestimates the prevalence obtained from the exact system, however, the rigorous proof is not presented there. Here, this is done by using some well-known results from the theory of differential inequalities. The paper is structured as follows. In Section 2 the exact model is formulated and a bottom-up approach is used. This means that the model starts at the level of nodes and focuses on the probability of nodes being either susceptible or infected. Also here, the closure at the level of pairs is generalised beyond simply assuming that the state of nodes at the end of an edge are independent. In Section 3 the main result is presented and it is here where we rigorously prove that NIMFA provides an upper bound on the exact model. In Section 4 we provide a proof, based on dynamical systems arguments, of the fact that epidemics are negatively correlated and point out that this is crucial for the proof of the main result. This is followed by the analysis on the closed model in Section 5. The paper concludes with a short discussion of the main findings and possible extensions.

Model formulation
Consider a network with N nodes and assume that no node has an edge to itself, but we allow for node j to have an edge to node i having some weight g ij . Typically g ij = 1 if there is an edge from j to i and 0 otherwise, but the model formulation works for any directed and weighted network and thus we can consider g ij ∈ [0, ∞) for i, j = 1, 2, . . . , N . We can use the adjacency matrix G = (g ij ) i,j=1,2,...,N to represent the network. We assume that the transmission rate from j to i is τ g ij . The recovery rate at each node is γ. The probability that node i is infected at time t is denoted by I i (t). The aim is to derive exact and approximate differential equations for this function.

Exact model
It can be shown from first principles or from the exact master equations formulated in terms of the probabilities of all 2 N configurations [24] that I i (t) satisfies the differential equation˙ where S i I j (t) is the probability that the pair consisting of node i and node j is of type S − I at time t. This system is exact but not closed hence further differential equations or a closure is needed to determine the probability I i (t). The differential equations for the pairs take the following form. where A i B j C k is the probability that the triple consisting of the nodes i, j and k is in the state A − B − C, and (i, j) runs over all pairs satisfying 1 ≤ i < j ≤ N and in all summations k is different from i and from j. Note that for any pair (i, j) we have that the right hand sides of the four differential equations sums to 0, that is the sum S i I j + I i S j + I i I j + S i S j remains constant in time. If this constant is 1 at the initial time instant, then it remains 1 for all time.

Closure at the level of pairs
Closure at the level of pairs means that the joint probabilities S i I j , I i S j , I i I j and S i S j are expressed in terms of the marginal probabilities I i , I j , I i and S j . This is always an approximation, as we will show below. However, we wish to rigorously quantify the accuracy of the approximation, or to be able to state whether a model based on closures gives upper or lower bounds for the exact values of the joint probabilities, or indeed if the prevalence from such a closed model is below or above the exact prevalence. First we rigorously define what a closure relation means. The relation of the joint and marginal probabilities are shown in Table 1, where a = I i I j , b = S i I j , c = I i S j and d = S i S j . All letters denote probabilities hence a, b, c, d, p, q ∈ [0, 1]. The marginals can be expressed in terms of the joint probabilities as It can be immediately seen that a + b + c + d = 1, hence the four equations are not independent, therefore they cannot be solved for the unknowns a, b, c and d. This shows that the marginals do not determine the joint probabilities. However, once one of them is given then the remaining three are determined by system (6). A closure relation means that one of the joint probabilities is given by a certain algebraic relation involving the marginals, and the others are determined by system (6). Here we define the closure for the II pairs, i.e. a is specified in terms of p and q. (One can equivalently express the probabilities of SI, IS or SS pairs.) So a closure will be a function W : In order to have a solution satisfying a, b, c, d ∈ [0, 1], the function W must satisfy some conditions. On one hand, the first two equations of (6) show that a ≤ p and a ≤ q must hold, that is a ≤ min(p, q). On the other hand, the third and fourth equations should give a nonnegative value for d, hence 1 − q − p + a ≥ 0 and 1 − p − q + a ≥ 0 must hold, leading to max(p + q − 1, 0) ≤ a. (The max operation is needed because p + q − 1 may be negative, when the lower bound for a is zero.) Finally, it is natural to assume that W is a symmetric function, i.e. W (x, y) = W (y, x), leading to the following definition of the closure.
We note that the most important closure relation is W (x, y) = xy leading to the solution of system (6) in the form This corresponds to the case when the states of node i and j are independent, i.e. the joint probability is the product of the marginals. To check that W (x, y) = xy satisfies the inequalities in the definition is an easy exercise and is left to the Reader. One can also immediately see that W (x, y) = min(x, y) and W (x, y) = max(x + y − 1, 0) are proper closure relations. Once a closure relation is chosen, the closed form of (1) can be given as follows. Since W ( I i , I j ) gives an approximation of I i I j and S i I j + I i I j = I j , one can approximate S i I j as I j − W ( I i , I j ). Hence the closed system iṡ Solving this system for X i yields an approximation for I i .
The 'art' of closing (1) lies in choosing W in a way in which X i is as close as possible to I i . In fact, the only function used in the literature is W (x, y) = xy and the accuracy of the closure has been investigated only numerically. Here our goal is slightly different. We want to find upper and lower bounds to I i , i.e. to introduce closures w and W in such a way that for the corresponding solutions x i and X i of (7) the inequalities We will show that W (x, y) = xy gives an upper bound and W (x, y) = min(x, y) gives a lower bound. However, improving these bounds by more sophisticated closures remains an open question. First, in the next section, we derive conditions on the closure W ensuring that (7) gives an upper or lower bound.

Bounds for the exact system
The derivation of the upper bound is based on the fact that S − I pairs are non-negatively, and I − I and S − S pairs are non-positively correlated [7], that is we have for any i and j that We will prove these inequalities in Section 4 and use them to give an upper bound for the solution of system (1). The latter is done below. The first relation leads to the differential inequality˙ Based on this inequality let us introduce following system of differential equations, called individual-based model or N-intertwined mean-field approximation (NIMFA) [31,30].
Since we have the same right hand side in (9) and (10) with inequality in the first, we expect that the NIMFA approximation yields an upper bound on the exact solution. We will prove this by using the comparison theory of ODEs based on the Kamka-Müller condition which is detailed below. Consider the ODEẋ(t) = f (x(t)) and the differential inequalityẏ(t) ≤ f (y(t)) with a given differentiable function f : R n → R n subject to initial conditions satisfying y(0) ≤ x(0). The aim is to find conditions on f ensuring that y(t) ≤ x(t) for t ≥ 0. In what follows, the ordering relation for vectors is used in the following sense: A sufficient condition on f for the desired inequality to hold is the Kamke-Müller condition [9,20], which is equivalent to requiring that This condition essentially means that the function in the i-th coordinate, f i is increasing in all coordinates x j for j = i. However, this leads to an alternative sufficient condition, which if satisfied, allows us to call the f function cooperative. More precisely, f is called It can be shown that if f is cooperative in a convex domain, then it satisfies the Kamke-Müller condition. A detailed and comprehensive study of differential inequalities and comparison theory can be found in the book by Szarski [28]. Cooperative systems generate monotone dynamical systems that are dealt with in the book chapter by Hirsch and Smith [8] and the book by Smith [27]. The main comparison result used here is the following.
Using this comparison result we can derive the following theorems about the upper and lower bounds.
Theorem 1 Consider a weighted and directed network G, the exact individual-based SIS model given by system (1) and the closed system (7) with a closure W satisfying (besides the conditions in Definition 1) Assuming that both models start with identical initial conditions, Proof. We start from the exact system, where we have used (ii). The right-hand side of the closed system (7) can be given by the function f : It can immediately be seen that the solution of the closed system satisfies the differential equationẋ(t) = f (x(t)), and the solution of the exact system satisfies the differential inequalityẏ(t) ≤ f (y(t)), with both systems starting from the same initial condition. According to (i) the coordinate function f i is increasing in the variable x j , hence f satisfies the Kamke-Müller condition. Therefore, the general comparison Lemma 1 implies that . . , N and ∀ t ≥ 0. ✷ Now, using the closure W (x, y) = xy we can prove that (10) gives an upper bound. Namely, we have seen that W (x, y) = xy satisfies the conditions in Definition 1. Moreover, y − xy = y(1 − x) is increasing in y when x ∈ [0, 1], that is (i) holds. In Section 4 we will prove that (ii) also holds, hence W (x, y) = xy satisfies the conditions of Theorem 1 leading to the following corollary.
Similarly to Theorem 1, the following result can be proved about the lower bound.
Theorem 2 Consider a weighted and directed network G, the exact individual-based SIS model given by system (1) and the closed system (7) with a closure W satisfying (besides the conditions in Definition 1) Assuming that both models start with identical initial conditions, I i (0) = X i (0) (i = 1, 2, . . . , N ), it follows that I i (t) ≥ X i (t) ∀ i = 1, 2, . . . , N and ∀ t ≥ 0.
It is easy to see that W (x, y) = min(x, y) satisfies the assumptions of the theorem. Namely, we have seen that W (x, y) = min(x, y) satisfies the conditions in Definition 1. Moreover, y − min(x, y) = max(y − x, 0) = 1 2 (|y − x| + y − x) is increasing in y, that is (i) holds. In the previous section we saw that I i I j ≤ I i and I i I j ≤ I j , hence (ii) also holds. Thus W (x, y) = min(x, y) satisfies the assumptions of Theorem 2 leading to the following corollary.
Corollary 2 Consider a weighted and directed network G, the exact individual-based SIS model given by system (1) and the individual-based closed systeṁ Assuming that both models start with identical initial conditions, I i (0) = X i (0) (i = 1, 2, . . . , N ), it follows that I i (t) ≥ X i (t) ∀ i = 1, 2, . . . , N and ∀ t ≥ 0.

S − I pairs are non-negatively correlated
In this section we prove that S − I pairs remain non-negatively correlated in the exact system (1)-(5) if they they are non-negatively correlated initially.
Theorem 3 Let S i , I j and S i I j solve system (1)- (5).
The proof of the theorem will be divided into several propositions.
Proof. We will use the identities Using these we obtain Taking the difference of the last two equations yields the desired relation. ✷ Theorem 3 will be proved by induction according to the the number of nodes in the network. First, we prove the theorem for a single edge, i.e. for N = 2, for which system (1)-(5) takes the form˙ where we wrote XY instead of X 1 Y 2 for ease of notation.

Proposition 2 The statement of Theorem 3 holds for a graph with two connected nodes.
Proof. We prove that is nonnegative if A(0) ≥ 0, which proves the statement according to Proposition 1. Using the differential equations (1)-(5) we can derive a differential equation for the function A as follows.
Thus A satisfies an inhomogeneous linear differential equation. Multiplying this differential equation with exp(−2(τ + γ)t) and integrating from 0 to t yields The non-negativity of b(s) and A(0) yields that A(t) ≥ 0 for all nonnegative t. ✷ Before proving the theorem in the general case, let us introduce the following notations, where we have used Proposition 1, and which is the conditional probability that node i is susceptible given that node k is susceptible. The correlation of the conditional probabilities are given as where the last term is the conditional probability that node i is susceptible and node j is infected given that node k is susceptible. It is important to note that if A k ij (t) and A ij (t) identical as long as node k is susceptible.
The proof of the theorem is based on the differential equation of A ij that is derived in the proposition below.

Proposition 3
The functions A ij for 1 ≤ i < j ≤ N satisfy the following system of inhomogeneous linear differential equations. and Proof. For the ease of notation, we will write XY instead of X i Y j and I k XY instead of I k X i Y j and similarly for X i Y j I k , where the indices i and j are fixed throughout the proof. Differentiating (19) and using the differential equations (1)-(5) yielḋ where Q 1 = γ(−2 II SS + SI II + IS II + SI IS − II IS + IS SI − II SI ), Each term will be simplified separately. Simple algebra leads to The expression for Q 2 can be reduced as follows where in the last step the identities II + SI = I j and II + IS = I i were used. Based on the identities the term Q 3k can be rewritten as The transformations of the term Q 3k can be completed by using (20) and (21) as follows.
where we used Proposition 1 for A ki . Similar transformations lead to Substituting the expressions obtained for Q 1 , Q 2 , Q 3k and Q 4k into (23) we get the desired differential equation for A ij . ✷ Now we are ready to prove Theorem 3. Proof of Theorem 3. We prove that A ij (t) is nonnegative if A ij (0) ≥ 0, which proves the statement according to Proposition 1. This will be proved by induction according to N . The statement for N = 2 is proved in Proposition 2. Assuming that the statement is true for N − 1 means that A k ij ≥ 0, where node k can be regarded as a newly added susceptible node that does not invalidate what we already know. This effectively allows us to extend the model from N − 1 to N nodes, and thus completing the induction step. Let us now consider system (22) for all i and j. This is an inhomogeneous system of linear differential equations in the forṁ where x(t) represents the vector containing all A ij , M is a cooperative matrix, i.e. its offdiagonal entries are nonnegative, and g(t) ≥ 0. We prove that x(0) ≥ 0 (coordinate-wise) implies x(t) ≥ 0 for all t ≥ 0. Namely, we haveẋ(t) ≥ M x(t), since g(t) is non-negative. Consider the homogeneous systemẏ(t) = M y(t) with initial condition y(0) = 0. The solution is obviously y(t) = 0 for all t. On the other hand, this system satisfies the Kamke-Müller condition, hence Lemma 1 yields that x(t) ≥ y(t) for all t ≥ 0, leading to the desired statement.
✷ We note that the statement of the Theorem can be formulated in terms of the I − I pairs. Using that S i = 1 − I i and S i I j = I j − I i I j , leads to the following.

Corollary 3 Under the assumptions of Theorem 3 the inequality I
holds for all t > 0.

Analysis of the closed model
The goal in this section is to analyse system (7) from the dynamical system point of view. The closure W satisfies the conditions in Definition 1, moreover, according to Corollary 3 it satisfies W (x, y) ≥ xy as well. Thus our aim here is to understand the dynamical behaviour of system (7) when W satisfies xy ≤ W (x, y) ≤ min(x, y) for all x, y ∈ [0, 1]. (24) The lower bound W (x, y) = xy is covered in [15], and this result is presented first.
Theorem 4 Given a directed, weighted, and strongly-connected network, G, let Λ max (G) be the largest eigenvalue of the adjacency matrix of G. Let the closure in (7) be given as W (x, y) = xy. If γ < τ Λ max (G), then a unique endemic (nonzero) steady state exists, and it is stable. Moreover, all of its coordinates are positive. If γ > τ Λ max (G), then there is no endemic steady state and the disease-free steady state is stable.
Concerning the upper bound W (x, y) = min(x, y), the following result can be easily proved.
Proposition 4 Let G be a directed and weighted network and let the closure in (7) be given as W (x, y) = min(x, y). Then the only steady state is the disease-free and it is stable.
Thus a transcritical bifurcation occurs when the closure is W (x, y) = xy and there is no bifurcation when W (x, y) = min(x, y); that is the threshold behaviour disappears when such a crude closure is used. This however raises the question of studying the intermediate regime when W is between the two extremes. Below we give a sufficient condition on closures to ensure that the threshold behaviour is maintained.
We will consider closures where W satisfies where V : [0, 1] 2 → [0, r], with some r ∈ (0, 1), is a continuous function satisfying V (0, 0) = 0 and xy + V (x, y) min(x, y) ≤ min(x, y). We note that the inequalities in (25) yield a sufficient condition for the existence of the transcritical bifurcation. This means that it may be possible to identify closures that lead to transcritical bifurcation but do not satisfy condition (25). We note that W (x, y) = xy obviously satisfies this condition with V (x, y) = 0, and a non-trivial example is W (x, y) = √ xy min( √ x, √ y). For the latter, simple calculation shows that there exists a V (x, y) such that this is positive and bounded by a constant r < 1. Below we will prove that for any choice of W that satisfies condition (25) the same threshold as in Theorem 4 is obtained.
The non-trivial steady state x ∈ (0, 1] N of system (7) is given by that will be rearranged using where α = γ/τ , (Gx) i is the i-th coordinate of the vector Gx and Expressing x i from (26) we get the fixed point equation x = T (x) for the non-trivial steady state with We can immediately see that T maps the unit cube [0, 1] N into itself and the origin is its fixed point, representing the disease-free steady state. We will show that in the case γ < τ Λ, that is α < Λ, T has a nontrivial fixed point in the interior of the cube, representing an endemic steady state. (Here Λ = Λ max (G) is the largest eigenvalue of the adjacency matrix of G.) The existence of this fixed point will be verified by using Brouwer's fixed point theorem on a suitably chosen convex subdomain of the cube not containing the origin. In order to achieve this goal we will need a few auxiliary results.
Proposition 5 For a directed, weighted, and strongly-connected network, given by its adjacency matrix G, there exists a positive number µ, for which the following holds. If Proof. According to (25) ✷ Proposition 8 Let α < Λ. Then there is a ρ > 0, such that T i (x) < ρ for all i implies (Gx) i < ΛT i (x), when (Gx) i = 0.

Therefore (27) leads to
According to Proposition 6, we get that h((Gx) i ) > β(Gx) i , hence and this completes the proof. ✷ Now we are ready to prove the existence of the endemic steady state and the presence of a transcritical bifurcation.
Theorem 5 Given a directed, weighted, and strongly-connected network, G, let Λ be the largest eigenvalue of the adjacency matrix of G. Let the closure W in (7) satisfy (25). If γ > τ Λ then the origin is the only steady state of the system. In the case γ < τ Λ, an endemic (nonzero) steady state also exists.
Proof. We first consider the γ > τ Λ case and take a steady state x ∈ [0, 1] N . According to (26) and using that F i is nonnegative we get It is easy to see that for two vectors, u and v with nonnegative coordinates, the inequality 0 ≤ u i ≤ v i for all i implies |u| ≤ |v|. Hence for any nonzero steady state x ∈ [0, 1] N we have where we used that Λ is the largest eigenvalue of G. Hence there is no endemic steady state. We note that this part of the proof only used the fact that W (x, y) ≥ xy, condition (25) has not been used. Let us turn to the case γ < τ Λ. As we mentioned above, we will prove the existence of the endemic steady state by applying Brouwer's fixed point theorem to the mapping T given in (27) in a suitably chosen domain. The goal is to exclude the origin from this domain, hence we introduce a half-space S = {x ∈ R N : x − a, u ≥ 0} with some appropriately chosen vectors a, u ∈ R N . Then our domain will be Ω = [0, 1] N ∩ S. In order to have a nonempty intersection we will choose a from the cube [0, 1] N and u will be the unique positive eigenvector of G corresponding to the maximal eigenvalue Λ, that is Gu = λu.
In order to prove the invariance of the domain Ω it is useful to determine the intersection points of the hyperplane given by x − a, u = 0 and the coordinate axes. The intersection point on the i-th axis is at c i = a, u /u i . It is easy to see that a point y ∈ [0, 1] N is in Ω if there is a coordinate i, for which y i ≥ c i . Namely, if y ∈ [0, 1] N is not in Ω, then y, u < a, u = c i u i for all i, hence y i u i < c i u i leading to y i < c i for all i. Now, choose a ∈ [0, 1] N in such a way that for all i we have c i < ρ given in Proposition 8, that is a, u /u i < ρ for all i.
We will prove that T maps Ω into itself. We know that T maps to [0, 1] N , hence we only need to prove that T (x) − a, u ≥ 0 holds for any x ∈ Ω. We have previously shown that, if there is an index i for which T i (x) ≥ ρ, then T (x) ∈ Ω. Hence we only need to consider the case when T i (x) < ρ for all i. In this case we can apply Proposition 8 yielding x j Λu j = x, u ≥ a, u .
Thus we proved T (x) − a, u ≥ 0, which proves that T maps the convex, compact domain Ω into itself, hence by Brouwer's fixed point theorem it has a fixed point in Ω, which is a nontrivial steady state completing the proof of the theorem. ✷ We note that the stability of the steady states was also determined in the special case W (x, y) = xy. Further conditions on the smoothness of W would make it possible to generalise the stability result of Theorem 3.8 in [11] for different choices of W .

Discussion
In Section 4 we proved that the closure W (x, y) = xy satisfies the second assumption of Theorem 1, that is this closure leads to an upper bound of the exact model. Similarly, in Theorem 2 we have shown that W (x, y) = min(x, y) yields a lower bound of the exact model. However, further work could focus on finding more accurate upper and lower bounds with the possibility of constructing a sequence of closures whose limit is closer in some sense to the exact model.
The analysis of the closed model was presented in Section 5. However, the investigation of the local and global stability for a general closure relation is still an open question. Moreover, we have shown that the qualitative behaviour of the closed system depends on the closure and can be significantly different: it may or may not lead to a transcritical bifurcation. The question of whether a closed system shares the same qualitative features as the exact model is an important one, and ideally the behaviour of the two systems should remain the same. Thus identifying the closure or closures which separate different regimes, those that conserve the qualitative behaviour of the exact system versus those that do not, remains an important direction for further research. One possible step towards achieving this may be to find closures that delimit closed models that have different qualitative behaviour when compared to each other, without considering their relation to the exact model.