Impacts of cluster on network topology structure and epidemic spreading

Considering the infection heterogeneity of different types of edges (lines and edges in triangles), we formulate and analyze a novel SIS model for clustered networks based on a mean-field approach. We mainly focus on how network clustering influences the network structure and disease spreading over the network. In networks with double Poisson, power law-Poisson, Poisson-power law and double power law distributions, we find that clustering is positive (the clustering coefficient increases with the expected number of triangles) when the average degree of lines is fixed and the moment of triangles is below some threshold. Once the moment of triangles exceeds that threshold, clustering becomes negative (the clustering coefficient decreases with the expected number of triangles). For the disease, clustering always increases the basic reproduction number, whether clustering is positive or negative. This finding, that clustering always promotes disease spread in both homogeneous and heterogeneous networks, differs from existing results.


1. Introduction. Recently, more and more studies have revealed that infectious disease dynamics can be profoundly influenced by two main network properties: the degree distribution (the distribution of the number of contact neighbors per individual) [18,12,13] and contact clusters (such as within households or schools) [14,22,24]. The impact of the degree distribution on epidemics in networks without clustering has been broadly discussed and is well understood [12,19,8]. These results clearly show that, in scale-free networks, the basic reproduction number diverges as the variance of the degree distribution diverges. Namely, even if the transmission rate is sufficiently small, the disease can still become epidemic [19].
Clustering (or transitivity) in a complex network refers to the propensity of two neighbors of a given node to also be neighbors of each other, thus forming a triangle of edges within the graph [5]. In a social network, there is a high probability that two friends of a given individual will also be friends of one another. The average of this probability over the whole network is called the clustering coefficient of the network [15]. The clustering coefficient $C$ measures the level of clustering in a network. In general, increasing the clustering coefficient of a network while keeping its degree distribution fixed means increasing network clustering. Network clustering not only affects network structure, but also influences the dynamics of epidemics spreading on that network.
Using different methods, many researchers have investigated the impact of clustering on SIR epidemic models [24,5,15,7,4,9,10,2,11,16,20,21]. Using mean-field models based on the pairwise approximation, Keeling [7] and Eames [4] find that in a regular network, when the number of contacts is constant, the basic reproduction number and the final epidemic size decrease with the clustering coefficient. Because the spreading process described by the SIR epidemic model on a network can be treated as a percolation problem, many researchers study the impact of clustering on disease spread by percolation theory [5,15,9,10,2,16]. When the degree distribution is fixed, Gleeson [5] and Miller [9,10] obtain that clustering decreases the basic reproduction number in a random regular graph. Coupechoux et al. [2] find the same result in a random regular graph and in a network with a power law degree distribution with exponential cutoff. Using the definition of the basic reproduction number, Molina et al. [11] consider heterogeneous infection probabilities that depend on degree in regular graphs, ER random graphs and BA networks, and again find that the reproduction number is smaller in clustered networks. In 2011, Volz et al. [24] used a node-based dynamical probability generating function method and found that clusters decrease the final size of infections in random graphs. In summary, when the degree distribution $p_k$ of a network is fixed, the available studies show that increased clustering decreases the basic reproduction number. This contrasts with Newman [15,16], who found by percolation theory that, once the average degree of the network is fixed, clustering increases the basic reproduction number while significantly reducing the size of epidemics.
Considering edge multiplicity, Serrano et al. [20,21] classify clustered networks into two classes: weak clustering (edge multiplicity is small and triangles are disjoint) and strong clustering (triangles share edges). Their results show that, in heterogeneous networks, the basic reproduction number of the SIR model in a weakly clustered network is smaller than in the unclustered network with the same distribution, and the basic reproduction number in the unclustered network is smaller than in a strongly clustered network. Moreover, for networks with an exponential degree distribution, if the degree distribution is fixed and the clustering coefficient increases, the basic reproduction number decreases in weakly clustered networks and increases in strongly clustered networks. To date, as far as we know, there are neither specific dynamic equations modeling the transmission process nor mathematical analyses in the literature relating the basic reproduction number to the clustering coefficient of a network. The impact of clustering on the transmission of infectious diseases is unclear. What are the relationships among the degree distribution of a network, clustering, the clustering coefficient and the basic reproduction number of a disease? How do the degree distribution and clustering of a network influence disease spread? The focus of the present paper is therefore to establish an SIS dynamical model on weakly clustered networks and to study how degree distribution and clustering influence the network structure and the spread of disease over the network. Note that mean-field models consider a large number of small individual components that interact with each other through a simple rule, so that the effect of all other components on any given individual can be approximated by a single averaged effect.
In this context, we consider the infectivity heterogeneity of infective nodes located at different sites (at the end of a single edge or of an edge in a triangle), and we derive an SIS dynamical model based on the mean-field method that describes epidemics in networks with arbitrary degree distributions and clustering coefficients. Our model extends the SIS model proposed by Pastor on networks without clustering [19]. The epidemic threshold of the disease is obtained by the next generation matrix method. When the clustering coefficient is 0, the epidemic threshold coincides exactly with the form in ref. [19]. We also analyze the influence of degree distributions and clustering on the network structure and on the disease. The results show that, in heterogeneous networks, clustering always promotes epidemic spread.
2. Construction of clustered networks. We consider a class of weakly clustered networks in which no two triangles share a common edge. We assume that each node has some lines (or single edges) and some triangles, and that triangles are non-overlapping. $N$ represents the total number of nodes in the network. The number of nodes with $l$ lines and $r$ triangles is denoted $N_{l,r}$ ($l = 0, 1, 2, \dots, n$, $r = 0, 1, 2, \dots, m$), and the joint degree distribution of the network, $p_{l,r} = N_{l,r}/N$, is the proportion of nodes with $l$ lines and $r$ triangles among all nodes. For convenience, we assume that the numbers of lines and triangles of every node are independent. The total degree distribution $p_k$ is obtained by
$$p_k = \sum_{l=0}^{n}\sum_{r=0}^{m}\delta_{k,\,l+2r}\,p_{l,r}.$$
If the joint distribution $p_{l,r}$ is given, then the average degree $\langle k\rangle$ is given by
$$\langle k\rangle = \sum_{k} k\,p_k = \langle l\rangle + 2\langle r\rangle.$$
It is well known that there are two extreme cases for heterogeneous networks: networks with a Poisson distribution and scale-free networks with a power law distribution. A general network may lie between these two extremes. So, we assume that the joint degree distributions $p_{l,r}$ are, respectively,
(1) double Poisson distributions, $\dfrac{e^{-\langle l\rangle}\langle l\rangle^{l}}{l!}\,\dfrac{e^{-\langle r\rangle}\langle r\rangle^{r}}{r!}$,
(2) power law-Poisson distributions, $(\gamma-1)2^{\gamma-1}l^{-\gamma}\,\dfrac{e^{-\langle r\rangle}\langle r\rangle^{r}}{r!}$, $l_{\min} = 2$,
(3) Poisson-power law distributions, $\dfrac{e^{-\langle l\rangle}\langle l\rangle^{l}}{l!}\,(\gamma-1)2^{\gamma-1}r^{-\gamma}$, $r_{\min} = 2$, and
(4) double power law distributions, $(\gamma_1-1)2^{\gamma_1-1}l^{-\gamma_1}\,(\gamma_2-1)2^{\gamma_2-1}r^{-\gamma_2}$, $l_{\min} = 2$, $r_{\min} = 2$.
Taking $n = 50$ and $m = 100$, Fig. 1 (a) shows that the total degree distribution $p_k$ is no longer a Poisson distribution although $p_{l,r}$ is a double Poisson distribution, and Fig. 1 (d) shows that $p_k$ is no longer a power law distribution when $p_{l,r}$ is a double power law distribution. However, the networks in Fig. 1 (a), (b) and (c) are approximately homogeneous, while the network in Fig. 1 (d) is heterogeneous.
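The construction of $p_k$ from $p_{l,r}$ can be sketched numerically. The following is a minimal illustration for the double Poisson case with the truncation $n = 50$, $m = 100$ used in the text; the mean values `mean_l = 4.0` and `mean_r = 2.0` and all function names are illustrative assumptions, not values from the paper. It checks that $\langle k\rangle = \langle l\rangle + 2\langle r\rangle$ and that the variance of $p_k$ exceeds its mean, so $p_k$ cannot be Poisson.

```python
import math


def poisson_pmf(mean, k):
    """Poisson probability of observing k events when the mean is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def total_degree_distribution(p_lines, p_triangles):
    """p_k = sum_{l,r} delta_{k,l+2r} p_l p_r, assuming lines and triangles
    are independent so that p_{l,r} = p_l * p_r."""
    n, m = len(p_lines) - 1, len(p_triangles) - 1
    p_k = [0.0] * (n + 2 * m + 1)
    for l, pl in enumerate(p_lines):
        for r, pr in enumerate(p_triangles):
            # each triangle contributes two edges to the total degree
            p_k[l + 2 * r] += pl * pr
    return p_k


# Illustrative means (assumed); n and m follow the truncation in the text.
mean_l, mean_r = 4.0, 2.0
n, m = 50, 100
p_l = [poisson_pmf(mean_l, l) for l in range(n + 1)]
p_r = [poisson_pmf(mean_r, r) for r in range(m + 1)]
p_k = total_degree_distribution(p_l, p_r)

mean_k = sum(k * p for k, p in enumerate(p_k))
var_k = sum(k * k * p for k, p in enumerate(p_k)) - mean_k**2
print(f"<k> = {mean_k:.4f}, Var(k) = {var_k:.4f}")
```

For a Poisson distribution the variance equals the mean, so a variance larger than $\langle k\rangle$ is a quick numerical sign that the total degree is not Poisson distributed.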
In this context, we mainly focus on the above four types of clustered networks.

Figure 2. Illustration of a clustered network. It is a regular clustered network in which no two triangles share a common edge. Each node in this network is connected to two lines and one triangle, and the total number of nodes is constant.
3. Dynamical mean-field (MF) reaction rate equations of disease transmission. [...] the effect of clusters. Namely, we consider that each individual exists in only two discrete states: S (susceptible) and I (infected). At each time step, each susceptible (healthy) node can be infected when it is in contact with an infected individual, and an infected node is cured and becomes susceptible again. It is worth noting that each susceptible node can be infected through two different contact pathways: through infected neighbors connected by lines, and through infected neighbors in triangles. We denote the density of susceptible nodes with $l$ lines and $r$ triangles at time $t$ by $s_{l,r}(t)$, and the density of infected nodes with $l$ lines and $r$ triangles by $\rho_{l,r}(t)$. Let $\theta_1(t)$ be the probability that a susceptible individual is connected to an infected individual by a random line, and $\theta_2(t)$ the probability that a susceptible individual is connected to an infected individual by a random edge in a triangle. For an individual of degree $(l, r)$, the probability that it is randomly connected by a line to an individual of degree $(l', r')$ is $p_{(l',r')|(l,r)}$, and the probability that it is randomly connected by an edge in a triangle to an individual of degree $(l', r')$ is $q_{(l',r')|(l,r)}$ (Appendix A. Two conditional probabilities). When the conditional probabilities $p_{(l',r')|(l,r)}$ and $q_{(l',r')|(l,r)}$ are independent of the degree $(l, r)$ (Appendix A. Two conditional probabilities), we have
$$\theta_1(t) = \sum_{l',r'}\frac{l'\,p_{l',r'}}{\langle l\rangle}\,\rho_{l',r'}(t),\qquad \theta_2(t) = \sum_{l',r'}\frac{r'\,p_{l',r'}}{\langle r\rangle}\,\rho_{l',r'}(t).$$
In the current clustered network, if a susceptible node is not infected, then it must not have been infected by any of the infected neighbors to which it is connected, either by lines or by triangles. Let $\beta_1$ be the probability that a susceptible node is infected by a random infected neighbor connected by a line, and let $\beta_2$ be the probability that an infected node connected by a random edge in a triangle infects the susceptible node. In fact, $\beta_2$ depends on $\beta_1$. Assume that a triangle contains two susceptible nodes, denoted $v_1$ and $v_2$, and one infected node $v_3$. Here, we choose $v_1$ as the target node; it is then either infected or not infected by its neighbors in the triangle, as shown in Fig. 3. From Fig. 3 (c) and (d), the probability that $v_1$ is infected is
$$\beta_2 = \beta_1 + (1-\beta_1)\beta_1^{2}.$$

Figure 3. Transmission in the triangle. $v_1$ is not infected by the infected neighbor in the triangle: $v_1$ and $v_2$ are not infected by $v_3$, or $v_1$ is not infected by $v_3$ and $v_3$ transmits the disease to $v_2$, while $v_2$ does not transmit the disease to $v_1$. $v_1$ is infected: $v_1$ is directly infected by $v_3$, or $v_1$ is not infected by $v_3$ and, after $v_3$ transmits the disease to $v_2$, $v_2$ transmits the disease to $v_1$.
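The two infection routes described in the caption of Fig. 3 can be checked by brute-force enumeration. The sketch below assumes that each of the three possible transmissions ($v_3 \to v_1$, $v_3 \to v_2$, $v_2 \to v_1$) succeeds independently with probability $\beta_1$; under that assumption the routes combine to $\beta_2 = \beta_1 + (1-\beta_1)\beta_1^2$. This closed form is our reading of the caption, and the function names are hypothetical.

```python
from itertools import product


def beta2_closed_form(beta1):
    """Direct route v3 -> v1, or (no direct infection) and v3 -> v2 -> v1.
    Assumed reading of Fig. 3, not a formula quoted from the source."""
    return beta1 + (1.0 - beta1) * beta1**2


def beta2_by_enumeration(beta1):
    """Sum the probability of every outcome in which v1 ends up infected."""
    total = 0.0
    for v3_to_v1, v3_to_v2, v2_to_v1 in product([True, False], repeat=3):
        prob = 1.0
        for fired in (v3_to_v1, v3_to_v2, v2_to_v1):
            prob *= beta1 if fired else 1.0 - beta1
        if v3_to_v1 or (v3_to_v2 and v2_to_v1):
            total += prob
    return total


for beta1 in (0.005, 0.1, 0.3):
    print(beta1, beta2_closed_form(beta1), beta2_by_enumeration(beta1))
```

Under this assumption $\beta_2 \ge \beta_1$ for every $\beta_1 \in [0, 1]$, which is compatible with the inequality $2\beta_2 > \beta_1$ used later in the analysis of $R_0$.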
Next, we will deduce the dynamical SIS model with heterogeneous infection rates in the clustered network.
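For a given node with degrees $l$ and $r$, the probability that it has $\tilde{l} + \tilde{r}$ infected neighbors ($\tilde{l}$ infected neighbors reached by $\tilde{l}$ of its $l$ lines and $\tilde{r}$ infected neighbors reached by $\tilde{r}$ of the $2r$ edges in its $r$ triangles) is given by the product of two binomial distributions:
$$\binom{l}{\tilde{l}}\theta_1^{\tilde{l}}(1-\theta_1)^{l-\tilde{l}}\binom{2r}{\tilde{r}}\theta_2^{\tilde{r}}(1-\theta_2)^{2r-\tilde{r}}.$$
Naturally, we can obtain the probability that a susceptible node with $l$ lines and $r$ triangles is not infected by its infected neighbors in unit time:
$$\sum_{\tilde{l}=0}^{l}\sum_{\tilde{r}=0}^{2r}(1-\beta_1)^{\tilde{l}}(1-\beta_2)^{\tilde{r}}\binom{l}{\tilde{l}}\theta_1^{\tilde{l}}(1-\theta_1)^{l-\tilde{l}}\binom{2r}{\tilde{r}}\theta_2^{\tilde{r}}(1-\theta_2)^{2r-\tilde{r}} = (1-\beta_1\theta_1)^{l}(1-\beta_2\theta_2)^{2r}.$$
The probability that it is infected per unit time is therefore
$$1-(1-\beta_1\theta_1)^{l}(1-\beta_2\theta_2)^{2r}.$$
Over a time step of length $h$, the infection probability becomes
$$1-(1-\beta_1 h\theta_1)^{l}(1-\beta_2 h\theta_2)^{2r},$$
and infected nodes recover and become susceptible with probability $\mu h$. Then, a dynamical MF rate equation for the density of infected nodes $\rho_{l,r}(t)$ can be written as
$$\rho_{l,r}(t+h)-\rho_{l,r}(t) = -\mu h\,\rho_{l,r}(t) + \bigl(1-\rho_{l,r}(t)\bigr)\Bigl[1-(1-\beta_1 h\theta_1(t))^{l}(1-\beta_2 h\theta_2(t))^{2r}\Bigr]. \tag{8}$$
Without loss of generality, let $\mu = 1$. Dividing (8) by $h$ and letting $h \to 0$, one obtains the derivative of $\rho_{l,r}(t)$:
$$\frac{d\rho_{l,r}(t)}{dt} = -\rho_{l,r}(t) + \bigl(1-\rho_{l,r}(t)\bigr)\bigl(\beta_1 l\,\theta_1(t) + 2\beta_2 r\,\theta_2(t)\bigr). \tag{9}$$
In particular, if the network is a regular clustered graph (Fig. 2) in which every node has the same numbers of lines and triangles, then the joint distribution $p_{l,r}$ is
$$p_{l,r} = \begin{cases}1, & l = l_0 \text{ and } r = r_0,\\ 0, & \text{otherwise},\end{cases} \tag{10}$$
and then
$$\frac{d\rho(t)}{dt} = -\rho(t) + (\beta_1 l_0 + 2\beta_2 r_0)\,\rho(t)\bigl(1-\rho(t)\bigr). \tag{11}$$

4. Results.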

4.1. Generalization of model and the basic reproduction number. If $p_{l,r}$ is the joint distribution of nodes with $l$ lines and $r$ triangles and $r$ takes the single value 0, then there are only lines and no triangle clusters in the network. It is interesting to note that, in this case, our dynamical model (11) reduces to the SIS model proposed by Pastor [19]. It is also worth noting that model (11) generalizes the standard SIS model on a homogeneous network in which clustering is not considered. Model (9) has a unique disease-free equilibrium $E_0 = (\underbrace{0, 0, \dots, 0}_{nm+n+m})$. According to van den Driessche and Watmough [23], the basic reproduction number is the spectral radius of the matrix $FV^{-1}$, where $F$ is the rate of appearance of new infections and $V$ is the rate of transfer of individuals out of compartments. Through some calculations with the invertible matrices $P$ and $Q$ (see the Appendix), the problem reduces to a matrix $F^{*}$ whose characteristic equation is
$$\lambda^{2} - \left(\beta_1\frac{\langle l^{2}\rangle}{\langle l\rangle} + 2\beta_2\frac{\langle r^{2}\rangle}{\langle r\rangle}\right)\lambda + 2\beta_1\beta_2\,\frac{\langle l^{2}\rangle\langle r^{2}\rangle - \langle l\rangle^{2}\langle r\rangle^{2}}{\langle l\rangle\langle r\rangle} = 0.$$
$F^{*}$ has two distinct real eigenvalues, so the basic reproduction number $R_0$ is given by
$$R_0 = \frac{1}{2}\left[\beta_1\frac{\langle l^{2}\rangle}{\langle l\rangle} + 2\beta_2\frac{\langle r^{2}\rangle}{\langle r\rangle} + \sqrt{\left(\beta_1\frac{\langle l^{2}\rangle}{\langle l\rangle} - 2\beta_2\frac{\langle r^{2}\rangle}{\langle r\rangle}\right)^{2} + 8\beta_1\beta_2\langle l\rangle\langle r\rangle}\,\right]. \tag{19}$$
By Theorem 2 in van den Driessche and Watmough [23], the disease-free equilibrium $E_0$ is locally asymptotically stable if $R_0 < 1$ and unstable if $R_0 > 1$. If there are no triangle clusters in the network, that is $r = 0$, then the basic reproduction number for this special case is $R_0 = \beta_1\langle l^{2}\rangle/\langle l\rangle$. This is consistent with the results derived for networks without clustering [19]. According to the dynamical model (11) on the regular clustered network, the basic reproduction number is $R_0 = \beta_1 l_0 + 2\beta_2 r_0$. Moreover, when $r_0 = 0$, $R_0 = \beta_1 l_0$ coincides with the result of the uniform network model proposed by Pastor [18].
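As a numerical cross-check of the special cases quoted above, the sketch below evaluates $R_0$ as the larger eigenvalue of a $2\times 2$ matrix with diagonal entries $\beta_1\langle l^2\rangle/\langle l\rangle$ and $2\beta_2\langle r^2\rangle/\langle r\rangle$ and off-diagonal product $2\beta_1\beta_2\langle l\rangle\langle r\rangle$. This matrix form is an assumption: it is chosen because it reproduces the regular-network value $\beta_1 l_0 + 2\beta_2 r_0$, the triangle-free value $\beta_1\langle l^2\rangle/\langle l\rangle$, and the discriminant $\Delta_1$ of Case I in Section 4.3. The function name and the parameter values are illustrative.

```python
import math


def basic_reproduction_number(beta1, beta2, mean_l, mean_l2, mean_r, mean_r2):
    """Larger eigenvalue of the assumed 2x2 next-generation matrix.

    mean_l, mean_l2: first and second moments of the number of lines.
    mean_r, mean_r2: first and second moments of the number of triangles.
    """
    a = beta1 * mean_l2 / mean_l if mean_l > 0 else 0.0
    d = 2.0 * beta2 * mean_r2 / mean_r if mean_r > 0 else 0.0
    off_diagonal_product = 2.0 * beta1 * beta2 * mean_l * mean_r
    return 0.5 * (a + d + math.sqrt((a - d) ** 2 + 4.0 * off_diagonal_product))


beta1, beta2 = 0.1, 0.109  # illustrative values only

# Regular clustered network of Fig. 2: l0 = 2 lines, r0 = 1 triangle per node.
l0, r0 = 2, 1
r0_regular = basic_reproduction_number(beta1, beta2, l0, l0**2, r0, r0**2)
print(r0_regular, beta1 * l0 + 2 * beta2 * r0)

# Network without triangles: only the line moments matter.
r0_no_triangles = basic_reproduction_number(beta1, beta2, 4.0, 20.0, 0.0, 0.0)
print(r0_no_triangles, beta1 * 20.0 / 4.0)
```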

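4.2. Impacts of clustering and degree distribution on network structure. Clustering is often characterized by the clustering coefficient $C$, which is the ratio of $3\times$ the number of triangles (denoted by $N_{\triangle}$) to the number of 2-paths in the network (denoted by $N_3$) [15,9]. The clustering coefficient $C$ can be calculated with probability generating functions. We define the probability generating function of the joint degree distribution of the network as $g(x, y) = \sum_{l,r} p_{l,r}x^{l}y^{r}$. Once the joint distribution $p_{l,r}$ is given, the conventional degree distribution of the network, that is, the probability $p_k$ that a node has $k$ ($= l + 2r$) edges in total, is easily obtained, and we can also write down a generating function for the total degree distribution $p_k$:
$$f(z) = \sum_{k}p_k z^{k} = \sum_{l,r}p_{l,r}z^{l+2r} = g(z, z^{2}). \tag{20}$$
Then, by the definition of the clustering coefficient, we obtain
$$C = \frac{3N_{\triangle}}{N_3} = \frac{N\,\frac{\partial g}{\partial y}\big|_{(1,1)}}{\frac{N}{2}\sum_{k}k(k-1)p_k} = \frac{2\,\frac{\partial g}{\partial y}\big|_{(1,1)}}{f''(1)}. \tag{21}$$
Since $\frac{\partial g}{\partial y}\big|_{(1,1)}$ is the average number of triangles per node, $\frac{\partial g}{\partial y}\big|_{(1,1)} = \langle r\rangle$, and we have
$$C = \frac{2\langle r\rangle}{\langle k^{2}\rangle - \langle k\rangle}. \tag{22}$$
Combining this with formula (20), we can rewrite the clustering coefficient as
$$C = \frac{2\langle r\rangle}{\langle l^{2}\rangle - \langle l\rangle + 4\langle l\rangle\langle r\rangle + 4\langle r^{2}\rangle - 2\langle r\rangle}. \tag{23}$$
For convenience of calculation, we need the relationship between the mean value and the second moment of a random variable. If a random quantity $x$ obeys the distribution $p(x)$, with mean value $\langle x\rangle$ and second moment $\langle x^{2}\rangle$, then for a Poisson distribution we have
$$\langle x^{2}\rangle = \langle x\rangle^{2} + \langle x\rangle, \tag{24}$$
and for a power law distribution with $x_{\min} = 2$ we have
$$\langle x^{2}\rangle = a\langle x\rangle, \tag{25}$$
where $a = \dfrac{2(\gamma-2)}{\gamma-3} > 0$ [17]. Next, we discuss network clustering in the four types of networks when the distribution of the total degree varies.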
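A small numerical sketch may help fix the definition. It assumes the closed form $C = 2\langle r\rangle/(\langle k^2\rangle - \langle k\rangle)$, which is consistent with the relation $\langle r\rangle = dC/2$ used in the Discussion, together with independence of lines and triangles so that $\langle k^2\rangle = \langle l^2\rangle + 4\langle l\rangle\langle r\rangle + 4\langle r^2\rangle$. The function name is hypothetical. For the regular network of Fig. 2 (two lines and one triangle per node) each node has degree 4, hence 6 neighbor pairs of which exactly one is connected, so $C$ should be $1/6$.

```python
def clustering_coefficient(mean_l, mean_l2, mean_r, mean_r2):
    """C = 2<r> / (<k^2> - <k>) with k = l + 2r and l, r independent
    (assumed closed form; see the lead-in)."""
    mean_k = mean_l + 2.0 * mean_r
    mean_k2 = mean_l2 + 4.0 * mean_l * mean_r + 4.0 * mean_r2
    return 2.0 * mean_r / (mean_k2 - mean_k)


# Regular clustered network of Fig. 2: l = 2 and r = 1 for every node.
c_regular = clustering_coefficient(2.0, 4.0, 1.0, 1.0)
print(c_regular)

# Double Poisson example with <l> = 4 and <r> = 2 (illustrative means),
# using <x^2> = <x>^2 + <x> for each Poisson variable.
c_poisson = clustering_coefficient(4.0, 20.0, 2.0, 6.0)
print(c_poisson)
```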
Case I. Double Poisson distributions. When the joint degree distribution of the network is a double Poisson distribution, by formulas (23) and (24) the clustering coefficient $C$ simplifies to
$$C = \frac{2\langle r\rangle}{(\langle l\rangle + 2\langle r\rangle)^{2} + 2\langle r\rangle}. \tag{26}$$
One can verify that
$$\frac{\partial C}{\partial\langle r\rangle} = \frac{2(\langle l\rangle + 2\langle r\rangle)(\langle l\rangle - 2\langle r\rangle)}{\bigl[(\langle l\rangle + 2\langle r\rangle)^{2} + 2\langle r\rangle\bigr]^{2}} \tag{27}$$
and
$$\frac{\partial C}{\partial\langle l\rangle} = -\frac{4\langle r\rangle(\langle l\rangle + 2\langle r\rangle)}{\bigl[(\langle l\rangle + 2\langle r\rangle)^{2} + 2\langle r\rangle\bigr]^{2}}. \tag{28}$$
When $\langle l\rangle$ is fixed, formula (27) shows that $\partial C/\partial\langle r\rangle > 0$ when $\langle r\rangle < \langle l\rangle/2$: the clustering coefficient increases with the expected number of triangles, and we say that clustering is positive. However, $\partial C/\partial\langle r\rangle < 0$ when $\langle r\rangle > \langle l\rangle/2$: the clustering coefficient decreases with the expected number of triangles, and we say that clustering is negative. So, the impact of triangle clusters on the network structure is complicated when $\langle l\rangle$ is fixed. When the number of triangles in the network is small, $\langle r\rangle < \langle l\rangle/2$, the larger the number of triangles, the denser the network contacts, and network clustering is positive. On the contrary, once the number of triangles passes the critical value, $\langle r\rangle > \langle l\rangle/2$, the larger the number of triangles, the sparser the network contacts, and network clustering is negative. From a sociological point of view, for a given individual in the network, at first there are few friendships among its network neighbors; as more and more neighbors become friends, the remaining neighbors also become friends easily. But when there are numerous triangles without common edges, these triangles form mutually independent small groups, so the remaining neighbors can hardly become friends. From formula (28), we can see that the clustering coefficient $C$ is monotonically decreasing in $\langle l\rangle$ when $\langle r\rangle$ is fixed.
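The change from positive to negative clustering at $\langle r\rangle = \langle l\rangle/2$ can be seen with a simple grid scan. The sketch below assumes the double Poisson closed form $C = 2\langle r\rangle/[(\langle l\rangle + 2\langle r\rangle)^2 + 2\langle r\rangle]$, which matches the relation $(\langle l\rangle + 2\langle r\rangle)^2 + 2\langle r\rangle = d$ stated in the Discussion; the value $\langle l\rangle = 4$ and the grid are illustrative choices.

```python
def clustering_double_poisson(mean_l, mean_r):
    """Clustering coefficient for independent Poisson lines and triangles
    (assumed closed form; see the lead-in)."""
    return 2.0 * mean_r / ((mean_l + 2.0 * mean_r) ** 2 + 2.0 * mean_r)


mean_l = 4.0  # illustrative value
grid = [i / 100 for i in range(1, 601)]  # <r> from 0.01 to 6.00
best_r = max(grid, key=lambda r: clustering_double_poisson(mean_l, r))
peak_c = clustering_double_poisson(mean_l, best_r)

print(f"C is largest at <r> = {best_r:.2f} (<l>/2 = {mean_l / 2:.2f})")
print(f"peak C = {peak_c:.6f}, 1/(1 + 4<l>) = {1 / (1 + 4 * mean_l):.6f}")
```

At the turning point the coefficient equals $1/(1 + 4\langle l\rangle)$, the same bound on $C$ that appears in the analysis of $R_0$ for this case.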
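Case II. Power law-Poisson distributions. When the numbers of lines obey a power law distribution and the numbers of triangles obey a Poisson distribution, by formulas (23), (24) and (25) the clustering coefficient $C$ can be written as
$$C = \frac{2\langle r\rangle}{(a-1)\langle l\rangle + 4\langle l\rangle\langle r\rangle + 4\langle r\rangle^{2} + 2\langle r\rangle}. \tag{29}$$
Further, we can obtain
$$\frac{\partial C}{\partial\langle r\rangle} = \frac{2\bigl[(a-1)\langle l\rangle - 4\langle r\rangle^{2}\bigr]}{\bigl[(a-1)\langle l\rangle + 4\langle l\rangle\langle r\rangle + 4\langle r\rangle^{2} + 2\langle r\rangle\bigr]^{2}} \tag{30}$$
and, letting $\dfrac{\partial a}{\partial\langle l\rangle} = \dfrac{(\gamma-2)^{2}}{(\gamma-3)^{2}} = a_1$,
$$\frac{\partial C}{\partial\langle l\rangle} = -\frac{2\langle r\rangle\bigl[a - 1 + a_1\langle l\rangle + 4\langle r\rangle\bigr]}{\bigl[(a-1)\langle l\rangle + 4\langle l\rangle\langle r\rangle + 4\langle r\rangle^{2} + 2\langle r\rangle\bigr]^{2}}. \tag{31}$$
Because $a - 1 > 0$, we have $\partial C/\partial\langle l\rangle < 0$. Moreover, $\partial C/\partial\langle r\rangle > 0$ when $\langle r\rangle < \tfrac{1}{2}\sqrt{(a-1)\langle l\rangle}$, so the clustering is positive, and $\partial C/\partial\langle r\rangle < 0$ when $\langle r\rangle > \tfrac{1}{2}\sqrt{(a-1)\langle l\rangle}$, hence the clustering is negative.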

SHUPING LI AND ZHEN JIN
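Case III. Poisson-power law distributions. When the numbers of lines obey a Poisson distribution and the numbers of triangles obey a power law distribution, by formulas (23), (24) and (25) the clustering coefficient $C$ can be written as
$$C = \frac{2\langle r\rangle}{\langle l\rangle^{2} + 4\langle l\rangle\langle r\rangle + (4a-2)\langle r\rangle}. \tag{32}$$
It is noted that $\dfrac{\partial a}{\partial\langle r\rangle} = \dfrac{\partial a}{\partial\gamma}\dfrac{\partial\gamma}{\partial\langle r\rangle} = \dfrac{(\gamma-2)^{2}}{(\gamma-3)^{2}} = a_1$, so, using $4a_1\langle r\rangle^{2} = \langle r^{2}\rangle^{2}$, we can obtain
$$\frac{\partial C}{\partial\langle r\rangle} = \frac{2\bigl[\langle l\rangle^{2} - \langle r^{2}\rangle^{2}\bigr]}{\bigl[\langle l\rangle^{2} + 4\langle l\rangle\langle r\rangle + (4a-2)\langle r\rangle\bigr]^{2}} \tag{33}$$
and
$$\frac{\partial C}{\partial\langle l\rangle} = -\frac{4\langle r\rangle(\langle l\rangle + 2\langle r\rangle)}{\bigl[\langle l\rangle^{2} + 4\langle l\rangle\langle r\rangle + (4a-2)\langle r\rangle\bigr]^{2}} < 0. \tag{34}$$
Thus $\partial C/\partial\langle r\rangle > 0$ when $\langle r^{2}\rangle^{2} < \langle l\rangle^{2}$, and the clustering is positive; $\partial C/\partial\langle r\rangle < 0$ when $\langle r^{2}\rangle^{2} > \langle l\rangle^{2}$, so the clustering is negative.

Case IV. Double power law distributions. Let $b = \dfrac{2(\gamma_1-2)}{\gamma_1-3}$ for the lines and $a = \dfrac{2(\gamma_2-2)}{\gamma_2-3}$ for the triangles. By formulas (23) and (25), the clustering coefficient $C$ is rewritten as
$$C = \frac{2\langle r\rangle}{(b-1)\langle l\rangle + 4\langle l\rangle\langle r\rangle + (4a-2)\langle r\rangle}, \tag{35}$$
and
$$\frac{\partial C}{\partial\langle r\rangle} = \frac{2\bigl[(b-1)\langle l\rangle - \langle r^{2}\rangle^{2}\bigr]}{\bigl[(b-1)\langle l\rangle + 4\langle l\rangle\langle r\rangle + (4a-2)\langle r\rangle\bigr]^{2}} \tag{36}$$
and
$$\frac{\partial C}{\partial\langle l\rangle} = -\frac{2\langle r\rangle\Bigl[b - 1 + \frac{(\gamma_1-2)^{2}}{(\gamma_1-3)^{2}}\langle l\rangle + 4\langle r\rangle\Bigr]}{\bigl[(b-1)\langle l\rangle + 4\langle l\rangle\langle r\rangle + (4a-2)\langle r\rangle\bigr]^{2}} < 0. \tag{37}$$
Because $b - 1 > 0$, $\partial C/\partial\langle r\rangle > 0$ when $\langle r^{2}\rangle^{2} < (b-1)\langle l\rangle$, hence the clustering is positive; $\partial C/\partial\langle r\rangle < 0$ when $\langle r^{2}\rangle^{2} > (b-1)\langle l\rangle$, therefore the clustering is negative.

In summary, in networks with double Poisson and power law-Poisson distributions, where the numbers of triangles obey a Poisson distribution, network clustering changes from positive to negative as $\langle r\rangle$ increases. In networks with Poisson-power law and double power law distributions, where the numbers of triangles obey a power law distribution, network clustering changes from positive to negative as $\langle r^{2}\rangle$ increases. The numerical simulations in Fig. 4 agree with the above theoretical results for the four types of networks.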

4.3. Impacts of clustering and degree distribution on the disease. When the distribution of the total degree varies, how do network clustering and the degree distribution influence disease spread?
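Case I. Double Poisson distributions. When the degree distribution of the network is a double Poisson distribution, it follows from (19) and (24) that $R_0$ can be simplified as
$$R_0 = \frac{1}{2}\Bigl[\beta_1(\langle l\rangle + 1) + 2\beta_2(\langle r\rangle + 1) + \sqrt{\Delta_1}\,\Bigr], \tag{38}$$
where $\Delta_1 = \bigl(\beta_1(\langle l\rangle + 1) - 2\beta_2(\langle r\rangle + 1)\bigr)^{2} + 8\beta_1\beta_2\langle l\rangle\langle r\rangle$. For a fixed $\langle l\rangle$ with $C < \dfrac{1}{1+4\langle l\rangle}$, we obtain from (26) that
$$\langle r\rangle_{\pm} = \frac{1 - C - 2C\langle l\rangle \pm \sqrt{(1-C)(1 - C - 4C\langle l\rangle)}}{4C}. \tag{39}$$
Formulas (38) and (39) show that when $C < \dfrac{1}{1+4\langle l\rangle}$, for fixed $\langle l\rangle$ and fixed $C$, the reproduction number $R_0$ has two different values. From formula (38), we have
$$\frac{\partial R_0}{\partial\langle r\rangle} = \beta_2 + \frac{\beta_2\bigl[2\beta_2(\langle r\rangle + 1) + \beta_1(\langle l\rangle - 1)\bigr]}{\sqrt{\Delta_1}}. \tag{40}$$
Since $2\beta_2 > \beta_1$, the bracket equals $2\beta_2\langle r\rangle + \beta_1\langle l\rangle + (2\beta_2 - \beta_1) > 0$, so $\partial R_0/\partial\langle r\rangle > 0$; namely, $R_0$ is monotonically increasing in the expected number of triangles, and the disease becomes epidemic more easily as triangle clusters increase. The relation between $R_0$ and $C$ is delicate. Combining formula (27) with (40), one can see that
$$\frac{\partial R_0}{\partial C} = \frac{\partial R_0}{\partial\langle r\rangle}\,\frac{\partial\langle r\rangle}{\partial C}; \tag{41}$$
thus, if $\langle r\rangle > \langle l\rangle/2$, then $R_0$ is monotonically decreasing with respect to $C$, while $R_0$ is monotonically increasing in $C$ when $\langle r\rangle < \langle l\rangle/2$. Similarly, from formula (38),
$$\frac{\partial R_0}{\partial\langle l\rangle} = \frac{\beta_1}{2}\left[1 + \frac{\beta_1(\langle l\rangle + 1) + 2\beta_2(\langle r\rangle - 1)}{\sqrt{\Delta_1}}\right]. \tag{42}$$
Note that $\Delta_1 - \bigl[\beta_1(\langle l\rangle + 1) + 2\beta_2(\langle r\rangle - 1)\bigr]^{2} = 8\beta_2\langle r\rangle(2\beta_2 - \beta_1) > 0$. Therefore, $\partial R_0/\partial\langle l\rangle > 0$ regardless of the sign of the numerator; namely, $R_0$ is monotonically increasing in $\langle l\rangle$ for a fixed $\langle r\rangle$. Further, combining (28) with (42), we obtain
$$\frac{\partial R_0}{\partial C} = \frac{\partial R_0}{\partial\langle l\rangle}\,\frac{\partial\langle l\rangle}{\partial C} < 0. \tag{43}$$
Thus, $R_0$ is monotonically decreasing in $C$ when $\langle r\rangle$ is fixed.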
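The statement that a fixed pair $(\langle l\rangle, C)$ corresponds to two values of $R_0$ can be illustrated numerically. The sketch below assumes the double Poisson forms $C = 2\langle r\rangle/[(\langle l\rangle + 2\langle r\rangle)^2 + 2\langle r\rangle]$ and $R_0 = \tfrac12[\beta_1(\langle l\rangle+1) + 2\beta_2(\langle r\rangle+1) + \sqrt{\Delta_1}]$ with $\Delta_1$ as given in the text; solving the first for $\langle r\rangle$ is a quadratic with two roots when $C < 1/(1+4\langle l\rangle)$. Function names and parameter values are illustrative assumptions.

```python
import math


def clustering_double_poisson(mean_l, mean_r):
    """Assumed closed form of C for the double Poisson case."""
    return 2.0 * mean_r / ((mean_l + 2.0 * mean_r) ** 2 + 2.0 * mean_r)


def r0_double_poisson(beta1, beta2, mean_l, mean_r):
    """Assumed closed form of R_0 for the double Poisson case."""
    a = beta1 * (mean_l + 1.0)
    d = 2.0 * beta2 * (mean_r + 1.0)
    delta1 = (a - d) ** 2 + 8.0 * beta1 * beta2 * mean_l * mean_r
    return 0.5 * (a + d + math.sqrt(delta1))


def triangle_means_for_clustering(c, mean_l):
    """Both values of <r> that give clustering coefficient c at fixed <l>."""
    b = 1.0 - c - 2.0 * c * mean_l
    disc = (1.0 - c) * (1.0 - c - 4.0 * c * mean_l)
    if disc < 0:
        raise ValueError("c must not exceed 1 / (1 + 4 <l>)")
    root = math.sqrt(disc)
    return (b - root) / (4.0 * c), (b + root) / (4.0 * c)


beta1, beta2 = 0.1, 0.109   # illustrative values only
mean_l, c = 4.0, 0.05       # c < 1/17, the maximum for <l> = 4
r_low, r_high = triangle_means_for_clustering(c, mean_l)
r0_low = r0_double_poisson(beta1, beta2, mean_l, r_low)
r0_high = r0_double_poisson(beta1, beta2, mean_l, r_high)

print(f"<r> = {r_low:.4f} -> R0 = {r0_low:.4f}")
print(f"<r> = {r_high:.4f} -> R0 = {r0_high:.4f}")
```

The smaller root lies on the positive-clustering branch ($\langle r\rangle < \langle l\rangle/2$) and the larger root on the negative-clustering branch; because $R_0$ increases with $\langle r\rangle$, the second branch gives the larger $R_0$ for the same $C$.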
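From formulas (33), (34), (48) and (49), we obtain that, for a fixed $\langle l\rangle$, $\dfrac{\partial R_0}{\partial C} = \dfrac{\partial R_0}{\partial\langle r\rangle}\dfrac{\partial\langle r\rangle}{\partial C} > 0$ when $\langle r^{2}\rangle < \langle l\rangle$, while $\dfrac{\partial R_0}{\partial C} < 0$ when $\langle r^{2}\rangle > \langle l\rangle$. For a fixed $\langle r\rangle$, $\dfrac{\partial R_0}{\partial C} = \dfrac{\partial R_0}{\partial\langle l\rangle}\dfrac{\partial\langle l\rangle}{\partial C} < 0$.

Case IV. Double power law distributions. When networks have the joint distribution $p_{l,r} = (\gamma_1-1)2^{\gamma_1-1}l^{-\gamma_1}(\gamma_2-1)2^{\gamma_2-1}r^{-\gamma_2}$, $l_{\min} = 2$, $r_{\min} = 2$, by some easy calculations we obtain that $\dfrac{\langle l^{2}\rangle}{\langle l\rangle} = \dfrac{2(\gamma_1-2)}{\gamma_1-3}$ and $\dfrac{\langle r^{2}\rangle}{\langle r\rangle} = \dfrac{2(\gamma_2-2)}{\gamma_2-3}$, and $R_0$ follows from (19). Here, we can also prove that $\dfrac{\partial R_0}{\partial\langle r\rangle} > 0$ and $\dfrac{\partial R_0}{\partial\langle l\rangle} > 0$ (Appendix C. Determination of symbols of $\partial R_0/\partial\langle r\rangle$ and $\partial R_0/\partial\langle l\rangle$). From formulas (36), (37), (51) and (52), we obtain that, for a given $\langle l\rangle$, $\dfrac{\partial R_0}{\partial C} = \dfrac{\partial R_0}{\partial\langle r\rangle}\dfrac{\partial\langle r\rangle}{\partial C} > 0$ when $\langle r^{2}\rangle^{2} < (b-1)\langle l\rangle$, while $\dfrac{\partial R_0}{\partial C} < 0$ when $\langle r^{2}\rangle^{2} > (b-1)\langle l\rangle$. For a fixed $\langle r\rangle$, $\dfrac{\partial R_0}{\partial C} = \dfrac{\partial R_0}{\partial\langle l\rangle}\dfrac{\partial\langle l\rangle}{\partial C} < 0$.

In Fig. 5 (a) and (c), the basic reproduction number $R_0$ is not determined by the clustering coefficient $C$ alone, but depends on the degree distribution and $C$ simultaneously. As $\langle r\rangle$ or $\langle r^{2}\rangle$ increases, the clustering coefficient $C$ first increases and then gradually decreases, and network clustering gradually changes from positive to negative. However, the basic reproduction number $R_0$ keeps increasing. As $\langle l\rangle$ increases, the clustering coefficient $C$ gradually decreases, and network clustering is always negative. However, the basic reproduction number $R_0$ again keeps increasing.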

Remark 2. Comparing the four cases shown in Fig. 5 (a), (b), (c) and (d), we find that the disease becomes epidemic more easily on networks where the number of triangles obeys a power law distribution.
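5. Discussion. Suppose that $\langle k^{2}\rangle - \langle k\rangle = d$ (a positive constant), namely, the average number of second neighbors remains constant. From formula (22), we obtain $\langle r\rangle = \dfrac{dC}{2}$ and $\dfrac{\partial\langle r\rangle}{\partial C} = \dfrac{d}{2}$. When the joint degree distribution is a double Poisson distribution, $(\langle l\rangle + 2\langle r\rangle)^{2} + 2\langle r\rangle = d$. So, when $C < \dfrac{-1+\sqrt{1+4d}}{2d}$, it is derived that
$$\langle l\rangle = \sqrt{d(1-C)} - dC. \tag{53}$$
From formula (53), it is obtained that
$$\frac{\partial\langle l\rangle}{\partial C} = -\frac{d}{2\sqrt{d(1-C)}} - d.$$
Furthermore, according to formula (38), we have
$$\frac{\partial R_0}{\partial C} = \frac{\partial R_0}{\partial\langle l\rangle}\frac{\partial\langle l\rangle}{\partial C} + \frac{\partial R_0}{\partial\langle r\rangle}\frac{\partial\langle r\rangle}{\partial C}.$$
When $\beta_1 = 0.005$ and $d = 2, 4, 6$, Fig. 6 shows that $\dfrac{\partial R_0}{\partial C} > 0$, so $R_0$ increases with the clustering coefficient $C$ when the average number of second neighbors is fixed.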
This means that, when the average number of second neighbors is fixed and the clustering coefficient $C$ increases, the average degree $\langle k\rangle = \langle l\rangle + 2\langle r\rangle = \sqrt{d(1-C)}$ decreases and the second moment $\langle k^{2}\rangle = d + \langle k\rangle$ also decreases. The disease spreads more easily although the degree heterogeneity is reduced. In fact, when the average number of second neighbors is fixed and the clustering coefficient $C$ increases, the remaining average degree $\dfrac{\langle k^{2}\rangle - \langle k\rangle}{\langle k\rangle}$ increases. From the perspective of bond percolation, this is clearly beneficial to the spread of diseases.
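The monotonic growth of $R_0$ with $C$ at a fixed number of second neighbors can be checked with a short scan. The sketch below uses the double Poisson relations $\langle r\rangle = dC/2$ and $(\langle l\rangle + 2\langle r\rangle)^2 + 2\langle r\rangle = d$ from the text, which give $\langle k\rangle = \sqrt{d(1-C)}$, together with two assumptions: the closed form of $R_0$ built from $\Delta_1$, and $\beta_2 = \beta_1 + (1-\beta_1)\beta_1^2$ as one reading of Fig. 3. The values $\beta_1 = 0.005$ and $d = 2$ are taken from the text; the function names are hypothetical.

```python
import math


def r0_double_poisson(beta1, beta2, mean_l, mean_r):
    """Assumed closed form of R_0 for the double Poisson case."""
    a = beta1 * (mean_l + 1.0)
    d = 2.0 * beta2 * (mean_r + 1.0)
    delta1 = (a - d) ** 2 + 8.0 * beta1 * beta2 * mean_l * mean_r
    return 0.5 * (a + d + math.sqrt(delta1))


def means_at_fixed_second_neighbors(d, c):
    """Return (<l>, <r>) for <k^2> - <k> = d and clustering coefficient c."""
    mean_r = d * c / 2.0
    mean_k = math.sqrt(d * (1.0 - c))
    mean_l = mean_k - 2.0 * mean_r
    if mean_l < 0:
        raise ValueError("c exceeds (-1 + sqrt(1 + 4d)) / (2d)")
    return mean_l, mean_r


beta1 = 0.005
beta2 = beta1 + (1.0 - beta1) * beta1**2  # assumption, see lead-in
d = 2.0
c_max = (-1.0 + math.sqrt(1.0 + 4.0 * d)) / (2.0 * d)

c_values = [0.05 * i for i in range(1, 10)]  # 0.05, 0.10, ..., 0.45
r0_values = []
for c in c_values:
    mean_l, mean_r = means_at_fixed_second_neighbors(d, c)
    r0_values.append(r0_double_poisson(beta1, beta2, mean_l, mean_r))

for c, r0 in zip(c_values, r0_values):
    print(f"C = {c:.2f}  R0 = {r0:.6f}")
```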
To summarize, in this paper we present a new SIS model that includes network clustering and generalizes the usual SIS model without clusters. We conclude that triangle clusters are not always beneficial for making friends, while they always promote disease spread in the network because of the infection heterogeneity. Our results also have important implications for the prevention and control of disease by blocking triangles. For instance, in a contact network, once infected persons are effectively separated from healthy ones and their movement is restricted, some triangles containing infective individuals are broken. Thus, $R_0$ decreases. This is effective for controlling the spread of disease, and the results support the government strategy of compulsory isolation of sick people to protect the public in cases such as tuberculosis (TB) and SARS.
Note that, besides lines, our network contains only non-overlapping triangles. First, real networks are much more complex and usually contain higher order motifs (such as a square with a diagonal or a fully connected square) [1]. Second, to simplify the problem, network clustering is described by the global clustering coefficient rather than the local clustering coefficient [25]. Third, the degree-degree correlation [6] and the degree-cluster correlation [3] are ignored in this paper. Thus, studying epidemic spreading on networks with different motifs, local clustering, degree-degree correlation and degree-cluster correlation will be a great challenge for future research.

Appendices.
Appendix A. Two conditional probabilities. We denote by $N_l$ and $N_r$ the numbers of nodes with $l$ lines and with $r$ triangles, respectively. The probability that a node has $l$ lines and $r$ triangles is described by the probability mass function $p_{l,r} = N_{l,r}/N$, and $p_l$ and $p_r$ are the marginal probabilities that a node has $l$ lines and $r$ triangles, respectively. $\langle l\rangle$ and $\langle r\rangle$ represent the average numbers of lines and triangles in the network. So, we can deduce the following formulas [...]. Let $N^{l}_{(l,r),(l',r')}$ represent the total number of lines between nodes with degree $(l, r)$ and nodes with degree $(l', r')$. The probability that there are lines between nodes with degree $(l, r)$ and nodes with degree $(l', r')$ is denoted by $p((l, r), (l', r'))$. The conditional probability $p((l', r')|(l, r))$ is the probability that a line emanating from a node with degree $(l, r)$ points to a node with degree $(l', r')$. So, we have [...], and the following balance condition is established:
$$\langle l\rangle\,p((l, r), (l', r')) = l'\,p((l, r)|(l', r'))\,p_{l',r'} = l\,p((l', r')|(l, r))\,p_{l,r}. \tag{A.16}$$
We define the correlation coefficient $C^{l}_{(l,r),(l',r')}$ for lines by comparing the true number of lines between nodes of degree $(l, r)$ and nodes of degree $(l', r')$ with the expected number if half-lines were connected at random [6], so [...]. Next, we find the probability that a random edge in a triangle is connected to a given node. For this purpose, we introduce some notation. Let $N^{r}_{(l,r),(l',r')}$ be a symmetric matrix whose elements represent the total number of triangle edges between nodes of degree $(l, r)$ and nodes of degree $(l', r')$. Because no two triangles share a common edge, $N^{r}_{(l,r),(l',r')}$ is also the total number of triangles in which a node with degree $(l, r)$ and a node with degree $(l', r')$ both appear.
The probability that there is a common triangle between a node of degree $(l, r)$ and a node of degree $(l', r')$ is denoted by $q((l, r), (l', r'))$, and the conditional probability $q((l', r')|(l, r))$ is the probability that a triangle emanating from a node with degree $(l, r)$ points to a node with degree $(l', r')$ through one of the edges of the triangle. So, we have [...]. The correlation coefficient $C^{r}_{(l,r),(l',r')}$ for edges in triangles is defined as the ratio of the true number of triangle edges between nodes with degree $(l, r)$ and nodes with degree $(l', r')$ to the expected number if half-edges were connected at random, so [...].