Heterogeneous Population Dynamics and Scaling Laws near Epidemic Outbreaks

In this paper, we focus on the influence of heterogeneity and stochasticity of the population on the dynamical structure of a basic susceptible-infected-susceptible (SIS) model. First we prove that, upon a suitable mathematical reformulation of the basic reproduction number, the homogeneous system and the heterogeneous system exhibit a completely analogous global behaviour. Then we consider noise terms to incorporate the fluctuation effects and the random import of the disease into the population and analyse the influence of heterogeneity on warning signs for critical transitions (or tipping points). This theory shows that one may be able to anticipate whether a bifurcation point is close before it happens. We use numerical simulations of a stochastic fast-slow heterogeneous population SIS model and show various aspects of heterogeneity have crucial influences on the scaling laws that are used as early-warning signs for the homogeneous system. Thus, although the basic structural qualitative dynamical properties are the same for both systems, the quantitative features for epidemic prediction are expected to change and care has to be taken to interpret potential warning signs for disease outbreaks correctly.


Introduction
Infectious diseases have a big influence on the livelihood (and indeed lives) of individual people as well as the performance of whole economies [54]. The development and understanding of mathematical models that can explain and especially predict the spreading of such diseases is therefore of enormous importance. A seminal work in this area was provided by Kermack and McKendrick in 1927 [42]. Up to this day their model is used as basis for analysis, although it has of course been extended in numerous ways. One such way is to consider heterogeneous populations. This is due to the realisation that individual people differ in their genetics, biology and social behaviour in ways that influence the spreading of infectious diseases. One type of model treats these individual traits as a static parameter [17,28,51]. Since these parameters have a certain distribution amongst the population some information may be gained by studying the moments of this distribution [21,60]. Other models deal with time varying heterogeneities like age or duration of the infection [4,22,35]. For a more complete overview of different ways to model heterogeneity in this context we refer to textbooks on mathematical epidemiology such as [11,19,40].
In this paper we will exclusively deal with susceptible-infected-susceptible, in short SIS, models. These models assume that an individual is either infected or susceptible, and furthermore that an infected individual recovers from the infection with no lasting immunity and immediately becomes susceptible again. One of the main applications of SIS models are sexually transmitted diseases [13,24,34,65], but other bacterial infections can also be modelled this way [27]. There are also applications of this model outside of biology, for example in the study of spreading of computer viruses [41,63] or social contagions [31]. Heterogeneous versions of this model also have a long history, see for example [48].
One feature that is present in most of these models is the existence of a threshold that fundamentally influences the behaviour of the system. This threshold is usually given in terms of the basic reproduction number R 0 . If this number is smaller than one, then a disease can not lead to an outbreak and usually a disease free population rests, mathematically speaking, in a stable steady state. If R 0 is however bigger than one, then a disease can become endemic in a population. In many diseases this number is not constant but is susceptible to seasonal or other environmental changes [3,40]. Hence, it is important to provide rigorous mathematical analysis, how R 0 has to be viewed for heterogeneous populations [15].
Once we understand this influence of heterogeneity, then it is of great interest to analyse possible warning sings that indicate the approach of R 0 to the critical value, when R 0 depends upon parameters. One approach to model this setup is to consider epidemic dynamics as a multiple time scale system where the population dynamics, including infection and recovery, are fast while parameters influencing R 0 drift slowly, so that R 0 increases from the sub-threshold regime R 0 < 1 to the critical value R 0 = 1. Considering also stochastic perturbations, it has been shown in various epidemic models [44,53] that there exist warning signs for the upcoming critical value when recoding time series from the sub-threshold regime R 0 < 1. We follow in this vein and study how incorporating heterogeneity into a stochastic SIS model influences the warning sings of an impending critical transition.
Our two main results for dynamics and warning signs for heterogeneous SIS models can be summarized on a non-technical level as follows: (R1) We prove a theorem, how the global dynamical structure of the deterministic (i.e. no noise) homogeneous population model is preserved when the homogeneous population is replaced by a heterogeneous one. In particular, the result shows that upon very reasonable modelling assumptions on the heterogeneity, the homogeneous and heterogeneous models have the same basic bifurcation structure with an epidemic threshold at R 0 = 1.
(R2) We extend the heterogeneous SIS model by stochastic perturbations as well as by slow parameter dynamics. We use numerical simulations to investigate warning signs for epidemic outbreaks based upon scaling laws of the variance in the sub-threshold regime. We show that the rate of variance can change below the epidemic threshold when (a) a cut-off for the heterogeneities is considered, (b) a discretise distribution of heterogeneities is considered, (c) if the system interacts with the upper and lower level population boundaries, (d) if the transmission rate cannot be separated into a product of parametric drift and contribution from heterogeneity.
We also provide first steps to explain (a)-(d) on a non-rigorous level via formal calculations and considering the influences of various terms in the model.
The main implications for prediction and management of epidemic outbreaks are twofold. First, an epidemic threshold still exists for heterogeneous populations. It may shift due to the distribution of types in the heterogeneous population considered but we still have a tipping point or critical transitions towards an endemic state. This shows that there is a need to develop warning signs that can be applied before the outbreak. However, warning signs from homogeneous population models do not directly generalize to the heterogeneous situation. In particular, the functional form at which the warning sign of variance rises, does depend crucially on many additional factors, which are not predicted by simple homogeneous SIS-models.
The paper is structured as follows. In Section 2 we briefly review a homogeneous SISmodel as a baseline for our considerations. We state some known results about this model that are relevant to our analysis. In Section 3 we introduce the heterogeneous model we wish to study. In Section 4 we state, prove, and interpret the first main result (R1). In Section 5 we explain, why we extend the model by a slow parameter drift and by a noise term. Furthermore, we explain some background from the theory of warning signs for stochastic multiscale SIS models with homogeneous populations. In Section 6, we numerically analyse the influence that heterogeneity has on the behaviour of the system near bifurcation point by looking at the variance as warning sign. In Section 7, we provide a few first steps to explain the numerical observations. In particular, Sections 6-7 provide the details for our second main result (R2). We conclude in Section 8 with an outlook of future problems for epidemic models with heterogeneous populations which arose during our analysis.

The homogeneous model
Basic SIS-models are well understood and an in-depth discussion of them can be found in introductory books about mathematical epidemiology (e.g. [10,11,40]). As a baseline homogeneous model we useṠ where d dt =˙denotes the time derivative, β > 0 is the transmission rate and γ > 0 the recovery rate. The parameter η models the propagation of the disease due to imported cases of the infection. Such import can, for example, be explained by brief contacts with individuals outside of the population (see [40,53]). The parameter η has also been used with different interpretations. In [31] and [32] η denotes spontaneous self-infection in the transmission of social contagions. In [58] and [64] η is a time dependent function modelling an infective medium. Furthermore, in [2] the mean field approximation of the ǫ−SIS model introduced in [59] is presented. It too is an (heterogeneous) SIS-model with positive η. The general model with η > 0 will be used in the analysis of the steady states and bifurcation of the deterministic heterogeneous model. For the analysis of the stochastic model it will be included in the noise term.
Note that the positive quadrant is invariant for (1) so our choice of initial conditions ensures that the population sizes of infected and susceptibles remain non-negative. We shall only consider (I(t), S(t)) ∈ [0, +∞) × [0, +∞) for t ≥ 0 from now on.
By adding the two equations in (1), it is easy to see that S(t) + I(t) is constant in this model. Because of the structure of (1) we can assume without loss of generality that S(t) + I(t) = 1, since re-scaling both variables S(t) and I(t) by the inverse of the population size yields a total population of size 1. By substituting S(t) = 1 − I(t) into the equation for I(t) we can describe the system by the single equatioṅ If η = 0 then we define R 0 = β γ , known as the basic reproduction number. If R 0 ≤ 1 then (2) has single steady state, I * ≡ 0, that is globally asymptotically stable. If R 0 > 1 then (2) has two steady states. The steady state I * ≡ 0 remains but is now unstable. The second steady state is I * * ≡ β−γ β and is globally asymptotically stable with the exception of I 0 = 0. From a mathematical perspective, this exchange-of-stability happens at a transcritical bifurcation when (I, R 0 ) = (0, 1). If η > 0 then (2) always has one steady state. It is globally asymptotically stable, i.e., all non-negative initial conditions yield trajectories that are attracted in forward time to the steady state.

The heterogeneous model
We now modify the baseline model (1) by dividing the population according to some trait that is relevant to the spreading of the disease. This can indicate social behaviour like contact rates or biological traits like natural resistance towards the disease (for a detailed interpretation of heterogeneity we refer to introductory works in epidemiology, e.g. [19]). Each individual is assigned a heterogeneity state (h-state) ω which lies in some set Ω. This ω can of course also be a vector carrying information about more than one trait.

Het. Pop. Dyn. & Scaling Laws
• q(ω) is the intensity of participation in risky interactions of an individual with h-state ω, • β(ω) = ρ(ω)q(ω) where ρ(ω) is the force of infection of the disease towards an individual with h-state ω, • γ(ω) is the recovery rate for an individual with h-state ω, • η(ω) is the h-state dependent fraction of individuals that become infected through the import of the infection from outside the population.
Of course, η(ω) can also take any of the different interpretations mentioned in section 2. Note that if all these variables are constant then the heterogeneous system (3) is equivalent to the homogeneous system (1).
We want to make a short note about two aspects of this model. One is that for each ω the population S(t, ω) + I(t, ω) is obviously constant. This implies that ω itself is not influenced by the disease and an individual that has h-state ω at the beginning remains in that h-state for the duration of our consideration. The second aspect is the transmission function J(t) T (t)+J(t) . Transmission functions of this type have been used before [21,22,60]. Such a transmission function can for example be derived by assuming a population with a heterogeneous social contact network [51]. Models with such populations are at the centre of intensive current research (see e.g. [6,7,18,39]).
We now formulate the mathematical assumptions for the heterogeneous population epidemic model used in our subsequent analysis. The set Ω is a complete Borel measurable space with a nonnegative measure µ and Ω dµ(ω) = 1. All integration with respect to ω is taken to be with respect to that measure. All functions and parameters are assumed to be nonnegative and measurable with respect to µ. Since S(t, ω) + I(t, ω) is constant we can introduce a density function f (ω) := S(t, ω)+I(t, ω). We can assume without loss of generality that Ω f (ω) dω = 1. The function q(ω) is taken to be positive almost everywhere on Ω, i.e. there is always some probability for risky interaction for each h-state. We have for some constant C > 0. By using q(ω) C instead of q(ω), we can assume without loss of generality that T (t) + J(t) = 1. We also assume that the three functions β(ω), γ(ω) and η(ω) are bounded, which makes sense from a modelling viewpoint. Furthermore, we assume there exists an ε > 0 such that inf ω∈Ω β(ω) ≥ ε and inf ω∈Ω γ(ω) ≥ ε.
These assumptions just mean that the transmission probability is never equal to zero when infected and susceptible individuals meet and that there is always at least some positive, albeit potentially very long, time after which any infected individual recovers from the disease. An important consequence of these assumptions is that the functions S(t, ·), I(t, ·) are measurable for every t ≥ 0 (see Theorem 1 in [61]). For η(ω) we consider two cases. First the case that there exist a set A ⊆ Ω with positive measure such that η(ω)f (ω) > 0 for ω ∈ A. We denote this case by η > 0. The second case where such a set does not exist will be denoted by η = 0.
Using S(t, ω) = f (ω) − I(t, ω) and T (t) + J(t) = 1 we can describe the system (3) bẏ It is now a natural question to ask which dynamical features are shared by the homogeneous population ordinary differential equation (ODE) given by (2) and the heterogeneous population differential-integral equation (4).

Persistence of Dynamical Structure
In this section we show that in terms of steady state solutions and their stability properties the system (4) exhibits the same behaviour as the system (2).
Theorem 1. If η > 0 then the system (4) has a unique steady state solution. This solution is globally asymptotically stable. If η = 0 we define the basic reproduction number If R 0 ≤ 1 then (4) has the unique steady state solution I(t, ω) = 0. This solution is globally asymptotically stable. If R 0 > 1 then (4) has exactly two steady state solutions, one of which is I(t, ω) = 0. In this case, the solution I(t, ω) = 0 is an unstable steady state solution while the second steady state solution is globally asymptotically stable with the exception of I 0 (ω) = 0 a.e. on Ω.
Proof. We first show that the system does indeed have the number of steady states we claim it has. LetÎ(ω) be a steady state of (4) andĴ = Ω q(ω)Î(ω) dω. As a steady state of (4)Î(ω) is characterised by the equation

Het. Pop. Dyn. & Scaling Laws
Plugging this into the equation forĴ yieldŝ Every solutionĴ to (7) yields a steady state of (4) by putting it into equation (6). Thus, we are searching for the roots of the function Note that the second derivative is always negative. We consider the case η > 0 first. We know that g(x) = 0 has a solution since g(0) > 0 and g(1) < 0. Since g(x) is concave this solution is unique. Consider now the case η = 0. In this case g(0) = 0, so 0 is a solution. If g ′ (0) ≤ 0 then g(x) negative on the whole interval [0, 1] due to the concavity of g(x). If however g ′ (0) > 0 then g(x) is positive for small enough x. Using same reasoning as in the case η > 0 we see that there exists a unique positive solution to g(x) = 0. We therefore need to determine whether g ′ (0) > 0. Since η = 0 this is given by We see that if R 0 ≤ 1 then g ′ (0) ≤ 0 and 0 is the only solution to g(x) = 0, if R 0 > 1 then g ′ (0) > 0 and there exists a unique solution in of g(x) = 0 in (0, 1) alongside the solution 0.
Now we want to show that the system converges to a steady state. In order to do this, we first need to show that J(t) converges. In particular, we want to prove: The proof of Lemma 1 is one major difficulty in this proof. However, the argument is quite lengthy and technical; hence we include it in Appendix A.
Now that we know that J(t) converges it remains to show that if η = 0 and R 0 > 1 then J(t) converges a positive value and not to 0 unless I 0 (ω) = 0 a.e. on Ω. Consider the inequality Thus, if J(t) is positive and sufficiently small we have This shows that the term Ω we see that J(t) is bounded below by a positive, monotonically increasing function. Therefore it can not converge to 0. Since I 0 (ω) > 0 on a set of positive measure we have that J(0) > 0. Thus, J(t) converges to a positive value. Conversely, if I 0 (ω) = 0 a.e. on Ω then J(0) = 0. Directly from (4) we see that in this casė I(t, ω) = 0 for a.e. ω ∈ Ω and thus J(t) = 0 for all t ≥ 0.
Since J(t) converges, the convergence of I(t, ω) follows immediately from (8). Obviously the limit of I(t, ω) is one of the steady states we identified above. We have also shown that if there are two steady states then I(t, ω) converges to the positive one, unless I 0 (ω) = 0 a.e. on Ω. Since in all other cases the convergence of I(t, ω) is independent of the initial data, the claim about asymptotic stability is proven. ✷ In the case η = 0 the value R 0 acts as a threshold value that determines whether there exists an endemic steady state or not. So far R 0 has only this mathematical meaning. The basic reproduction number is however a biological concept. Using the definition given in [20], the basic reproduction number is defined as the expected number of secondary cases produced, in a completely susceptible population, by a typical infected individual during its entire period of infectiousness. We now want to show that the value R 0 as we defined it coincides with this definition. Also in [20] the following result was obtained.
Proposition 1. Let S(ω) denote the density function of susceptibles describing the steady demographic state in the absence of the disease. Let A(τ, ζ, ω) be the expected infectivity of an individual which was infected τ units of time ago, while having h-state ω towards a susceptible which has h-state ζ. Assume that Then the basic reproduction number R 0 for the system is given by In our case the function f (ω) describes a steady state, provided that there are no infected individuals. The value β(ω) denotes the strength of infection for an individual with h-state ω. The value q(ω) indicates the number of infectious contacts an infected individual with hstate ω has. On the other hand β(ζ) = ρ(ζ)q(ζ) is the average amount of risky contacts that would lead to an infection that an individual with h-state ζ has. The chance of an infectious contact between the infective ω individual and a specific ζ individual is therefore given by q(ω) In the absence of susceptible individuals the equation for the infected is given byİ(t) = −γ(ω)I(t), which suggests that the probability that an infected individual is still infected at time t is given by e −γ(ω)t . Since the infectivity of an individual is in our case independent of how long ago the individual was infected, we can conclude that the expected infectivity A(τ, ζ, ω) is given by q we can use Proposition 1 and get This is exactly the basic reproduction number as defined in Theorem 1.

Extending the Model
Although the heterogeneous population SIS model (4) does capture additional realistic features of populations, there are several effects, which it cannot account for at all, or does not account for very well. In particular, finite-size effects and small fluctuations are not included. Furthermore, most realistic heterogeneous parameter distributions, e.g. the transmission rate, are not fixed in time but could be considered as additional dynamical variables. In this section, we extend the model (4) to include these effects.

Noise
In Section 2 we introduced several interpretations of the parameter η. Although it is modeled as a deterministic influence on the disease, the effect η is supposed to describe on the other hand is seemingly of a random nature. Furthermore, even in a situation where we don't want to model any of these effects (i.e we set η = 0) we can still expect there to be some random deviations from the transmission of the disease as predicted by the deterministic model. In fact, the validity of deterministic epidemiological models is usually argued by viewing them as the average transmission and recovery rate of individual random contacts in a sufficiently large population. It is therefore justified to expect to see some remaining randomness in the actual progression of the disease [12,33,47,50].
We therefore want to model these random effects by exchanging the term containing η with a term containing a stochastic process. A natural starting point for the case when the functional form and properties of the stochastic process are not known is to consider white noise ξ = ξ(t) ξ is a generalized stochastic process, so-called white noise, as discussed in [5]. We also want to consider the case when the noise depends upon the heterogenity and write ξ = ξ(t, ω) with the caveat that ω ∈ Ω still denotes the variable measuring the heterogeneity distribution, while we suppress the underlying probability space for the stochastic process ξ in the notation.
Putting ξ(t, ω) into equation (3) yieldṡ The function σ(ω) : Ω → [0, +∞) is assumed to be bounded and basically provides the noise level for a specific ω. Note that for every ω, the sum S(t, ω) + I(t, ω) is still constant. We can therefore again describe the system (9) by the smaller systeṁ One problem with using an additive noise term is that I(t, ω) always has to be positive. But in this model it would be possible for I(t, ω) to become negative. To disallow this we will usė Similarly, since I(t, ω) has to be smaller than f (ω), we usė In the following considerations we restrict ourselves to models using additive noise. However, we want to indicate another commonly encountered modelling possibility, which is using a multiplicative noise term instead of an additive one. That is, to usė where g : R × Ω → [0, +∞) is bounded. Imposing the conditions g(0, ω) = 0 and g(1, ω) = 0 can now ensure that it is never possible for I(t, ω) to become negative or larger than f (ω). Usually, one also assumes that g(·, ω) does not vanish between zero and one.
Which of these two options is chosen will depend on what kind of random influences are to be considered. If the random fluctuations are meant to offset fluctuations in the transmission and recovery of the infection, then the multiplicative noise term might be more appropriate. First of all, if no infected individuals are present then the disease does not spread at all, which is captured by this model. Also, if nearly no one (or nearly everyone) is infected then the inaccuracies of the deterministic model should be small, so the noise term should also be small. Again, the multiplicative noise exhibits this behaviour.
The model with additive noise, which we will use in the following, is however not without merit. It allows us to model a population that has contact with an outside source that can import the disease into the population. This source can be, as mentioned above, another population which imports the disease. Alternatively, there might be factors in the environment that import the infection. For a population of animals it could for example model the possibility to become infected through one of its food sources. Also for human populations this allows us to assume that there are vermin or insects in their environment, which are carriers of the disease and are able to transmit it to humans. In these situations there is a chance to become infected even in a population that consists entirely of susceptible individuals, which is not captured in models using multiplicative noise.
Also, it should be noted that if both effects are present, i.e. internal fluctuations as well as external fluctuations, and we assume that both noise terms act as summands in the model, then the noise term is [σ(ω) + g(I(t, ω), ω)]ξ(t, ω).
Near the two states I(t, ω) ≡ 0 and I(t, ω) ≡ f (ω), we have that g (I(t, ω), ω) is a higher-order term in comparison to the constant term as long as the constant term does not vanish and we are mainly interested in the regimes near the the two states I(t, ω) ≡ 0 and I(t, ω) ≡ f (ω) in the remaining part of this work. Also note that a multiplicative noise term g(I(t, ω), ω)ξ(t, ω) with g(0, ω) > 0 can always be written as which is near I(t, ω) = 0 again the sum of an additive noise term and a term of higher order. Based on these arguments, we proceed with additive noise but it could definitely be interesting to investigate the purely multiplicative noise in future work.

Multiple time scales
As a final extension of our model we now introduce a slow variable into the system. Making certain model parameters slow dynamic variables is a very natural extension used in virtually all areas of research in mathematical biology [26,45]. The main reason is that it is usually not correct to assume that all system parameters are fixed but most system parameters are going to change slowly over time, so a parametric model should rather be viewed as a partially frozen state for a model with multiple time scales.
In the context of epidemiology, many diseases have seasonal cycles or are latent for a longer period before it comes to an outbreak. In both cases we assume that the basic reproduction number R 0 was smaller than 1 until some time, which means the stable steady state of the deterministic system is 0, and bigger than 1 afterwards, which means that a stable endemic steady state exists. In order to capture this in our model we assume that β(ω) slowly changes over time. In fact, there are many different possibilities that may lead to a slowly changing transmission rate, including seasonal changes, evolutionary processes, socio-economic influences, and so on. Furthermore, if we would keep the transmission rate fixed as a parameter, then we would either observe a disease-free state or an endemic state in the SIS model but not the transition between the two cases. It is precisely the dynamic transition regime which we are interested in.
We assume that the time dependence of the function β(t, ω) is such that R 0 is increasing in t. For example, assume that β(t, ω) is separable, i.e. there exists a function β 0 (t) such that β(t, ω) = β 0 (t)β(ω), and that this function β 0 (t) evolves according to the equationβ 0 (t) = ε for 0 < ε ≪ 1. In this case we would have Thus, R 0 (t) is strictly increasing and, if β 0 (0) is small enough, R 0 (0) < 1. This is exactly the situation we want to capture. This effect can of course also be achieved with a β(t, ω) which is not factorisable. We are therefore looking at the systeṁ with an appropriate function h(t, ω).

Warning-Signs for the Homogeneous Fast-Slow Stochastic Model
In this section, we briefly recall some techniques for fast-slow systems and warning signs for stochastic fast-slow systems. For reviewing this material, we consider a simple homogeneous version of (13) to simplify the expositioṅ where I = I(t) is the fast variable and β = β(t) the slow variable. For σ = 0, ε = 0, the set is called the critical manifold [37] and consists of steady states for the fast subsystem, which is obtained by setting ε = 0 in (14). The transcritical bifurcation discussed in Section 2 separates C 0 into three parts in the positive quadrant Then C a 0 and C e 0 consist of attracting steady states for the fast subsystem, while C r 0 is repelling. The stability is exchanged at the transcritical bifurcation point with β = γ, i.e. at R 0 = 1. The deterministic fast-slow systems analysis of the dynamic transcritical bifurcation with 0 < ε ≪ 1 can be found in [43,55], where one key point is that one can extend a perturbation C a ε , a socalled attracting slow manifold, of C a 0 up to a region of size I ∼ O(ε 1/2 ) and β − γ ∼ O(ε 1/2 ) as ε → 0 near the transcritical bifurcation point. The relevant conclusion for us here is that a linearisation analysis is expected to be valid up to this region, excluding a small ball of size O(ε 1/2 ).
It can be shown that sample paths of the stochastic system with 0 < ε ≪ 1, 0 < σ ≪ 1 also track with high-probability the attracting manifold inside a neighbourhood of order O(ε) plus a probabilistic correction term [8]. However, as the transcritical bifurcation point (I, β) = (0, γ) is slowly approached from below β ր γ, the probabilistic correction term starts to grow. Indeed, there is a simple intuitive explanation for this behaviour due to an effect also called "critical slowing down". To understand this effect, Taylor expand the drift and diffusion terms of the I-component of the stochastic differential equation (14) around C a 0 and keep the linear terms, which yieldsİ where we view β as a parameter for now and use I to emphasize that we work on the level of the linearization. Then (15) is just an Ornstein-Uhlenbeck (OU) process [23]. Consider the regime β ≤ γ, then the variance of the OU process increases if β increases and it is an explicit calculation [44] to see that so the variance increases rapidly as we start to approach the bifurcation point by changing the parameter β more towards γ. This makes sense intuitively as the deterministic stabilizing effect from the drift term [β − γ]I(t) pushing towards a region near C a 0 is diminished ("critical slowing down") and hence the noisy fluctuations increase. It is known that the effect of critical slowing-down in combination with noise can be exploited to predict bifurcation points in certain situations (see e.g. the ground-breaking work [62]). The idea has been also suggested in the context of ecology [14] and then applied in many other circumstances [56]. In fact, one may prove that we indeed have for the full nonlinear stochastic fast-slow system (14), under suitable smallness assumptions on a fixed noise level and staying O(ε 1/2 ) away from the region of the deterministic bifurcation point, that where α = 1, A = σ 2 , β(t crit ) = γ with β(0) < t crit ; the details can be found in [44] using moment expansion methods, and in [9] using martingale methods and/or explicit OU-process results. The main practical conclusion is that there is a leading-order scaling law of the variance as the value of R 0 is approached by letting the transmission rate slowly drift in time. This scaling law can be used for prediction as the scaling exponent α = 1 is universal for a non-degenerate transcritical bifurcation. In fact, a calculation of the leading-order covariance scaling laws for all bifurcations up to codimension-two in stochastic fast-slow systems has been carried out [44], which builds a mathematical framework for generic systems. However, this theory simply does not apply to the heterogeneous population model (13) we consider here. In particular, the influence of heterogeneity on early-warning signs has not been investigated much to the best of our knowledge (for examples see [38,56]). Since it is a key effect in realistic models of disease spreading, it is natural to ask, how it influences the scaling law (17).

Numerical results
Here we present numerical simulations of the homogeneous and heterogeneous system to see the influence of the heterogeneity. We have chosen to consider the variance as the early warning sign. We assume that the variance behaves like A (t crit −t) α for appropriate A and α, where t crit is the time at which R 0 (t) = 1. The main difficulty lies in correctly determining α. We will calculate it by fitting the reference curve A (t crit −t) α to the time series of our simulation using the least squares method. To smoothen the time series we will average the variance over 100 simulations. However, since the reference curve goes to infinity at t crit we do not fit the curve over the whole interval, which also takes into account the theory, which excludes a small ε-dependent ball near R 0 (t) = 1 as discussed in Section 5.3. We therefore calculate the best fit over 80% or 90% of the considered time interval. Generally, fitting over 90% gives better results. In the cases we consider only 80% of the interval, fitting over a larger part would not yield reasonable results as the solution goes to −∞. These are cases in which the sample path drops below the negative unstable branch of the transcritical bifurcation (see Figure 9). Since different choices in the size of the considered interval lead to slightly different values for α we cannot claim to calculate the exact α that the variance of I(t) follows. We will however be able to detect changes in the level of α that are due to influences of the heterogeneity.
One further aspect we fix for all our considerations is the order in which we aggregate the system and calculate the variance. We could calculate the variance of I(t, ω) and then aggregate these variances, or first calculate I(t) = Ω I(t, ω) dω and calculate the variance of I(t). We choose the latter option since in applications it is more feasible to be able to track the changes of the prevalence of the disease in the whole population rather than being able to track it for each h-state, as would be required by the first method.
In this section, we shall only consider the numerical simulations make observations about the results. A more detailed discussion why certain effects may occur is then given in Section 7.
First we consider the homogeneous system with additive noise and a very simple multiplicative As initial conditions we choose I(0) = 0 and β 0 (0) = 0. The parameters are chosen as β = 0.3, γ = 0.4, and σ = 0.01. The time scale separation parameter ε for the slow variable drift is set to ε = 0.0001. If we allow I(t) to become negative then we know that the variance of I(t) should behave as A t crit −t . In Figure 1 we show the variance of I(t), averaged over 100 simulations, and the reference curve with both the theoretical exponent α = 1 and with the exponent provided by the best fit over 80% of the time interval. The measured exponent is reasonably close to the theoretically predicted value α = 1; this slight underestimate is expected as a transcritical bifurcation splits into two saddle bifurcations upon generic perturbations and for saddle-nodes the exponent is α = 1 2 ; see also [44] for more details, which exponents may occur in the generic cases. Here we see the variance of the aggregated variable I(t). As a reference we show the curve A/(t crit − t) α with both the expected theoretical exponent α = 1 and with the exponent α = 0.8414 provided by the best fit over 90% of the considered time interval. Note that as we approach the critical moment the curve for the variance is noticeable below the curve with α = 1.
for the discrete-time numerical scheme; for an introduction to numerical schemes for stochastic ordinary differential equations see [29]. The results show that the key exponent α decreases in comparison to the system without cut-off.
Next, we consider the heterogeneous system. We are going to consider situations in which the white noises ξ(t, ω) are dependent on each other for different ω ∈ Ω, or where the space of h-states is discrete to understand, which implications these assumptions have on the model. Note that both assumptions have a direct modelling motivation. Usually, we may group or cluster different parts of a heterogeneous population into different classes, e.g. all parts with a different trait. Secondly, ξ(t, ω) models all stochastic internal and external effects and one natural assumption would be that all classes of the heterogeneous population are subject to the same external fluctuations, which would lead to the case ξ(t, ω) = ξ(t), i.e. the same white noise acts on all h-states. Note that for the aggregated variable I(t) we havė If Ω is continuous and the ξ(t, ω) are independent of each other, then Ω σ(ω)ξ(t, ω) dω = 0 and the influence of the noise is reduced to indirect effects. We therefore consider either continuous Ω with dependent ξ(t, ω) or a discrete Ω with independent ξ(t, ω).
As measure µ we choose the counting measure normed to 1 over Ω We assume that β(t, ω) = β 0 (t)β(ω) andβ 0 (t) = ε. As in the homogeneous case we choose I(0) = 0 and β 0 (0) = 0 as initial conditions and β = 0.3, γ = 0.4, and σ = 0.01 for the parameters. The time scale separation parameter ε for the slow variable is again set at ε = 0.0001. Here the heterogeneity influences the number of elements in Ω and the distribution f (ω). Furthermore, ξ(t, ω) are chosen as n independent identically distributed random variables. We will both now This is simply a normal distribution with mean 0.5 truncated to Ω. Figure 5 shows f (ω) for different values of p. The parameter θ is the standard deviation of this distribution. Note that as θ goes towards 0, the function f (ω) converges to the delta-distribution δ(ω − 0.5). Hence, the the heterogeneous system starts to approximate the homogeneous one as θ → 0. On the other hand, if θ → +∞ then f (ω) converges to the constant function f (ω) = 1. We will therefore parametrise f (ω) with θ = 1 (2p−2) 2 − 1 4 for p ∈ (0, 1). In the discrete case which we consider first, this yields approximately a binomial-type distribution. In Figure 3 we show the result for p = 0.5 and two different choices of n. Figure 4 shows how both α and A in the best fit curve change with increasing n. A clear trend is observed showing that α (and A) decrease as n is increased.
For the heterogeneous system with continuous Ω we choose Ω = [0, 1] with µ as the Lebesgue measure. At first we again restrict the influence of the heterogeneity to the function f (ω). The choice of the other parameters in unchanged from the discrete system. What has to be changed however, is the noise term in the equation. As mentioned above we want the noise for different h-states to be dependent on each other. We do this by using the first natural approximation of using the same white noise for all h-states, i.e. ξ(t, ω) = ξ(t) independent of ω. In Figure 6 we show the variance of I(t) against the reference curves for two different values of the parameter p. In Figure 7 we show, how p influences both A and α. The results show that upon increasing p, we first see α increase and A decrease until they stabilize for larger p. We observe that the stabilization approximately happens when the distribution f (ω) starts to have full support on [0, 1].
The last case we are interested in here is to consider a system where β(t, ω) is not separable in the sense that it cannot be factored into a product of functions depending only on t and ω. From the modelling standpoint, this means that the evolution of the transmission rate and the heterogeneity in the population interact in a non-trivial way, for example, one may consider the situation when a certain population trait amplifies the change in the transmission rate, while another trait decreases it. As a first benchmark mathematical example, we simply seṫ β(t, ω) = ε(ω + 0.5)t ω−0.5 , which is solved by β(t, ω) = εt ω+0.5 . We restrict any further influence of ω to f (ω). However, we choose f (ω) slightly differently than before. We set  Figure 7: This shows the influence of the parameter p on the values α and A of the best fit, calculated over 90% of the time interval. In α we see initially a steady decrease until it reaches a constant level. In A we see an initial increase until the values reach a fixed level. Note that the leveling out both α and A occur for the same values of p. Furthermore, by comparing with Figure 5 we see that this coincides with those values of p for which the support of f (ω) becomes the whole of Ω. i.e. a normal distribution with mean µ and a standard deviation of 0.1. We let µ vary in [0, 1]. All other parameters are the same as before. Figure 8 shows, how µ influences A and α as calculated from an aggregation of 100 simulations and fitted over 90% of the time interval. We observe a very strong trend in the crucial exponent α, which decreases as the mean µ of f (ω) is increased.

Explanations
In this section we give some explanations, formal or heuristic, for the effect that are observable in our simulations

Homogeneous system
The first effect we want to explain is the influence of the cut off on the homogeneous system. Since the steady state solution I(t) = 0 is asymptotically stable and the added white noise always has an expected value of 0, in the system without cut off I(t) fluctuates around 0. Once we introduce the cut off I(t) can no longer fluctuate freely. This introduces a bias in the positive direction. That is, a sample path I(t) is free to change upwards but we stop it when it changes too far downwards. This results in the averaged path being strictly positive (see Figure 9). Another effect is that because we restrict the fluctuations of the white noise we decrease the variance of the resulting stochastic process I(t). This can be seen by comparing Figures 1 and  2. Finally, in the system without cut off the variance increases at a certain rate. In the system with cut off this increase is still present, but we also have a second effect at work. Due to the fact that the averaged path also increases, each individual sample path has, as it were, more With cut off Without cut off Figure 9: Averaged path of the homogeneous system, averaged over 100 simulation, both with and without cut off. The path without cut off eventually tends towards −∞ as it drops below the unstable branch of the transcritical bifurcation.
space to fluctuate in, as a downwards deviation from the average path can now be bigger than before without hitting 0. Thus in addition to the usual increase in the variance there is also a decrease of the restriction we place on the variance. Therefore, the increase of the variance is steeper in the system with cut off. This steeper increase is translated into a decrease of α.

Discrete heterogeneous system
We now want to analyse the observed changes in the heterogeneous system. We first look at the case where Ω is discrete. Recall that we used In Figure 10 we show, how this normalisation constant C changes with n. Since C is increasing in n we have that, heuristically, for a fixed ω ∈ Ω the value f (ω) decreases. A more rigorous way to state this is to say that if ω is in Ω for a discretisation level n 1 and for a level n 2 with n 1 < n 2 , then f (ω) is smaller for n 2 . Now note that due to the fact that we have chosen most of our parameters independent of ω, the linearisation ofİ(t, ω) is given bẏ Thus, if f (ω) becomes smaller then I(t, ω) becomes more "rigid", i.e. it fluctuates less, which results in smaller value of A. But this in turn also means that as R 0 approaches 1 the additional freedom to fluctuate increases. This results in a bigger increase in the variance of I(t) and thus a smaller value of α. Both of these effects are visible in Figure 4. Furthermore, by comparing Figures 4 and 10 we see that the levelling out of α and A coincides with the levelling out of C.

Continuous heterogeneous system
For the heterogeneous system with continuous Ω we note that by definition we always have I(t, ω) ∈ [0, f (ω)]. If the parameter p is big enough then f (ω) is large enough for all ω so that the upper bound is not important due to the fact that it is never reached. If p is small however, then f (ω) also becomes small for some ω. Thus we not only have a cut off at 0 but also at f (ω). Thus, for small p the variance is even more restricted. Also for these ω a rise of the average path will not result in more freedom in its fluctuation due to the restriction above by f (ω). Only when p increases and the upper bound f (ω) becomes less and less important, then the increase of the variation is aided by a increased freedom to fluctuate, which leads to lower values of α. In Figure 7 we see exactly this behaviour. Since these changes in A and α depend solely on these cut off effects we expect that they vanish if we make the same simulations for the system without cut off. The results of such a simulation can be seen in Figure 11, where indeed p has no discernible influence on A or α.  Figure 11: We show, dependent on p, the change in the values α and A of the best fit, calculated over 80% of the time interval, for the heterogeneous system without cut off. There is no discernible influence of p present.

Non-separable β(t, ω)
In order to explain our observations of the system where β(t, ω) is not separable we first look at the linearisation of the equations. We assume that all functions in our equations are in L 2 (Ω). We can write for the deterministic system (4) The Fréchet-derivative of F , evaluated at I * and applied to ζ(ω), is given by We define a linear operator T by T I(t, ω) = dF dI (0) I(t, ω). In particular, the equation linearised at 0 reads aṡ I(t, ω) = T I(t, ω) = f (ω)β(ω) Ω q(ω)I(t, ω) dω − γ(ω)I(t, ω).
We are interested in the spectrum of the operator T . We consider this operator on the space X = {ζ ∈ L 2 (Ω) : ζ(ω) ∈ [0, f (ω)]}, i.e. the subset of L 2 (Ω) that consists of the points which are possible states of our system. A point λ ∈ C is an eigenvalue of T if and only if there exists an eigenvector ζ ∈ X such that T ζ − λζ = 0. This equation in its longer form is We can rearrange this to get Plugging this into the above equation yields An eigenvalue λ of T must therefore satisfy or (γ(ω) + λ)ζ(ω) = 0 and Ω q(ω)ζ(ω) dω = 0.
Note that any λ that satisfies the first equation in (19) is negative as γ(ω) is strictly positive. Furthermore, due to q(ω) being a positive function, any eigenvector to fulfil the second equation in (19) has to be negative somewhere. The domain X which we consider for T does therefore not contain any eigenvectors satisfying this equation. For these reasons we consider only equation (18) to be relevant for our considerations. This equation has a unique solution. Note that for λ = 0 the left hand side is exactly R 0 . In particular, λ is positive if R 0 > 1 and negative if R 0 < 1. Note that if we assume in our calculations that γ(ω) is independent of ω then we can rearrange the equation (18) to identify λ as 1 If we now assume that β(t, ω) is a time dependent slow variable, then we can expect that this equation approximately describes the evolution of λ. In particular, if β(t, ω) is separable, β(t, ω) = β 0 (t)β(ω), then Ω q(ω)f (ω)β(ω) dω is a constant κ and we get λ(t) = β 0 (t)κ − γ. Var(I) Figure 12: The homogeneous system for different β 0 (t), with best fit over 90% of the time interval. While for β 0 (t) = εt 0.8 the variance is still in the vicinity of the reference curve with the theoretical exponent α = 1 (although visibly below it), for β 0 (t) = εt 1.5 these curves are markedly different. This can also be seen in the value α of the best fit. In the former case it is α = 0.8435 while for the latter we get α = 0.3795.
We know that λ(t) is the exponential rate with, which the quasi-stationary system (ε = 0) would go to 0. Hence, for negative λ(t), the smaller it is the more "rigid" the system is. If β 0 (t), and thus λ(t), is increasing fast near the critical point then it is tightly locked to 0 until shortly before t crit . Therefore, we expect a sharp increase in the variation close to t crit and thus a low α. We show this effect for the homogeneous system in Figure 12. In our simulation for the heterogeneous system we achieve the same effect by changing the distribution f (ω). Recall that we used β(t, ω) = εt ω+0.5 . Thus, for ω = 0 the increase is as the square root of t while for ω = 1 it is polynomial. With the parameter µ we can control, which increase is dominant. If µ is small then f (ω) is concentrated on those ω for which β(t, ω) ≈ εt 0.5 . Thus it grows slowly and we expect a higher α. Also the system is less "rigid" and allows for a higher overall variance in I(t) and thus larger A. As µ increases, so does the derivative of λ(t) and we expect a more rigid system (hence smaller A) and a faster increase of the variance near the critical point (smaller α). Both of these behaviours can be seen in Figure 8. In Figure 13 we show how λ(t) behaves for different choices of µ.

Outlook
In this paper we have provided new insights on qualitative persistence and quantitative nonpersistence of various dynamical phenomena in an SIS-model with heterogeneous populations. The main conclusions are that one can expect a generic dynamical structure of a disease-free and endemic state, separated by a transition at R 0 , to persist. However, the classical warning signs for tipping points have to be re-considered carefully in heterogeneous epidemic models. In particular, we observed that the scaling law exponent for the inverse power-law increase of the variance decreases and in many cases lies below the theoretically predicted values of the homogeneous population system. This means that using an extrapolation procedure with fixed exponent to predict the region, where the practical R 0 -value lies, may not give the correct epidemic threshold.
Since, this work is one of the first investigations of warning signs in heterogeneous population models, it is clear that many open questions remain. Here we shall just mention a few of these. From a mathematical perspective, it would be natural to ask for a full analytical description of phenomena arising near bifurcation points for heterogeneous stochastic fast-slow systems. There are basically no results in this direction available yet, although recent significant progress in mathematical multiscale dynamics may suggest that a (partial) analysis should be possible [44]. From a biological and epidemic-modelling perspective, it would be interesting to compare different classes of models to the fast-slow heterogeneous stochastic SIS model we considered with a view towards heterogeneity, epidemic thresholds and warning signs for critical transitions. For example, this could include SIR models [30,52], adaptive network dynamics [25,49,57], and stochastic partial differential equations [1,46].
Of course, many other extensions of the model, for example demographic changes, could also influence the behaviour. A focus on quantitative scaling laws could shed new light on which models are most appropriate for certain disease outbreaks, when results are compared with data.
Of course, our study here only carries out a few important baseline steps to achieve these future goals. Nevertheless, it provides clear evidence for the need to further investigate the interplay between various effects such as parameter drift, noise, and heterogeneity in the context of biological models, which exhibit bifurcation phenomena of high practical and social relevance.

A The convergence in mean
Here we prove the auxillary result Lemma 1, which shows that for the deterministic heterogenous SIS model we study, a suitable weighted mean of the infected population J(t) := Ω q(ω)I(t, ω) dω has a well-defined limit.
Proof of Lemma 1. We employ the same notation as in the proof of Theorem 1. In addition, define J * = lim sup J(t) and J * = lim inf J(t). Assume that J(t) does not converge, then J * − J * > 0. In the following five steps we lead this assumption to a contradiction.