AN EXACT APPROACH TO CALIBRATING INFECTIOUS DISEASE MODELS TO SURVEILLANCE DATA : THE CASE OF HIV AND HSV-2

When mathematical models of infectious diseases are used to inform health policy, an important first step is often to calibrate a model to disease surveillance data for a specific setting (or multiple settings). It is increasingly common to also perform sensitivity analyses to demonstrate the robustness, or lack thereof, of the modeling results. Doing so requires the modeler to find multiple parameter sets for which the model produces behavior that is consistent with the surveillance data. While frequently overlooked, the calibration process is nontrivial at best and can be inefficient, poorly communicated and a major hurdle to the overall reproducibility of modeling results. In this work, we describe a general approach to calibrating infectious disease models to surveillance data. The technique is able to match surveillance data to high accuracy in a very efficient manner as it is based on the Newton-Raphson method for solving nonlinear systems. To demonstrate its robustness, we use the calibration technique on multiple models for the interacting dynamics of HIV and HSV-2.

1. Introduction.In order to take a mathematical model of infectious disease from a theoretical construct to a practical tool for informing health policy in a specific setting, an important initial step is model calibration.The terms "model calibration", "model fitting" and "model parameterization" all describe the process of estimating values for the parameters used within a mathematical model so that its output is relevant to the situation of interest.Despite its importance, calibration is commonly overlooked as a part of the modeling process with its details often relegated to appendices or supporting materials.Insufficient communication of the calibration process presents a significant obstacle to the overall reproducibility of modeling work.
In this work we present an exact approach to calibrating infectious disease models to surveillance data that aims to streamline the process while increasing efficiency and reducing reliance on computational methods.At its core, the approach reserves specified model parameters to be used as fitting variables where the number of fitting variables is determined by the number of calibrating conditions to be satisfied.Estimates for the remaining model parameters are established from empirical studies and/or previous modeling studies.Once a set of values has been specified for nonfitting parameters, values of the fitting variables are found so that the mathematical model exactly matches the given calibration conditions.
The structure of the paper is as follows: in the next section we describe three general approaches to model calibration.In Section 3, we outline our general approach to exact model calibration and illustrate it with a simple toy model.Focusing our attention to the calibration of models for the interaction of HIV and HSV-2 (genital herpes) for illustration, we then demonstrate our calibration approach for multiple mathematical models of varying complexity: a 4-compartment SI-type model for the interaction of HIV and HSV-2 in Section 4, the Granich et al. model [11] for HIV in Section 5, an HIV/HSV-2 coinfection model with behavioral response in Section 6 and some extensions to the calibration approach in Section 7. Finally, in Section 8 we provide a summary of the approach and discuss generalizations, strengths and weaknesses.

2.
Background.When parameterizing a model, the subset of parameters that will be allowed to vary in order to achieve a desirable fit to data is identified.We refer to these as calibration or fitting parameters.In an ideal situation, reliable estimates and/or ranges of values exist for the remaining model parameters.Model calibration or fitting is then the process a specifying values for the calibration parameters.While many techniques for doing so exist, we emphasize three general approaches.
While easy to overlook as a method, manual calibration is quite possibly the most ubiquitous approach to model fitting.We use the term "manual calibration" to loosely refer to scenarios in which a modeler simply plays with parameter values until a satisfactory fit is achieved.Despite its lack of rigor, the hands-on approach of adjusting parameter values and observing the effects offers the researcher an opportunity to gain valuable insights and familiarity with a model.Of course, obvious downsides for the manual approach include the facts that the process can be inefficient, produce less than desirable calibration results and can be difficult to reproduce.
A more formal, yet still intuitive, approach involves obtaining random samples of parameter sets and finding those that adequately fit the data.In this approach, one begins by specifying probability distributions for the parameters of interest.Multiple random parameter sets are then sampled from these distributions (often using a Latin Hypercube Sampling technique) with the model simulated for each parameter set.Parameter sets that do not produce model behavior that adequately fits the given data are then filtered out leaving only those that are relevant.The remaining parameter sets then represent the result of the calibration process.Such a calibration is conducive to uncertainty and sensitivity analysis as variation in multiple model parameters is included in the parameter sets.However, a drawback of the approach is its ultimate reliance on sophisticated statistical techniques (e.g.partial rank correlation coefficients, logistic regression, etc.) to demonstrate relationships between model parameters and behaviors of interest.The required familiarity with these techniques (including their appropriateness in the given situation and their metrics) can impede clear understanding of the modeling results for some readers.Lastly, the fact that parameters are randomly sampled means that, although the qualitative behavior should be the same, the modeling results are not completely reproducible.
The last and most formal approach to calibration is to use a numerical fitting algorithm.In this approach, optimal parameter values are found that minimize a specified metric of error (e.g.absolute or squared error between model simulation and data).Numerical fitting algorithms can be deterministic (e.g.Nelder-Meade simplex method and Levenburg-Marquardt method) or involve a stochastic search component (e.g.Markov chain Monte Carlo search and genetic algorithms) and are often quite computationally demanding.The benefit of such rigorous methods is that they reproducibly produce the very best fit possible for a given model.The price for their performance is that the details are complicated and hidden within subroutines of many proprietary software packages.Consequently, the process of model calibration is hidden within a "black box" for many practitioners.
3. Illustration of the exact approach to model calibration.The goal of this work is to introduce an exact approach to model calibration.Our approach is not intended to replace those described above but rather as an alternative that may be preferable in certain situations or could be used in conjunction with other methods in others.
For example, it is often the case that one wishes to calibrate an epidemiological model to disease surveillance data (i.e.standardized estimates of disease prevalence, incidence and other important indicators) such as those contained in WHO Global TB database and the UNAIDS/WHO Global HIV/AIDS Online Database.More specifically, assume that we are examining an infectious disease in a particular setting for which disease prevalence is at or near endemic equilibrium.As a simple illustrative example of the overall approach, consider calibrating a basic SI infectious disease model, given by S = Λ − βSI − µS, I = βSI − µI, to disease prevalence at endemic equilibrium.We must first identify the calibration parameter.In general, we note that disease transmission coefficients are particularly well-suited for this role as reliable estimates for transmission coefficients rarely exist and because transmission coefficients incorporate a multitude of factors that can vary significantly among different settings and populations (e.g.biological and environment factors, human behavior, effect of interventions, etc.).Using β as our calibration parameter, we note that the model yields a disease prevalence of P * = 1 − µ/β at endemic equilibrium.Hence, when values for the other model parameters are established, the transmission coefficient β can be used to exactly calibrate to a specified disease prevalence, P * , using the fact that β = µ/(1 − P * ).
Of course, calibration in this simple example is greatly facilitated by an analytic expression for disease prevalence at equilibrium that will not be available for more detailed models.In following sections, we show how the construction of a calibration system of equations that is then solved by Newton's Method can overcome this hurdle for models of varying complexity.
Many epidemiological studies [9,21,14,5,15] have shown that infection with HSV-2 increases an individual's susceptibility to infection with HIV.The mechanisms behind this increased susceptibility are that genital ulcers resulting from HSV-2 infection provide passage to HIV viruses through epithelial cell layers to the more vulnerable cell layers beneath and that HSV-2 infection promotes the recruitment of HIV target cells to the genital region.In our model, HIV-negative individuals that are infected with HSV-2 are more likely to become infected with HIV than those not infected with HSV-2.Higher HIV infectivity has also been observed for individuals that are coinfected with HIV and HSV-2 compared to those infected with only HIV [12,4] due to increased HIV viral load and/or increased access through mucosal membranes via herpetic lesions.Consequently, we model HIV-positive individuals that are also HSV-2 infected as more infectious than those that are only infected with HIV.Lastly, it has also been shown that infection with HIV increases HSV-2 infectivity of coinfected individuals by increasing the duration of HSV-2 shedding episodes and increases the frequency of such episodes [1].
4.1.The model.The interacting dynamics of HIV and HSV-2 infection are modeled using the following system: where S is the number of fully susceptible individuals (i.e.infected with neither HIV nor HSV-2); H is the number of individuals infected with HSV-2 only; A is the number of individuals infected with HIV only; C is the number of individuals infected with both HIV and HSV-2; and N = S + H + A + C is the total size of the population.We assume a background mortality rate of µ and that disease mortality is only induced by HIV infection and is modeled at a rate of µ A .The model incorporates the most important aspects of the interaction between HIV and HSV-2 infection.Namely, being infected with HSV-2 renders an individual more susceptible to HIV infection (by a factor of r 1 ); co-infected individuals are more infectious and likely to spread both HIV and HSV-2 (r 2 and r 3 , respectively).Consequently, we have that r i ≥ 1 for i = 1, 2, 3.For detailed descriptions of how the empirical findings on the interaction of HIV and HSV-2 infection from [9,21,14,5,15,9,21,14,5,15,12,4] are transformed into transmission cofactors for mathematical modeling, see [10,1].The flow diagram for the model is depicted in Figure 4.

4.2.
Calibration to HIV and HSV-2 prevalence at equilibrium.As in Section 3, we will consider the situation where we wish to parameterize the current model in (1) for a setting with specified HIV and HSV-2 prevalence levels (which we denote as A and H, respectively) assuming both diseases are at equilibrium.To achieve the two prevalence conditions, we use the transmission coefficient for HIV, β, and that for HSV-2, σ, as fitting parameters.We assume that values for the remaining model parameters have been established (e.g.fixed from the literature).It is worth noting that despite the relative simplicity and small dimension of the model in (1), finding an explicit expression for the its endemic equilibrium is intractable via direct methods due to the highly nonlinear nature of the system; even when using modern computer algebra systems (CAS).
Our calibration approach is facilitated by rescaling the state variables of the model so that s = S N , h = H N , a = A N and c = C N .Hence, state variables of the rescaled model represent proportions of the total population rather than numbers of individuals.Noting that s = 1 − h − a − c, the full model (1) reduces to If the endemic equilibrium is denoted by (h * , a * , c * ) and we calibrate to the prevalence data A : overall HIV prevalence at endemic equilibrium, H : overall HSV-2 prevalence at endemic equilibrium, it follows that A = a * + c * and H = h * + c * .At endemic equilibrium, it must be the case that h = a = c = 0. Including our calibration conditions, we arrive at our calibration system of equations where the first three equations impose the equilibrium conditions and the last two impose the calibration conditions on disease prevalence.
4.2.1.Newton's Method.Introduced in many first semester Calculus courses, Newton's Method is perhaps the most famous numerical method because of its efficiency and elegance.In the one-dimensional case, the method finds successively better approximations to the solution of f (x) = 0 using the iteration Visual illustration of Newton's Method in 1-dimension for finding the solution of f (x) = 0.As illustrated, the iteration results from finding the root of the tangent line at the current iteration (x n , f (x n )).
As seen in Figure 2, Newton's Method essentially approximates the root of a function using the root of its first order approximation (i.e. its tangent line in the 1-dimensional case).In n−dimensions where we wish to solve f ( x) = 0, Newton's Method can be written as where Df ( x) is the Jacobian matrix of f evaluated at x.As matrix inverses are computationally unstable, the typical iteration is x n+1 = x n + ∆ x where ∆x is the solution to Df ( x n )∆ x = −f ( x n ).As long as the initial approximation x 0 is close enough to the actual root and the Jacobian is nonsingular, Newton's Method converges quickly to the desired root.
Unfortunately, there is no theoretical result dictating how close is "close enough" to guarantee convergence and one must often experiment with initial approximations until a suitable one is found [6].To use Newton's Method to solve of our calibration system (8), it turns out that finding an initial approximation close enough to the actual solution is not terribly demanding.In fact, using the simplifying assumption that HIV and HSV-2 infection are independent produces an adequate initial approximation in every scenario tested.Specifically, we let and solve (4) and ( 6) for σ and β to obtain initial estimates Given its prominence among numerical methods, a form of Newton's Method should be available in all computing platforms.In most software, Newton's Method is at the core of the default root-finding function.Among options for the required precision of the solution, maximum iterations, etc., most implementations can compute an automatic numerical approximation to Jacobian matrix.Alternatively, the user can choose to supply the exact Jacobian function to increase efficiency and accuracy.
In every scenario tested, automatic approximations to the Jacobian have performed admirably.To demonstrate the performance of our calibration approach for a high HIV prevalence setting, we consider the case of South Africa.Data from the World Bank [18] shows that, like in many sub-Saharan African countries, HIV prevalence has stabilized over last several years.was prevalent in human populations long before HIV, it is reasonable to assume that HSV-2 prevalence would have also be at or near endemic equilibrium.While more uncertainty exists in estimates of HSV-2 prevalence than HIV prevalence, data suggests an HSV-2 prevalence of roughly 50% [2,16,22].For the South Africa example, we use the baseline parameter values from Table 1 which includes three values specific to South Africa (i.e.µ A = 1 20 to incorporate the impact of HIV on life expectancy in South Africa, A = 0.17 and H = 0.50.).Our initial approximation to the solution of the calibration system of equations ( 4)-( 8) is found using ( 9)-( 11) to get Table 2 shows the iterations of Newton's Method to complete the calibration.We note that the method converges after only 3 iterations.We use data from the United Kingdom as an example of low HIV prevalence setting.Clearly, calibrating the unstructured model from (1) to HIV prevalence in the general population is not appropriate for making policy decisions regarding HIV in the UK given the concentrated nature of infection among particular risk groups.Given such low HIV burden, it is also difficult to argue that prevalence is at or near equilibrium.Here, we use this extreme example simply to illustrate the robust nature of our calibration approach.For this example, we assume an HIV prevalence of 0.3% [18] and an HSV-2 prevalence of 4.0% [20].
For the United Kingdom example, we use the baseline parameter values from Table 1 which includes three parameters values specific to the United Kingdom (i.e.µ A = 1 100 to reflect that life expectancy of HIV-infected individuals is near that of uninfected individuals in the United Kingdom, A = 0.003 and H = 0.04.).Our initial approximation to the solution of the calibration system of equations ( 4)-( 8) is found using ( 9)- (11) to get c 0 = 0.00012, a 0 = 0.00288, h 0 = 0.03988, σ 0 = 0.029766, β 0 = 0.035352.
As shown in Table 2, convergence to the solution of the calibration system is achieved in only 2 iterations of Newton's Method.4.3.2.Robustness of calibration approach.In this section, we calibrate the fourcompartment SI-type HIV/HSV-2 coinfection model for a range of randomly sampled model parameters and calibration conditions.Specifically, we obtain 50 independent samples of model parameters and calibration conditions from uniform distributions over the ranges given in Table 1.The calibration system (4)-( 8) was then solved using Newton's Method with initial approximation given by ( 9)- (11).For each sample, the calibrated value of β and σ was noted along with the resulting HIV and HSV-2 prevalence at endemic equilibrium and the necessary number of iterations of Newton's Method.The results are shown in Table 3.For all 50 simulations, the calibration approach successfully matched the calibration conditions (i.e.HIV and HSV-2 prevalence) while requiring no more than 4 iterations of Newton's Method.

Granich et al. HIV model.
In this section, we consider an HIV-only model based on that of Granich et al. [11].This highly influential modeling work, which appeared in The Lancet, sparked considerable interest among the HIV community in the strategy of "Treatment as Prevention" which is essentially the idea that providing early antiretroviral treatment (ART) to HIV-infected individuals reduces their HIV viral load (and consequently their infectiousness) thus preventing future Table 2. Calibration of the four-compartment SI-type HIV/HSV-2 coinfection model for a high prevalence and low prevalence setting; South Africa and the United Kingdom, respectively.Note that i = 0 the "0 th iteration" refers to our initial approximation to the solution of the calibration system (4)- (8).HIV and HSV-2 prevalence at the endemic equilibrium that results from using the ith approximations for β and σ are denoted A i and H i , respectively.Precision is measured by average absolute value of the calibration system ( 4)-( 8) evaluated at the current approximation (i.e.||f (hi,ai,ci,σi,βi)||1

A stopping criteria of max{|β
South Africa (HIV prevalence 15%, HSV-2 prevalence 50%) 0 0.0367143341 0.0705450591 0.0000000447 0.5949903301 3.10   as follows: where Notably, infectivity is the same in each of the four infectious classes.Hence, the infectious classes do not represent the stages of HIV infection (e.g.primary, chronic and AIDS) but rather are used to produce a gamma distribution for the survival of HIV-infected individuals that approximates the Weibull survival distribution that has been observed in data [11].
Another defining characteristic of the Granich et al. model is the transmission term given by λe −αP n where P is the prevalence of HIV infection at a given time.As Granich et al. describe, this allows for the population's risk behavior to decrease as disease prevalence increases.Figure 5 illustrates the form of the population's response to disease prevalence.In [11], λ, α and n are referred to as the "initial value", "location" and "shape" of the transmission term, respectively.From a practical modeling perspective, this formulation allows the model to exhibit the rapid initial takeoff of the epidemic observed in several sub-Saharan African settings (e.g.R 0 ≈ 8), followed by a relatively modest disease prevalence at endemic equilibrium (e.g.< 20%).In a standard formulation with fixed transmission coefficient, prevalence at endemic equilibrium is typically 1 − 1/R 0 .5.2.Calibration to HIV prevalence at equilibrium.We begin by calibrating the model in (12) to a given HIV prevalence at endemic equilibrium which we denote by A. To satisfy this single calibration condition, we use the initial value of the transmission term, λ, as a calibration parameter.We assume that values for the remaining model parameters have been established. where Combing the four conditions that ensure the disease is at equilibrium ( 14)-( 17) with a calibration condition (18), we have the following calibration system of equations To complete our calibration, we use Newton's Method to find the solution (i * 1 , i * 2 , i * 3 , i * 4 , λ * ) of the calibration system ( 14)-( 18) using the following initial approximation: Here, our initial estimate λ 0 results from solving (14) with the initial approximations i = A and of i 1 = A/4.

5.2.1.
Demonstration of calibration to HIV prevalence.In this section, we use our calibration technique to parameterize the Granich et al.HIV model to HIV prevalence at equilibrium.In Table 4(a), we summarize model parameters and establish baseline values and ranges.Baseline values are those from [11] that resulted from fitting the full version of the model to South Africa's HIV epidemic.To demonstrate the robustness of our approach, we calibrate the model for random sets of model parameters and calibration conditions.Specifically, we obtain 50 independent samples of parameters and calibration conditions from uniform distributions over the ranges given in Table 4(a).The calibration system given by ( 14)-( 18) was then solved using Newton's Method using the initial conditions in (19).
For each sample, the calibrated value of λ was noted along with the resulting HIV prevalence and the necessary number of iterations of Newton's Method.The results are shown in Table 4(b).For all 50 samples, the calibration approach successfully matched the calibration condition (i.e.HIV prevalence) while requiring no more than 2 iterations of Newton's Method.where is the number of people infected with HIV only, C = C 1 + C 2 + C 3 + C 4 is the number of coinfected individuals, N = S + H + A + C is the size of the total population and P = A+C N is the prevalence of HIV.A flow diagram of the model is given in Figure 6.
An important implicit assumption is made by using the same transmission formulation for HIV and HSV-2 (i.e.Ω = λe −αP n A+r2C N and Ψ = κe −αP n H+r3C N ).Specifically, we assume that the behavioral response only occurs in response to the HIV epidemic (i.e. e −αP n term depends only on HIV prevalence) and that the behavioral response to the HIV epidemic (condom use, partner reduction and other safe-sex practices) would have a corresponding effect on HSV-2 transmission (i.e. e −αP n is included in the HSV-2 transmission terms as well).
6.2.Calibration to HIV and HSV-2 prevalence at equilibrium.In this section, we calibrate the expanded HIV/HSV-2 coinfection model with behavioral response to match both HIV and HSV-2 prevalence at the endemic equilibrium (as was done for the four-compartment confection model in Section 4.2) using the initial values of the HIV and HSV-2 transmission terms (λ and κ, respectively) as calibration parameters.The calibration conditions are given by A = HIV prevalence at endemic equilibrium, H = HSV-2 prevalence at endemic equilibrium.

As in our previous calibration efforts, we begin by rescaling the state variables to represent proportions of the total population by letting
Correspondingly, we will let a = a 1 + a 2 + a 3 + a 4 and c = c 1 + c 2 + c 3 + c 4 .As s = 1 − h − a − c, we can reduce the dimension of the in model (20) to get where the transmission terms have become Ω = λe −αP n (a+r 2 c) and Ψ = κe −αP n (h+ r 3 c).
Combing the nine conditions that ensure the disease is at equilibrium in ( 22)-(30) with the HIV and HSV-2 prevalence conditions in (31) and (32), respectively, we have the following calibration system In order to complete the calibration, we use Newton's Method to find the solution ) of the calibration system ( 22)-(32).For the initial approximation to the solution of the calibration system required by Newton's Method, we again make the simplifying assumption that HIV and HSV-2 infection are independent (i.e. that the total number of coinfected people is c 0 = A H, total number infected with HIV alone is a 0 = A − A H and the total number infected with HSV-2 alone is h 0 = H − A H) and distribute HIV-infected individuals evenly among the four infected stages.To obtain initial approximations for λ and κ, we solve the 2-dimensional system given by ( 22) and (23) for λ and κ.The resulting initial approximation is given by: and .
(35) 6.3.Demonstration of calibration to HIV and HSV-2 prevalence.In this section, we show the robustness of our calibration technique to parameterize the HIV/HSV-2 coinfection model with behavioral response to HIV and HSV-2 prevalence at equilibrium.In Table 5(a), we summarize model parameter values and ranges.Demonstrating the robustness of our approach, we again calibrate the model for random sets of model parameters and calibration conditions.Specifically, we obtain 50 independent samples of parameters and calibration conditions from uniform distributions over the ranges given in Table 5(a).The calibration system given by ( 22)-( 32) is then solved using Newton's Method with initial conditions determined by ( 33)-( 35).
For each sample, the calibrated value of λ and κ is noted along with the resulting HIV prevalence and the necessary number of iterations of Newton's Method.The results are shown in Table 7(b).For all 50 samples, the calibration approach successfully matched the calibration conditions (i.e.HIV and HSV-2 prevalence) while requiring no more than 4 iterations of Newton's Method.
7. Extensions of the calibration approach.In the previous demonstrations of our calibration approach, the mathematical model at hand was always calibrated to disease prevalence at endemic equilibrium.Fortunately, the approach can be generalized to achieve additional calibration conditions.In this section, we show how conditions that relate to the transient phase of an epidemic (the magnitude and timing of peak disease incidence, specifically) can be incorporated into our calibration.
7.1.Exact calibration of Granich et al. model to HIV prevalence at equilibrium, peak HIV incidence and timing of peak HIV incidence.In Section 5.2, we calibrated the Granich et al. model as stated in (12) to HIV prevalence at equilibrium using λ as a calibration parameter and assuming that values for α and n had been established a priori (recall the transmission term of λe −αP n ).As α and n determine the shape of the population's behavioral response to HIV prevalence (see Figure 5), they can be used to incorporate additional detail into our calibration.In this section, we will describe the calibration of the model in (12) to the following three conditions: A = HIV prevalence at endemic equilibrium, I = peak HIV incidence, P = HIV prevalence at time of peak HIV incidence.
To do so, we use the initial value, location and shape of the HIV transmission term (i.e.λ, α and n) as calibration parameters.If the Granich et al. model explicitly included time (i.e. was nonautonomous), we could calibrate the system so that peak HIV incidence occurs at a specific time t * .Since the Granich et al. model is autonomous, we use HIV prevalence at the time of peak HIV incidence, P , as a proxy for the time of peak incidence to calibrate to different types of epidemic curves (e.g.early, mid or late peaking epidemics).
Table 5. Calibration of HIV/HSV-2 coinfection model with behavioral response to HIV and HSV-2 prevalence.
(a) Model parameters and calibration conditions for parameterizing the HIV/HSV-2 coinfection model with behavioral response to HIV and HSV-2 prevalence at endemic equilibrium.‡ Recruitment rate range is same as background mortality to give a total population of 1 in the absence of HIV, without loss of generality.

Symbol
Range References

Model Parameters
Background mortality rate (/yr) µ To create our calibration system, we note that HIV incidence (i.e. the rate of new HIV infections) is given by λe −αi n (1−i)i.To guarantee that HIV incidence is I at the time when HIV prevalence is P , we require that λe −α P n (1− P ) P = I.To ensure that HIV incidence is maximized at this time, we require that d dP λe −αP n (1 − P )P = λe −α P n −α P n n 1 − P + 1 − 2 P = 0. Adding these two additional conditions (noting that λe −α P n = 0), we arrive at the following calibration system of equations: To reiterate, the calibration system imposes that the disease is at equilibrium in (36)-( 39), that the HIV prevalence at endemic equilibrium is A in (40), that HIV incidence is I at the time when HIV prevalence is P in (41) and that HIV incidence is maximized at the time when HIV prevalence is P in (42).
In order to complete the calibration, we use Newton's Method to find the solution ) of the calibration system (36)-( 42).Unlike the previous calibration examples, establishing a suitably robust initial approximation to the root is nontrivial.To do so, we note that at equilibrium we must have that i = A, so from (36) and (41), we have that which implies that Solving (42) for α, we get that Substituting (45) into (44), we get e (46) which can be solved numerically to obtain an initial approximation for n.Once an initial approximation n 0 is established, (45) and (43) are used to obtain an initial approximation for α and λ, respectively.Initial approximations for i 1 , i 2 , i 3 and i 4 can be found by noting that Equations (37)-(40) consist of 4 linear conditions on i 1 , i 2 , i 3 , i 4 .
To summarize, we calibrate our model to HIV prevalence and peak incidence by using Newton's Method to find the solution (i * 1 , i * 2 , i * 3 , i * 4 , λ * , α * , n * ) of the calibration system (36)-(42) using the following initial approximation: 7.1.1.Demonstration of calibration to HIV prevalence, peak HIV incidence and timing of peak HIV incidence.In this section, we use our calibration technique to parameterize the Granich et al.HIV model to HIV prevalence at equilibrium, peak HIV incidence and timing of peak incidence.To demonstrate the robustness of our approach, we calibrate the model for random sets of model parameters and calibration conditions.Specifically, we obtain 50 independent samples of parameters and calibration conditions from uniform distributions over the ranges specified in Table 7(a).Notably, restrictions were placed on the peak HIV incidence, I, and the HIV prevalence at time of peak HIV incidence, P , for epidemiological plausibility.Specifically, we require peak HIV incidence and the HIV prevalence at the time of peak incidence to be from 10-50% and 5-95% of the HIV prevalence at equilibrium, respectively (see Table 7(a)).This is done to avoid identifiability issues that could arise from having epidemiologically implausible calibration conditions such as a very high peak HIV incidence followed by very low prevalence at equilibrium.The calibration system given by ( 36)-( 42) was then solved using Newton's Method with the initial conditions given in (47)-(49).For each sample, the calibrated values of λ, α and n were noted along with the resulting HIV prevalence ( A * ), peak HIV incidence ( I * ), HIV prevalence at the time of peak incidence ( P * ) and the necessary number of iterations of Newton's Method (N ).The results are shown in Table 6(b).
For all 50 samples, the calibration approach successfully matched the three calibration conditions while requiring no more than 3 iterations of Newton's Method (see Table 6(b)).To illustrate the variation in the nature of the epidemic, Figure 7 shows the dynamics of HIV prevalence and HIV incidence of the first 12 calibrations.In each calibration plot, the horizontal dashed blue line indicates the sampled HIV prevalence at equilibrium ( A), the horizontal dashed red line the sampled peak HIV incidence ( I) and the horizontal dashed gray line the sampled HIV prevalence at the time of peak incidence ( P ).Thus, the plot indicates a successful calibration if HIV prevalence (solid blue curve) approaches A (dashed blue line) as t → ∞; HIV incidence (solid red curve) peaks at I (dashed red line); and HIV prevalence at time of peak incidence is P (solid blue curve, horizontal dashed gray line and the dashed vertical gray line indicating time of peak incidence all intersect at the same point).
The minor discrepancies between the sampled and calibrated values of HIV prevalence at peak incidence ( P ) observed in Table 6(b) are due to the transient nature of the system at the time of peak incidence.Consequently, the HIV prevalence at peak incidence is influenced by the imprecision of choosing the time of peak incidence from the discrete set of time points t 0 , t 1 , ..., t K that results from the numerical solution to the dynamical systems model.Table 6(b) also shows a troubling level of variability in the magnitude of calibrated values that determine the population's response to disease prevalence, α * and n * .Most notably, we see in Table 6(b) that α * ranges from values of order 10 1 to 10 149 .In Figure 8, we plot the behavioral response to HIV prevalence via the transmission term e −αP n for the calibrations with largest α values.We note that the shape of the behavioral responses remain biologically feasible despite the magnitude of the α terms.Also in Figure 8, we plot the prevalence and incidence curves for Calibrations #20, 30, 33, 40 and 46 (noting that those of #4 and #10 are in Figure 7).In each calibration, we observe that the scenario of a late-peaking epidemic (i.e. one where the HIV prevalence at the time of peak incidence is very near the HIV prevalence at equilibrium) seems to be producing the large α * values. .Dynamics of HIV prevalence and HIV incidence of the first 12 calibrations: horizontal dashed blue line is sampled HIV prevalence at equilibrium ( A), horizontal dashed red line is sampled peak HIV incidence ( I) and horizontal dashed gray line is sampled HIV prevalence at the time of peak incidence ( P ).Calibration is successful if HIV prevalence (solid blue curve) approaches A (dashed blue line) as t → ∞; HIV incidence (solid red curve) peaks at I (dashed red line); and HIV prevalence at time of peak incidence is P (solid blue curve, horizontal dashed gray line and the dashed vertical gray line indicating time of peak incidence all intersect at the same point).
7.2.Approximate calibration of HIV/HSV-2 coinfection model to HIV prevalence at equilibrium, peak HIV incidence and timing of peak incidence.In Section 6.2, the HIV/HSV-2 model with behavioral response was calibrated to HIV and HSV-2 prevalence at the endemic equilibrium.In the previous section, the Granich et al.HIV model was calibrated to different types of epidemic curves (e.g.early, mid and late peaking epidemics) using peak HIV incidence and the timing of peak HIV incidence.In this section, we attempt to calibrate the expanded HIV/HSV-2 coinfection model with behavioral response (i.e.System (20) in Section 6.1) to HIV and HSV-2 prevalence conditions as well as the the shape of the HIV epidemic (i.e.peak HIV incidence and its timing).The conditions are given by A = HIV prevalence at endemic equilibrium, H = HSV-2 prevalence at endemic equilibrium, I = peak HIV incidence, P = HIV prevalence at time of peak HIV incidence, and the calibration is achieved using the initial values of the HIV and HSV-2 transmission terms (i.e.λ and κ) and the location and shape of the transmission terms (i.e.α and n) as calibration parameters.
To impose the peak incidence condition in the same manner as the previous section, we note that the rate of new HIV infections in the HIV/HSV-2 model with behavioral response is given by λe −αP n (a+r 2 c)(s+r 1 h).Unfortunately, we immediately see that this rate is no longer a function of only P as was the case in (40) in our previous calibration.This is a major issue in setting up a calibration system because peak HIV incidence occurs during the transient phase of the epidemic.Consequently the state variables a, c, s and h at this time are not the endemic equilibrium values that would be enforced in the rest of the calibration system.Overcoming this would require an incidence formulation such as λe −αP n P (1+r 2 ω 1 )(1− P )(1+r 1 ω 2 ), where ω 1 (ω 2 ) are the proportion of HIV positive (negative) individuals that are infected with HSV-2 at the time of peak HIV incidence.Unfortunately, such ω values would depend on the particular initial conditions assumed and would not be available a priori.
As a simpler alternative, we present the following approximate calibration of the model to the four conditions given in (50) using λ, κ, n and α as calibration variables.This two step approach relies on previous calibration systems in the following way: Step 1. Calibrate the Granich et al. model to A, I and P using calibration parameters λ, n and α by solving system (36)-(42) in the previous section.
Step 2. Using the resulting values for HIV transmission terms n and α from Step 1, calibrate the HIV/HSV-2 model with behavioral response to A and H using λ and κ as calibration variables by solving system ( 22)-(32) from Section 6.2.The calibrated parameter values are then n * , α * (outputs from Step 1 which determine the shape of the behavioral response) and λ * , κ * (outputs from Step 2 which impose disease prevalence at endemic equilibrium).7.2.1.Demonstration of calibration to HIV and HSV-2 prevalence at equilibrium, peak HIV incidence and timing of peak HIV incidence.To demonstrate the performance of this approach, we calibrate the model to 50 independent samples of parameters and calibration conditions from uniform distributions over the ranges specified in Table 7(a).The results are shown in Table 7(b).As might be expected given Step 2 of this approximate approach, we see that the model is exactly calibrated to both HIV and HSV-2 prevalence at equilibrium (see A vs A * and H vs H * in Table 7(b)).On the other hand, the calibration is not able to exactly match the conditions related to the transient phase of the epidemic (i.e.peak HIV incidence and timing of peak HIV incidence).Specifically, we see in Table 7(b) that the peak HIV incidence realized via calibration, I * , is higher than the desired value, I, and quite often by a substantial amount.This makes sense because the terms that control the behavioral response to HIV prevalence, namely n and α, are determined in Step 1 before HIV's interaction with HSV-2 infection is included.Including the interaction produces a higher peak incidence level due to the fact that HSV-2 infection increases the rate of new HIV infections from both sides of transmission (i.e.r 1 , r 2 ≥ 1).While the approximate calibration considerably overshoots peak HIV incidence, it is worth noting that the approach does much better in terms of the timing of peak HIV incidence (i.e.P * vs P in Table 7(b)).
8. Discussion.The model calibrations conducted in this work have demonstrated that an exact approach to calibration can be a straightforward and efficient means to parameterize an epidemiological model to surveillance data.On the other hand, the similarities of the scenarios examined herein (e.g.disease prevalence at endemic equilibrium) may cast doubt on the general applicability of this approach in other situations.Hence, it is important to identify limitations and generalizations of the approach that could influence its applicability.
From the examples presented, the exact calibration approach may appear to be limited to diseases at endemic equilibria.However, it is worth noting that there is no such limitation from a technical perspective.As an example, recall our first calibration system (4)-( 8) from Section 4.2.If reliable data were available that suggested a given HIV and HSV-2 prevalence of A and H, respectively, and that the proportions of HIV only, HSV-2 only and coinfected individuals where changing at rates da dt , dh dt and dc dt , respectively, one could simply insert these nonzero rates as the left-hand sides of ( 4)-( 6).The calibration procedure would work exactly the same.However, a practical limitation is evident if we attempt to consider an analogous situation for the HIV/HSV-2 coinfection model with behavior change from Section 6.While it is reasonable that estimates could exist for the rates of change for the aggregate HIV-infected, HSV-2 infected and coinfected proportions of the population, it is unlikely that specific estimates exist for each subdivision (i.e.da1 dt , ..., da4 dt , dc1 dt , ..., dc4 dt ).An exact calibration to estimates of disease prevalence, incidence, etc. may leave the impression that such indicators are known data rather than the imprecise estimates that they are in reality.However, this approach should not be viewed as an alternative to incorporating uncertainty into modeling work.In fact, the ideas presented here could compliment the standard approach of sampling parameter values and filtering described in Section 2. In the standard approach, all model parameters would be independently sampled before running simulations and filtering out the parameter sets which do meet calibration conditions.In a combined approach, model parameters for which strong empirical estimates exist could be randomly sampled and values of calibration parameters (for which reliable empirical estimates do not exist) would be obtained via the exact procedure to match calibration conditions.Parameter sets with infeasible values for the calibration parameters could then be filtered out.With such an approach, the ultimate number of parameters sets that are filtered out (i.e.computational time wasted) would be significantly reduced.Moreover, uncertainty in the calibration conditions (e.g.disease prevalence, incidence) could also be incorporated by sampling values from assumed distributions as was done in each robustness demonstration in this work.
Lastly, it is clear that the exact calibration approach can not be used to fit a model to time series data.While approaches were shown to calibrate to different types of epidemics in Section 7, the approach remains limited to calibration conditions at a few time steps, at most.Hence, numerical fitting algorithms will be required whenever a model must be fit to times series data.In such situations, our approach could improve the performance of numerical fitting algorithms whose efficiency and precision are largely determined by the quality of the initial approximation to the solution.To do this, an exact fitting (analogous to that of Section 7.1) would be used to obtain parameter values for which the model behavior matches the general shape of the data.Those values would then be used as an initial approximation for a numerical fitting algorithm.
While not universally applicable or intended to replace other calibration procedures, the exact calibration process presented in this work streamlines the process of calibrating mathematical models of infectious disease to surveillance data while increasing efficiency and reducing reliance on computational methods.7. Calibration of HIV/HSV-2 confection model with behavioral response to HIV prevalence, HSV-2 prevalence, peak HIV incidence and timing of peak HIV incidence.
(a) Model parameters and calibration conditions for parameterizing the HIV/HSV-2 confection model with behavioral response to HIV and HSV-2 prevalence, peak HIV incidence and timing of peak HIV incidence.‡ Recruitment rate range is same as background mortality to give a total population of 1 in the absence of HIV.

Symbol
Range References

Figure 3 .
Figure 3. HIV prevalence in South Africa (% of population ages 15-49 infected with HIV) from 1990-2013.Data from the World Bank, World Development Indicators [18].
total number of HIV-infected individuals, N = S + I 1 + I 2 + I 3 + I 4 is the total population size and P = I/N is the prevalence of HIV.We have modified the original model of Granich et al. by replacing density depending recruitment with constant recruitment, Λ, and by neglecting classes of individuals on ART.

1 (
a) e −αP n with varying values of n and fixing α = 3. e −αP n with varying values of α and fixing n = 3.

Figure 5 .
Figure 5. Illustration of population's behavioral response to HIV prevalence in the Granich et al. model via the transmission term e −αP n .Notably, if n = 1 sexual risk behavior decreases exponentially as a function of HIV prevalence.As n → ∞, the decline in sexual risk behavior approaches a step function.

6 . 6 . 1 .
HIV/HSV-2 coinfection model with behavioral response.In this section, we consider an HIV/HSV-2 coinfection model that combines the interactions of HIV and HSV-2 infection from the model in Section 4.1 and the behavioral response to the HIV epidemic of the Granich et al. model described in Section 5.1.The model.The model includes ten classes of individuals: S, susceptible to infection by both HIV and HSV-2; H, infected with HSV-2 only; A 1 , A 2 , A 3 , A 4 , infected with HIV only; C 1 , C 2 , C 3 , C 4 , co-infected with both HIV and HSV-2.The model equations are given by

Figure 6 .
Figure 6.Flow diagram of the HIV/HSV-2 coinfection model with behavioral response.The rate of transmission for HIV is given by Ω = λe −αP n A+r2C N where A = A 1 + A 2 + A 3 + A 4 is the number of people infected with HIV only, C = C 1 + C 2 + C 3 + C 4 is the number of coinfected individuals, N = S + H + A + C is the size of the total population and P = A+C N is the prevalence of HIV.The rate of transmission for HSV-2 is given by Ψ = κe −αP n H+r3C N .

Figure 7
Figure 7. Dynamics of HIV prevalence and HIV incidence of the first 12 calibrations: horizontal dashed blue line is sampled HIV prevalence at equilibrium ( A), horizontal dashed red line is sampled peak HIV incidence ( I) and horizontal dashed gray line is sampled HIV prevalence at the time of peak incidence ( P ).Calibration is successful if HIV prevalence (solid blue curve) approaches A (dashed blue line) as t → ∞; HIV incidence (solid red curve) peaks at I (dashed red line); and HIV prevalence at time of peak incidence is P (solid blue curve, horizontal dashed gray line and the dashed vertical gray line indicating time of peak incidence all intersect at the same point).

Figure 8 .
Figure 8. Behavioral response to HIV prevalence via the transmission term e −αP n for the calibrations with largest α values and the prevalence/incidence curves for those calibrations.Note that the prevalence/incidence curves of Calibrations #4 and #10 are shown in Figure 7.
(a) Model parameters and calibration conditions for parameterizing the Granich et al. model to HIV prevalence, peak HIV incidence and timing of peak HIV incidence.‡ Recruitment rate range is same as background mortality to give a total population of 1 in the absence of HIV, without loss of generality. of A HIV prevalence at time of peak HIV incidence P 5% -95% of A Calibration Parameters Initial value of transmission term (/yr) λ determined by calibration procedure Location of transmission term α determined by calibration procedure Shape of transmission term n determined by calibration procedure (b) Results of calibrating the Granich et al. model to 50 independent sets of parameters and calibration conditions (randomly sampled from ranges above).For each sample, the resulting values of the calibration parameters (λ * , α * , n * ) and the resulting calibration values ( A * , I * , P * ) are given.The number of iterations of Newton's Method required to achieve a stopping criterion of max{|λ i − λ i−1 |, |α i − α i−1 |, |n i − n i−1 |} < 10 −9 is denoted by m. 031 0.028 0.306 0.095 0.016 0.011 224.91 1.22212E+01 0.2 0.095 0.016 0.011 2 0.017 0.027 0.115 0.054 0.017 0.036 Table determined by calibration procedure (b) Calibration of HIV/HSV-2 confection model with behavioral response to 50 independent sets of parameters and calibration conditions.For each sample j = 1, 2, ..., 50, the resulting values of the calibration parameters (λ * , κ * , α * , n * ) and the resulting calibration values ( A * , H * , I * , P * ) are given.Number of iterations required to satisfy the stopping criteria of Steps 1 and 2 (i.e.max{|λ i −λ i−1 |, |α i − α i−1 |, |n i − n i−1 |} < −9 and max{|λ i − λ i−1 |, |κ i − κ i−1 |} < 10 −9) denoted m 1 and m 2 , respectively.

Table 1 .
Model parameters and calibration conditions for the four-compartment SI-type HIV/HSV-2 coinfection model.† Values used in the high HIV burden example; South Africa.‡ Values used in the low HIV burden example; United Kingdom.

Table 3 .
Robustness of calibrating the four-compartment SI-type HIV/HSV-2 coinfection model to HIV and HSV-2 prevalence at the endemic equilibrium.For each sample j = 1, 2, ..., 50, the resulting values of the calibration parameters (β * , σ * ) and the resulting disease prevalences at equilibrium ( A * , H * ) are given.The number of iterations of Newton's Method required to achieve a stopping criterion of max{|β i − β i−1 |, |σ i − σ i−1 |} < 10 −9 is denoted by m.

Table 4 .
[11]bration of Granich et al. model to HIV prevalence.(a)Modelparameters and calibration conditions for parameterizing the Granich et al. model with behavioral response to HIV prevalence at endemic equilibrium.†Baselinevalue taken from[11]; range is modeling assumption of this work.‡ Recruitment rate range is same as background mortality to give a total population of 1 in the absence of HIV, without loss of generality.
determined by calibration procedure (b) Results of calibrating the Granich et al. model to 50 independent sets of parameters and calibration conditions (randomly sampled from ranges above).For each sample j = 1, 2, ..., 50, the resulting values of the calibration parameter (λ * ) and the resulting HIV prevalence at equilibrium ( A * ) are given.The number of iterations of Newton's Method required to achieve a stopping criterion of |λ i − λ i−1 | < 10 −9 is denoted by m.

Table 6 .
Calibration of Granich et al. model to HIV prevalence, peak HIV incidence and timing of peak HIV incidence.