Mathematical Analysis and Dynamic Active Subspaces for a Long term model of HIV

Recently, a long-term model of HIV infection dynamics was developed to describe the entire time course of the disease. It consists of a large system of ODEs with many parameters, and is expensive to simulate. In the current paper, this model is analyzed by determining all infection-free steady states and studying the local stability properties of the unique biologically-relevant equilibrium. Active subspace methods are then used to perform a global sensitivity analysis and study the dependence of an infected individual's T-cell count on the parameter space. Building on these results, a global-in-time approximation of the T-cell count is created by constructing dynamic active subspaces and reduced order models are generated, thereby allowing for inexpensive computation.


1.
Introduction. The Human Immunodeficiency Virus (HIV) disables many components of the body's immune system and, without antiretroviral treatment, leads to the onset of Acquired Immune Deficiency Syndrome (AIDS). Despite the vast amount of study devoted to understanding viral pathogenesis and developing new therapeutics, no procedure or medication currently exists to reliably eliminate the virus from a host. However, new advances in long-term treatment strategies and insight into disease dynamics have stemmed from mathematical and computational modeling approaches, in addition to clinical experimentation.
A variety of mathematical models have been proposed to describe HIV infection and disease dynamics [1,11,12,13,14,15,16,18,22,23,26]. Unfortunately, using a model to capture the entire time course of infection within the body can be troublesome as many oversimplify the biological dynamics of the disease in an effort to gain mathematical and rudimentary biological insight, and fail to capture all stages of infection. The majority of models accurately capture only the first stage(s) of infection [1,11,12,15,16,17,20] with the T-cell count and viral load asymptotically approaching a nonzero limit -the latter referred to as the viral set point. One recent description has been able to provide a holistic understanding of disease dynamics by accurately capturing all three stages of infection. This model, proposed in [8], is comprised of a system of seven nonlinear autonomous differential equations that are fully coupled and augmented by twenty-seven distinct parameters. In this paper we investigate the dynamical properties of the model and establish a result concerning its large-time asymptotic behavior. Further, we analyze this model, utilizing mathematical and statistical methods to elucidate the contribution of the parameter space on an infected individual's T-cell count, and approximate solutions as a function of time.
The paper proceeds as follows. Within the next section, all infection-free steady states of the model are determined and the local asymptotic stability properties of the biologically-relevant equilibrium are studied. The associated theorems have interesting implications for the model's predictive nature, especially upon the introduction of antiretroviral therapy. In Section 3, we utilize active subspace methods to perform a global sensitivity analysis of the model with respect to its parameter space and further investigate the dependence of an infected individual's T-cell count on system parameters. With this information, a global-in-time approximation of the T-cell count is constructed using dynamic active subspaces. Additionally, various reduced models are constructed to represent different stages of the disease, and these are discussed in detail. In general, active subspace methods are a useful tool to perform global sensitivity analysis of a given parameter space, provide a clear picture of the most important activity in a model arising from parameter variation, construct dimensionally-reduced approximations to complex, dynamical models, and execute inexpensive numerical approximations from models that require computationallyintensive simulation. This is of particular importance in the current context as parameters for the original long-term HIV model are individual-dependent, and therefore must be determined for each new patient. Hence, the construction of a significantly less expensive computational approximation will allow one to utilize the model for large sets of patient data. Finally, we note that all of the MATLAB scripts and functions used to generate our results are provided free, open source, and available to the public at http://inside.mines.edu/~pankavic/activeHIV.

2.
Model Description, Parameters, and Analysis. To begin, we consider the following long term model of HIV disease dynamics within a host, as recently formulated in [8]: dCT L dt = s 3 + (K 7 T I + K 8 M I )CT L − δ 6 CT L dV dt = K 9 T I + K 10 M I − K 11 T V − ( The population of CD4 + T-cells, denoted here by T (t), is one of the most critical components in determining the body's response to HIV infection, and the first equation represents its time evolution. In (1) the T-cell population is increased by a standard source, s 1 , which represents the constant supply rate of immunocompetent T-cells from the thymus, and a nonlinear generation term p1 C1+V T V , which accounts for the appearance of new T-cells due to the immune system's response to the infection. In contrast, because T -cells have a finite lifespan, a natural death term δ 1 T is also included. The last two terms in the first equation model the infection of T-cells by either virions, the population of which is denoted by V (t), or infected macrophages, M I (t), at rates K 1 and K 2 , respectively. The latter term is introduced in (1) as studies have shown that infected macrophages likely play a vital role in the progression of infection [7,10] by producing large amounts of virus even after the T-cell population has been depleted.
The second equation describes changes within the actively-infected T-cell population (T I ), the first term of which represents the addition to this compartment due to new infections. However, not all interactions between virions and healthy T-cells produce actively-infected T-cells. Some proportion ψ of these new infections contribute to the T I population, while the remainder (1−ψ) lack necessary host factors, resulting in the absence of viral protein expression, and become latently-infected T-cells (T L ), contributing to the third equation. The infected T-cell population adds, in addition to the number created by the virions and macrophages, a supply of newly-activated latent T-cells, at a per-capita rate α 1 . In turn, this portion of the T L population is lost within the third equation. Infected T-cells are lost due to natural death, which is represented by δ 2 T I , while a more interesting term, K 3 T I CT L, represents the loss of infected T-cells due to cytotoxic lymphocytes (CT L), one of the many attacker cells the immune system employs. Latently-infected T-cells also possess a natural death rate denoted by δ 3 . Similar to T-cells, the macrophage population (M ) possess a natural birth rate, s 2 , as well as a natural death rate, δ 4 . As discussed in [16], macrophages divide and become more aggressive in seeking out pathogens once they are alerted of a viral presence by CD4+ T-cells. Hence, the macrophage population is increased in response to HIV infection at a rate of K 4 . Macrophages also attempt to eliminate virions, but may also become infected, adding to the infected macrophage population (M I ). Infected macrophages naturally expire at a certain rate δ 5 , and are also destroyed by cytotoxic lymphocytes at a rate K 6 . Once infected, macrophages produce virions at a rate given by K 10 . Infected macrophages may also infect healthy T-cells, and the rate at which this occurs is denoted by K 2 .
The main defender of the body against infected cells is the cytotoxic lymphocyte, the population of which is denoted by CT L. These cells seek to destroy renegade T-cells and macrophages that have been infected and altered by HIV. As for the other components of the immune system, we assume that cytotoxic lymphocytes are produced at a constant rate s 3 by the bone marrow. Additionally, new lymphocytes are produced proportionate to the other aspects of the body's immune response. Hence, the term (K 7 T I + K 8 M I )CT L, occurs within the sixth equation to represent the recruitment of new lymphocytes.
Lastly, the growth of the virion population V depends on a variety of parameters. New virions are continually produced by infected T-cells and infected macrophages at rates K 9 and K 10 , respectively. In addition, the virion population is decreased through a variety of means. Due to infection, virions are lost at the rates K 11 and K 12 , proportional to the interaction of T-cells and macrophages with virions, respectively. Also, macrophages, responding to antibodies produced by a host, ingest and destroy virions at a rate K 13 . Finally, virus particles are cleared from the host by other factors, including the innate immune response, at a rate δ 7 .
Note that all parameter values in (1) are positive. Typical values and ranges for the parameters taken from [8] can be found within Table 1. The time course of infection predicted by this model with the established parameter values was shown to agree well with clinical data from [5,6,21], and representative simulations of (1) with realistic parameter values are displayed in Figure 1. 2.1. Infection-free steady states. Though the system (1) possesses a large number of steady states -the authors have discovered at least ten using standard parameter values and a computational root finder -one is often most interested in understanding the dynamical properties of the disease-free equilibrium. In this section, we identify such equilibria and investigate the stability of the biologicallyrelevant state. Our first result demonstrates that only one such equilibrium state exists when all populations of (1) are positive.
Theorem 2.1. The model (1) possesses exactly two virus-free (i.e. V ≡ 0) steady states. One of these states, namely , ωK 10 , ωK 10 ξ K 6 (α 1 + δ 3 ψ) , achieves negative values, where ω = s 3 K 6 + δ 5 δ 6 δ 5 (K 7 K 10 − K 8 K 9 ) and ξ = (1 − ψ)(δ 2 K 6 − δ 5 K 3 ).  The only nonnegative (i.e. biologically relevant) steady state of (1) satisfying V ≡ 0 is Hence, the only guarantee of viral clearance as t → ∞ occurs when actively and latently infected populations are also eradicated, resulting in healthy T-cell and macrophage populations tending asymptotically to background values. The proof of Theorem 2.1 is contained in Appendix A. Utilizing standard parameter values for this model from Table 1, we find the following equilibrium populations for E: Since the parameter values in (1) are positive, the steady state E given in Theorem 2.1 must have a negative cytotoxic T-lymphocyte population, namely − δ5 K6 . So, under no parameter regime will E be biologically relevant. With the unique infection-free steady state identified, we turn to its stability properties.
The proof of Theorem 2.2 is also contained within Appendix A. Computing the basic reproduction number of Theorem 2.2 by using the standard parameter values given in Table 1, we find that R 0 = R 1 = 953 >> 1. Hence, as expected, the non-infective steady state E N I is not locally asymptotically stable.
Introducing antiretroviral therapy, or ART, into the system provides additional insight into this result. Two specific classes of ART drugs, namely Reverse Transcriptase Inhibitors (RTIs) and Protease Inhibitors (PIs), serve to reduce the amount of new virus produced by either reducing the ability of virions to replicate through reverse transcription or disabling the capability of newly-produced virions to mature, thereby rendering them uninfective. The efficacies of these classes of drug, denoted RT I , P I ∈ [0, 1], can be incorporated using the transformations K 1 → K 1 (1− RT I ), K 5 → K 5 (1− RT I ), K 9 → K 9 (1− P I ), K 10 → K 10 (1− P I ).
As these constants appear within the stability result and decrease each of the ratios R 1 , R 2 , and R 3 , it follows that a large enough efficacy will force the system to tend towards E N I as t → ∞. Therefore, one merely needs to perform this transformation on the result of Theorem 2.2 in order to determine what efficacy is needed to guarantee viral clearance. Using the previously-determined parameter values and defining ∈ [0, 1] by we find > 0.998 in order to force R 0 < 1. Thus, a combined drug efficacy greater than 99.8% would be needed to asymptotically drive the system to clearance. Of course, lesser drug efficacies could still give rise to viral loads that are effectively negligible rather than tending to zero, and therefore correspond to viral clearance. Notice that the asymptotic stability result in Theorem 2.2 depends only upon a relatively few number (15 of 27) of the parameters. Therefore, the majority of the pertinent dynamics takes place on a lower-dimensional subspace of the entire parameter space. Hence, in the next section we explore a dynamic tool to better understand the contribution of the parameter space to populations within the model. In particular, this will lead to reducing the dimension of the parameter space with minimal loss of information using an active subspace decomposition.
3. Dynamic Active Subspaces, Sensitivity, and Reduced Models. In this section we will use active subspace methods to approximate the T-cell count at a specific time given the parameter values in (1). We closely follow the material as developed by Constantine and co-authors [2,3]. An active subspace is a lowdimensional linear subspace of the set of parameters, in which input perturbations along these directions alter the model's predictions more, on average, than perturbations which are orthogonal to the subspace. These subspaces allow for a global measurement of sensitivity of output variables with respect to parameters, and often the construction of reduced-order models that greatly decrease the dimension of the parameter space.
3.1. Active Subspace Methods. The general structure of an active subspace decomposition begins by letting m ∈ N be given and defining the space X = [−1, 1] m . Also given is a differentiable function f : X → R and an associated probability density ρ : Here, the space X represents a normalized set of parameter values. With these quantities in place, consider the matrix C defined by For any smooth f , the matrix C represents an average derivative functional which weights input values according to the density ρ. In general terms, f (x) represents the quantity of interest in a given model, while gradients of f are taken with respect to normalized model parameters x ∈ X, and ρ(x) is the probability density associated to the values of these parameters. We note here that a single normalized parameter is a random variable taking values in [−1, 1], which when appropriately scaled represents a parameter in the original model (1). Since the dimension of the parameter space in this model is 27, we take m = 27 throughout. The matrix C is the average of the outer product of the gradient of f with itself and has some useful properties that will allow us to deduce information about how f is altered by perturbations in its arguments.
Considering each entry of the matrix we note that C is symmetric, and thus permits the spectral eigendecompostion Here, W is an orthogonal matrix whose columns w i , (i = 1, . . . , m) are the orthonormal eigenvectors of C. From (3) we can further solve for the eigenvalues of C, which are given by From (4) we see that the eigenvalues of the C matrix are the mean squared directional derivatives of f , in the direction of the corresponding eigenvector. Thus, the eigenvalues of C provide useful information about the quantity of interest. For instance, if a particular eigenvalue is small then (4) tells us that, on average, f does not change significantly in the direction of the corresponding eigenvector. Conversely, if the eigenvalue under consideration is large, then we may deduce that f changes considerably in the direction of the corresponding eigenvector. Therefore, it will be of interest to further investigate the behavior of the function in this direction. Once the eigendecomposition (3) has been determined, the eigenvalues and eigenvectors can be separated in the following way: where Λ 1 contains the "large" eigenvalues of C, Λ 2 contains the "small" eigenvalues, and W k contains the eigenvectors associated with each Λ k , for k = 1, 2. An easy way to differentiate between the "large" and "small" eigenvalues is to list them on a log plot from greatest to least and determine a spectral gap. This gap will correspond to differences of at least an order of magnitude, and thus allow one to compartmentalize large eigenvalues within Λ 1 and the remaining smaller eigenvalues in Λ 2 . A more systematic method of choosing how many eigenvalues to store within Λ 1 will be presented in Section 3.2.
With the decomposition (5), we can represent any element x of the parameter space by Thus, evaluating the quantity of interest at x is equivalent to doing so at the point By the definition of W 1 and W 2 it's clear that small perturbations in z will not, on average, alter the values of f . However, small perturbations in y will, on average, change f significantly. For this reason we define the range of W 1 to be the active subspace of the model and the range of W 2 to be the corresponding inactive subspace. The linear combinations that generate these subspaces will then represent the contributions of differing parameters in the model and describe the sensitivity of the quantity of interest with respect to parameter variations. In general, the eigenvalues and eigenvectors of C defined by (2) can be wellapproximated, using a random sampling algorithm. We will briefly outline the method, but full details can be found in [2] (Algorithm 3.1) and [3]. The algorithm can be described concisely as follows: 1. Draw N samples {x j } N j=1 independently according to the density ρ.

For each parameter sample
represents a vector perturbation from the sampled parameter values and δ > 0 can be taken arbitrarily small. 3. Approximate the matrix C by We note that the last step is equivalent to computing the singular value decomposition of the matrix where it can be shown that the singular values are the square roots of the eigenvalues ofĈ and the left singular vectors are the eigenvectors ofĈ. The singular value decomposition method of approximatingĈ was developed first in [25].

3.2.
An Illustrative Example -Approximating T(1700). Next, we will demonstrate the active subspace method at one specific point in time by applying the aforementioned algorithm to the HIV model (1). A suitable quantity of interest would likely involve the T-cell or virus population evaluated at a fixed time. For this example we select T (1700), the T-cell count 1700 days after initial infection. This quantity was chosen because the T-cell count is a strong indicator of a patient's overall health and the most important factor in the progression of HIV infection. The time of 1700 days after initial infection was chosen because regardless of parameter values, preliminary simulations have shown that the patient's T-cell count within (1) will not have decreased to zero by this time. However, if a later time is chosen, the patient's T-cell count could vanish before the final time is reached. In order to compute C, one must be able to construct the gradient of the quantity of interest with respect to the normalized parameter space as required in Step 3 of the algorithm. Since we do not have an explicit representation for T (1700) as a function of system parameters, we cannot explicitly compute gradients. Instead we approximate them using the aforementioned finite difference scheme with a step size of δ = 10 −6 . Additionally, each sample is chosen so that every normalized parameter is uniformly distributed between −1 and 1, i.e. x j ∼ (U [−1, 1]) 27 . In order to map this normalized parameter space onto the biologically-relevant range of parameter values, we use the linear mapping for each of the random samples x j ∼ (U [−1, 1]) 27 , where x u and x l are vectors containing the upper and lower bounds on the parameters, respectively. Thus, the resulting vector p represents the actual parameter values input within the model. Next, a stiff differential equations solver (MATLAB's ode23s function) is used to  compute the T-cell count after 1700 days. Then, each of the 27 parameters is perturbed by δ = 10 −6 , and again the corresponding T-cell count after 1700 days is computed. With these two values, the aforementioned finite difference approximation is used to calculate the gradient of f (x) = T (1700; x) with respect to the normalized model parameters. Regarding the upper and lower limits, x u and x l are taken to be 2.5% above and below the standard values given in Table 1. Figure  2 displays the approximate eigenvalues of the corresponding C matrix. Clearly a spectral gap exists between the first and second eigenvalues. In order to automatically determine the optimal decomposition of Λ we can use the relative measure of separation given byλ Then, the dimension of the active subspace, i.e. the number of eigenvalues stored within Λ 1 , will be given by dim = argmax k=1,...,26λ k .
While the index of the largest value ofλ k describes the location of the largest spectral gap, it is often convenient to consider only the first two valuesλ 1 andλ 2 . Doing so limits the dimension of the active subspace to one and two respectively, which allows for easy visualization of the quantity of interest as a function of the active subspace and allows one to fit a curve or surface to the data. Plotting the values ofλ k results in Figure 4. Clearly, with this measure of separation, the optimal choice for the dimension of the active subspace is merely one. Consequently, we store λ 1 in the matrix Λ 1 and the remaining eigenvalues λ i , i = 2, . . . , 27, along the diagonal of Λ 2 . The active subspace is then generated by linear combinations of the entries of w, the first eigenvector. Figure 3 displays the eigenvector corresponding to the maximal eigenvalue shown in Figure 2, and it can be seen that three parameters possess associated weights greater than 0.3. These are the 9th, 22nd, and 25th parameters as ordered in Table 1. From the table these parameters can be identified as K 4 , δ 4 , and δ 7 , which represent the increase in macrophage population due to the immune system, the death rate of the macrophage population, and the death rate of the virus population, respectively. Hence, small perturbations in these parameters will significantly alter the value of T (1700), as they are the most heavily weighted. Contrastingly, changes within the remaining parameters, whose weights are near zero, will not have an appreciable affect on T (1700).
Plotting T (1700) along the active subspace results in Figure 5. Here, the horizontal axis is represented by values of the first active variable y = x · w, which represents a linear combination of the normalized parameters x with weights given by entries of the first active variable vector w. Adopting the terminology of [2], we will refer to plots of the quantity of interest along the active subspace as sufficient summary plots. Figure 5 (left) shows a clear trend, namely that T (1700; y) is a decreasing function of the active variable. Additionally, the data points within the sufficient summary plot can be fit to a particular function. In this case, a four parameter arctangent function is fit to the data to approximate the quantity of interest. This was performed using the MATLAB function lsqcurvefit which minimizes the residual (in the least squares sense) of the difference in the data and the approximation. Hence, we have determined that the data is best fit by T (1700; y) = −79.2532 − 492.5680 tan −1 (0.8933y − 1.9069), (11) and the resulting curve can be found in the plot on the right side of Figure 5. In order to test the accuracy of the nonlinear approximation, 100 simulations were run and the relative error computed. The results are displayed in Figure 6 and indicate that the approximation is typically within 5% from the value computed by computationally solving (1) to determine T (1700). Stated another way, over 90% of the test simulations returned approximations whose error was 5% or less. Of course, not every eigenvalue decomposition will necessarily result in a onedimensional active subspace of the model parameters. For instance, should one wish to approximate the T-cell count 2000 days after initial infection, the same order reduction method can be employed and the eigenvalues of the C matrix can be computed. The plot of the eigenvalues of C can be seen in Figure 7 (left). In this case the largest spectral gap exists between the second and third eigenvalues rather than the first and second, and therefore a more descriptive choice for the dimension of the active subspace is two instead of one.
The two weight vectors corresponding to these dominant eigenvalues are then computed and sufficient summary plots are obtained. Figure 8 shows the one and two dimensional sufficient summary plots, respectively, for the approximation of T (2000). Considering the two dimensional sufficient summary plot in Figure 8 (right), it can be seen that very little variation occurs within the T-cell count in the vertical direction, which represents the second active variable. All of the variation in the T-cell count appears to transpire mostly in the horizontal direction. Also, in Figure 8 (left) we can see that the one dimensional sufficient summary plot clearly shows a distinct trend. For these reasons, the one-dimensional active subspace representation provides a sufficient description of the dynamics in the parameter space, while the second active variable doesn't appear to contribute as greatly. Hence, even though the largest spectral gap appears between the second and third eigenvalues, a single linear combination of the parameters captures the overwhelming majority of the dynamics, and only the one-dimensional representation is utilized. Figure 7   Since this algorithm can be used to approximate the T-cell count at any fixed time, we can further compute active subspaces to reduce complexity in the parameter space and obtain a simple time course for T (t). We refer to such a model as a dynamic active subspace approximation. Within the next section we will demonstrate precisely how this reduced description can be constructed.
3.3. Dynamic Active Subspaces. Now that the approximation for T (1700) has been computed, this method can be repeated for a predetermined discretization of the time domain in order to create a global-in-time approximation for the T-cell count, say T (t; y). First, it is necessary to choose the mesh on which T (t) will be approximated. In this case the time interval [0, 3400] is divided into 86 non-uniform subintervals and the active subspace decomposition, along with the associated eigenvalues and eigenvectors, is used to determine a functional approximation at each time step. Next, one must orient the eigenvectors to point in approximately the same direction so that they transition smoothly from one time step to the next. By this we mean that the magnitude of the components of the consecutive weight vectors differ only slightly, but because the normalized eigenvector decomposition is only unique up to a sign, weight vectors at different time steps must be oriented so that they do not point in opposing directions. Upon correctly orienting the weight vectors, nonlinear curve-fitting is employed to compute the functions that best fit the associated sufficient summary plots.
Once the approximations have been computed at each time step, the final task is to dynamically assemble them using linear basis functions, namely where w(t) = 86 i=1 w i φ i (t) for all t ∈ [0, 3400] is a linear interpolation of the first active variable vector, w i , within the i th time interval, y(t) = x · w(t) is the corresponding active variable, T i (y) is the approximation to the T-cell count within the i th time interval, and φ i (t) is the hat function on the interval [t i−1 , t i+1 ] given by Recall that the original parameter values p and the normalized parameters x can be interchanged using (8). Here we use the notation T (t; x) instead of T (t; y) because within the global-in-time approximation, the active variable vector, w and hence the active variable y, changes within differing time intervals. Of course, a higherorder approximation can be obtained by expanding φ i (t) in a polynomial basis of greater degree. The results of computing the active subspace and sufficient summary plot for 18 of the 86 time steps can be found in Appendix B. With this we see that trends in the sufficient summary plot transition smoothly from one time step to the next and at each fixed time t the T-cell count can be represented by one of three distinct functional forms. In particular, curves that are fit to the data generated within sufficient summary plots transition from (1) linear to (2) arctangent trends at or around 55−65 days after initial transmission, while for times greater than or equal to 1800 days, the trend resembles (3) an arctangent function multiplied by a heaviside step function. The last of these aproximations occurs during later periods of the time course when the T-cell count may vanish for certain values of the active variable. For example, Figure 9 displays a sufficient summary plot of the T-cell count at 2600 days after infection occurs. For large values of the first active variable, say y ≥ 0.5, we see that T (2600; y) = 0 since a proportion of sample runs result in a patient's T-cell count tending to zero prior to 2600 days. Contrastingly, for y < 0.5, the Tcell count trend resembles an arctangent function. These three specific functional approximations and their transitions comprehensibly display the biological aspects of all stages of infection dynamics. Before discussing this further, a minor discussion of HIV disease pathogenesis is needed.
The time-course of HIV infection is characterized by three distinct stages: acute (or primary) infection, chronic infection (also referred to as the clinical latency stage), and the transition to AIDS. The first of these phases takes place within 10 − 20 weeks of initial introduction of the virus within a host and is characterized by a rapid fluctuation in the T-cell and virion population. With respect to the Tcell population there is initially a rapid decrease from the introduction of the virus, and then a rapid rebound arising from the body's immune response. Symptoms during this phase of the infection include fever, swollen glands, fatigue, rash, and sore throat. The next stage, chronic infection, ranges from a number of years to over a decade without treatment. During the chronic period the T-cell and virion populations remain at relatively constant levels, with the T-cell population decreasing at a particularly slow rate and the number of virions increasing steadily. During the last stage of infection, the transition to AIDS occurs as the T-cell population reaches a density lower than 200 cells per mm 3 . The progression to AIDS is typically associated with a sharp decrease in the T-cell population within a year or so. From the sufficient summary plots in Figure 10 and Appendix B, the three distinct stages of the disease become clear. The three functional forms arising from sufficient summary plots precisely separate the three distinct stages of HIV disease progression within an infected individual -the initial arctangent function representing the acute phase, followed by a slow linear decline that denotes the asymptomatic or chronic phase, and finally the heaviside arctangent function detailing the decline of the T-cell count as a patient develops AIDS. This description of the T-cell population provides a detailed visual account of the three stages and the transitions between them. For clarity, a representative graph of each trend in the data is provided in Figure 10. Therefore, the abrupt changes in the T-cell count arising within the one-dimensional active subspace of parameters completely categorize the stages of disease pathogenesis.
In addition to separating these stages of the disease, the active subspace method allows for the creation of three different types of approximate models. First, a visual representation of the quantity of interest -in this case, the T-cell countas a function of the most pertinent linear combination of parameters within the original model is provided. Additionally, the method allows for the construction of an explicit, analytic model by combining nonlinear function approximations such as (11) over an interval of time. In this direction a second approximation method is available instead of utilizing basis functions as in (12). Namely, one may prescribe the functional form during a particular stage of the disease and fit time-dependent coefficients to the transitions within sufficient summary plots. For instance, one may express the arctangent approximation over the timespan t ∈ [0, 55] by for well-behaved functions a, b, c, and d that are fit to the changes in the data. This type of explicit approximation would be easiest to implement during the asymptomatic phase for which T (t; y) = m(t)y + b(t) and the slope and T -intercept functions, m(t) and b(t) respectively, are given by the piecewise-defined or smoothed functions approximated in Figure 11. Lastly, the nonlinear fits arising from the sufficient summary data can be easily stored and supply the basis for a low-cost computational approximation without the need to simulate the original model over long time periods. In contrast to simulating the full system of ODEs for each new set of parameter values, this computational model need only be precomputed once and can easily describe which of the parameters are most important and during which stages they significantly alter the biological quantities of interest. As parameters within the original HIV model (1) vary amongst patients, they must be recomputed for each individual, and the computational savings provided by the reduced model is vital.
Utilizing the dynamical algorithm represented by (12), this computational model can be constructed and compared with a representative simulation of the full dynamical system. Figure 12 displays both the simulated T-cell count and its globalin-time approximation using dynamic active subspaces. We note that because the active subspace method is global with respect to the parameter space, the precise values of parameters in Table 1 do not necessarily influence the structure of the active subspace model, as long as a feasible range of parameter values is available.
In order to test the accuracy of the active subspace approximation to solutions of (1), 100 independent simulations were performed and the relative error was computed. Within these error calculations a uniformly-distributed random time was selected along with a uniformly-drawn selection of the parameter values within their respective ranges. The result is shown in Figure 13, and displays that for 96% of simulations the analytic approximation varied less than 5% from the value given by the stiff differential equation solver. Hence, one may conclude that the globalin-time active subspace model well-approximates solutions to the original system of ODEs given in (1).

3.4.
Dimension Reduction in the Parameter Space. As determined in the previous section, not all parameters are required to describe the behavior of solutions within the model and those that are needed may not be important during each stage. Therefore, it makes sense to investigate the construction of reduced models, namely those which eliminate the contributions of certain parameters. Though the active subspace method has reduced the dimension of the parameter space upon which the T-cell count depends, all of the parameter values are still needed in order to compute the approximate solution, as the reduced parameter space has been expressed merely as a linear combination of the original parameter values, i.e. y = x·w. However, the weights (or coefficients) within this linear combination, given by w, should provide a clear method to reduce the original parameter space as well. For instance, assume that a parameter, say x 3 , possesses a corresponding weight entry w 3 that is very small in comparison to the other weights. Then, variations in x 3 will have little to no effect on the reduced parameter space y, and hence will not appreciably influence the output variable T (t; y). Thus, one can merely eliminate x 3 within the model or set x 3 ≡ 0 rather than considering it as a parameter whose value could possibly vary depending upon the patient.
While it would be most beneficial to determine those weight vectors that remain small throughout the entire course of infection and remove the corresponding parameters, this is an unlikely scenario as different parameters will typically influence different stages in some substantial manner. For this reason, a true global-in-time parameter reduction is highly unlikely. Indeed, the computed weight vectors for this study show that at most 3 of the 27 parameters -namely, K 11 , K 12 , and K 13could be eliminated without introducing enormous variations in the behavior of the model during some stage of infection. Such a limited reduction in complexity would be marginally beneficial to the expense of computations. Instead, we may separate the dynamics into the three distinct stages of the disease and use parameter reduction to create new models for each stage separately. By eliminating the associated interactions from (1) we may derive a simpler system that still accurately predicts each of the three phases of infection.
As an illustrative example, we consider the weight vectors arising within the first 40 days of initial infection and remove parameters whose weights remain below a fixed threshold throughout this interval of time. Using a relatively strict threshold, in this case 0.032, a number of parameters are removed from (1), namely s 2 , s 3 , K 2 , K 4 , K 5 , K 6 , K 7 , K 8 , K 10 , K 11 , K 12 , K 13 , δ 3 , δ 4 , δ 5 , δ 6 , and α 1 . With these parameters eliminated, the T L , M , M I , and CT L populations decouple from the remaining equations. This implies that latently-infected T-cells, macrophages (both healthy and infected), and cytotoxic lymphocytes do not play an important role in the early behavior of the disease, i.e. within the first five to six weeks of introduction of the virus within the body. Because these populations decouple, it's necessary to remove K 3 (whose maximum weight during the first 40 days is 0.1256) and set ψ = 1, thereby yielding the reduced system for the acute stage, namely This model is an augmented form of the well-known three-component model (see [24]) with an additional Michaelis-Menten term within the T-cell population, which accounts for the homeostatic proliferation of such cells upon depletion of this compartment due to interactions with virions. This model was recently analyzed in detail in [19] and found to possess many desirable properties, including bistable equilibria which explain the dependence of infection dynamics on the initial T-cell count and viral load, as well as the existence of a Hopf bifurcation which describe oscillations within the system. Using the fitted parameter values given in [19] for the model (13) and plotting against the full model (1) with the standard parameter values given in Table 1 results in Figure 14.
Hence, for the acute stage we have reduced (1) with 27 independent parameters, to (13), which features only 8 parameters. Following the same procedure for the other two stages would further reduce the dimension of the parameter space in the model (1) at later stages of the infection. These reduced models could then be used to analyze the disease separately within each distinct stage of infection and with much less computational cost. If a global model is still preferred in comparison to separating models by stage, one can perform this parameter reduction for all three stages subject to the constraint that values of T (t) within time-adjacent models be equal and a new global-in-time dynamical model arises by construction. Such a reduced, long-term model would be comparable to that of Figure 12. 4. Conclusion. The current study concerns an analysis of the system (1), which is one of the only mathematical models to accurately represent all three stages of HIV infection within a host. A unique, biologically-relevant virus-free equilibrium was shown to exist, and conditions were determined that guarantee the local asymptotic stability of this state. Then, using dynamic active subspaces, the system of seven ODEs with 27 parameters was approximated by algebraic and computational models while retaining the majority of pertinent information and system behavior. This method enabled the discovery of a dominant subspace of parameters, within which the T-cell count is most sensitive to perturbations, and allowed us to perform thorough parameter studies in this direction, rather than the complete 27-dimensional parameter space. In general, active subspace methods also provide for a deeper visualization of the dependence of solutions on the parameter space, as well as, an analytic model (12) to describe the quantity of interest and a computational model for which solutions are less expensive to construct. The efficacy of the model (12) was investigated by calculating the relative error compared to solutions of (1) solved with a stiff ODE solver.
As with any study, a number of future questions arise from our investigation. First, as many parameter values within (1) are not specifically known and vary greatly in the literature, one would like to quantify the uncertainty inherent in choosing a particular value for each. While this can be investigated using the approximations established herein, it was not the focus of the study and more complex mathematical tools are needed to do so. Another direction to consider is the explicit quantification of error within the lower-dimensional approximations provided by active subspaces. Some preliminary results appear in [2], but exact bounds and convergence theorems for dynamic, rather than time-independent, active subspaces are currently unavailable within the literature. Finally, we note that these methods can be used for other in-host models of HIV (or other physical and biological models [4]) that possess high-dimensional parameter spaces. For instance, a recent refinement of (1) was proposed in [9], and appears to display less sensitivity to variations in parameter values. Hence, a similar technique would likely be useful to conduct a parameter study or construct a reduced-order model for this system.
Creating a linear system with (14), the CT L equation, and the V equation then solving for T I , T L , and M I gives Lastly, inserting the value of M I into the T equation and solving for T yields Finally, consider the former case. Then, it follows from the equation for V that T I = 0 as well. Collecting these terms in the CT L differential equation implies that CT L = s3 δ6 . The equations for T I and T L together imply T L = 0, and finally, with the remaining populations determined, the first equation implies T = s1 δ1 . Hence, we find the steady state We note that assuming all parameter values are strictly positive implies that the only non-infective steady state of biological significance is E N I . Finally, we sketch the proof of the asymptotic stability result, which utilizes standard methods from the theory of dynamical systems (i.e. the Hartman-Grobman and Routh-Hurtwitz theorems) to determine the qualitative behavior of the E N I steady state from (1).
Proof of Theorem 2.2. As the model is not positivity-preserving a simple technique like the next-generation method does not apply. Hence, we will utilize the Hartman-Grobman Theorem to arrive at the stated result and omit some of the more technical details. We begin by computing the Jacobian of (1) evaluated at the steady states From this, we can see that three eigenvalues are certainly real and negative
The same conditions imply the positivity of a 1 . For a 0 , the negative terms are dominated by positive terms if and only if the two previous conditions hold and K 2 K 5 K 9 s 1 s 2 ≤ δ 1 δ 2 δ 4 δ 5 δ 7 + δ 4 δ 5 K 1 K 4 s 1 + δ 1 δ 2 K 5 K 10 s 2 .
Appendix B. Visual representation of three stages of infection. In Figures  15, 16, and 17, we display a more temporally refined representation of the three stages of infection displayed by the model.