Optimal control of a mathematical model for cancer chemotherapy under tumor heterogeneity.

We consider cancer chemotherapy as an optimal control problem with the aim to minimize a combination of the tumor volume and side effects over an a priori specified therapy horizon when the tumor consists of a heterogeneous agglomeration of many subpopulations. The mathematical model, which accounts for different growth and apoptosis rates in the presence of cell densities, is a finite-dimensional approximation of a model originally formulated by Lorz et al. [18,19] and Greene et al. [10,11] with a continuum of possible traits. In spite of an arbitrarily high dimension, for this problem singular controls (which correspond to time-varying administration schedules at less than maximum doses) can be computed explicitly in feedback form. Interestingly, these controls have the property to keep the entire tumor population constant. Numerical computations and simulations that explore the optimality of bang-bang and singular controls are given. These point to the optimality of protocols that combine a full dose therapy segment with a period of lower dose drug administration.


1.
Introduction. The main obstacle to a successful chemotherapy treatment for cancer has been and still is tumor heterogeneity. Cancer cells typically are genetically unstable and, coupled with high proliferation rates, this leads to significantly higher mutation rates than in healthy cells [7,8]. As a result, a tumor often consists of an agglomeration of diverse subpopulations of cells with widely varying phenotypes and chemotherapeutic sensitivities. The Norton-Simon hypothesis [20,21] postulates that tumors typically consist of faster growing cells that are sensitive to chemotherapy and slower growing populations of cells that have lower sensitivities or are resistant to the chemotherapeutic agent. Possible explanations for this feature lie in the fact that resistance is achieved through pathways that use up more energy which thus cannot be used for proliferation [6] and evolutionary mechanisms then lead to a dominance of sensitive cells, simply since cells that duplicate fast will outperform those that replicate at slower rates. At the same time, cells that proliferate fast are also more vulnerable to a cytotoxic attack since they have higher growth fractions in synthesis and mitosis. If there exist subpopulations for which the activation mechanisms of certain drugs (targeted or not) do not work, this eventually will lead to a failure of therapy. Intuitively, in the presence of drug resistant strains, as the cytotoxic agent kills off the sensitive cells, the resistant cell population becomes increasingly more dominant. In fact, it is even conceivable that the killing of all the faster growing sensitive cells enables the resistant population to thrive (which may take years to materialize) while this subpopulation was in some sense controlled previously through evolutionary mechanisms by the faster growing sensitive populations. Similar ideas are also behind efforts to come up with adaptive therapies [5]. It is important to note that such behavior is a basic systems-type mechanism that is equally present for traditional cancer drugs that widely attack all strongly proliferating cells and for targeted therapies. The latter clearly have the advantage of much lower side effects, but they do not offer a means to deal with this fundamental issue of having possibly a tiny fraction of intrinsically resistant cells that in time then becomes dominant. It is an intriguing idea to maintain some level of the evolutionary fitter sensitive population and try to control in this way the more dangerous resistant population. Obviously, both from an ethical and practical point of view, there are severe obstacles to such an approach in praxis, but it seems worthwhile to analyze what mathematical models would conclude about such strategies.
Lorz et al. [18,19] have formulated a mathematical model for phenotypic heterogeneity and drug resistance in solid tumors that allows for a continuum of possible traits. This model then was expanded upon by Greene, Lavi, Gottesman and Levy [10,11] as a means to explain the roles played by increasing cell densities and mutations in the emergence of specific traits that become dominant. We briefly describe this model in Section 2 and then in Section 3 formulate therapy as an optimal control problem. In the modeling we include multiple chemotherapeutic agents in order to allow for combinations of drugs that act on different pathways and thus may be effective for separate tumor subpopulations. Mathematically, as an optimal control problem, the dependence of the model on a continuum of traits x gives the problem a distributed aspect and the resulting optimization problem becomes non-standard and of great complexity. For this reason it makes sense to start its analysis by considering finite-dimensional approximations in a simpler framework. In Section 4 we formulate and analyze such a high-dimensional approximations for the case of a single chemotherapeutic agent. The approximating model is a nonlinear system with weak interactions only linked through the total size of the tumor and thus it becomes more amenable to analysis. Indeed, in spite of the high dimension of the system, singular controls are given by feedback functions and an explicit formula will be derived. We then conclude in Section 5 with some numerical computations that illustrate the overall system behavior for some locally optimal and singular solutions.
2. An infinite-dimensional model for tumor heterogeneity. As a starting point, we briefly describe the mathematical model for tumor heterogeneity from [10]. In this model, a continuum of possible traits (phenotypes) x, x ∈ [0, 1], is considered. The underlying philosophy is that both the replication rates r and natural death rates µ of cells may depend on the trait x. More importantly, chemotherapeutic sensitivities with respect to a particular agent will vary with some cell-types being more resistant than others. Thus these terms become functions of x, r = r(x) and µ = µ(x). If one makes the linear log-kill hypothesis on the action of a drug concentration c = c(t) with ϕ = ϕ(x) the cytotoxic killing parameter under treatment, then a basic trait dependent model of exponential growth takes the form where n(t, x) denotes the population density of cells with trait x at time t. More realistically, the rates for cell division and apoptosis not only depend on the trait x, but also on the cell density [9] which is closely related to the total tumor mass N = N (t) of cancer cells at time t, Thus, in [10] the following equation is considered with f = f (N ) and g = g(N ) positive functions that depend on the total tumor size N . Based on biological considerations it is assumed that the function f is strictly decreasing with total tumor size N and that the function g is strictly increasing. Such conditions simply reflect the facts that with increasing tumor volumes replication rates decrease and death rates increase. Also, most cytotoxic agents merely prevent further cell divisions and for this reason the drug-related cytotoxic term here is interpreted as lowering the replication rate r. Equation wherec(τ ) = c(t(τ )). This modeling change introduces a logistic structure similar to the form (a − bN )N . Under the above assumptions on f and g, the scaling factor G is strictly increasing and once this term offsets the balance between growth and apoptosis, the population stabilizes. It is not difficult to introduce mutations into the model as well (see [10,25]), but here we only consider this simpler densitydependent equation.

3.
Tumor heterogeneity as an optimal control problem. In this section, we formulate an optimal control problem for the dynamics (4) that encompasses combinations of various drugs and their pharmacometrics. It is common practice in chemotherapy to give not just one drug, but a cocktail of therapeutic agents that have different activation mechanisms, i.e., act on different pathways. Naturally, resistance of cells is not absolute, but drug specific and while certain sub-populations of cells may not react to a particular drug, they may be sensitive to another. We therefore consider a multi-input formulation with m controls u i which we write as a column vector u = (u 1 , . . . , u m ) T . These controls represent the dose rates of the particular agents and are connected with the respective concentrations through pharmacokinetic models. These are generally described by low-dimensional linear differential equations with the dimension corresponding to the number of compartments used in the modeling. Here, for simplicity, we just incorporate a 1-compartment model of exponential growth and decay, but this form could easily be made more general. Pharmacodynamics, i.e., the effectiveness of the drug, is modelled through the expressions for the killing terms in the dynamics. In the formulation in Section 2 we made the linear log-kill hypothesis on the action of a drug concentration c i with ϕ = ϕ(x) a trait dependent cytotoxic killing parameter under treatment. Alternatively, Michaelis-Menten terms of the form could be used. This equation describes saturation effects with C i,50 denoting the concentrations for which half the maximum possible effect ϕ i (x) is realized. The linear model (5) generally is adequate if the concentrations are kept well below saturation levels and here we retain this simpler structure. The important aspect is that the sensitivities of the various sub-populations are modelled through a coefficient that depends on the respective trait x. Denoting the time scale with the more common notation t, the dynamics thus takes the form In an optimal control problem, the aim then becomes to choose the controls u in a way to minimize some performance criterion imposed on the dynamics. Here we use a functional of the form where α, β are positive constants and γ is a row vector of positive weights. The objective consists of a weighted average of the total tumor population αN (T ) at the end of a fixed therapy interval [0, T ], an integral term β T 0 N (t) that measures the tumor size during the therapy interval, and a penalty term T 0 γu(t) that limits the toxicity of treatment. The latter serves as an indirect measure of side effects.
As it is typical in optimal control formulations, the coefficients α, β and γ represent variables of choice and generally need to be fine tuned to achieve a system performance that would be considered adequate with respect to other aspects possibly not included in the modeling. For example, the integral term T 0 N (t)dt is included to prevent that all the efforts are put on minimizing the tumor volume at the end of treatment which may lead to unacceptably high intermediate values.
Similarly, including the total drug dosages T 0 u i (t)dt as penalty terms in the objective is just one way of limiting side effects of treatment. Alternatively, one could assume that the total amount of drugs to be given is specified a priori based on a medical assessment of their toxicity and then the question becomes how to best administer the specified amounts of drugs. For example, this is the point of view taken in the papers [4,14] where anti-angiogenic therapies were considered. In such an approach these limits are added to the model as isoperimetric constraints of the form Mathematically this increases the dimension of the state space as m equations need to be added to account for the use of the drugs over time. From a practical perspective, even if the amount of drugs is specified a priori, it is of importance to see how the solutions change if the amounts A i are varied and thus, if possible, these problems are solved depending on these values. The penalty approach we pursue here is in some sense complementary, but fully compatible with the hard limit approach. By including the total drug dosages as penalty terms, the results for different values of the weights γ will also tell us what tumor volumes are realizable with what amounts of drugs. Indeed, mathematically the two problem formulations lead to almost identical systems describing the optimality conditions and similar posteriori considerations should take place in interpreting the results. From the point of view of solving the underlying problem, these are simply two different ways of approaching it. We only note that there would be significant differences if the penalty term were not taken linearly as L 1 -term. In our view, formulations that use, for example, quadratic objective functionals in the controls distort the underlying biology since they do not represent the total drug dosages.
We thus arrive at the following optimal control problem: subject to the dynamics (6)- (7). As optimization problem, the dependence of the model on the traits x brings in a distributed aspect that makes this a highly non-standard problem. The natural choice in line with the mathematical structure would be to consider x-dependent controls, but these do not represent the medical problem. It is simply not possible to administer an agent and tell it to act on x ∈ [0, 1 2 ], but not on x ∈ ( 1 2 , 1]. The different effects that drugs have on the various sub-populations are modeled as pharmacodynamics, but drugs are administered as one unit in time and cannot be split over the sub-populations. Indeed, the problem very much resembles the recently introduced concept of ensemble control [16,17] where the aim is to control an ensemble of systems (here described by the various traits x) by means of one control.

4.
Finite-dimensional approximations. The complexity of the optimal control problem (Th) is such that it is not even clear which form necessary conditions for optimality take. It is therefore prudent to start with simpler problem formulations that approximate this model. In this paper, we only consider a single-input formulation. We also drop the pharmacokinetic model, i.e., we are identifying the dose rates of the agents with their concentrations. This might appear drastic as obviously concentrations vary continuously while dose rates can be abruptly changed and, as we shall see below, discontinuities do arise in this formulation that are not physically real for the concentrations. However, once we know the solution to this simplified model, then the incorporation of a linear pharmacokinetic model (5) into the system is rather standard and we shall comment more on this below.
We consider finite-dimensional approximations to the problem with n distinct traits and a single control u = u(t). We denote the state by N = (N 1 , . . . , N n ) T and writeN for the average. We then express G as a function ofN . The optimal control problem (Th) is then approximated by the following problem: (TA): For α and β row vectors of positive weights and γ > 0, minimize the objective We are interested in the case when n is large. Generally, high-dimensional optimal control problems are difficult to analyze, even numerically, but the problem considered here has simplifying features that enable analytic approaches. If we write the dynamics (11) in matrix form, we geṫ where R = diag(r 1 , . . . , r n ), C = diag(ϕ 1 , . . . , ϕ n ), M = diag(µ 1 , . . . , µ n ), all are diagonal matrices. This has several simplifying features (for example, the matrices commute) that allow us to make analytical computations.

4.1.
Necessary conditions for optimality. First-order necessary conditions for optimality are given by the Pontryagin maximum principle [23] (for some more recent references on optimal control, see [2,3,24]). Since there are no terminal constraints in our problem formulation, abnormal extremals do not arise and these conditions take the following form: if (N * , u * ) is an optimal controlled trajectory for the problem (TA) defined over the interval [0, T ], then there exists a multiplier λ = (λ 1 , . . . , λ n ) : [0, T ] → (R n ) * (which we write as a row vector) that satisfies the adjoint equatioṅ with terminal condition λ(T ) = α (14) such that the optimal control u * minimizes the Hamiltonian function H defined as in u over the control set [0, u max ]. (We use primes to denote the derivatives of the function G.) Any controlled trajectory (N, u) for which there exists a multiplier λ such that these conditions are satisfied is called an extremal and the triple (N, u, λ) is called an an extremal lift.
Since the Hamiltonian H is linear in the control u, the minimization property implies that where Φ is the switching function for the problem defined as Whenever the switching function does not vanish, optimal controls are given by either full dose treatment or no treatment. However, optimal controls may take values in the interior of the control set if Φ vanishes identically over some open time interval. Such controls are called singular while controls that only take values in the boundary points of the control interval are called bang-bang controls. Whenever Φ(τ ) = 0 andΦ(τ ) = 0, the optimal control switches between u max and 0 depending on the sign ofΦ(τ ) which gives the switching function its name. If the switching function vanishes over an open interval I, then also all of its derivatives vanish on I and typically, (i.e., with the exception of highly degenerate cases), singular controls can be determined from these formulas. Overall, optimal controls then need to be synthesized from these candidates and often are given by combinations of bang and singular controls with the precise concatenation structure to be determined through an analysis of the properties of the switching functions. Singular controls generally correspond to smooth time-varying administration schedules at less than maximum allowed values. As such, they are physically real and have clear interpretations both as dose rates and concentrations. Bang-bang controls, on the other hands, and also concatenations between bang and singular controls, are generally discontinuous and as such are only realistic on the level of dose rates. The effects that the addition of a linear pharmacokinetic model has on the structure of optimal controls has been investigated in several papers (e.g., see [13]). If optimal controls are bang-bang, this property is preserved on the level of the dose rates and if the objective functionals are properly calibrated, the switching times are close to each other. For example, this is illustrated for a cell-cycle specific model for cancer chemotherapy in section 2.3 in [25], pgs. 113f. For concatenations between bang and singular controls the situation is more involved mathematically since these connections are now realized optimally through chattering controls [15]. But there always exist excellent simple sub-optimal approximations for the model with PK that are obtained from the model without PK [12]. For this reason, the analysis of the model without a pharmacokinetic model as we are pursuing it here, generally with some standard modifications gives the optimal solutions for the full model as well.

Singular controls.
We compute the singular control. In differentiating the switching function, it is well-known that the control can appear for the first time only in an even numbered derivative and thus we need to compute the first two derivatives. Since R, C and M are diagonal matrices, all of these matrices commute with each other and this simplifies the formulas. Using the adjoint equation we get thatΦ Differentiating one more time yields This expression is linear in u * (t) and can be written in the form with the functions θ and χ given by and These formulas simplify significantly if we assume that the weights β at N in the Lagrangian are all equal, i.e., that the aim is to minimize the total tumor mass. We henceforth assume that β =βe. (22) With this specification, the first derivative of the switching function reduces tȯ In particular, if the control u * is singular over an open interval I, then this expression must vanish and, since eCN (t) is positive, we have that Using this relation and (22), it then follows that (G (N (t))) 2 (λ(t)M N )(eM N (t))(eCN (t)) = G (N (t))(βM N (t))(eCN (t)) and Substituting these relations into equation (20) gives us that The coefficient χ at u * in equation (19) is simplified in the same manner and we obtain that It is a necessary condition for minimality of a singular control, the Legendre-Clebsch condition (e.g., see [2,24]), that ∂ ∂u If this expression is non-zero, we say the singular control is of order 1 and if it is negative, we say the strengthened Legendre-Clebsch condition for minimality is satisfied. Note that this expression is the function χ(t). Using (24) we therefore have that ∂ ∂u By our assumptions G is strictly increasing and it therefore follows that the strengthened Legendre-Clebsch condition is satisfied (i.e., this quantity is negative) where the function G is strictly concave and it is violated where G is strictly convex. Summarizing, we have the following result: If a control u is singular on an open interval I, then the control is of order 1 and the strengthened Legendre-Clebsch condition for minimality is satisfied where G is strictly concave and it is violated where G is strictly convex. The singular control is given as a feedback function by the formula It only remains to verify the formula for the singular control. It follows from Φ(t) ≡ 0 on I that the singular control is given by verifying (29). Proof. Let T = n i=1 N i (t) denote the total tumor size. Then, along a singular control, we have thaṫ 5. Numerical computations and illustrations of the system behavior. In this section, we give some numerical results and simulations. The parameter values and functions that we used in our computations are not based on medical data, but merely reflect some qualitative biological properties. The functional forms for r and ϕ are taken from [10]. We mostly keep the dynamics fixed across the runs and only vary the weights in the objective to illustrate their effect on the solutions and the system's response.
In the notation of the problem from Section 2, we use decreasing replication rates r [10], but keep the death rate constant, µ ≡ 0.50 . This is merely done to better focus on the effects that different replication rates have. For the cytotoxic killing parameter we use the two functions that only differ in the constant and follow the same functional expression as used in [10]. Thus the chemotherapeutic sensitivities decay with increasing value x, all subpopulations are sensitive to the drug, but with differing rates. As reparameterization function we have chosen the strictly increasing and concave function Hence singular controls satisfy the strengthened Legendre-Clebsch condition everywhere.
The weights in the objective are chosen according to (22) with n = 21, but in our simulations we varyᾱ andβ to see the effects they have on the controls. The states N i , i = 1, . . . , 21 are evenly spaced over the interval and correspond to the values x i = 0.05i. The upper limit on the control is set to u max = 1 and the final time T is fixed and chosen as T = 10. In all the calculations we choose the initial distribution of cells over the traits uniform with value n(0, x) ≡ 200 21 . For the numerical computations we adopt the standard shooting method to compute extremals: starting with an initial guess for λ(0), the dynamics and adjoint equation are integrated forward in time while the control is chosen in accordance with the sign of the corresponding switching function Φ(t) = β − λCN (t) using equation (16). Using small perturbations for each element of λ(0), numerically the matrix of sensitivities of the endpoint of the multiplier, λ(T ), with respect to its initial point, λ(0), is computed and then λ(0) − 0.01 implemented as the new guess for the initial condition of λ. This procedure is iterated until a desired error in the transversality condition on the multiplier at the end-point is achieved, λ(T ) − α < ε. These computations are able to generate extremal controlled trajectories that correspond to bang-bang controls, but generally (like any other direct numerical procedures) do not locate possible singular controls. Figures 1-3 below show three examples of locally optimal bang-bang controls with one switching that have been computed in this way with the cytotoxic killing parameter chosen as the function ϕ 1.5 (x). (We only remark that for an optimal control problem over a fixed horizon without terminal constraints like (TA), there exist second-order tests to verify whether an extremal controlled trajectory corresponding to a bang-bang control provides a strong local minimum (e.g., see the theory developed in Chapters 5 and 6 in [24]). This is the case if all bang-bang switchings are transversal folds and algorithmic procedures to verify these conditions are developed in [24,Sect. 6.2], especially Theorem 6.1.3. These conditions are satisfied for all the numerical examples given in this paper.) The weights used in Fig. 1 (especially,β = 80) put a high penalty on the total tumor size throughout the treatment interval and this causes the control to be at full dose almost over the entire period. But even with full dose control, as shown in the middle segment, some of the traits barely decrease and once the control is turned off, these populations strongly increase. Since the control is turned off towards the end of the therapy interval, the total volume at the end has increased and the distribution has shifted towards a more centric one. Note that, although   the subpopulations with highest x values have the lowest drug sensitivities, here these are still the ones to be depleted the most. The reason lies in the fact that these are also the slowest growing populations and their sensitivity to the drug is still high enough to eradicate these subpopulations. The net balance is worst for x near 0.4 and these populations are the ones that actually grow. For a constant control u ≡ 1 and a prolonged time period, this would become the dominant (and in this case) growing 'resistant' trait. Similar features are also seen in Figs. 2 and 3. The difference is that now the weightβ for the tumor during the therapy horizon has been made small compared to the weightᾱ at the end-point and so minimization of the objective shifts drug administration to the end of the therapy interval. Comparing the cell densities at the final time, it also is evident that for such data the tumor grows vigorously in the absence of treatment.
Although the largest total amount of drugs is given in the example in Fig. 1, the density at the end-point shows the worst behavior with the total tumor volume higher than at the initial point. Such a feature points to singular controls, which automatically limit the total tumor volume, as the possibly better alternative. Figures 4 and 5 compare a numerically computed bang-bang extremal solution with one switching with a control that has a singular segment. For these runs the cytotoxic killing parameter was chosen as the function ϕ 1 (x) and the weights are equal,ᾱ = 1, β = 1 and γ = 100. In Figure 5 the control is shown that starts with a singular segment for time σ = 5 and then switches to a full dose segment for the rest of the interval. This trajectory is not an extremal (we did not enforce the transversality condition at the end point), but it has a significantly better value for the objective, J(u) = 92, than the numerically computed extremal controlled trajectory shown in Fig. 4 which only gives J(u) = 678. The reason is that, by not giving drugs for more than half of the therapy horizon, the tumor volume is allowed to increase too high. The singular control, although the control values are only around 0.3 to 0.4, manage to limit the growth of the fastest growing tumor subpopulation to about 15, while this value exceeds 50 for the no drug segment of the control in Fig. 4. If we optimize over the switching time σ, we obtain a switching time of σ = 9.96, i.e., the control is singular almost over the full period, but for this case the terminal density looks inferior. The calibration of the weights in the objective functional (8) so that optimal solutions will conform to the manifold real aims of therapy can be a tedious matter. 6. Conclusion. In this paper, we formulated an optimal control problem for cancer chemotherapy that aims to minimize a combination of the tumor volume and side effects over an a priori specified therapy horizon when the tumor consists of a large number of subpopulations with possibly different chemotherapeutic sensitivities. If we use a continuum of traits, the complexity of the optimization problem is high and for this reason in this paper we considered a finite-dimensional approximation with a large dimension. From a mathematical side, in spite of an arbitrary dimension of the state space, singular controls, which correspond to administration of the agents at generally lower and time varying dose rates, can be computed explicitly as feedback controls. Our simulations have shown that these strategies offer a decisive advantage in controlling the full tumor population if the values for the singular control are small. Our numerical results strongly point to the optimality of singular controls for the problem as the values for the objective tend to be smaller than for the computed bang-bang extremals, in some cases significantly. However,   a complete solution of even the approximating problem still requires to analyze concatenations between bang and singular controls and this still needs to be carried out. The optimality of singular controls, and the generally lower time-varying doses they represent, would offer some insights into how to approach tumor heterogeneity and have some qualitative implications on the administration of chemotherapy in practice as it would suggest that, possibly after some initial strong induction therapy that reduces the tumor volume, it might then be a better strategy to treat the tumor using lower doses, possibly a.k.a. metronomic chemotherapy [1,22]. This might control the total tumor population more evenly and prevent the dominance of possibly resistant strains.