SINGULARITY OF CONTROLS IN A SIMPLE MODEL OF ACQUIRED CHEMOTHERAPY RESISTANCE

. This study investigates how optimal control theory may be used to delay the onset of chemotherapy resistance in tumours. An optimal control problem with simple tumour dynamics and an objective functional explicitly penalising drug resistant tumour phenotype is formulated. It is shown that for biologically relevant parameters the system has a single globally attracting positive steady state. The existence of singular arc is then investigated ana- lytically under a very general form of the resistance penalty in the objective functional. It is shown that the singular controls are of order one and that they satisfy Legendre-Clebsch condition in a subset of the domain. A gradient method for solving the proposed optimal control problem is then used to ﬁnd the control minimising the objective. The optimal control is found to consist of three intervals: full dose, singular and full dose. The singular part of the control is essential in delaying the onset of drug resistance.

1. Introduction. A common feature of many types of cancers is their ability to develop resistance to chemotherapy over the course of treatment [2,15]. This process, referred to as Acquired Drug Resistance (ADR), poses a major obstacle in effective chemotherapy scheduling. The ADR effect is usually attributed to high mutation and proliferation rates of cancers, coupled with a strong selective force imposed by the cytotoxic agent [5,6]. Metronomic therapy is an alternative to the commonly employed Maximum-Tolerated Dose (MTD) regimes [4]. While MTD scheduling involves administration of high drug doses with long drug-free periods in between, metronomic therapy suggests administration of low doses with short drug-free periods (or continuously). Among other benefits reported in the literature (reduction of toxicity, inhibition of tumour vasculature, immune system activation), metronomic therapy is supposed to minimise drug resistance [9].
The rationale behind metronomic therapy may be summarised as follows: cancer is to be treated as a chronic disease, which should be managed, rather than cured, as the latter may be impossible [4]. From the point of view of treatment scheduling, the two paradigms behind MTD and metronomic therapies illustrate a conflict between two therapeutic goals -maximising cell kill and preventing drug resistance. This problem, due to its quantitative nature, has been studied not only by medics and experimental oncologists, but also by mathematicians. There exists an amount of mathematical models available in the literature which aim to simulate cancer cell proliferation and development of drug resistance. These models range from compartmental models of two [7,11,13] or more [18,21], to models with a continuum of resistance classes [3,10]. Such models not only allow for in silico assessment of therapeutic schedules, but may also be used to find theoretically optimal protocols [11,16,20]. These optimal protocols, although usually not directly realisable under clinical conditions, may aid in hypothesis testing and offer guidelines for a design of more effective chemotherapy schedules.
Under the optimal control theory framework, one is given a model of tumour growth under therapeutic action and aims to find a drug scheduling (control) which minimises a given objective functional. Such an objective functional is usually composed of three parts: (i) a penalty for tumour size (typically a weighted average over all types of tumour cells) at the end of the treatment, (ii) a penalty for overall tumour burden, and (iii) a penalty for drug resistance. While defining the objective this way the modeller is capable of gaining valuable insights into the disease dynamics, it does not offer means to penalise drug resistance beside adjusting the weights assigned to different cell classes. We believe, that in the metronomic spirit of treating cancer as a chronic disease, a modification of the objective functional is in place to include an explicit penalty for drug resistance.
In this study, a simple compartmental mathematical model of tumour growth under chemotherapy is considered under the optimal control theory framework with a non-standard objective functional. The tumour cells are subdivided into two compartments (sensitive and resistant). The objective functional is defined so that, aside from the tumour volume, drug resistant phenotype is penalised. This approach offers a novel opportunity to investigate how a resistance-minimisation paradigm will affect the structure of theoretically optimal controls. 2. Mathematical model. As the aim of this study is to focus on the objective functional, the underlying mathematical model is intentionally chosen to be simple. The tumour population is subdivided into two compartments n 1 , n 2 , which respectively denote the sensitive and resistant subpopulations. It is assumed that both populations follow a logistic-type growth with a common carrying capacity (as they compete for the same resources), but different growth rates. It is furthermore assumed that there is a mutation-induced flow between the compartments and that the action of the drug results in cell death according to the log-kill model [17].
The (non-dimensionalised) model takes the following form: where n 1 , n 2 are the non-dimensional volumes of cells respectively sensitive and resistant to chemotherapy, and u : [0, T ] → [0, 1] is the non-dimensional chemotherapy dose (or control), γ 1 , γ 2 are the non-dimensional proliferation rates and τ 1 , τ 2 are the non-dimensional mutation rates. The timescale was chosen so that the coefficient in front of the chemotherapy-induced cell death is 1 and the cell populations were rescaled by the carrying capacity.
Notice that the form of the RHS of System (1) yields the existence, uniqueness and non-negativity of solutions for non-negative initial data. Optimal protocols are of interest only for realistic tumour size. We restrict our domain of interest to the region Ω defined as Ω = {(n 1 , n 2 ) ⊂ R 2 : n 1 , n 2 ≥ 0 and n 1 + n 2 ≤ 1}, which is invariant under System (1). To see that the triangle Ω is indeed invariant under System (1) it is enough to note that for n i = 0 we haveṅ i = 0 for i = 1, 2 and that for n 1 + n 2 = 1 we haveṅ 1 +ṅ 2 = −un 1 < 0. This also implies that solutions exist for all t ≥ 0.
2.1. Stationary states. In this section we examine the existence and local stability of the steady states of System (1). As a first and simplest approximation of the treatment we consider a continuous, indefinite chemotherapy, i.e. we set u(t) = u ∈ (0, 1]. System (1) has at most two steady states: S 1 * 1 = (0, 0) and S 1 * 2 = (n * 1 , σn * 1 ), where: For the model to be biologically realistic, assume γ 1 , γ 2 > τ 1 , τ 2 . Under these conditions the positive stationary state always exists. Furthermore the total tumour volume at the positive steady state is: so that it lies within Ω. The last inequality may be shown by substituting z = τ 1 /τ 2 in the quadratic equation for σ above and noting that the value of RHS is then negative, so that σ > τ 1 /τ 2 . The Jacobian matrix of System (1) is: Recall that the conditions for local asymptotic stability are tr J < 0 and det J > 0. At S 1 * 1 these are equivalent to:

PIOTR BAJGER, MARIUSZ BODZIOCH AND URSZULA FORYŚ
For this to ever be true, it would have to be that: This, however, is equivalent to: which is a contradiction. Hence the steady state S 1 * 1 is always unstable. At S 1 * 2 , the Jacobian matrix is: so that clearly tr J < 0 and, furthermore, Therefore the steady state S 1 * 2 is always locally asymptotically stable. What is more, simple application of Bendixson-Dulac theorem: shows that no periodic solutions are possible. As due to the limited carrying capacity the solutions may not escape to infinity, we may conclude that the steady state S 1 * 2 is in fact globally asymptotically stable. Phase portrait for System (1) in the phase plane (n 1 , n 2 ) is illustrated in Fig. 1a.
Although not biologically realistic, let us now consider the case when γ 2 < τ 2 for the purpose of mathematical completeness. The positive steady state exists (and is globally asymptotically stable) provided (γ 2 − τ 2 )σ + τ 1 > 0, whereas the zero steady state is stable if and only if It can be shown that conditions (γ 2 − τ 2 )σ + τ 1 < 0 and u > γ 1 − γ2τ1 γ2−τ2 are in fact equivalent. This means that: γ2−τ2 the positive stationary state exists (and is globally stable) and the zero stationary state is unstable, and γ2−τ2 the positive stationary state does not exist and the zero stationary state is stable.
A phase plane with sample trajectories is shown in Fig. 1b.

2.2.
Parameters. For the model to be biologically realistic, we require γ 1 , γ 2 > τ 1 , τ 2 . We follow the usual assumptions that the mutation to resistance is faster than a mutation to sensitivity, i.e. τ 1 > τ 2 . The nominal parameter values used in plots and simulations are listed in Table 1. The values of the γ 1 , γ 2 , τ 1 , τ 2 parameters were chosen based on the typical orders of magnitude of proliferation and mutation rates used in mathematical models. Most importantly γ 1 > γ 2 [12] and γ 1 , γ 2 τ 1 , τ 2 [13], i.e. resistant cells proliferate more slowly than the sensitive ones and proliferation rates greatly exceed mutation rates. Non-dimensional therapy duration of 13.5 corresponds to 30 days.  Step used in finite differences gradient calculations.
3. Optimal control problem. In this section the optimal control problem of drug scheduling is defined. As noted in Section 1, we aim, aside from penalising tumour volume, to penalise drug-resistant phenotype. The non-standard objective functional J takes the following form: (2) and the problem becomes to minimise J over all measurable functions u : Here ω 1 , ω 2 , η 1 , η 2 , ξ and θ are non-negative weights. In order to penalise the resistant population even further, we chose the weights so that ω 2 > ω 1 and η 2 > η 1 -see Table 1. The role of (0 < 1) and the function G : R → (0, 1) is to penalise the drug-resistant phenotype. The function G is chosen to be a smooth function which is close to 1 whenever there are more resistant cells than sensitive cells and close to 0 if vice-versa. This way the penalty for resistance is applied whenever the tumour starts exhibiting a resistant phenotype. The parameter controls how close the sensitive population needs to become to the resistant one for the penalty to be applied. Formally, we require G to have the following properties: • G(0) = 0.5 and G (0) = 0.5.
Defining G on the whole of real line allowed to use standard Heaviside smoothings such as 1 2 (1 + tanh(z)), as shown in Figure 2. Such a choice of G is in this case particularly appealing as it allows for simplifications in numerical computation of quantities G(z)/G (z) and G (z)/G (z) featured in the equations to follow.
For convenience we introduce the following concise notation: We furthermore define the Hamiltonian of System (1) to be: In order to solve the optimal control problem, we will utilise the Pontryagin Minimum Principle [14]: i.e.
with terminal condition p(T ) = (ω 1 , ω 2 ), such that the Hamiltonian H is minimised a.e. on [0, T ] by u * along the optimal trajectory with a constant minimum value c, The Hamiltonian-minimising property of the optimal control and trajectory motivates the definition of the switching function φ: As the Hamiltonian is linear in u we obtain the following result: if u * (t) is an optimal control, then If φ vanishes identically over an interval, the optimal control may take intermediate values between 0 and 1. We will show in the subsequent section that there exist order-one singular controls u sing which are locally optimal and will find an explicit formula for u sing in terms of the state variables.

Results.
4.1. Singular control. We will investigate the existence of singular controls by employing a standard procedure for control-affine systems, i.e. we compute the derivatives of the switching function with respect to the control u. These are given in general by [16]: The following relationships may be easily established and will be useful in the subsequent analysis: In order to obtain analytically traceable results regarding the optimal control, we will assume for the purpose of this study that the side effects of the drug are not penalised in the objective functional, i.e. that θ = 0. We have found that the resistance penalty alone limits the amount of drug administrated in a very natural way, hence, given the benefits of easier mathematical traceability, we feel that this simplification is justified.
In the Proposition below we show that there exist locally optimal singular controls.
Proposition 1. Singular controls u sing are of order 1 and, as functions of the state variables, are given by u sing Legendre-Clebsch optimality condition is satisfied provided that n 1 < n 2 , while the singular arc satisfies the equation for some constant c ∈ R.
Proof. If a control is singular on an interval I ⊂ [0, T ], then φ ≡φ ≡φ ≡ ... ≡ 0 identically on I. Furthermore, by the Pontryagin Minimum Principle, it must be that H ≡ c = const. In particular, the following three relations must hold for almost all t in I: c = H, 0 = φ =φ =φ (10) for some constant c ∈ R. Equations (8) and (9) may be obtained by directly solving the above system for p 1 , p 2 and u in terms of n 1 and n 2 .
If the control is singular over an interval I ⊂ [0, T ] we immediately get, using Equation (5), that Using this and (7) we get fromφ = 0 that that is: at least away from n 2 = τ 1 /γ 2 . Substituting the above into Equation (3) for H and equating the resulting expression to 0 yields (9), the required equation for the singular arc.
Similarly a substitution of p 1 , p 2 intoφ = 0 results in (8), the equation for the singular control as a feedback function of the state variables. The Legendre-Clebsch condition may be obtained quite elegantly: from Equations (6) and (7) it follows that ∂ ∂u Using the properties of G, we see that the Legendre-Clebsch condition (−1) 1 ∂ ∂u ∂ 2 ∂t 2 φ > 0 is satisfied if and only if n 1 > n 2 .
If the ratio τ2 γ2 is small, then the sets Ω c 3,4 are very close to the n 1 + n 2 = 1 line. In our case this ratio is approximately 0.1, which means that the sets Ω c 3,4 are such that the tumour is at over 90% of its carrying capacity. As such a situation would correspond to patient's death, we aim to design the objective functional so that the optimal trajectory never enter these regions. Therefore for any practical parameters in the objective functional we may restrict our attention to sets Ω c 1,2 . Proposition 3. Any extremal control u * ends with a full-dose interval. Proof.

Numerical solution.
For the purpose of the problem, we extended a method formulated byŚmieja et al. [19]. In their paper,Śmieja et al. introduced an iterative method for computing optimal switching times for bang-bang control problems. In each iteration, a gradient of the objective functional with respect to the switching times was computed.
This method is not directly applicable to our problem, as there exist locally optimal singular controls, as shown in Proposition 1. Unlike in a bang-bang case, a perturbation of one of the switching times will have a secondary non-local impact on any subsequent singular intervals in the control. Alteration of any switching time prior to a singular interval will affect the values of the state variables at the time of switch to the singular control. Therefore the trajectory will travel along a different singular arc with different values of the singular control.
The method used to compute the optimal control is therefore as follows. Suppose that the structure of a control is fixed. By "structure" of the optimal control we understand a finite sequence with elements from 1, 0, s corresponding respectively to an application of full dose, no dose, and singular dose. For example by 01s1 we mean a control which starts with no chemotherapy, then applies full dose, a singular dose and full dose again. Note that the structure of the optimal control does not bear any information about the switching times.
Once the structure of the optimal control is fixed, the objective functional, with a slight abuse of notation, may be thought of as a function of the switching times, i.e.: J = J(t 1 , t 2 , ..., t n−1 ), with 0 = t 0 < t 1 < t 2 < ... < t n = T.
This allows us to compute an approximate gradient of the objective functional with respect to the switching times using finite differences: for some positive constant ∆ 1. Note that for this gradient to be well-defined, in practice we require that t i − t i+1 < ∆ for all i. We may then employ a gradient method when, starting with switching times t (0) = (t (0) 1 , ..., t (0) n−1 ) and k = 0 and a constant learning rate α, we update the switching times iteratively as follows: Note that the update in step 2. above may result in the switching times being invalid (i.e. t i+1 − t i becomes smaller than ∆ for some i). If this happens, the corresponding interval of length less than ∆ is removed from the control structure and the process is continued with one switching time less.
The method described above is suitable for finding local, rather than global minima. In order to ensure optimality, the above gradient method was run with different starting control structures (every allowed combination up to 4 switches) and several random initial switching times for each control structure. The optimal trajectory was found to be 1s1 with switching times Optimal control and the corresponding controlled trajectory found using the above method is shown in Figures 4a and 4b.
To verify that the control found using the gradient method satisfies the necessary conditions given by the Pontryagin Minimum Principle, the Hamiltonian H was evaluated along the controlled trajectory and Equations (4) were solved backwards in time using the terminal condition. The Hamiltonian H and the switching function φ were then computed numerically. It was verified that the Hamiltonian is indeed constant (up to approximately 10 −6 = O(∆), i.e. the numerical method order) and that the switching function φ (seen in Figure 4d) has appropriate sign over the three intervals.

6.
Discussion. An optimal control problem of chemotherapy scheduling is proposed in this study. The underlying mathematical model subdivides the tumour population into two compartments (sensitive and resistant) and includes a flow between compartments due to genetic mutations. A non-standard objective functional including a penalty for resistant phenotype is introduced. It is shown that such penalty naturally gives rise to singular controls which maintain the sensitive phenotype via small-dose chemotherapy administration. Section 2.1 provides a full classification of the steady states. We have shown that for constant drug doses there exists a globally attracting positive steady state. The model, by construction, does not allow for total tumour eradication due to a totally resistant subpopulation. The model therefore reproduces a situation when a cancer is incurable using chemotherapy, as it is often the case in practice [15]. This motivates the definition of an objective functional which, instead of simply penalising tumour volume, includes an explicit penalty for drug resistance. The focus of therapy is therefore shifted towards maintenance rather than total eradication of the tumour.
Despite a very general resistance penalty function G, abut which only minimal assumptions were made, some analytical results about the structure of optimal control are available. In particular we showed that there exist singular controls which satisfy Legendre-Clebsch condition and thus locally optimal. The singular arc and the singular control may also be calculated explicitly in terms of the state variables (see Equations (9) and (8)). We identified four regions Ω c 1−4 ⊂ Ω and classified them according to a possibility of locally optimal 01 and 10 switches occurring in each of them. This result is summarised in Proposition 2. Finally, we showed that the optimal control has to end with a full dose interval.
Due to the equations featured in the above calculations being increasingly complicated and not easily interpreted, we referred to a numerical method to find the optimal control. As shown in Figure 4b, the optimal control is of the form 1s1. The singular interval in the middle -during which the control is applied at about 10% of the MTD dose -is crucial in preserving the sensitive phenotype. As seen in Figures 4a and 4c, the singular control maintains the number of sensitive cells just above the number of resistant cells. The singular interval stems directly from the resistance penalty function G being present in the objective functional. These results support our hypothesis that inclusion of explicit resistance penalty in the objective functional leads to the low-dose metronomic-type protocols being optimal.
The problem presented in this study is preliminary in the sense that the modified objective functional is coupled with a relatively simple tumour dynamics. This has a benefit of analytical traceability which is often absent in more complicated models, thus forcing the modellers to rely solely on numerical results. Our next aim is to investigate the same objective functional, but with more complex underlying dynamics. Our ultimate goal is to provide further insights into the mechanisms of action of metronomic therapy -e.g. inhibition of blood vessels and immune system boost. In order to achieve that, more complicated mathematical models will be necessary. Such models could include angiogenic signalling (as done originally by Hahnfeldt et al. [8] and investigated by us in [1] in context of chemoresistance) or include the response of the immune system and will be a subject of our further investigation.