OPTIMAL CONTROL APPLIED TO A GENERALIZED MICHAELIS-MENTEN MODEL OF CML THERAPY

Abstract. We generalize a previously studied model for chronic myeloid leukemia (CML) [13, 10] by incorporating a differential equation which has a Michaelis-Menten model as the steady-state solution to the dynamics. We use this more general non-steady-state formulation to represent the effects of various therapies on patients with CML and apply optimal control to compute regimens with the best outcomes. The advantage of using this more general differential equation formulation is that it reduces nonlinearities in the model, which enables an analysis of the optimal control problem using Lie-algebraic computations. We present the theoretical analysis of the problem and give graphs of numerically computed optimal combination regimens for treating the disease.


1. Introduction. Mathematical modeling is widely used to characterize the relationships between a drug's dose, concentration, and effect. The relationship between dose and concentration is called pharmacokinetics (PK), while the relationship between concentration and effect is called pharmacodynamics (PD). Together they are known in the biopharmaceutical literature as PKPD, which is a major component of the quantitative work in drug development known as pharmacometrics (Pmx) [2].
There are many models for PK and PD in the literature that illustrate what happens after the drug enters the body of a patient. In mathematical modeling of a disease, models are selected depending on the level and time scale desired. In this paper, we take a closer look at a traditional approach to PD, through a Michaelis-Menten expression, named after the two authors who formulated it [12]. The expression involves two main parameters which are the maximum effect of the drug denoted by E max and the concentration at which 50% of the effect is achieved which is denoted by EC 50 . In this model, effects can build up quickly as concentration rises, but then saturate with a maximum effect limit of E max . The formula, although fairly simple, introduces nonlinearities when incorporated in the general model for the treatment of disease. In this paper, we propose an alternative approach in which the Michaelis-Menten formula in the dynamics of the disease is replaced with a bilinear differential equation that has the Michaelis-Menten expression as its steady-state solution [5]. In this sense, the approach is more general. At the expense of enlarging the system we eliminate certain nonlinearities. The general extended model is formulated as an optimal control problem with a carefully selected objective and the problem is analyzed with the tools of optimal control. Besides the application of the Maximum Principle, Lie bracket calculations are presented, indicating optimality of singular controls.
In the second part of this paper the results are applied to a mathematical model we studied previously [13,10] that describes the treatment of a hypothetical patient who has chronic myeloid leukemia (CML). CML is a hematologic cancer in which myeloid cells undergo uncontrolled expansion [8]. A chromosomal translocation leads to continuous BCR-ABL1 kinase activation which causes the excessive proliferation of the cells [6]. The therapies in our model include a BCR-ABL1 inhibitor, an immunomodulatory therapy, and a third agent with effects similar to those of the other two combined. Our model includes quiescent and proliferating leukemic cells, as well as an immune response component and represents a minimal framework needed to describe all of the therapeutic effects considered. The model is formulated as an optimal control problem with the goal of optimizing the combined therapies, taking into account both the disease burden and exposure to the administered therapies. The optimal control problem is fit into the framework discussed earlier and two approaches to this problem are discussed theoretically and compared numerically. We show that if the dynamics of the PD model are fast enough, then the simulations lead to very similar results, both graphically and quantitatively. However, since the extended model benefits from a richer theoretical analysis, the proposed novel approach to the dynamics has clear advantages for the overall understanding of these models and for the ultimate goal of obtaining reliable solutions.

2. A generalization of Michaelis-Menten-based models for pharmacodynamics. A commonly used model for PD is based on Michaelis-Menten kinetics and is referred to as the E_max model in pharmacology. It describes the effect a drug has at concentration C in the form

E(C) = E_max C / (EC_50 + C), (1)

with the parameter E_max denoting the maximum possible effect at saturation and EC_50 the drug concentration at which half of this effect is realized. If a term C^γ with γ > 0 is used instead of the concentration C, the model is referred to as a Hill function with exponent γ. For γ > 1 this model has sigmoidal behavior (minimal or no effect at low concentrations and saturation at high concentrations), while the Michaelis-Menten form (γ = 1) adequately describes saturation in the upper part of the effect-concentration relation, which is the range at which drugs typically are administered. The Michaelis-Menten model was originally used to describe enzyme kinetics, where it represents the steady state (dynamic equilibrium) of a reversible chemical reaction between an enzyme E and a substrate S,

E + S ⇌ ES,

in which E and S combine to form the complex ES at rate k_f and the complex ES dissociates at rate k_r [12]. Mathematically, the concentrations [E] and [ES] of such a reaction are modeled by the linear differential equations

d[E]/dt = −k_f [E][S] + k_r [ES], (2)
d[ES]/dt = k_f [E][S] − k_r [ES]. (3)

Using the conservation law [E] + [ES] = [E]_0, the steady-state solution to these equations,

[ES] = [E]_0 [S] / (k_r/k_f + [S]), (4)

defines the standard Michaelis-Menten form.
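The saturating behavior of the E_max and Hill models can be illustrated with a short computation; the function name and parameter values below are our own illustrations, not quantities from the model.

```python
# Effect-vs-concentration curves for the E_max (gamma = 1) and Hill
# (gamma > 1) models; parameter values are illustrative only.

def hill_effect(C, E_max, EC50, gamma=1.0):
    """Drug effect at concentration C; gamma = 1 recovers the
    Michaelis-Menten (E_max) form."""
    return E_max * C**gamma / (EC50**gamma + C**gamma)

E_max, EC50 = 1.0, 40.0
# At C = EC50 exactly half of the maximum effect is realized ...
assert abs(hill_effect(EC50, E_max, EC50) - 0.5 * E_max) < 1e-12
# ... and the effect saturates below E_max as C grows.
assert 0.99 * E_max < hill_effect(1e6, E_max, EC50) < E_max
```

The two assertions check the defining properties of the curve: half-maximal effect at C = EC_50 and saturation at the ceiling E_max.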
On the other hand, we can use the differential equation (3) to model the transient behavior as well [5]. From the optimal control point of view such a formulation offers distinct advantages in that the control [S] appears linearly. If [S]_t represents the concentration of a drug at time t and [ES]_t the effect the drug has (pharmacodynamic model), then it is more accurate to replace the steady-state terms in the above equations by differential equations which also include the transient behavior, for example

ẏ = cu − (a + bu) y, (5)

with the coefficients a, b and c related to the pharmacometric data, e.g., a = k_r, b = k_f and c = k_f [E]_0. The steady state of the solution y of equation (5) is given by y* = cu/(a + bu), so that E_max = c/b and EC_50 = a/b. Since a drug may act on various targets with different effectiveness, it is advantageous to separate out the multiplicative coefficient E_max in this model and use the equation

ẏ = ξ (u − (EC_50 + u) y), (6)

which has the steady-state solution u/(EC_50 + u). In this case, the extra dynamics only depend on the EC_50 parameter, and the additional coefficient ξ determines how fast the steady state is reached. If one wants to approximate the standard E_max model, then one can simply choose a high value of the parameter ξ so that the steady state is reached very quickly. On the other hand, if information is available about the detailed processes, then the differential equation model is more informative.
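The role of the speed parameter ξ can be seen in a simple forward-Euler simulation of the bilinear equation ẏ = ξ(u − (EC_50 + u)y) with y(0) = 0; the function name and all parameter values are our own illustrations.

```python
# Forward-Euler sketch (illustrative parameters) of the bilinear PD equation
#   dy/dt = xi * (u - (EC50 + u) * y),   y(0) = 0,
# whose steady state is y* = u / (EC50 + u): a large xi drives y(t) to
# the Michaelis-Menten value quickly, a small xi exposes the transient.

def simulate_effect(u, EC50, xi, T=10.0, dt=1e-3):
    y, t = 0.0, 0.0
    while t < T:
        y += dt * xi * (u - (EC50 + u) * y)
        t += dt
    return y

u, EC50 = 100.0, 40.0
y_star = u / (EC50 + u)                # steady-state (E_max model) value
y_fast = simulate_effect(u, EC50, xi=5.0, T=0.5)
y_slow = simulate_effect(u, EC50, xi=0.05, T=0.5)
assert abs(y_fast - y_star) < 1e-3     # fast dynamics: already at steady state
assert y_star - y_slow > 0.01          # slow dynamics: still in the transient
```

With a large ξ the trajectory is indistinguishable from the static E_max model after a short time, while a small ξ leaves the effect visibly below its Michaelis-Menten value, which is exactly the distinction exploited in the comparison of problems [OC1] and [OC2] below.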
This model extension retains all of the essential features of the pharmacological E max model. From an optimal control point of view, it has the advantage of control-affine dynamics, which allow us to bring in the full framework of geometric methods into the analysis of these problems. This is done at the expense of adding an additional state variable for each of the drugs, but does not change the underlying modeling assumptions. It merely disentangles the nonlinearities in the control structure that would be brought in by using Michaelis-Menten terms.
3. Formulation of optimal control problems. In this section, we consider a general optimal control problem that incorporates the above extended formulation of a Michaelis-Menten model. We take the objective functional in the form

J(u) = α x(τ) + ∫_0^τ (β x(t) + γ u(t)) dt, (7)

where x = (x_1, . . . , x_n) ∈ R^n is a column vector representing the states, u = (u_1, . . . , u_m) ∈ R^m is a column vector representing the controls (assumed to be drug amounts or concentrations), and α, β, and γ are row vectors of the appropriate dimensions with entries that represent weights. Here the entries of γ are all positive, as we assume that there should always be a penalty for using some amount of drug, due to the risk of adverse events. The entries of α and β may be positive, negative, or zero, depending on whether they are weights for "good guys" or "bad guys" in the model. The terminal time τ is fixed. The objective represents a weighted average of the effects of treatment, driven by the values of the states at the endpoint and during therapy, and of effects that are assumed to correlate with the total exposure to the drugs given. The optimal control problem then is to minimize J over all Lebesgue-measurable (respectively, piecewise-constant) functions u_i, i = 1, 2, . . . , m, that take values in compact intervals [0, u_i^max], subject to general dynamics describing the time evolution of the system. It is assumed that the dynamics do not explicitly depend on the controls u_i, but that these only enter through variables of the form

y_i = u_i / (a_i + u_i), (8)

where a_i is a positive constant that represents the concentration at which the i-th drug realizes half of its maximum effect. Let y = (y_1, . . . , y_m) ∈ R^m be a column vector, and let z = (x, y) denote the combined state (x_1, . . . , x_n, y_1, . . . , y_m). We can represent the dynamics in the general form

ẋ = f(z) = f(x, y). (9)

This is a rather general mathematical structure that can be specified further by providing specific forms for the function f(z).
As was discussed in the previous section, the Michaelis-Menten type ratios y_i represent the steady-state solutions of differential equations. Here we consider two parallel approaches to the optimal control problem, which we label [OC1] and [OC2], respectively. In problem [OC1], we minimize J(u) over all controls u subject to the dynamics (9) (which can be thought of as describing the effect of m controls on the state x using the classical Michaelis-Menten type model). In problem [OC2], we replace the functional forms y_i(u_i) by the solutions of the following differential equations:

ẏ_i = b_i (u_i − (a_i + u_i) y_i), y_i(0) = 0, (10)

with b_i a free positive parameter that determines the speed of the reaction. Since the Michaelis-Menten ratios in the dynamics (9) are the steady-state solutions of equations (10), problem [OC2] can be viewed as a natural generalization or extension of problem [OC1].
For the record, we state the elementary observation below.
Lemma 3.1. For each i, as long as the control u_i is zero over an interval [0, τ], y_i ≡ 0 on [0, τ] in equation (8). Also, y_i is a monotonically increasing function of u_i, and y_i is bounded above by 1.
The dynamics for problem [OC2] take the form

ż = F(z) + Σ_{i=1}^m u_i G_i(z), (11)

with drift vector field F and control vector fields G_i, i = 1, . . . , m. If we denote the standard unit vectors in R^m by e_i, we can represent the structure of these vector fields in the following block format, with the first group representing ẋ and the second one ẏ:

F(z) = ( f(x, y) ; −b_1 a_1 y_1, . . . , −b_m a_m y_m ), G_i(z) = ( 0 ; b_i (1 − y_i) e_i ). (12)

An important observation which significantly simplifies the theoretical computations is that the control vector fields commute, i.e., their Lie brackets vanish identically:

[G_i, G_j] ≡ 0 for all i and j. (13)

We recall that the Lie bracket of two differentiable vector fields X and Y is given by

[X, Y](z) = DY(z) X(z) − DX(z) Y(z),

where DY and DX are the Jacobian matrices of Y and X, respectively.
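The commutation of the control vector fields can be verified numerically on a small instance. The sketch below uses n = 1 and m = 2 with the explicit form G_i(z) = b_i(1 − y_i) e_{n+i}, which follows from equations (10); the parameter values are our own illustrations.

```python
import numpy as np

# Numerical check (small instance, n = 1 and m = 2, illustrative
# parameters) that the control vector fields of the extended dynamics
# commute. The state is z = (x, y1, y2) and G_i(z) = b_i*(1 - y_i)*e_{n+i}.

b = np.array([1.0, 2.0])

def G(i):
    def field(z):
        v = np.zeros(3)
        v[1 + i] = b[i] * (1.0 - z[1 + i])
        return v
    return field

def jacobian(V, z, h=1e-6):
    """Central-difference Jacobian of the vector field V at z."""
    J = np.zeros((3, 3))
    for k in range(3):
        dz = np.zeros(3)
        dz[k] = h
        J[:, k] = (V(z + dz) - V(z - dz)) / (2.0 * h)
    return J

def lie_bracket(X, Y, z):
    """[X, Y](z) = DY(z) X(z) - DX(z) Y(z)."""
    return jacobian(Y, z) @ X(z) - jacobian(X, z) @ Y(z)

z0 = np.array([2.0, 0.3, 0.6])
assert np.allclose(lie_bracket(G(0), G(1), z0), 0.0, atol=1e-8)
```

Since each G_i depends only on its own coordinate y_i, the Jacobians of G_i and G_j act on disjoint components and the bracket vanishes, as the finite-difference computation confirms.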
4. Structure of optimal controls for problem [OC2]. We henceforth make the following assumption about the vector field f:

∂²f/∂y_i² (z) ≡ 0 for i = 1, . . . , m, (14)

i.e., the dynamics are linear in each y_i. We are interested in models that represent drug actions under combination treatment. If it is assumed that these drugs act independently, then expressions of the form u_i/(a_i + u_i) · u_j/(a_j + u_j) for i ≠ j arise in a natural way. However, quadratic expressions with equal indices do not come up, since the full pharmacodynamic model is subsumed in the Michaelis-Menten expression. Thus this assumption holds for the models we are interested in, and it simplifies the formulas derived below.
The first-order necessary conditions for optimality are given by the Pontryagin maximum principle [15]. (For some more recent references on optimal control, see [3, 4, 18].) Since the problem is over a fixed therapy horizon and there are no terminal constraints, extremals are normal and we therefore have already normalized the multiplier of the objective to 1. Furthermore, according to the separation of the state z into x and y, we can also write the multiplier in the form Λ = (λ, µ) with λ an n-dimensional and µ an m-dimensional row vector. Thus the Hamiltonian function H for problem [OC2] takes the form

H = βx + γu + ⟨λ, f(x, y)⟩ + Σ_{i=1}^m µ_i b_i (u_i − (a_i + u_i) y_i),

where ⟨λ, f⟩ represents the inner product of the vectors λ and f. If u* = (u*_1, . . . , u*_m) is an optimal control with corresponding trajectory z* = (x*, y*), then there exist solutions λ : [0, τ] → (R^n)* and µ : [0, τ] → (R^m)* to the adjoint equations

λ̇ = −β − λ ∂f/∂x (x*, y*), λ(τ) = α,
µ̇_i = −⟨λ, ∂f/∂y_i (x*, y*)⟩ + µ_i b_i (a_i + u*_i), µ_i(τ) = 0,

for i = 1, . . . , m, such that the controls u*_i(t) minimize the Hamiltonian over the intervals [0, u_i^max] along (x*(t), y*(t)) and (λ(t), µ(t)). Furthermore, the value of the Hamiltonian H along the optimal solution is constant.
4.1. The switching functions and their derivatives. Since the Hamiltonian H is affine in the controls and the control set is a product of intervals, the minimization of H over the control set splits into m separate one-dimensional problems that are easily solved. Defining the switching functions as the coefficients of the controls in H, we have

Φ_i(t) = γ_i + b_i (1 − y_i(t)) µ_i(t) = γ_i + ⟨Λ(t), G_i(z(t))⟩, (15)

and it follows that

u*_i(t) = 0 if Φ_i(t) > 0 and u*_i(t) = u_i^max if Φ_i(t) < 0,

while any control u*_i satisfies the minimization condition if Φ_i(t) = 0. In particular, since µ_i(τ) = 0 for all i, we have Φ_i(τ) = γ_i > 0, and it follows that controls end with an interval where u*_i(t) ≡ 0. If Φ_i becomes zero at a time t̂, but its time derivative Φ̇_i(t̂) does not equal zero, then the control switches between the values 0 and u_i^max: from 0 to u_i^max if Φ̇_i(t̂) is negative, and from u_i^max to 0 if Φ̇_i(t̂) is positive. This gives the switching function its name. In the other extreme, if the switching function Φ_i is identically zero over a nonempty open interval I, then we say that the control u_i is singular over I. In such a case, all derivatives of the switching function vanish over the interval I, and differentiating the switching function until the control explicitly appears for the first time in a derivative generally allows us to compute such controls. The following result, which follows from a direct calculation, efficiently summarizes the computations of these derivatives:

Proposition 1. Let V be a smooth vector field in the variable z and define the function Ψ as the inner product of the multiplier Λ with the vector field V,

Ψ(t) = ⟨Λ(t), V(z(t))⟩,

where z(t) is a solution of the differential equations (11) corresponding to the control u and Λ is a solution of the corresponding adjoint equation, Λ̇ = −(β, 0) − Λ (DF + Σ_j u_j DG_j)(z(t)). Then the derivative Ψ̇(t) is given by

Ψ̇(t) = −⟨(β, 0), V(z(t))⟩ + ⟨Λ(t), [F + Σ_j u_j G_j, V](z(t))⟩,

where [X, Y] denotes the Lie bracket of the vector fields X and Y.
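The pointwise minimization of the control-affine Hamiltonian can be sketched directly; the function name below is our own, and a zero switching-function value is deliberately left undetermined, since that is the singular case.

```python
# Pointwise minimization of the control-affine Hamiltonian: for
#   H = ... + sum_i Phi_i * u_i   over 0 <= u_i <= u_max_i,
# the minimizer depends only on the sign of the switching function Phi_i.

def minimizing_control(phi, u_max):
    """Bang control from one switching-function value; phi = 0 leaves
    the control undetermined (a candidate for a singular control)."""
    if phi > 0:
        return 0.0       # positive coefficient: no dose
    if phi < 0:
        return u_max     # negative coefficient: maximum dose
    return None          # minimum condition is uninformative

assert minimizing_control(0.7, 140.0) == 0.0
assert minimizing_control(-0.2, 140.0) == 140.0
assert minimizing_control(0.0, 140.0) is None
```

This is the entire content of the minimum condition for each component: the optimal control is bang-bang wherever Φ_i ≠ 0, and only the singular case requires the derivative computations that follow.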
Continuing to write Λ = (λ, µ) for the multiplier, we thus have the following formulas for the first and second derivatives of the switching functions:

Proposition 2. The first derivative of the switching function Φ_i is given by

Φ̇_i(t) = ⟨Λ(t), [F, G_i](z(t))⟩. (16)

Proof. In this case, Φ_i = γ_i + Ψ where Ψ is of the form in Proposition 1 with V = G_i. The row vector (β, 0) is orthogonal to G_i and thus ⟨(β, 0), G_i(z)⟩ = 0. Furthermore, all control vector fields commute and thus [G_j, G_i] vanishes for all indices i and j.
Proposition 3. The second derivative of the switching function Φ_i is given by

Φ̈_i(t) = −⟨(β, 0), [F, G_i](z(t))⟩ + ⟨Λ(t), [F, [F, G_i]](z(t))⟩ + Σ_{j=1}^m u_j(t) ⟨Λ(t), [G_j, [F, G_i]](z(t))⟩, (17)

with the brackets of the control vector fields with [F, G_i] given by

[G_i, [F, G_i]](z) = ( b_i² (1 − y_i) f_{y_i}(x, y) ; b_i³ a_i e_i ) (18)

and, for i ≠ j,

[G_j, [F, G_i]](z) = ( −b_i b_j (1 − y_i)(1 − y_j) f_{y_i y_j}(x, y) ; 0 ). (19)

Proof. Here f_{y_i} = ∂f/∂y_i is the n-dimensional column vector consisting of the partial derivatives of the components of the vector field f with respect to y_i, and f_{y_i y_j} stands for the n-dimensional column vector whose entries are the second partial derivatives ∂²f/∂y_i ∂y_j. Formula (17) follows by applying Proposition 1 with V = [F, G_i]; it is this formula that allows us to make the relevant computations without having to use the special form that the vector field f has. Also note that, since the control vector fields G_i and G_j commute, it follows from the Jacobi identity that [G_j, [F, G_i]] = [G_i, [F, G_j]]. Such a relation does not hold in general. To verify (18) and (19), denote by E_ii the m × m matrix that has a single entry 1 in the (i, i)-th position and all other entries equal to 0; the Jacobian of G_i then has the single nonzero block −b_i E_ii in the rows and columns corresponding to y. Splitting the state z into its components x and y, a direct computation gives

[F, G_i](z) = ( −b_i (1 − y_i) f_{y_i}(x, y) ; b_i² a_i e_i ), (20)

and hence, by Proposition 2,

Φ̇_i(t) = −b_i (1 − y_i) ⟨λ, f_{y_i}(x, y)⟩ + b_i² a_i µ_i. (21)

Recall that by our assumption none of the equations contains quadratic terms in the y_i, i.e., ∂²f/∂y_i² ≡ 0. Differentiating (20) along G_i and using this assumption verifies equation (18); analogously, for i ≠ j, the y-components of (20) are constant and only the x-components −b_i b_j (1 − y_i)(1 − y_j) f_{y_i y_j}(x, y) remain, verifying (19).

If an optimal control u*_i is singular over an open interval I, then it satisfies a second-order necessary condition for optimality, the so-called Legendre-Clebsch condition, namely

∂/∂u_i (d²/dt² ∂H/∂u_i) (t) = ⟨Λ(t), [G_i, [F, G_i]](z(t))⟩ ≤ 0 on I.

If this expression is negative, we say the strengthened Legendre-Clebsch condition is satisfied.

Proposition 4. If the control u*_i is singular on an open interval I, then the strengthened Legendre-Clebsch condition is satisfied along I, i.e., ⟨Λ(t), [G_i, [F, G_i]](z(t))⟩ < 0 on I.

This result makes singular controls prime candidates for optimality and is a strong indication that singular controls will be part of optimal concatenation sequences (together with the constant controls 0 and u_max).
Proof. Denote the combined multiplier (λ, µ) by Λ and suppose u*_i is singular on I. Then the switching function and its first two derivatives vanish identically on I. Since γ_i and b_i are positive, the vanishing of Φ_i = γ_i + b_i (1 − y_i) µ_i gives

µ_i = −γ_i / (b_i (1 − y_i)) < 0,

and the vanishing of Φ̇_i then gives

⟨λ, f_{y_i}(x, y)⟩ = b_i a_i µ_i / (1 − y_i) < 0.

Hence

⟨Λ, [G_i, [F, G_i]](z)⟩ = b_i² (1 − y_i) ⟨λ, f_{y_i}(x, y)⟩ + b_i³ a_i µ_i < 0.

This verifies the result.
Another direct, but more cumbersome, calculation verifies that

[F, [F, G_i]](z) = ( b_i (1 − y_i) ( (∂f/∂x) f_{y_i} − (∂f_{y_i}/∂x) f + Σ_{j≠i} b_j a_j y_j f_{y_i y_j} ) − b_i² a_i (1 + y_i) f_{y_i} ; b_i³ a_i² e_i ),

where all partial derivatives are evaluated at (x, y). This formula is messy, but once combined with the other brackets it allows us to compute the singular controls from Φ̈_i ≡ 0. As a function of the states and multipliers we have that

u_i^sing = − ( ⟨Λ, [F, [F, G_i]](z)⟩ − ⟨(β, 0), [F, G_i](z)⟩ + Σ_{j≠i} u_j ⟨Λ, [G_j, [F, G_i]](z)⟩ ) / ⟨Λ, [G_i, [F, G_i]](z)⟩.

It is possible that more than one component of the controls is singular over an interval I. In this case, if the controls u_i and u_j are singular on I, an additional necessary condition for optimality, the so-called Goh condition, requires that also

⟨Λ(t), [G_i, G_j](z(t))⟩ ≡ 0 on I.

In our case, since the control vector fields commute, this is automatically satisfied. Thus, in principle, any combination of the controls could be given by singular controls. If all the controls are singular, we say the controls are totally singular. In this case, depending on m, all the multipliers may be determined by the fact that the switching functions and their derivatives vanish, and the equations Φ̈_i(t) = 0 allow us to compute these controls explicitly. For example, if m = 3 and all three controls are singular, then the equations Φ_i(t) ≡ 0 determine the multipliers µ_i completely, and the equations for the derivatives of the switching functions (namely, the equations Φ̇_i(t) ≡ 0) determine the remaining multipliers λ_1, λ_2 and λ_3. Thus, a totally singular flow is uniquely determined. Naturally, such computations do not guarantee that the control bounds will be satisfied, and this always needs to be verified a posteriori.

5. Application to a mathematical model for combination therapy of CML.
In this section we illustrate the differences in the solutions for problems [OC1] and [OC2] with some numerically computed locally optimal controls for a model for combination treatment of CML (chronic myeloid leukemia).

5.1. Formulation of the original model. We briefly recall a mathematical model for combination treatment of CML from [10, 13]. The states of the system consist of concentrations of quiescent leukemic cells Q, proliferating leukemic cells P, and effector cells E (representing immune effect). Three different types of therapeutic agents are considered and their concentrations are denoted by u_1, u_2, and u_3. The control u_1 represents a targeted BCR-ABL1 inhibitor (such as imatinib) that mainly affects the proliferating leukemic cells; u_2 is an agent (such as dasatinib) that inhibits BCR-ABL1 and also has immune effects; u_3 represents an immunomodulatory agent (such as the anti-PD-1 therapy nivolumab). The effects of the specific drugs are summarized in Figure 1 (taken from [13]), with arrows indicating amplification of effects and vertical bars indicating inhibition. Representing the effects of the drugs using Michaelis-Menten terms results in equations (29)-(31) for both the disease-immune dynamics and the treatment effects. In these equations, all parameters are positive. The cell concentrations for Q are relatively small and are therefore modeled by an exponential function with growth coefficient r_Q. For the proliferating cells P, we model growth with a Gompertz model, as Afenya and Calderón found this to be best for describing CML growth [1]. The effector T cells E also have a rate of increase modeled by a Gompertz function, so as to have approximately exponential growth when numbers are very small while still being bounded by a maximum threshold. In the populations P and E, replication rate constants are represented by r_P and s_E; P_ss and E_ss represent the respective steady states or carrying capacities. The natural death rate constants of the respective populations are denoted by δ_Q, δ_P, and δ_E. The population Q consists of quiescent leukemic cells, some or all of which may be stem cells [14].
When quiescent cells divide, one copy is assumed to be the same type as the original cell, while the second copy may differentiate further into a faster-cycling proliferating type. For this reason, the transition term k_P Q is not subtracted from the quiescent cell population in (29). This term represents the rate at which quiescent cells, Q, produce differentiated proliferating cancer cells, P. The controls u_i of the mathematical model, which might typically represent concentrations of the respective drugs, are identified with drug dose levels in this work. The "C50" parameters U_1C50, U_2C50, and U_3C50 represent the dose levels required to achieve half of the maximal effects of u_1, u_2, and u_3, respectively. These values, along with EC_50 and PC_50, are assumed to be fixed across the effects being modeled. The maximum possible effect size is allowed to depend on the setting.

Figure 1. Diagram of the dynamical system (Q = quiescent leukemic cells, P = proliferating leukemic cells, E = immune effect; u_1 = general BCR-ABL1 inhibitor, u_2 = BCR-ABL1 inhibitor + immuno-oncology therapy, u_3 = immuno-oncology therapy). The shaded circular regions represent the "populations" or "states" included in the model. Solid arrows extending from or to the populations represent changes in numbers, with inward-pointing arrows signifying increases in numbers and outward-pointing arrows decreases in numbers. Dashed arrows indicate indirect effects leading to increases or decreases. Colored bars represent inhibition of a production or of an indirect effect, due to the represented treatment. Colored arrows represent amplification of a rate or of an indirect effect.
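The Gompertz growth law used for the proliferating cells can be sketched numerically. The placement of the death term and all parameter values below are our own illustrative assumptions, not the paper's calibrated model.

```python
import math

# Gompertz growth sketch for the proliferating population P (illustrative
# parameters; the death term's placement is an assumption for this sketch):
#   dP/dt = r_P * P * ln(P_ss / P) - delta_P * P.

def gompertz_step(P, r_P, P_ss, delta_P, dt):
    """One forward-Euler step of the Gompertz growth law."""
    return P + dt * (r_P * P * math.log(P_ss / P) - delta_P * P)

P, r_P, P_ss, delta_P, dt = 1.0, 0.1, 1e4, 0.01, 0.01
for _ in range(50000):
    P = gompertz_step(P, r_P, P_ss, delta_P, dt)
# Growth is near-exponential while P << P_ss and saturates strictly
# below the carrying capacity P_ss.
assert 1000.0 < P < P_ss
```

This captures the two qualitative features cited in the text: approximately exponential growth at small population sizes (where ln(P_ss/P) is large and roughly constant) and saturation bounded by the carrying capacity.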
The cell populations (or states), dose levels (or controls), and system parameters are described in Table 1. The values listed are used in the computations for this paper, but do not reflect specific model fits or therapies.

5.2. Extended dynamics of the model. The dynamics (29)-(31) fall into the framework of dynamics (9) with x = (Q, P, E) and u = (u_1, u_2, u_3). Introducing the additional state vector y = (y_1, y_2, y_3) as a variable, the extended dynamics take the form (11), with each Michaelis-Menten ratio u_i/(U_iC50 + u_i) in (29)-(31) replaced by the corresponding state y_i satisfying equations (10) with a_i = U_iC50. For example, the death terms in the equation for the effector cells take the form

− δ_E (1 + (1 − U_2max,5 y_2)(1 − U_3max,3 y_3) P_max,2 P / (PC_50 + P)) E,

in which the factors (1 − U_2max,5 y_2) and (1 − U_3max,3 y_3) represent the prevention of the death of E by the second and third agents, respectively.

Symbol     Description                                                Units  Value
U_1C50     dose level of u_1 that gives half the maximum effect       mg     300
U_1max,1   maximum possible effect of u_1 on new P from Q
           and growth of P                                                   0.8
U_1max,2   maximum effect of u_1 on death of P                               10
u_2        dose level of a BCR-ABL1 inhibitor that also has           mg
           immunomodulatory effects (e.g., dasatinib)
u_2^max    maximum dose level of u_2                                  mg     140
U_2C50     dose level of u_2 that gives half the maximum effect       mg     40
U_2max,1   maximum effect of u_2 on death of Q or P                          2
U_2max,2   maximum effect of u_2 on new P from Q and growth of P             0.6
U_2max,3   maximum effect of u_2 on death of P                               10
U_2max,4   maximum effect of u_2 on stimulating proliferation of E           10
U_2max,5   maximum effect of u_2 on prevention of the death of E             0.4
u_3        dose level of an immunomodulatory agent                    mg
           (e.g., nivolumab)
u_3^max    maximum dose level of u_3                                  mg     240
U_3C50     dose level of u_3 that gives half the maximum effect       mg     80
U_3max,1   maximum effect of u_3 on death of Q or P                          1
U_3max,2   maximum effect of u_3 on stimulating proliferation of E           1
U_3max,3   maximum effect of u_3 on prevention of the death of E             0.7

Table 1. Cell populations, dose levels, and parameters used in calculations.
The initial conditions for the x-variables are denoted by Q_0, P_0 and E_0, respectively, and the initial conditions for the effects are all zero, y_i(0) = 0 for i = 1, 2, 3. The controls u_i are concentrations and thus are nonnegative. The one degree of freedom which controls the speed of the reactions is the coefficient b_i; all other parameters are given by the general framework introduced in the previous sections.

5.3. Comparison of solutions for the optimal control problems [OC1] and [OC2]. We consider the problem of drug administration from an optimization point of view [19]. The main goal of the treatment is to reduce or eliminate the disease burden, in this case represented by BCR-ABL1 transcript levels. Since the therapies used to achieve this reduction cause side effects, we also want to keep the side effects for each drug low enough to be tolerable. We assume that exposure to the drugs, represented by the area under the dose level curve (AUC), correlates with side effects. So to minimize side effects, we should minimize the AUC for each drug, with appropriate relative weights according to the severity of the side effects associated with each one. In order to reduce disease burden while simultaneously keeping side effects tolerable, we therefore minimize a weighted sum of these terms as formulated in Section 3, i.e.,

J(u) = α_1 Q(τ) + α_2 P(τ) + ∫_0^τ (β_1 Q(t) + β_2 P(t) + γ_1 u_1(t) + γ_2 u_2(t) + γ_3 u_3(t)) dt,

with α_1, α_2, β_1, β_2, γ_1, γ_2, γ_3 constant positive weights. The treatment period [0, τ] is fixed here. The first two terms represent the disease burden at the end of the treatment period. Generally we choose the weights so that α_1 Q is higher than α_2 P, since the Q population can repopulate the P population and so is more important to minimize. The first two terms inside the integral ensure that the leukemic cell populations do not become too large during the treatment period. The last three terms inside the integral take into account toxicity from the various treatments, with the coefficients γ_i ("toxicity penalty weights") determining the relative importance of minimizing exposure for each drug.
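Evaluating this objective on a discretized trajectory amounts to a weighted terminal term plus a trapezoidal quadrature of the running cost. The sketch below uses placeholder constant trajectories and our own illustrative weights, not solutions of the CML dynamics or the values used in the computations.

```python
import numpy as np

# Sketch of evaluating the objective on a uniform time grid with the
# trapezoidal rule (trajectories and weights are placeholders):
#   J = a1*Q(tau) + a2*P(tau)
#       + int_0^tau (b1*Q + b2*P + g1*u1 + g2*u2 + g3*u3) dt.

def objective(t, Q, P, U, alpha, beta, gamma):
    """t: time grid; Q, P: state trajectories; U: (3, len(t)) dose levels."""
    integrand = beta[0] * Q + beta[1] * P + gamma @ U
    integral = np.sum((integrand[:-1] + integrand[1:]) * np.diff(t)) / 2.0
    return alpha[0] * Q[-1] + alpha[1] * P[-1] + integral

t = np.linspace(0.0, 365.0, 366)
Q = np.full_like(t, 10.0)
P = np.full_like(t, 100.0)
U = np.zeros((3, t.size))                      # no treatment at all
alpha, beta = (1.0, 0.1), (1e-3, 1e-4)
gamma = np.array([1e-3, 1e-3, 1e-3])
J0 = objective(t, Q, P, U, alpha, beta, gamma)
# With constant states and zero doses, J = a1*Q + a2*P + (b1*Q + b2*P)*tau.
assert abs(J0 - (1.0*10.0 + 0.1*100.0 + (1e-3*10.0 + 1e-4*100.0)*365.0)) < 1e-6
```

The closed-form check on constant trajectories verifies the quadrature; in an actual computation Q, P and U would come from integrating the dynamics under a candidate regimen.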
To obtain meaningful results, the weights need to be calibrated with respect to the initial conditions and the treatment period. If they are made too large, then the optimal solution (given by minimizing J) would be to give no drugs; if they are made too small, then the optimal solution would be to use full doses throughout the treatment interval. A good choice of weights is especially important over extended therapy intervals.

5.4. Numerical results. Figure 2 gives a comparison of numerically computed approximations to locally optimal controls for a one-year treatment period with the following weights in the objectives: These solutions were computed numerically using the software PROPT, in which the dynamics are collocated at intermediate points and the optimal control problem is transformed into a finite-dimensional nonlinear program, which is then solved using SNOPT. For each of the runs shown, the optimality conditions for this nonlinear optimization problem are satisfied, but the computed solutions are only approximations to the optimal solutions. The top row in Figure 2 shows the solution to problem formulation [OC1], and the middle and bottom rows show solutions for the formulation [OC2] with fast (middle) and slow (bottom) y-dynamics. Visually the first two solutions are virtually identical, but in the solution for [OC2] there is a small discontinuity when the control u_2 changes from the maximum dose to the singular control. The same is true for the third run: here also the control switches from the singular control to no dose at the end, although the graphs appear to be continuous; the software simply graphs the controls continuously. Similarly, the numerical software does not resolve the jump to zero in the controls at the end of the therapy when the y-dynamics are fast, since it happens only over a very brief instant in time. It is clearly visible in the controls u_2 and u_3 in the bottom row, where the dynamics are slowed down.
Intuitively, because the dynamics account for the transient behavior as well, optimal controls are stopped before the time τ once their main effect would only be realized after the end of the optimization period. But clearly the numerical software returns adequate approximations to the optimal structures, and it is clear, for example, that the optimal concatenation structure for the control u_2 is of the form: maximum dose, followed by a singular control, ending with a rest period. Based on the theoretical computations of singular controls, numerical computations based on arc parameterizations could be implemented that would then compute the optimal control more precisely in a 3-dimensional optimization procedure.
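The direct-transcription idea behind these collocation runs can be sketched on a deliberately simplified scalar problem. Everything below — the one-state model, its parameters, and the use of SciPy's bounded quasi-Newton solver in place of PROPT/SNOPT — is our own toy illustration, not the CML system.

```python
import numpy as np
from scipy.optimize import minimize

# Toy direct-transcription sketch (our own simplified scalar problem and
# parameter values; NOT the CML system or the PROPT/SNOPT setup):
# minimize  x(tau) + gamma * integral of u dt   subject to
#   dx/dt = r*x - k*x*u/(a + u),   0 <= u <= u_max,   x(0) = 1,
# with the control discretized on N grid intervals and the state
# propagated by forward Euler.

r, k, a, gamma, u_max = 0.05, 0.2, 40.0, 1e-4, 140.0
tau, N = 10.0, 25
dt = tau / N

def rollout(u):
    """Propagate the state under the piecewise-constant dose schedule u."""
    x = 1.0
    for ui in u:
        x += dt * (r * x - k * x * ui / (a + ui))
    return x

def J(u):
    """Terminal disease burden plus an AUC-type toxicity penalty."""
    return rollout(u) + gamma * dt * np.sum(u)

res = minimize(J, x0=np.full(N, u_max / 2), bounds=[(0.0, u_max)] * N)
assert res.success
# The optimized schedule should do at least as well as giving no drug.
assert res.fun <= J(np.zeros(N)) + 1e-6
```

As in the full problem, discretizing the controls turns the optimal control problem into a finite-dimensional nonlinear program; the box bounds on each u_i play the role of the dose limits [0, u_i^max].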
6. Conclusion. In this paper, we introduced a novel approach to modeling the pharmacodynamics of drugs with a generalized Michaelis-Menten model which also takes into account the transient phase. Although the new formulation has an enlarged state space, it has the advantage of being affine in the controls, which enables the use of Lie-theoretic computations. We then confirmed the theoretical analysis indicating the optimality of singular controls using numerical simulations that compare solutions for both the original and extended models. If the dynamics of the extended model are relatively fast, these solutions are both qualitatively and quantitatively very similar. The extended model approach proposed here allowed us to use a theoretical basis to support the numerical computations. This is particularly important for complex optimal control problems that pose computational challenges.