A new computational strategy for optimal control problem with a cost on changing control

In this paper, we consider a class of optimal control problems where the cost function is the 
sum of the terminal cost, the integral cost and the full variation of control. Here, the full variation 
of a control is defined as the sum of the total variations of its components. By using the control parameterization 
technique in conjunction with the time scaling transformation, we develop a new computational algorithm for 
solving this type of optimal control problem. Rigorous convergence analysis is provided for the new method. 
For illustration, we solve two numerical examples to show the effectiveness of the proposed method.


1. Introduction. In recent years, several algorithms have been proposed for solving optimal control problems with state and/or control constraints. These include the nonsmooth Newton's method [9,10], the constraint transcription method [22,35], the exact penalty function method [14,38,39,41], the lossless convexification method [1,2], and a global optimization approach based on interval analysis [43]. For these optimal control problems, the cost function is expressed in terms of a terminal cost and an integral cost; no cost is associated with changing the control. In practice, however, there is often some cost associated with a change of the control action. Consider a situation where two different control laws achieve the same cost value, but one takes a constant value while the other fluctuates wildly. Standard optimal control theory cannot distinguish between these two control laws, as no cost is associated with changing the control signal. In real-world applications, the constant control law is clearly preferred. The cost of changing the control signal must therefore be taken into consideration.
In [5], necessary optimality conditions are derived for an optimal control problem in which the control signal can assume two possible values and there is a cost associated with changing from one value to the other. Subsequently, a computational algorithm is developed in [29] for solving the problem considered in [5] numerically. Further optimality conditions are derived in [24] for similar (but more general) problems in which the full variation of the control signal is incorporated as a penalty in the cost function. State constraints are not considered in these papers. This is a serious limitation, as most optimal control problems arising in real-world applications involve state constraints (see, for example, the spacecraft control problems in [2], the underwater vehicle problems in [7], and the zinc sulphate purification problems in [35]). In [32], a computational method based on the control parameterization technique is developed for solving a general class of optimal control problems in which the cost function includes the full variation of the control signal and the state variables are subject to inequality path constraints. However, one disadvantage of this method is that it approximates the control by a piecewise-constant function with fixed switching times. Owing to the development of the time scaling transformation in [6,16,18,19], the control parametrization technique can be used in conjunction with the time scaling transformation to produce much improved computational methods for solving optimal control problems in which both the control values and the control switching times are regarded as decision variables to be chosen optimally; see, for example, [17,21,22,38]. These computational methods are known to produce results of much higher accuracy when compared with methods based purely on the control parameterization technique.
The full variation of a control is defined as the sum of the total variations of its components. It gives a measure of the cost of changing the control action. The proposed computational method involves two stages of approximation. At the first stage, the control parametrization technique together with a time scaling transform is used to approximate the problem by a sequence of approximate problems. It is interesting to note that each of these approximate problems is an optimal parameter selection problem in which the term involving the full variation of the control becomes a simple non-smooth function in the form of an L1-norm. The second stage of the approximation is to use a smoothing technique to smooth this non-smooth function. The smoothed problem is a standard optimal parameter selection problem. Rigorous convergence analysis is given wherever necessary to support the method. The efficiency and applicability of the proposed method are demonstrated by solving two numerical examples. Here, we wish to note that the exact penalty function approach [39,41], instead of the constraint transcription method [31], is used in [20] to develop a computational method for solving the class of optimal control problems considered in [32].
In [40], the control parameterization technique is used to develop a computational method for solving the class of optimal control problems where the cost function is the sum of the terminal cost, the integral cost and the full variation of the control. As [40] is a conference paper subject to a page limit, the proofs of all theorems are omitted and no numerical example is included for illustration. In this paper, the time scaling transform is used to supplement the control parameterization technique, yielding a much improved computational method for solving the same class of optimal control problems considered in [32,40]. Furthermore, proofs are given for all the theorems and numerical examples are included to illustrate the effectiveness of the proposed method. In this sense, this paper can be considered a complete and much improved version of [40].
2. Problem formulation. Consider a process described by the following system of differential equations defined on the fixed time interval (0, T]:

dx(t)/dt = f(t, x(t), u(t)), (1)

where x = [x_1, ..., x_n]^T ∈ R^n and u = [u_1, ..., u_r]^T ∈ R^r are, respectively, the state and control vectors, and f is a given function. The initial condition for the differential equation (1) is:

x(0) = x^0, (2)

where x^0 ∈ R^n is a given vector. The control constraint set is

U = {v = [v_1, ..., v_r]^T ∈ R^r : a_i ≤ v_i ≤ b_i, i = 1, ..., r}, (3)

where a_i, i = 1, ..., r, and b_i, i = 1, ..., r, are given real numbers. Note that U is a compact and convex subset of R^r. Let V denote the class of functions of bounded variation from [0, T] to U. The full variation of a function u = [u_1, ..., u_r]^T : [0, T] → R^r is defined as the sum of the total variations of its components:

⋁_0^T (u) = Σ_{i=1}^{r} ⋁_0^T (u_i). (4)

For each u ∈ V, let x(· | u) be an absolutely continuous function defined on [0, T] which satisfies the system of differential equations (1) almost everywhere in (0, T] and the initial condition (2). This function is called the solution of the system (1)-(2) corresponding to the control u ∈ V.
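Numerically, the full variation of a sampled control can be evaluated by summing the absolute successive differences of each component; for a piecewise-constant control whose samples include every switching point, this value is exact. The following sketch is illustrative only (the function name is ours, not the paper's):

```python
import numpy as np

def full_variation(samples):
    """Sum over components of the total variation of a sampled control.

    samples: array of shape (num_points,) for a scalar control, or
    (num_points, r) for an r-component control.  For a piecewise-constant
    control sampled at every switching point, the returned value equals
    the full variation defined in the text."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        samples = samples[:, None]          # treat as a single component
    return float(np.sum(np.abs(np.diff(samples, axis=0))))

# Two-component piecewise-constant control:
# u_1 jumps 0 -> 1 -> 0 (variation 2), u_2 jumps 2 -> 2 -> 3 (variation 1).
u = np.array([[0.0, 2.0],
              [1.0, 2.0],
              [0.0, 3.0]])
```

Here `full_variation(u)` returns 3.0, the sum of the component variations.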
The inequality terminal state constraints and the inequality continuous state constraints are specified as follows:

Φ_i(x(T | u)) ≤ 0, i = 1, ..., N_T, (5)

h_i(t, x(t | u)) ≤ 0, for all t ∈ [0, T], i = 1, ..., N_S, (6)

where Φ_i, i = 1, ..., N_T, are given real-valued functions defined on R^n, and h_i, i = 1, ..., N_S, are given real-valued functions defined on [0, T] × R^n. Let G be the class of all those control functions in V such that the constraints (5) and (6) are satisfied.
We may now state our optimal control problem as follows.

Problem (Q). Given the system (1)-(2), find a control u ∈ G such that the cost function

g_0(u) = Φ_0(x(T | u)) + ∫_0^T L_0(t, x(t | u), u(t)) dt + c ⋁_0^T (u) (7)

is minimized over G, where c is a positive weighting factor, and Φ_0, L_0 are given real-valued functions.

YUJING WANG, CHANGJUN YU AND KOK LAY TEO
The following conditions are assumed throughout: Define

q_0(u) = Φ_0(x(T | u)) + ∫_0^T L_0(t, x(t | u), u(t)) dt.

From (7), we have

g_0(u) = q_0(u) + c ⋁_0^T (u).

If u, v ∈ V are such that v(t) = u(t) a.e. on [0, T], it is clear that q_0(u) = q_0(v). (11) However, it is not necessarily true that the full variations of u and v are equal. Let t^i_1 < t^i_2 < ... be the discontinuity points of the i-th component v_i of the control v in (0, T), and let A_i denote the set of these points. From Theorem 2.9.3 of [31], we note that A_i contains a countable number of points, and so does the union A = A_1 ∪ ... ∪ A_r. We now construct a function u = [u_1, ..., u_r] from v as follows: Clearly, u ∈ V and is such that (ii) holds.
Let D be the set which contains all those functions v ∈ V such that v(t) = u(t) a.e. in [0, T]. We call u a minimal bounded variation function of D. Clearly,

⋁_0^T (u) ≤ ⋁_0^T (v), for all v ∈ D.

Let U be the set which consists of all such minimal bounded variation functions of V. Elements of U are called admissible controls and U is called the class of admissible controls.
Remark 1. Note that our aim in Problem (Q) is to minimize the cost function (7). However, by (14)-(17), it suffices to confine our attention to the set U rather than the set V in the minimization process.
Let F be a set which consists of all those elements in U such that (5) and (6) are satisfied.
Remark 2. Problem (Q) with G replaced by F will be referred to as Problem (P ). Clearly, an optimal control to Problem (P ) is also an optimal control to Problem (Q).
We need the following lemma, which is Lemma 10.2.1 of [31].

3. Control Parametrization and Time Scaling Transform. We shall apply the control parameterization technique together with the time scaling transform to Problem (P). For each p ≥ 1, let the planning horizon [0, T] be partitioned into n_p subintervals with n_p + 1 partition points denoted by

τ^p_0, τ^p_1, ..., τ^p_{n_p},

where τ^p_0 = 0 and τ^p_{n_p} = T, while τ^p_i, i = 1, ..., n_p − 1, are decision variables subject to the following conditions:

τ^p_{i−1} ≤ τ^p_i, i = 1, ..., n_p.

The number n_p of subintervals is chosen such that n_{p+1} > n_p. Let U^p consist of all those elements from U which are piecewise constant functions of the form

u^p(t) = Σ_{k=1}^{n_p} σ^{p,k} χ_{[τ^p_{k−1}, τ^p_k)}(t), (22)

where σ^{p,k} ∈ U and χ_I denotes the indicator function of I defined by χ_I(t) = 1 if t ∈ I, and χ_I(t) = 0 otherwise. Let σ^p = [(σ^{p,1})^T, ..., (σ^{p,n_p})^T]^T, where σ^{p,k} = [σ^{p,k}_1, ..., σ^{p,k}_r]^T, k = 1, ..., n_p. Restricting to U^p, the control constraints (3) become:

a_i ≤ σ^{p,k}_i ≤ b_i, i = 1, ..., r, k = 1, ..., n_p.

We construct a transformation from t ∈ [0, T] to a new time scale s ∈ [0, 1] mapping the knots τ^p_0, τ^p_1, τ^p_2, ..., τ^p_{n_p−1}, τ^p_{n_p} into the pre-fixed knots ξ^p_0, ξ^p_1, ξ^p_2, ..., ξ^p_{n_p−1}, ξ^p_{n_p}.

Clearly, these pre-fixed knots are such that

0 = ξ^p_0 < ξ^p_1 < ... < ξ^p_{n_p} = 1.

The required transformation from t ∈ [0, T] to s ∈ [0, 1] is defined by the following differential equation:

dt(s)/ds = v^p(s), (27)

with the initial condition

t(0) = 0 (28)

and the terminal condition

t(1) = T, (29)

where

v^p(s) = Σ_{k=1}^{n_p} θ^p_k χ_{[ξ^p_{k−1}, ξ^p_k)}(s), (30)

and θ^p_k ≥ 0, k = 1, ..., n_p, are decision variables. Let Θ^p be the set consisting of all those θ^p = [θ^p_1, ..., θ^p_{n_p}]^T such that θ^p_k ≥ 0, k = 1, ..., n_p. Furthermore, let V^p be the set consisting of all those non-negative piecewise constant functions v^p defined by (30) with θ^p ∈ Θ^p. Clearly, each θ^p ∈ Θ^p defines uniquely a v^p ∈ V^p, and vice versa.
From (27)-(28), we have

t(s) = ∫_0^s v^p(η) dη.

Thus, from (29), it follows that

Σ_{k=1}^{n_p} θ^p_k (ξ^p_k − ξ^p_{k−1}) = T.

Define ω^p(s) = u^p(t(s)). Clearly, the piecewise constant control ω^p can be written as:

ω^p(s) = Σ_{k=1}^{n_p} σ^{p,k} χ_{[ξ^p_{k−1}, ξ^p_k)}(s), (34)

where σ^{p,k} = [σ^{p,k}_1, ..., σ^{p,k}_r]^T, σ^{p,k} ∈ U, k = 1, ..., n_p, and ξ^p_i, i = 0, 1, ..., n_p, are fixed knots chosen as specified by (25). Here, σ^p is the same as that defined for u^p given in (22). Let Ξ^p be the set consisting of all such σ^p. Furthermore, let U^p be the set containing all those piecewise constant controls given by (34) with σ^p ∈ Ξ^p. Clearly, each (σ^p, θ^p) ∈ Ξ^p × Θ^p defines uniquely a (ω^p, v^p) ∈ U^p × V^p, and vice versa.
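To make the transformation concrete, the maps s → ω^p(s) and s → t(s) can be sketched as follows. This is illustrative code with names of our choosing, assuming fixed knots ξ^p_k and durations θ^p_k are supplied as arrays:

```python
import numpy as np

def make_time_scaling(theta, xi):
    """Return t(s) = integral_0^s v^p(eta) d eta, where v^p takes the
    constant value theta[k] on [xi[k], xi[k+1])."""
    theta = np.asarray(theta, dtype=float)
    xi = np.asarray(xi, dtype=float)
    # t evaluated at the knots: cumulative sums of theta_k * (xi_k - xi_{k-1})
    t_knots = np.concatenate(([0.0], np.cumsum(theta * np.diff(xi))))

    def t(s):
        k = np.searchsorted(xi, s, side='right') - 1
        k = min(max(int(k), 0), len(theta) - 1)   # clamp so t(1) is defined
        return float(t_knots[k] + theta[k] * (s - xi[k]))
    return t

def make_control_on_s(sigma, xi):
    """Return omega^p(s) = sum_k sigma[k] * chi_[xi[k], xi[k+1])(s)."""
    sigma = np.asarray(sigma, dtype=float)
    xi = np.asarray(xi, dtype=float)

    def omega(s):
        k = np.searchsorted(xi, s, side='right') - 1
        k = min(max(int(k), 0), len(sigma) - 1)
        return sigma[k]
    return omega

# Three subintervals on the s-scale with equally spaced knots.
xi = np.array([0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0])
theta = np.array([1.5, 0.3, 1.2])   # durations; t(1) = sum(theta * diff(xi))
t_of_s = make_time_scaling(theta, xi)
```

With these data, t(1) = (1.5 + 0.3 + 1.2)/3 = 1.0, so the terminal condition t(1) = T is met for T = 1.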
Let B^p be a subset of Ξ^p × Θ^p, and let B̄^p be the corresponding subset of U^p × V^p uniquely defined by the elements of B^p. Clearly, for each (ω^p, v^p) ∈ B̄^p, there exists a unique (σ^p, θ^p) ∈ B^p such that (34) and (30) are satisfied. The full variation term appearing in the cost function (7) takes the following simple form:

⋁_0^T (u^p) = Σ_{k=1}^{n_p−1} Σ_{i=1}^{r} |σ^{p,k+1}_i − σ^{p,k}_i|.

The following is the approximating optimal parameter selection problem corresponding to the optimal control problem (P).

Problem (P(p)). Subject to the system (35)-(36), find a combined vector (σ^p, θ^p) ∈ B^p such that the cost function, where c > 0 is a weighting factor, is minimized over B^p, where L̄_0 is obtained from L_0 in the same way as f̄ is obtained from f according to (39) and (40).
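In this parameterized form, the variation term is simply the L1-norm of the successive differences of the control parameters. A sketch (illustrative; the function name is ours):

```python
import numpy as np

def variation_penalty(sigma_p, c):
    """c * sum_{k=1}^{n_p - 1} sum_{i=1}^{r} |sigma^{p,k+1}_i - sigma^{p,k}_i|.

    sigma_p: array of shape (n_p, r) holding the control parameters;
    c: the positive weighting factor.  This is the nonsmooth last term of
    the cost function of Problem (P(p))."""
    sigma_p = np.asarray(sigma_p, dtype=float)
    return c * float(np.sum(np.abs(sigma_p[1:] - sigma_p[:-1])))
```

For example, with parameters [[0, 1], [2, 1], [1, 0]] the inner double sum is (2 + 0) + (1 + 1) = 4, so with c = 0.5 the penalty is 2.0.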
where |·| denotes the usual Euclidean norm on R^{r n_p} × R^{n_p}. Then,

Proof. (i) follows from the fact that the set Ξ^p × D^p is compact. The proofs of (ii)-(iv) follow from arguments similar to those given in the relevant parts of the proofs of Lemmas 6.4.3 and 6.4.4 of [31].

4. Smoothing the Cost Function. To solve Problem (P) via the control parametrization technique used in conjunction with the time scaling transform, we only need to solve a sequence of finite dimensional optimization problems (P(p)). However, the last term in the cost function of Problem (P(p)) is non-differentiable. The aim of this section is to use the idea of [30] to "smooth" this last term. More precisely, we define a smooth approximation of the last term of the cost function. We now consider the following approximate problem.

Problem (P_µ(p)). Problem (P(p)) with the cost function (46) replaced by its smoothed counterpart. Then, following arguments similar to those given in the relevant parts of the proof of Lemma 6.4.4 of [31], it is easy to establish the following theorem.

Theorem 4.1. Let (σ^{p,*}, θ^{p,*}) and (σ^{p,µ,*}, θ^{p,µ,*}) be, respectively, optimal solutions of Problem (P(p)) and Problem (P_µ(p)). Then,

Proof. See the proof of Theorem 10.4.1 of [31].
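The explicit smoothing function of [30] is not reproduced above. As an illustration only, one common smooth surrogate for |x| is sqrt(x^2 + µ^2), which is infinitely differentiable and deviates from |x| by at most µ; any such surrogate turns the variation term of Problem (P(p)) into a smooth function of the parameters. The names below are ours:

```python
import numpy as np

def smooth_abs(x, mu):
    """Smooth surrogate for |x| (a common choice; not necessarily the exact
    function used in [30]): sqrt(x^2 + mu^2).  For every x,
    0 <= smooth_abs(x, mu) - |x| <= mu, so the smoothed term converges
    uniformly to |x| as mu -> 0."""
    return np.sqrt(np.asarray(x, dtype=float) ** 2 + mu ** 2)

def smoothed_variation_penalty(sigma_p, c, mu):
    """Smoothed counterpart of the variation term of Problem (P(p)),
    as used in the cost function of Problem (P_mu(p))."""
    sigma_p = np.asarray(sigma_p, dtype=float)
    return c * float(np.sum(smooth_abs(sigma_p[1:] - sigma_p[:-1], mu)))
```

With n difference terms, the smoothed and exact penalties differ by at most c·n·µ, which is in the spirit of the bound asserted by Theorem 4.1: the gap vanishes as µ → 0.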

5. Constraints Approximation.

For each i = 1, ..., N_S, the corresponding inequality continuous state constraint in (42) is equivalent to an equality constraint in integral form. However, the latter is non-smooth; it is approximated by a smooth function G_{i,ε} constructed from h_i, as given below:

where ε > 0 is an adjustable constant with a small value. We now define a related approximate problem as follows.

Problem (P_{µ,ε,γ}(p)): Problem (P_µ(p)) with (42) replaced by an approximate inequality constraint involving G_{i,ε} and a parameter γ > 0. To continue, we assume that the following condition is satisfied.
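The explicit form of G_{i,ε} in (56) is not reproduced above. The sketch below uses the piecewise-quadratic smoothing that is standard in the constraint transcription literature [22,35] (an assumption on our part, not necessarily the paper's exact function), which replaces max{η, 0} by a continuously differentiable function:

```python
def smoothed_plus(eta, eps):
    """C^1 approximation of max(eta, 0):
         0                         for eta < -eps,
         (eta + eps)^2 / (4*eps)   for -eps <= eta <= eps,
         eta                       for eta > eps.
    The pieces match in value and slope at eta = -eps and eta = eps."""
    if eta < -eps:
        return 0.0
    if eta <= eps:
        return (eta + eps) ** 2 / (4.0 * eps)
    return eta

def transcribed_constraint(h_values, dt, eps):
    """Riemann-sum approximation of the functional
    G = integral over [0, T] of smoothed_plus(h_i(t, x(t)), eps) dt.
    The approximate constraint then reads G <= gamma for a small gamma > 0.
    h_values: samples of h_i(t, x(t)) on a uniform grid with spacing dt."""
    return sum(smoothed_plus(h, eps) for h in h_values) * dt
```

When the state constraint h_i ≤ 0 holds with margin ε everywhere, the transcribed functional is exactly zero, so the approximate constraint is inactive.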

6. A Computational Algorithm.

To solve the constrained optimal control problem (P), we construct a sequence of approximate problems (P_{µ,ε,γ}(p)) in µ, ε and γ. For a given positive integer p and a given µ > 0, and for each ε > 0 and γ > 0, let (σ^{p,µ,*}_{ε,γ}, θ^{p,µ,*}_{ε,γ}) be the solution of Problem (P_{µ,ε,γ}(p)). The following algorithm can be used to generate a sequence of combined vectors in the feasible region of Problem (P(p)). Algorithm 6.1.

Remark 9. In Step 1 of Algorithm 6.1, we are required to solve Problem (P_{µ,ε,γ}(p)). It can be solved as a nonlinear optimization problem; the details are given in the next section.
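The individual steps of Algorithm 6.1 are not listed above. A typical ε-γ reduction loop of this kind can be sketched as follows; this is our simplified version with an assumed stopping rule, not the paper's exact algorithm:

```python
def reduction_loop(solve_subproblem, is_feasible, eps0=0.1, gamma0=0.1,
                   ratio=0.1, max_iter=20):
    """Repeatedly solve the smoothed approximate problem (P_{mu,eps,gamma}(p))
    and tighten eps and gamma until the returned point is feasible for the
    original constraints of Problem (P(p)).

    solve_subproblem(eps, gamma) -> candidate solution (user-supplied solver);
    is_feasible(z) -> True if z satisfies the original constraints."""
    eps, gamma = eps0, gamma0
    z = None
    for _ in range(max_iter):
        z = solve_subproblem(eps, gamma)
        if is_feasible(z):
            return z, eps, gamma
        gamma *= ratio            # tighten the constraint-violation tolerance
        if gamma < eps * eps:     # heuristic: then also sharpen the smoothing
            eps *= ratio
    return z, eps, gamma

# Stub illustration: the "solver" returns a point whose constraint violation
# equals gamma, so feasibility is reached once gamma is small enough.
z, eps, gamma = reduction_loop(lambda eps, gamma: gamma,
                               lambda z: z <= 1e-3)
```

The schedule by which ε and γ are reduced relative to each other is a design choice; the convergence analysis of the actual algorithm dictates the admissible relationship between them.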
7. Solving Problem (P_{µ,ε,γ}(p)). Problem (P_{µ,ε,γ}(p)) can be viewed as the following nonlinear optimization problem, which is again referred to as Problem (P_{µ,ε,γ}(p)). Minimize the cost function (58), where G^µ_0 is the smoothed cost function defined in Section 4, subject to the constraints (59)-(64), where Φ_i, i = 1, ..., N_T, and G_{i,ε}, i = 1, ..., N_S, are, respectively, defined by (41) and (56), Υ is as defined above, while (62)-(64) are the constraints which specify the sets Ξ^p and Θ^p given in Section 3. The constraints (63) and (64) are known as boundedness constraints in nonlinear programming. Let Λ^p be the set which consists of all those (σ^p, θ^p) ∈ R^{r n_p} × R^{n_p} such that the boundedness constraints (63)-(64) are satisfied. This nonlinear programming problem in the combined vector (σ^p, θ^p) can be solved by using any nonlinear optimization technique, such as a sequential quadratic programming (SQP) routine with an active set strategy.
To use the SQP routine to solve the nonlinear optimization problem (58)-(64), we choose an initial control parameter vector (σ p , θ p ) (0) ∈ Λ p to start up the SQP routine. The SQP will use, for each (σ p , θ p ) (i) ∈ Λ p , the values of the cost function (58) and the constraints (59)-(64) as well as their respective gradients to generate the next iterate (σ p , θ p ) (i+1) . Consequently, it gives rise to a sequence of combined parameter vectors. The optimal combined parameter vector obtained by the SQP routine is regarded as an optimal combined parameter vector of Problem (P (p)).
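As a toy illustration of this step (our own example, not the paper's), SciPy's SLSQP routine (an SQP implementation) can solve a two-parameter instance with boundedness constraints, one inequality constraint, and a smoothed variation penalty:

```python
import numpy as np
from scipy.optimize import minimize  # SLSQP is an SQP implementation

c, mu = 0.1, 1e-4   # variation weight and smoothing parameter (our choices)

def cost(s):
    """Quadratic stand-in for the "integral cost" plus the smoothed
    variation penalty c * sqrt((s2 - s1)^2 + mu^2) for two parameters."""
    return ((s[0] - 0.2) ** 2 + (s[1] - 0.8) ** 2
            + c * np.sqrt((s[1] - s[0]) ** 2 + mu ** 2))

res = minimize(cost, [1.0, 1.0], method='SLSQP',
               bounds=[(0.0, 2.0), (0.0, 2.0)],      # boundedness constraints
               constraints=[{'type': 'ineq',          # a "terminal" constraint
                             'fun': lambda s: s[0] + s[1] - 0.5}])
```

The penalty pulls the two parameter values toward each other: without it the minimizer is (0.2, 0.8), while the penalized minimizer is approximately (0.25, 0.75), i.e. the jump between the two control levels shrinks from 0.6 to 0.5.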
Step 2. Calculate the values of the cost function (51) and each of the constraint functions (55) as follows: In view of Section 5.2.1 of [31], the gradient formulas of the cost function G^µ_0(σ^p, θ^p), the terminal state inequality constraint functions Φ_i(x(1 | σ^p, θ^p)), i = 0, 1, ..., N_T, and the state inequality constraint functions G_{i,ε}(σ^p, θ^p), i = 1, ..., N_S, can be computed using the following algorithm.

Data. A given (σ^p, θ^p) ∈ B^p.
Step 1. Solve the co-state systems corresponding to the cost function (51), the terminal constraints (41), and the inequality state constraints (55), respectively, as follows.

(i) For the cost function (51): solve the following co-state system of differential equations:

with the boundary condition

backward in time from s = 1 to s = 0, where x(· | σ^p, θ^p) is the solution of the system (35)-(36) corresponding to (σ^p, θ^p) ∈ Λ^p, and the Hamiltonian function H_0 is defined by

Let λ^0(· | σ^p, θ^p) be the solution of the co-state system (70)-(71).

(ii) For the i-th terminal constraint given in (41): solve the following co-state system of differential equations:

with the boundary condition

backward in time from s = 1 to s = 0, where x(· | σ^p, θ^p) is the solution of the system (35)-(36) corresponding to (σ^p, θ^p) ∈ Λ^p, and H_i, the corresponding Hamiltonian function for the i-th terminal constraint function, is defined by

Let λ^i(· | σ^p, θ^p) be the solution of the co-state system (73)-(74).

(iii) For the i-th inequality state constraint given in (55): solve the following co-state system of differential equations:

with the boundary condition

λ^{i,ε}(1) = 0 (77)

backward in time from s = 1 to s = 0, where x(· | σ^p, θ^p) is the solution of the system (35)-(36) corresponding to (σ^p, θ^p) ∈ Λ^p, and the Hamiltonian function H_{i,ε} is defined by

Let λ^{i,ε}(· | σ^p, θ^p) be the solution of the co-state system (76)-(77).
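Each of the co-state systems above is integrated backward from its boundary condition at s = 1. A minimal numerical sketch, using an assumed toy scalar system (ours, not the paper's): with dynamics dx/ds = a·x, the Hamiltonian is H_0 = λ·a·x, so the co-state equation is dλ/ds = −∂H_0/∂x = −a·λ:

```python
import math

def integrate_costate(dlam_ds, lam_final, n_steps=1000):
    """Integrate the co-state equation d(lambda)/ds = dlam_ds(s, lambda)
    backward in s from s = 1 (where the boundary condition is given) to
    s = 0, using classical RK4 with a negative step."""
    h = -1.0 / n_steps
    s, lam = 1.0, float(lam_final)
    for _ in range(n_steps):
        k1 = dlam_ds(s, lam)
        k2 = dlam_ds(s + h / 2.0, lam + h * k1 / 2.0)
        k3 = dlam_ds(s + h / 2.0, lam + h * k2 / 2.0)
        k4 = dlam_ds(s + h, lam + h * k3)
        lam += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        s += h
    return lam

# Toy instance: d(lambda)/ds = -a*lambda with lambda(1) = 2.0.
# Analytically, lambda(s) = lambda(1)*exp(a*(1 - s)), so lambda(0) = 2*exp(a).
a = 0.7
lam0 = integrate_costate(lambda s, lam: -a * lam, lam_final=2.0)
```

For the linear toy system the backward RK4 result agrees with the analytic value 2·exp(0.7) to high accuracy; in the actual method the same backward sweep is performed for each of the systems (70)-(71), (73)-(74) and (76)-(77).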
Proof. See the proof of Theorem 10.5.1 of [31].
This means that, by substituting the optimal combined parameter vector (σ^{p,µ,*}, θ^{p,µ,*}) of Problem (P_µ(p)) into the cost function of Problem (P(p)), the corresponding cost value can be made arbitrarily close to the true optimal cost value of Problem (P(p)) by reducing the value of the parameter µ.
8. Some Convergence Results. In this section, we shall discuss some convergence properties of the sequence of approximate optimal controls. To be more precise, for each p = 1, 2, ..., let (σ^{p,*}, θ^{p,*}) be an optimal combined vector for the finite dimensional optimization problem (P(p)), and let {u^{p,*}}_{p=1}^∞ be the corresponding sequence of approximate optimal piecewise constant controls for Problem (P) obtained as explained in Remark 6. Clearly, g_0(u^{p+1,*}) ≤ g_0(u^{p,*}) for all p ≥ 1.
We now recall that the class F of admissible controls is defined in Section 2. Let F_0 be the interior of the set F in the sense that the constraints (5) and (6) are satisfied as strict inequalities. Furthermore, we assume that the following condition is satisfied: (A7.) Let u^* ∈ F be an optimal control of Problem (P). Then, for any ε > 0, there exists a control u ∈ F_0 such that

Lemma 8.1. Let u ∈ U and let u^p be constructed from u according to (87). Then,

Proof. Since {u^{p,*}} is in U, it is equibounded. Furthermore, by virtue of Lemma 8.3, it is also of equibounded variation. Thus, the conclusion of the lemma follows readily from Theorem 2.9.4 of [31] and Remark 1.

Proof. From the second part of Lemma 8.4 and Remark 13, we have

Next, we shall show that ū is a feasible point of the constraints of Problem (P). Suppose this is not the case. Then, there exists an i ∈ {1, ..., N_T} such that (96) holds, or there exists a j ∈ {1, ..., N_S} and a non-zero interval I ⊂ [0, T] such that (97) holds. However,

and

From Lemma 8.1, we note that

for each t ∈ [0, T]. Thus, by Assumption (A2), we have

From (98) and (101), we see that (96) is false. We now show that (97) is also false. For this, it follows from Assumption (A3) and Lemma 8.1 that

is uniformly bounded on [0, T] × X, where X is as defined in Remark 14. Thus, by (100) and Assumption (A3), we have

lim_{l→∞} [h_j(t, x(t | u^{p(l),*})) − h_j(t, x(t | ū))] = 0

for each t ∈ [0, T]. On this basis, it follows from the Lebesgue dominated convergence theorem (see Theorem 2.64 of [31]) that

Therefore, if (97) were valid, then

Thus, by (99) and (103),

This is a contradiction. Thus, ū is a feasible solution of the constraints of Problem (P). Let u^* be an optimal control of Problem (P). Then, by (A7), there exists, for any ε_1 > 0, a û ∈ F_0 such that

Since û ∈ F_0, there exists an ε_2 > 0 such that Φ_i(x(T | û)) < −ε_2, i = 1, ..., N_T, and, for all t ∈ [0, T], h_i(t, x(t | û)) < −ε_2, i = 1, ..., N_S.
Let û^p be the control defined from û according to (87). Then, by virtue of Lemma 8.1, Remark 14, (A2) and (A3), there exists a p_0 such that û^p ∈ F for all p > p_0.
Since ε_1 > 0 is arbitrary and u^* is an optimal control, ū is also an optimal control of Problem (P). This completes the proof.

9. Numerical Examples. For illustration, we use the proposed method to solve two numerical examples. The first example is similar to the one considered in [24]: a simple linear optimal control problem where a cost is introduced for the variation of the control. The second example is taken from [13]: a fishery harvesting model, where a small penalty on the variation is introduced to regularize the computation.

9.1. Example 1: Linear Optimal Control Problem [24]. Minimize:

with

and

where

Graphs of the two control functions for 3 sets of values of the penalty parameters c_1 and c_2 are presented in Figures 1 and 2. This linear problem has an interesting property: a sudden change from the bang-bang solution when there is no penalty on control changes to a solution with no variation when the penalty is large. Figures 1 and 2 illustrate the two controls as the penalty parameters move through the transition phase. In Table 9.1.1, we present the value of the integral part of the cost function to show the large variation as the penalty parameters c_1 and c_2 change. Any optimization software is expected to have difficulty with this problem at the transition stage, as large changes in control values produce almost no change in the cost value.

9.2. Example 2: Fishery Harvesting Model [13]. Without going into the details of the modeling, we summarize the problem after appropriate scaling as follows: Minimize:

with the initial condition

x(0) = 0.45 (116)

and constraints:

Here, the control function u(t) corresponds to the effort expended on catching the fish and x(t) represents the fish population. In [13], this model was computed with c = 0, and we note that the controls do not vary "smoothly". Also, there is a small aberration in both state and control at the end of the time horizon. This is expected, as the problem is insensitive to changes in control close to t = 1.
We present two cases: one where the control is bounded (by u_max = 1.0), and a second where the control (effort) is essentially made unbounded by setting u_max = 5. Each of Figures 3 to 10 shows three graphs: one with no variation term (c = 0) and the others with penalty terms c = 10^{-3} and c = 10^{-2} (i.e., increasing the penalty while still maintaining only a small contribution to the overall cost function). The graphs of the state variables in Figures 5 and 9 show that the small aberration close to t = 1 can be eliminated by making the penalty parameter large enough (but not so large as to change the value of the integral part of the cost function). Of course, the effect of the control on the state is problem dependent.
Remark 15. In view of the above two numerical examples, we observe that the proposed technique has the following practical application. Consider an optimal control problem in which the cost function does not involve the full variation of the control. We may append a term involving the full variation of the control to the cost function. By adjusting the penalty constant appropriately, we obtain a smoother control without changing the optimal value of the original cost function appreciably. This application is similar to the technique of regularization for improperly posed problems.

10. Conclusion. In this paper, we proposed a new computational scheme for a class of optimal control problems where the cost function is expressed in terms of a terminal cost, an integral cost and the full variation of the control. The algorithm was used to solve two examples. From the solutions obtained, it is observed that the full variation term can serve to regularize the control action.