Minimizing almost smooth control variation in nonlinear optimal control problems

In this paper, we consider an optimal control problem in which the control is almost smooth and the state and control are subject to terminal state constraints and continuous state and control inequality constraints. By introducing an extra set of differential equations for this almost smooth control, we transform the constrained optimal control problem into an equivalent problem involving both a control function and a system parameter vector as decision variables. Then, by the control parametrization technique and a time scaling transformation, the equivalent problem is approximated by a sequence of constrained optimal parameter selection problems, each of which is a finite dimensional optimization problem. For each of these constrained optimal parameter selection problems, a novel exact penalty function method is constructed by appending penalized constraint violations to the cost function. This gives rise to a sequence of unconstrained optimal parameter selection problems, each of which can be solved by existing optimization algorithms or software packages. Finally, a practical container crane operation problem is solved, showing the effectiveness and applicability of the proposed approach.

(Communicated by Cedric Yiu)
1. Introduction. Optimal control theory has many successful applications in engineering, the physical sciences and the medical sciences, such as soft landing of a moon lander [15], robotics [4], the zinc sulphate electrolyte purification process [9], hybrid control systems [6], switched systems [22] and fund management [3]. In practice, optimal control problems are often subject to various constraints, including boundedness constraints on the control variables, terminal state constraints, and continuous state and control inequality constraints. Many computational methods are available in the literature for solving various classes of optimal control problems, including the control parametrization method [19], the direct transcription method [16] and the discretization method [5]. In a large number of optimal control problems, the control functions are assumed to be essentially bounded measurable functions or L₂ functions, with no smoothness condition imposed. In many practical optimal control problems arising in real-world scenarios, a discontinuous control signal is allowed. However, in some engineering applications, discontinuous control signals should be avoided, because abrupt changes in the control signal could entail infinite jerk at the switching points and excite a large bandwidth of vibration modes, causing structural vibration. Thus, for these applications, a continuous control is much preferred.
In [20], an optimal control problem with almost smooth controls is considered. This optimal control problem is subject to boundedness constraints on the control variables, terminal state constraints, and continuous state and control inequality constraints. An additional differential equation, with a new control appearing on its right-hand side, is introduced, whereby the original control becomes a new state and its initial condition becomes a system parameter to be chosen optimally. Thus, an equivalent optimal control problem involving both a control function and a system parameter, referred to as a combined optimal control and optimal parameter selection problem, is obtained, where the (new) control function is not required to be almost smooth. Clearly, the original control function will be continuous and piecewise linear if the new control function is approximated by a piecewise constant function. The control parameterization method [19] is used to solve this equivalent combined optimal control and optimal parameter selection problem, where the control function is approximated by a piecewise constant function whose heights are considered as decision variables. The continuous state and control inequality constraints are handled by the constraint transcription technique.
When the control parameterization method is applied to an optimal control problem, the time horizon is partitioned into a number of subintervals, and the control is approximated by a piecewise constant function consistent with this partition. The discontinuities of this piecewise constant function can only occur at the partition points, also called the switching times. The heights of this piecewise constant function are regarded as decision variables. In this way, the optimal control problem is approximated by an optimal parameter selection problem, which is a finite dimensional optimization problem. Clearly, a better solution would be obtained if the switching times of this piecewise constant function were also considered as decision variables. However, this introduces drawbacks during the numerical computation (see [19] for details). The time scaling transformation was first introduced for time optimal control problems in [11] and generalized to more general optimal control problems in [2,12,13]. It is introduced to circumvent the difficulties caused by regarding switching times as decision variables. It works by mapping the variable switching times to fixed points in a new time horizon, thus yielding a new optimization problem in which the switching times are fixed. The transformed problem can then be solved more readily by existing gradient-based optimization algorithms.
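The core idea of the time scaling transformation can be sketched in a few lines of code. The sketch below is illustrative only (the function and variable names are ours, not from the paper): the decision variables are the subinterval durations θ, and the scaling function maps the fixed grid points s = 0, 1, …, p to the variable switching times.

```python
import numpy as np

def mu(s, theta):
    """Time scaling function: maps new time s in [0, p] to original time t.

    theta[k] is the duration of the k-th control subinterval, so the
    variable switching time tau_k = theta[0] + ... + theta[k-1] is the
    image of the fixed point s = k.
    """
    k = min(int(np.floor(s)), len(theta) - 1)   # current subinterval index
    return float(np.sum(theta[:k]) + theta[k] * (s - k))

# Durations of p = 4 subintervals (decision variables replacing switching times).
theta = np.array([0.5, 1.0, 0.25, 0.25])   # total horizon T = 2.0

# Fixed integer points s = 0..4 map to the variable switching times tau_0,...,tau_4.
print([mu(s, theta) for s in range(5)])    # -> [0.0, 0.5, 1.5, 1.75, 2.0]
```

An optimizer then works with the fixed grid s = 0, 1, …, p and adjusts only the durations θ, so the nondifferentiability associated with moving switching times disappears.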
In this paper, the time scaling transformation will be applied to supplement the control parameterization technique to solve the optimal control problem with almost smooth controls. The exact penalty function method [24,25], rather than the constraint transcription technique, will be used to handle the continuous state and control inequality constraints. A gradient-based computational method will be developed to solve the optimal control problem under consideration.
The remainder of the paper is organized as follows. The original problem is formulated in Section 2. By introducing an extra set of differential equations for the almost smooth control, the problem is written equivalently in Section 3 as a combined optimal control and optimal parameter selection problem, where the initial conditions for the extra set of differential equations are regarded as a system parameter vector. Then, in Section 4, after the application of the control parameterization technique together with a time scaling transformation, the problem is approximated by a sequence of constrained optimal parameter selection problems, i.e. optimal control problems with piecewise constant controls and fixed switching times, subject to terminal state constraints and continuous state and control inequality constraints. To solve this new problem, we propose, in Section 5, an exact penalty function to construct a corresponding unconstrained optimal parameter selection problem, which can be solved by standard gradient-based optimization methods, such as the sequential quadratic programming method; existing optimal control software packages, such as VISUAL MISER [8], can also be used. In Section 6, the proposed method is applied to solve a nontrivial optimal control problem. Finally, some conclusions are drawn in Section 7.
2. Problem formulation. Consider a process described by the following system of differential equations:

ẋ(t) = f(t, x(t), u(t)), t ∈ [0, T], x(0) = x⁰, (1)

where x(t) ∈ Rⁿ is the state vector at time t, u(t) ∈ Rʳ is the control vector at time t, x⁰ ∈ Rⁿ is the initial state, T > 0 is a given terminal time, and f : R × Rⁿ × Rʳ → Rⁿ is a given continuously differentiable function. Define

U = {ζ = [ζ₁, . . . , ζ_r]ᵀ ∈ Rʳ : α_i ≤ ζ_i ≤ β_i, i = 1, . . . , r},

where α_i, i = 1, . . . , r, and β_i, i = 1, . . . , r, are given real numbers. Note that U is a compact and convex subset of Rʳ.
If, for each i = 1, . . . , r, the derivative of u satisfies c_i ≤ u̇_i(t) ≤ d_i for almost all t ∈ [0, T], where c_i, i = 1, . . . , r, and d_i, i = 1, . . . , r, are given real numbers, then u is called an admissible control. Let U be the class of all such admissible controls. Furthermore, let U° be a subset of the class U. Note that the derivative u̇ of u is, in fact, only defined almost everywhere in [0, T]. However, we may assign appropriate values to the function u̇ at those points in [0, T] at which u̇ is not defined, so that the extended function satisfies condition (3). Throughout this paper, the function u̇ is to be understood as its extended counterpart. For each u ∈ U, let x(·|u) be the corresponding solution of system (1).
We assume that f in system (1) satisfies the following linear growth condition: there exists a constant M > 0 such that

‖f(t, x, u)‖ ≤ M(1 + ‖x‖) for all (t, x, u) ∈ [0, T] × Rⁿ × U, (5)

where ‖·‖ denotes a norm on Rⁿ. This assumption is standard in control theory (see, for example, [1]). Conditions (2) and (5) ensure that system (1) admits a unique solution corresponding to each u ∈ U (see Theorem 3.3.3 of [1]). We now formally state our optimal control problem as follows.
Problem (P₁). Given the dynamic system (1), choose a feasible control u ∈ U such that the cost function (6) is minimized subject to the terminal state constraints (7) and the continuous inequality constraints (8), where Φ₀ and the functions defining the constraints (7) and (8), there being N_S continuous inequality constraints, are assumed to be continuously differentiable with respect to each of their arguments. Note that control bounds can easily be incorporated into (8), and the admissible control is required to be almost smooth; thus, it is allowed to appear in the continuous inequality constraints (8). This is a slight generalization of the problem considered in Chapter 8 of [20]. Controls in F are called feasible controls, and F is called the class of feasible controls. We assume that F is not empty. For convenience, let Ω° (respectively, F°) denote the interior of Ω (respectively, F), i.e., Ω° (respectively, F°) consists of all those u for which the inequalities in (9) (respectively, (10)) are satisfied as strict inequalities.
To proceed further, we assume that the following condition, which was first introduced in [18], is satisfied.
Assumption 2.1. For any u ∈ F, there exists a ū ∈ Ω°. This condition has been widely used in the literature.

3. Model transformation.
In this section, we introduce an extra set of differential equations for the control u:

u̇(t) = v(t), t ∈ [0, T], (11a)

with the initial conditions

u(0) = ξ. (11b)

In view of (11a), we see that u is now a state function rather than a control function, and is determined by the new control function v and the initial vector ξ. For convenience, ξ is referred to as the system parameter vector. Clearly, for a given system parameter vector ξ, if v is approximated by a piecewise constant function, then u will be a piecewise linear function.
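The effect of this augmentation can be illustrated numerically. In the sketch below (names and numerical values are ours, for illustration only), a piecewise constant new control v is integrated through u̇ = v, u(0) = ξ, producing a continuous, piecewise linear original control u, as the text asserts.

```python
import numpy as np

def integrate_u(xi, v_heights, tau, n_steps=1000):
    """Integrate du/dt = v(t), u(0) = xi, for a piecewise constant v.

    v_heights[k] is the value of v on [tau[k], tau[k+1]); the resulting
    u is continuous and piecewise linear, as required of an almost
    smooth control.
    """
    ts = np.linspace(tau[0], tau[-1], n_steps + 1)
    u = np.empty_like(ts)
    u[0] = xi
    for i in range(n_steps):
        # index of the subinterval containing ts[i]
        k = min(np.searchsorted(tau, ts[i], side='right') - 1, len(v_heights) - 1)
        u[i + 1] = u[i] + v_heights[k] * (ts[i + 1] - ts[i])   # Euler step, exact here
    return ts, u

# v jumps from slope 2 to slope -1 at t = 0.5, yet u stays continuous.
ts, u = integrate_u(xi=1.0, v_heights=[2.0, -1.0], tau=[0.0, 0.5, 1.0])
print(round(u[-1], 6))   # -> 1.5  (= 1 + 2*0.5 - 1*0.5)
```

The discontinuities are thus pushed from u to v, where they are harmless, while ξ = u(0) becomes an ordinary decision parameter.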
Then, by adjoining (11) to (1), we obtain an augmented system of state differential equations (12). In view of the conditions on u in (2)-(3), the constraints (13)-(15) below must be satisfied. For convenience, let Z and V be, respectively, the sets of ξ and v for which constraints (13) and (15) are satisfied. In this section, v is called an admissible control, and V is called the set of admissible controls. Furthermore, ξ is called a system parameter vector, and Z is called the set of system parameter vectors. Let the trajectory x(·|ξ, v) be the solution of system (12) corresponding to (ξ, v) ∈ Z × V. We write the constraints (14) componentwise for i = 1, . . . , r. Also, the terminal state constraints (7) and the continuous inequality constraints (8) can be rewritten as (16) and (17), respectively. The combined optimal control and optimal parameter selection problem under consideration may now be stated formally as follows.
Problem (P₂). Given the dynamic system (12), choose a feasible combined parameter vector and control (ξ, v) ∈ F̃ such that the cost function (18) is minimized over F̃, where Φ₀(x(T|u)) and L₀(t, x(t|u), u(t)) are those given in (6), the cost function of Problem (P₁).
We note that Problem (P₁) is equivalent to Problem (P₂). Even so, we record this straightforward result as a theorem for future reference.
and vice versa.

4. Control parametrization and time scaling transformation.
In this section, we introduce a solution scheme for solving Problem (P₂). Our method involves two stages. The first stage is the control parametrization technique introduced in [19]. By partitioning the time horizon into a finite number of subintervals and approximating the control function by a piecewise constant function consistent with the partition, Problem (P₂) is transformed into an approximate optimization problem (P₃), where both the heights and the partition points, known as switching times, are regarded as decision variables. The second stage is the application of the time scaling transformation introduced in [13]. The essential idea is to transform the approximate problem (P₃) with variable switching times into a new problem with fixed switching times in a new time horizon.

4.1. Control parametrization.
We approximate the control signal v by piecewise constant basis functions as follows:

v^p(t|τ, σ) = Σ_{k=1}^{p} σᵏ χ_{[τ_{k−1}, τ_k)}(t), (19)

where p ≥ 1 is a given number of control subintervals, χ_I is the characteristic function of the interval I (equal to 1 on I and 0 elsewhere), τ ∈ T is a switching time vector satisfying 0 = τ₀ ≤ τ₁ ≤ · · · ≤ τ_{p−1} ≤ τ_p = T, and σ ∈ Q is a vector consisting of the heights of the approximating piecewise constant function. Note that v^p(·|τ, σ) switches value at the times t = τ_k, k = 1, . . . , p − 1. Considering the condition on v in (15), we obtain the bound constraints on σ:

c_i ≤ σᵏ_i ≤ d_i, i = 1, . . . , r, k = 1, . . . , p. (20)

Let Qᵖ be the set of all those σ ∈ Q which satisfy the constraints (20).
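The parametrized control above is straightforward to evaluate. A minimal sketch (our own names; the heights and switching times are illustrative):

```python
import numpy as np

def v_p(t, tau, sigma):
    """Piecewise constant control: v_p(t) = sigma[k] for t in [tau[k], tau[k+1]).

    tau = [tau_0 = 0, tau_1, ..., tau_p = T] are the switching times and
    sigma[k] are the heights (the decision variables of the
    parametrized problem).
    """
    k = np.searchsorted(tau, t, side='right') - 1
    return sigma[min(k, len(sigma) - 1)]       # right-closed at t = T

tau = [0.0, 0.3, 0.7, 1.0]          # p = 3 subintervals
sigma = [1.0, -0.5, 2.0]            # heights, subject to c_i <= sigma_k <= d_i
print([v_p(t, tau, sigma) for t in (0.0, 0.5, 0.9, 1.0)])   # -> [1.0, -0.5, 2.0, 2.0]
```

Optimizing over (τ, σ) rather than over the full function space is what makes Problem (P₃) finite dimensional.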
The constraints appearing in (16) and (17) then become constraints (23) and (24) on the decision variables. We can then approximate Problem (P₂) by the finite-dimensional optimization problem (P₃) as follows.
The following convergence properties of this piecewise constant approximation scheme are based on the arguments in [14]. In this paper, we are more concerned with the practical computation of a control policy than with the convergence results themselves.
Theorem 4.1. Let v^{p,*} be an optimal control of Problem (P₃), and let v* be an optimal control of Problem (P₂). Then v̄ is an optimal control of Problem (P₂). Let (τ*, δ*) ∈ Ξ be an optimal solution of Problem (P₃); then we can define the corresponding piecewise constant control for Problem (P₂) with heights δ* on the subintervals [τ*_{k−1}, τ*_k), where τ*₀ = 0 and τ*_p = T. Based on the convergence properties stated in Theorems 4.1 and 4.2, we see that v^{p,*} minimizes (18) over the space of feasible piecewise constant controls. That is, a suboptimal control for Problem (P₂) can be obtained by solving Problem (P₃) with the switching times considered as fixed. Clearly, using the switching times as decision variables will yield better solutions, as the search space is larger. However, variable switching times may cause difficulties in the execution of any gradient-based optimization algorithm (see [14] for details). In the next subsection, we will use the time scaling transformation technique [13] to convert Problem (P₃) into an equivalent optimization problem with fixed switching times in a new time horizon. Existing gradient-based optimization procedures can then be applied to solve it.

4.2. Time scaling transformation.
To apply the time scaling transformation, we first introduce a new time variable s and construct a transformation from t ∈ [0, T] to s ∈ [0, p] which maps the variable switching times 0 = τ₀, τ₁, τ₂, . . . , τ_{p−1}, τ_p = T into the pre-fixed switching points s = 0, 1, 2, . . . , p − 1, p in the new time scale. This transformation, t = µ(s|θ), is defined by (27), where θ = [θ₁, . . . , θ_p]ᵀ collects the subinterval durations. It is easy to see that (27) defines µ(·|θ) as a non-negative, non-decreasing and continuous piecewise linear function of s. For each θ ∈ Θ, define τ̃_k(θ) = µ(k|θ), k = 1, . . . , p. Clearly, the time scaling transformation defined by (27) maps s = k to the k-th switching time t = τ_k, and hence τ̃(θ) ∈ T is a valid switching time vector for Problem (P₃). Using the time substitution t = µ(s|θ), where µ(s|θ) is the time scaling function defined by (27), we obtain a new state vector x^p(s) = x(µ(s|θ)). For notational simplicity, let µ(s) = µ(s|θ). Then, applying (27) to the dynamic system (21), we obtain the transformed dynamics (28a) with the initial condition

x^p(0) = x₀(ξ), (28b)

where the overhead dot now represents differentiation with respect to s, x₀(ξ) and f are defined as in Section 3, and χ_{[k−1,k)} : R → R is the characteristic function defined in Subsection 4.1.
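The transformed dynamics can be integrated directly in the scaled horizon. The toy sketch below (our own names; f is a single integrator chosen only for illustration) shows that integrating dx/ds = θ_k f on unit-length subintervals reproduces the original terminal state x(T) = x₀ + Σ θ_k u_k, while the switching points stay fixed at s = 1, 2, …

```python
def integrate_scaled(theta, heights, x0, steps_per_interval=200):
    """Integrate the scaled dynamics dx/ds = theta[k] * f(x, sigma[k]) on [k, k+1).

    Here f(x, u) = u (a toy single integrator), theta[k] is the duration
    of subinterval k and heights[k] the control value on it.  In the
    scaled horizon every subinterval has unit length, so the switching
    points s = 1, 2, ... are fixed regardless of theta.
    """
    x = x0
    for dur, u in zip(theta, heights):
        ds = 1.0 / steps_per_interval
        for _ in range(steps_per_interval):
            x += dur * u * ds          # Euler step; exact for constant RHS
    return x

# Terminal state equals x0 + sum(theta_k * u_k).
x_T = integrate_scaled(theta=[0.5, 1.5], heights=[2.0, -1.0], x0=0.0)
print(round(x_T, 6))    # -> -0.5  (= 0.5*2.0 + 1.5*(-1.0))
```

Since θ enters the right-hand side smoothly, gradients of the cost with respect to the durations are available, which is the point of the transformation.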
Applying the time scaling transformation (27), the constraints appearing in (23) and (24) are transformed accordingly, and similarly for the cost function. Then, Problem (P₃) can be transformed into Problem (P₄) as follows.
The proof is completed.

This implies that equation (35) is valid.
Equations (36) and (37) can be proved similarly. This completes the proof.
The next result shows that a solution of Problem (P 4 ) can be used to generate a solution of Problem (P 3 ).
The proof of the reverse implication is similar.
From Theorems 4.4 and 4.5, the equivalence between Problem (P₃) and Problem (P₄) is readily seen. Corollary 1. Problem (P₃) and Problem (P₄) are equivalent. Remark 1. Given an optimal control problem (P₂), we first apply piecewise constant basis functions to obtain an approximate problem (P₃) with variable switching times. The next step is to use the time scaling transformation t = µ(s|θ), where µ(·|θ) is the time scaling function, to transform Problem (P₃) into an equivalent problem (P₄) with fixed switching times. Then, Problem (P₄) is solved by any existing gradient-based computational method to obtain a solution (θ*, δ*).
In fact, if (θ*, δ*) is a solution of Problem (P₄), then the optimal switching times for Problem (P₃) are recovered through the time scaling function, τ*_k = µ(k|θ*), k = 1, . . . , p, and the corresponding suboptimal control for Problem (P₂) is the piecewise constant control with heights δ* on the resulting subintervals.

5. An exact penalty method for Problem (P₄). Note that Problem (P₃) can be solved indirectly through solving Problem (P₄), which is an optimization problem with fixed switching times. The solution of Problem (P₄) is used to generate a solution of Problem (P₃), which in turn is used to construct a corresponding suboptimal solution of Problem (P₂). However, Problem (P₄) involves continuous state inequality constraints, which are required to be satisfied at every point in the time horizon, i.e. at infinitely many time points. Thus, frequently used optimization algorithms are not directly applicable. In this section, we introduce an exact penalty function method proposed in [24] for solving Problem (P₄). The main idea is to approximate Problem (P₄) by an unconstrained optimization problem, obtained by appending the constraint violation to the cost function to form an augmented cost function. This approximate unconstrained optimization problem is solvable by frequently used optimization algorithms, such as the sequential quadratic programming method [10]. Let V be the set of all θ ∈ Rᵖ such that 0 ≤ θ_k ≤ T, k = 1, . . . , p. Clearly, Θ ⊂ V. By adopting the idea introduced in [24], we define the following exact penalty function, consisting of the cost of Problem (P₄) augmented by the terms

ε^{−α} ∆(θ, δ, ε) + ρ ε^β, (38)

where ε > 0 is a new decision variable, α > 0 and β > 0 are fixed constants, ρ > 0 is the penalty parameter, and the constraint violation ∆(θ, δ, ε) is constructed from the terminal state and continuous inequality constraints of Problem (P₄). The term ε^{−α} ∆(θ, δ, ε) in (38) penalizes constraint violations, while the term ρ ε^β penalizes large values of ε. The smaller the value of ε, the larger the coefficient ε^{−α}, and hence the more severely constraint violations are penalized.
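The structure of this penalty can be sketched on a toy problem. The sketch below is an illustration in the style of (38), not the paper's exact formulation: the symbols g0, Delta and the chosen values of α, β, ρ are ours, and Delta is a simple squared violation.

```python
def penalized_cost(g0, Delta, rho, alpha=2.0, beta=3.0):
    """Exact penalty function in the style of (38):

        G_rho(z, eps) = g0(z) + eps**(-alpha) * Delta(z, eps) + rho * eps**beta,

    where z collects the decision variables (theta, delta), eps > 0 is an
    extra decision variable, Delta measures constraint violation, and rho
    is the penalty parameter.  (alpha, beta and the exact form of Delta
    follow the cited reference; the names here are illustrative.)
    """
    def G(z, eps):
        return g0(z) + eps ** (-alpha) * Delta(z, eps) + rho * eps ** beta
    return G

# Toy instance: minimize z^2 subject to z >= 1, with Delta = max(0, 1 - z)^2.
g0 = lambda z: z ** 2
Delta = lambda z, eps: max(0.0, 1.0 - z) ** 2
G = penalized_cost(g0, Delta, rho=100.0)

# Small eps punishes infeasibility heavily; feasible points pay only rho*eps^beta.
print(G(1.0, 0.1) < G(0.0, 0.1))   # -> True: feasible z = 1 beats infeasible z = 0
```

Driving ε toward zero forces ∆ toward zero, while ρε^β prevents ε from being used to hide infeasibility, which is the mechanism behind the exactness property.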
Therefore, minimizing the penalty function for large values of ρ will lead to feasible points of Problem (P 4 ). Now, we consider the following penalty problem.
From the results above, we can conclude that Problem (P 5 ) is a good approximation of Problem (P 4 ) when ρ is large. That is, for a sufficiently large ρ, a local optimal solution of Problem (P 5 ) is a local optimal solution of Problem (P 4 ). Such a solution can then be used to construct a corresponding local solution of Problem (P 3 ). Since Problem (P 5 ) only involves bound constraints, it is much easier to solve.
Remark 2. Problem (P₅) is a standard optimal control problem with fixed switching times and can be readily solved by gradient-based optimization methods, such as the sequential quadratic programming method, or by existing optimal control software packages, such as VISUAL MISER [8]. However, many of these approaches are designed to find local rather than global solutions, and the quality of a local solution depends on the initial guess used to start the optimization procedure. Thus, to obtain global or near-global solutions, these approaches should be run from several different initial guesses. Hence, future work on solving Problem (P₅) may consider combining our method with a global optimization technique, such as the filled function method [21] or the simulated annealing approach [23].
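The multiple-initial-guess strategy of Remark 2 can be sketched with an off-the-shelf SQP-type solver. The sketch below uses SciPy's SLSQP on a toy multimodal objective; the function names and the objective are ours, chosen only to illustrate the multi-start idea.

```python
import numpy as np
from scipy.optimize import minimize

def multistart(obj, bounds, n_starts=20, seed=0):
    """Run a local gradient-based solver (SLSQP) from several random initial
    guesses and keep the best local solution found -- a simple hedge against
    the local nature of the solvers discussed in Remark 2.
    """
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)                  # random initial guess
        res = minimize(obj, x0, method='SLSQP', bounds=bounds)
        if res.success and (best is None or res.fun < best.fun):
            best = res
    return best

# Toy multimodal objective on [-3, 3]^2: many local minima.
obj = lambda x: np.sin(3 * x[0]) * np.cos(3 * x[1]) + 0.1 * (x[0] ** 2 + x[1] ** 2)
best = multistart(obj, bounds=[(-3.0, 3.0)] * 2)
print(best.x, best.fun)
```

A global technique such as the filled function method would replace the random restarts with a systematic escape from each local minimum, but the restart loop above is often an adequate practical baseline.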
6. Numerical simulations. In this section, we report computational experience with a nontrivial test example. The computational approach described in the previous sections has been implemented in the optimal control software VISUAL MISER [8].
The problem is subject to control constraints, and the control functions u₁(t) and u₂(t) are required to be continuous; we therefore introduce new states for the original dynamic system, together with the corresponding initial conditions, as described in Section 3. The problem is solved using the optimal control software VISUAL MISER [8], with the planning time horizon [0, 1] divided into p = 10 variable subintervals, yielding a cost of 5.47E-03. For comparison, we also divide the planning time horizon into q = 10 equally spaced subintervals, for which the optimal cost obtained is 5.61E-03. The improvement is expected, as the new approach provides the added flexibility of optimizing the control switching times. Figures 1-6 show the states and controls of the optimal control problem. It is clear from these figures that all the constraints are satisfied.

7. Conclusion. In the literature on optimal control, the control is usually assumed to be essentially bounded measurable. However, in many practical problems, abrupt changes in the control signal are not allowed, because they induce an infinite jerk at the switching points and hence excite a large bandwidth of vibration modes, causing structural vibration. To overcome this problem, as in [20], a class of optimal control problems subject to continuous state and control inequality constraints is considered in this paper, where the control is required to be almost smooth. This class of optimal control problems is an important addition to the optimal control literature. The paper then develops a much improved computational method for solving this class of problems. The detailed contributions can be summarized as follows: 1. A class of optimal control problems subject to continuous state and control inequality constraints is considered, where the control is required to be almost smooth.
2. A new differential equation is introduced for each almost smooth control variable, whereby the control variable is treated as a new state and a new control is introduced. This new control can be taken as an essentially bounded measurable function, and hence it can be approximated by a piecewise constant function. The initial conditions of these new differential equations are also decision variables to be determined optimally.
3. The continuous state and control inequality constraints are handled by the exact penalty function method: they are appended to the cost function to form a new cost function, and hence a new class of optimal control problems without continuous state and control inequality constraints is obtained. This approach is better than the constraint transcription method used in [20], as a local optimal solution obtained by the exact penalty function approach for the transformed problem is a local solution of the original constrained optimal control problem; this property is not shared by the constraint transcription approach used in [20]. 4. For these new optimal control problems, the control parameterization approach is used to approximate the new control functions, supplemented by the time scaling transformation. Thus, a sequence of approximate finite dimensional dynamic optimization problems is obtained. The gradient formulas of the new cost functions are derived, so that a gradient-based optimization technique can be used to solve each of these finite dimensional approximate dynamic optimization problems. The computational method is supported by convergence analysis. We remark that the time scaling transformation is not used in [8], and consequently the number of partition points required there by the control parametrization method is very large. In this paper, thanks to the time scaling transformation, far fewer partition points are needed, and yet the accuracy is higher. This can be clearly seen by comparing the numerical results obtained in [8] with those in Section 6 of this paper. 5. A nontrivial numerical example involving a container crane is solved using the approach developed in this paper, and the results obtained are convincing.