A GRADIENT ALGORITHM FOR OPTIMAL CONTROL PROBLEMS WITH MODEL-REALITY DIFFERENCES

In this paper, we propose a computational approach for solving a model-based optimal control problem, with the aim of obtaining the optimal solution of a nonlinear optimal control problem. Because the two problems have different structures, solving the model-based optimal control problem alone does not give the optimal solution of the nonlinear optimal control problem. In our approach, adjusted parameters are added to the model so that the differences between the real plant and the model can be measured. On this basis, an expanded optimal control problem is introduced, in which system optimization and parameter estimation are integrated interactively. The Hamiltonian function, which adjoins the cost function, the state equation and the additional constraints, is defined. By applying the calculus of variations, a set of necessary optimality conditions is derived; these conditions define the modified model-based optimal control problem, the parameter estimation problem and the computation of modifiers. To obtain the optimal solution, the modified model-based optimal control problem is converted into a nonlinear programming problem through the canonical formulation, for which a gradient formula can be derived. During the iterative procedure, the control sequences are generated as the admissible control law of the model used, together with the corresponding state sequences. Consequently, the optimal solution is updated repeatedly by the adjusted parameters. At the end of the iterations, the converged solution approaches the correct optimal solution of the original optimal control problem in spite of model-reality differences. For illustration, two examples are studied, and the results show the efficiency of the proposed approach.

2010 Mathematics Subject Classification. Primary: 93C05, 93C10; Secondary: 93B40.

1. Introduction. The linear quadratic regulator (LQR) problem is a standard optimal control problem, in which the cost functional is quadratic and the state dynamics are linear. Solving this optimal control problem is simple, and the corresponding optimal solution is guaranteed [2], [5], [15]. Beyond this, applications of the LQR problem have been widely explored; see, for example, [13], [14], [10], [18], [9]. However, nonlinear state dynamics are always linearized before a control policy is determined to minimize the cost function. From this point of view, adjustable parameters are introduced into the LQR model such that the differences between the real plant and the model used can be measured repeatedly. At the end of the iterations, the iterative solution can converge to the correct optimal solution of the original optimal control problem, in spite of model-reality differences [11], [12], [1].
Usually, the sweep method is applied to construct the feedback control law when solving the LQR model-based optimal control problem [2], [5]; this is the approach taken in [11], [12], [1]. In this paper, we propose an efficient computational approach to construct the control sequences for the optimal control problem with model-reality differences. On this basis, the model-based optimal control problem, augmented with the adjusted parameters, is solved iteratively. Our aim is to obtain the true optimal solution of the original optimal control problem by solving the model-based optimal control problem repeatedly. To do so, the initial control sequences are defined from the LQR optimal control model. Then, the modified model-based optimal control problem is formulated as a nonlinear programming problem [15], [6], [3]. During each iteration step, the differences between the real plant and the model used are measured by the adjusted parameters. The values of the control sequences are then updated through the gradient algorithm, so that mathematical optimization techniques are applicable. Within a given tolerance, the iterative algorithm gives the correct optimal solution of the original optimal control problem despite model-reality differences. The gradient algorithm thus makes the solution of optimal control problems with model-reality differences more flexible.
The rest of the paper is organized as follows. In Section 2, a general class of optimal control problems is described. In Section 3, a simplified model-based optimal control problem is discussed, where the adjusted parameters are added into the model used. It is pointed out that the interaction between system optimization and parameter estimation gives a modified optimal control problem, which can be solved by the gradient algorithm. Consequently, an efficient iterative algorithm results. In Section 4, two illustrative examples are presented and the efficiency of the proposed approach is shown. Finally, some concluding remarks are made.
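2. Problem Description. Consider a general class of optimal control problems given below:

$$\min_{u(k)} J_0(u) = \phi(x(N), N) + \sum_{k=0}^{N-1} L(x(k), u(k), k)$$
$$\text{subject to} \quad x(k+1) = f(x(k), u(k), k), \quad x(0) = x_0, \qquad (1)$$

where $u(k) \in \mathbb{R}^m$, $k = 0, 1, \ldots, N-1$, and $x(k) \in \mathbb{R}^n$, $k = 0, 1, \ldots, N$, are, respectively, the control and state sequences, whereas $f : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R} \to \mathbb{R}^n$ represents the real plant, $\phi : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ is the terminal cost and $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R} \to \mathbb{R}$ is the cost under summation. Here, $J_0$ is the scalar cost function and the initial state $x_0$ is a known vector. It is assumed that all functions in (1) are continuously differentiable with respect to their respective arguments.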
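As a minimal sketch of how the cost of Problem (P) can be evaluated for a given control sequence, the following Python code rolls the plant forward from the initial state and accumulates the terminal cost and the cost under summation. It assumes the cost is the terminal cost plus a sum of stage costs, as described above. The function names `rollout` and `cost_J0`, and the representation of vectors as Python lists, are illustrative choices and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def rollout(f: Callable[[Vector, Vector, int], Vector],
            x0: Sequence[float],
            u_seq: Sequence[Vector]) -> List[Vector]:
    """Return the states x(0), ..., x(N) generated by x(k+1) = f(x(k), u(k), k)."""
    xs = [list(x0)]
    for k, u in enumerate(u_seq):
        xs.append(list(f(xs[-1], u, k)))
    return xs


def cost_J0(f, L, phi, x0, u_seq) -> float:
    """Terminal cost phi(x(N), N) plus the sum of L(x(k), u(k), k) for k = 0..N-1."""
    xs = rollout(f, x0, u_seq)
    N = len(u_seq)
    running = sum(L(xs[k], u_seq[k], k) for k in range(N))
    return phi(xs[N], N) + running
```

Any plant `f`, stage cost `L` and terminal cost `phi` with these call signatures can be passed in, so the same routine serves for a nonlinear plant and for a linear model.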
This problem is regarded as the real optimal control problem, and is referred to as Problem (P). Notice that this problem is complex because its structure is nonlinear, and solving this kind of problem is computationally demanding. In view of this, we propose to solve a simplified model-based optimal control problem iteratively in order to obtain the correct optimal solution of Problem (P). Let this simplified model-based optimal control problem, which is referred to as Problem (M), be given below:

$$\min_{u(k)} J_1(u) = \tfrac{1}{2} x(N)^{\mathrm T} S(N) x(N) + \gamma(N) + \sum_{k=0}^{N-1} \left[ \tfrac{1}{2} \left( x(k)^{\mathrm T} Q x(k) + u(k)^{\mathrm T} R u(k) \right) + \gamma(k) \right]$$
$$\text{subject to} \quad x(k+1) = A x(k) + B u(k) + \alpha(k), \quad x(0) = x_0, \qquad (2)$$

where $\alpha(k) \in \mathbb{R}^n$, $k = 0, 1, \ldots, N-1$, and $\gamma(k) \in \mathbb{R}$, $k = 0, 1, \ldots, N$, are the adjustable parameters, while $A$ is an $n \times n$ state transition matrix and $B$ is an $n \times m$ control coefficient matrix. $J_1$ is the model cost function, $S(N)$ and $Q$ are $n \times n$ positive semi-definite matrices, and $R$ is an $m \times m$ positive definite matrix. Notice that solving Problem (M) iteratively can give the true optimal solution of Problem (P). This is possible because the adjustable parameters introduced into the model measure the differences between the real plant and the model used at every iteration. In this way, we aim at approximating the correct optimal solution of Problem (P) by solving Problem (M), in spite of model-reality differences.
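To make the role of the adjustable parameters concrete, the sketch below simulates the model dynamics with the additive shift $\alpha(k)$ and evaluates a model cost with the additive offsets $\gamma(k)$. It assumes the usual linear quadratic form with a factor of one half on the quadratic terms; that factor, the helper names (`mat_vec`, `quad_form`, `model_rollout`, `cost_J1`) and the list-based matrix storage are assumptions made for illustration.

```python
def mat_vec(M, v):
    """Return M v for a matrix M stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def quad_form(v, M):
    """Return v^T M v."""
    return sum(a * b for a, b in zip(v, mat_vec(M, v)))


def model_rollout(A, B, alpha, x0, u_seq):
    """States of the model x(k+1) = A x(k) + B u(k) + alpha(k), k = 0..N-1."""
    xs = [list(x0)]
    for k, u in enumerate(u_seq):
        Ax, Bu = mat_vec(A, xs[-1]), mat_vec(B, u)
        xs.append([a + b + c for a, b, c in zip(Ax, Bu, alpha[k])])
    return xs


def cost_J1(A, B, Q, R, S_N, alpha, gamma, x0, u_seq):
    """Assumed model cost: quadratic terminal and stage costs plus gamma(k)."""
    xs = model_rollout(A, B, alpha, x0, u_seq)
    N = len(u_seq)
    J = 0.5 * quad_form(xs[N], S_N) + gamma[N]
    for k in range(N):
        J += 0.5 * (quad_form(xs[k], Q) + quad_form(u_seq[k], R)) + gamma[k]
    return J
```

With `alpha` and `gamma` set to zero the routine reduces to a plain LQR cost evaluation, which is the starting point of the iterations.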
3. Gradient Algorithm with Model-Reality Differences. Now, let us introduce an expanded optimal control problem, which is referred to as Problem (E), given below.
In the constraints (3) of Problem (E), $v(k) \in \mathbb{R}^m$, $k = 0, 1, \ldots, N-1$, and $z(k) \in \mathbb{R}^n$, $k = 0, 1, \ldots, N$, are introduced to separate the control sequence and the state sequence in the optimization problem from the respective signals in the parameter estimation problem, and $\|\cdot\|$ denotes the usual Euclidean norm. The terms $\tfrac{1}{2} r_1 \|u(k) - v(k)\|^2$ and $\tfrac{1}{2} r_2 \|x(k) - z(k)\|^2$, with $r_1 \in \mathbb{R}$ and $r_2 \in \mathbb{R}$, are introduced to improve convexity and to facilitate convergence of the resulting iterative algorithm. It is important to note that the algorithm is designed such that the constraints $v(k) = u(k)$ and $z(k) = x(k)$ are satisfied upon termination of the iterations, assuming that convergence is achieved. The state constraint $z(k)$ and the control constraint $v(k)$ are used for the computation of the parameter estimation and matching schemes, while the corresponding state constraint $x(k)$ and control constraint $u(k)$ are reserved for optimizing the model-based optimal control problem. In this way, system optimization and parameter estimation are mutually interactive.
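One way to picture the parameter estimation side of this interaction is the sketch below. It assumes that the adjusted parameters are chosen so that the model output matches the real plant and the model cost matches the real cost at the current $z(k)$ and $v(k)$, that is, $\alpha(k) = f(z(k), v(k), k) - A z(k) - B v(k)$, with $\gamma(k)$ equal to the gap between the real and the quadratic stage costs. These matching relations, the factor of one half, and the function name `estimate_parameters` are assumptions made for illustration.

```python
def mat_vec(M, v):
    """Return M v for a matrix M stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def quad_form(v, M):
    """Return v^T M v."""
    return sum(a * b for a, b in zip(v, mat_vec(M, v)))


def estimate_parameters(f, L, phi, A, B, Q, R, S_N, z, v):
    """Return (alpha, gamma) measuring plant-model and cost-model differences.

    z holds z(0..N) and v holds v(0..N-1); alpha has N entries, gamma has N+1.
    """
    N = len(v)
    alpha, gamma = [], []
    for k in range(N):
        model_next = [a + b for a, b in zip(mat_vec(A, z[k]), mat_vec(B, v[k]))]
        plant_next = f(z[k], v[k], k)
        alpha.append([p - m for p, m in zip(plant_next, model_next)])
        model_stage = 0.5 * (quad_form(z[k], Q) + quad_form(v[k], R))
        gamma.append(L(z[k], v[k], k) - model_stage)
    # Terminal offset: real terminal cost minus quadratic terminal cost.
    gamma.append(phi(z[N], N) - 0.5 * quad_form(z[N], S_N))
    return alpha, gamma
```

When the plant is exactly the linear model and the real cost is exactly quadratic, both outputs are zero, which matches the observation that vanishing adjusted parameters indicate agreement between the model and reality.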
Applying the calculus of variations [2], [5], the following necessary optimality conditions are obtained.

3.2. Modified optimal control problem. The modified model-based optimal control problem, which is referred to as Problem (MM), is given by (10), in which the cost functional $J_3(u)$ is minimized subject to the model state equation with the specified $\alpha(k)$, $\gamma(k)$, $\Gamma$, $\lambda(k)$, $\beta(k)$, $v(k)$ and $z(k)$, where the boundary conditions $x(0)$ and $p(N)$ are given with the specified modifier $\Gamma$.
Once the state sequences corresponding to the control sequences are determined, where the control sequences are defined through the gradient formulation, Problem (MM) can be converted into the nonlinear programming problem given below [15], [6], [7], [8], [17]. Let this problem be Problem (MM').

3.3. Admissible control law. Define

$$V = \{ v = (v_1, \ldots, v_m)^{\mathrm T} \in \mathbb{R}^m : a_i \le v_i \le b_i, \ i = 1, \ldots, m \},$$

where $a_i$, $i = 1, \ldots, m$, and $b_i$, $i = 1, \ldots, m$, are given real numbers. Notice that $V$ is a compact and convex subset of $\mathbb{R}^m$. Let $u$ denote a control sequence $\{u(k) : k = 0, 1, \ldots, N-1\}$ in $V$. Then, $u$ is called an admissible control. Let $U$ be the class of all such admissible controls.
For each $u \in U$, let $x(k|u)$, $k = 0, 1, \ldots, N$, be a sequence in $\mathbb{R}^n$ such that the difference equations, with the initial condition as mentioned in Problem (MM), are satisfied. This discrete function is called the solution of the system in Problem (MM) corresponding to $u \in U$.
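As a small illustration of admissibility, the sketch below checks whether every $u(k)$ of a control sequence lies in $V$ and, if not, clips it componentwise onto $V$. It assumes that $V$ is the box defined by the bounds $a_i$ and $b_i$. The clipping step is an illustrative device for keeping an updated control admissible; it is not stated in the paper, and the function names are hypothetical.

```python
def is_admissible(u_seq, a, b):
    """True if a_i <= u_i(k) <= b_i for every time step k and component i."""
    return all(a[i] <= u[i] <= b[i] for u in u_seq for i in range(len(a)))


def project_onto_V(u_seq, a, b):
    """Clip each component of each u(k) to the interval [a_i, b_i]."""
    return [[min(max(u[i], a[i]), b[i]) for i in range(len(a))] for u in u_seq]
```

Because $V$ is convex, the componentwise clip is the Euclidean projection onto $V$, so it changes a control as little as possible while restoring admissibility.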
3.4. Gradient formula. Define the state sequences so that the system of difference equations in Problem (MM) is expressed in terms of them, and consider the variation of the state (16). For the cost functional, the corresponding variation is considered. Using the Hamiltonian function defined by (4) and the corresponding necessary conditions (6a)-(6b), we obtain (17a) and (17b). Then, it follows from (17a), together with the boundary conditions (6d) and (17b), that the variation of the cost functional can be evaluated. Because $\hat{u}$ is arbitrary, we obtain the gradient formula (20). We present this result in the following as a theorem [15], [6], [7], [8], [17].

Theorem 3.1. The gradient of the cost functional $J_3(u)$ is given by (20).
3.5. Gradient algorithm. The computation of the gradient of the cost functional $J_3(u)$ is stated in the following algorithm.
Step 1 Solve the system of the state difference equations (6c) forward in time from $k = 0$ to $k = N - 1$. Let $x(k|u)$ be the solution obtained.
Step 2 Solve the system of the co-state difference equations (6b) backward in time from $k = N$ to $k = 1$. Let $p(k|u)$ be the solution obtained.
Step 3 Calculate the value of the cost functional $J_3(u)$ from (10).
Step 4 Compute the gradient of $J_3(u)$ according to (20).
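The steps above can be sketched in Python as follows. The sketch assumes a specific form of Problem (MM) that is not recoverable from this text: a quadratic terminal cost plus $\Gamma^{\mathrm T} x(N)$, quadratic stage costs with the convexification terms in $r_1$ and $r_2$, and the linear modifier terms $-\lambda(k)^{\mathrm T} u(k)$ and $-\beta(k)^{\mathrm T} x(k)$, with symmetric $Q$, $R$ and $S(N)$. Under that assumption the costate recursion and the gradient are the ones written in the comments. The dictionary `prob` and the function name `cost_and_gradient` are hypothetical.

```python
def mat_vec(M, v):
    """Return M v for a matrix M stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_T_vec(M, v):
    """Return M^T v without forming the transpose explicitly."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cost_and_gradient(prob, u):
    """Steps 1-4 for an ASSUMED form of Problem (MM); returns (J3, gradient)."""
    A, B, Q, R, S_N = (prob[key] for key in ("A", "B", "Q", "R", "S_N"))
    alpha, gamma, Gamma = prob["alpha"], prob["gamma"], prob["Gamma"]
    lam, beta, v, z = prob["lam"], prob["beta"], prob["v"], prob["z"]
    r1, r2, N = prob["r1"], prob["r2"], len(u)

    # Step 1: state equation forward in time, x(k+1) = A x(k) + B u(k) + alpha(k).
    x = [list(prob["x0"])]
    for k in range(N):
        Ax, Bu = mat_vec(A, x[k]), mat_vec(B, u[k])
        x.append([s + t + w for s, t, w in zip(Ax, Bu, alpha[k])])

    # Step 2: costate backward in time from k = N down to k = 1 (assumed form):
    # p(N) = S(N) x(N) + Gamma,
    # p(k) = Q x(k) + r2 (x(k) - z(k)) - beta(k) + A^T p(k+1).
    p = [None] * (N + 1)
    p[N] = [s + g for s, g in zip(mat_vec(S_N, x[N]), Gamma)]
    for k in range(N - 1, 0, -1):
        Qx, ATp = mat_vec(Q, x[k]), mat_T_vec(A, p[k + 1])
        p[k] = [q + r2 * (xi - zi) - bi + ai
                for q, xi, zi, bi, ai in zip(Qx, x[k], z[k], beta[k], ATp)]

    # Step 3: value of the (assumed) cost functional J3(u).
    J = 0.5 * dot(x[N], mat_vec(S_N, x[N])) + dot(Gamma, x[N]) + gamma[N]
    for k in range(N):
        du = [ui - vi for ui, vi in zip(u[k], v[k])]
        dx = [xi - zi for xi, zi in zip(x[k], z[k])]
        J += 0.5 * (dot(x[k], mat_vec(Q, x[k])) + dot(u[k], mat_vec(R, u[k])))
        J += gamma[k] + 0.5 * r1 * dot(du, du) + 0.5 * r2 * dot(dx, dx)
        J -= dot(lam[k], u[k]) + dot(beta[k], x[k])

    # Step 4: gradient, R u(k) + r1 (u(k) - v(k)) - lambda(k) + B^T p(k+1).
    grad = []
    for k in range(N):
        Ru, BTp = mat_vec(R, u[k]), mat_T_vec(B, p[k + 1])
        grad.append([ru + r1 * (ui - vi) - li + bp
                     for ru, ui, vi, li, bp in zip(Ru, u[k], v[k], lam[k], BTp)])
    return J, grad
```

A convenient sanity check for any such implementation is to compare the returned gradient with central finite differences of the cost; for a cost that is quadratic in $u$, the two should agree to rounding error.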

Remark:
The gradient algorithm is used to update the control sequence: it solves the system of difference equations and calculates the value of $J_3(u)$ and the corresponding gradient of $J_3(u)$ in Problem (MM').
3.6. Iterative algorithm. From the discussion above, we summarize the result as an iterative algorithm, and the computation procedure is given below.

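The iterative computation procedure
Data $A, B, Q, R, S(N), x_0, N, r_1, r_2, k_v, k_z, k_p, f, L, \phi$. Note that $A$ and $B$ may be chosen based on the linearization of $f$ at $x_0$ or on the linear terms of $f$.
Step 0 Compute a nominal solution. With the adjusted parameters and the modifiers set to zero, solve Problem (M) by using the gradient algorithm. Set $i = 0$.
Step 1 Compute the adjusted parameters $\alpha(k)^i$ and $\gamma(k)^i$. This is called the parameter estimation step.
Step 2 Compute the modifiers $\Gamma^i$, $\lambda(k)^i$ and $\beta(k)^i$. Note that this step requires taking the derivatives of $f$ and $L$ with respect to $v(k)^i$ and $z(k)^i$.
Step 3 Solve Problem (MM') by using the result that is presented in Theorem 3.1 and the gradient algorithm. This is called the system optimization step.
3.1 Use (20) to obtain the new control $u(k)^i$, $k = 0, 1, \ldots, N-1$.
3.2 Use (6c) to obtain the new state $x(k)^i$, $k = 0, 1, \ldots, N$.
3.3 Use (6b) to obtain the new costate $p(k)^i$, $k = 0, 1, \ldots, N$.
Step 4 Test the convergence and update the optimal solution of Problem (P). In order to provide a mechanism for regulating convergence, a simple relaxation method is employed:

$$v(k)^{i+1} = v(k)^i + k_v \left( u(k)^i - v(k)^i \right), \qquad z(k)^{i+1} = z(k)^i + k_z \left( x(k)^i - z(k)^i \right),$$

with the costate relaxed in the same way by using $k_p$, where $k_v, k_z, k_p \in (0, 1]$ are scalar gains. If $v(k)^{i+1} = v(k)^i$, $k = 0, 1, \ldots, N-1$, and $z(k)^{i+1} = z(k)^i$, $k = 0, 1, \ldots, N$, within a given tolerance, stop; else set $i = i + 1$, and repeat the procedure starting with Step 1.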
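The relaxation used in Step 4 can be sketched as below. The sketch assumes the common form in which the stored sequence moves a fraction `gain` of the way towards the newly computed sequence, so that a gain of one accepts the new sequence outright. The function name `relax` and the list-of-lists layout (one inner list per time step) are illustrative.

```python
def relax(current, candidate, gain):
    """Return current + gain * (candidate - current), applied at every time step.

    current, candidate: sequences of vectors with identical shapes.
    gain: scalar relaxation gain in the interval (0, 1].
    """
    if not 0.0 < gain <= 1.0:
        raise ValueError("relaxation gain must lie in (0, 1]")
    return [[c + gain * (n - c) for c, n in zip(ck, nk)]
            for ck, nk in zip(current, candidate)]
```

For example, `relax(v, u, k_v)` would produce the next $v(k)$ sequence from the current $v(k)$ and the control $u(k)$ obtained in Step 3, and `relax(z, x, k_z)` would do the same for the state. Smaller gains damp the update, which can help when a full step makes the iterations oscillate.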

Remarks:
(a) The set of control sequences used for solving Problem (M) in Step 0 and for solving Problem (MM') in Step 3 is determined from (20) by using the gradient algorithm.
(b) The parameters $\alpha(k)^i$, $\gamma(k)^i$, $\Gamma^i$, $\lambda(k)^i$ and $\beta(k)^i$ are zero in Step 0. Their calculated values, namely $\alpha(k)^i$ and $\gamma(k)^i$ in Step 1, and $\Gamma^i$, $\lambda(k)^i$ and $\beta(k)^i$ in Step 2, change from iteration to iteration.
(c) Problem (P) need not be linear or have a quadratic cost function.
(d) The conditions $v(k)^{i+1} = v(k)^i$ and $z(k)^{i+1} = z(k)^i$ are required to be satisfied for the converged optimal control sequence and the converged state sequence, respectively. The averaged 2-norms of $v(k)^{i+1} - v(k)^i$ and $z(k)^{i+1} - z(k)^i$ are computed and compared with a given tolerance to verify the convergence of $v(k)$ and $z(k)$.
The relaxation scalars $(k_v, k_z, k_p)$ are step-sizes that regulate the convergence mechanism. They are normally chosen from the interval $(0, 1]$, but this choice may not result in an optimal number of iterations. It is important to note that the optimal choice of $k_v, k_z, k_p \in (0, 1]$ is problem dependent, which requires running the proposed algorithm several times from Step 1 to Step 4. These values are initially set to $k_v = k_z = k_p = 1$ for the first run of the algorithm from Step 1 to Step 4, and then the algorithm is run with different values ranging from 0.1 to 0.9. The value that provides the optimal number of iterations can then be determined.
The parameters $r_1$ and $r_2$ enhance convexity, which improves the convergence of the algorithm.
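The convergence test described in these remarks can be sketched as follows. The exact definition of the averaged 2-norm is not recoverable from this text, so the sketch assumes the square root of the squared Euclidean differences averaged over the time steps. The function names and the default tolerance of $10^{-5}$ (the value quoted for the examples) are illustrative choices.

```python
import math


def averaged_norm(new_seq, old_seq):
    """Assumed averaged 2-norm of the change between two sequences of vectors."""
    total = sum((a - b) ** 2
                for nk, ok in zip(new_seq, old_seq)
                for a, b in zip(nk, ok))
    return math.sqrt(total / len(new_seq))


def has_converged(v_new, v_old, z_new, z_old, tol=1e-5):
    """True when both v(k) and z(k) changed by no more than tol on average."""
    return averaged_norm(v_new, v_old) <= tol and averaged_norm(z_new, z_old) <= tol
```

Running the whole procedure for each candidate gain in 0.1, 0.2, ..., 1.0 and recording the iteration at which `has_converged` first returns true is one straightforward way to carry out the tuning described in the remarks.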

4. Illustrative Examples.
Two illustrative examples are presented here: a continuous stirred-tank reactor problem [4] and an inverted pendulum balancing problem [16].
Example 1: Consider a continuous stirred-tank reactor problem. The real plant is defined for $k = 0, 1, \ldots, 77$, with a given initial condition. Our aim is to determine the control sequences $u(k)$, $k = 0, 1, \ldots, N-1$, so that the cost function $J_0(u)$ is minimized subject to the state dynamics. This problem is referred to as Problem (P). The simplified model-based optimal control problem, which is referred to as Problem (M), is defined with the initial condition $x(0) = [0.05 \ \ 0]^{\mathrm T}$ and the adjusted parameters $\gamma(k)$ and $\alpha(k)$. The tolerance is set to $10^{-5}$. The simulation result shown in Table 1 demonstrates the efficiency of the proposed algorithm: the initial cost is reduced by 99.3%. The trajectories of control and state are shown in Figures 1 and 2, respectively, while the adjusted parameters $\alpha(k)$ and $\gamma(k)$ are shown in Figures 3 and 4, respectively. Since the values of the adjusted parameters are approximately zero within the given tolerance, it appears that the correct optimal solution of the original optimal control problem is obtained in spite of model-reality differences.
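Example 2: Consider an inverted pendulum balancing problem [16]. The discretized state dynamics of Problem (P) are defined for $k = 0, 1, \ldots, 29$, with a given initial condition. We aim to determine a set of control sequences $u(k)$, $k = 0, 1, \ldots, N-1$, such that the cost function $J_0(u)$ is minimized subject to the state dynamics.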
The corresponding Problem (M) is defined with the initial condition $x(0) = [1.0 \ \ 0.5]^{\mathrm T}$ and the adjusted parameters $\gamma(k)$ and $\alpha(k)$. The simulation result is shown in Table 2, with an efficiency of 79% for the proposed algorithm. Figures 5 and 6 show the trajectories of control and state, respectively, while Figures 7 and 8 show the adjusted parameters $\alpha(k)$ and $\gamma(k)$, respectively. Since the values of the adjusted parameters tend to zero at the end of the iterations, this indicates that the true optimal solution of the original optimal control problem is obtained despite model-reality differences.

5. Concluding Remarks. A general class of discrete-time optimal control problems, in which model-reality differences are taken into account, was discussed in this paper. Because of the complexity of the original optimal control problem, a simplified model-based optimal control problem was proposed and solved iteratively so that the true optimal solution of the original optimal control problem could be obtained. By introducing an expanded optimal control problem, we integrated system optimization and parameter estimation interactively. In addition, a modified optimal control problem was formulated as a nonlinear programming problem and solved by using the gradient approach. The resulting iterative algorithm, which integrates the gradient algorithm with the handling of model-reality differences, was shown to be efficient in the illustrative examples discussed. Furthermore, the convergence of the adjusted parameters is guaranteed when the Lipschitz condition is satisfied. In conclusion, the proposed algorithm is recommended for solving nonlinear optimal control problems.

Table 1. Simulation result, Example 1.

Consider an inverted pendulum balancing problem. Problem (P) is described as follows. The state dynamic equations are discretized and given by