A STUDY OF STRUCTURE-EXPLOITING SQP ALGORITHMS FOR AN OPTIMAL CONTROL PROBLEM WITH COUPLED HYPERBOLIC AND ORDINARY DIFFERENTIAL EQUATION CONSTRAINTS

. In this article, structure-exploiting optimisation algorithms of the sequential quadratic programming (SQP) type are considered for optimal con- trol problems with control and state constraints. Our approach is demonstrated for a 1D mathematical model of a vehicle transporting a ﬂuid container. The model involves a fully coupled system of ordinary diﬀerential equations (ODE) and nonlinear hyperbolic ﬁrst-order partial diﬀerential equations (PDE), al- though the ideas for exploiting the particular structure may be applied to more general optimal control problems as well. The time-optimal control problem is solved numerically by a full discretisation approach. The corresponding nonlinear optimisation problem is solved by an SQP method that uses exact ﬁrst and second derivative information. The quadratic subproblems are solved using an active-set strategy. In addition, two approaches are examined that exploit the speciﬁc structure of the problem: (A) a direct method for the KKT system, and (B) an iterative method based on combining the limited-memory BFGS method with the preconditioned conjugate gradient method. Method (A) is faster for our model problem, but can be limited by the problem size. Method (B) opens the door for a potential extension of the truck-container model to three space dimensions.


1.
Introduction. Optimal control problems arise in various contexts, such as economics, biology or engineering. The dynamic behaviour of these systems is often subject to constraints and can be influenced by control variables. In this article, a mechanical system of a truck with a container as a load is considered. By means of optimal control, the control of the mechanical system for the performance of certain predefined manoeuvres is to be determined. The truck-container system is modeled mathematically as a set of fully coupled ODEs and PDEs. The dynamics of the truck are described by the Lagrangian equations, whereas the motion of the fluid is characterised by the shallow water equations.
The mathematical model has been developed in [7]. Herein, the optimal control problems were solved numerically by an adjoint-based direct shooting method. In [10] the same problem is revisited using a first-optimise-then-discretise approach. In this study we follow a first-discretise-then-optimise approach in which we exploit the specific structure of the fully discretised optimisation problem. This article is based on work completed as part of Jan-Hendrik Webert's M.Sc. thesis [15]. Direct discretisation of the optimal control problem yields a nonlinear finite-dimensional optimisation problem which is large-scale and exhibits a sparse structure. Although software (commercial and non-commercial) for the solution of optimisation problems that exhibit this structure, such as SNOPT, IPOPT or WORHP, is available, this article focuses on the implementation and evaluation of alternative algorithms. In a first step, the structure of the discretised optimal control problem of the truck-container system is analysed. Exact first and second derivatives, such as the Jacobian of the constraints and the Hessian of the Lagrangian are derived analytically and exploited in the solution process. In a second step, two solution approaches are implemented, adjusted to the specific characteristics of the problem and tested in numerical experiments. Both approaches employ the sequential quadratic programming (SQP) method, where the resulting quadratic programs (QP) are solved with an active-set strategy. The main effort in the solution process involves the solution of large and sparse systems of linear equations, the so-called KKT systems. The first method relies on the direct solution of the KKT systems with exact second derivatives and MA57, a well-known solver for large and sparse systems. For the second method, a block-elimination approach, the so-called range-space method, is chosen. Exact second derivatives are replaced by a Limited-Memory-BFGS approximation (cf. [11]) and the resulting linear systems are solved with a preconditioned conjugate gradient method. We emphasise that our structure exploitation methods may apply well to other optimal control problems with control and state constraints, though we present our approach in this study on the basis of the aforementioned truckcontainer problem.
This study is structured as follows: after a brief introduction to the mathematical model of the truck-container system, the nonlinear optimisation problem to be solved is formulated. Furthermore, the exact first and second derivatives associated with the optimisation problem are given in this section. Section 3 describes the model verification process with the optimisation software package SNOPT and the analysis of certain influential model parameters. In Section 4, both solution techniques mentioned above are outlined in detail, including implementational details. Computational results are presented in Section 5, before we close with a discussion. 2. Mathematical model with coupled hyperbolic and ordinary differential equations. The technical application that is subject to optimal control in this article is a truck-container system, performing certain driving manoeuvres. Being part of a technical system in motion, the fluid in the container develops its own dynamics, which in turn has an impact on the whole system. In combination with spring-damper elements, unwanted and potentially harmful oscillations can occur. By means of optimal control, the manoeuvering performance of the truck-container system can be improved, for example by reducing braking times or preventing fluid overflow, whilst maintaining the safe operability of the technical system.
In the following two subsections we briefly review the optimal control problem for the truck-container system described in [7].
2.1. Mathematical model. We consider a truck with a container containing a fluid, which is linked to the truck by a spring-damper connection, as depicted in Figure 1. The truck, modeled as a point mass of mass m T , can move in the horizontal direction d and is controlled by a time-dependent force u(t), representing acceleration or deceleration. The distances travelled by the truck and the fluid container in the (X, Z) reference system are denoted by d T and d W and their velocities byḋ T andḋ W , respectively. The function b(x) characterises the bottom surface Figure 1. Model of the truck and fluid tank, cf. [7].
of the fluid container for 0 ≤ x ≤ L, L being the length of the container. Both the height of the fluid column h(t, x) and the horizontal fluid velocity v(t, x) in the moving coordinate system (x, h) obey the Saint-Venant-equations (shallow water equations). The overall mass of the container is given by where h 0 (x) is the initial height of the fluid column andρ W kg/m 2 is the mass of the fluid per area. The grey area in Fig. 1 below b(x) is assumed to have the same density as the fluid. For simplicity, the container itself is assumed to have no mass.
Truck dynamics. The motion of the truck is characterised by a set of ordinary differential equations. Let T denote the final time. Applying the horizontal equilibrium of forces to the masses m T and m W and neglecting any friction terms yields the equations where F is the spring damper force and a W the average acceleration of the fluid in the container. The spring-damper force, which obeys the linear spring and damper force laws, is given by with the constant spring stiffness c N/m and the damper force constant k N s/m . Furthermore, the mean acceleration of the fluid, a W , is given by where v t (t, x) denotes the horizontal acceleration at the point (t, x).
Shallow water equations. Fluid motion is generally described by the Navier-Stokes equations. Assuming that horizontal scales are much larger than vertical scales and neglecting 3D effects in the fluid motion, these equations can be simplified significantly, thereby reducing complexity and computational effort, whilst still conveying the desired information. The fluid is supposed to be incompressible. In this article the fluid is assumed to be water. For h > 0 the resulting shallow water equations (SWE) may be written as where Ω := (0, T ) × (0, L) and g is the gravity acceleration. The right-hand side in (2.4b) models a non-straight bottom geometry b(x) [14, pp. 3-4] and the acceleration of the fluid container due to the coupling force F (t). It must be noted that the fluid-level in the coordinate system (x, h) is b(x) + h(t, x). The initial and boundary conditions for the SWE (2.4) are given by and it is assumed that where h and h are given numbers and H(x) is the wall height of the tank. For compatibility, h 0 (x) must also satisfy (2.5). As the fluid cannot penetrate the tank and it follows that v t ≡ 0 at the left and right boundary of the tank. Exploiting this information in Equation (2.4b), boundary conditions for h can be obtained at the boundary points x e = 0 and x e = L, respectively: Solving (2.4b) for v t and inserting the resulting expression into the integral in (2.3), the mean fluid acceleration simplifies to where (2.6) was exploited. The expression for a W can be inserted in (2.1b) to yield where the spring damper force F cancels out. In summary, the technical system considered in this study is characterised by a set of fully coupled ODEs and PDEs. In the following, the ODEs are transformed to a first-order system by introducing the variables η T :=ḋ T and η W :=ḋ W .
2.2. The optimal control problem. With the set of differential equations describing the truck and fluid motion available, we can pose an optimal control problem for the truck motion from a given initial state to a terminal state. The initial state is characterised by the initial position and velocity of the truck d T (0) = 0 andḋ T (0) =η 0 T and of the fluid container d W (0) =d 0 W andḋ W (0) =η 0 W . For the terminal state, the truck and the container position are given by d T (T ) =d T T and d W (T ) =d T W with the free final time T and the given scalarsη 0 The objective function J to be minimised is composed of a sum of several functions, where each summand is weighted with a non-negative constant α i : Here the summands to be minimised are: the final time T , the deviation from a given (constant) fluid-level h d , the control effort, and the deviation of the terminal truck and container velocities from given terminal velocitiesη T T andη T W . Ifη T T =η T W = 0, a braking manoeuvre is performed, where truck and container have come to a halt after the time T and given distance. To ensure that the acceleration and deceleration of the truck is limited, we impose bounds on the control Furthermore, to prevent fluid overflow the fluid-level must not reach the upper rim of the tank. Fluid overflow would imply a reduction in mass, which is not included in the mathematical model. In addition, the shallow water equation model requires the bottom of the tank to be covered everywhere with fluid. These restrictions are enforced by the constraint (2.5) bounding the fluid column height.
The domain Ω contains the free end time T , which is not known a priori. Furthermore, T appears in the bounds of the objective function integrals. For the numerical solution of the problem, it is convenient to consider a domain with a fixed time horizonΩ := (0, 1) × (0, L). This can be achieved by means of the time transformation t(τ ) = τ T with τ ∈ [0, 1], where T is considered as an additional optimisation variable. All time dependent expressions, including ODEs, PDEs, and the objective function, must to be transformed. For example, in the transformed equations there is an additional multiplicative factor 1/T associated with each of the time derivatives, and the integrations for J are performed from 0 to 1, while dt is scaled by the multiplicative factor T . For improved readability, the normalised time τ is denoted again by t in the following. for all j = 0, . . . , N − 1. Similarly, we introduce d j T , d j W , η j T , η j W , h j i , and v j i for i = 0, . . . , M and j = 0, . . . , N . We define the vector of discretised state variables , and the vector of the discretised control and final time Finally, the vector z is defined by z = [q, w] . In order to rewrite the optimal control problem (OCP) as a nonlinear program (NLP), the dynamic constraints corresponding to the ODE and PDE must be discretised. Various discretisation methods can be applied. For the discretisation of the ODE, the explicit Euler method was chosen. The shallow water equations can be discretised, for example, with the Lax-Friedrichs (LF) method or the Godunov scheme, both discretisation schemes being suitable for the approximation of hyperbolic conservation laws, such as the shallow water equations. For the discretisation of the PDE in this study, the LF method was chosen. The LF scheme exhibits some desirable properties, as it is consistent, conservative, and monotone [2]. Approximating the original conservation law with the LF method is equivalent to solving the original equation with an artificial diffusion term. This property is referred to as numerical dissipation or numerical viscosity, which reduces undesired oscillations of the numerical solution. The artificial diffusion term guarantees a unique solution, whereas hyperbolic conservation laws have, in general, no unique solution without imposing further so-called entropy conditions that follow from physical principles. Furthermore, the solution obtained in the limit of vanishing numerical viscosity also satisfies the entropy condition.
For the LF scheme it is necessary and sufficient for convergence that the Courant-Friedrichs-Lewy (CFL) condition is satisfied. Informally speaking, the space and time grids must be harmonised so that the duration between two neighbouring time gridpoints is less than the time needed by the wave to travel from one time gridpoint to an other. For our problem, it was found experimentally in [7,Ch. 3] that the CFL condition implies 30∆t ≈ ∆x/L and N ≈ 30M for the time and space grids.
The discretised differential equations may be written in the form where we introduce the increment function Φ j by Here we have abbreviated . For the Neumann boundary conditions in h we use the notation where the index e = l stands for the left boundary at x e = 0 and e = r for x e = L, respectively. Approximating the time and space derivatives with the LF-method yields the discretised differential equations i.e., a subset of 4 + 2M constraints with time index j = 0, . . . , N − 1. The term H N r equals the last two constraints in C j but with the time index N . As it was done for the vector of the nonlinear equality constraints, the vector of the linear boundary conditions is sorted by the time index of the variables. We obtain B = B 0 , B 1 , . . . , B N , where The box constraints for control u and the state h, in ascending time order, are Furthermore, the tracking-type term integral is approximated by the trapezoidal rule. Thus, in the following, we abbreviate ω 0 = ω M = 1/2 and ω i = 1, i = 1, . . . , M . In addition, we add a regularization term for the variation of the control with a weight factor α 4 ≥ 0. The reasons for this modification will be explained in Section 3.
Finally, for further details we refer to [15]. Our fully discretised optimal control problem reads: Minimisẽ with respect to z ∈ R (N +1)(7+2M ) , subject to the general constraints (2.10), the initial and boundary conditions (2.11), and the box constraints (2.12).

2.4.
Derivation of the Hessian and the Jacobian. The constraints in the NLP given above can be divided into three classes: The vector C(z) contains the mostly nonlinear equality constraints, whereas the linear boundary conditions and linear box constraints are found in the vectors B(z) and G(z), respectively. We calculate the dimension of the NLP for given discretisations fulfilling the CFL condition. For M = 20, N = 600, we find that the number of optimisation variables is n z = 28247 and the number of equality constraints is n eq = 27650. For M = 50, N = 1500, we have n z = 160607 and n eq = 159110. Thus n z is greater than n eq , which is important for the solvability of the formulated NLP, as the number of degrees of freedom is greater than zero. The algorithms used for nonlinear optimisation problems require first and second derivatives, i.e., the Jacobian of the constraints and the Hessian of the Lagrangian.
Jacobians C (z) and B (z). The derivation of the Jacobian of the nonlinear constraints, C (z), requires a structural approach. We note that the constraints in the vector C(z) are sorted by time index j. The main part of the Jacobian is made up of sparse block matrices on its diagonal. Using the notation for C(z) introduced above, the symbolic Jacobian is obtained. The vertical line indicates the formal distinction between the state vari- Differentiating the sorted vector B(z) with respect to z yields the symbolic Jacobian where each matrix ∂B i /∂q i , i = 0, . . . , N , has unit or zero vectors as columns.
Hessian ∇ zz L. We assemble the Lagrange function (2.14) of the NLP, where the constraints are coupled to the objective function by unknown numbers λ k , ν k , and µ k , the so-called Lagrange multipliers. Differentiating L twice with respect to z yields non-zeros for the objective function and nonlinear constraints only. Thus the linear boundary conditions and box constraints B(z) = 0 and G(z) ≤ 0 will not appear in the Hessian. The Hessian of the Lagrangian with respect to z referred to as H in what follows, is symmetric and exhibits a sparse structure with sparse blocks on its main diagonal and a dense last row and column vector. Similar to η J we consider the quotient η H = number of nonzero entries number of entries = N (8M + 21) (N + 1) 2 (7 + 2M ) 2 (2.16) as a measure of the sparsity of the Hessian. For N = 600, M = 20 we obtain η H ≈ 0.000136 = 0.0136%. The detailed sparsity structure and the values of C , B , and H are provided in [15,Appendix C]. We note that we work with a coordinate storage format for sparse matrices.
3. Model verification. In order to verify the fully discretised model with the exact Jacobian, the formulation has been implemented in SNOPT, a general-purpose software package for large-scale nonlinear programming. It employs an SQP method, solving the QP subproblems with a reduced-Hessian active-set method. Thus, the exact Hessian does not have to be provided by the user. For further documentation, see [9].
Various numerical experiments on the NLP posed in Section 2.3 have been conducted with SNOPT to examine the sensitivity towards certain model parameters and the necessity of a regularisation strategy. Furthermore, an efficient method for the initialisation of calculations based on information from previous solutions is outlined below. A suitable initialisation has the potential to reduce computing times significantly.
3.1. Regularisation strategies. When only the free end-time T is minimised, i.e., the objective function is J = α 0 T , the computed control tends to be non-smooth and a so-called "bang-bang solution" is obtained. For the truck-container system performing a braking manoeuvre in minimum time, it is intuitive that the truck accelerates as long as possible and then brakes with the maximum possible force until it comes to a halt. The optimal solution resembles this intuitive behaviour, yet the optimisation algorithm struggles to find an optimal control without frequent jumps, unless a regularisation term is added to the objective function. Two different regularisation approaches have been implemented and tested.
Let α 0 > 0. Solving the discretised OCP for the modified objective functioñ regularises the solution because the control effort is minimised in addition to the final time. When considering only α 2 = 0, while α 4 = 0, as proposed by Betts [1, Ch. 5.5], the regularisation strategy fails for poor initial guesses of T . It turns out that an alternative regularisation strategy that punishes jumps and oscillations in the control, i.e. with α 4 = 0, is more robust. As we assume that u ∈ L ∞ , the derivative u does not exist in general, thus additional assumptions concerning the control have to be made. Considering the control u as a state variable subject to the ODE-constraint u = ξ, where ξ ∈ L ∞ is the new control, yields that u ∈ W 1,∞ . Physically, the change in the control variable from u to ξ means controlling the jerk instead of the acceleration. It can be observed that this regularisation strategy yields more stable results, but there is still a slight dependency on the initial guess for T , though the oscillations in u are reduced. While neither of the proposed approaches is completely satisfactory, mixing the two strategies increases stability. If the parameters α 2 and α 4 are chosen in a way that the regularisation term is two or three orders of magnitude smaller than the actual objective term in the objective function, the influence of the smoothing terms on the optimal solution is negligible.

3.2.
Initialisation with the collocation method. In order to obtain solutions with increased accuracy, the discretisation of the PDE and ODE must be refined. Both the number of variables and the number of nonzero entries in the Jacobian is O(N M ). To reduce computing times, the collocation method can be employed to initialise a problem with a refined discretisation by interpolating a solution obtained on a coarse grid. Given a discrete solution on a coarse grid, the solution of the ODE and PDE is approximated by interpolating polynomials. Evaluating the resulting spline at the gridpoints of a refined grid yields a discrete starting vector for the refined problem. In general, the initialisation vector does not satisfy the constraints due to the interpolation error. Nevertheless, the optimisation algorithm needs fewer steps to find a feasible point, as the initial point is already close to feasibility. The solution for N = 300 has been utilised to initialise the problems with N = 450 and N = 900. We observe that a refined discretisation leads to more accurate results, which is indicated by decreasing values for T with increasing N . Computing times were reduced significantly by employing the collocation method. It was observed that only about a third of the computing time is required to find the solution when the problem is initialised by collocation. 4. Implementation with sqpfiltertoolbox. The software sqpfiltertoolbox is written in Fortran 90 and implements different sequential quadratic and sequential linear programming methods for general nonlinear optimisation problems. It was originally developed by M. Gerdts for small-to medium-scale problems with dense Jacobian and Hessian matrices [6]. Although there are more sophisticated, yet more complicated algorithms for large scale and sparse optimisation problems available (see e.g. [1, Ch. 2.3]), for the example presented in this paper, the SQP active-set method is employed, as it is implemented in the sqpfiltertoolbox. Nevertheless, some adjustments to the algorithm must be made to exploit the sparsity and to tackle the challenges arising from the large scale of the problem. At the centre of interest is the solution of the KKT system in each QP iteration step. This section focuses on the comparison of different methods for the solution of the KKT systems and the adjustments that have to be made to ensure global convergence of the SQP method.
4.1. Sequential quadratic programming. For convenience, we briefly review the SQP method. To adjust to the common notation in nonlinear programming, in this subsection the vector of optimisation variables is referred to as x ∈ R n and all equality and inequality constraints are comprised in the vector c : R n → R m . The objective function is denoted by f : R n → R. It is assumed that the functions f and c are at least twice continuously differentiable with Lipschitz continuous second derivatives. The general formulation of a nonlinear program is now subject to c i (x) = 0, i ∈ R, (4.1b) where the index set R contains the indices of the equality constraints, while the indices of the inequality constraints are included in the set S. At a given point x, the active set A(x) is the union of the set of equality constraints R with the indices of the active inequality constraints The Lagrangian of the NLP (4.1) is where λ ∈ R m comprises here all Lagrange multipliers. For the first-order optimality conditions to hold, a constraint qualification must be satisfied. For further considerations, the linear independence constraint qualification was assumed [13,Def. 12.1].
In this context, the Hessian H(x, λ) = ∇ 2 xx L(x, λ) is well-defined. The SQP method solves a sequence of quadratic models for the objective with linearised constraints that approximate the NLP locally at (x k , λ k ), where H k = H(x k , λ k ) and ∇f k = ∇f (x k ). The QPs are solved by iterative methods, such as the active-set method. The SQP algorithm starts from an initial point (x 0 , λ 0 ) and iteratively calculates KKT points (d k , λ k+1 ) for (4.3) updating x k+1 = x k + d k , until (x k , λ k ) is a KKT point of the original NLP.
As the complementarity condition must be satisfied at an optimal solution (x * , λ * ), the optimal Lagrange multipliers λ * i associated with the inactive constraints c i (x), i / ∈ A(x * ) are zero. Given a solution x * of (4.1) and assuming that (i) the Jacobian of the active constraints A * at x * has full rank; (ii) H * is positive definite on the nullspace of A * ; and (iii) strict complementarity holds, there is a local solution of the QP (4.3) whose active set A k is the same as the active set A(x * ) of the NLP (4.1) at x * , if (x k , λ k ) is sufficiently close to (x * , λ * ) [13, p. 533]. Thus, locally the QP can be treated as an equality constrained optimisation problem once the active set is known. Given this fact, the local SQP method is equivalent to Newton's method applied to the first-order optimality conditions of the NLP and quadratic convergence of the SQP method towards the solution can be achieved.

4.2.
Direct solution with exact second derivatives. In Section 2.4, the exact first and second derivatives of the truck NLP are provided. Let A k denote the active set at the current iteration step k. The Jacobian of G associated with the active set is referred to as G A k . Having all derivatives available, the KKT system can be assembled as follows  where H ∈ R n×n is the exact Hessian and ∇f ∈ R n denotes the gradient of the Lagrangian of the QP. To improve readability, the index k, indicating the evaluation of functions and derivatives at the point (x k , λ k ), has been omitted.
As we do not have to distinguish between the different Jacobians B , G A k and C for the further examination of the KKT system, we introduce the notation and refer to A ∈ R m×n as the Jacobian of the active constraints. The solution of the system (4.5) poses two major challenges. Firstly, the KKT matrix is large and sparse, which calls for a sophisticated and efficient solver for linear systems of equations (LES), exploiting the sparsity. Secondly, the solvability of the system must be guaranteed, i.e., certain criteria must be met, such that the KKT system has a unique solution.
For the first challenge, open-source software for the solution of large, sparse and symmetric systems of equations is available. The code MA57 [4] uses the so-called multifrontal method, which is based on a sparse variant of Gaussian elimination. Solving the specific KKT system presented above with MA57, revealed two aspects that have a major influence on efficiency and stability of the algorithm: However, this can lead to an unstable factorisation. To resolve the trade-off between computing time and stability, a suitable value is threshold = 10 −2 . The KKT system (4.5) is not always solvable, but a necessary condition for the KKT matrix to be non-singular is that A has full row rank and the reduced Hessian Z HZ is positive definite, where Z denotes the matrix whose columns are a basis for the null space of A. Numerical tests showed that neither of the two requirements are met in general, leading to a breakdown of the QP active-set method. In the following, two modifications of the KKT matrix are proposed that ensure the solvability of the KKT system.
Dual regularisation. In order to determine the row rank of A, being the Jacobian of the active constraints, a singular value decomposition can be performed. Given that A has more columns than rows it has full row rank, if the number of singular values σ i = 0 is equal to the number of rows. An examination of the singular values reveals that A has full row rank at the solution z * . If A has more rows than columns, which can happen, for example, for a bad initial guess with too many active box constraints, full row rank of the Jacobian cannot be achieved and the KKT matrix is singular.
By means of dual regularisation, a non-singular KKT matrix can be obtained even if A is rank deficient. The KKT system is altered to where µ > 0 is a parameter and I the identity. Introducing −µI on the (2,2)-block of the KKT matrix corresponds to solving a QP problem with shifted constraints. As the KKT system is perturbed this way, the obtained solution is only an approximate solution of the original problem. However, if µ is sufficiently small, the deviation of the approximate solution is negligible. For the KKT system of the specific problem under consideration, the invasive influence of the proposed technique could be reduced by introducing dual regularisation on a sub-block only. It has been found that it is sufficient to modify the KKT system as with regularisation in the (4,4)-block only. Numerical tests indicate that choosing µ = 10 −12 yields a stable configuration. Even when in the neighbourhood of the solution, where regularisation would not necessarily be needed for the KKT matrix to be non-singular, regularisation is still useful, as it speeds up the solution process with MA57.
Primal regularisation. When using exact second derivatives instead of a BFGS approximation, neither the Hessian H nor the reduced Hessian Z HZ can be guaranteed to be positive definite. By means of primal regularisation, positive definiteness of the reduced Hessian can be achieved. In the following, a method for primal regularisation is outlined. We refer to the inertia of the symmetric matrix K as the triplet giving the numbers of positive, negative and zero eigenvalues of K: In(K) = (n + , n − , n 0 ). By repeatedly applying Sylvester's law of inertia, we may show that if the reduced Hessian Z HZ is positive definite and A has full row rank m, then In(K) = (n, m, 0). In case of dual regularisation the KKT matrix K changes to the form in (4.6) with a nonzero (2,2)-block µI, for which a similar result is obtained, see [5, Ch. 4, Proposition 2]. If In(K) = (n, m, 0) the local model of the optimisation problem is convex and the KKT system (4.6) has a unique solution. However, if In(K) = (n, m, 0), the local model of the optimisation problem is non-convex and the QP algorithm will fail. In this case, the local model must be convexified, which is referred to as inertia control in the following. In order to determine whether or not inertia control must be performed, the inertia of the KKT system must be computed. Fortunately the desired information is a byproduct of the solution process with MA57. The KKT matrix is factorised by P KP = LDL with the block diagonal matrix D. The inertia of D can easily be determined. As a permutation with the matrix P does not change the eigenvalues of K and L is non-singular, Sylvester's law of inertia can be exploited, leading to the result that In(K) = In(D). Thus, the inertia of the KKT matrix is available in each QP iteration step and it can be checked whether K has the required inertia.
In case the inertia of the KKT matrix is not equal to (n, m, 0) in a QP iteration step, inertia control is invoked. For inertia control the Levenberg method [1, Ch. 2.5] with a modification to fit our specific application is used. Whenever inertia control is invoked, the current QP is discarded and a new QP with the shifted Hessian is set up and solved. The new search direction d k that solves the modified QP is significantly biased toward a gradient direction and convergence is degraded [1, Ch. 2.5].
Numerical tests on the specific truck NLP showed that the inertia of the KKT matrix has the desired properties in the neighbourhood of the solution, which implies that the exact Hessian is accepted in the QP. Inertia control with the shifted Hessian is invoked far from the solution only and local quadratic convergence to the solution is still achieved.

4.3.
Solution with the range-space method. The direct solution with exact second derivatives described above turned out to be reliable and efficient. Nevertheless, there are some difficulties that motivate a different approach to the solution of the KKT system. On the one hand, methods that solve the LES directly, such as the method involving MA57 presented above, might fail for very large problems because of a lack of available storage. On the other hand, deriving the exact Hessian analytically for more complicated problems, e.g., for a 3D PDE, can be very timeconsuming and prone to errors. The method presented in the following circumvents these disadvantages by approximating the Hessian with a Limited-Memory-BFGS (LM-BFGS) matrix and solving the resulting KKT system with the iterative conjugate gradient method, which allows for the solution of larger systems.
In [13,Ch. 16] the range-space method for the solution of the KKT system which is based on block-elimination, is described. Note that the exact Hessian H was replaced by an approximate Hessian W . By solving the first equation for and substituting in the second equation, we obtain a system of equations for the multipliersB λ =b (4.10) with the right-hand sideb = −AW −1 ∇f . Here the matrixB := AW −1 A is symmetric and positive definite if W is symmetric and positive definite, which is the case for an LM-BFGS approximation of the Hessian W . Thus (4.10) can be solved iteratively with the conjugate gradient (CG) method. Once λ is obtained, the search direction d can be computed by inserting λ in equation (4.9). Before the LM-BFGS and CG methods are outlined, it must be mentioned that the rangespace method as presented above is well-defined for non-singular KKT systems only. If the number of constraints is larger than the number of variables, the Jacobian A is rank deficient and the KKT system is not solvable, leading to a breakdown of the proposed method. As mentioned in the previous subsection, non-singularity can be achieved by means of dual regularisation. Nevertheless, performing blockelimination with a regularised KKT matrix leads to an ill-conditioned system of linear equations. As this implies numerical difficulties, dual regularisation should not be the method of choice.

4.3.1.
The Limited-Memory-BFGS method. The exact Hessian can be approximated by a quasi-Newton matrix, which is referred to as a BFGS matrix. By the way these matrices are constructed, they exhibit a dense structure even though the exact Hessian might be sparse. In [3], a limited memory quasi-Newton method is described that uses a less memory-intensive representation of the BFGS matrix. To avoid any confusion concerning nomenclature, it must be stated that the term LM-BFGS method refers to an entire algorithm for the solution of large-scale unconstrained optimisation problems. As the optimisation problem examined in this article involves nonlinear constraints, the LM-BFGS method can not be applied as a whole. Still, the representation of the quasi-Newton approximation for the Hessian, which is a crucial component in the LM-BFGS method, can be integrated in the rangespace method presented above. In particular, matrix-vector multiplications of the form W p and W −1 p, where W denotes the LM-BFGS Hessian and p is an arbitrary vector, can be performed efficiently.
The main difference between the BFGS and the LM-BFGS matrix representation is the storage of the Hessian W k in the k-th SQP iteration. Whereas a dense n × n matrix is stored in the BFGS method, only a set of the l most recent update vector pairs {s i , y i }, i = k − l, . . . , k − 1 is stored, which defines the Hessian implicitly. The update vector pairs are constructed as s i := x i+1 − x i and y i = g i+1 − g i , where g i = ∇f (x i ), in line with the standard notation. After a new iterate has been computed, the new update vector pair is either added to the set if k ≤ l, or the oldest update vector pair is replaced by the newest if k > l. Thus, in the first l steps, the LM-BFGS method resembles the classical BFGS method. For practical purposes it is often sufficient to choose l ∈ [3,7], which implies that only (2l + 1)n instead of n 2 elements must be stored to reconstruct the approximate Hessian.
Auxiliary matrices must be defined and stored for the representation of the quasi-Newton matrix and its inverse. The storage and computational effort required to obtain these matrices is negligible, especially because they have to be computed once for each SQP-iteration only. If the initial matrix is of the form W 0 = σ k I, the approximate Hessian is represented by where σ k is a positive scalar that can either be fixed or updated in the course of the iterations based on curvature information. The choice of σ k will be addressed in the subsection on restarting the BFGS method. In a similar manner, the inverse of the LM-BFGS matrix is given by (4.12) and with S k , Y k and D k as defined above and the scalar γ k . Although there are efficient algorithms for matrix multiplications involving the LM-BFGS Hessian and its inverse, only matrix-vector multiplications are required in the context of the range-space method and the iterative solution of the resulting LES. For an efficient algorithm for the computation of the product W k p see [3,Ch. 3.2]. According to (4.12) the computation of the product W −1 k p involves multiplications with the inverse of the matrix R k . Since l is small and R k is triangular, the inverse R −1 k can be provided with little computational effort. Furthermore, the inverse needs only be computed once per SQP-iteration. In total, (4l + 1)n + (5/2)l 2 multiplications are required to obtain the matrix-vector product.
In the context of the range-space method, a linear system of equations of the form (4.10) must be solved. The matrixB is never computed and stored explicitly. The matrix A is stored in the coordinate storage format, whereas the matrix W −1 is represented as an LM-BFGS matrix. Hence, direct methods, which would require the explicit representation of the matrixB, are not suitable.
The main effort of a single CG iteration lies in the computation of the matrix vector productBp k . In the context of the range-space method this product has the form AW −1 A p k . As mentioned above, the matrixB is not available in its explicit form. Thus, the productBp k involves three consecutive matrix-vector products. For a detailed convergence analysis of the CG method and the essential preconditioning, see [13, pp. 112-119].
Although preconditioning can speed up convergence significantly, the overall computing time is not necessarily reduced, as computational effort is required to obtain a preconditioning matrix and to perform arithmetic operations with its inverse. Therefore, the choice of a suitable preconditioning matrix is problem dependent. In the context of the range-space method with a quasi-Newton approximation of the Hessian, the coefficient matrixB is not explicitly known. In this context, the SSOR and the Cholesky preconditioner are not suitable. Thus, the Jacobi preconditioner has been chosen for further considerations. Detailed information on the efficient computation of the preconditioner is given in the following subsection. 4.3.2. Implementational details. By slightly modifying the presented algorithms above, i.e., the range-space method with an LM-BFGS approximation solved by the CG method, and taking advantage of the characteristic structure of the QP subproblem, the computational effort can be reduced substantially.
Preconditioning the CG method: For computational reasons, the diagonal of the coefficient matrixB = AW −1 A is the preconditioner of choice for further considerations. Nevertheless, the overall computing time increases, if the preconditioner is recalculated in each QP iteration. However, by exploiting the structure of the QP iterations, the diagonal preconditioning matrixM can be obtained efficiently, reducing the overall computing time. In the first QP iteration,M must be computed entirely, but can be updated in the subsequent iterations with very little computational effort. The i-th diagonal element of the preconditioning matrixm ii is obtained by pre-and post-multiplying the coefficient matrix with the i-th unit vector e im ii = e i A j W −1 A j e i , which involves three matrix-vector products. The index j denotes the QP iteration. Depending on the result of the j-th QP iteration, three cases must be distinguished. (i) If the computed stepsize is accepted, the Jacobian of the active constraints does not change in the (j + 1)-th iteration, which implies that A j+1 = A j . Therefore the preconditioner can be reused. (ii) In case the constraint with index i is deactivated, A j+1 is obtained by deleting the i-th row in A j . Instead of recomputing the preconditioner, it can be updated without any computational effort by deleting the i-th row. (iii) This possible case involves the activation of a constraint with index i, which implies that A j+1 results from A j by adding a row at the i-th position.
The preconditionerM is updated by adding a row and column at the respective position, with the diagonal elementm ii = e i A j+1 W −1 A j+1 e i .
If these update formulae are considered, the main effort of computing the preconditioner lies in the first QP iteration, where the preconditioner must be determined from scratch. The effort required to update the preconditioner is negligible. The numerical results show that preconditioning with updates instead of recomputation reduces the number of required CG iterations by about 15%.
Initialising the CG method: If no information on the solution of the LES is available, the CG method is initialised with λ 0 = 0. However, after a deactivation step in the j-th QP iteration, the initial point can be chosen so that only very few CG iterations are necessary to achieve convergence. As the QP stepsize is zero in a deactivation step, the right-hand side of the KKT system (4.8) does not change in the following iteration. If the i-th constraint was deactivated, the Jacobian of the active constraints is updated by deleting the i-th row. Reconsidering the LES that is to be solved by the CG method the following observation can be made: as the Jacobian of the active constraints is sparse and exhibits a block structure, the deletion of a row has little impact on the structure of the system above. The solution of the system changes slightly, as only a few multipliers in λ are impacted by the change. If we initialise λ 0 = λ \ {λ i } in the (j + 1)-th QP step where λ are the multipliers of the j-th QP step in which the constraint with index i was deactivated, the convergence rate of the CG method can be increased significantly. In several calculations it has been observed that only one CG iteration is needed after a deactivation step. Although this result cannot be generalised for other problems, in the context of the NLP considered in this study, which has linear box constraints as inequality constraints only, the careful choice of an initial point λ 0 speeds up the CG method. The numerical results reveal that the average number of CG iterations needed for each QP drops by about 15% when the initial point is chosen as described above. Nevertheless, numerical difficulties were encountered more frequently for initial guesses far from the solution and the overall algorithm turned out to be less robust. This behaviour might be unexpected at first, as a different initial point in the CG method should not change the solution.
Bearing in mind the fact that the CG method is an iterative method that terminates when the residual is smaller than a certain tolerance CGTOL, it is not surprising that different initial guesses lead to slightly different solutions in the iteration process. Even if the residual has machine precision, it cannot be concluded that the solution has the same precision. A slightly different solution in each QP might lead to activation or deactivation of different constraints and thus impact the overall SQP procedure. Therefore, the initialisation with the multipliers of the previous iteration may not have a positive effect. All calculations with the range-space method in the following subsection on numerical results were performed with an uninitialised CG method, i.e., λ 0 = 0. Stable results were obtained with CGTOL = 10 −14 , which was chosen as the standard value for all calculations performed with the range-space method.
Tuning the LM-BFGS method: There are several adjustments to the LM-BFGS method to speed up global convergence of the SQP algorithm. Three modifications in particular are addressed in the following.
(i) When the initial guess is far away from the solution, it is often useful to perform a certain number of steepest-descent steps. In this case the quasi-Newton Hessian is replaced by the identity matrix and arithmetic operations involving the Hessian and its inverse require less computational effort.
(ii) As outlined in Subsection 4.3, a certain number m of vector pairs is stored to reconstruct the LM-BFGS Hessian approximation. If the number of SQP-iterations exceeds m, the newest vector pair replaces the oldest one. Although "old" curvature information is discarded this way, the convergence rate can be improved by deleting all update vectors and restarting the LM-BFGS method with a scaled identity matrix. In [12,Ch. 10] Luenberger and Ye propose to restart after a fixed amount of iteration steps. However, it is also possible to introduce criteria that trigger a restart. For example, the reduction in the constraint violation or in the gradient of the Lagrangian can be measured and compared to the previous iterations. If either of the two criteria with a variable integer r and 0 < α < 1 (e.g. α = 0.99), is violated after the kth iteration, the LM-BFGS method is restarted. Numerical experiments indicate that choosing r as the number of iterations since the last restart yields satisfactory results. On average, the restart is triggered after 3-4 iteration steps.
It turns out that performing steepest-descent steps before starting the LM-BFGS Hessian approximation often has a detrimental effect, as computing times increase in most cases. However, restarting the LM-BFGS method has a beneficial impact. Both a restart after a fixed amount of iteration steps and the automated restart reduce computing times significantly, where the automatic restart is superior to the fixed restart [15, Table B.7].
(iii) As proposed in [12,Ch. 10] and [11,Ch. 4], the performance of the LM-BFGS method can be improved substantially by means of self-scaling. After a restart in SQP-iteration k, the quasi-Newton matrix is initialised with a scaled identity H 0 = σ k I with where [s k , y k ] is the latest update vector pair [3, p. 142]. Luenberger, Ye and Liu propose scaling the inverse Hessian H −1 0 = γ k I with γ k = y k s k y k y k .
Since γ k = 1/σ k , the two methods are not equivalent. Both scaling variants have been implemented and tested, where self-scaling with σ k turned out to be more efficient. In numerical experiments, a decisive beneficial impact of self-scaling was revealed, which has the potential of reducing computing times by about two thirds.

5.
Numerical results and evaluation. All numerical experiments were performed on a machine with 2 Hexa core Intel Xeon CPU X5680s (-HT-MCP-SMP-) with 3.33 GHz each. In Figures 2 and 3 results for different manoeuvres of the truck-container system are shown. In the first case, the truck performs a braking manoeuvre over a distance  (1)), whereas in the second case, the fluid-level deviation is also minimised (Problem (2)). Figure 4 shows the results for Problem (1) with a non-straight fluid tank bottom. In all three cases, the control looks sufficiently smooth, with the undesired oscillations being damped by the smoothing terms. Before discussing the performance of the two different solution techniques examined in this paper, we observe that the solution process for Problem (1) takes significantly more time than for Problem (2), see [15, Tables B.2 (1) Method (B) can cope with a variety of initial guesses, too, yet the algorithm struggles to find feasible points in some cases for Problem (2). For the coarsest discretisation N = 150, M = 5 the algorithm breaks down without finding a feasible point. "Hot starting" (HS) the problem with the current iterate as the next initial guess, fixes the feasibility problems, and a solution is attained at the cost of increased computing times. Furthermore, Method (B) solves Problem (2) for finer discretisations much faster, when initialised by collocation. However, Method (B) is more sensitive to scaling in the objective function, sometimes leading to a breakdown of the method or yielding inaccurate solutions [15, Table B.10]. It can be concluded that Method (A) is more robust than Method (B), for which numerical difficulties concerning scaling and different initial guesses emerged more often. Moreover, due to the use of exact second derivatives, quadratic convergence to the solution can be achieved with Method (A), whereas Method (B) does not exhibit that property. In Table 1 an excerpt of the solution output of Problem (1), exhibiting the quadratic convergence, is depicted.
We conclude that, if exact second derivatives are available, Method (A) is the method of choice for the examined application as it is faster and more robust. Yet, extending the truck-container model to three dimensions would make the derivation of the exact Hessian cumbersome and prone to errors. In this case, Method (B) could be employed, which does not need exact second derivatives, at the cost of significantly higher computing times. The introduction of Method (B) as an alternative to (A) was also motivated by the assumption that the application of a direct method may no longer be viable for very large problems because of memory limitations of the LES solver. It was shown that Method (A) solved problems with up to 104487 variables on a grid with N = 1200 and M = 40 within about 6.2 hours, see Table 2. Based on the observation that Method (B) requires about a fifty times this computing time, computation times of more than 12 days for the solution of   Table 1. Solution output of Problem (1) with M = 5 and N = 150. CV and KKT denote the norm of the constraint violation and the norm of the gradient of the Lagrangian, respectively. QPIT is the number of QP iterations and OBJ the value of the objective function.
the same problem would be required. For even larger problems, the solution with Method (B) is unlikely to yield a result in an acceptable time.
Finally, we compare the performance of Method (A) with SNOPT. For these runs, a sequence of problems of type (1) is set up with gradually refined discretisation. Each problem except the first is initialised using collocation based on the previous solution on a coarser grid. Despite the fact that SNOPT does not rely on exact second derivative information, it requires much less computing time than Method (A) in most cases, as it can be observed in Table 2. However, the problem with N = 1200 and M = 40, which has 104487 variables, is solved much faster by Method (A). This can be ascribed to the superior matrix factoring technique for very large and sparse matrices employed by MA57.
6. Conclusion and future work. In this article, an optimal control problem for an example of a fully coupled mechanical system, i.e., the truck-container system, was solved. The infinite-dimensional optimal control problem was fully discretised, where suitable integration schemes were employed for the discretisation of the coupled system of ordinary and partial differential equations that describe the motion of the truck, container, and fluid.
The finite-dimensional discretised optimal control problem, which is equivalent to an optimisation problem with nonlinear constraints, was solved with an SQP activeset method, as implemented in the software framework sqpfiltertoolbox. The article focused on the solution of the linear systems of equations, the so-called KKT systems, which arise in each QP iteration step. Due to the characteristic structure of a discretised optimal control problem with differential equations as constraints, these LES are large and sparse, which calls for solvers that exploit their structure for efficiency.
Two solution techniques were implemented and examined in the context of this study. Method (A) relies on the direct solution of the KKT systems, where both the Hessian of the Lagrangian and the Jacobian of the constraints were derived analytically. The resulting KKT systems were solved with MA57, a solver for symmetric large and sparse systems. Both primal and dual regularisation techniques problem data: FEASTOL = 1.00E-008 α 0 = 1.00E-001 OPTTOL = 1.00E-008 α 1 = 0 µ = 1.00E-012 α 2 = 1.00E-007 threshold = 1.00E-002 α 3 = 1.00E+000 initialisation by collocation α 4 = 1.00E-007 c 0 = 1.  Table 2. Sequence of problems with gradually refined discretisations solved with sqpfiltertoolbox Method (A) and SNOPT. SQPIT states the number of SQP iterations.
were implemented that ensure the convergence of the method for a variety of initial guesses. In Method (B) the solution of the KKT system was split up in the solution of two smaller dimensional linear systems via block-elimination. The exact Hessian was replaced by a positive-definite Limited-Memory-BFGS approximation and the resulting linear systems were solved with the iterative conjugate gradient method. An efficient preconditioning method that takes advantage of the procedure of the QP active-set strategy was developed. Furthermore, several restarting and scaling techniques for the Limited-Memory-BFGS matrix representation were implemented and tested, which speed up the overall solution process significantly.
Numerical experiments showed the clear superiority of Method (A) in many respects. Not only did Method (A) solve the test problems about a fifty times faster than Method (B), but also found solutions with better accuracy. In some cases, Method (B) struggled to find a feasible point and numerical difficulties were encountered for certain initial guesses and badly scaled objective functions. Nevertheless, Method (B) has the advantage of not being dependent on the availability of exact second derivatives. If the considered model of the mechanical system were extended to three dimensions and the Saint-Venant equations to two dimensions, the analytical derivation of the exact derivatives might not be desirable. Consequently, Method (B) is still relevant for applications where the exact Hessian is not available.
Method (A) outperforms the FDTO approach described in [7], yielding up to several thousand SQP iterations and computing times of about a day for M = 20. While the FOTD approach in [10] exhibits computing times of several hours for the same grid, it is limited to fixed terminal times. Method (A) exhibits computing times of a couple of minutes for the same grid and requires only up to 5 SQP iterations.
For further improvement of the overall solution process with Method (A), the efficient determination of the active set, which consumes a significant share of the computational effort, could serve as an approach. As proposed, e.g. in [1,Ch. 2.3] or [8], a sequence of similar LES, as they arise in the QP iterations, can be solved efficiently by means of the Schur-complement method. As a further possible improvement of our method, we could start with BFGS updates of the identity matrix as approximation to the Hessian, guaranteeing a positive definite matrix, and once the iterate is close to the solution, we switch to the exact Hessian. One way of reducing the number of state variables is to use a non-equidistant coarse grid, with a refinement only at the left and at the right end of the container.