IMPLICIT PARAMETRIZATIONS AND APPLICATIONS IN OPTIMIZATION AND CONTROL

Abstract. We discuss necessary conditions (with fewer Lagrange multipliers), perturbations and general algorithms in nonconvex optimization problems. Optimal control problems with mixed constraints, governed by ordinary differential equations, are also studied in this context. Our treatment is based on a recent approach to implicit systems, constructing parametrizations of the corresponding manifold via iterated Hamiltonian equations.

1. Introduction. The paper discusses in some detail nonlinear programming problems, starting with the classical case of equality constraints, up to general problems involving inequality constraints and abstract constraints as well. We point out optimality conditions with no or fewer Lagrange multipliers, together with algorithms of gradient type or of a new type. This last case is developed for general optimization problems, under weak assumptions. An essential ingredient is the possibility to compute efficiently general nonlinear "projection" operators, associated to the equality constraints, and their discretization.
Our approach applies a modification of the standard dimension reduction method, based on the implicit parametrization technique [29] for general nonlinear systems in finite dimensional spaces. An important part is played by iterated systems of ordinary differential equations of Hamiltonian type. Although their right-hand sides are merely continuous, uniqueness and strong regularity properties of their solutions can also be obtained, due to the special Hamiltonian structure.
We also report on applications of similar ideas to optimal control problems governed by ordinary differential systems and involving general mixed constraints. The notion of admissible set of initial points has a key role. The formulation of the problem and certain techniques are inspired by [26], where optimality conditions are investigated. Our analysis is devoted to a new algorithm, in this general setting. We underline that all the transformations and procedures that we use are easy to implement and need just standard routines from MatLab. Implicit parametrizations give a generalization of the implicit functions theorem in arbitrary finite dimensional spaces and have been recently developed in [29], [20], [21], [22]. Obtaining parametrizations is advantageous since they may provide a more complete description of the corresponding manifold and the approach has a useful constructive character. We give a brief review in the next Section 2, devoted to preliminaries. As a first short application, in the final part of Section 2, an example of perturbations is indicated, relevant in shape optimization and in free boundary problems. Section 3 is devoted to the study of general nonlinear programming problems and also provides numerical examples together with comparisons with the relaxation approach [28] or the fmincon routines in MatLab. Finally, the last section is devoted to optimal control problems with mixed constraints, governed by ordinary differential equations and includes the new algorithm and an academic example. The efficiency of the proposed methods is apparent.
We underline that our approach allows the use of maximal solutions in the ordinary differential systems defined in Section 2. In fact, in examples from Section 3 and from [20], or in a recent result from [33], these solutions are even global (periodic) and this ensures a global-like search in Alg. 2 and in Alg. 3. In general, under our approach, it is easy to enlarge the search region, in the absence of critical points, simply by increasing the intervals where the parametrization from Section 2 is defined. For instance, this plays an essential role in the comparison with [28]. See Remark 2 as well.
In this sense, the methods introduced here offer more information than usual local optimization algorithms.
Another important point is that Alg. 2 and Alg. 3 generate the discrete admissible sets in two steps. First, the discrete set corresponding to the equality constraints is constructed via the implicit parametrization method from Section 2. The other constraints are then just checked on this discrete set and the infeasible points are eliminated. This allows very weak assumptions on the inequality and abstract constraints and on the cost functional. The minimization is also performed directly, and several solutions may be obtained simultaneously in case the minimum point is not unique.
2. Preliminaries. In this section, we discuss the system (1) under the classical independence assumption (2), following [29]. The hypothesis (2) can be dropped by using the notion of generalized solution of (1), according to [29]. See also [20], [30], [21], where several relevant examples are discussed.
Clearly, condition (2) remains valid on a neighbourhood V ∈ V(x_0), V ⊂ Ω, under the C^1(Ω) assumption on F_j(·), j = 1, l, and we denote by A(x), x ∈ V, the corresponding nonsingular l × l matrix from (2). Without loss of generality, A(x) may be assumed to lie in the left upper corner of the Jacobian matrix.
We introduce on V the underdetermined linear system of equations with unknown v(x) ∈ R^d:
(3) ∇F_j(x) · v(x) = 0, j = 1, l.
We shall use d − l solutions of (3), obtained by fixing successively the last d − l components of the vector v(x) ∈ R^d to be the rows of the identity matrix in R^{d−l}, multiplied by ∆(x) = det A(x). Then the first l components are uniquely determined by inverting A(x), due to (2). In this way, the obtained d − l solutions of (3), denoted by v_1(x), . . ., v_{d−l}(x) ∈ R^d, are linearly independent for any x ∈ V. They give a basis in the tangent space to the manifold defined by (1). Moreover, these vector fields are continuous in V, since the ∇F_j(·) are continuous in V and Cramer's rule ensures the continuity of the solution of a linear system with respect to its coefficients. Other choices of solutions for (3), useful in this section, are possible (see Theorem 2.5).
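The construction of the tangent basis can be sketched numerically as follows (a Python/NumPy sketch; the sphere constraint at the end is hypothetical test data, not taken from the paper):

```python
import numpy as np

def tangent_basis(jac):
    """Tangent vectors v_1(x), ..., v_{d-l}(x) described in the text: the
    last d - l components of v_j are Delta * e_j with Delta = det A(x),
    and the first l components follow by inverting A(x)."""
    l, d = jac.shape
    A, B = jac[:, :l], jac[:, l:]
    delta = np.linalg.det(A)
    # first l components of all v_j at once: solve A W = -Delta * B
    W = -delta * np.linalg.solve(A, B)            # shape (l, d - l)
    V = np.vstack([W, delta * np.eye(d - l)])     # column j is v_j
    return V.T                                    # row j is v_j

# hypothetical test data: the sphere F(x) = x1^2 + x2^2 + x3^2 - 1 = 0
# at x0 = (1, 0, 0), where the gradient is (2, 0, 0)
jac = np.array([[2.0, 0.0, 0.0]])
V = tangent_basis(jac)
print(V)   # two linearly independent tangent vectors at x0
```

Each row of the result annihilates the Jacobian, i.e. it solves the linear system above.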
We introduce now d − l nonlinear systems of first order partial differential equations, associated to the vector fields (v_j(x))_{j=1,d−l}, x ∈ V ⊂ Ω, where we denote the sequence of independent variables by t_1, t_2, . . . , t_{d−l}:
(4) ∂y_1/∂t_1 = v_1(y_1), t_1 ∈ I_1; y_1(0) = x_0;
(5) ∂y_2/∂t_2 = v_2(y_2), t_2 ∈ I_2(t_1); y_2(0) = y_1(t_1);
. . .
(6) ∂y_{d−l}/∂t_{d−l} = v_{d−l}(y_{d−l}), t_{d−l} ∈ I_{d−l}(t_1, . . . , t_{d−l−1}); y_{d−l}(0) = y_{d−l−1}(t_1, . . . , t_{d−l−1}).
These systems have a nonstandard (iterated) character, in the sense that the solution of one of them is used as the initial condition in the next one. Consequently, the independent variables of the "previous" systems enter just as parameters in the next system, via the initial conditions. Due to their simple structure (one derivative in each equation), each subsystem (4), (5), . . ., (6) may be interpreted as an ordinary differential system in V ⊂ R^d (with parameters in (5), . . ., (6)), although partial differential notations are used. Here, I_1, I_2(t_1), . . ., I_{d−l}(t_1, . . . , t_{d−l−1}) denote the d − l local existence intervals (one for each subsystem), containing 0 in their interior and depending, in principle, on the "previous" parameters t_1, . . . , t_{d−l−1}. The existence of the solutions y_1, y_2, . . . , y_{d−l} follows by the Peano theorem, due to the continuity of the vector fields (v_j)_{j=1,d−l} on V. The Peano theorem also gives an evaluation of the intervals I_j.
As a simple example of the system (4)-(6), we consider the case d = 3, l = 1 and the equation f(x, y, z) = 0 instead of (1). The condition (2) is assumed in the form f_x(x_0, y_0, z_0) ≠ 0, without loss of generality. Then, the system (4)-(6) has just two subsystems of dimension three (with obvious notations for the derivatives):
(7) x'_{t_1} = −f_y(x, y, z), y'_{t_1} = f_x(x, y, z), z'_{t_1} = 0, t_1 ∈ I_1; (x, y, z)(0) = (x_0, y_0, z_0);
(8) x'_{t_2} = −f_z(x, y, z), y'_{t_2} = 0, z'_{t_2} = f_x(x, y, z), t_2 ∈ I_2(t_1), with initial condition given by the solution of (7) at t_1.
The right-hand sides of these two subsystems satisfy (3) and the Hamiltonian structure is clear.
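To make the iterated structure concrete, the following sketch (in Python with scipy, rather than the MatLab routines used in the paper) integrates the two subsystems above for the torus of Example 1 in Section 3; the starting point (3, 0, 0) and the integration intervals are hypothetical choices:

```python
import numpy as np
from scipy.integrate import solve_ivp

# implicit torus (radii 2 and 1), as in Example 1 of Section 3
def f(p):
    x, y, z = p
    return (x*x + y*y + z*z + 3.0)**2 - 16.0*(x*x + y*y)

def grad_f(p):
    x, y, z = p
    s = x*x + y*y + z*z + 3.0
    return np.array([4.0*s*x - 32.0*x, 4.0*s*y - 32.0*y, 4.0*s*z])

# the two Hamiltonian-type fields: (-f_y, f_x, 0) and (-f_z, 0, f_x)
v1 = lambda t, p: np.array([-grad_f(p)[1], grad_f(p)[0], 0.0])
v2 = lambda t, p: np.array([-grad_f(p)[2], 0.0, grad_f(p)[0]])

x0 = np.array([3.0, 0.0, 0.0])   # a point on the torus with f_x != 0
# iterated structure: the solution of the first subsystem at t_1 is the
# initial condition of the second one; t_1 enters only as a parameter
pts = []
sol1 = solve_ivp(v1, (0.0, 0.05), x0, dense_output=True, rtol=1e-10, atol=1e-12)
for t1 in np.linspace(0.0, 0.05, 6):
    sol2 = solve_ivp(v2, (0.0, 0.05), sol1.sol(t1), rtol=1e-10, atol=1e-12)
    pts.extend(sol2.y.T)
# every computed point stays (numerically) on the manifold f = 0
assert max(abs(f(p)) for p in pts) < 1e-4
```

Since the right-hand sides solve the tangency system, the exact flows remain on the zero set of f; the assertion only checks the numerical drift.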
We recall some basic properties following [29].
Theorem 2.3. a) Under the hypothesis (2), each of the iterated systems (4)-(6) has a unique solution.
Remark 1. Note that for systems associated to divergence free fields, the uniqueness results of [9] are valid under certain Sobolev type regularity conditions. However, under our hypotheses, we have just continuity in the right-hand side of the differential system (4)-(6) and [9] cannot be applied. For related results, see [4], [35].
b) The solutions of the systems (4)-(6) are of class C^1 at any existence point and we have the derivation formula
∂y_{d−l}/∂t_j (t_1, . . . , t_{d−l}) = v_j(y_{d−l}(t_1, . . . , t_{d−l})), j = 1, d − l.
Under the hypothesis (2), the local solution of (1) is a d − l dimensional manifold around x_0, and we expect the above construction to parametrize it.
Theorem 2.4. If F_k ∈ C^1(Ω), k = 1, l, with the independence property (2), and the intervals I_j, j = 1, d − l, are sufficiently small, then the mapping (t_1, . . . , t_{d−l}) → y_{d−l}(t_1, . . . , t_{d−l}) is regular and one-to-one on its image.
We consider now another solution choice in (3). We shall use d−l solutions of (3) obtained by fixing the last d − l components of the vector v(x) ∈ R d to be the rows of the identity matrix in R d−l . The next result shows that we construct exactly the solution of the classical implicit functions theorem, which follows as a special case of our approach.
Remark 2. We underline that, although Theorem 2.5 provides the classical solution of the implicit functions theorem, the constructed parametrization may be more advantageous in applications, since it offers a more complete description of the corresponding manifold by removing the restriction to graphs of functions. For instance, a torus in R^3 cannot be described by one function, but a single parametrization can achieve this. One can use maximal solutions of (4)-(6) and, in many examples, the (local) maximal solution from Theorem 2.4 may give even a global description of the manifold, [20], [30]. In applications, the choice of other solutions of (3) is also possible and of interest [21], in order to improve the description of the manifold.
Remark 3. Beside the existence statement, Theorem 2.5 gives a construction recipe for the implicit functions solution and an evaluation of its existence neighborhood (via Theorem 2.3), in the system (1), [29]. This may be compared with [5], [24] where other types of arguments are used.
Consider finally, as an example, the special case of perturbations of the form (10), where h_j ∈ C^2(Ω), h_j(x_0) = 0. If, moreover, l = 1 and the equation F_1(x_1, . . . , x_d) = 0, F_1 ∈ C^2(Ω), together with the associated initial condition, represents the boundary of a subdomain of Ω (where F_1 < 0, for instance), then the geometric perturbation defined by (10) is called a functional variation, [19], and may be very complex, including topological and boundary perturbations of the initial domain [18], [12], [27]. Computing the equation in variations as in [30], the perturbations (10) generate a directional derivative in the implicit system (1). Consequently, by the above geometric interpretation, we may define, for l = 1, a new type of geometric directional derivative of domains. This is more general than the speed method or the topological derivatives [18] and has applications in shape optimization and fixed domain methods, see [19], [31], and in free boundary problems [13]. In the recent paper [33], this technique is exploited in combination with a penalization method.
3. Reduced gradients in nonlinear programming. In constrained optimization, projected gradient methods are a classical tool, but their application may be hindered by the difficulty of effectively computing projections onto the admissible set, Ciarlet [6]. Based on the results of the previous section, we use here the dimension reduction approach to eliminate, totally or partially, the constraints (and the corresponding Lagrange multipliers). The optimality conditions take a more effective form, due to the reduction of the dimension. Local and global-like algorithms and numerical examples are also discussed, under weak assumptions. The elimination of certain unknowns has advantages at the computational level.
In the recent papers [28], [17], dimensional reduction is obtained via new relaxation procedures associated to implicit functions. Our approach is certainly different and ensures good numerical results. In the case of polynomial and semialgebraic optimization, in the setting of global optimization, a stronger constraint qualification is used in [14], Thm. 6.5 and Thm. 7.5.
We consider first the classical minimization problem with equality constraints:
(P) Min{ g(x) : x satisfies (1) }.
By Theorem 2.5, it can be replaced (around x_0) by the unconstrained problem
(P_1) Min{ g(y^1_{d−l}, y^2_{d−l}, . . . , y^l_{d−l}, t_1 + x^0_{l+1}, t_2 + x^0_{l+2}, . . . , t_{d−l} + x^0_d) },
for (t_1, t_2, . . . , t_{d−l}) around the origin, where y^1_{d−l}, . . . , y^l_{d−l} are the first l components of y_{d−l}, the solution of (4)-(6) corresponding to this case. This methodology will be extended later to the general case of implicit parametrizations from Theorem 2.4.
By Theorem 2.5, Theorem 2.3 and the chain rule, one easily obtains the (known) first order optimality conditions in Fermat form, involving the tangential gradient of g (due to (3), the vectors v_j give a basis in the tangent space to the equality constraints manifold M):
Proposition 1. Assume that g and F_i, i = 1, l, are in C^1(R^d) and (2) holds. If x_0 is a local solution of (P), then we have:
(11) ∇g(x_0) · v_j(x_0) = 0, j = 1, d − l.
In fact, this is equivalent to the classical Lagrange multipliers rule since, due to (11), ∇g(x_0) is in the normal space to the manifold M, which has the basis given by ∇F_i(x_0), i = 1, l.
In this nonconvex setting, we assume that g ∈ C^1(Ω) is bounded from below and we introduce the following algorithm of projected gradient type, based on the tangential gradient given in (11):
Algorithm 1
1) choose n = 0, δ > 0 (a tolerance parameter) and let t^n = (t^n_1, . . . , t^n_{d−l}) be such that y_{d−l}(t^n_1, . . . , t^n_{d−l}) = x^n in (4)-(6);
2) compute ρ_{n+1} ∈ [0, α_n] via the line search
G(t^n − ρ_{n+1} ∇G(t^n)) = min{ G(t^n − ρ ∇G(t^n)) : ρ ∈ [0, α_n] };
3) set t^{n+1} = t^n − ρ_{n+1} ∇G(t^n), x^{n+1} = y_{d−l}(t^{n+1}), n := n + 1 and repeat, stopping when ‖∇G(t^n)‖ ≤ δ.
The constraints are as in (1), with hypothesis (2) satisfied at all the iterations x^n. Here G(t) = g(y_{d−l}(t)) denotes the reduced cost, defined in a neighborhood of the origin in R^{d−l} and of class C^1 due to (4)-(6) and Theorem 2.3. The sequence {g(x^n) = G(t^n)} is nonincreasing and convergent in this general setting, ensuring the convergence of the algorithm, and the sequence {x^n} is bounded. Moreover, ∇G(t^n) = [∇g(x^n) · v_j(x^n)]_{j=1,d−l} by Theorem 2.3, and Algorithm 1 is in fact a transcription of the classical gradient method for the unconstrained problem (P_1). One can discuss other (very rich) variants of such local algorithms and their convergence (to stationary points, in general), under supplementary hypotheses if necessary; see Bertsekas [2], Patriksson [23]. The new point in Algorithm 1 is that one can effectively and efficiently compute the "projection" y_{d−l} via (4)-(6), by using standard routines in MatLab.
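A minimal sketch of Algorithm 1 (in Python rather than MatLab, on the hypothetical toy problem of minimizing g(x, y) = y on the unit circle, whose solution is (0, −1); an Armijo-type backtracking replaces the exact line search of step 2):

```python
import numpy as np
from scipy.integrate import solve_ivp

# toy data: g(x, y) = y on the circle F(x, y) = x^2 + y^2 - 1 = 0
grad_F = lambda p: np.array([2.0*p[0], 2.0*p[1]])
g      = lambda p: p[1]
grad_g = lambda p: np.array([0.0, 1.0])
v1     = lambda p: np.array([-grad_F(p)[1], grad_F(p)[0]])   # tangent field

def project(x, t):
    """Step 3): follow y' = v1(y), y(0) = x up to "time" t; the iterate
    stays exactly on the constraint manifold F = 0."""
    sol = solve_ivp(lambda s, p: v1(p), (0.0, t), x, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

x = np.array([1.0, 0.0])                  # admissible starting point
for n in range(200):
    dG = grad_g(x) @ v1(x)                # tangential gradient of G(t) = g(y(t))
    if abs(dG) < 1e-8:
        break                             # stationarity test with tolerance delta
    rho = 0.1                             # Armijo-type backtracking line search
    while g(project(x, -rho * dG)) > g(x) - 1e-4 * rho * dG**2 and rho > 1e-12:
        rho /= 2.0
    x = project(x, -rho * dG)
```

All iterates are admissible by construction, since the "projection" follows the tangent ODE rather than a Euclidean projection.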
Remark 4. The algorithm works practically in V , where the system (4)-(6) is defined and the parameter α n in the line search has to be chosen "small", such that we remain in V and the system (4)-(6) can be solved around t n , in Step 2). In Step 3) we perform the "projection" on the constraints manifold M ⊂ Ω. The points x n generated by this algorithm are always admissible for (P ). No convexity properties are assumed. The definition of (P 1 ) uses the implicit function Theorem 2.5 which is appropriate for optimality conditions, while for the Algorithm 1 the general implicit parametrization method has to be taken into account. The same is valid for the subsequent problem (Q 1 ) and the related results.
We discuss now the general case of both equality and inequality constraints:
(Q) Min{ g(x) }, subject to (1) and to
(12) G_j(x) ≤ 0, j = 1, m,
where g, F_i, G_j are in C^1(R^d). The Mangasarian-Fromovitz condition consists, in this case, of (2) together with the existence of e ∈ R^d such that
(13) ∇F_i(x_0) · e = 0, i = 1, l; ∇G_j(x_0) · e < 0, j ∈ I(x_0),
with I(x_0) being the set of indices of the inequality constraints active at x_0. See [3], §2.3.4 or [7], §6 for excellent presentations. The necessary and sufficient metric regularity condition from [34] cannot be used here due to the lack of convexity. The reduced problem is again obtained via Theorem 2.5:
(Q_1) Min{ g(y^1_{d−l}, y^2_{d−l}, . . . , y^l_{d−l}, t_1 + x^0_{l+1}, t_2 + x^0_{l+2}, . . . , t_{d−l} + x^0_d) },
subject to the constraints (12), in the "reduced" form
G_j(y^1_{d−l}, . . . , y^l_{d−l}, t_1 + x^0_{l+1}, . . . , t_{d−l} + x^0_d) ≤ 0, j = 1, m.
Lemma 3.1. The minimization problem (Q_1) satisfies the Mangasarian-Fromovitz condition in the origin of R^{d−l}.
Proof. By the first part of (13), we see that e is orthogonal to ∇F_i(x_0), i = 1, l, and hence belongs to the tangent space to the manifold (1), since ∇F_i(x_0), i = 1, l, is a basis of the normal space to the manifold given by (1).
If x_0 is a local solution of (Q), then, by Lemma 3.1, one can apply the classical KKT theorem, [6], to the problem (Q_1) in the origin of R^{d−l}, which becomes a local solution of (Q_1). Using again the derivation formula, we get:
Theorem 3.2. Let x_0 be a local minimum for (Q). Then, there are β_j ≥ 0, j = 1, m, such that
[∇g(x_0) + Σ_{j=1}^{m} β_j ∇G_j(x_0)] · v_k(x_0) = 0, k = 1, d − l; β_j G_j(x_0) = 0, j = 1, m.
Remark 5. This is a simplified version of the KKT conditions, since it eliminates the Lagrange multipliers associated to the equality constraints.
It is possible, at least in principle, to eliminate the Lagrange multipliers completely: if x_0 is a local solution of problem (Q), then one can remove the inequality constraints that are inactive at x_0. This is a consequence of the remark that the inequality constraints that are not active at x_0 define a neighborhood of x_0. The minimum property of x_0 is preserved in this neighborhood, under the equality constraints alone, supplemented by the active constraints rewritten as equalities. Under the independence condition for all these constraints, one can write optimality conditions as in Proposition 1.
We relax now the hypotheses in the problem (Q) and describe a direct global-like minimization algorithm. It looks for the solution in a maximal neighborhood of x_0, corresponding to the maximal solutions of the subsystems in (4)-(6) (the maximal existence intervals may depend on the respective initial conditions). See Remark 2 and [20], [30].
We assume in the sequel less regularity: g and G_j, j = 1, m, are just in C(R^d), while F_i, i = 1, l, are in C^1(Ω) and satisfy condition (2) in x_0. This last condition can in fact be removed by working with generalized solutions, according to the subsequent Remark 7. Notice that x_0 is here just an admissible point for (Q) and not a local minimum as in Theorem 3.2. We can also add the abstract constraint x ∈ D, where D is a given closed subset of R^d with nonvoid interior, such that x_0 ∈ D.
The main observation is that in solving numerically (4) -(6), now using the variant corresponding to Theorem 2.4, we obtain automatically a discretization of the manifold M defined by (1), in a maximal neighborhood of x 0 , as explained above. Let us denote by n the discretization parameter. For instance, 1/n can characterize the size of the discretization for the parameters t 1 , . . . , t d−l , or n may be linked to the length of the intervals where the maximal solution is computed, or both, etc. As n → ∞, the union of these discretized points is dense in M .
We denote by C_n the set of all these discretized points that, moreover, satisfy all the constraints (the inequality and the other restrictions have just to be checked). They give the approximating admissible set and we formulate the algorithm:
Algorithm 2
1) choose n = 1, the discretization step 1/n, the solution intervals I^n_1, . . . , I^n_{d−l}, and the tolerance parameter δ;
2) compute the discrete set C_n of admissible points, starting from x_0, via (4)-(6) and by testing the validity of (12) and of the constraint x ∈ D;
3) find in C_n the approximating minimum of (Q), denoted by x^n;
4) test if the solution is satisfactory: |g(x^n) − g(x^{n−1})| ≤ δ;
5) if YES, then STOP; if NO, then set n := n + 1 and GO TO step 1).
In step 4), other tests (on the solutions, on the gradients, etc.) may be used. The approximating minimum x^n ∈ C_n may not be unique and Algorithm 2 finds all such points; one can adapt the convergence test to such situations. We also have a convergence property: the approximating minimal values g(x^n) converge, as n → ∞, to the infimum of (Q) on the part of the admissible set described by the parametrization.
Proof. This is a consequence of the continuity of g and of the density of ∪_{n∈N} C_n in the admissible set, according to Theorem 2.4.
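The two-step structure of Algorithm 2 can be sketched on a hypothetical low-dimensional instance (Python/scipy; the circle, the cost and the inequality constraint below are illustrative choices, not from the paper):

```python
import numpy as np
from scipy.integrate import solve_ivp

# hypothetical instance: minimize g = x + y on the circle
# F(x, y) = x^2 + y^2 - 1 = 0 under the inequality constraint x >= -0.5;
# the constrained minimizer is (-0.5, -sqrt(0.75))
v1 = lambda t, p: np.array([-2.0*p[1], 2.0*p[0]])   # tangent (Hamiltonian) field
g  = lambda p: p[0] + p[1]

# step 2): discretize the equality-constraint manifold via the ODE; the
# maximal solution here is periodic, so the whole circle is covered
ts = np.linspace(0.0, np.pi, 2001)
sol = solve_ivp(v1, (0.0, np.pi), np.array([1.0, 0.0]), t_eval=ts,
                rtol=1e-10, atol=1e-12)
pts = sol.y.T
# the inequality constraint is only *checked*; infeasible points are dropped
C_n = pts[pts[:, 0] >= -0.5]
# step 3): direct minimization on the discrete admissible set
x_best = C_n[np.argmin([g(p) for p in C_n])]
```

The inequality constraint requires no smoothness here; only pointwise evaluation on the discrete set is needed, which is the point made above.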

Remark 6. In general, the set defined by the equality constraints may have several connected components. See Example 2. Starting from x_0, Algorithm 2 will minimize just on the component that contains x_0. Initial guesses from all the admissible components are necessary if we want to minimize on all of them.

Remark 7. If condition (2) is not fulfilled, then one can use the generalized solution of (1), as mentioned in Section 2, since the Hausdorff-Pompeiu distance (involved in the definition of generalized solutions, [29]) ensures the convergence of these approximating points. This extension to the critical case will be investigated in a subsequent work.
We indicate now some illustrative numerical examples and compare our results with other approximation methods, from MatLab or [28].
Example 1. We consider first a minimization problem on the torus in R^3, with radii 2 and 1 respectively, defined implicitly:
min{ xyz },
F(x, y, z) = (x^2 + y^2 + z^2 + 3)^2 − 16(x^2 + y^2) = 0.
To generate the discrete admissible set, we use the system (7). Using other starting points, like (1, 0, 0) or (3, 0, 0), is not allowed by MatLab, which finds no other admissible solutions in these cases, while our approach works.
Example 2. We consider now two equality restrictions, representing a torus intersected with a paraboloid; see Fig. 1 and Fig. 2.
The numerical results and a comparison with the MatLab routine fmincon are indicated below. In the second case, fmincon stops after 42 iterations with the message that the constraints are not satisfied within the tolerance; in the first case, fmincon finds basically the same values. Consequently, the true solution (obtained in the second case) is not found by MatLab.
Remark 8. We indicate no examples involving inequality constraints and/or abstract constraints, since they require no supplementary mathematical developments: by step 2) of Alg. 2, such constraints have just to be checked on the set of discrete admissible points for the equality constraints, obtained via (4)-(6). Notice that in the above examples we obtain the global solutions.
Remark 9. In [28], an example in R^6, with three stiff equality constraints, is discussed. We have reworked this example; our algorithm needs no bounds on the independent variables and extends the search domain, which is an advantage from the point of view of global optimization. The necessary working time, on a medium performance laptop, is several minutes. More details on the experiment and some high dimensional numerical examples are indicated in [22].

4. Applications in control. We investigate first the following optimal control problem with equality mixed constraints:
(15) Min{ l(x(1)) },
subject to
(16) x'(t) = f(x(t), u(t)), t ∈ [0, 1],
(17) h(x(t), u(t)) = 0, t ∈ [0, 1].
Above, l : R^d → R, f : R^d × R^m → R^d and h are given mappings, x : [0, 1] → R^d is the state variable and u : [0, 1] → R^m is the control unknown. Initial conditions may be added to (16), but this is not necessary now. For simplicity, we assume here that, for any u ∈ C(0, 1; R^m), (16) has global solutions in C^1(0, 1; R^d), so that (15)-(17) make sense. Such global existence properties have to be checked in each application. It is possible to work with Carathéodory solutions and weaker regularity assumptions, but this would introduce more technicalities.
Systems of this type have been recently studied in a more general implicit form by Maria do Rosario de Pinho [25], [26], Clarke and de Pinho [8], from the point of view of the maximum principle and under weak differentiability assumptions. We indicate here a new algorithm. The formulation (15) -(17) is of Mayer type and the conditions (16), (17) give a DAE system. A short announcement of some of the ideas discussed in this section can be found in [32].
If inequality constraints are added to (15)-(17), the classical procedure is to introduce supplementary control variables in order to transform them into equality constraints. Our procedure is similar to that of the previous section, does not increase the dimension of the control space, and applies as well to abstract constraints and separated constraints.
As general assumptions, we shall require l continuous, f continuous and locally Lipschitzian in (x, u), and h of class C^1 with locally Lipschitzian gradient, such that there is a point (x_0, u_0) ∈ R^d × R^m with
(18) h(x_0, u_0) = 0 and ∇h(x_0, u_0) of maximal rank.
Notice that, in this setting, some of the constraints (17) may become just state constraints or control constraints, and the remaining ones may be of mixed type. Under hypothesis (18), one can use Thm. 2.4 to obtain a constructive parametric (maximal) description of the admissible manifold for (17), denoted by M ⊂ R^d × R^m. As explained in Rem. 4, this is the search region used below. It may be global if the solution of (4)-(6) is so. M is not the admissible set of pairs for the control problem (15)-(17); however, any admissible state-control trajectory (including its initial condition) should satisfy
(19) (x(t), u(t)) ∈ M, t ∈ [0, 1].
In the sequel, we make the supplementary hypothesis that the admissible controls satisfy u ∈ C^1(0, 1; R^m); consequently, (17), (19) make sense. This will be justified in the next proposition and the subsequent remark. In the standard terminology for DAE systems, the relations (16), (17) are semi-explicit of index one. Taking into account (16), (19) and differentiating (17), we get:
(20) ∇_x h(x(t), u(t)) f(x(t), u(t)) + ∇_u h(x(t), u(t)) u'(t) = 0, t ∈ [0, 1].
Under the regularity assumption on the admissible pairs, they provide global solutions in [0, 1] for the system (16), (20).
The important observation is that any point in M provides a consistent initial condition for the differential system (16), (20), which solves a key difficulty for the DAE equations.
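To illustrate how a consistent initial condition on M propagates the mixed constraint, the following minimal sketch solves the coupled system (16), (20) for hypothetical data f and h (not taken from the paper), in the index-one case where the control derivative can be solved for:

```python
import numpy as np
from scipy.integrate import solve_ivp

# hypothetical data: state equation x' = f(x, u) with f(x, u) = u, and
# mixed constraint h(x, u) = x + u - 1 = 0, so grad_x h = grad_u h = 1
# and the differentiated constraint reads u' = -(grad_u h)^{-1} grad_x h f = -u
def rhs(t, s):
    x, u = s
    return [u, -u]          # state equation coupled with the control ODE

# consistent initial condition: any point (x0, u0) on the manifold h = 0
x0, u0 = 0.5, 0.5
sol = solve_ivp(rhs, (0.0, 1.0), [x0, u0], dense_output=True,
                rtol=1e-10, atol=1e-12)
# the mixed constraint is propagated along the whole trajectory
assert all(abs(sum(sol.sol(t)) - 1.0) < 1e-8 for t in np.linspace(0.0, 1.0, 11))
```

Starting instead from a point with h(x_0, u_0) ≠ 0 would propagate that constraint violation, which is the classical consistent-initialization difficulty resolved here by choosing initial conditions on M.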
Remark 10. In fact, we have shown that the differential system (16), (20), under the indicated assumptions, generates admissible pairs in C^1((0, 1); R^d × R^m), starting from initial conditions in M. It is then meaningful to minimize over this class of admissible state-control pairs and to obtain at least suboptimal pairs. Moreover, existing regularity results for optimal pairs, Fleming and Rishel [10], Lee and Markus [15], J.L. Lions [16], allow us, in many cases, to restrict the search for admissible controls by such regularity conditions, and the problem governed by (16), (20) becomes equivalent to the original minimization problem.
We notice that the set of discretization points in M generated by (4)-(6), applied to (17) and denoted by ∪_{n∈N} M_n, is dense in M as the discretization of I_1 × I_2 × . . . × I_{d−l} becomes finer and finer.

5. Conclusion. Based on the implicit parametrization method developed in recent years by the author, we analyze constrained nonlinear programming and optimal control problems. We discuss theoretical questions, like optimality conditions involving fewer or no multipliers, for general optimization problems in finite dimensional spaces. We also study new algorithms in optimization and control, under weak assumptions on the data. Some academic numerical examples are indicated, and comparisons with other approaches, from MatLab or from the recent literature, are performed.