A superlinearly convergent hybrid algorithm for solving nonlinear programming

In this paper, a superlinearly convergent hybrid algorithm is proposed for solving nonlinear programming problems. First, an improved direction is obtained by a convex combination of the solution of an always feasible quadratic programming (QP) subproblem and a mere feasible direction, which is generated by a reduced system of linear equations (SLE). The coefficient matrix of the SLE consists only of the constraints and their gradients corresponding to the working set. Then, a second-order correction direction is obtained by solving another reduced SLE to overcome the Maratos effect and achieve superlinear convergence. In particular, the convergence rate is proved to be locally one-step superlinear under a condition weaker than the strong second-order sufficient condition and without strict complementarity. Moreover, the line search in our algorithm effectively combines the initialization process with the optimization process, and the line search conditions are weaker than those in previous work. Finally, some numerical results are reported.

1. Introduction. In this paper, we consider the constrained optimization problem with inequality constraints

(NIO)    min f_0(x)  s.t.  f_i(x) <= 0, i ∈ I := {1, 2, ..., m}.

Sequential quadratic programming (SQP) algorithms are widely acknowledged to be among the most successful algorithms available for solving problem (NIO), and a great deal of effort has been devoted to developing efficient SQP algorithms [5,8,17] during the past several decades. Given an estimate x of the solution of problem (NIO) and a symmetric matrix H that approximates the Hessian of the Lagrangian function L(x, λ) := f_0(x) + Σ_{i∈I} λ_i f_i(x), where λ is a vector of nonnegative Lagrange multiplier estimates, the standard SQP search direction is obtained by solving the QP subproblem

(QP1)    min ∇f_0(x)^T d + (1/2) d^T H d  s.t.  f_i(x) + ∇f_i(x)^T d <= 0, i ∈ I.

Under suitable conditions, SQP algorithms are known to be globally and locally superlinearly convergent.
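As a concrete illustration (not taken from the paper), the SQP step can be sketched in NumPy for a toy instance in which the active constraint is known in advance and treated as an equality, so the QP subproblem reduces to a KKT linear system. All problem data below are made up for illustration; H is only a rough approximation of the Lagrangian Hessian.

```python
import numpy as np

# Toy instance: minimize f0(x) = x1^2 + x2^2, constraint f1(x) = 1 - x1 - x2 <= 0.
# At x = (1, 0) the constraint is active; treating it as an equality,
# the SQP step solves the KKT system
#   [ H    g1 ] [ d      ]   [ -g0    ]
#   [ g1^T  0 ] [ lambda ] = [ -f1(x) ]
x = np.array([1.0, 0.0])
H = np.eye(2)                      # rough approximation of the Lagrangian Hessian
g0 = 2.0 * x                       # gradient of f0 at x
f1 = 1.0 - x[0] - x[1]             # constraint value at x (= 0, active)
g1 = np.array([-1.0, -1.0])        # gradient of f1

K = np.block([[H, g1[:, None]], [g1[None, :], np.zeros((1, 1))]])
rhs = np.concatenate([-g0, [-f1]])
sol = np.linalg.solve(K, rhs)
d, lam = sol[:2], sol[2]           # search direction and multiplier estimate
```

Here the computed d = (-1, 1) is a descent direction (g0·d < 0) and the multiplier estimate is nonnegative, consistent with the KKT conditions of the subproblem.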
Note that a limitation in the application of SQP methods to problem (NIO) is the possible inconsistency of the constraints in subproblem (QP1). To address this difficulty, many techniques have been proposed [1,22]. Jian [12] presented a new regularization technique for dealing with the inconsistent QP subproblem in SQP methods; the resulting modified QP subproblem is denoted by (QP2) [11].
In [13], Jian et al. presented a new SQP algorithm for solving problem (NIO) by using subproblem (QP2). The correction directions are obtained by explicit formulas in which the generalized projection technique is applied. With the help of a curve search technique and suitable conditions, the algorithm is proven to be globally and superlinearly convergent. Recently, Jian et al. proposed a new algorithm, namely the method of quasi-strongly sub-feasible directions [14], for solving problem (NIO). This new algorithm ensures that the feasibility of the constraints is nondecreasing, i.e., I^-(x^k) ⊆ I^-(x^{k+1}), and also ensures that the iteration points enter the feasible set Θ after a finite number of iterations. From the above relationship for I^-(x), it is obvious that |I^-(x^k)| <= |I^-(x^{k+1})|, where |I^-(x)| denotes the number of elements in the set I^-(x). Moreover, the numerical results in [14] show that the computational performance of this algorithm is promising.
In this paper, motivated by the algorithmic ideas in [13] and the method of quasi-strongly sub-feasible directions in [14], a new algorithm is proposed for solving problem (NIO). The main search direction is obtained by a convex combination of d_0 and a mere feasible direction d_2, which are generated by solving a QP subproblem and an SLE, respectively. In order to overcome the Maratos effect and obtain superlinear convergence, a second-order correction direction is obtained by solving another SLE. To reduce the possible computational cost, a working set is introduced; moreover, the coefficient matrices of the SLEs consist only of the constraints and their gradients corresponding to the working set. Finally, the convergence rate is proved to be locally one-step superlinear under a condition weaker than the strong second-order sufficient condition and without strict complementarity. Some elementary numerical results are reported to show that the proposed algorithm is promising.
The paper is organized into six sections. In Section 2 we present our algorithm and its properties. In Sections 3 and 4 we show that the algorithm possesses global and superlinear convergence, respectively. In Section 5 we present the results of some elementary numerical experiments. Finally, Section 6 gives some conclusions.
For notational simplicity, for a point x ∈ R^n and a set J ⊆ I, we denote the corresponding subvectors and submatrices accordingly.

2. Description of algorithm. In this section, based on a multiplier function defined below, we first introduce the working set, and then present a new algorithm with an arbitrary initial point for solving problem (NIO). First, the following basic assumption on problem (NIO) is assumed to hold throughout the paper.
In order to define the working set, we make use of the function ϕ given in (1), which is called the multiplier function when λ is the corresponding multiplier vector, and where the function φ : R^{n+m} → R^{n+m} [4] is defined by (2). It is easy to verify that (x, λ) is a KKT pair of problem (NIO) if and only if ϕ(x, λ) = 0. Moreover, similarly to the proof of Theorem 3.7 in [4], ϕ(x, λ) is an accurate identification function for I(x), where (x, λ) is a KKT pair of problem (NIO). The multiplier vector λ^k associated with the iteration point x^k is defined by λ^0 := z^0 and λ^k := λ^{k-1} for k > 0, where z^0 and λ^{k-1} are as in Algorithm 1 below. Then, the working set is defined by use of (1) and (2) as in (3). Jian [11] showed that I_k ≡ I(x*) when the iteration point (x^k, λ^k) is sufficiently close to the KKT pair (x*, λ*) of problem (NIO), under suitable conditions and provided the Mangasarian-Fromovitz constraint qualification holds at (x*, λ*). In Lemma 4.4(i), we will show that the set I_k given by (3) also exactly identifies I(x*) when the iteration point is sufficiently close to x*. Now, we present our new algorithm for solving problem (NIO) as follows.
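To illustrate the general idea of a threshold-based working set (the paper's precise formulas (1)-(3) are not reproduced here), the following sketch uses a hypothetical rule: constraints are included when their values exceed a negative threshold driven by a KKT-residual quantity. The function `working_set`, the exponent `gamma`, and all data are illustrative stand-ins, not the paper's definitions.

```python
import numpy as np

# Illustrative stand-in for a working-set rule of the form
#   I_k = { i : f_i(x) >= -rho(x, lambda)**gamma },  gamma in (0, 1),
# where rho is an identification-function value (small near a KKT pair).
def working_set(f_vals, kkt_residual, gamma=0.5):
    rho = kkt_residual
    return [i for i, fi in enumerate(f_vals) if fi >= -rho**gamma]

# Toy data: constraints 0 and 2 are nearly active, constraint 1 is clearly inactive.
f_vals = np.array([-1e-8, -0.5, -2e-9])
kkt_residual = 1e-6          # small residual, i.e., (x, lambda) near a KKT pair
I_k = working_set(f_vals, kkt_residual)
```

With these numbers the threshold is -1e-3, so the rule recovers exactly the nearly active indices {0, 2} and discards the locally irrelevant constraint, which is precisely the purpose of the working set in the algorithm.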
Algorithm 1.
Data: x^0 ∈ R^n, a symmetric positive definite matrix H_0 ∈ R^{n×n}, and z^0 with z^0_i ∈ (z_min, z_max], i ∈ I. Set k := 0.
Step 1. Compute ϕ(x^k, λ^k) from (1) and (2), and then generate the working set I_k by (3).
Step 2. Solve the following QP subproblem (4) to get the unique solution d_0^k and the corresponding KKT multiplier vector λ_{I_k}, and let λ^k = (λ_{I_k}, 0_{I\I_k}), where 0 denotes a zero vector of appropriate dimension throughout the following sections. If (d_0^k, Υ(x^k)) = (0, 0), then x^k is a KKT point of problem (NIO) and stop; otherwise, go to Step 3.
Step 3. Solve the following SLE (5) to obtain its solution, and then define d^k accordingly.
Step 4. (a) If the line search inequalities (7) are satisfied for a trial steplength t, where ε_k > 0 if Υ(x^k) = 0 and ε_k >= 0 if Υ(x^k) > 0, then let t_k = t and go to Step 7; otherwise, go to (b).
Step 5. Solve SLE (8) to get a solution (d_2^k, h_2^k), and compute β_k ∈ (0, 1] by the following formula.
Step 6. Compute the steplength t_k as the first number t in the sequence {1, η, η^2, ...} that satisfies the following inequalities (11), and let d^k = d̄^k.
Step 7. Compute a new symmetric positive definite matrix H_{k+1} by some suitable technique, set x^{k+1} := x^k + t_k d^k, k := k + 1, and go to Step 1.
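The line searches in Steps 4 and 6 follow the classical backtracking pattern over {1, η, η², ...}. A simplified sketch, with an Armijo-type decrease test plus a feasibility test standing in for the paper's inequalities (7)/(11) (which additionally involve Υ(x) and ε_k); all problem data are made up for illustration.

```python
import numpy as np

# Generic backtracking pattern of Steps 4/6: try t in {1, eta, eta^2, ...}
# and accept the first t satisfying an Armijo-type decrease condition and
# feasibility of the constraints (a simplified stand-in for (7)/(11)).
def line_search(f0, f_ineq, x, d, grad, alpha=0.25, eta=0.5, max_backtracks=30):
    fx = f0(x)
    t = 1.0
    for _ in range(max_backtracks):
        x_new = x + t * d
        armijo = f0(x_new) <= fx + alpha * t * (grad @ d)
        feasible = all(fi(x_new) <= 0 for fi in f_ineq)
        if armijo and feasible:
            return t
        t *= eta
    return t

f0 = lambda x: x[0]**2 + x[1]**2
f1 = lambda x: 1.0 - x[0] - x[1]     # feasible set: x1 + x2 >= 1
x = np.array([1.0, 0.0])             # feasible starting point
d = np.array([-0.5, 0.5])            # direction toward the minimizer (0.5, 0.5)
grad = 2.0 * x                       # gradient of f0 at x
t = line_search(f0, [f1], x, d, grad)
```

On this toy problem the full step t = 1 is accepted immediately, which mirrors the behavior the paper later proves for its own tests: the unit step is eventually accepted, avoiding the Maratos effect.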
Remark 1. (i) Let d_0^k be an optimal solution of subproblem (4). Combining this with the fact that subproblem (4) always has the feasible solution d = 0, it holds that ∇f_0(x^k)^T d_0^k < 0 whenever d_0^k ≠ 0 and H_k is positive definite. Thus, d_0^k is a descent direction for problem (NIO).
(ii) Based on the structure of Algorithm 1, it is obvious that our algorithm is very different from the one in [13]. First, we make use of the working set I_k to remove those constraints that are locally irrelevant; thus the size of subproblem (4) and of SLEs (5) and (8) is greatly reduced, and accordingly the cost of solving the subproblem and the SLEs can be reduced to some extent at each iteration. Second, instead of using an explicit formula, a high-order direction is generated by solving a system of linear equations which also includes only the estimated active constraints and their gradients, and this is generally more efficient than computing an inverse matrix. Finally, instead of an arc search, the line search used in our new algorithm can also combine the initialization and optimization processes well, and meanwhile the line search conditions of Algorithm 1 are weaker than those in [13].
(iii) In each iteration of Algorithm 1, if condition (a) in Step 4 is satisfied for a suitable steplength t, then we do not need to perform Steps 5 and 6, and can go to Step 7 directly. Thus, SLE (8), β_k and the line search inequalities (11) need not be computed, and the computational cost per iteration can be greatly reduced. Moreover, we can also prove that condition (a) in Step 4 always holds for t = 1 after a finite number of iterations (see Theorem 3), and the proposed algorithm is shown to be superlinearly convergent under this condition (see Theorem 5).
(iv) The technique of convex combination of the form (10) was first introduced by Mayne and Polak [18] as a "bending" technique to ensure the feasibility of the iteration points. Subsequently, Bonnans et al. proposed a new feasible SQP algorithm for solving problem (NIO) by modifying this convex combination technique [2]. Recently, Jian et al. presented a strongly sub-feasible SQP algorithm for solving problem (NIO) [10] with the help of this convex combination formula.
(v) In Algorithm 1, we set ε_k > 0 if Υ(x^k) = 0, and ε_k >= 0 if Υ(x^k) > 0. None of the analysis is affected if ε_k in the first inequality of (7) and (11) is always set to a positive constant. The goal is to allow ε_k = 0 in the case Υ(x^k) > 0 so as to reduce (if possible) the computational cost of the line search and obtain satisfactory numerical performance, since in this case the first inequality of (7) or (11) plays no role.
The following lemma shows some important properties of the proposed Algorithm 1.

Lemma 2.1. Suppose that Assumption 1 holds and the matrix H_k is positive definite. Then, in particular, the coefficient matrix of SLE (5) is nonsingular; therefore, each of SLEs (5) and (8) has a unique solution.
Proof. (i) In view of the KKT conditions of subproblem (4) and (d_0^k, Υ(x^k)) = (0, 0), the conclusion follows immediately.
(ii) The proof of this part is similar to that of Lemma 2.4 in [5], and is omitted here.
(iii) First, the claim follows from (9), (10) and (12). Then, combining the constraints of SLE (5) with (8), it follows from (10) that, for all i ∈ I(x^k), the stated inequalities hold.

From Lemma 2.1(iii), we know that d̄^k is an improved direction in a certain sense. Therefore, if Algorithm 1 does not stop at Step 2, i.e., (d_0^k, Υ(x^k)) ≠ (0, 0), we can always obtain the next iteration point x^{k+1} from x^k. According to the structure of Algorithm 1 and Lemma 2.1, the following lemma holds.
Lemma 2.2. Suppose that Assumption 1 holds. Then,

3. Global convergence. In this section, we establish the global convergence of Algorithm 1. If Algorithm 1 stops at x^k, then x^k is a KKT point of problem (NIO) by Lemma 2.1(i) and Step 2 of Algorithm 1. Now, we assume that Algorithm 1 generates an infinite iteration sequence {x^k}, and prove that any accumulation point x* of {x^k} is a KKT point of problem (NIO) under suitable assumptions. For this purpose, the following assumption is necessary.
Assumption 2. The matrix sequence {H_k} is uniformly positive definite, i.e., there exist two positive constants a <= b such that a||d||^2 <= d^T H_k d <= b||d||^2 for all d ∈ R^n and all k.

Denote the active set of subproblem (4) by Λ_k := {i ∈ I_k : f̄_i(x^k) + g_i(x^k)^T d_0^k = 0}, and suppose that x* is a given accumulation point of {x^k}. Then, since I^+(x^k), I^-(x^k), Λ_k and I_k are all subsets of the finite set I, we can assume by Lemma 2.2(ii) that there exists an infinite index set K such that (14) holds. The results established in the following lemma are important in the later discussion.
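For a single symmetric matrix H, the two-sided bound in Assumption 2 is exactly the Rayleigh-quotient inequality with a = λ_min(H) and b = λ_max(H). A quick numerical check on illustrative data:

```python
import numpy as np

# Assumption 2 asks for 0 < a <= b with a||d||^2 <= d'H d <= b||d||^2.
# For one symmetric H this holds with a = smallest, b = largest eigenvalue.
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])
eigs = np.linalg.eigvalsh(H)       # sorted ascending
a, b = eigs[0], eigs[-1]

rng = np.random.default_rng(0)
d = rng.standard_normal(2)
quad = d @ H @ d
ok = bool(a * (d @ d) - 1e-12 <= quad <= b * (d @ d) + 1e-12)
```

Uniform positive definiteness over the whole sequence {H_k} then amounts to requiring the same constants a, b for every k, which quasi-Newton updates with safeguards (such as the damped BFGS update used later) are designed to preserve.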
Lemma 3.1. Suppose that Assumptions 1 and 2 hold. Then:

Proof. (i) First, by lim_{k∈K} x^k = x* and the continuity of g_0(x), there exists a constant ζ > 0 such that ||g_0(x^k)|| <= ζ for all k ∈ K. Then, from (13) and the fact that d = 0 is a feasible solution of subproblem (4), it follows that {d_0^k}_K is bounded.
(ii) Suppose by contradiction that there exists an infinite index set K′ ⊆ K such that (14) and (15) hold, where M_* denotes the corresponding limit matrix. From the definitions of I(x*) and Ī, it follows that I(x*) ⊆ Ī. Then it is easy to see that D*_i >= 0 for all i ∈ Ī and D*_i > 0 for all i ∈ Ī\I(x*). Therefore, in view of Lemma 2.1(ii), M_* is nonsingular, so ||M_k^{-1}|| → ||M_*^{-1}||, k ∈ K′, which contradicts (15); thus conclusion (ii) holds.
(iv) The proof of part (iv) is similar to that of Lemma 3.2 in [13] and is omitted here.

Based on Lemma 3.1, the global convergence theorem of Algorithm 1 (Theorem 3.2) is presented as follows.

Proof. Choose an infinite index set K such that (14) holds, and denote the matrix A_k = (g_i(x^k), i ∈ Λ). In view of (x^k, d_0^k, Υ(x^k)) → (x*, 0, 0), k ∈ K, we conclude that Λ ⊆ I(x*) = {i ∈ I : f̄_i(x*) = 0} ⊆ Ī, which together with Assumption 1 shows that A_k^T A_k is nonsingular (see Remark 2) for k ∈ K large enough. From the KKT conditions of subproblem (4), we obtain (16), which further implies the stated limit. Let the multiplier vector λ* = (λ*_Λ, 0_{I\Λ}); then lim_{k∈K} λ^k = λ*. Passing to the limit k ∈ K, k → ∞, in (16), it follows that (x*, λ*) is a KKT pair of problem (NIO).

Remark 2. From the definitions of N_k and D_k in (6), the matrix (N_k^T N_k + D_k) is nonsingular and positive definite under Assumption 1.

4. Rate of convergence. In order to show that the entire sequence {x^k} converges to a KKT point x* of problem (NIO), the following additional assumption is necessary.
Remark 3. The definition of a quasi-regular point comes from Definition 4.3 in [23], which includes a detailed analysis of it. In general, the local superlinear convergence of SQP or SSLE (sequential system of linear equations) algorithms is established under the assumptions that both strict complementarity and the second-order sufficient condition (SOSC for short) hold, in which case SOSC is equivalent to the strong second-order sufficient condition (SSOSC for short). In this paper, we do not need to assume that the strict complementarity condition holds; moreover, in terms of the definition of a quasi-regular point, we obtain the following relationship among Assumption 3(ii), SOSC and SSOSC. The main purpose of Assumption 3(ii) is to ensure that (x*, λ*) is an isolated KKT point of problem (NIO). In order to show that the entire sequence {x^k} converges to x*, the following important result [19] is cited here.

Lemma 4.1. Assume that ω* ∈ R^n is an isolated accumulation point of the sequence {ω^k} ⊆ R^n, and that for every subsequence {ω^k}_K converging to ω*, there exists an infinite subset K′ ⊆ K such that {||ω^{k+1} − ω^k||}_{K′} → 0. Then the entire sequence {ω^k} converges to ω*.

Proof. (i) According to Theorem 3.2 and the uniqueness of the KKT multiplier, it is easy to get (x^k, λ^k) → (x*, λ*), k ∈ K. From Assumption 3(ii) and Theorem 4.4 in [23], ϕ(x, λ) is an active-set identification function. So it holds that I_k = I(x*) for all k ∈ K large enough.
(ii) In view of Lemma 3.1(ii) and (iv), part (ii) holds immediately. Together with the lemmas above, the strong convergence of Algorithm 1 is presented as follows.
Theorem 4.3. Suppose that Assumptions 1, 2 and 3 hold. Then the entire sequence {x^k} generated by Algorithm 1 converges to x*, i.e., Algorithm 1 is strongly convergent.
Proof. From Assumption 3(ii) and Theorem 4.4 in [23], it follows that x* is an isolated accumulation point of {x^k}. Combining this with Lemma 4.2(ii), we obtain ||x^{k+1} − x^k|| → 0, k ∈ K, which further implies that the entire sequence {x^k} converges to x* in view of Lemma 4.1.

Proof. In view of (5), (6), τ ∈ (2, 3), Lemma 3.1(ii) and Lemma 4.4, we have the stated estimate by Taylor expansion, which implies that Lemma 4.5 holds immediately.
To ensure that the steplength t_k ≡ 1 for k large enough without the strict complementarity assumption, the following additional assumption is necessary.
Assumption 4. Suppose that the KKT pair (x*, λ*) and the matrix H_k satisfy the following condition (20).

Proof. First, we prove that the second and the last inequalities of (7) hold for t = 1 and k large enough.

CHUANHAO GUO, ERFANG SHAN AND WENLI YAN
For i ∈ I(x*), since lim_{k→+∞} f_i(x^k) = f_i(x*) = 0 and lim_{k→+∞} Υ(x^k) = 0, the claim follows from (10). On the one hand, from (5) and Lemma 4.5, we have (17). On the other hand, it holds from Lemma 4.5 and (17) that (18) and (19) hold. In view of (18) and τ ∈ (2, 3), it follows that f_i(x^k + d^k) <= 0 for i ∈ I(x*) ∩ I^- and k large enough, i.e., the third and the second inequalities of (7) hold for i ∈ I(x*) ∩ I^-.
Summarizing the analysis above, the second and the last inequalities of (7) are satisfied for t = 1 and k large enough.
Then we show that the first inequality of (7) holds for t = 1 and k large enough. By Taylor expansion and Lemma 4.5, we have (21). Again, from the KKT conditions of subproblem (4) and Lemma 4.5, it follows that (22) and (23) hold. For i ∈ Λ_k ⊆ I(x*), it holds that f̄_i(x^k) + ∇f_i(x^k)^T d_0^k = 0, which, together with (18) and (19) as well as Υ(x^k) = o(Υ(x^k)^σ), implies (24). Again from Lemma 4.5 and Υ(x^k) = o(Υ(x^k)^σ), we have (25). Combining (24) with (25), it follows that (26) holds. Thus, substituting (26) into (23), we obtain (27). Now, substituting (27) and (28) into (21), we have an estimate which, by Assumptions 2 and 4, λ_i^k f_i(x^k) <= 0, α ∈ (0, 1/2) and θ < σ, shows that the first inequality of (7) holds for t = 1 and k large enough. The whole proof is completed.

At the end of this section, based on Theorem 4.6, we can easily conclude that the steplength t_k ≡ 1 and x^{k+1} ∈ Θ for k large enough, respectively. Then, according to Theorem 4.7 and d_1^k = o(||d_0^k||), and similarly to the proof of Theorem 4.5 in [13], we can prove the rate of convergence of Algorithm 1 as follows.

5. Numerical experiments.
In this section, some preliminary numerical results are reported, and the computational results show that Algorithm 1 (ALGO 1 for short) is promising and effective. The algorithm is implemented in MATLAB R2008a on a Windows XP platform, on a PC with a 2.53 GHz CPU. For each test problem, we choose H_0 = E_n (i.e., the identity matrix) as the initial guess of the Lagrangian Hessian, and the approximate Hessian matrix H_k is updated by the BFGS formula described in [20].
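As one common way to keep H_{k+1} positive definite even when the curvature condition s'y > 0 fails, the following sketch implements Powell's damped BFGS update; whether [20] uses exactly this damping variant is not asserted here, and the data are illustrative.

```python
import numpy as np

# Powell-damped BFGS update: replaces y by r = theta*y + (1-theta)*H s,
# with theta chosen so that s'r >= 0.2 * s'H s, which preserves positive
# definiteness of the update even under negative curvature (s'y <= 0).
def damped_bfgs(H, s, y):
    sHs = s @ H @ s
    sy = s @ y
    theta = 1.0 if sy >= 0.2 * sHs else 0.8 * sHs / (sHs - sy)
    r = theta * y + (1.0 - theta) * (H @ s)     # damped gradient difference
    Hs = H @ s
    return H - np.outer(Hs, Hs) / sHs + np.outer(r, r) / (s @ r)

H = np.eye(2)
s = np.array([1.0, 0.0])                        # step x^{k+1} - x^k
y = np.array([-0.1, 0.0])                       # negative curvature: s'y < 0
H_new = damped_bfgs(H, s, y)
eigs = np.linalg.eigvalsh(H_new)                # both eigenvalues stay positive
```

Even though the raw pair (s, y) here violates the curvature condition, the damped update returns a positive definite H_new, consistent with the uniform positive definiteness demanded by Assumption 2.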
During the numerical experiments, the parameters are selected as follows. The algorithm stops if and only if one of the following termination criteria is satisfied: (ii) ||d_0^k|| <= 1.0e-6 and Υ(x) = 0, where ε is a given small constant. The corresponding results are reported by using the performance profiles described in Dolan and Moré's paper [3]. Additionally, according to Remark 1(v), the value of the parameter ε_k may influence the computational cost of ALGO 1, especially for problems with complicated objective functions. To illustrate this conclusion more intuitively, we report and compare the numerical results of ALGO 1 in two cases (Case I and Case II, defined by (29)).

First, in order to show the computational results of ALGO 1 in the above two cases, some problems from [9,21], listed in Table 1, are tested. The corresponding parameters for the test problems are shown in Table 1, where "Prob" denotes the number of the test problem in [9,21], and n/m denotes the number of variables and the number of inequality constraints of the problem, respectively. According to the value of Υ(x^0), all initial points are infeasible for the corresponding problems. The corresponding numerical results are given in Figure 1. The profiles are based on the number of objective function evaluations; ρ(t) is the (cumulative) distribution function of the performance ratio, i.e., ρ(t) is the fraction of test problems on which a solver's performance ratio is within a factor t ∈ R of the best. The comparative performance results are shown in Figure 1. In view of the results in Figure 1, it is obvious that the performance of ALGO 1 under Case II is always better than that under Case I, i.e., the number of objective function evaluations under Case II is always less than that under Case I.
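The Dolan-Moré profile ρ(t) used above can be computed as follows; the cost matrix contains made-up numbers (the first row echoes the 108-vs-17 evaluation counts reported for problem 043), not the paper's full data set.

```python
import numpy as np

# Dolan-More performance profile: for solver s and problem p with cost
# t_{p,s} (here: number of objective evaluations), the performance ratio is
#   r_{p,s} = t_{p,s} / min_s t_{p,s},
# and rho_s(t) = |{p : r_{p,s} <= t}| / (number of problems).
def perf_profile(costs, t):
    """costs: (n_problems, n_solvers) array; returns rho_s(t) for each solver."""
    ratios = costs / costs.min(axis=1, keepdims=True)
    return (ratios <= t).mean(axis=0)

costs = np.array([[108.0, 17.0],     # e.g. problem 043: Case I vs Case II
                  [40.0, 20.0],
                  [30.0, 30.0]])
rho_at_1 = perf_profile(costs, 1.0)  # fraction of problems each case wins/ties
```

Here rho_at_1 = [1/3, 1]: Case II is best (or tied) on every problem, while Case I is best only on the tie, which is exactly the "most wins at t = 1" reading used when comparing solvers in the figures.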
For instance, the number of objective function evaluations for problem 043 under Case I is 108, but under Case II it is only 17, and so on. Moreover, we further note that the optimal solutions of each test problem are the same under the two cases during the numerical experiments. The parameter ε_k only influences the number of objective function evaluations and thereby the CPU time, which affects the total computational cost of ALGO 1. Therefore, ALGO 1 has some potential value and advantage for solving problems with more complicated objective functions.

Figure 1. Performance profiles of ALGO 1 for the test problems in Table 1 under Case I and Case II, respectively.
Note that the test problems in Table 1 are relatively small. In order to show more clearly the effectiveness of ALGO 1 for solving some large-scale problems under Case II, the Svanberg problem [7] (in different dimensions and with different infeasible initial points) and a series of problems with adjustable dimension, modified from problem S394 in [21], are further tested. The parameters of the test problems are given in Table 2. The profiles are again based on the number of objective function evaluations, and the performance results are shown in Figure 2. In view of the results in Figure 2, it is obvious that the number of objective function evaluations under Case II is much less than that under Case I. Moreover, the number of iterations and the optimal solutions for each test problem are also the same under the two cases. Summarizing the results in Figures 1 and 2, we conclude that ALGO 1 with the parameter ε_k selected as in Case II is encouraging, and also has great potential value and advantage for solving practical problems with complicated objective functions.

Figure 2. Performance profiles of ALGO 1 for the test problems in Table 2 under Case I and Case II, respectively.
Then, to further show the performance of our algorithm, we compare the number of iterations required by ALGO 1 under Case II with those required by Wang-Chen-He's algorithm [23] (ALGO-WCH for short) and the algorithm in [10] (denoted by ALGO 2.1); the corresponding results are shown in Figure 3. The optimality tolerances are the same as in [23] and [10], respectively. The left part of Figure 3 gives the performance of ALGO 1 under Case II and ALGO-WCH; the test problems are the same as in [23]. For some large problems in Table 4 of [10], the corresponding performance of ALGO 1 under Case II compared with ALGO 2.1 is shown in the right part of Figure 3. From the results in Figure 3, it is quite clear that the performance of ALGO 1 under Case II is better than that of ALGO-WCH and ALGO 2.1, respectively, i.e., ALGO 1 under Case II has the most wins compared with ALGO-WCH and ALGO 2.1. This further implies that our new Algorithm 1 is promising and competitive with ALGO-WCH and ALGO 2.1.

Summarizing the above analysis of the numerical results, we can conclude that our new Algorithm 1 is promising and competitive with some existing algorithms. Furthermore, by using the selection rule (29) for the parameter ε_k, we can reduce a certain amount of computation in Algorithm 1, especially when solving problems whose objective functions are complicated and whose initial points are far from the feasible region. This also implies that Algorithm 1 has some potential advantages and value.

6. Conclusions. A one-step superlinearly convergent hybrid algorithm is proposed for solving nonlinear programming problems. First, an improved direction is obtained by a convex combination of the solution of an always feasible QP subproblem and a mere feasible direction.
Second, a second-order correction direction is obtained by solving a reduced SLE to overcome the Maratos effect and achieve superlinear convergence; the coefficient matrix of the SLE consists only of the constraints and their gradients corresponding to the new working set. In particular, the convergence rate is proved to be locally one-step superlinear under a condition weaker than the strong second-order sufficient condition and without strict complementarity. Moreover, the line search in our algorithm effectively combines the initialization process with the optimization process. Finally, some elementary numerical results and comparisons show that the proposed algorithm is promising and competitive.