A WEDGE TRUST REGION METHOD WITH SELF-CORRECTING GEOMETRY FOR DERIVATIVE-FREE OPTIMIZATION

Abstract. Recently, several methods for solving optimization problems without derivatives have been proposed. The core of these methods is to build a suitable model function whose minimization yields a new iterate. An important strategy for obtaining a good model is the geometry-improving iteration, which requires substantial computation. Marazzi and Nocedal (2002) proposed a wedge trust region method for derivative-free optimization. In this paper, we propose a new self-correcting geometry procedure with lower computational cost and combine it with the wedge trust region method. The global convergence of the new algorithm is established. Limited numerical experiments show that the new algorithm is efficient and competitive.

Key words: derivative-free optimization, wedge trust region method, self-correcting geometry.

1. Introduction. In this paper, we discuss the unconstrained optimization problem

min_{x ∈ R^n} f(x),

where f is a smooth function of several variables whose derivatives are unavailable or unreliable. This class of problems arises widely in practice and is called derivative-free optimization (DFO). DFO has a history of more than half a century and has developed rapidly in recent years due to a growing number of applications that range from scientific problems [9] and medical problems [18] to engineering design and facility location problems [7]. The common methods for optimization involve derivatives; for example, the conjugate gradient methods need first-order derivatives, and the Newton methods use second-order derivative information. Moreover, the necessary and sufficient optimality conditions also involve derivatives. Please refer to Nocedal and Wright [17] (1999), Sun and Yuan [33] (2006), Dai et al. [5] (2006), and Powell [27] (2010).
If we cannot obtain the derivatives, how do we proceed? The first idea for handling this issue may be the finite-difference approach, but there are many cases where finite differences do not work. First, if the value of the objective function is generated by a black-box package, we can only obtain objective function values for given input data; we cannot obtain an explicit expression or gradient information. The same holds if the code computing the function is written in a form we cannot read or translate. Second, when each function evaluation is costly, estimating the gradient requires a prohibitive cost. Third, if the objective function is noisy, the gradient estimate may be completely useless, since the finite-difference approach is unreliable in the presence of noise. Hence derivative-free optimization methods come into being.
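As a small illustration of the finite-difference approach mentioned above, the following Python sketch (our own illustrative code, not from any cited package) computes a forward-difference gradient. With step h, a noise of size ε in the function values perturbs each gradient component by roughly ε/h, which can dominate the true gradient when h is small.

```python
def forward_diff_grad(f, x, h=1e-6):
    """Forward-difference approximation of the gradient of f at x."""
    n = len(x)
    fx = f(x)
    g = []
    for i in range(n):
        xp = list(x)
        xp[i] += h            # perturb the i-th coordinate only
        g.append((f(xp) - fx) / h)
    return g

# Smooth case: f(x) = x0^2 + 3*x1 has gradient (2, 3) at (1, 2).
f = lambda x: x[0] ** 2 + 3 * x[1]
g = forward_diff_grad(f, [1.0, 2.0])
print(g)  # approximately [2.0, 3.0]

# Noisy case: an additive noise of size eps in f changes each component
# of g by up to 2*eps/h, so for eps = 1e-6 and h = 1e-6 the estimate
# can be off by O(1) -- useless for optimization.
```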
In fact, DFO is not new. The earliest ideas emerged in the 1960s. The Hooke-Jeeves method [10] in 1961 and the Nelder-Mead method [16] in 1965 are early examples of direct search methods. In 1969 and 1973, Winfield [35,36] put forward the first model-based method for derivative-free optimization.
He constructed a quadratic model function to approximate the objective function by interpolation. Powell [19] in 1970 proposed the modern trust region method for unconstrained optimization, for which he established the convergence theory and implementable algorithms (see [19,20,28]). Afterwards, several studies indicated that modern trust region methods are efficient and robust for various optimization problems, including optimization problems without derivatives (see [2,11,15,31,32,33,38,40]). In 1994 and 1998, Powell [21,22,25] proposed model-based methods using interpolation to handle derivative-free optimization problems. In particular, Powell published some famous software packages, for example, UOBYQA [24] and NEWUOA [26], for solving DFO problems. In addition, Conn et al. [3] published a derivative-free optimization method using Newton interpolation models. Xue and Sun [37] established the convergence analysis of a derivative-free trust region algorithm for constrained optimization with separable structure. Zhang et al. [39] developed a class of derivative-free algorithms for least-squares minimization problems by use of polynomial interpolation models. However, because of several difficult issues and expensive computational efforts, DFO algorithms can so far only handle small and medium scale optimization problems without derivatives. Thus DFO algorithms face both big challenges and room for development.
An important issue of model-based methods in DFO is that the set of interpolation points should meet certain geometry conditions, under which we can obtain a suitable model function and global convergence. Generally, geometry-improvement steps are used to ensure the suitability of the model function; see [3,24,26]. Marazzi and Nocedal [12,13] proposed wedge trust region methods, which add a wedge constraint to the trust region subproblem so that the newly generated step avoids approaching the region in which the set of interpolation points is degenerate. Scheinberg and Toint [29] presented a self-correcting geometry process to achieve poisedness. Gratton, Toint and Tröltzsch [8] solved bound-constrained optimization problems by using the technique of self-correcting geometry. However, we find that self-correcting geometry requires expensive computational effort and has a less than ideal effect when combined with the basic trust region subproblem. Aiming at these defects, we design a new self-correcting geometry strategy which requires less computational cost. In addition, we put the strategy into the framework of the wedge trust region method, so that the new step generated by the algorithm avoids approaching the area containing the degenerate set of sample points. Further, we give numerical experiments which show that our work is promising.
The paper is organized as follows. In Section 2 we first introduce some preliminaries about interpolation schemes, and then design the new self-correcting geometry strategy and propose our wedge trust region algorithm with self-correcting geometry. In Section 3, we prove the global convergence of our algorithm. In Section 4, we report the numerical experiments which show that our new algorithm is competitive. Finally, we make some conclusions.

2. A Wedge Trust Region Method with Self-Correcting Geometry.

2.1. Interpolation models.
There are different methods to construct the interpolation model, for example, Lagrange interpolation [1,23], Newton interpolation [3], radial basis function interpolation [18,34], and so on. Here we adopt Lagrange interpolation.
We consider the general polynomial interpolation model. Let P_n^d denote the space of polynomials of degree ≤ d in R^n and let p_1 = p + 1 be the dimension of this space. One has p_1 = n + 1 for d = 1 (linear model case) and p_1 = (1/2)(n + 1)(n + 2) for d = 2 (quadratic model case). A basis Φ = {φ_0(x), φ_1(x), ..., φ_p(x)} of P_n^d is a set of p_1 polynomials of degree ≤ d that spans P_n^d. For such a basis Φ, any polynomial m(x) ∈ P_n^d can be written as

m(x) = Σ_{j=0}^{p} α_j φ_j(x),    (1)

where α_0, ..., α_p are real coefficients. We say that the polynomial m(x) interpolates the function f(x) at a given point y if m(y) = f(y). Assume that we are given a set Y = {y_0, y_1, ..., y_p} ⊂ R^n of interpolation points and let m(x) denote a polynomial of degree d in R^n that interpolates a given function f(x) at the points in Y. The coefficients α_0, α_1, ..., α_p can be determined by solving the linear system

M(Φ, Y) α = f(Y),

where M(Φ, Y) is the p_1 × p_1 matrix with entries φ_j(y_i) and f(Y) = (f(y_0), ..., f(y_p))^T. In order for this system to have a unique solution, the matrix M(Φ, Y) has to be nonsingular.

Definition 2.1. [4] The set Y = {y_0, y_1, ..., y_p} ⊂ R^n is poised for polynomial interpolation in R^n if the corresponding matrix M(Φ, Y) is nonsingular for some basis Φ in P_n^d.

In the following, we give some basic definitions that appear in several works, for example, [4,21,29,30].
Definition 2.2. Given a set of interpolation points Y = {y_0, y_1, ..., y_p} ⊂ R^n, a basis of p_1 = p + 1 polynomials l_j(x) ∈ P_n^d (j = 0, ..., p) is called a basis of Lagrange polynomials if

l_j(y_i) = δ_{ij} = 1 if i = j, and 0 otherwise.

If Y is poised, then the Lagrange polynomials exist, are unique, and have a number of useful properties. In particular, if m(x) interpolates f(x) at the points of the poised set Y, then, for all x,

m(x) = Σ_{j=0}^{p} f(y_j) l_j(x),    (2)

which is unique. It can also be shown that Σ_{j=0}^{p} l_j(x) = 1 for all x. For more details and other properties of Lagrange polynomials see, for example, Conn, Scheinberg, and Vicente [4] and Sun et al. [30]. Further, we give the following concept of Λ-poisedness and some related lemmas from [4,29].

Definition 2.3. [4] Let Λ > 0 and let B be a set in R^n. A poised set Y = {y_0, y_1, ..., y_p} is said to be Λ-poised in B if, for the basis of Lagrange polynomials associated with Y,

Λ ≥ max_{0≤j≤p} max_{x∈B} |l_j(x)|.
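For concreteness, the following Python sketch (illustrative only; the monomial basis, the point sets, and the test function are our own choices) builds the matrix M(Φ, Y) for the quadratic case d = 2, n = 2, checks poisedness via nonsingularity, and solves the linear system M(Φ, Y) α = f(Y) for the model coefficients.

```python
import numpy as np

def monomial_basis_2d(x):
    """Monomial basis of P_2^2 at x = (x1, x2); p1 = 6 = (n+1)(n+2)/2 terms."""
    x1, x2 = x
    return np.array([1.0, x1, x2, 0.5 * x1 ** 2, x1 * x2, 0.5 * x2 ** 2])

def interp_matrix(Y):
    """M(Phi, Y): row i holds the basis evaluated at interpolation point y_i."""
    return np.array([monomial_basis_2d(y) for y in Y])

def is_poised(Y, tol=1e-10):
    """Y is poised iff M(Phi, Y) is nonsingular (here: determinant test)."""
    return abs(np.linalg.det(interp_matrix(Y))) > tol

# A poised set of 6 points in R^2 (the principal lattice of degree 2) ...
Y_good = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (0, 2)]
# ... and a non-poised one: all six points lie on the line x2 = 0.
Y_bad = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
print(is_poised(Y_good), is_poised(Y_bad))  # True False

# Solving M(Phi, Y) alpha = f(Y) yields the interpolation model m(x).
f = lambda x: x[0] ** 2 + x[1]
alpha = np.linalg.solve(interp_matrix(Y_good), [f(y) for y in Y_good])
m = lambda x: monomial_basis_2d(x) @ alpha
print(abs(m((0.5, 0.5)) - f((0.5, 0.5))) < 1e-9)  # True (f is itself quadratic)
```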

Lemma 2.4. [29] Given a closed bounded domain B, any initial poised interpolation set Y ⊂ B, and a constant Λ > 1, consider the following procedure: find j ∈ {0, ..., p} and a point x ∈ B such that |l_j(x)| ≥ Λ (if such a pair exists), and replace y_j by x to obtain a new set Y. Then this procedure terminates after a finite number of iterations with a model which is Λ-poised in B.
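The exchange procedure of Lemma 2.4 can be sketched as follows (a simplified illustration: we use a linear model in R^2 and search for the point x over a finite candidate grid in B rather than maximizing |l_j| exactly; the names, tolerances and the candidate grid are our own assumptions).

```python
import numpy as np

def basis_lin(x):
    """Basis of P_1^2: {1, x1, x2}."""
    return np.array([1.0, x[0], x[1]])

def lagrange_values(Y, x):
    """The vector (l_0(x), ..., l_p(x)) equals M(Phi, Y)^{-T} phi(x)."""
    M = np.array([basis_lin(y) for y in Y])
    return np.linalg.solve(M.T, basis_lin(x))

def improve_geometry(Y, candidates, Lam=2.0, max_iter=50):
    """While some |l_j(x)| >= Lam at a candidate x, replace y_j by x."""
    Y = [np.asarray(y, float) for y in Y]
    for _ in range(max_iter):
        best = None  # (|l_j(x)|, j, x) with the largest violation
        for x in candidates:
            l = lagrange_values(Y, x)
            j = int(np.argmax(np.abs(l)))
            if abs(l[j]) >= Lam and (best is None or abs(l[j]) > best[0]):
                best = (abs(l[j]), j, np.asarray(x, float))
        if best is None:
            return Y  # Lambda-poised with respect to the candidate grid
        Y[best[1]] = best[2]  # the swap multiplies |det M| by >= Lam
    return Y

# Start from a nearly collinear (badly poised) set and improve it.
Y0 = [(0.0, 0.0), (1.0, 0.0), (2.0, 1e-6)]
cands = [(a, b) for a in (0.0, 1.0, 2.0) for b in (0.0, 1.0, 2.0)]
Y_new = improve_geometry(Y0, cands)
worst = max(np.max(np.abs(lagrange_values(Y_new, x))) for x in cands)
print(worst < 2.0)  # True
```

As Lemma 2.4 guarantees, each swap increases |det M(Φ, Y)| by at least a factor Λ, so the loop terminates after finitely many exchanges.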
Lemma 2.5. [4] Given a set Y = {y_0, y_1, ..., y_p} that is Λ-poised in B(x, ∆), and its associated basis of Lagrange polynomials {l_j(y)}_{j=0}^{p}, there exist constants κ_ef > 0 and κ_eg > 0 such that, for any interpolating polynomial m(x) of degree one or higher of the form (2) and any given point y ∈ B(x, ∆), we have

‖∇f(y) − ∇m(y)‖ ≤ κ_eg ∆   and   |f(y) − m(y)| ≤ κ_ef ∆².

2.2. A new self-correcting geometry process. To guarantee the poisedness of the interpolation set and to accelerate convergence, we make several modifications to the existing self-correcting geometry process. We first describe our new self-correcting geometry process, and then analyze this algorithm and, in particular, justify our modifications.
Algorithm 2.1. A new self-correcting geometry process.
Step 0. Initialization: The current iterate x_k, the current interpolation set Y_k, the current trust region radius ∆_k > 0, the ratio ρ_k, the switch value ∆_c for updating the radius, and the trial point x_k^+ are given. Constants β > 0, 0 < β_2 < 1 < β_1, Λ > 1 and η ∈ [0, 1) are also given.
Step 1. Successful iteration: If ρ_k ≥ η, set x_{k+1} = x_k^+ and ∆_{k+1} = β_1 ∆_k, and replace the interpolation point y_r that is farthest from x_k^+, i.e., y_r ∈ arg max_{y_j ∈ Y_k} ‖y_j − x_k^+‖, to obtain Y_{k+1} = Y_k \ {y_r} ∪ {x_k^+}.

In the unsuccessful case, define the set of distant points A_k = {y_j ∈ Y_k : ‖y_j − x_k‖ > β∆_k} and the set of badly-poised points B_k = {y_j ∈ Y_k : |l_j(x_k^+)| > Λ}.

Step 2. Replace the comparatively bad interpolation points: If ρ_k < η, A_k ≠ ∅ and B_k ≠ ∅, set x_{k+1} = x_k, choose y_r ∈ A_k ∪ B_k that gives the best improvement in the combined measure of the distance and poisedness criteria, and set Y_{k+1} = Y_k \ {y_r} ∪ {x_k^+}. If ∆_k ≥ ∆_c, set ∆_{k+1} = β_2 ∆_k; otherwise set ∆_{k+1} = ∆_k.

Step 3. Replace the distant interpolation points: If ρ_k < η, A_k ≠ ∅ and B_k = ∅, set x_{k+1} = x_k and Y_{k+1} = Y_k \ {y_r} ∪ {x_k^+}, where r is an index of any point in A_k, for instance, such that y_r ∈ arg max_{y_j ∈ A_k} ‖y_j − x_k‖. Update ∆_{k+1} as in Step 2.

Step 4. Replace the badly-poised interpolation points: If ρ_k < η, A_k = ∅ and B_k ≠ ∅, set x_{k+1} = x_k and Y_{k+1} = Y_k \ {y_r} ∪ {x_k^+}, where y_r ∈ arg max_{y_j ∈ B_k} |l_j(x_k^+)|. Update ∆_{k+1} as in Step 2.

Step 5. Unchanged interpolation set: If ρ_k < η and A_k ∪ B_k = ∅, then set x_{k+1} = x_k, Y_{k+1} = Y_k and ∆_{k+1} = β_2 ∆_k.

Now, we give some interpretations of this algorithm.
There are five possible cases in Algorithm 2.1, which is similar to Algorithm 2 in [29], where there are four possible cases. The first case occurs when the trial point x_k^+ makes the objective function value decrease sufficiently. Then we set x_{k+1} = x_k^+. As for updating the interpolation set, our measure is that x_k^+ replaces y_r, the point farthest from x_k^+, instead of the point farthest from x_k, which is adopted in [13]. Case 1 is described in Step 1.
The second case occurs when the iteration is unsuccessful and the interpolation set contains both points that are too far from the current trust region center (A_k ≠ ∅) and points whose corresponding Lagrange polynomials have large absolute values at the trial point (B_k ≠ ∅). In this case, we can choose any point from A_k ∪ B_k to be replaced. For practical reasons, we replace the point that gives the best improvement in the combined measure of the distance and poisedness criteria. Case 2 is described in Step 2.
The third case occurs when A_k ≠ ∅ and B_k = ∅, which means that although Y_k is good in the sense of the poisedness criterion, some points are too far away from the current trust region center. So we only consider the replacement in the distance sense. For practical reasons, we choose the farthest point y_r. Case 3 is described in Step 3.
In the fourth case, if A_k = ∅ but B_k ≠ ∅, i.e., no point is distant from the current iterate x_k, Step 4 is executed. In this case, we only replace points that belong to B_k, ignoring the effect of distance. For practical reasons, we choose the point that violates the poisedness criterion most seriously.
Finally, if the iteration is not successful and A_k ∪ B_k = ∅, then we execute Step 5 and only reduce the trust region radius. Obviously, exactly one of the five cases applies at each iteration. We will show that Step 3, Step 4 and Step 5 can be used only a finite number of times before either a successful iteration occurs or the model becomes well poised.
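The five-case dispatch described above can be summarized in the following sketch (an illustration of the control flow only; A_k and B_k are passed in as index sets, and the actual replacement rules and radius updates are omitted).

```python
def self_correct_step(rho, eta, A_k, B_k):
    """Return which case of the five-case dispatch applies.

    A_k: indices of interpolation points too far from the center x_k.
    B_k: indices whose Lagrange polynomial is large at the trial point.
    """
    if rho >= eta:
        return 1  # successful: accept trial point, replace farthest point
    if A_k and B_k:
        return 2  # replace by the combined distance/poisedness criterion
    if A_k:
        return 3  # replace the farthest point (distance criterion only)
    if B_k:
        return 4  # replace the worst poisedness violator
    return 5      # keep the set, shrink the trust region only

print(self_correct_step(0.9, 0.1, set(), set()))  # 1
print(self_correct_step(0.0, 0.1, {2}, set()))    # 3
print(self_correct_step(0.0, 0.1, set(), {4}))    # 4
print(self_correct_step(0.0, 0.1, set(), set()))  # 5
```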
Algorithm 2.1 is a variant and extension of Algorithm 2 in [29] and Algorithm 1 in [13], where A_k and B_k stand for the distance criterion and the poisedness criterion, respectively. According to whether A_k and B_k are empty, we choose the replaced point depending on the distance criterion, the poisedness criterion, or a combination of the two. In our algorithm, the distance criterion is related to x_k^+ at successful iterations and to x_k in the unsuccessful case, while the strategy adopted in Algorithm 2 in [29] is based only on x_k^+. Numerical experiments show that our modification is more efficient.
As for updating the radius, in Algorithm 2 in [29] it is reduced only when the interpolation set Y_k is unchanged at unsuccessful iterations. However, according to our numerical experiments, this strategy is not good. To accelerate convergence, we also shrink the radius when Y_k is changed at unsuccessful iterations, provided the radius is not too small. If the radius is too small, we reduce it only in Step 5 of Algorithm 2.1.
If the interpolation set is changed, we update all coefficients of the Lagrange polynomials. Then the new model can be expressed as in (2).
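Updating the Lagrange coefficients after a single point exchange can be done with the classical update l_r ← l_r / l_r(x^+), l_j ← l_j − l_j(x^+) l_r (j ≠ r). The following Python sketch (a linear model in R^2 with our own illustrative points) verifies that the updated polynomials satisfy the Lagrange property on the new set.

```python
import numpy as np

def phi(x):  # basis of P_1^2: {1, x1, x2}
    return np.array([1.0, x[0], x[1]])

# Lagrange coefficient vectors for Y: column j of M^{-1}, since M alpha_j = e_j.
Y = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
M = np.array([phi(y) for y in Y])
A = np.linalg.inv(M)                    # A[:, j] = coefficients of l_j
lag = lambda j, x: phi(x) @ A[:, j]     # evaluate l_j at x

# Replace y_r by x_plus and update all Lagrange polynomials in place.
r, x_plus = 2, np.array([1.0, 1.0])
vals = np.array([lag(j, x_plus) for j in range(3)])  # l_j(x_plus)
A[:, r] /= vals[r]                      # l_r <- l_r / l_r(x_plus)
for j in range(3):
    if j != r:
        A[:, j] -= vals[j] * A[:, r]    # l_j <- l_j - l_j(x_plus) l_r
Y[r] = x_plus

# Check the Lagrange property l_j(y_i) = delta_ij on the new set.
L = np.array([[lag(j, y) for j in range(3)] for y in Y])
print(np.allclose(L, np.eye(3)))  # True
```

This update costs one pass over the coefficient vectors, which is the reason the self-correcting process can refresh the model cheaply after each exchange.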

2.3. The SCGWTR algorithm. In this subsection we describe our algorithm, which is based on the new self-correcting geometry process and the modified wedge trust region method. Our algorithm is called SCGWTR for short.
In general, the model in the trust region framework is quadratic, written as

m_k(x_k + s) = f(x_k) + g_k^T s + (1/2) s^T G_k s,   ‖s‖ ≤ ∆_k,

where ∇m_k(x_k + s) = G_k s + g_k, ∇m_k(x_k) = g_k, ∇²m_k(x_k) = G_k, and ∆_k is the trust region radius at the k-th iteration. Note that G_k is a symmetric approximation to ∇²f(x_k). In the derivative-free case, g_k and G_k are determined by the interpolation conditions rather than by derivative information. The wedge trust region method was proposed by Marazzi and Nocedal [12,13]. There are two versions, the linear version and the quadratic version, classified by the degree of the interpolation models. They follow the idea of adding a wedge constraint to the trust region subproblem.
We define a "taboo region" T_k in R^n, in which the interpolation points are nonpoised. We also define a "wedge" W_k, a set that contains T_k and is designed to keep the new step away from points too near T_k. The descriptions of T_k and W_k can be found in Sections 3 and 4 of [13]. Adding the wedge constraint to the trust region subproblem, we obtain

min_s  m_k(x_k + s)    (9)
s.t.  ‖s‖ ≤ ∆_k,    (10)
      s ∉ W_k.    (11)
As for solving the wedge trust region subproblem (9)-(11), we usually first solve the standard trust region subproblem without the wedge constraint and obtain a solution s_k^e at the k-th iteration. If s_k^e satisfies the wedge constraint, we set s_k = s_k^e as the trial step. Otherwise, the wedge constraint is active; by rotating s_k^e, we find a vector satisfying the wedge constraint and take it as s_k. Then we set the trial point x_k^+ = x_k + s_k. Now we state our new algorithm as follows.
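The solve-then-rotate idea can be sketched as follows (a strong simplification: we use only a Cauchy step for the standard trust region subproblem, and the wedge W_k is abstracted as a user-supplied predicate; this is not the actual wedge construction of [13]).

```python
import numpy as np

def solve_tr_cauchy(g, G, Delta):
    """Cauchy step: minimize the model along -g within the ball ||s|| <= Delta."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0:
        return np.zeros_like(g)
    d = -g / gnorm
    curv = d @ G @ d
    t = Delta if curv <= 0 else min(Delta, gnorm / curv)
    return t * d

def wedge_step(g, G, Delta, in_wedge, angles=np.linspace(0, np.pi, 64)):
    """Take the TR step; if in_wedge(s) (s lies in the forbidden wedge W_k),
    rotate s in the plane of the first two coordinates until it leaves W_k."""
    s = solve_tr_cauchy(g, G, Delta)
    if not in_wedge(s):
        return s
    for th in angles:
        c, sn = np.cos(th), np.sin(th)
        R = np.eye(len(g))
        R[0, 0], R[0, 1], R[1, 0], R[1, 1] = c, -sn, sn, c
        if not in_wedge(R @ s):
            return R @ s  # rotation preserves ||s||, so (10) still holds
    return s  # fall back to the unrotated step

# Example: the plain TR step lies in the forbidden wedge (here W_k is taken,
# for illustration only, as a thin cone around the x1-axis).
g, G, Delta = np.array([1.0, 0.0]), np.eye(2), 0.5
in_w = lambda s: abs(s[1]) < 1e-3 * np.linalg.norm(s)
s = wedge_step(g, G, Delta, in_w)
print(in_w(s))  # False: the rotated step has left the wedge
```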
Step 1c: Set m_k = m_i, ∆_{k+1} = θ‖∇m_k(x_k)‖, and set v_i = x_k if a new model has been computed.

Step 2. Compute the replaced point: Choose the point that is farthest from the current iterate as the replaced point y_r, i.e., y_r ∈ arg max_{y_j ∈ Y_k} ‖y_j − x_k‖.

Step 3. Solve the wedge trust region subproblem: Solve the wedge trust region subproblem (9)-(11) to obtain s_k, and set the trial point x_k^+ = x_k + s_k.
Step 4. Update the iterate and interpolation set: Compute the ratio ρ_k = [f(x_k) − f(x_k^+)] / [m_k(x_k) − m_k(x_k + s_k)]. Use Algorithm 2.1 to obtain Y_{k+1}, x_{k+1} and ∆_{k+1}.
Step 5. Update the model and Lagrange polynomials: If Y_{k+1} ≠ Y_k, recompute the interpolation model m_{k+1}(x) by using the Lagrange polynomials l_j(x) (j = 0, ..., p) associated with Y_{k+1}. Set k := k + 1 and go to Step 1.
3. Global Convergence. In this section we prove convergence results for the algorithm SCGWTR, which show that Algorithm 2.2 produces a subsequence converging to a stationary point. Our convergence analysis is based on the convergence theory of basic trust region methods in [2,33] and is an extension and modification of the convergence analyses of algorithm WEDGE [13] and algorithm SCGDFO [29]. We first give some assumptions.
Assumptions: A1 The objective function f is continuously differentiable in an open set Ω containing all iterates generated by the algorithm, and its gradient ∇f is Lipschitz continuous in Ω with constant L.
A2 There exists a constant κ_low such that f(x) ≥ κ_low for every x ∈ Ω.
A3 There exists a constant κ_H ≥ L such that ‖G_k‖ ≤ κ_H for every k ≥ 0.
Note that the above assumptions are standard for derivative-free optimization (see [4,13,29]). Assumption A1 concerns the existence of the first derivatives of the objective function, not the possibility of their calculation. Assumption A3 implies that the Hessian of the model function has to remain bounded in norm.

We first show that the step produced by the wedge trust region subproblem achieves a sufficient decrease of the model.

Lemma 3.1. Suppose that s_k solves the wedge trust region subproblem (9)-(11). Then

m_k(x_k) − m_k(x_k + s_k) ≥ κ_c ‖g_k‖ min( ‖g_k‖ / ‖G_k‖, ∆_k ),    (13)

where κ_c ∈ (0, 1) is a constant.
Proof. From Lemma 6.1.3 in [33], we know that if s_k^e is the exact solution of (9)-(10), then

m_k(x_k) − m_k(x_k + s_k^e) ≥ (1/2) ‖g_k‖ min( ‖g_k‖ / ‖G_k‖, ∆_k ).

Since, in the process of solving the wedge trust region subproblem, the accepted step retains a sufficient fraction of this model decrease, we can deduce that there exists a constant κ_c ∈ (0, 1) such that (13) holds.

Now we give the self-correcting property of Algorithm 2.1.
Lemma 3.2. Suppose that assumptions A1 and A3 hold and that m_k is a quadratic model. Then, for any constant Λ > 1, if the k-th iteration is unsuccessful, the trust region radius ∆_k is sufficiently small compared with ‖g_k‖, and Y_k ⊂ B(x_k, ∆_k), then B_k ≠ ∅, i.e., there exists j ∈ {0, ..., p} such that |l_j(x_k^+)| ≥ Λ.

Proof. The proof of this lemma is similar to that of Lemma 5.2 in [29], and we omit it.

This lemma means that the self-correcting property introduced in [29] also holds in our modified algorithm: provided the trust region radius is small enough compared with the model's gradient and all interpolation points are contained in the trust region, every unsuccessful iteration must result in an improvement of the interpolation set geometry.
Next, we verify that, in our case, the radius ∆_k cannot become arbitrarily small when the iterate is far away from a critical point. This lemma is an extension of Lemma 5.3 in [29].

Lemma 3.3. Suppose that assumptions A1 and A3 hold. Suppose also that, for some k_0 ≥ 0 and all k ≥ k_0, the model is quadratic and

‖g_k‖ ≥ κ_g    (16)

for some κ_g > 0. Then there exists a constant κ_∆ > 0 such that, for all k ≥ k_0,

∆_k ≥ κ_∆.    (17)

Proof. Assume that, for some k ≥ 0, the bound (18) on ∆_k holds. If the k-th iteration is successful, i.e., ρ_k ≥ η, then ∆_{k+1} ≥ ∆_k. Otherwise, ρ_k < η, and three cases may occur. The first case is when A_k ≠ ∅ and B_k ≠ ∅, and Step 2 of Algorithm 2.1 is executed; observe that, by (18), ∆_{k+1} = ∆_k. The second case is when A_k ≠ ∅ and B_k = ∅. If i > 0, then (16) and (18) hold, where k_i is the index of the last iteration before k at which a new Λ-poised model was recomputed in the criticality test. Therefore, Step 3 of Algorithm 2.1 is executed. Together with ∆_k < ∆_c, we also have ∆_{k+1} = ∆_k.
The third case occurs when A_k = ∅. Under condition (18), Lemma 3.2 implies that B_k ≠ ∅. Since (18) and (19) hold in this case, Step 4 of Algorithm 2.1 is executed and ∆_{k+1} = ∆_k.
As a consequence, the trust region radius can be decreased only if ∆_k is already sufficiently small, and Algorithm 2.1 implies (17) with a suitable constant κ_∆ > 0.

In the following, we consider the convergence analysis in the case where the number of successful iterations is finite.
Lemma 3.4. Suppose that assumptions A1-A3 hold and that the number of successful iterations is finite. Then lim inf_{k→∞} ‖g_k‖ = 0.

Proof. First, since every iteration is eventually unsuccessful, we have x_k = x_* for some x_* and all k sufficiently large. To deduce a contradiction, we assume that (16) holds for some κ_g > 0 and all k sufficiently large. We can infer from Lemma 3.3 that (17) holds for all k sufficiently large. Since the number of successful iterations is finite, eventually all iterations go through Step 2, Step 3, Step 4 or Step 5 of Algorithm 2.1. Then the sequence {∆_k} is nonincreasing and bounded below, and therefore convergent. So, let ∆_∞ = lim_{k→∞} ∆_k. The unsuccessful iterations of Step 5 cannot happen infinitely many times, because ∆_k is bounded below by ∆_∞ > 0 and β_2 < 1, while Step 2, Step 3 and Step 4 cannot decrease the radius infinitely many times either, because of the lower bound ∆_c (see Algorithm 2.1). Thus ∆_k = ∆_∞ for all k sufficiently large, and eventually every unsuccessful iteration executes Step 2, Step 3 or Step 4 of Algorithm 2.1.
Note that during the iterations of Step 4 in Algorithm 2.1, the new trial point replaces some interpolation point in the set B_k. Then the trial point x_k^+ replaces a previous interpolation point y_j with |l_j(x_k^+)| ≥ Λ. But this can happen only finitely many times by Lemma 2.4, where the exchange procedure terminates after a finite number of iterations with a Λ-poised model. So the contradiction is obtained.
Next, we further consider the case of infinitely many successful iterations.

Lemma 3.5. Suppose that assumptions A1-A3 hold and that the number of successful iterations is infinite. Then lim inf_{k→∞} ‖g_k‖ = 0.

Proof. Assume that the lemma is not true, i.e., there exists some κ_g > 0 such that (16) holds for all sufficiently large k. Then we have from Lemma 3.3 that (17) holds for all k, including all successful iterations with k large enough. However, from (13), we have that, for every sufficiently large k ∈ S,

f(x_k) − f(x_{k+1}) ≥ η [m_k(x_k) − m_k(x_k + s_k)] ≥ η κ_c κ_g min( κ_g / κ_H, κ_∆ ) > 0.

Since, by assumption, there are infinitely many successful iterations, we obtain that f(x_k) → −∞, which contradicts assumption A2, where S is the set of successful iterations. Therefore the lemma holds.
So, by use of the above lemmas and a proof similar to that of Theorem 5.8 in [29], we can obtain the main theorem.

Theorem 3.6. Suppose that assumptions A1-A3 hold. Then

lim inf_{k→∞} ‖∇f(x_k)‖ = 0,

i.e., at least one accumulation point of the sequence {x_k} generated by Algorithm SCGWTR is a first-order stationary point.
4. Numerical Experiments. In this section we report the numerical results. We compare our algorithm with the wedge trust region method in [13] and the self-correcting geometry method in [29]. All the test problems come from Moré et al. [14]. The names of the test problems are listed in Table 1. The details of the numerical results for the three algorithms are listed in Table 2, where WEDGE, SCGDFO and SCGWTR stand for the wedge trust region method, the self-correcting geometry method and our method, respectively. In Table 2, P, n, Iter, NF, F and Time stand for the problem number, the dimension of the problem, the number of iterations, the number of function evaluations, the final function value and the CPU time, respectively.
The selection of the initial interpolation set Y_0 = {y_0 (= x_0), y_1, ..., y_p} follows (22), referring to UOBYQA [24], where p = (1/2)(n + 1)(n + 2) − 1, i = 1, 2, ..., n, i(u, v) = 2n + u + (1/2)(v − 1)(v − 2), and ∆_0 > 0 is the initial trust region radius. From the numerical results in Table 2, we find that, over 21 test problems and 29 cases, Algorithm WEDGE fails in three cases and Algorithm SCGDFO fails in six cases, while our algorithm SCGWTR succeeds in all cases. In addition, among the 29 cases, our algorithm is the best in 21 cases. So the table shows that our method is competitive with the other two methods.
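The construction below is a hedged sketch of one common way to place (n+1)(n+2)/2 initial interpolation points in the spirit of UOBYQA [24]; the exact rule (22) of the paper is not reproduced here, and this variant (± coordinate steps plus pairwise diagonal steps) is our own assumption.

```python
import numpy as np

def initial_interp_set(x0, Delta0):
    """One common way (an assumption, in the spirit of UOBYQA) to build
    p + 1 = (n+1)(n+2)/2 initial points: the center x0, steps +/- Delta0
    along each coordinate, and a diagonal step for each cross term."""
    x0 = np.asarray(x0, float)
    n = len(x0)
    e = np.eye(n)
    Y = [x0.copy()]
    for i in range(n):
        Y.append(x0 + Delta0 * e[i])   # forward coordinate step
        Y.append(x0 - Delta0 * e[i])   # backward coordinate step
    for v in range(n):                 # one point per pair (u, v), u < v,
        for u in range(v):             # to pin down the x_u * x_v term
            Y.append(x0 + Delta0 * (e[u] + e[v]))
    return Y

Y = initial_interp_set([0.0, 0.0], 1.0)
print(len(Y))  # 6 = (2+1)(2+2)/2
```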
According to Dolan and Moré [6], we draw the performance profiles of the three algorithms. From Figures 1, 2 and 3, we can also conclude that our method is more efficient.
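The performance-profile computation of Dolan and Moré [6] can be sketched as follows (illustrative code with hypothetical cost data; T[i, j] is the cost, e.g. NF, of solver j on problem i, and failures are marked by infinity).

```python
import numpy as np

def performance_profile(T):
    """T[i, j]: cost of solver j on problem i; np.inf marks a failure.
    Returns (taus, rho), where rho[t, j] is the fraction of problems
    solved by solver j within a factor taus[t] of the best solver."""
    T = np.asarray(T, float)
    best = T.min(axis=1, keepdims=True)      # best cost per problem
    ratios = T / best                        # performance ratios r_{i,j}
    taus = np.unique(ratios[np.isfinite(ratios)])
    rho = np.array([[np.mean(ratios[:, j] <= t) for j in range(T.shape[1])]
                    for t in taus])
    return taus, rho

# Hypothetical evaluation counts: 3 problems x 2 solvers.
T = [[10, 20], [30, 15], [np.inf, 40]]       # solver 0 fails on problem 3
taus, rho = performance_profile(T)
print(rho[-1])  # at the largest tau: solver 0 solves 2/3, solver 1 solves all
```

Plotting rho against tau (one curve per solver) reproduces the figures of the kind referred to above: higher curves indicate a more efficient and more robust solver.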

5. Conclusions. In this paper we propose a new self-correcting geometry process and present a self-correcting geometry wedge trust region method for derivative-free optimization problems. We describe the motivation and details of our algorithm, and establish its global convergence. Limited numerical experiments demonstrate that the new method is effective.