A SCALED CONJUGATE GRADIENT METHOD WITH MOVING ASYMPTOTES FOR UNCONSTRAINED OPTIMIZATION PROBLEMS

Abstract. In this paper, a scaled method that combines the conjugate gradient with moving asymptotes is presented for solving large-scale nonlinear unconstrained optimization problems. A diagonal matrix is obtained by the moving asymptote technique, and a scaled gradient is determined by multiplying the gradient by this diagonal matrix. The search direction is either a scaled conjugate gradient direction or a negative scaled gradient direction, depending on certain conditions. This direction is sufficiently descent if the step size satisfies the strong Wolfe condition. A global convergence analysis of the method is also provided. The numerical results show that the scaled method is efficient for solving some large-scale nonlinear problems.

1. Introduction. We consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. The gradient of $f(x)$ is denoted by $g(x) \equiv \nabla f(x)$.

There are many efficient methods for solving problem (1), such as the conjugate gradient (CG) method, the BFGS method, and the moving asymptotes (MA) method. In this paper, we present a new method that combines the CG with the MA for solving large-scale nonlinear unconstrained optimization problems.
As is well known, the CG method is one of the classical methods for unconstrained optimization. It has proved effective for solving large-scale unconstrained optimization problems because it needs no storage of matrices. In the conjugate gradient method, the iterates $\{x_k\}$ have the form
$$x_{k+1} = x_k + \alpha_k d_k, \tag{2}$$
where the positive scalar $\alpha_k$ is called the step size and the vector $d_k$ is called the search direction, defined by
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \geq 1, \end{cases} \tag{3}$$
with $g_k \equiv g(x_k)$ and a scalar parameter $\beta_k$ characterizing the conjugate gradient method. The best-known expressions of $\beta_k$ are the Hestenes-Stiefel (HS) [10], Fletcher-Reeves (FR) [6], Polak-Ribière-Polyak (PRP) [15, 16] and Dai-Yuan (DY) [3] formulas,
$$\beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, \qquad \beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \qquad \beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \qquad \beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}, \tag{4}$$
respectively, where $y_k = g_{k+1} - g_k$ and $\|\cdot\|$ denotes the Euclidean norm. A popular inexact line search is the Wolfe condition, in which the step size $\alpha_k$ satisfies
$$f(x_k + \alpha_k d_k) \leq f(x_k) + \delta \alpha_k g_k^T d_k, \tag{5}$$
$$g(x_k + \alpha_k d_k)^T d_k \geq \sigma g_k^T d_k, \tag{6}$$
with $0 < \delta < \sigma < 1$. Many numerical methods are proved to be convergent under the strong Wolfe condition, that is, the step size $\alpha_k$ satisfies (5) and
$$|g(x_k + \alpha_k d_k)^T d_k| \leq -\sigma g_k^T d_k. \tag{7}$$
In recent years, the CG method has attracted renewed attention, and several authors have investigated it further. Narushima et al. [13] proposed a three-term conjugate gradient method in which the search direction satisfies the sufficient descent condition. Al-Baali et al. [1] extended this method to a two-parameter family of three-term conjugate gradient methods that can be used to control the magnitude of the directional derivative. Dai and Kou [4] sought the conjugate gradient direction closest to the direction of the scaled memoryless BFGS method and proposed a family of conjugate gradient methods for unconstrained optimization. Hager and Zhang [9] developed a limited memory conjugate gradient method to detect and correct the loss of orthogonality that can occur in ill-conditioned optimization problems. Nakamura, Narushima and Yabe [12] considered a unified formula of parameters that establishes the sufficient descent condition. Zhou and Zhou [19] proved the strong global convergence of a modified three-term HS conjugate gradient method for nonconvex optimization by using a backtracking-type line search.
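To make the classical scheme (2)-(4) concrete, the following minimal Python sketch computes one direction update for any of the four $\beta$ formulas above. The function and variable names are our own illustration, not part of the cited papers, and no restarts or safeguards against vanishing denominators are included.

```python
import numpy as np

def cg_direction(g_new, g_old, d_old, variant="DY"):
    """One conjugate gradient direction update, following (3)-(4).

    g_new, g_old : gradients g_{k+1} and g_k
    d_old        : previous search direction d_k
    Returns d_{k+1} = -g_{k+1} + beta_{k+1} * d_k.
    """
    y = g_new - g_old                              # y_k = g_{k+1} - g_k
    if variant == "HS":
        beta = (g_new @ y) / (d_old @ y)
    elif variant == "FR":
        beta = (g_new @ g_new) / (g_old @ g_old)
    elif variant == "PRP":
        beta = (g_new @ y) / (g_old @ g_old)
    elif variant == "DY":                          # the formula scaled in this paper
        beta = (g_new @ g_new) / (d_old @ y)
    else:
        raise ValueError("unknown variant")
    return -g_new + beta * d_old
```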
The MA method was first presented by Svanberg in [17] and has since been used frequently in structural optimization. Ni and Wang studied the unconstrained optimization problem (1) using the MA with a trust region technique in [14] and [18], respectively. They designed a new subproblem and new rules to control the parameters of the moving asymptotes. The numerical results showed that their method is capable of handling large-scale optimization problems.
In this paper, we propose a new method for solving large-scale nonlinear unconstrained optimization problems. We establish a new search direction with the help of the scaling idea introduced by Luenberger and Ye in [11]. More precisely, we first obtain a diagonal matrix, called a scaled matrix, from the moving asymptotes method. Then we apply this matrix to our new search direction, which is called a scaled conjugate gradient direction. In our method, the step size satisfies the Wolfe condition or the strong Wolfe condition. This method can be viewed as a combination of the CG and the MA; in fact, both are particular cases of our method. Our purpose is to obtain a new method that may enjoy the advantages of both the CG and the MA.
2. The algorithm. Wang and Ni [18] studied unconstrained optimization by using the MA method. They obtained a descent direction in each iteration by solving a convex separable subproblem with moving asymptotes, denoted by (8), where $\bar{d}_i$ is the $i$-th element of the direction $\bar{d}$, $\Delta_i$ $(i = 1, 2, \cdots, n)$ are trust region radii, and $\bar{g}_i$ is the $i$-th element of the gradient $\bar{g}$. Wang and Ni replaced the general moving asymptotes with the trust region radius $\Delta_i$ and the parameter $\eta_i$, thereby giving a reasonable way to control the parameters of the moving asymptotes. Wang and Ni [18] pointed out that the function $m(x, \bar{d})$ in (8) is a first-order approximation of $f(x)$ and that the subproblem (8) is equivalent to $n$ independent one-dimensional bound-constrained subproblems with positive parameters $\varepsilon_i$ and $\eta_i$. The solution of (8) is shown to be $\bar{d}_i = -\gamma_i \bar{g}_i$ $(i = 1, 2, \cdots, n)$, where $\gamma_i$ is given by (9).
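Since the solution of (8) is the componentwise rescaling $\bar{d} = -\Lambda \bar{g}$ (see (10) below), the MA step can be sketched in a few lines of Python. The vector `gamma` is supplied by the caller because the formula (9) is not reproduced above; this is an illustration, not the authors' code.

```python
import numpy as np

def ma_direction(g, gamma):
    """Moving-asymptote step: d = -Lambda * g with Lambda = diag(gamma).

    gamma : positive diagonal entries gamma_i; how they are computed from
            eps_i, eta_i and the radii Delta_i via (9) is not shown here.
    """
    gamma = np.asarray(gamma)
    assert np.all(gamma > 0), "Lambda must be positive definite"
    return -gamma * g        # componentwise product, no matrix is formed
```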

The expression (9) implies $\gamma_i > 0$, $i = 1, 2, \cdots, n$. Furthermore, we define a diagonal matrix $\Lambda$ by
$$\Lambda = \operatorname{diag}(\gamma_1, \gamma_2, \cdots, \gamma_n), \tag{10}$$
where $\gamma_i$ $(i = 1, 2, \cdots, n)$ is determined by (9). Then $\Lambda$ is positive definite. We can say that the MA method in [18] is essentially a modified steepest descent method, since the solution of (8) can be expressed as $\bar{d} = -\Lambda \bar{g}$. We note that this direction can also be obtained through the linear transformation $x = \Lambda^{1/2} z$ and the negative gradient of $\varphi(z) \equiv f(\Lambda^{1/2} z)$.

Now let us elaborate on an idea of Luenberger and Ye [11] to derive our new conjugate gradient method. Suppose that $x = Az$ is a linear transformation, where $A$ is an invertible $n \times n$ matrix. Then we have $x_k = A z_k$ and $d_k = A \tilde{d}_k$. To simplify the deduction, we assume $\alpha_k = \tilde{\alpha}_k$. Let $\tilde{g}$ be the gradient of the function $f(Az)$ with respect to the variable $z$, i.e., $\tilde{g}(z) = A^T g(Az)$. The search direction of the conjugate gradient with respect to the variable $z$ can be expressed as $\tilde{d}_k = -\tilde{g}_k + \tilde{\beta}_k \tilde{d}_{k-1}$. We choose one formula of (4) as the expression of $\tilde{\beta}_k$, for example, the DY formula $\tilde{\beta}_k = \|\tilde{g}_k\|^2 / (\tilde{d}_{k-1}^T \tilde{y}_{k-1})$. From the above three equations,
$$d_k = -A A^T g_k + \tilde{\beta}_k d_{k-1}.$$
If $A$ is chosen as $\Lambda^{1/2}$, then $d_k = -\Lambda g_k + \tilde{\beta}_k d_{k-1}$. Furthermore, if we allow $\Lambda$ to vary in each iteration, then the search direction can be expressed as
$$d_k = -\Lambda_k g_k + \beta_k d_{k-1}, \tag{11}$$
where $\Lambda_k$ is defined by (10). In order to keep the diagonal elements $\gamma_i$ $(i = 1, 2, \cdots, n)$ of the matrix $\Lambda_k$ within a particular range, we choose two constants $m$ and $M$ satisfying $0 < m < 1 < M$ such that
$$m \leq \gamma_i \leq M, \quad i = 1, 2, \cdots, n. \tag{12}$$
This restriction ensures the descent property of the direction (11) in some sense. In order to guarantee global convergence, we also need to modify the parameters and define the notations $\hat{d}_k$ and $\theta_k$ in (13). Hence, our search direction can be expressed by (14): depending on the conditions in (13), $d_k$ is either the scaled conjugate gradient direction $\hat{d}_k$ or the negative scaled gradient $-\Lambda_k g_k$. Now we can give a scaled method combining the conjugate gradient with moving asymptotes (denoted as the SMCGMA method for convenience).
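The following Python sketch assembles the scaled DY direction (11) with the truncation (12). The $\beta$ formula below is the transformed DY parameter $g_k^T \Lambda_k g_k / (d_{k-1}^T y_{k-1})$ obtained from the change of variables above; the safeguarded switch to $-\Lambda_k g_k$ prescribed by (13)-(14) is omitted because its trigger condition is not reproduced here, and the bounds `m`, `M` are placeholders rather than the paper's values.

```python
import numpy as np

def scaled_dy_direction(g_new, g_old, d_old, gamma, m=1e-2, M=1e2):
    """Scaled conjugate gradient direction in the spirit of (11)-(12).

    gamma : diagonal of Lambda_k from the moving-asymptote model (9)
    m, M  : truncation bounds with 0 < m < 1 < M (illustrative values)
    """
    gamma = np.clip(gamma, m, M)           # enforce m <= gamma_i <= M, cf. (12)
    y = g_new - g_old                      # y_{k-1} = g_k - g_{k-1}
    # Transformed DY parameter: under x = Lambda^{1/2} z the inner product
    # d~^T y~ equals d^T y, so beta = g^T Lambda g / (d^T y).
    beta = (g_new @ (gamma * g_new)) / (d_old @ y)
    return -gamma * g_new + beta * d_old   # d_k = -Lambda_k g_k + beta_k d_{k-1}
```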
Step 3 Form the search direction. Compute $\eta_i^k$ and $\varepsilon_i^k$, where $g_i^k$ is the $i$-th component of $g_k$ for $i = 1, \ldots, n$. Update the scaled matrix $\Lambda_k$ by (9), (10) and (12). Then compute the search direction $d_k$ by (14).
Step 5 Update the trust region radius. Compute the new radii $\Delta_i$ for the next iteration.
Step 6 Compute the new point. Set $x_{k+1} = x_k + s_k$, $k := k + 1$, and go to Step 2.
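Steps 1, 2 and 4 are not reproduced above, so the driver below fills them with generic choices, a gradient-norm stopping test and SciPy's Wolfe line search, purely to show how Steps 2-6 fit together. The trust-region and scaling updates are frozen (plain DY direction), so this is a sketch of the control flow, not of Algorithm SMCGMA itself.

```python
import numpy as np
from scipy.optimize import line_search

def smcgma_like_loop(f, grad, x0, tol=1e-3, max_iter=500):
    """Illustrative outer loop in the shape of Steps 2-6."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                     # first direction: d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:           # Step 2 (sketch): stopping test
            break
        alpha = line_search(f, grad, x, d)[0]  # Step 4 (sketch): Wolfe search
        if alpha is None:
            break                              # line search failed
        x_new = x + alpha * d                  # Step 6: x_{k+1} = x_k + s_k
        g_new = grad(x_new)
        y = g_new - g
        beta = (g_new @ g_new) / (d @ y)       # DY formula from (4)
        d = -g_new + beta * d                  # Step 3 frozen to plain DY
        x, g = x_new, g_new
    return x
```

For example, `smcgma_like_loop(lambda x: x @ x, lambda x: 2 * x, np.ones(100))` converges to the origin within a few iterations.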
3. Convergence analysis. As mentioned above, Algorithm SMCGMA is a new method for unconstrained optimization which can be considered as a generalization of the Dai-Yuan CG method [3] and the Wang-Ni method [18]. It follows from (14) that the relation (16) holds; this expression will be helpful in our convergence analysis.
The following lemma shows that the search direction (14) is a descent direction under the Wolfe conditions (5) and (6).

Lemma 3.1. Suppose that the sequences $\{x_k\}$, $\{g_k\}$ and $\{d_k\}$ are generated by Algorithm SMCGMA, in which $\alpha_k$ satisfies the Wolfe conditions (5) and (6). If $g_k \neq 0$ for all $k \geq 0$, then
$$g_k^T d_k < 0. \tag{17}$$

Proof. (i) If $k = 0$, then $d_0 = -g_0$ and $g_0^T d_0 = -\|g_0\|^2 < 0$. (ii) If $d_k = -\Lambda_k g_k$, then it follows from the positive definiteness of the matrix $\Lambda_k$ that (17) is also correct in this case. For the remaining case, we make the induction hypothesis that $g_i^T d_i < 0$ for $i \leq k - 1$. It follows from the Wolfe line search condition (6) that
$$d_{k-1}^T y_{k-1} \geq (\sigma - 1)\, g_{k-1}^T d_{k-1} > 0;$$
then we can use induction to show that the relation (17) holds for all $k \geq 0$. The proof is completed.
Furthermore, the direction satisfies the sufficient descent condition if the step size $\alpha_k$ satisfies the strong Wolfe conditions (5) and (7).

Lemma 3.2. Suppose that the sequences $\{x_k\}$, $\{g_k\}$ and $\{d_k\}$ are generated by Algorithm SMCGMA, in which $\alpha_k$ satisfies the strong Wolfe conditions (5) and (7). If $g_k \neq 0$ for all $k \geq 0$, then
$$g_k^T d_k \leq -m \|g_k\|^2, \tag{18}$$
where $m \in (0, 1)$ is defined in Step 1 of Algorithm SMCGMA.
Proof. (i) If $k = 0$, then $d_0 = -g_0$ and $g_0^T d_0 = -\|g_0\|^2 \leq -m\|g_0\|^2$. (ii) If $d_k = \hat{d}_k$, then (18) follows from the strong Wolfe condition (7). (iii) If $d_k = -\Lambda_k g_k$, then, since $m \leq \gamma_i$ for all $i$, we have $g_k^T d_k = -g_k^T \Lambda_k g_k \leq -m\|g_k\|^2$. The proof is completed.

Lemma 3.3. Suppose that the sequences $\{x_k\}$, $\{g_k\}$ and $\{d_k\}$ are generated by Algorithm SMCGMA, in which $\alpha_k$ satisfies the Wolfe conditions (5) and (6). If $g_k \neq 0$ for all $k \geq 0$, then there exists a constant $c > 1$ such that (19) holds.

Proof. For all $k \geq 1$, it follows from (14) that (20) holds, where $\beta_k$ can be denoted explicitly. Taking the Euclidean norm and squaring both sides of (20), we obtain the desired bound.

The formulas (17) and (18) imply that $g_k^T d_k \neq 0$ for all $k$. Therefore a contradiction with the Zoutendijk condition (24) arises. Hence, our original assertion (26) must be false, giving that either $g_k = 0$ for some $k$ or (25) holds.
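Under our reading of (18) as $g_k^T d_k \leq -m\|g_k\|^2$, case (iii) can be verified numerically in a few lines of Python; the data below are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, M, n = 0.1, 10.0, 1000
g = rng.standard_normal(n)            # a nonzero gradient g_k
gamma = rng.uniform(m, M, size=n)     # truncated diagonal, m <= gamma_i <= M
d = -gamma * g                        # case (iii): d_k = -Lambda_k g_k
assert g @ d <= -m * (g @ g)          # sufficient descent bound (18)
```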
It is worth mentioning that the sufficient descent condition is not necessary in our convergence analysis.
4. Numerical results. In this section, we report the preliminary computational performance of the SMCGMA, implemented in Matlab, for solving unconstrained optimization problems.
The experiments are run on a personal computer with a 64-bit processor, a 2.5 GHz CPU and 4 GB of RAM. All the codes are written in the Matlab language and run in this environment.
All the test functions are unconstrained problems from the CUTEr library [7] and [2]; they are listed in Table 1. We compare our algorithm SMCGMA with the CG-Descent and the DYCG, described as follows:
• CG-Descent: a conjugate gradient algorithm with guaranteed descent proposed by Hager and Zhang in [8], which has proved excellent in recent years.
• DYCG: a conjugate gradient algorithm with a strong global convergence property proposed by Dai and Yuan in [3]. Our Algorithm SMCGMA is obtained by scaling the DYCG, and the search direction of the SMCGMA is closely related to that of the DYCG.
To make the comparison as fair as possible, all the step sizes in the three algorithms satisfy the Wolfe conditions (5) and (6). To terminate the executions, we use the criterion $\|g_k\| \leq 10^{-3}$ and limit the number of iterations to 500 in all these algorithms. The parameters in Algorithm SMCGMA are chosen as described in Step 1. In order to observe the performance of the SMCGMA, the CG-Descent and the DYCG, we test each function in three different dimensions: $n = 10^2$, $n = 10^3$ and $n = 10^4$.
We use the performance profile proposed by Dolan and Moré [5] to compare the efficiency of these algorithms. The performance profiles are displayed on a $\log_2$ scale. For each dimension, we plot two figures based on the CPU time and the number of iterations, respectively. We can see from Figures 1 and 2 that the SMCGMA, the DYCG and the CG-Descent perform similarly for $n = 10^2$. For $n = 10^3$, Figure 3 shows no large discrepancies in CPU time, since each curve crosses the other two, but Figure 4 makes it clear that the SMCGMA needs fewer iterations than the DYCG and the CG-Descent on most of the problems.
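The Dolan-Moré profile used in these comparisons is straightforward to reproduce. The sketch below computes $\rho_s(\tau)$, the fraction of problems on which solver $s$ is within a factor $2^\tau$ of the best solver, from a cost table; it assumes failures are encoded as `np.inf` and that at least one solver succeeds on each problem. The names are ours, not from [5].

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile on a log2 scale.

    T    : (num_problems, num_solvers) array of costs (CPU time or
           iteration counts), with np.inf marking a failure
    taus : 1-D array of tau values
    Returns rho with rho[j, s] = fraction of problems on which solver s
    is within a factor 2**taus[j] of the best solver.
    """
    best = T.min(axis=1, keepdims=True)            # best cost per problem
    log_ratio = np.log2(T / best)                  # log2 performance ratios
    return np.array([(log_ratio <= tau).mean(axis=0) for tau in taus])
```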
If we extend the test dimension $n$ to $10^4$, then the SMCGMA stands out in Figures 5 and 6. This can be concluded from the fact that the curves of the SMCGMA lie above those of the other two methods, showing that the SMCGMA has the highest probability of being the optimal solver. An observation that emerges from these figures is that the SMCGMA may be competitive for large-scale problems.

5. Conclusions. We have proposed the SMCGMA for solving nonlinear unconstrained optimization problems. It is a new method that combines the conjugate gradient method with the moving asymptotes method. Numerical results show that the SMCGMA is comparable to the DYCG and the CG-Descent, and that the SMCGMA may be more suitable for large-scale unconstrained optimization problems.
The SMCGMA is obtained by scaling the DYCG. From the numerical results we can say that the Dai-Yuan conjugate gradient method can be improved by the scaling technique. Competitive algorithms may be obtained by scaling other conjugate gradient methods in the same way. Hence, this scaling technique deserves further study.