MULTI-STEP SPECTRAL GRADIENT METHODS WITH MODIFIED WEAK SECANT RELATION FOR LARGE SCALE UNCONSTRAINED OPTIMIZATION

Abstract. In this paper, we propose some spectral gradient methods derived via a variational technique under a log-determinant norm. The spectral parameters satisfy modified weak secant relations inspired by multi-step approximations for solving large scale unconstrained optimization problems. An executable code is developed to compare the efficiency of the proposed methods against the spectral gradient method that uses the standard weak secant relation as constraint. Numerical results are presented which suggest that a better performance has been achieved.

1. Introduction. We consider the unconstrained minimization problem

$\min_{x \in \mathbb{R}^n} f(x),$   (1)

where $f$ is twice continuously differentiable and its gradient is available. We are interested in the large scale case, say $n$ greater than 10000, for which the Hessian of $f$ is either not available or requires a large amount of storage. The steepest descent method is the most straightforward optimization tool for solving large scale unconstrained optimization problems. However, the steepest descent method is relatively slow when it is close to the minimum. For ill-conditioned problems, steepest descent increasingly 'zigzags', since the gradient points nearly orthogonally to the shortest direction to a minimum point. Quasi-Newton methods were introduced to overcome this deficiency; their popularity is attributed to the fact that no actual Hessian is required by the algorithm. However, storage for the matrices approximating the Hessian is still needed. La Cruz and Raydan (2003) and La Cruz et al. (2006) extended the spectral approach to the steepest descent direction for unconstrained nonlinear optimization. The main advantage of spectral methods is that no second order information is needed to form the search direction; therefore, the Hessian is not required explicitly and a low computational cost is expected. Instead, a scaling parameter that incorporates certain second order information is used to scale the steepest descent direction. In summary, spectral gradient methods are low-cost nonmonotone schemes for finding local minimizers.
The spectral gradient method has the interesting feature that line searches can be avoided: the steplength is determined by a Rayleigh quotient approximating an eigenvalue of the Hessian of the function to be minimized. The first spectral gradient method was proposed by Barzilai and Borwein in 1988 and is known as the BB method. The iterations are defined by

$x_{k+1} = x_k - \sigma_k^{-1} g_k, \quad \text{where } \sigma_k = \frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}},$   (2)

$s_{k-1} = x_k - x_{k-1}$, $y_{k-1} = g_k - g_{k-1}$, and $g_k$ is the gradient of $f$ evaluated at $x_k$. Note that the inverse of the steplength, $\sigma_k$, is a Rayleigh quotient corresponding to the average Hessian matrix $\int_0^1 \nabla^2 f(x_{k-1} + t s_{k-1})\, dt$. Compared to a line search, less computational work is needed to obtain the steplength in this way, and it actually incorporates second order information into the search direction. Ford and Moghrabi (1994) proposed another type of modified secant relation that involves a multi-step construction by means of interpolating polynomials, leading to a generalization of the secant equation. The positive-definiteness of the Hessian approximations is shown to depend solely on a generalized version of the condition required to hold in the original 'single-step' methods. Hence, in this paper, we propose some multi-spectral gradient methods via a variational technique under a log-determinant measure such that the spectral parameters satisfy the modified weak secant relations.
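To make the iteration in (2) concrete, the following is a minimal Python sketch of the BB method; the quadratic test function, the starting point, and the unit fallback value of $\sigma$ on the first step are our own illustrative choices, not prescriptions from this paper.

```python
import numpy as np

def bb_method(grad, x0, tol=1e-6, max_iter=1000):
    """Barzilai-Borwein iteration: x_{k+1} = x_k - (1/sigma_k) g_k,
    with sigma_k = (s^T y)/(s^T s), a Rayleigh quotient of the average Hessian."""
    x = x0.astype(float)
    g = grad(x)
    sigma = 1.0                      # fallback spectral parameter for the first step
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        x_new = x - g / sigma        # spectral gradient step
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if s @ y > 0:                # keep sigma positive
            sigma = (s @ y) / (s @ s)
        x, g = x_new, g_new
    return x

# Example: strictly convex quadratic f(x) = 0.5 x^T A x - b^T x
A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x_star = bb_method(lambda x: A @ x - b, np.zeros(3))
```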
Our paper is organized as follows. In Section 2, we give some background on the spectral gradient method. In Section 3, the modification of the weak secant relations is discussed. In Section 4, the formulation of the proposed multi-spectral parameters is given. In Section 5, we compare the proposed multi-spectral methods with modified weak secant relations against the multi-spectral method with the standard weak secant relation, using the unconstrained problems in the test problem library CUTE (Constrained and Unconstrained Testing Environment). In Section 6, we draw the conclusions of the paper based upon our findings.
2. Spectral Gradient Method for Convex Quadratic Minimization. Consider the quadratic minimization problem

$\min_{x \in \mathbb{R}^n} \; \frac{1}{2} x^T A x - b^T x,$

where the Hessian matrix $A$ is assumed to be symmetric and positive definite, and the gradient is $g_k = A x_k - b$. Newton's method has an iterative formula of the form $x_{k+1} = x_k + d_k$, where $d_k = -A^{-1} g_k$. Motivated by the above analysis, we choose $\sigma_k$ so that $-\sigma_k^{-1} g_k = -(\sigma_k I)^{-1} g_k$ is an approximation of $d_k$ in some sense. First, define $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. Then the matrix $A$ satisfies the relation

$A s_k = y_k.$   (3)

Since it is inappropriate for a multiple of the identity to satisfy (3), we will choose $\sigma_k$ such that it satisfies some weaker form of (3); we shall discuss this aspect further in the later part of this paper. For the quadratic function mentioned above, we can assume without loss of generality that an orthogonal transformation has been made to transform $A$ into a diagonal matrix containing only its eigenvalues $\lambda_i$. Besides that, if there are any eigenvalues of multiplicity $m > 1$, then we can choose the corresponding eigenvectors so that $g_1^{(i)} = 0$ for at least $m - 1$ of the corresponding indices. Using $A = \mathrm{diag}(\lambda_i)$, (2) and the properties of a quadratic function, we obtain

$g_{k+1}^{(i)} = \left(1 - \sigma_k^{-1} \lambda_i\right) g_k^{(i)}.$   (4)

It is clear that if $g_{\bar{k}}^{(i)} = 0$ for some $i$ and $k = \bar{k}$, then this property will persist for all $k > \bar{k}$. Thus, without any loss of generality, we can assume that $A$ has distinct eigenvalues and that $g_1^{(i)} \neq 0$ for all $i = 1, 2, 3, \cdots, n$. From these conditions and (4), we can easily deduce that if $\sigma_k$ is equal to any eigenvalue $\lambda_i$, then $g_{k+1}^{(i)} = 0$ and this property persists subsequently. If both

$g_k^{(1)} \neq 0 \quad \text{and} \quad g_k^{(n)} \neq 0,$   (6)

then it follows from (4) and the extremal properties of the Rayleigh quotient that

$\lambda_1 < \sigma_{k+1} < \lambda_n.$   (7)

Hence, assuming that $\sigma_1$ is not equal to $\lambda_1$ or $\lambda_n$, a simple inductive argument shows that, for the BB method, (6) and (7) hold for all $k > 1$. It also follows that the BB method does not have the property of finite termination. Since the eigenvalues are distinct, it is reasonable to use a set of different values of $\sigma$, so that they can better cover the spectrum of $A$. Hence, this motivates us to choose a diagonal matrix $\mathrm{diag}(\sigma_k^{(i)})$ to approximate $\mathrm{diag}(\lambda_i)$. We now extend the discussion from the quadratic problem to the nonquadratic unconstrained large scale optimization problem (1).
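Before moving on, a minimal numerical check of the property derived in (4); the eigenvalues and the starting gradient $g_1$ below are our own illustrative choices. Setting $\sigma_k$ equal to an eigenvalue $\lambda_i$ annihilates the $i$-th gradient component, and that component remains zero in all later iterations.

```python
import numpy as np

# Diagonalized quadratic: g_{k+1} = (I - A/sigma_k) g_k, with A = diag(lambda_i)
lam = np.array([1.0, 4.0, 9.0])
g = np.array([1.0, 1.0, 1.0])      # g_1, all components nonzero

for sigma in [4.0, 9.0, 2.5]:      # sigma = lambda_2, then lambda_3, then arbitrary
    g = (1.0 - lam / sigma) * g
    print(sigma, g)
# After sigma = 4.0 the second component is 0 and stays 0 in later iterations.
```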
To solve the optimization problem (1), we incorporate a line search into the iterative method (2) to yield

$x_{k+1} = x_k + \alpha_k d_k,$

where $\alpha_k > 0$ is a steplength calculated to satisfy certain line search conditions, such as the Armijo condition. A steplength $\alpha_k$ is said to satisfy the Armijo condition if the following inequality holds:

$f(x_k + \alpha_k d_k) \leq f(x_k) + c\, \alpha_k g_k^T d_k,$

where $0 < c < 1$.
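The Armijo condition is typically enforced by backtracking: start from a trial steplength and shrink it until the inequality holds. The sketch below assumes numpy arrays; the initial trial step $\alpha_0 = 1$, the shrinking factor 0.5, and $c = 10^{-4}$ are conventional illustrative choices of ours, not values specified in this paper.

```python
def armijo_backtracking(f, g_k, x_k, d_k, alpha0=1.0, c=1e-4, shrink=0.5, max_tries=50):
    """Shrink alpha until f(x_k + alpha d_k) <= f(x_k) + c * alpha * g_k^T d_k."""
    f_k = f(x_k)
    slope = g_k @ d_k            # must be negative for a descent direction
    alpha = alpha0
    for _ in range(max_tries):
        if f(x_k + alpha * d_k) <= f_k + c * alpha * slope:
            return alpha
        alpha *= shrink
    return alpha                 # fallback: return the smallest trial step
```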
There are various choices for the search direction $d_k$. In this paper, however, we will only focus on the spectral gradient method, where the search direction is given by

$d_k = -B_k^{-1} g_k,$

where the matrix $B_k$ is updated at every iteration and the sequence of matrices $\{B_k\}$ is required to satisfy some weaker form of the secant equation.
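Combining the pieces so far, a skeletal spectral gradient loop might look as follows. The update rule for the diagonal of $B_k$ is deliberately left abstract (any update satisfying a weak secant relation can be plugged in via update_B), and armijo_backtracking refers to the sketch above; the whole loop is our own illustration of the framework, not the paper's exact implementation.

```python
import numpy as np

def spectral_gradient(f, grad, x0, update_B, tol=1e-4, max_iter=10000):
    """Generic loop: d_k = -B_k^{-1} g_k, x_{k+1} = x_k + alpha_k d_k.
    update_B(s, y) returns the diagonal of the next B_k."""
    x = x0.astype(float)
    g = grad(x)
    B_diag = np.ones_like(x)                  # B_0 = I
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        d = -g / B_diag                       # d_k = -B_k^{-1} g_k (B_k diagonal)
        alpha = armijo_backtracking(f, g, x, d)
        x_new = x + alpha * d
        g_new = grad(x_new)
        B_diag = update_B(x_new - x, g_new - g)
        x, g = x_new, g_new
    return x
```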
In most derivations of quasi-Newton updates, a weighted Frobenius norm is used to measure the deviation between the current and the updated matrix. For instance, the Davidon-Fletcher-Powell (DFP) method is derived via the following variational problem:

$\min_{B} \|B - B_k\|_W \quad \text{subject to} \quad B = B^T, \; B s_k = y_k,$

where $\|\cdot\|_W$ denotes a weighted Frobenius norm. The unique solution of this problem is

$B_{k+1} = \left(I - \rho_k y_k s_k^T\right) B_k \left(I - \rho_k s_k y_k^T\right) + \rho_k y_k y_k^T, \quad \rho_k = \frac{1}{y_k^T s_k}.$

Now, working with the inverse Hessian approximation $H_k$ in place of $B_k$, the variational problem becomes

$\min_{H} \|H - H_k\|_W \quad \text{subject to} \quad H = H^T, \; H y_k = s_k,$

and the solution of this variational problem is

$H_{k+1} = \left(I - \rho_k s_k y_k^T\right) H_k \left(I - \rho_k y_k s_k^T\right) + \rho_k s_k s_k^T.$

This is the well-known Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. Besides the weighted Frobenius norm, Byrd and Nocedal (1989) simplified the convergence proofs for the BFGS update by working simultaneously with the trace (tr) and determinant (det) of $B_k$. For this purpose, they defined, for any positive definite matrix $B$, the measure

$\psi(B) = \mathrm{tr}(B) - \ln(\det(B)),$

where ln denotes the natural logarithm. Motivated by this measure, we propose to derive a spectral gradient type update such that it satisfies the weak secant relation defined by Dennis and Wolkowicz in 1993:

$s_k^T B_{k+1} s_k = s_k^T y_k.$

Besides that, we also consider the derivation of spectral gradient methods using modified weak secant relations.
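For concreteness, the BFGS inverse update above can be written in a few lines. This is a standard textbook formula, shown only as a reference point before we move to the diagonal, weak-secant-based updates proposed in this paper.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """BFGS update of the inverse Hessian approximation:
    H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, with rho = 1/(y^T s)."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```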

3. Modification of Weak Secant Relations. The secant equation is derived based on a single-step method. Ford and Moghrabi (1994) proposed multi-step methods to derive a modified secant equation. We now provide the derivation of the modified secant equation using two-step and three-step approximations. Suppose that, in addition to the two current iterates $x_i$ and $x_{i+1}$, the $m - 1$ most recent points $x_{i-m+1}, x_{i-m+2}, \cdots, x_{i-1}$ generated by some algorithm, together with the corresponding gradient values, are available. Let $\nu$ denote a differentiable path $\{x(\tau)\}$ in $\mathbb{R}^n$, where $\tau \in \mathbb{R}$. Thus, the path $\nu$ is a polynomial $x(\tau)$ of degree $m$ which interpolates these points and may be defined by

$x(\tau_k) = x_{i-m+k+1}, \quad k = 0, 1, \ldots, m.$

The precise form of this polynomial will depend upon the values $\{\tau_k\}_{k=0}^m$ which the variable $\tau$ is chosen to be assigned in order to correspond to the iterates $\{x_{i-m+k+1}\}_{k=0}^m$. From consideration of the basic case in the derivation of the secant equation (that is, $m = 1$, $\tau_0 = 0$ and $\tau_1 = 1$), a natural choice is to retain a unit spacing of the $\tau$-values:

$\tau_k = k - m + 1, \quad k = 0, 1, \ldots, m.$

It is convenient to represent the interpolating curve $\nu$ in its Lagrangian form

$x(\tau) = \sum_{k=0}^{m} L_k(\tau)\, x_{i-m+k+1},$   (11)

where $L_k(\tau)$ is the standard Lagrangian polynomial

$L_k(\tau) = \prod_{j=0,\, j \neq k}^{m} \frac{\tau - \tau_j}{\tau_k - \tau_j}.$

If $g(x(\tau))$ is considered as a function of $\tau$, the obvious course of action is then to approximate it by the corresponding interpolatory polynomial. This polynomial is based on the values of the gradient arising from the iterates $\{x_{i-m+k+1}\}_{k=0}^m$ (and thus the values $\{\tau_k\}_{k=0}^m$) used in the construction of the path $\nu$. Hence, from (11), the following can be obtained:

$g(x(\tau)) \approx \sum_{k=0}^{m} L_k(\tau)\, g(x_{i-m+k+1}).$   (12)

The Newton equation is defined as follows:

$\frac{dg}{d\tau}\Big|_{\tau = \tau^*} = G(x(\tau^*))\, \frac{dx}{d\tau}\Big|_{\tau = \tau^*},$   (13)

where $G$ is defined as the Hessian matrix. Since $\tau = \tau_m = 1$ corresponds to $x_{i+1}$, the Newton equation (13) will be applied with $\tau^* = \tau_m$ in order to derive a relation satisfied (approximately) by $G(x_{i+1})$. From (11),

$r_i \equiv \frac{dx}{d\tau}\Big|_{\tau = \tau_m} = \sum_{k=0}^{m} L_k'(\tau_m)\, x_{i-m+k+1},$   (14)

and the values of the coefficients $\{L_k'(\tau_m)\}_{k=0}^m$ are readily available from tables [6], since they arise from numerical differentiation performed on equally-spaced data. Explicitly,

$L_k'(\tau_m) = \frac{(-1)^{m-k}}{m-k}\binom{m}{k} \; \text{ for } k < m, \quad \text{and} \quad L_m'(\tau_m) = \sum_{j=1}^{m} \frac{1}{j}.$

The coefficients may, evidently, be employed to form an estimate for the derivative of $g$, via differentiation of (12):

$w_i \equiv \frac{dg}{d\tau}\Big|_{\tau = \tau_m} \approx \sum_{k=0}^{m} L_k'(\tau_m)\, g(x_{i-m+k+1}).$   (15)

Thus, by analogy with the secant equation, the condition

$B_{i+1} r_i = w_i$

is derived on the new Hessian approximation $B_{i+1}$, as a replacement for the condition imposed by the secant equation. This condition for $B_{i+1}$ may therefore be met by selecting any standard quasi-Newton formula which satisfies the secant equation, and then replacing $s_i$ and $y_i$ with $r_i$ and $w_i$, respectively. A more useful and slightly more compact representation of $r_i$ and $w_i$ may be derived as follows. By differentiating the identity $\sum_{k=0}^{m} L_k(\tau) = 1$ and setting $\tau = \tau_m$, the following can be obtained:

$\sum_{k=0}^{m} L_k'(\tau_m) = 0.$   (16)

Thus, concentrating on the representation of $r_i$ (to be definite), (14) gives $r_i$ in terms of the iterates. Using (16) to eliminate one of them, $r_i$ may be expressed in terms of the step vectors:

$r_i = \sum_{j=0}^{m-1} \Big( \sum_{k=m-j}^{m} L_k'(\tau_m) \Big) s_{i-j}, \quad \text{where } s_{i-j} = x_{i-j+1} - x_{i-j}.$
Similarly, a corresponding representation of $w_i$ in terms of $\{y_{i-j}\}_{j=0}^{m-1}$ may be derived. It follows that the vectors $r_i$ and $w_i$ required for updating $B_i$ to produce $B_{i+1}$ may be found from the $m$ most recent step vectors $\{s_{i-j}\}_{j=0}^{m-1}$ and $\{y_{i-j}\}_{j=0}^{m-1}$, respectively. The multi-step weak secant equations are then as follows.

Two-step:

$r_k^T B_{k+1} r_k = r_k^T w_k, \quad \text{where } r_k = \tfrac{3}{2} s_k - \tfrac{1}{2} s_{k-1}, \; w_k = \tfrac{3}{2} y_k - \tfrac{1}{2} y_{k-1}.$   (17)

Three-step:

$r_k^T B_{k+1} r_k = r_k^T w_k, \quad \text{where } r_k = \tfrac{11}{6} s_k - \tfrac{7}{6} s_{k-1} + \tfrac{1}{3} s_{k-2}, \; w_k = \tfrac{11}{6} y_k - \tfrac{7}{6} y_{k-1} + \tfrac{1}{3} y_{k-2}.$   (18)
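As an illustration, the multi-step vectors in (17) and (18) can be assembled from the $m$ most recent step and gradient-difference vectors. The helper below is our own sketch of that bookkeeping; the hard-coded weights are the Lagrange derivative coefficients $L_k'(\tau_m)$ for unit-spaced $\tau$-values, as derived above.

```python
import numpy as np

def multistep_vectors(s_hist, y_hist, m=2):
    """Build (r_k, w_k) from the m most recent steps, per (17)-(18).
    s_hist, y_hist: lists of numpy arrays with the newest vectors last."""
    if m == 2:
        coeffs = [1.5, -0.5]               # two-step weights
    elif m == 3:
        coeffs = [11/6, -7/6, 1/3]         # three-step weights
    else:
        raise ValueError("only m = 2 or 3 supported here")
    r = sum(c * s for c, s in zip(coeffs, reversed(s_hist[-m:])))
    w = sum(c * y for c, y in zip(coeffs, reversed(y_hist[-m:])))
    return r, w
```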
4. Formulation for Multi-Spectral Parameters. The purpose of the derivation is to construct $B_{k+1}$ such that it satisfies $s_k^T B_{k+1} s_k = s_k^T y_k$, where $s_k$ and $y_k$ are taken to be the multi-step vectors given by either (17) or (18). Hence, we consider the following approach:

$\min \; \psi(B_{k+1}) \quad \text{subject to} \quad s_k^T B_{k+1} s_k = s_k^T y_k,$   (19)

where $B_{k+1}$ is assumed to be diagonal and positive definite. Let $B_{k+1} = \mathrm{diag}(r_k^{(1)}, \cdots, r_k^{(n)})$. Then, the minimization becomes

$\min \; \sum_{i=1}^{n} r_k^{(i)} - \sum_{i=1}^{n} \ln r_k^{(i)}$   (20)

subject to

$\sum_{i=1}^{n} \big(s_k^{(i)}\big)^2 r_k^{(i)} = s_k^T y_k.$   (21)

The Lagrangian formed from (20) and (21) will become

$\mathcal{L}\big(r_k^{(1)}, \ldots, r_k^{(n)}, \lambda\big) = \sum_{i=1}^{n} r_k^{(i)} - \sum_{i=1}^{n} \ln r_k^{(i)} + \lambda \Big( \sum_{i=1}^{n} \big(s_k^{(i)}\big)^2 r_k^{(i)} - s_k^T y_k \Big).$   (22)

In order to obtain the minimizer, we differentiate (22) with respect to $r_k^{(1)}, r_k^{(2)}, \cdots, r_k^{(n)}$ and set the resulting derivatives to zero:

$1 - \frac{1}{r_k^{(i)}} + \lambda \big(s_k^{(i)}\big)^2 = 0,$   (23)

which yields

$r_k^{(i)} = \frac{1}{1 + \lambda \big(s_k^{(i)}\big)^2}.$   (24)

Now, by substituting (24) into the constraint (19), we have

$F(\lambda) = \sum_{i=1}^{n} \frac{\big(s_k^{(i)}\big)^2}{1 + \lambda \big(s_k^{(i)}\big)^2} - s_k^T y_k = 0.$   (25)

Thus, $\lambda$ can be obtained by solving the nonlinear equation $F(\lambda) = 0$. It is not practical to solve this equation exactly; hence, we approximate its solution by using only one Newton-Raphson iteration from $\lambda = 0$. Since $F(0) = s_k^T s_k - s_k^T y_k$ and $F'(0) = -\sum_{i=1}^{n} \big(s_k^{(i)}\big)^4$, the Lagrange multiplier $\lambda_k$ is approximated by

$\lambda_k \approx -\frac{F(0)}{F'(0)} = \frac{s_k^T s_k - s_k^T y_k}{\sum_{i=1}^{n} \big(s_k^{(i)}\big)^4}.$   (26)

Finally, we obtain the updating formula for $B_{k+1}$ as follows:

$B_{k+1} = \mathrm{diag}\big(r_k^{(1)}, \cdots, r_k^{(n)}\big), \quad \text{where } r_k^{(i)} = \frac{1}{1 + \lambda_k \big(s_k^{(i)}\big)^2}.$   (27)

The algorithm for solving the optimization problems is the same as the spectral gradient algorithm; the only difference is that, instead of using $\theta I$ (where $\theta$ is a single spectral parameter) as the updating formula, we use $B_{k+1}$.
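A compact sketch of the resulting diagonal update is given below: it computes $\lambda_k$ by the single Newton-Raphson step (26) from $\lambda = 0$ and forms the diagonal of $B_{k+1}$ from (24). The positivity safeguard is our own addition; pass the multi-step vectors $r_k$, $w_k$ from (17) or (18) in place of $s_k$, $y_k$. This plugs directly into the skeletal loop sketched earlier as update_B.

```python
import numpy as np

def diagonal_spectral_update(s, y):
    """Diagonal B_{k+1} via (24)-(27): r_i = 1 / (1 + lambda * s_i^2),
    with lambda from one Newton-Raphson step on F(lambda) = 0 at lambda = 0."""
    s2 = s * s
    lam = (s @ s - s @ y) / np.sum(s2 * s2)    # (26)
    diag = 1.0 / (1.0 + lam * s2)              # (24)
    # Safeguard (our addition): fall back to the identity if positivity fails.
    if np.any(diag <= 0) or not np.all(np.isfinite(diag)):
        diag = np.ones_like(s)
    return diag                                # diagonal of B_{k+1}

# Search direction for the next spectral gradient step:
# d = -g / diagonal_spectral_update(r, w)
```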

5. Numerical Results and Discussion. In order to test the efficiency of the proposed methods, the modified diagonal spectral methods are compared with the standard diagonal spectral method as follows:
(1). MSG - multiple spectral gradient method using the standard weak secant relation.
(2). MSG2S - multiple spectral gradient method using (17) as the weak secant relation.
(3). MSG3S - multiple spectral gradient method using (18) as the weak secant relation.
A set of 96 test problems from CUTE, presented in Andrei (2008), has been tested, with dimensions varying from 10 to 100000. We compared the methods in terms of the number of iterations, function calls and CPU times. Default values are used for all the other parameters, and the stopping criterion is $\|g_k\| \leq \epsilon$ for a small prescribed tolerance $\epsilon$. We also set the upper bound on the number of iterations to 10000; whenever the number of iterations exceeds this bound, we declare the run a failure. The codes are written in MATLAB. Figure 1 shows the performance of the methods considered in terms of the number of iterations. From the figure, it can be seen that the MSG2S and MSG3S methods required fewer iterations on average to reach the optimal solutions than the MSG method. This is mainly due to the fact that MSG2S and MSG3S can interpolate a general function more accurately.
In terms of function counts (Figure 2), MSG2S and MSG3S lead among the methods: Figure 2 shows that the MSG2S and MSG3S methods required far fewer function evaluations on average than the MSG method, since both methods required fewer iterations. Lastly, in terms of execution time, the comparative results are consistent with those for function/gradient calls, with the MSG2S and MSG3S methods generally performing the best.
In conclusion, MSG3S outperforms the other two methods not only in terms of the number of iterations, but also in the number of function calls and computational time.

6. Conclusion. In this paper, some improved multi-spectral gradient methods were derived by using the modified weak secant relations inspired by the work of Ford and Moghrabi (1994). The standard weak secant relation imposes the curvature condition $s_k^T B_{k+1} s_k = s_k^T y_k$, whereas the modified weak secant relations are represented by (17) and (18). The derivation of the methods is similar, with the standard weak relation replaced by the modified weak relations.
The proposed methods were then tested against the standard multiple spectral gradient method. In general, it can be observed that the MSG2S and MSG3S methods performed better than the standard method, owing to the ability of the multi-step approximations to interpolate a general function more accurately and thus reduce the error. To sum up, the modified spectral gradient methods can be considered a good alternative for solving large scale unconstrained optimization problems.