A SURVEY OF GRADIENT METHODS FOR SOLVING NONLINEAR OPTIMIZATION

The paper surveys, classifies and investigates, theoretically and numerically, the main classes of line search methods for unconstrained optimization. Quasi-Newton (QN) and conjugate gradient (CG) methods are considered as representative classes of effective numerical methods for solving large-scale unconstrained optimization problems. In this paper, we investigate, classify and compare the main QN and CG methods in order to present a global overview of scientific advances in this field. Some of the most recent trends in this field are also presented. A number of numerical experiments is performed with the aim of giving an experimentally grounded answer concerning the mutual comparison of different QN and CG methods.

1. Introduction and preliminaries. In this survey, we focus on solving the unconstrained nonlinear optimization problem

min f(x), x ∈ R^n, (1.1)

where the function f : R^n → R is continuously differentiable and bounded from below. The general iterative rule for solving (1.1) starts from an initial approximation x_0 ∈ R^n and generates a sequence {x_k, k ≥ 0} using the general iterative scheme

x_{k+1} = x_k + α_k d_k, k ≥ 0, (1.2)

where the step-size α_k is a positive real parameter determined by an exact or inexact line search, x_k is the last generated iterative point, x_{k+1} is the current iterative point, and d_k is an appropriate search direction. The general class of algorithms of the form (1.2) is known as the class of line search algorithms. These algorithms require only the search direction d_k ∈ R^n and the step-size α_k ∈ R.
The following notations will be used, as usual:

g_k = ∇f(x_k), G_k = ∇²f(x_k), (1.3)

where ∇f(x) denotes the gradient and ∇²f(x) denotes the Hessian of f. For the sake of simplicity, the notation f_k will point to f(x_k). Further, x^T denotes the transpose of x ∈ R^n. Taylor's approximation of the function f at the point x_{k+1} = x_k + α_k d_k is defined by

f(x_{k+1}) ≈ f(x_k) + α_k g_k^T d_k.

Therefore, an appropriate descent search direction d_k must be determined on the basis of the descent condition

g_k^T d_k < 0, for all k. (1.4)

The primary choice for a descent direction is d_k = −g_k, which reduces the general line search iterations (1.2) to the gradient descent (GD) iterative scheme

x_{k+1} = x_k − α_k g_k. (1.5)

In this paper, we survey gradient methods satisfying the descent condition (1.4). If there exists a constant c > 0, independent of k, such that

g_k^T d_k ≤ −c ‖g_k‖², for all k, (1.6)

then it is said that the vector d_k satisfies the sufficient descent condition.
As for the choice of search direction, one possibility in unconstrained optimization is to move from the current point along the negative gradient in each iteration, which corresponds to d_k = −g_k. This choice of search direction leads to the class of methods known as gradient descent methods. One negative feature of gradient methods is the relatively frequent occurrence of the so-called zig-zagging phenomenon, which causes very slow convergence of GD algorithms to the optimal point, or even divergence [90].
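The slow convergence of the GD scheme (1.5) is easy to reproduce. The following minimal Python sketch (function names and the fixed step-size are ours, for illustration only; the survey itself uses line searches) runs GD on a tiny ill-conditioned quadratic, where on the order of a thousand iterations are needed:

```python
import numpy as np

def gradient_descent(grad, x0, alpha=0.02, tol=1e-8, max_iter=10_000):
    """Plain GD per (1.5): x_{k+1} = x_k - alpha * g_k with a fixed step-size.

    Illustrative sketch; returns the final point and the iteration count."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            return x, k
        x = x - alpha * g
    return x, max_iter

# f(x) = 0.5 x^T A x with condition number 50: convergence is linear and slow,
# governed by the contraction factor max_i |1 - alpha * lambda_i|.
A = np.diag([1.0, 50.0])
x_min, iters = gradient_descent(lambda x: A @ x, [1.0, 1.0])
```

Even in two dimensions the iteration count grows with the condition number, which is the numerical face of the zig-zagging phenomenon mentioned above.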
Advantages and disadvantages of GD methods can be summarized as follows.
1. GD methods are globally convergent, i.e., they converge to a local minimizer regardless of the starting point.
2. Many optimization methods switch to the GD rule when they do not make sufficient progress towards convergence.
3. The convergence is linear and usually very slow.
4. Numerically, GD methods are often not convergent.

Another important search direction is the Newton direction d_k = −G_k^{−1} g_k, obtained from the second-order Taylor expansion, assuming that the Hessian G_k is positive definite. The pure Newton method (without line search) for minimization of a function f : R^n → R is defined using a quadratic approximation of f(x_{k+1}):

f(x_k + d) ≈ Φ(d) = f(x_k) + g_k^T d + (1/2) d^T G_k d. (1.7)

The solution d_k = argmin_d Φ(d) is given by d_k = −G_k^{−1} g_k. So, the pure Newton method is defined by

x_{k+1} = x_k − G_k^{−1} g_k. (1.8)

The Newton method with line search uses an appropriate step-size α_k in (1.8) with the aim of ensuring global stability. The resulting iterations are of the form

x_{k+1} = x_k − α_k G_k^{−1} g_k, (1.9)

wherein the step-size α_k is computed by performing a line search. The Newton method exhibits three major drawbacks in practical applications.

1. The descent (and convergence) may not be achieved if the iterations (1.8) are started far away from the local minimizer.
2. Another drawback is the numerically expensive and tedious necessity to compute the second derivative matrix (Hessian) and its inverse in every iteration. Moreover, the second derivatives may sometimes be unavailable.
3. A further disadvantage of the Newton method is the possibility that the Hessian G_k is not positive definite.

Due to these drawbacks, numerous modifications were created, which can be broadly divided into two large groups: modified Newton methods and quasi-Newton (QN) methods. The QN methods are aimed at addressing all the above difficulties of the Newton method.
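The Newton method with line search described above can be sketched as follows; this is an illustrative implementation (names are ours), assuming the Hessian stays positive definite along the trajectory, with a simple Armijo backtracking in place of a full line search:

```python
import numpy as np

def newton_line_search(f, grad, hess, x0, tol=1e-10, max_iter=200):
    """Damped Newton iteration x_{k+1} = x_k + alpha_k d_k with d_k = -G_k^{-1} g_k.

    Sketch only: no Hessian modification is attempted when G_k is indefinite."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        d = np.linalg.solve(hess(x), -g)          # solve G_k d = -g_k
        alpha, rho, c1 = 1.0, 0.5, 1e-4
        fx = f(x)
        while f(x + alpha * d) > fx + c1 * alpha * (g @ d) and alpha > 1e-12:
            alpha *= rho                          # Armijo backtracking
        x = x + alpha * d
    return x

# Rosenbrock function, a classical test problem with minimizer (1, 1).
f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
grad = lambda x: np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                           200.0 * (x[1] - x[0]**2)])
hess = lambda x: np.array([[1200.0 * x[0]**2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                           [-400.0 * x[0], 200.0]])
x_star = newton_line_search(f, grad, hess, [-1.2, 1.0])
```

Far from the minimizer the backtracking damps the step (drawback 1), while each iteration pays the cost of forming and factorizing the Hessian (drawback 2).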
The first drawback is overcome by taking an appropriately defined positive definite matrix B_k that approximates the Hessian G_k, or an appropriately defined positive definite matrix H_k that approximates the true Hessian inverse G_k^{−1}, and then performing a line search in each iteration. For a given initial point x_0 ∈ R^n and a symmetric positive definite matrix H_0, the search direction in the kth iteration of a quasi-Newton method is defined by d_k = −H_k g_k, where H_k is a symmetric positive definite matrix.
QN methods and modified Newton methods belong to the most powerful methods for unconstrained optimization and are applicable to many nonlinear optimization problems. A survey of QN methods for solving nonlinear least-squares problems was given in [86]. Optimization methods have found a number of applications in fluid mechanics [44], free surface flow and solid body contact [13], finding the optimal trajectory for an aircraft or a robot arm, designing a portfolio of investments, controlling a chemical process, and computing the optimal shape of an automobile. See [90] for more details. A modification of the quasi-Newton method in defining the two-phase approximate greatest descent was used in [71]. Several variants of multistep spectral gradient methods for solving large-scale unconstrained optimization problems were proposed in [111]. The usage of optimization algorithms in artificial neural networks was considered in [75]. Properties of the Hessian matrix which appears in distributed gradient-based multi-agent control systems were considered in [117]. A survey of derivative-free optimization methods was given in [127]. An application of unconstrained optimization in solving the risk probability was presented in [76].
The study of conjugate gradient (CG) methods was started by Hestenes and Stiefel in 1952 in [60], and the development of CG methods for solving large-scale unconstrained optimization problems is still ongoing. After all these years, there is still a need to find a more efficient CG method that will solve unconstrained optimization problems with thousands of variables in the shortest possible time interval as well as with a minimal number of iterations and function evaluations.
CG methods construct a sequence of approximations x_k by the line search rule (1.2), such that the search directions d_k are generated by

d_k := d(β_k, g_k, d_{k−1}) = { −g_0, k = 0; −g_k + β_k d_{k−1}, k ≥ 1, } (1.10)

where β_k is a real value known as the conjugate gradient update parameter (CGUP). More precisely, the search direction d_k of a CG method is defined as a proper linear combination of the gradient descent direction and a multiple of the direction used in the previous iteration. From (1.10) and (1.2), it clearly follows that CG methods are defined simply by the gradient g_k and by the CGUP β_k. Different CG methods arise from different choices of the scalar β_k. According to common convention, β_k^M denotes the parameter β_k of the CG method M. It is important to mention that some researchers propose the usage of β_k^{M+} = max{β_k^M, 0}; it is thus possible to use β_k^{M+} instead of β_k^M and generate the corresponding dual method.
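As a minimal illustration of (1.10), the sketch below implements the CG recursion with the classical Fletcher-Reeves choice β_k = ‖g_k‖²/‖g_{k−1}‖² as one concrete CGUP, together with a simple Armijo backtracking step (practical CG codes use Wolfe-type line searches) and a steepest-descent restart whenever descent is lost; all names are ours, for illustration only:

```python
import numpy as np

def cg_fr(f, grad, x0, tol=1e-6, max_iter=20_000):
    """Nonlinear CG per (1.10): d_0 = -g_0, d_k = -g_k + beta_k d_{k-1},
    with the Fletcher-Reeves update beta_k = ||g_k||^2 / ||g_{k-1}||^2 (sketch)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        if g @ d >= 0:                 # safeguard: restart with -g if descent is lost
            d = -g
        alpha = 1.0
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d) and alpha > 1e-12:
            alpha *= 0.5               # Armijo backtracking
        x_new = x + alpha * d
        g_new = grad(x_new)
        beta = (g_new @ g_new) / (g @ g)   # the CGUP; other choices give other methods
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Convex quadratic f(x) = 0.5 x^T A x with condition number 100.
A = np.diag([1.0, 10.0, 100.0])
x_min = cg_fr(lambda x: 0.5 * x @ A @ x, lambda x: A @ x, [1.0, 1.0, 1.0])
```

Swapping the single line that computes `beta` is all that is needed to obtain the other CG methods surveyed below.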
The popularity of CG methods is confirmed by a number of recent surveys and book chapters [42,57,85,87,88]. In addition to this basic information on the chronological development of CG methods, it is also important to mention their applications. In general, CG methods are important in solving large-scale optimization problems. CG iterates are characterized by low memory requirements and strong local and global convergence properties. Based on this fact, these methods are useful in all areas where optimization problems of any kind are involved. CG methods have wide use in solving systems of equations and image restoration problems [12,16,70,78,128,135,136,140], the linear response eigenvalue problem [74], and in regression analysis [108,143]. In that way, CG methods have influenced the development of learning algorithms for artificial neural networks [46,75]. A unique approach to the ABS type CG methods was proposed in [1]. Applications of CG methods in solving very large symmetric positive semidefinite linear systems that appear in optimal surface parameterizations are described in [64]. Also, it is possible to mention applications in data analysis [110]. A variant of the projected preconditioned conjugate gradient method and its application in solving the linear response eigenvalue problem was investigated in [74].
The main goals of the current research can be highlighted as follows.
(1) A survey and specific classifications of CG and QN methods for nonlinear unconstrained optimization is presented.
(2) Convergence properties of CG methods are investigated.
(3) Specific numerical tests are performed on both the CG and QN methods. Numerical testing on some classes of CG methods and hybrid CG methods, as well as on some QN methods, is presented. A numerical experiment concerning the influence of the scalar t in Dai-Liao CG methods is performed and analysed. Also, gradient descent methods defined by an appropriate acceleration parameter are tested and compared.
The overall structure of the paper, based on the contents of each section, is described as follows. Section 1 describes basic notation, introductory notions, preliminaries and motivation. Global algorithms and various line search variants are presented in Section 2. An overview of QN methods and their classification are considered in Section 3. Section 4 gives a specific overview of CG methods. Convergence properties of the considered CG methods are investigated in Section 5. According to the presented taxonomy of basic CG methods, properties of CG methods with y_{k−1}^T g_k in the numerator of β_k are considered in Subsection 5.1, properties of CG methods involving ‖g_k‖² in the numerator of β_k are given in Subsection 5.2, while the convergence properties of DL methods are presented in Subsection 5.3. Numerical experiments are performed in Section 6. In detail, Subsection 6.1 arranges numerical results on QN methods with constant diagonal Hessian approximation, Subsection 6.2 compares numerically basic CG methods involving y_{k−1}^T g_k in the numerator of β_k, Subsection 6.3 compares basic CG methods with ‖g_k‖² in the numerator of β_k, while numerical experiments on the hybrid CG methods are presented in Subsection 6.4. Finally, Subsection 6.5 describes numerical experiments on the modified Dai-Liao methods. Concluding remarks are given in Section 7.
2. Global algorithms and line search variants. First, we present an algorithm that describes the general scheme of line search methods.

Algorithm 1 Global line search algorithm
Require: Objective function f(x), initial point x_0 ∈ R^n and the tolerance 0 < ε ≪ 1.
1: k := 0.
2: while ‖g_k‖ > ε do
3:   Determine the vector d_k which represents the search direction.
4:   Compute the step length α_k.
5:   Compute the new approximation of the minimum x_{k+1} := x_k + α_k d_k.
6:   k := k + 1.
7: end while
Ensure: x_{k+1}, f(x_{k+1}).

2.1. Line search procedures. To achieve the global convergence of iterative methods, an appropriate step-size α_k is required. The most promising at first glance is the exact line search (ELS), which considers the one-dimensional function

Φ(α) := f(x_k + αd_k) (2.1)

and defines the step-size by the one-dimensional optimization

α_k = argmin_{α>0} Φ(α). (2.2)

The ELS rule may give the most precise minimum. However, ELS is too expensive in practice, or even impossible to implement, especially in situations where the iteration is far from the exact solution.
Applying the iterative procedure (1.2), it is most logical to choose a new point so that the step length α_k reduces the value of the goal function:

f(x_k + α_k d_k) < f(x_k). (2.3)

Methods that in each iterative step require the fulfillment of condition (2.3), that is, a reduction of the value of the objective function, define iterations that in each step approach the minimum of the given function. In practice, inexact line search (ILS) procedures are used [4,8,19,50,55,56,109,126]. In most conjugate gradient methods, one of the following ILS procedures is used to calculate the step length α_k: the Wolfe line search developed in [55,56], the strong Wolfe line search, or the backtracking line search from [4]. In contrast to the monotone line search, the non-monotone line search is also known in the literature, where it is not necessary to reduce the value of the objective function in each iteration [51,52,53]. Although non-monotone techniques do not guarantee a decrease of the function value in each iteration, they are very common in practical applications and have very good convergence properties. A number of non-monotone line search methods have been proposed recently (see, for example, [119,120]).
2.1.1. Backtracking line search. A backtracking line search scheme based on the Armijo condition is aimed at determining a suitable step-size while moving along a given search direction. It starts with a relatively large step-size estimate and iteratively reduces the step-size value until a sufficient decrease in the value of the objective function is observed, measured against the local gradient of the goal function. Let β ∈ (0, 1), ϕ ∈ (0, 1) and α > 0 be given. Then there exists a smallest nonnegative integer m_k satisfying

f(x_k + β^{m_k} α d_k) ≤ f(x_k) + ϕ β^{m_k} α g_k^T d_k.

The procedure for backtracking line search proposed in [4] starts from the initial value α = 1, and its output values are defined such that the goal function decreases. Consequently, Algorithm 2 from [112] is used in numerical experiments as the implementation of the ILS which defines the principal step-size α_k.
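The backtracking scheme above can be sketched as follows; the function and parameter names (`beta`, `phi`) are illustrative, mirroring the constants in the Armijo condition rather than reproducing Algorithm 2 itself:

```python
import numpy as np

def backtracking(f, x, d, g, alpha0=1.0, beta=0.5, phi=1e-4):
    """Armijo backtracking: return alpha = beta^m * alpha0 for the smallest m with
    f(x + alpha d) <= f(x) + phi * alpha * g^T d  (illustrative sketch)."""
    fx, slope = f(x), g @ d
    alpha = alpha0
    while f(x + alpha * d) > fx + phi * alpha * slope and alpha > 1e-12:
        alpha *= beta          # reduce the trial step by the factor beta
    return alpha

# One-dimensional example f(z) = z^2 at x = 2 with d = -grad f(x):
f = lambda z: float(z @ z)
x = np.array([2.0])
g = 2.0 * x
alpha = backtracking(f, x, -g, g)
```

Here the full step α = 1 overshoots (f increases), so one halving suffices and α = 0.5 is returned.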
2.1.2. Goldstein and Wolfe line searches. In order to ensure a sufficient decrease of the objective function, the Goldstein rule for ILS requires the conditions

f(x_k) + (1 − η) α_k g_k^T d_k ≤ f(x_k + α_k d_k) ≤ f(x_k) + η α_k g_k^T d_k, 0 < η < 1/2.

The Wolfe conditions consist of the sufficient decrease condition

f(x_k + α_k d_k) ≤ f(x_k) + η α_k g_k^T d_k (2.7)

and the curvature condition

g(x_k + α_k d_k)^T d_k ≥ σ_1 g_k^T d_k,

where 0 < η < σ_1 < 1. In addition, for conjugate gradient methods, the generalized strong Wolfe conditions, which are a conjunction of (2.7) and

σ_1 g_k^T d_k ≤ g(x_k + α_k d_k)^T d_k ≤ −σ_2 g_k^T d_k,

are often used, where σ_2 > 0. In the case σ_1 = σ_2, the generalized strong Wolfe conditions reduce to the strong Wolfe conditions, which are a conjunction of (2.7) and

|g(x_k + α_k d_k)^T d_k| ≤ σ_1 |g_k^T d_k|. (2.10)

The condition (2.7) of the Wolfe conditions is called the Armijo condition, which is often used on its own or in the form of its variants.
3. Quasi-Newton methods and their classification. The most general iterative rule of QN type with line search is of the form

x_{k+1} = x_k − α_k H_k g_k, (3.1)

such that H_k is an approximation of the inverse Hessian G_k^{−1}. Further, it is assumed that B_k is an appropriately generated symmetric positive definite approximation of G_k [118]. The following notations are typical in defining an appropriate updating formula:

s_k = x_{k+1} − x_k, y_k = g_{k+1} − g_k. (3.2)

The update B_{k+1} of B_k is defined using the rule

B_{k+1} = B_k + E_k, (3.3)

where E_k is defined on the basis of the quasi-Newton property (secant condition)

B_{k+1} s_k = y_k. (3.4)

The quasi-Newton condition for the matrix H_k is given by

H_{k+1} y_k = s_k. (3.5)

Methods that require the calculation or approximation of the Hessian matrix and its inverse belong to the class of QN methods, as do their numerous modifications. The pure Newton method requires the calculation of the matrix of second derivatives, which is avoided in QN methods. As a consequence, the Newton method is computationally expensive and slow per iteration, while QN methods are computationally cheap and faster per iteration. On the other hand, the Newton method requires a smaller number of iterative steps and generates a more precise convergence path than QN methods.
3.1. Symmetric Rank-One update. The Symmetric Rank-One (SR1) update assumes the matrix H_{k+1} in the form

H_{k+1} = H_k + E_k,

where E_k is assumed to be a symmetric rank-one matrix. Therefore,

H_{k+1} = H_k + u_k v_k^T. (3.6)

The quasi-Newton condition (3.5) initiates

H_k y_k + u_k (v_k^T y_k) = s_k. (3.7)

The conclusion is that u_k must be in the direction s_k − H_k y_k. Suppose that s_k − H_k y_k ≠ 0 (otherwise H_k would satisfy the quasi-Newton equation) and that the vector v_k satisfies v_k^T y_k ≠ 0. Then, on the basis of (3.6) and (3.7), it follows that

H_{k+1} = H_k + (s_k − H_k y_k) v_k^T / (v_k^T y_k). (3.8)

The condition that the approximation H_{k+1} of the Hessian inverse is symmetric requires us to take v_k = s_k − H_k y_k, so it is obtained that

H_{k+1} = H_k + (s_k − H_k y_k)(s_k − H_k y_k)^T / ((s_k − H_k y_k)^T y_k), (3.9)

which is the Symmetric Rank One (SR1) update. For general functions, Conn, Gould and Toint in [20] proved that the sequence of SR1 Hessian approximations converges to the true Hessian provided that the steps are uniformly linearly independent.
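The SR1 update above is a one-line formula; the sketch below (illustrative names) adds the usual practical safeguard of skipping the update when the denominator is too small, which is an implementation choice rather than part of the derivation:

```python
import numpy as np

def sr1_update(H, s, y, r=1e-8):
    """SR1 inverse-Hessian update:
    H+ = H + (s - H y)(s - H y)^T / ((s - H y)^T y)   (sketch).

    The update is skipped when |(s - Hy)^T y| is tiny relative to
    ||s - Hy|| * ||y||, a common numerical safeguard."""
    v = s - H @ y
    denom = v @ y
    if abs(denom) <= r * np.linalg.norm(v) * np.linalg.norm(y):
        return H                      # skip: update undefined or unstable
    return H + np.outer(v, v) / denom

# The updated matrix satisfies the secant condition H+ y = s by construction.
s = np.array([1.0, 0.5])
y = np.array([0.3, 0.2])
H1 = sr1_update(np.eye(2), s, y)
```

Note that, unlike the rank-two updates below, SR1 does not guarantee that H_{k+1} stays positive definite.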
3.2. DFP update. The Hessian update B_{k+1} is defined as the solution to the problem

min ‖B − B_k‖_W subject to B = B^T, B s_k = y_k, (3.10)

where ‖·‖_W denotes an appropriately weighted Frobenius norm. The inverse Hessian update can be generated using the Sherman-Morrison-Woodbury formula. Moreover, the DFP update is known as a rank-two updating method; that is, H_{k+1} is formed by adding to H_k two symmetric matrices, each of rank one:

H_{k+1} = H_k + a u_k u_k^T + b v_k v_k^T,

where a, b are scalars. From the quasi-Newton condition (3.5) it follows that

H_k y_k + a u_k (u_k^T y_k) + b v_k (v_k^T y_k) = s_k. (3.11)

The vectors u_k and v_k are not unique, but they can obviously be determined in the following way:

u_k = s_k, v_k = H_k y_k.

So we get

H_{k+1} = H_k + s_k s_k^T / (s_k^T y_k) − H_k y_k y_k^T H_k / (y_k^T H_k y_k), (3.12)

where s_k and y_k are defined in (3.2).

3.3. BFGS update. The BFGS update formula for the Hessian matrix can be generated using the Sherman-Morrison-Woodbury formula. A rank-one modification (or perturbation) M = A + bc^* of a matrix A uses two vectors b ∈ C^{n×1} and c ∈ C^{n×1}. The Sherman-Morrison formula establishes a relationship between M^{−1} and A^{−1} as follows [45]:

M^{−1} = A^{−1} − A^{−1} b c^* A^{−1} / (1 + c^* A^{−1} b). (3.15)

As a result, the following update for B_k is obtained:

B_{k+1} = B_k + y_k y_k^T / (y_k^T s_k) − B_k s_k s_k^T B_k / (s_k^T B_k s_k),

whose inverse can be written as

H_{k+1} = (I − s_k y_k^T / (s_k^T y_k)) H_k (I − y_k s_k^T / (s_k^T y_k)) + s_k s_k^T / (s_k^T y_k). (3.14)

3.4. Broyden family of methods. Weighted combinations of the DFP and BFGS updates give a whole class of updates, known as the Broyden class. This class of updates is defined by

H_{k+1}^φ = (1 − φ) H_{k+1}^{DFP} + φ H_{k+1}^{BFGS}. (3.18)

If we put in (3.18):
- φ = 0, then we obtain the DFP update (3.12);
- φ = 1, then we obtain the BFGS update (3.14);
- φ = s_k^T y_k / ((s_k − H_k y_k)^T y_k), then we obtain the SR1 update (3.9).

The Broyden class of methods can be derived directly from the quasi-Newton equation. Consider the general rank-two updating formula which contains s_k and H_k y_k:

H_{k+1} = H_k + a s_k s_k^T + b (s_k y_k^T H_k + H_k y_k s_k^T) + c H_k y_k y_k^T H_k, (3.20)

where a, b, c are unknown scalars. Imposing the quasi-Newton condition (3.5), we obtain a system of two equations in the three unknowns a, b, c, (3.21) so we can introduce the substitution b = −φ / s_k^T y_k, where φ is a parameter. Solving system (3.21) and substituting the obtained result in (3.20), we obtain

H_{k+1} = H_k + s_k s_k^T / (s_k^T y_k) − H_k y_k y_k^T H_k / (y_k^T H_k y_k) + φ υ_k υ_k^T, (3.22)

where υ_k is defined by means of (3.19):

υ_k = (y_k^T H_k y_k)^{1/2} (s_k / (s_k^T y_k) − H_k y_k / (y_k^T H_k y_k)). (3.19)

The previous expression is identical to (3.18).
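The BFGS update in inverse (H) form can be sketched directly; the skip rule for nonpositive curvature below is a common practical safeguard (Wolfe-type line searches guarantee y^T s > 0, so the safeguard is only a fallback), and the names are illustrative:

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """BFGS update of the inverse-Hessian approximation (sketch):
    H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T,  rho = 1 / (y^T s).

    Skips the update when the curvature condition y^T s > 0 fails."""
    curv = y @ s
    if curv <= 1e-12:
        return H                          # safeguard: keep the old approximation
    rho = 1.0 / curv
    n = len(s)
    V = np.eye(n) - rho * np.outer(y, s)
    return V.T @ H @ V + rho * np.outer(s, s)

# The update enforces the secant condition H+ y = s and preserves
# symmetric positive definiteness whenever y^T s > 0.
s = np.array([1.0, 0.0])
y = np.array([0.4, 0.1])
H1 = bfgs_inverse_update(np.eye(2), s, y)
```

The same ten lines, with the roles of s and y exchanged, give the DFP update of B_k, reflecting the duality between the two formulas.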
3.5. Some modifications of quasi-Newton methods. A great effort has been invested in discovering QN methods that not only possess convergence but also outperform the BFGS update in numerical performance. Table 1 shows some of these modifications of the quasi-Newton equations. Table 1. Some modifications of quasi-Newton equations.
Also, it is important to mention the spectral gradient method, wherein the updating of B_{k+1} is done in the following way [111]:

B_{k+1} = (y_k^T s_k / s_k^T s_k) I.

The general framework of the QN algorithm is given in Algorithm 3.
3.6. QN methods based on constant diagonal matrix approximation. The general QN iterative rule with line search is given by (3.1) [118]. The update B_{k+1} of B_k is defined on the basis of the quasi-Newton property (3.4) or (3.5).
According to Brezinski's classification in [15], the structure of the updates B_k can be divided into three categories: a scalar matrix B_k = λ_k I, a diagonal matrix B_k = diag(λ_1, …, λ_n), and an appropriate full matrix B_k. Optimization methods included in this class of iterations are based on the simplest approximation of the Hessian and its inverse:

B_k = γ_k I, H_k = γ_k^{−1} I, (3.25)

where I is the n × n identity matrix and γ_k > 0 is a parameter. Such a choice leads to the iterative rule

x_{k+1} = x_k − α_k γ_k^{−1} g_k. (3.26)

Usually, the parameter α_k is defined using an available ILS, and γ_k is defined according to the Taylor expansion of f(x). The iterations (3.26) are termed improved gradient descent (IGD) methods in [63]. Andrei in [4,6] defined the iterations

x_{k+1} = x_k − θ_k α_k g_k. (3.27)

The usage of random values of θ_k was proposed in [6]. Later, Andrei in [4] proposed an appropriate algorithm for defining θ_k in (3.27), and the resulting iterative rule (3.27) was called Accelerated Gradient Descent (AGD) in [4] (3.28). A few modifications of the scheme (3.26) were promoted in [94,96,97,112,114]. The iterations defined in [112] are of the form (3.26), in which γ_k I is the Hessian approximation, where γ_k = γ(x_k, x_{k−1}) > 0 is a parameter. The SM method from [112] is defined by the iterations

x^{SM}_{k+1} = x^{SM}_k − α_k γ_k^{−1} g_k, (3.29)

where γ^{SM}_k > 0 is the acceleration parameter, defined using the Taylor expansion of the objective f at the point x^{SM}_{k+1} as follows:

γ^{SM}_{k+1} = 2 γ_k² [f(x^{SM}_{k+1}) − f(x^{SM}_k) + α_k γ_k^{−1} ‖g_k‖²] / (α_k² ‖g_k‖²). (3.30)

The accelerated double direction (ADD) and accelerated double step-size (ADSS) methods originated in [96] and [94], respectively.
The next iterations are known as Accelerated Double Step-Size (ADSS) iterations [94]:

x_{k+1} = x_k − (α_k γ_k^{−1} + l_k) g_k, (3.31)

where α_k and l_k are step-sizes derived by two independent backtracking procedures. The TADSS method from [114] is proposed using the assumption α_k + l_k = 1, which gives

x_{k+1} = x_k − (α_k (γ_k^{−1} − 1) + 1) g_k.

An application of the TADSS iterations in the aviation industry was investigated in [68].
The particular choice γ_k = 1 transforms the IGD iterations (3.26) into the GD iterative rule (1.5). Further, the IGD iterations (3.26) in the case α_k = 1 reduce to

x_{k+1} = x_k − γ_k^{−1} g_k, (3.32)

which can be viewed as GD iterations in which γ_k^{−1} becomes the primary step length and should be appropriately defined.
Barzilai and Borwein in [11] originated two well-known variants of the GD method, known as the BB method, with the step length γ^{BB}_k := γ_k^{−1} in (3.32). The step length in the first case is defined by the minimization min_γ ‖s_{k−1} − γ y_{k−1}‖², which yields

γ^{BB}_k = s_{k−1}^T y_{k−1} / ‖y_{k−1}‖². (3.33)

The symmetric case assumes the minimization min_γ ‖γ s_{k−1} − y_{k−1}‖², whose solution γ_k = s_{k−1}^T y_{k−1} / ‖s_{k−1}‖², after the inversion γ^{BB}_k = γ_k^{−1}, produces

γ^{BB}_k = ‖s_{k−1}‖² / (s_{k−1}^T y_{k−1}). (3.34)

The BB iterations are defined using γ^{BB}_k as follows:

x_{k+1} = x_k − γ^{BB}_k g_k.

The BB method was improved in a number of research articles, the main of which are [22,24,34,35,36,37,106,107,122,137].
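A minimal sketch of the BB iteration x_{k+1} = x_k − γ^{BB}_k g_k follows; the first step-size and the positive-curvature safeguard are our illustrative choices, not prescribed by the method:

```python
import numpy as np

def bb_step(s, y, variant=1):
    """Barzilai-Borwein step lengths (sketch):
    variant 1: gamma = s^T y / y^T y   (from min_gamma ||s - gamma y||^2)
    variant 2: gamma = s^T s / s^T y   (from the symmetric problem)."""
    sy = s @ y
    if sy <= 1e-12:                   # safeguard: fall back to a unit step
        return 1.0
    return sy / (y @ y) if variant == 1 else (s @ s) / sy

def bb_method(grad, x0, tol=1e-8, max_iter=10_000):
    """BB iteration x_{k+1} = x_k - gamma_k g_k, without any line search."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    gamma = 1.0 / np.linalg.norm(g)   # simple, assumed choice for the first step
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        x_new = x - gamma * g
        g_new = grad(x_new)
        gamma = bb_step(x_new - x, g_new - g)
        x, g = x_new, g_new
    return x

A = np.diag([1.0, 10.0, 100.0])
x_star = bb_method(lambda x: A @ x, [1.0, 1.0, 1.0])
```

The function values are not forced to decrease monotonically, which is exactly why BB-type methods pair naturally with the non-monotone line searches mentioned in Section 2.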
Another member of the IGD class is the Scalar Correction (SC) method [84], which defines the step length γ^{SC}_k in (3.32) on the basis of s_{k−1} and y_{k−1}. Accordingly, the SC iterations are defined by the relation

x_{k+1} = x_k − γ^{SC}_k g_k.

A relaxed BB method with an additional step-size θ_k ∈ (0, 2) was proposed in [105].
A modification of the GD method (1.5), defined by MGD = M(GD), was proposed in [63]. Further, the modified SM (MSM) method was proposed in [63] as the scheme (3.37). The leading principle used in defining the iterations (3.37) is the replacement of α_k in the GD method (1.5) by a slightly longer step α^{MSM}, which means that the MSM method proposes a slightly longer step with the aim of additionally accelerating the method. As before, α_k ∈ (0, 1) is defined by Algorithm 2. The acceleration parameters γ_{k+1} in the ADD, ADSS, TADSS and MSM methods are defined in [96], [94], [114] and [63], respectively.
The efficiency of IGD methods was numerically tested in [98]. The author of [41] proposed two Relaxed Gradient Descent Quasi-Newton (RGDQN and RGDQN1) iteration rules, in which ξ_k is an appropriate real value. The RGDQN iterations are defined with randomly generated ξ_k ∈ (0, 1), while the RGDQN1 algorithm exploits a specific rule for ξ_k. The following algorithm, known as the SM method, was introduced in [112].

3.7. Gradient methods accelerated by the Picard-Mann hybrid iterative process. An application of the Picard-Mann hybrid iterative process from [66] is another possibility to accelerate iterations for solving nonlinear optimization problems. The function T : C → C in (3.39)-(3.40) is defined on a convex subset C of a normed space E. The hybrid iterations define two sequences x_k, y_k by the rules

x_{k+1} = T(y_k), y_k = (1 − Υ_k) x_k + Υ_k T(x_k). (3.39)

The real number Υ_k ∈ (0, 1) in (3.39) is termed the correction parameter in [97]. Instead of (3.39), it suffices to use the equivalent one-sequence iteration

x_{k+1} = T((1 − Υ_k) x_k + Υ_k T(x_k)). (3.40)

The iterates (3.40) are denoted by H(T, Υ_k). In [66], a set of constant values α = Υ_k ∈ (0, 1), ∀k, was proposed in numerical experiments, and it was concluded that the process (3.40) converges faster than the Picard, Mann and Ishikawa iterations from [62,83,101]. Further, (3.40) was applied in [97] for a hybridization of the SM method, known as HSM: using the mapping T in (3.39) or (3.40) to coincide with the SM rule (3.29), the iterations (3.40) become the so-called HSM iterative rule.
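The hybrid process (3.39)/(3.40) can be sketched for a generic mapping T as follows; the contraction used in the example is only a toy stand-in for an optimization step such as (3.29), and all names are illustrative:

```python
import numpy as np

def picard_mann(T, x0, upsilon=0.5, tol=1e-10, max_iter=1000):
    """Picard-Mann hybrid iteration (sketch of (3.39)/(3.40)):
    y_k = (1 - upsilon) x_k + upsilon T(x_k),  x_{k+1} = T(y_k),
    with a constant correction parameter upsilon in (0, 1)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        y = (1.0 - upsilon) * x + upsilon * T(x)
        x_new = T(y)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# Toy contraction T(x) = 0.5 x + 1 with fixed point x* = 2.
x_fix = picard_mann(lambda x: 0.5 * x + 1.0, np.array([10.0]))
```

For this contraction each hybrid step multiplies the error by 0.5(1 − 0.5·upsilon), i.e., strictly less than the plain Picard factor 0.5, illustrating the acceleration claimed in [66].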
A modified HSM (MHSM) method is defined in [92] by proposing an appropriate initial value in the backtracking procedure. A hybridization of the ADD method was considered in [99]. Recently, the hybridization HTADSS ≡ H(TADSS) was proposed, investigated and tested in [95].

4. Overview of conjugate gradient methods. Nonlinear conjugate gradient (CG) methods form an important class of methods for solving unconstrained nonlinear optimization problems and systems of nonlinear equations. Nonlinear CG methods are defined by the line search iterates (1.2), where the search direction d_k is defined by (1.10) and the CGUP β_k is given by one of many available rules.
In this article, a review of CG methods for unconstrained optimization is given. Main convergence theorems are provided for conjugate gradient methods assuming the descent property of each search direction. Some research issues on conjugate gradient methods are also mentioned.
In [21], the nonlinear CG methods are divided into three classes: early conjugate gradient methods, descent conjugate gradient methods, and sufficient descent conjugate gradient methods. Andrei classified the CG methods into three classes: classical CG methods, scaled CG methods, and hybrid and parameterized CG methods.
The classification presented in this paper divides CG methods into the following categories: basic conjugate gradient methods, considered in Subsection 4.1; the Dai-Liao class of methods, presented in Subsection 4.2; hybrid conjugate gradient methods, described in Subsection 4.3; and combined BFGS-CG iterations, in Subsection 4.4. Table 2. Basic conjugate gradient methods.
Dai-Yuan 1999 [30]

4.1. Basic conjugate gradient methods. The CG methods included in Table 2 are known as early or classical conjugate gradient methods, where y_{k−1} = g_k − g_{k−1} and ‖·‖ stands for the Euclidean vector norm.
In the listed CG methods, the numerator of the update parameter β_k is either ‖g_k‖² or y_{k−1}^T g_k, and the denominator is either ‖g_{k−1}‖², d_{k−1}^T y_{k−1} or −d_{k−1}^T g_{k−1}. The two possible choices for the numerator and the three possible choices for the denominator lead to 6 different choices for β_k, as summarized in Table 3. Table 3. Classification of CG methods.

Numerator \ Denominator | ‖g_{k−1}‖² | d_{k−1}^T y_{k−1} | −d_{k−1}^T g_{k−1}
‖g_k‖² | FR | DY | CD
y_{k−1}^T g_k | PRP | HS | LS

However, there exist exceptions to these rules; one example is the parameter β^D_k given in [38]. From the presented chronological development of the CGUP, we can see that the β^D_k choice of the CG parameter differs structurally from the other choices.
For large-scale unconstrained nonlinear optimization problems, choices of the CG update parameter that do not require computation or approximation of the Hessian or its inverse are preferred in practice over methods that require the Hessian or its approximation in each iteration.
Wei et al. [124] gave a variant of the PRP method, which we call the VPRP method. The VPRP method was extended to a variant of the HS method by Yao et al. in [132]. Zhang [138] made a slight modification to the VPRP method and constructed the NPRP method; moreover, Zhang [138] extended this result to the HS method and proposed the NHS method. Recently, Wei et al. [125] proposed a variation of the FR method, which we call the VFR method, defined with a parameter µ_3 ∈ (0, +∞) and an arbitrary given positive constant. Motivated by these modifications, the authors of [29] defined the modified PRP method. Wei et al. [18] gave a variant of the PRP method which we call the WYL method, that is,

β^{WYL}_k = g_k^T (g_k − (‖g_k‖/‖g_{k−1}‖) g_{k−1}) / ‖g_{k−1}‖².

The WYL method was extended to a variant of the LS method by Yao et al. [132]. Some further CG variants are β^{DHS}_k [29] and β^{DLS}_k [139], defined with a parameter µ > 1 (2017). (4.1)

4.2. Dai-Liao method and its variants. An extension of the conjugacy condition

d_k^T y_{k−1} = 0 (4.2)

was studied by Perry [93]. Perry tried to incorporate second-order information of the objective function into the CG method in order to accelerate it. Specifically, by using the secant condition and the search direction of the QN methods, which are respectively defined by

B_k s_{k−1} = y_{k−1}, d_k = −B_k^{−1} g_k,

the following relation is obtained:

d_k^T y_{k−1} = −g_k^T B_k^{−1} y_{k−1} = −g_k^T s_{k−1},

where B_k is a symmetric approximation to the Hessian matrix G_k. Then Perry accordingly replaced the conjugacy condition (4.2) by the condition

d_k^T y_{k−1} = −g_k^T s_{k−1}.

Furthermore, Dai and Liao [26] included a nonnegative parameter t into Perry's condition and gave

d_k^T y_{k−1} = −t g_k^T s_{k−1}. (4.6)

In order to find the search direction d_k in (1.10) which satisfies the conjugacy condition (4.6), it suffices to multiply (1.10) by y_{k−1} and use (4.6), yielding

β^{DL}_k = (g_k^T y_{k−1} − t g_k^T s_{k−1}) / (d_{k−1}^T y_{k−1}), (4.7)

where t > 0 is a scalar and y_{k−1} = g_k − g_{k−1}. Expression (4.7) for defining β_k characterizes the Dai-Liao (DL) class of methods; clearly, β^{DL}_k with t > 0 defines a class of nonlinear CG methods. Moreover, in the case of the exact line search, i.e., g_k^T s_{k−1} = 0, it holds that β^{DL}_k = β^{HS}_k. Some additional CG methods from the DL class are β^{MLSDL}_k [18] and β^{ZZDL}_k [141]. The research for the best values of the parameter t was divided into two directions: one direction was to find the best fixed value of t, and the other the best approximation of t in each iteration. Analyzing the results from [18,26,131,139], we conclude that the scalar t was defined by the fixed value 0.1 in those numerical experiments. Also, numerical experience related to the fixed value t = 1 was reported in [26]. Common numerical experience is that different choices of t lead to totally different numerical behavior. This was the reason for further research to focus on values of t that change through iterations. The value of t in an arbitrary kth iteration will be denoted by t_k.
Hager and Zhang in [55] defined t_k by the rule

t_k = 2 ‖y_{k−1}‖² / (s_{k−1}^T y_{k−1}).

Babaie-Kafaki and Ghanbari [9] presented two appropriate choices of the parameter t in (4.7), namely t_k = s_{k−1}^T y_{k−1} / ‖s_{k−1}‖² + ‖y_{k−1}‖ / ‖s_{k−1}‖ and t_k = ‖y_{k−1}‖ / ‖s_{k−1}‖. Andrei in [5] suggested a further value of t in (4.7), which yields a variant of the DL method denoted by DLE (4.14). Lotfi and Hosseini in [81] proposed the most recent approximations of the parameter t_k.
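Computing β^{DL}_k from (4.7) for any of the above choices of t is a one-liner; the following sketch (illustrative names, t = 0.1 as the common fixed value) also verifies the reduction to the HS parameter when t = 0:

```python
import numpy as np

def beta_dai_liao(g_new, d_prev, s_prev, y_prev, t=0.1):
    """Dai-Liao CGUP (4.7) (sketch):
    beta_k^DL = (g_k^T y_{k-1} - t * g_k^T s_{k-1}) / (d_{k-1}^T y_{k-1}).

    With t = 0, or under an exact line search (g_k^T s_{k-1} = 0),
    this reduces to the Hestenes-Stiefel parameter."""
    return (g_new @ y_prev - t * (g_new @ s_prev)) / (d_prev @ y_prev)

# Illustrative data only (not from the survey's experiments).
g = np.array([0.2, -0.1])
d = np.array([-1.0, 0.5])
s = 0.5 * d                      # s_{k-1} = alpha_{k-1} d_{k-1}
y = np.array([0.3, 0.1])
beta_hs = beta_dai_liao(g, d, s, y, t=0.0)   # equals the HS value here
```

Changing t per iteration, e.g., to the Hager-Zhang rule above, only changes the `t` argument, which is why the DL family is a convenient umbrella for so many methods.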

4.3. Hybrid conjugate gradient methods. In the subsequent sections, we will survey recent advances in CG methods. Two main research streams can be observed: the algorithms which improve the scalar parameter t, and the algorithms which improve the CG parameter β_k.
Hybrid CG methods can be segmented into two classes: mixed methods, and methods combined together by introducing one or more parameters.
The following hybrid CG method was suggested in [121]:

β_k = { β^{PRP}_k, 0 ≤ β^{PRP}_k ≤ β^{FR}_k; β^{FR}_k, otherwise }.

When a jam in the iterations occurs, the PRP update parameter is used. Hu and Storey in [61] had a similar motivation and suggested the rule

β_k = max{0, min{β^{PRP}_k, β^{FR}_k}}.

In [49] it is pointed out that β^{PRP}_k can be negative, even for strongly convex functions. In an effort to extend the allowed choices for the PRP update parameter, while retaining global convergence, Gilbert and Nocedal [49] suggested the choice

β_k = max{−β^{FR}_k, min{β^{PRP}_k, β^{FR}_k}}.

Dai and Yuan [31] combined the DY method with other CG methods, which leads to the following CGUP parameters:

β_k = max{−c β^{DY}_k, min{β^{HS}_k, β^{DY}_k}} (4.18)

and

β_k = max{0, min{β^{HS}_k, β^{DY}_k}}. (4.19)

In [28], the authors tested different CG methods for large-scale unconstrained optimization problems and concluded that the hybrid CG method (4.18) gave the best results.
The next hybrid CG method, proposed by Dai in [23], employs either the DY scheme or the CD scheme:

β_k = ‖g_k‖² / max{d_{k−1}^T y_{k−1}, −d_{k−1}^T g_{k−1}}. (4.20)

A modified CG method defined as the hybridization of the known LS and CD conjugate gradient methods is presented and analyzed in [130]. CG methods can also be combined together by introducing one or more parameters. In [32,33], Dai and Yuan proposed a one-parameter family of CG methods with

β_k = ‖g_k‖² / (θ_k ‖g_{k−1}‖² + (1 − θ_k) d_{k−1}^T y_{k−1}),

where θ_k ∈ [0, 1] is a parameter. Note that β_k = β^{FR}_k in the case θ_k = 1, and β_k = β^{DY}_k in the case θ_k = 0. Another hybrid method, proposed by Delladji, Belloufi and Sellami in [39], exploits either the PRP scheme or the HZ scheme. A related two-parameter family is given by

β_k = (θ_k ‖g_k‖² + (1 − θ_k) y_{k−1}^T g_k) / (ν_k ‖g_{k−1}‖² + (1 − ν_k) d_{k−1}^T y_{k−1}),

where ν_k, θ_k ∈ [0, 1]. This two-parameter family includes the FR, DY, PRP, and HS methods as extreme cases.
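Hybrids of the min/max type are trivial to implement once the basic CGUPs are available. The sketch below, in the spirit of the Dai-Yuan hybrids surveyed above, uses the common variant max{0, min{β^HS, β^DY}}; the data and names are purely illustrative:

```python
import numpy as np

def beta_hybrid_hs_dy(g_new, g_old, d_prev):
    """One common hybrid CGUP (sketch): beta_k = max{0, min{beta_k^HS, beta_k^DY}}.

    Falls back to a nonnegative value (a restart, beta_k = 0) whenever
    the HS parameter turns negative."""
    y = g_new - g_old
    denom = d_prev @ y
    beta_hs = (g_new @ y) / denom          # Hestenes-Stiefel
    beta_dy = (g_new @ g_new) / denom      # Dai-Yuan
    return max(0.0, min(beta_hs, beta_dy))

# Case where HS is negative: the hybrid truncates to 0 (gradient restart).
b1 = beta_hybrid_hs_dy(np.array([0.5, 0.2]), np.array([1.0, 0.0]), np.array([-1.0, 0.0]))
# Case where both are positive: the hybrid takes the smaller of the two.
b2 = beta_hybrid_hs_dy(np.array([-0.2, 0.4]), np.array([1.0, 0.0]), np.array([-1.0, 0.0]))
```

Plugged into the recursion (1.10), such a rule inherits the automatic restart property discussed below.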
In [113], the authors proposed hybrid CG methods in which the search direction d_k, k ≥ 1, is improved using one of several rules, and β_k is determined using appropriate combinations of the parameters β_k listed in Table 2 and/or previously defined hybridizations. In particular, the authors of [113] defined a modification of the LSCD method from [130]. The resulting method, known as the MLSCD method, is based on the replacement of the search direction d_k = d_k(β^LSCD_k, g_k, d_{k-1}) from [130] by d_k = D(β^LSCD_k, g_k, d_{k-1}). Now we give the general framework of the CG class of methods.
The first iteration in CG methods is a gradient step. Also, it is common to restart the algorithm periodically by taking the gradient step.
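The general framework just described can be sketched as the following loop. This is a minimal sketch, not the paper's Algorithm: it assumes a simple Armijo backtracking for the step-size and accepts any of the β_k rules from Table 2 as a callable; the restart period and the descent safeguard are standard practical choices.

```python
import numpy as np

def nonlinear_cg(f, grad, x0, beta_rule, tol=1e-6, max_iter=5000, restart=None):
    """Generic line-search CG loop: the first iteration is a gradient step,
    and the algorithm restarts periodically with the gradient direction."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                          # gradient step at k = 0
    period = restart or len(x)      # default: restart every n iterations
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:              # safeguard: restart if d is not a descent direction
            d = -g
        # simple Armijo backtracking (any inexact line search works here)
        alpha, fx, slope = 1.0, f(x), g @ d
        while f(x + alpha * d) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        if (k + 1) % period == 0:
            d = -g_new              # periodic restart: pure gradient step
        else:
            beta = beta_rule(g_new, g, d, g_new - g)
            d = -g_new + beta * d
        x, g = x_new, g_new
    return x
```

The `beta_rule` callable receives (g_k, g_{k-1}, d_{k-1}, y_{k-1}), so any CGUP from Table 2 can be plugged in.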

4.4. Broyden-Fletcher-Goldfarb-Shanno conjugate gradient methods.
It is known that CG iterates are better than QN methods in terms of CPU time, and BFGS updates require more memory than CG. On the other hand, QN methods require a smaller number of iterations and function evaluations. For this reason, one of the modern trends in defining new CG methods is the usage of the BFGS update in defining new rules for β_k. A hybrid method which solves systems of nonlinear equations by combining the QN method with chaos optimization was proposed in [82]. In [58], the authors defined a combination of a QN method and the Cauchy descent method for solving unconstrained optimization problems, known as the quasi-Newton SD method. A hybrid direction defined as a combination of the BFGS update of B_k and the CGUP β_k was considered in [10,67]. The DFP-CG method originated in [91]. A three-term hybrid BFGS-CG method (termed H-BFGS-CG1) was proposed in [113]. In [113], the authors also investigated hybrid CG algorithms based on a modified search direction defined using one of two hybridizations, as well as on β_k defined using convenient combinations of the parameters involved in Table 2 and previously defined hybridizations. The matrix B_k in (4.30) is defined as an appropriate Hessian approximation by the BFGS update.
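The BFGS ingredient of these hybrids can be illustrated by the standard inverse-Hessian BFGS update; this is a sketch of the classical formula only, not the specific modified updates used in [113] or [10,67]. The curvature-skip guard is a common practical safeguard.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Standard BFGS update of the inverse Hessian approximation H_k:
    H_{k+1} = V H V^T + rho * s s^T,  V = I - rho * s y^T,  rho = 1 / (s^T y),
    where s = x_{k+1} - x_k and y = g_{k+1} - g_k."""
    sy = s @ y
    if sy <= 1e-12:
        return H                        # skip the update if curvature fails
    rho = 1.0 / sy
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```

By construction the updated matrix satisfies the secant equation H_{k+1} y_k = s_k, which is the property the BFGS-CG hybrids exploit when shaping the search direction.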

5. Convergence properties of CG methods. Below we give some assumptions related to line search procedures.
The objective function f is continuous and differentiable in a neighborhood N of S, and its gradient g is Lipschitz continuous, i.e., there exists a positive constant L > 0 satisfying ||g(x) - g(y)|| ≤ L||x - y|| for all x, y ∈ N. The proof of Lemma 5.1 is given in [142] and is known as the Zoutendijk condition.

5.1. Properties of CG methods involving y_{k-1}^T g_k in the numerator of β_k. From Table 2, a clear observation is that the HS, PRP and LS methods involve the expression y_{k-1}^T g_k in the numerator of the parameter β_k. We first mention the Property (*) for β_k given by Gilbert and Nocedal [49]. The Property (*) implies that β_k is bounded and becomes small when the step s_{k-1} is small.

Property (*) [49]. Consider a method defined by (1.2) and (1.10), and suppose that 0 < γ ≤ ||g_k|| ≤ γ̄ (5.5) for all k ≥ 0. Under this assumption we say that the method possesses the Property (*) if there exist constants b > 1 and λ > 0 such that |β_k| ≤ b for all k, and ||s_{k-1}|| ≤ λ implies |β_k| ≤ 1/(2b).

In order to prove that a conjugate gradient method has Property (*), it suffices to show that there exists a constant c > 0 such that |β_k| ≤ c||s_{k-1}|| for all k, (5.8) under the assumption (5.5). Indeed, since the steps s_{k-1} are bounded, (5.8) gives a bound |β_k| ≤ b for some b > 1, and by putting λ = 1/(2bc) we obtain |β_k| ≤ cλ = 1/(2b) whenever ||s_{k-1}|| ≤ λ, which confirms the Property (*). It is easily shown that (5.8) holds for the HS, PRP and LS methods, and thus these methods have Property (*). Next, we give the global convergence theorem of CG methods satisfying Property (*). The proof of Theorem 5.2 is given in [49].
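As an illustration, for the PRP method the bound (5.8) follows directly from the Cauchy-Schwarz inequality, the Lipschitz condition with constant L, and the gradient bounds (5.5); a sketch of the estimate:

```latex
\left|\beta_k^{PRP}\right|
  = \frac{\left|g_k^T y_{k-1}\right|}{\|g_{k-1}\|^2}
  \le \frac{\|g_k\|\,\|y_{k-1}\|}{\|g_{k-1}\|^2}
  \le \frac{\bar{\gamma}\, L\, \|s_{k-1}\|}{\gamma^2}
  = c\,\|s_{k-1}\|,
\qquad c := \frac{L\,\bar{\gamma}}{\gamma^2}.
```

Analogous estimates hold for the HS and LS parameters, whose numerators also involve y_{k-1}^T g_k.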

5.2. Properties of CG methods involving ||g_k||^2 in the numerator of β_k. In this section, we recall properties of the FR, CD and DY methods. From the chronological development presented in Table 2, it is observable that the FR, CD and DY methods involve the value ||g_k||^2 in the numerator of the parameter β_k. If the step-size α_k satisfies the generalized strong Wolfe conditions (2.7) and (2.9), the following properties are obtained. (a) For the FR method, if α_k satisfies (2.7) and (2.9) with σ_1 + σ_2 < 1, then a corresponding descent bound holds. (b) For the DY method, if α_k satisfies the generalized strong Wolfe conditions (2.7) and (2.9), then an analogous bound holds. (c) For the CD method, if α_k satisfies (2.7) and (2.9) with σ_2 < 1, then an analogous bound holds. Proposition 1 implies that the FR, CD and DY methods satisfy the sufficient descent condition (1.6), depending on the line searches.
We now give the global convergence properties of the FR and DY methods, which were proven in [2] and [30], respectively. The methods surveyed in Subsection 5.2 exhibit strong convergence properties, but they may not be efficient in practice due to the appearance of jamming. On the other hand, although the methods given in Subsection 5.1 may not be convergent in general, they often perform better than the methods restated in Subsection 5.2. Details about this fact are given in [65]. For this reason, combinations of these methods have been proposed with the aim of exploiting the attractive features of each family of methods.

5.3. Convergence of DL methods. The following assumptions will be commonly used in the subsequent convergence analysis of the DHSDL, DLSDL, MHSDL and MLSDL methods.
It is supposed that the conditions in Assumption 5.1 hold. Assumption 5.1 implies the existence of positive constants D and γ satisfying (5.2) and (5.3).
By the uniform convexity of f, there exists a constant θ > 0 such that (∇f(x) - ∇f(y))^T (x - y) ≥ θ||x - y||^2, (5.14) or, equivalently, f(y) ≥ f(x) + ∇f(x)^T (y - x) + (θ/2)||y - x||^2. (5.15) It follows from (5.14) and (5.15) that y_{k-1}^T s_{k-1} ≥ θ||s_{k-1}||^2. (5.16) By (5.1) and (5.16), we have θ||s_{k-1}||^2 ≤ y_{k-1}^T s_{k-1} ≤ L||s_{k-1}||^2, where the inequalities imply θ ≤ L. In order to improve the presentation, an arbitrary method defined by (1.2) and (1.10) will be denoted by M(α_k, β_k). In our current investigation, it will be assumed that β_k ≥ 0 and that α_k satisfies the backtracking condition. Further, the following notation will be used in each iterative step k.
Since µ > 1, the proof is completed.
Proof. The inequality (5.23) will be verified by induction. In the initial case k = 0, one obtains g_0^T d_0 = -||g_0||^2. Since c ≤ 1, (5.23) is obviously satisfied in the base case. Suppose that (5.23) is valid for some k ≥ 1. Taking the inner product of both sides of (1.10) with the vector g_k, the corresponding identity is obtained. An application of (5.21) and s_{k-1} = α_{k-1} d_{k-1} leads to further conclusions. From (5.20), t > 0 and α_{k-1} > 0, the required bound follows. In view of λ ≥ 1, the inequality (5.23) is satisfied for c = 1 - 1/λ and arbitrary k ≥ 0.
The global convergence of the proposed methods is confirmed by Theorem 5.8. Squaring both sides of (1.10) yields an expression for ||d_k||^2. Taking into account (5.22), a bound on this quantity is obtained. Now from (5.25), with the value of t from Lemma 5.7, we conclude the estimates (5.35) and (5.37). Using (5.35) and (5.37) in (5.33), we obtain (5.38). Next, dividing both sides of (5.38) by ||g_k||^4 and using (5.32), the bound (5.39) can be concluded. The inequalities in (5.39) imply that ||g_k|| ≥ c_1 causes a contradiction with (5.4). Consequently, (5.31) is confirmed for Case 1.
Case 2. In the case β_k ≡ β^MLSDL_k, applying Lemma 5.6 and Assumption 5.1, we obtain the bound (5.42). Replacement of (5.35) and (5.42) in (5.33) leads to (5.43). Next, dividing both sides of (5.43) by ||g_k||^4 and using (5.32), the bound (5.44) can be concluded. The inequalities in (5.44) imply that ||g_k|| ≥ c_1 causes a contradiction with (5.4). Consequently, (5.31) is confirmed for Case 2. The proof is complete.

6. Numerical experiments. The numerical experiments are performed on test functions presented in [3], where most of the problems are taken from the CUTEr collection [14]. Each test function is tested 10 times with gradually increasing dimensions n = 10, 50, 100, 200, 300, 500, 700, 800, 1000 and 1500. The strong Wolfe line search uses the parameters σ_1 = 0.0001 and σ_2 = 0.5 for all algorithms.
We utilized the performance profile given in [43] to compare the numerical results (IT, FE and CPU) for all tested methods. The upper curve of a performance profile corresponds to the method that shows the best performance.

6.1. Numerical experiments on QN methods with constant diagonal Hessian approximation. BFGS, DFP and SR1 updates in QN methods with respect to different ILS strategies are compared in [40]. The main conclusion is that the BFGS method is superior to the others. Continuing such research, we compare the numerical performances of the AGD, MSM and SM methods, i.e., gradient methods with an acceleration parameter. The numerical experiment contains 25 test functions proposed in [3]. For each tested function, we performed 12 numerical experiments with 100, 200, 300, 500, 1000, 2000, 3000, 5000, 7000, 8000, 10000 and 15000 variables. The tested algorithms are based on the same implementation of the backtracking line search (Algorithm 2), in which we set ω = 0.0001 and ϕ = 0.8. The same uniform stopping criteria are used in these numerical experiments. Summary numerical data generated by the AGD, MSM and SM methods, tried on 25 test functions, are arranged in Table 4, which contains the numerical results corresponding to the IT, FE and CPU criteria. Figures 1, 2 and 3 illustrate the performance profiles corresponding to the results in Table 4 with respect to the criteria IT, FE and CPU, respectively. From Table 4, we conclude that the AGD, MSM and SM methods have successfully solved all test functions. Figure 1 presents the performance profiles of the IT of the AGD, MSM and SM methods. It is observable that the MSM method is best in 52.00% of the test functions, compared with AGD (24.00%) and SM (32.00%). From Figure 1, it is observable that the graph of the MSM method comes first to the top, which means that the MSM is superior to the other considered methods with respect to the IT profile.
Figure 2 presents the performance profiles of the FE of the AGD, MSM and SM methods. It is observable that the MSM method is best in 64.00% of the tested functions, compared with AGD (12.00%) and SM (32.00%). In view of Figure 2, the MSM graph comes first to the top, which means that the MSM is the winner with respect to the FE profile. Figure 3 presents the performance profiles of the CPU of the AGD, MSM and SM methods. It is obvious that MSM is the winner in 56.00% of the test functions, with respect to AGD (4.00%) and SM (44.00%). Figure 3 demonstrates that the graph of the MSM method comes first to the highest level, which signifies that the MSM is the winner with respect to the CPU. From the previous analysis of the results shown in Table 4 and Figures 1-3, we can conclude that the MSM iterates are the most efficient in terms of all three basic metrics: IT, FE and CPU. The MSM method has the smallest IT, FE and CPU time compared to the other two methods on most test functions.
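The performance profiles used throughout these comparisons follow [43]; a minimal sketch of how such a profile can be computed from a cost table (solver names and data below are illustrative, not the paper's):

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More style performance profile.
    T[i, s] = cost (IT, FE or CPU) of solver s on problem i.
    Returns rho[s][j]: the fraction of problems solved by solver s
    within a factor taus[j] of the best solver on that problem."""
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)   # best cost per problem
    ratios = T / best                     # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])
```

The curve rho[s] is nondecreasing in tau; the solver whose curve sits on top is the winner, and its value at tau = 1 is the fraction of problems on which it is best, which is exactly the percentage reported in the figures.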

6.2. Numerical experiments on the CG methods with y_{k-1}^T g_k in the numerator of β_k. The uniform stopping criterion during testing of the CG methods is ||g_k|| ≤ ε, where ε = 10^{-6}, or the number of function evaluations becoming greater than 1000000.
In this subsection, we compare the numerical results obtained from the HS, PRP and LS methods, i.e., methods with y_{k-1}^T g_k in the numerator of β_k. The numerical experiment is based on 26 test functions. Summary numerical results for the HS, PRP and LS methods, tried on 26 test functions, are presented in Table 5, which shows the numerical results (IT, FE and CPU). Figures 4, 5 and 6 plot the performance profiles for the results in Table 5 with respect to the IT, FE and CPU criteria, respectively. Figure 4 presents the performance profiles of the IT corresponding to the HS, PRP and LS methods. It is observable that the PRP method is best in 61.54% of the test functions, compared with HS (26.92%) and LS (23.08%). From Figure 4, it is observable that the graph of the PRP method comes first to the top, which signifies that the PRP outperforms the other considered methods with respect to the IT criterion. Figure 5 presents the performance profiles of the FE of the HS, PRP and LS methods. It is observable that the PRP method is best in 69.23% of the test functions, compared with HS (23.08%) and LS (23.08%). From Figure 5, it is observed that the PRP graph comes first to the top, which signifies that the PRP is the best with respect to the FE. Figure 6 presents the performance profiles of the CPU of the HS, PRP and LS methods. It is obvious that PRP is best in 69.23% of the test functions, compared with HS (11.54%) and LS (19.23%). Figure 6 demonstrates that the graph of the PRP method comes first to the top, which signifies that the PRP is the best with respect to the CPU.
From the previous analysis of the results shown in Table 5 and Figures 4-6, we can conclude that the PRP method is the most efficient in terms of all three basic metrics: IT, FE and CPU.

6.3. Numerical experiments on CG methods with ||g_k||^2 in the numerator of β_k. In this subsection, we compare the numerical results obtained from the DY, FR and CD methods, i.e., methods with ||g_k||^2 in the numerator of β_k. The numerical experiment contains 25 test functions. Summary numerical results for the DY, FR and CD methods, tried on 25 test functions, are presented in Table 6, which contains the numerical results (IT, FE and CPU). Figures 7, 8 and 9 plot the performance profiles for the results in Table 6 with respect to the IT, FE and CPU profiles, respectively. From Figure 7, it is observable that the graph of the CD method comes first to the top, which signifies that the CD outperforms the other considered methods with respect to the IT. Figure 8 presents the performance profiles of the FE of the DY, FR and CD methods; the CD graph comes first to the highest level, which means that the CD possesses the best performance with respect to the FE criterion. Figure 9 presents the performance profiles of the CPU of the DY, FR and CD methods; the graph of the CD method achieves the top level first, so the CD is the winner with respect to the CPU.
From the previous analysis of the results shown in Table 6 and Figures 7-9, it is clear that the CD method achieved the most efficient results in terms of all three basic metrics: IT, FE and CPU.

6.4. Numerical experiments on the hybrid conjugate gradient methods. This subsection analyzes numerical results obtained by running a MATLAB implementation under the conditions given at the beginning of this section. Summary numerical results for the hybrid CG methods, tried on 25 test functions, with respect to the IT, FE and CPU profiles are presented in Table 7. Figure 10 plots the performance profiles corresponding to the IT results included in Table 7.
From Figure 10, it is observable that the graph of the HCG6 method comes first to the top, which signifies that the HCG6 outperforms the other considered methods with respect to the IT. However, looking at Figure 10 in more detail, we can see that the HCG6 method is not the most frequent winner: it gives the best results on only 16% of the tested functions, while the HCG7 method gives the best results on 52%. The reason for such behavior lies in the fact that the HCG6 method is the only one that has successfully solved all the test problems.
The numerical results of the hybrid CG methods with respect to the FE are arranged in Table 8. Figure 11 plots the performance profiles for the results in Table 8, given in three columns denoted by FE.
From Figure 11, it is observable that the graph of the HCG6 method comes first to the top, which signifies that the HCG6 outperforms the other considered methods with respect to the FE. However, looking at Figure 11 in more detail, we see a situation identical to Figure 10: the HCG6 method gives the best results on only 16% of the tested functions, while the HCG2 method gives the best results on 48%. Table 9 contains the numerical results of the hybrid CG methods with respect to the CPU, and Figure 12 plots the corresponding performance profiles. From Figure 12, it is observable that the graph of the HCG6 method comes first to the top, which signifies that the HCG6 outperforms the other considered methods with respect to the CPU. However, looking at Figure 12 in more detail, we see the same situation as in Figures 10 and 11: the HCG6 method gives the best results on only 8% of the tested functions, while the HCG7 and HCG10 methods give the best results on 20%.
6.5. Numerical experiments on the modified Dai-Liao methods. The numerical experiments presented in this subsection investigate the influence of the scalar t in the modified Dai-Liao methods. The previously mentioned variants of the DL method use a fixed value of the parameter t, and in all the above papers the scalar t is greater than 0 and less than 1. Analyzing the results from [18,26,131,139], we conclude that the scalar t was defined by the fixed value 0.1 in numerical experiments. Also, numerical experience related to the fixed value t = 1 was reported in [26]. Common numerical experience is that different choices of t lead to quite different numerical behavior.
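The role of the scalar t can be seen from the update parameter that drives the DL class. The sketch below assumes the standard Dai-Liao formula; the DHSDL, DLSDL, MHSDL and MLSDL variants tested here modify this basic rule as described in the cited papers.

```python
import numpy as np

def beta_dl(g_new, d_old, y, s, t):
    """Standard Dai-Liao update parameter:
    beta_k = (g_k^T y_{k-1} - t * g_k^T s_{k-1}) / (d_{k-1}^T y_{k-1}),
    where s_{k-1} = x_k - x_{k-1} and y_{k-1} = g_k - g_{k-1}."""
    return (g_new @ y - t * (g_new @ s)) / (d_old @ y)
```

Note that t = 0 recovers the Hestenes-Stiefel parameter, which makes the sensitivity of the iterations to t plausible: the term -t g_k^T s_{k-1} penalizes directions poorly conjugate to the last step.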
This leads to the next question: what is the 'best' value of t ∈ (0, 1) from the computational point of view? Our intention is to investigate, numerically and theoretically, the behavior of different variants of the DL conjugate gradient framework with respect to various values of t. Our strategy is to select several values of the parameter t within the interval (0, 1) and to compare the obtained results based on different criteria. In that way, we will learn whether it is better to take values closer to zero or closer to one. 6.5.1. Motivations and the corresponding algorithm. As already indicated, the aim of this subsection is to examine numerically the influence of the scalar t in the DL class of iterations, to determine some rules for its appropriate choice and, if possible, to find the best value. The detailed research plan is to find the answer to two challenging questions: - Do the values of the scalar t affect each of the methods DHSDL, DLSDL, MHSDL and MLSDL, observed individually, and by how much, with respect to IT, FE and the CPU time (CPU)?
-Does the choice of t favor one (or some) of the considered methods?
To answer these questions, all the methods must be tested under the same conditions. During testing, we compare all the considered methods with the same values of the required scalars. Algorithm 2, i.e., the backtracking line search, determines the step-size α_k in (1.2).
Algorithm 6 gives the corresponding general framework for DHSDL, DLSDL, MHSDL and MLSDL methods.
The values of t used during the testing of the observed methods are given in Table 10. Each particular value of the scalar t is marked with one of the labels T1 to T6, corresponding to the joined value of t. Five of the six given values are fixed during testing of the observed methods. Only the value of t labeled by T1 is variable during the iterations, where the value in the kth iteration is denoted by t_k.
In this case, the value t_k is obtained from the backtracking line search, i.e., t_k = α_k inherits a new value obtained from Algorithm 2 in each iteration. The reasons why we decided to define the values t_k from the backtracking line search algorithm are: - t_k ∈ (0, 1); - t_k also affects the iterative steps when computing the next value of x_k.

Table 10. Values of the scalar t.
Label:                  T1          T2     T3    T4    T5    T6
Value of the scalar t:  t_k = α_k   0.05   0.1   0.2   0.5   0.9

The convergence of the above methods has already been considered in the mentioned references. Our goal is to give a unified analysis of the convergence of the proposed DL methods. The aim of our research is to investigate the influence of the scalar t in the DL iterates.
The arranged numerical results are generated by testing and comparing the DHSDL, DLSDL, MHSDL and MLSDL methods on 6 different values of t, denoted by T1-T6. Three important criteria (IT, CPU, FE) are analyzed for all tested methods. The numerical report is based on 22 test functions proposed in [3]. For each tested function, we performed 10 numerical experiments with 100, 500, 1000, 3000, 5000, 7000, 8000, 10000, 15000 and 20000 variables. Summary results for the DHSDL, DLSDL, MHSDL and MLSDL methods, tried on 22 tests, are presented.
The uniform stopping criteria are as before. The backtracking parameters for all algorithms are ω = 0.0001 and ϕ = 0.8. During the testing of the DHSDL and DLSDL methods, the constant parameter µ = 1.2 was used in each iteration. 6.5.2. Numerical experiments on the DHSDL method. Figure 13 indicates the performance profiles of the IT criterion with respect to the DHSDL method depending on the scalar value t. This figure exhibits that the DHSDL method successfully solved all the problems for all values of the scalar t. Also, the DHSDL-T2 method is the best in 45.5% of the testings, with respect to DHSDL-T1 (18.2%), DHSDL-T3 (9.1%), DHSDL-T4 (4.5%), DHSDL-T5 (13.6%) and DHSDL-T6 (13.6%). It can be noticed that the graph of DHSDL-T2 reaches the top first, which signifies that DHSDL-T2 is the best with respect to IT. Figure 14 shows the performance profile given by FE of the DHSDL solver with respect to t. Evidently, DHSDL solves all test problems for all values of the scalar t, and the DHSDL-T2 method is the winner in 50.0% of the test problems, compared to DHSDL-T1 (18.2%), DHSDL-T3 (9.1%), DHSDL-T4 (0%), DHSDL-T5 (13.6%) and DHSDL-T6 (9.1%). From Figure 14, it is noticeable that DHSDL-T2 is the best with respect to FE. Figure 15 illustrates the CPU criterion for the DHSDL method depending on the value of t. Again, the DHSDL method is able to solve all the tested problems for all values of the scalar t. Further, the DHSDL-T6 method is superior in 40.9% of the tests, with respect to DHSDL-T1 (4.5%), DHSDL-T2 (27.3%), DHSDL-T3 (9.1%), DHSDL-T4 (4.5%) and DHSDL-T5 (13.6%). We also observe that, at the beginning, DHSDL-T2 does not perform well, but as the number of problems increases, its graph crosses the other graphs and achieves the top first, which means that the DHSDL-T2 is dominant with respect to the CPU time.
The graphs displayed in Figures 13, 14 and 15 show that the DHSDL method achieved superior results for t ≡ T2 = 0.05. 6.5.3. Numerical experiments on the DLSDL method. Figure 16 illustrates the IT criterion of the DLSDL method depending on the scalar value t. It is observable that DLSDL successfully solves all the problems for all values of the scalar t. Moreover, the DLSDL-T2 method is the winner in 36.4% of the tests, compared to DLSDL-T1 (22.7%), DLSDL-T3 (13.6%), DLSDL-T4 (9.1%), DLSDL-T5 (9.1%) and DLSDL-T6 (13.6%). Figure 16 (left) exhibits that the graph of DLSDL-T2 achieves the top first, which means that DLSDL-T2 outperforms all the other methods with respect to IT. Figure 17 shows the performance profiles of the FE criterion corresponding to the DLSDL method and the scalar value t. It is observable that DLSDL successfully solves all tests for all values of the scalar t, and DLSDL-T2 is the best in 36.4% of the test functions, in comparison to DLSDL-T1 (18.2%), DLSDL-T3 (4.5%), DLSDL-T4 (18.2%), DLSDL-T5 (13.6%) and DLSDL-T6 (9.1%). From Figure 17, it is observed that DLSDL-T2 is the best with respect to FE. Figure 18 shows the performance profiles of the CPU time of the DLSDL method depending on t. The graphs in this figure indicate that the DLSDL method solved all the problems for all values of t, and the DLSDL-T4 method is superior in 31.8% of the test problems, compared to DLSDL-T1 (13.6%), DLSDL-T2 (22.7%), DLSDL-T3 (18.2%), DLSDL-T5 (9.1%) and DLSDL-T6 (4.6%). We also observe that, at the beginning, DLSDL-T2 does not perform well; but as the number of problems increases, its graph crosses the other graphs and comes to the top, which signifies that, with respect to CPU, the DLSDL-T2 is the best.
Based on the analysis of Figures 16, 17 and 18, we conclude that the DLSDL method achieved the best results for t ≡ T2 = 0.05. 6.5.4. Analysis of average values. In Subsection 6.5.1, we defined two questions; our aim in this subsection is to answer them. In order to answer the first question properly, Tables 11, 12 and 13 collect the average values for all three considered criteria. After the numerical testing of the compared methods and the individual analysis for each method, we can now give a global conclusion on the behavior of the observed methods. The first conclusion is that the value of the scalar t significantly affects each of the DHSDL, DLSDL, MHSDL and MLSDL methods with respect to all three criteria. Further, a common conclusion is that all the methods give the best performance profiles for the scalar value t = 0.05. We can also answer the second question: according to the previous analysis, a particular selection of the scalar value t can give priority to one of the observed methods, and the methods do not behave identically for the same values of the scalar t. If we observe Table 11, which contains the average number of iterations, we notice that the difference between the smallest and the largest average result of each observed method is in the range of 10.1% to 12.1%, relative to the minimum obtained value. Table 12 shows the average results related to FE. An individual comparison of the considered methods leads to the conclusion that the difference between the smallest and the largest average results is in the range of 6.6% to 8.9%, relative to the minimum obtained value. This brings us to the same conclusion once again: the value of the scalar t affects the methods by the given percentages. Also, if we observe Table 13 with the average CPU times, we notice that the difference between the smallest and the largest average CPU time of each observed method is in the range of 2.5% to 10.6%, relative to the minimum value.

7. Conclusion. An overview of QN methods and CG methods and their classification has been presented. Section 5 investigates convergence properties of CG methods, following the CG classes in accordance with the presented taxonomy of basic CG methods. Numerical experiments compare the main classes of QN and CG methods. More precisely, the main QN methods with constant diagonal Hessian approximation are compared, as well as two classes of basic CG methods, hybrid CG methods, and finally some variants of the modified Dai-Liao methods.
The problem of defining further improvements of the CGUP β_k is still open. Moreover, new CG methods could be defined using appropriate updates of the parameter t. Another research stream includes various hybridizations of previously proposed CG methods. On the other hand, there are open possibilities for defining new updates of the matrices B_k and H_k used in defining QN methods. Continuing research on composite definitions of β_k based on CG and BFGS updates, it is possible to discover new three-term (or otherwise structured) CG variants.
One prospective field for further research is a generalization of the discrete-time approach to the continuous-time approach considered in [69]. Another possibility for further research is the extension of gradient methods to the tensor case; this possibility was exploited in [77] for solving M-tensor equations.
Moreover, applications of low-rank updates used in optimization can be transferred to appropriate numerical methods for computing generalized inverses. Applications of the rank-one update formula are investigated in [115,116].