ARTIFICIAL INTELLIGENCE COMBINED WITH NONLINEAR OPTIMIZATION TECHNIQUES AND THEIR APPLICATION FOR YIELD CURVE OPTIMIZATION

Abstract. This study combines artificial intelligence approaches with nonlinear optimization techniques to address a well-known problem in financial engineering, yield curve optimization. Yield curve estimation plays an important role in making strategic investment decisions. In this paper, we use two well-known parsimonious estimation models, Nelson-Siegel and Extended Nelson-Siegel, for yield curve estimation. The proposed models are formulated as continuous nonlinear optimization problems, which are then solved using nonlinear optimization and meta-heuristic approaches. The optimization techniques include hybrid GPSO parallel trust region-dog leg, hybrid GPSO parallel trust region-nearly exact, hybrid GPSO parallel Levenberg-Marquardt and a hybrid genetic electromagnetism-like algorithm. The proposed models are examined on real-world data from the Bank of England and the results are analyzed.

1. Introduction. One of the most important issues in the bond market is estimating the yield curve over a time horizon, which is used for different purposes such as investment strategies and monetary policy. In the bond market, there are different kinds of bonds with different times to maturity to invest in. In this paper, a bond with no coupon payments, a zero-coupon bond, is considered, whose yield to maturity is the spot interest rate for its maturity. Since these decisions are crucial, yield curves and their estimation methods have been studied broadly. The estimation models of the yield curve can be classified into two popular groups: (a) parsimonious yield curve models and (b) spline-based models. Parsimonious models present a function for the forward interest rate with a few parameters to be estimated, whereas spline-based methods approximate the yield curve by piecewise polynomial functions, which can be either quadratic or cubic [1]. In this paper, two parsimonious models, Nelson-Siegel and Extended Nelson-Siegel, are taken into consideration, which lead to essentially nonlinear optimization problems. Thus, nonlinear optimization techniques, namely hybrid parallel versions of the trust region-dog leg, trust region-nearly exact and Levenberg-Marquardt methods, are employed for the first time to solve the resulting problem formulation. The hybridization is achieved by employing the hybrid genetic particle swarm optimization technique. In addition, a hybrid genetic electromagnetism-like meta-heuristic is also employed to solve the problem. In the following, the relevant literature is reviewed in the two areas of yield curve estimation modelling and optimization. Nelson and Siegel [18] were the first to present the original parsimonious model of the yield curve, introducing a function with a few parameters for the forward interest rates. The model can be either monotonic or have a hump.
Later, Svensson [23]-[24] and Soderlind and Svensson [22] extended the model by adding parameters which allow the curves to have a second hump. The new model, known as the Extended Nelson-Siegel or Svensson model, captures a greater variety of yield curve shapes. Pooter [21] examined various extensions of the Nelson and Siegel model [18] with the purpose of fitting and forecasting the term structure of interest rates. Bolder and Streliski [5] investigated the behaviour of the Nelson-Siegel and Svensson yield curve models on data from the Bank of Canada. McCulloch [16] developed a quadratic spline-based model to estimate the yield curve. The cubic spline method proposed by McCulloch [17] seems to perform better than the quadratic spline models. Bliss [4] compares five distinct methods for estimating the term structure: the unsmoothed Fama-Bliss method, the smoothed Fama-Bliss method, the McCulloch method (cubic spline), the Fisher-Nychka-Zervos method (cubic spline) and the Extended Nelson-Siegel method. There is also an approach in which the nonlinear parameters are predetermined [8]. Different optimization techniques exist for solving yield curve estimation models. Csajbok [7] compares different kinds of yield curve models using the Gauss-Newton algorithm. Ioannides [11] uses the BFGS algorithm for estimating the yield curve. Manousopoulos and Michalopoulos [13]-[14] study yield curve estimation methods in the Greek bond market and define the models of estimating the yield curve as nonlinear optimization problems. Manousopoulos and Michalopoulos [15] compare different nonlinear optimization algorithms for estimating yield curves; in that work, they use parsimonious modelling of the yield curve and implement algorithms such as Nelder-Mead, Powell, simulated annealing, BFGS and the generalized reduced gradient to solve the yield curve optimization problem. The remainder of the paper is organized as follows.
The problem studied in this paper is elaborated in Section 2. Section 3 describes the solution procedures used to solve the yield curve optimization problem. In Section 4, the computational results are presented and the proposed methods are compared. Finally, in Section 5, we conclude the paper and propose future research directions.
2. Problem statement. An important decision in the bond market is to estimate and optimize the yield curve. A yield curve depicts the term structure of interest rates and describes the yield as a function of time to maturity. There are various types of yield curves; the most frequently used is the spot interest rate curve, or zero-coupon yield curve, which considers bonds with no coupon payments. To understand the spot interest rate for maturity m, consider a bond with m payments. The price of a bond is determined as the present value of its future cash flows (see formula 1).
where C is the periodic coupon payment at times t_j (j = 1, 2, …, m), R is the redemption payment at time t_m, and δ(t) is the discount function. The discount function is continuously compounded according to formula 2.
where s(t) denotes the spot interest rate for maturity time t. The instantaneous forward rates, which guarantee the return on an investment over a single (infinitesimally short) period, are then defined by formula 3.
Finally, the spot interest rate for maturity m is calculated as the mean forward rate over the interval [0, m] (see formula 4).
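Since formulas 1, 2 and 4 themselves do not survive in this text, the standard pricing relations they refer to can be sketched in Python as follows (the coupon, redemption and flat 5% curve below are illustrative values, not data from the paper):

```python
import math

def discount(t, s):
    """Continuously compounded discount factor, delta(t) = exp(-s(t) * t) (formula 2)."""
    return math.exp(-s(t) * t)

def bond_price(coupon, redemption, times, s):
    """Price of a coupon bond as the present value of its cash flows (formula 1):
    each coupon paid at t_j is discounted, plus the redemption at the last maturity."""
    pv_coupons = sum(coupon * discount(tj, s) for tj in times)
    return pv_coupons + redemption * discount(times[-1], s)

# Illustrative example: a 3-year bond paying annual coupons of 4 with a
# redemption of 100, priced on a flat 5% spot curve.
flat = lambda t: 0.05
price = bond_price(4.0, 100.0, [1.0, 2.0, 3.0], flat)
```

For a zero-coupon bond the coupon term vanishes and the price reduces to the discounted redemption payment alone, which is why its yield to maturity coincides with the spot rate for its maturity.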
a. Nelson-Siegel estimation model. Nelson and Siegel [18] define the forward interest rate with a parsimonious function of maturity (formula 5). The corresponding spot interest rate for maturity m is then defined by formula 6.
b. Extended Nelson-Siegel estimation model. To improve the flexibility of the Nelson-Siegel model and allow the estimation model to have a second hump, Svensson [23]-[24] and Soderlind and Svensson [22] added two further parameters to the previous model. The forward interest rate function is defined according to formula 7.
The corresponding spot interest rate for maturity m is then defined by formula 8.
where τ2 and β2 describe, respectively, the time of the second hump and the magnitude and direction of this hump.
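Formula 7 is not reproduced in this text; the sketch below encodes the standard Svensson forward-rate function it refers to (the parameter values in the test comments are illustrative):

```python
import math

def ens_forward(m, b0, b1, b2, b3, tau1, tau2):
    """Extended Nelson-Siegel (Svensson) instantaneous forward rate, the
    standard form of what the text calls formula 7."""
    x1, x2 = m / tau1, m / tau2
    return (b0                          # long-run level
            + b1 * math.exp(-x1)        # slope term, decays with maturity
            + b2 * x1 * math.exp(-x1)   # first hump, peaks near m = tau1
            + b3 * x2 * math.exp(-x2))  # second hump, peaks near m = tau2
```

The β3 term is what distinguishes the extended model: setting b3 = 0 recovers the original Nelson-Siegel forward rate, while b3 and tau2 shape the second hump as described above.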
Using a series expansion of the exponential terms and keeping the first two terms, the Extended Nelson-Siegel model can be approximated by an expression that is still nonlinear in τ1 and τ2.

2.2. Optimization of the zero-coupon yield curve. The optimization problem is defined as the minimization of the squared yield errors between the estimated yield curve and the market data, subject to some constraints. For the Extended Nelson-Siegel model, which is more general than the Nelson-Siegel model, the optimization problem is formulated as model 9.
where N is the number of maturities and s_{j,m} is the market yield of the jth maturity at time m.
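Model 9's objective, the sum of squared yield errors, can be sketched as follows, assuming the standard Svensson spot-rate loadings for formula 8 (the parameter vector and maturities below are illustrative, not data from the paper):

```python
import math

def ens_spot(m, p):
    """Svensson spot rate for parameters p = [b0, b1, b2, b3, tau1, tau2]
    (the standard form of what the text calls formula 8)."""
    b0, b1, b2, b3, t1, t2 = p
    x1, x2 = m / t1, m / t2
    l1 = (1.0 - math.exp(-x1)) / x1
    return (b0 + b1 * l1
            + b2 * (l1 - math.exp(-x1))
            + b3 * ((1.0 - math.exp(-x2)) / x2 - math.exp(-x2)))

def sse(p, market):
    """Objective of model 9: squared error between the estimated and the
    observed spot rates over the N quoted maturities."""
    return sum((ens_spot(m, p) - s_obs) ** 2 for m, s_obs in market)

# Sanity check: "market" data generated from known parameters is fit with zero error.
p_true = [0.04, -0.02, 0.01, 0.005, 1.5, 6.0]
market = [(m, ens_spot(m, p_true)) for m in (0.5, 1.0, 2.0, 5.0, 10.0, 20.0)]
```

In the paper's formulation this objective is minimized subject to feasibility constraints on the parameters, which is why the procedures in Section 3 include explicit repair steps for infeasible solutions.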
3. Solution procedures. As described in the previous section, the yield curve model is nonlinear, which increases the complexity of the problem. The nonlinear nature of the problem suggests nonlinear methods, and its complexity motivates meta-heuristic approaches. In the following, both approaches are presented. In all methods, a solution is represented by X; for example, for the Extended Nelson-Siegel model we have X = [β0, β1, β2, β3, τ1, τ2]. In general, a new solution is obtained by X_{t+1} = X_t + α_t d_t, where d_t and α_t are the direction and step size of the move, respectively. Each procedure has its own method for determining d_t and α_t.

3.1. Meta-heuristic approaches.
3.1.1. Hybrid genetic particle swarm optimization. Particle swarm optimization (PSO) is a computational approach initially proposed in [12], inspired by the social behavior of bird flocking or fish schooling in search of a rich source of food. PSO has been successfully used for continuous optimization problems, as it needs no gradient information about the problem. In this approach, a swarm consisting of a number of particles flies around an N-dimensional search space. In optimization problems, each particle plays the role of a solution. Fig. 1 shows the search space of the particles graphically. To move from one solution to a new one, PSO uses a velocity vector which simultaneously determines the direction and step size of the move. The position of each particle is updated through formula 10. Following Eberhart and Shi [9] and Clerc and Kennedy [6], the modified velocity update in formula 11 is used, which has a better performance and convergence rate than the original PSO.
where X_ij^t denotes position j of the ith particle (solution) at iteration t, and ν_ij^t is the velocity of the ith particle in the jth dimension, which simultaneously determines the direction and step size of the move. A new solution in the next iteration is obtained by formula 10. C_1 and C_2 are acceleration constants in the range [0, 4]. pbest_ij^t is the best jth position of particle i at time t; hence pbest_ij^t − X_ij^t is known as the cognitive component. gbest_j^t is the jth position of the global best particle; therefore gbest_j^t − X_ij^t is known as the social component. In the proposed hybrid genetic particle swarm optimization (HGPSO), the swarm undergoes crossover and mutation operations to balance exploration and exploitation of the search. This paper uses arithmetic crossover and non-uniform mutation in HGPSO. The main steps of this method are summarized in Procedure 1.
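Since formulas 10 and 11 themselves are not reproduced here, a minimal sketch of the position update and the Clerc-Kennedy constricted velocity update follows (the constriction coefficient 0.729 and c1 = c2 = 2.0 are common choices in the PSO literature, not values stated in this paper):

```python
import random

def pso_step(x, v, pbest, gbest, c1=2.0, c2=2.0, chi=0.729):
    """One PSO update: the cognitive term pulls a particle toward its own best
    position, the social term toward the global best; chi is the constriction
    coefficient of Clerc and Kennedy."""
    new_v = [chi * (vj
                    + c1 * random.random() * (pb - xj)   # cognitive component
                    + c2 * random.random() * (gb - xj))  # social component
             for xj, vj, pb, gb in zip(x, v, pbest, gbest)]
    new_x = [xj + vj for xj, vj in zip(x, new_v)]        # position update
    return new_x, new_v
```

In HGPSO this update is followed by repair of infeasible solutions and by arithmetic crossover and non-uniform mutation on selected particles, as listed in Procedure 1.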
Procedure 1 HGPSO()
Begin
1. Define the crossover probability (pc) and the maximum number of iterations (maxiteration);
2. Randomly generate a population of initial points of size popsize;
3. for t=1 to maxiteration do
4. Evaluate the particles with respect to the objective function of model 9;
5. Update the personal best position of each particle (pbest);
6. Update the global best position (gbest);
7. Velocity Update();
8. Position Update();
9. Repair the infeasible solutions;
10. Calculate the probability of each particle in the population;
11. for i=1:pc × popsize do
12. Apply roulette wheel selection twice to select two candidate solutions for crossover;
13. Arithmetic crossover();

3.1.2. Hybrid genetic electromagnetism-like algorithm. The electromagnetism-like algorithm (EM) is a population-based meta-heuristic originally designed for continuous optimization problems in [3]. The method stems from the electromagnetism theory of physics, in which each electrically charged particle plays the role of a potential solution in the solution space. The particles use an attraction-repulsion mechanism to move towards the optimal solution. The new position of each particle is determined through a random step length λ in the direction of the total force exerted on that particle by the other particles. Fig. 2 shows the total force exerted on particle i by particles j and k. Formula 12 gives the new position of the moved particle, in which RNG is the allowed feasible move length and the force F_i is calculated through formulas 13 and 14.
The proposed hybrid genetic electromagnetism-like algorithm (HGEM) uses genetic operators, namely arithmetic crossover and non-uniform mutation, in its process. The main steps of this algorithm are presented in Procedure 2.
Procedure 2 HGEM()
Begin
1. Define the crossover probability (pc), maximum number of iterations (maxiteration) and population size (popsize);
2. Generate a population of m particles having n dimensions;
3. for t=1 to maxiteration do
4. Calculate the probability of each particle in the population;
5. for i=1:pc × popsize do
6. Apply roulette wheel selection twice to select two candidate solutions for crossover;
7. Arithmetic crossover();
8. Repair the infeasible solutions;
9. end(for)
10. for j=1:(popsize − pc × popsize) do
11. Randomly select a candidate solution;
12. Non-uniform mutation();
13. Repair the infeasible solutions;
14. end(for)
15. Select the popsize best particles from the old and new populations;
16. Apply local search on each particle of the selected population, coordinate by coordinate;
17. Calculate the charge of each particle using formula 13;
18. Calculate the total force exerted on each particle using formula 14;
19. Normalize the force vector;
20. Move all particles except the best one in the direction of the force using formula 12;
21. end(for)
End.
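Formulas 13 and 14 do not survive in this text; the sketch below encodes the standard EM charge and force rules of Birbil and Fang [3], on which they are presumably based (the guard values are illustrative):

```python
import math

def em_charges(fvals, n_dim):
    """Charge of each particle (the text's formula 13, after Birbil and Fang):
    particles with better objective values receive larger charges."""
    fbest = min(fvals)
    denom = sum(f - fbest for f in fvals) or 1.0   # guard: all fitnesses equal
    return [math.exp(-n_dim * (f - fbest) / denom) for f in fvals]

def em_force(i, pop, fvals, charges):
    """Total force on particle i (the text's formula 14): attraction toward
    better particles, repulsion from worse ones, with inverse-square decay."""
    xi = pop[i]
    force = [0.0] * len(xi)
    for j, xj in enumerate(pop):
        if j == i:
            continue
        diff = [a - b for a, b in zip(xj, xi)]
        d2 = sum(d * d for d in diff) or 1e-12     # guard: coincident particles
        scale = charges[i] * charges[j] / d2
        sign = 1.0 if fvals[j] < fvals[i] else -1.0  # attract to better, repel worse
        force = [fk + sign * scale * dk for fk, dk in zip(force, diff)]
    return force
```

Each particle is then moved along its normalized force by a random step length within the feasible range (formula 12), except the best particle, which stays put.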

3.2. Search methods using derivatives. Derivative-based search methods start with an initial solution and iteratively move to a new position in a direction d_t that depends on the gradient of the problem, with a specified step size α_t. Formula 15 shows the new position in the solution space [19], [2].
The two most famous derivative search methods are the gradient (steepest descent) method and the Newton method, which use formulas 16 and 17, respectively, as their directions, where ∇f(X_t) and H(X_t) are the gradient and Hessian of the objective function at point X_t. The plus sign is used for maximization problems and the minus sign for minimization problems. In the Newton method α_t is unity, whereas in the gradient method it is not necessarily so.
Hereafter, denote α_t d_t by P. Hence, for the Newton method in the case of minimization we have P = −H(X_t)^{-1} ∇f(X_t). The drawback of these methods is that the gradient method rapidly approaches the optimal solution but moves in a zigzag manner near it, whereas the Newton method performs well only when its initial solution is near the optimal solution. To overcome these deficiencies, modifications of these methods are used. In the following subsections, new versions of two such modifications, trust region and Levenberg-Marquardt, are presented.
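As a concrete illustration of formulas 16 and 17, the sketch below takes one gradient step and one Newton step on a convex quadratic (the test function is illustrative, not from the paper); note that the Newton step solves a linear system rather than forming the inverse Hessian:

```python
import numpy as np

def gradient_step(grad, alpha):
    """Steepest-descent move (formula 16, minimization): d_t = -grad, step size alpha."""
    return -alpha * grad

def newton_step(grad, hess):
    """Newton move (formula 17): solve H(X_t) p = -grad for the step p,
    which is cheaper and more stable than inverting the Hessian."""
    return np.linalg.solve(hess, -grad)

# Quadratic test function f(x) = 0.5 x^T H x - b^T x, whose minimizer solves H x = b.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x0 = np.zeros(2)
grad0 = H @ x0 - b                 # gradient of the quadratic at x0
x1 = x0 + newton_step(grad0, H)    # one Newton step reaches the minimizer
```

On a quadratic the Newton step lands exactly on the minimizer in one move; the trust region and Levenberg-Marquardt modifications aim to retain this behavior while fixing the methods' weaknesses away from the optimum.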
3.2.1. Hybrid genetic PSO parallel trust region (HGPSOPTR). The original trust region method starts with an initial solution and iteratively searches the solution space within a circular region known as the trust region. The performance of the method highly depends on the choice of the initial solution. To overcome this drawback, a parallel algorithm is designed that searches different regions of the solution space, each with its assigned trust region, in parallel. A hybrid genetic PSO algorithm is employed to select these regions. Fig. 3 shows an example search of the solution space with five different trust regions. Procedure 3 summarizes the main steps of the proposed hybrid parallel trust region.

Procedure 3 HGPSOPTR()
3. Choose the initial Tr_radius in the interval (0, Max_radius);
4. Determine η ∈ [0, 1/4);
5. for itr=1 to maxiteration do
6. for pr=1 to Num_pr do in parallel
7. HGPSO();
8. for t=1:maxit do
9. Obtain P by applying either Procedure 4 or Procedure 5;
11. if ρ_t <

Dog leg (DL). The dogleg method finds an approximate solution P subject to the trust region radius constraint. It first finds the unconstrained minimizer P_Gradient along the steepest descent (gradient) direction, −∇f, defined by formula 18.
If the trust region radius is small and the solution P_Gradient lies outside the trust region, set P according to formula 19.
If P_Gradient is inside the trust region, find the unconstrained minimizer P_Newton according to the Newton method, considering formula 20.
If it is feasible, set P = P N ewton . Otherwise, the solution P follows a curved trajectory like the one presented in Fig. 4. The approximate solution P is found by replacing the curved trajectory with a path consisting of two line segments.
The first line segment runs from the initial point to the unconstrained minimizer P_Gradient along the steepest descent (gradient) direction, −∇f(X_t). The second line segment runs from P_Gradient along the Newton direction, −H(X_t)^{-1} ∇f(X_t). The new solution X_{t+1} is the intersection of the dogleg path and the boundary of the trust region. In this case, P is specified according to formula 21, in which τ is calculated by solving the quadratic equation 22.

Nearly exact (NE). As shown in formula 17, the Newton direction is not defined when the Hessian matrix is singular. Therefore, the Hessian matrix is modified according to formula 23.
Then λ is updated until a feasible point in the trust region is achieved. The steps of this method are shown in Procedure 5.
Procedure 5 NE()
Begin
1. Initialize λ_0 and the trust region radius (Tr_radius);
2. for k=0, 1, 2, …
3. Calculate the Cholesky factorization of H(X_t) + λ_k I and set it as R^T R;
4. Solve R^T R P_k = −∇f(X_t) and find P_k;
5. Solve R^T q_k = P_k;
7. end(for)
End.
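Since formulas 18 through 22 are not reproduced in this text, the dogleg computation of P (Procedure 4) can be sketched from the standard dogleg method as follows; variable names are illustrative:

```python
import numpy as np

def dogleg_step(grad, hess, radius):
    """Dogleg approximation to the trust region subproblem: follow the steepest
    descent segment to its unconstrained minimizer, then bend toward the
    Newton point, truncating wherever the path leaves the trust region."""
    # Unconstrained minimizer along -grad (the text's P_Gradient, formula 18).
    p_grad = -(grad @ grad) / (grad @ hess @ grad) * grad
    # Full Newton step (the text's P_Newton, formula 20).
    p_newton = np.linalg.solve(hess, -grad)
    if np.linalg.norm(p_newton) <= radius:
        return p_newton                    # Newton point is feasible: take it
    if np.linalg.norm(p_grad) >= radius:
        # Even the gradient segment exits the region: scale it back (formula 19).
        return -(radius / np.linalg.norm(grad)) * grad
    # Walk from p_grad toward p_newton and stop at the boundary: solve
    # ||p_grad + tau * (p_newton - p_grad)||^2 = radius^2 for tau (formula 22).
    d = p_newton - p_grad
    a, b, c = d @ d, 2.0 * (p_grad @ d), p_grad @ p_grad - radius ** 2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p_grad + tau * d                # the dogleg point (formula 21)
```

Procedure 5's nearly exact alternative instead shifts the Hessian by λI (formula 23) and tunes λ via the Cholesky factors until the step length matches the trust region radius.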

3.2.2. Hybrid genetic PSO parallel Levenberg-Marquardt (HGPSOPLM).
Due to ill-conditioning, which might occur at a point where the Hessian is singular or nearly singular, a modification of the Hessian matrix is used according to formula 24, where I is the identity matrix.
The algorithm starts with an initial feasible solution and iteratively improves the position of the solution in the solution space according to formula 25.
At each iteration of the algorithm, the value of ε_t is modified according to the value of ρ_t, which is calculated by formula 26. Since the performance of the Levenberg-Marquardt algorithm depends on the choice of initial solution, a new hybrid parallel version of Levenberg-Marquardt is proposed in this paper, in which a number of initial points are generated through PSO and the Levenberg-Marquardt method is run from each point in parallel. The main steps of the new proposed method are summarized in Procedure 6.

Procedure 6 HGPSOPLM()
3. for itr=1 to maxiteration do
4. for pr=1 to Num_pr do in parallel
5. HGPSO();
6. for t=1:maxit do
7. Calculate P using formula 25;
8. X_{t+1} = X_t + P;
9. Evaluate X_{t+1};
10. Calculate ρ_t using formula 26;
11. if ρ_t < mulow
16. if norm(gradient) < specified tolerance
17. break;
18. end(if)
19. end(for)
20. end(for)
21. Gather the best solution obtained from all processors at the central unit; the next iteration starts with the best solution of each processor.
22. end(for)
End.
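A minimal sketch of the damped iteration described by formulas 24 through 26 follows; the acceptance thresholds (0.25, 0.75) and the halving/doubling of ε are common Levenberg-Marquardt heuristics standing in for the paper's unstated mulow/muhigh constants, and the test function is illustrative:

```python
import numpy as np

def levenberg_marquardt(f, grad, hess, x0, eps=1.0, iters=100, tol=1e-10):
    """Damped Newton iteration (formulas 24-25): the Hessian is shifted by
    eps * I, and eps is adapted from rho, the ratio of the actual to the
    predicted reduction of f (formula 26)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(H + eps * np.eye(len(x)), -g)
        predicted = -(g @ p) - 0.5 * (p @ H @ p)   # model's predicted decrease
        rho = (f(x) - f(x + p)) / predicted if predicted > 0 else -1.0
        if rho > 0:              # step actually reduced f: accept it
            x = x + p
        if rho > 0.75:           # model fits well: damp less (more Newton-like)
            eps *= 0.5
        elif rho < 0.25:         # model fits poorly: damp more (more gradient-like)
            eps *= 2.0
    return x

# Illustrative convex quadratic with minimizer (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])
x_star = levenberg_marquardt(f, grad, hess, [0.0, 0.0])
```

In HGPSOPLM, several such iterations run in parallel, each seeded with an initial point produced by HGPSO, and a central unit gathers the best solution at the end of each outer iteration.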

4. Computational results.
A computational experiment is conducted to investigate the performance of the proposed methods. First, the methods are run using the nominal data ("zero-coupon" yields) of the Bank of England from January 4 to January 19, 2011. Each data set is run with each algorithm for both the Nelson-Siegel and Extended Nelson-Siegel yield models. For each model, the results are averaged over all data sets, and the averages and standard deviations of the errors are reported in Table 1. Fig. 5 and Fig. 6 illustrate the interval plots for the Extended Nelson-Siegel and Nelson-Siegel models. Also, Fig. 7 and Fig. 8 compare the proposed methods on the Extended Nelson-Siegel and Nelson-Siegel models over the 12 data sets taken from the Bank of England from January 4 to January 19. As can be seen, for the Extended Nelson-Siegel model HGPSOPLM results in the minimum error, while for the Nelson-Siegel model HGPSOPTR-DL has the minimum error. Of the two proposed meta-heuristics, HGEM has the smaller error on both models. HGPSO on its own yields the maximum error, but better results are obtained by hybridizing it with the proposed parallel nonlinear methods. The improvement in the results is more significant for the Extended Nelson-Siegel model than for the Nelson-Siegel model. The averages and standard deviations of the parameters are also presented in Tables 2 and 3 for the Extended Nelson-Siegel and Nelson-Siegel models, respectively. Tables 4 and 5 show the eigenvalues of the Hessian matrix. For the Nelson-Siegel model, all eigenvalues are positive and all gradient norms are near zero; therefore, one may conclude that the Hessian matrix is positive definite and the resulting solutions are optimal. In the appendix, Figs. 9a to 9f show how the obtained yield curves fit the market data on January 4, which exhibit a normal upward-sloping yield curve. This shape occurs when an economy is growing and investors are confident.
This is a significant sign of development, which requires effective and adequate investment [10]. In this situation, investors demand an additional yield for longer-maturity bonds, which is logical since there is more risk associated with long-term investment. In sum, healthy economies nearly always have an upward-sloping yield curve, although the interest rates that make up that curve may differ substantially from one period to another. To support our findings, the proposed algorithms were tested again on a data set from a more recent year with respect to the Extended Nelson-Siegel model, which has been shown to be more accurate than the Nelson-Siegel model. We used a rather larger data set, i.e. a quarter, from 1 April to 30 June 2015. The results, shown in Table 6, support the superiority of the HGPSOPLM algorithm. For further analysis, we estimated the yield curve for the data of a crisis period, i.e. May 2008, through the Extended Nelson-Siegel model and the HGPSOPLM algorithm, which are respectively an accurate estimation model and the best algorithm in our experiments. The performance of the Levenberg-Marquardt (LM) algorithm depends on the choice of the initial solution. First, the LM algorithm was run with a randomly generated initial solution. Next, it was run with the previous day's estimate as the initial solution. Finally, it was run with the HGPSO result as the initial solution. The results are … [20]. Therefore, when there is serious concern about a downturn in the economy, most investors prefer to stay on the safe side by holding significant amounts of money in either cash or safe investment options such as the longer-term bond market. This increases longer-maturity bond prices, and their yields step down. Investors are also reluctant to invest in short-term bonds; this lack of demand drives short-term treasury prices down and their yields up.

5. Conclusion. An important financial decision in a bond market is to determine the yield as a function of time. Such a function is called the yield curve, which could be monotonic or involve one or more humps. In this paper, we have used two famous models of the yield curve, Nelson-Siegel and Extended Nelson-Siegel, with at most one and two humps, respectively. To obtain an appropriate yield curve, the parameters of the yield curve models need to be optimized such that they have the least squared error from market data. Our contribution is the first work applying parallel hybrid nonlinear and meta-heuristic methods to solve the resulting nonlinear yield curve optimization problem. The proposed methods are hybrid GPSO parallel trust region-dog leg, hybrid GPSO parallel trust region-nearly exact, hybrid