Model selection based on value-at-risk backtesting approach for GARCH-Type models

This paper investigates the efficiency of value-at-risk (VaR) backtests in model selection among different types of generalised autoregressive conditional heteroskedasticity (GARCH) models with skewed and non-skewed innovation distributions. An extensive simulation is carried out to compare model selection based on VaR backtests with that based on the Akaike Information Criterion (AIC). When the model is given but the innovation distribution is one of six candidate distributions, which may be skewed or non-skewed, the simulation results show that both the AIC and the VaR backtests succeed in selecting the correct innovation distribution from the set under consideration. This indicates that both the AIC and the VaR backtests are able to distinguish between skewed and non-skewed distributions when the innovation distribution is misspecified. Using empirical data from the NASDAQ index, we observe that the combination of model and innovation distribution selected by the smallest AIC does not agree with that selected by the in-sample VaR backtests. Examination of confidence limits for the VaR and expected shortfall forecasts under various loss functions provides evidence that the combination of model and innovation distribution selected by the VaR backtests tends to have smaller mean absolute percentage error and logarithmic loss.


(Communicated by Hailiang Yang)

1. Introduction. Volatility is a statistical measure of the dispersion of returns for a particular asset index. This measure often relates to the uncertainty or risk of the asset index. Since the stock market crash of 1987, stock price volatility has captured the attention of both academic researchers and regulators [29]. Consequently, appropriate volatility forecasting and risk measurement methodologies have been extensively studied and reviewed by many academics and practitioners over the past two decades. Some of the noteworthy characteristics commonly observed in asset returns are persistent volatility clustering, time-varying volatility, leptokurtosis and leverage effects.
In the past, various time series volatility models were proposed to formulate volatility forecasts. The autoregressive conditional heteroskedasticity (ARCH) model was among the most popular, and it has been extensively enhanced since its introduction [15]. [15] proposed ARCH of order q to model the time-varying variance of inflation data. One of the novel features of the ARCH model in volatility modelling is that its conditional variance is written as a function of past squared returns, which captures the volatility clustering present in numerous financial time series. The generalised autoregressive conditional heteroskedasticity (GARCH) model proposed by [7], which is based on an infinite ARCH specification, reduces the number of estimated parameters from infinity to a finite number [1]. Although the ARCH and GARCH models are capable of detecting volatility clustering, they fail to capture some common characteristics of financial data, such as the leverage effect. To address this issue, various nonlinear and asymmetric extensions have been proposed. [31] proposed the exponential GARCH model, which specifies the conditional variance in logarithmic form and includes additional terms to capture the leverage effect. Other models allowing for asymmetric dependencies include the Glosten-Jagannathan-Runkle GARCH (GJRGARCH) by [20], Threshold GARCH (TGARCH) by [33], Asymmetric Power ARCH (APARCH) by [14], nonlinear asymmetric GARCH (NAGARCH) by [16], Quadratic GARCH by [34], Fractionally Integrated GARCH by [5], Bilinear GARCH (BLGARCH) by [38] and Threshold BLGARCH by [12], among others.
In addition to the choice of an appropriate GARCH-Type model, attention has also been centred on the specification of the innovation distribution. Traditionally, stock market returns are modelled under the normality assumption. However, this distributional assumption does not embrace the stylised characteristics of financial time series, such as heavy tails, leptokurtosis and asymmetry [25]. Thus, a number of papers have proposed different distributional assumptions to capture these characteristics, among them the skewed normal distribution [3], student-t distribution [8], generalised error distribution [31], and the skewed student-t distribution with an additional skewness parameter proposed by [18], later extended to the GARCH framework by [24]. [4] considered symmetric and asymmetric GARCH models with skewed generalised error and skewed student-t distributions in modelling interest rate volatility. The asymmetric exponential power distribution [42] and the asymmetric student-t distribution [41] were also used in modelling five popular commodities [30]. [27] modelled crude oil returns using the NAGARCH model with skewed distributions, which improved its forecasting ability.
Value-at-risk (VaR) is one of the most popular risk measures for quantifying market risk. Practitioners and investors refer to it as the maximal loss of a financial position over a given time period at a given probability (see [39]), and it serves as a guideline for regulatory committees in setting margin requirements. To evaluate the accuracy of VaR estimates, backtests such as the Kupiec test [23], the conditional coverage Christoffersen test [13], and the dynamic quantile test [17] are performed. [19] showed that the Kupiec test results on in-sample VaR estimates for returns using the APARCH model with a skewed student-t distribution agreed with the chosen confidence level. Most studies have noted that generalising the non-skewed innovation distribution of GARCH-Type models to a skewed family of distributions significantly improves the accuracy of VaR forecasts [2,10,19,22]. Meanwhile, [37] asserted that the model selected by the smallest AIC does not always give satisfactory backtest results when estimating VaR with GARCH-Type models in empirical applications. However, to date, there is a lack of simulation studies on misspecified distributional assumptions in GARCH-Type models for VaR forecasting. As discussed by [27], even under the same volatility model specification, different distributional assumptions may cause great discrepancies in VaR values.
Hence, the aim of this paper is threefold. First, we investigate model selection capability when estimating the in-sample VaR of returns for long and short positions. The accuracy of the VaR is further tested using the backtests, and a number of GARCH-Type models with known skewed and non-skewed, as well as misspecified, innovation distributions are compared and reported. Secondly, we examine the model choices based on the smallest Akaike Information Criterion (AIC) and on the significance of all in-sample VaR backtests using an empirical study of the NASDAQ index. Lastly, we present forecasts of the returns, the VaR and the expected shortfall (ES) on the basis of the best fitted models; the predictive performances of these models are evaluated using various loss functions, and VaR backtests are carried out to assess comparative forecasting accuracy across the models.
This paper is organised as follows. Section 2 gives a general description of several GARCH-Type models and the innovation distributions employed in volatility modelling. Section 3 presents the model selection criteria. A simulation study is implemented in Section 4 to investigate the GARCH-Type models with misspecified innovation distributions. Section 5 describes the empirical data set with its summary statistics; the best fitted models under the selection criteria, including the AIC and the VaR backtests, are elaborated and compared, followed by a comparison of the models' forecasting performance. Finally, concluding remarks are presented in Section 6.

2. Model specification and distributional assumption. Conditional heteroskedasticity models are well-known tools that are frequently applied in modelling and forecasting the volatility of financial returns. This section reviews the various types of GARCH models used in this study and the different innovation distributions, together with the criteria for selecting the best fitted models.
Consider the closing price of a stock, P_t, at time t, where t = 1, 2, . . . , n. The log return of the stock is defined as r_t = log(P_t/P_{t−1}), which can be decomposed into two components, the mean component µ_t and the error component (or shock) a_t, i.e.,

r_t = µ_t + a_t,  a_t = σ_t ε_t,  (1)

where ε_t is an independent and identically distributed (i.i.d.) innovation with mean zero and unit variance, and σ_t is the conditional standard deviation of the returns.
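As a minimal illustration of the return definition above, log returns can be computed from a (hypothetical) price series as follows:

```python
import numpy as np

# Hypothetical closing prices P_1, ..., P_n; r_t = log(P_t / P_{t-1}).
prices = np.array([100.0, 101.5, 99.8, 102.3, 101.0])
returns = np.diff(np.log(prices))  # one fewer element than prices
```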
2.1. GARCH-Type models. The conditional variance for the ARCH model of order q proposed by [15] is

σ_t^2 = α_0 + Σ_{i=1}^{q} α_i a_{t−i}^2,  (2)

where α_0 > 0 and α_i ≥ 0 for i = 1, 2, . . . , q. It can be observed from (2) that large past squared shocks a_{t−i}^2 imply a large value of the conditional variance σ_t^2. [7] introduced a more general class of ARCH processes, the GARCH model, which allows for a more flexible lag structure. The GARCH model is a simple generalisation of the ARCH model in that a low-order GARCH may capture properties similar to a high-order ARCH model. The conditional variance of the GARCH (p, q) model at time t can be expressed as

σ_t^2 = α_0 + Σ_{i=1}^{q} α_i a_{t−i}^2 + Σ_{j=1}^{p} β_j σ_{t−j}^2,  (3)

where α_0 > 0, α_i ≥ 0 for i = 1, 2, . . . , q and β_j ≥ 0 for j = 1, 2, . . . , p, and a sufficient condition for the stationarity of the variance is Σ_{i=1}^{q} α_i + Σ_{j=1}^{p} β_j < 1. Note that the GARCH (p, q) model reduces to the ARCH (q) model when β_j = 0 for all j. Meanwhile, a special case of GARCH, the integrated generalised autoregressive conditional heteroskedasticity (IGARCH) model, is restricted to the case where Σ_{i=1}^{q} α_i + Σ_{j=1}^{p} β_j = 1.

To account for asymmetry in the returns series, various asymmetric GARCH-Type models have been proposed. [20] introduced the GJRGARCH (p, q) model by imposing a dummy variable that may effectively identify the asymmetric effect in the data. The GJRGARCH (p, q) model is therefore

σ_t^2 = α_0 + Σ_{i=1}^{q} (α_i + γ_i I_{t−i}) a_{t−i}^2 + Σ_{j=1}^{p} β_j σ_{t−j}^2,  (4)

where γ_i is the asymmetric component and the dummy variable I_{t−i} = 1 if a_{t−i} < 0 and I_{t−i} = 0 if a_{t−i} > 0, for i = 1, 2, . . . , q. In the GJRGARCH (p, q) model with γ_i > 0, a positive shock a_{t−i} leads to lower volatility and a negative shock leads to higher volatility. Another asymmetric GARCH model is the TGARCH model proposed by [33]. Rather than modelling the conditional variance, the TGARCH (p, q) model expresses the conditional standard deviation of the data as

σ_t = α_0 + Σ_{i=1}^{q} (α_i^+ a_{t−i}^+ − α_i^− a_{t−i}^−) + Σ_{j=1}^{p} β_j σ_{t−j},  (5)

where a_t^+ = max(a_t, 0) and a_t^− = min(a_t, 0). Note that α_0 > 0, α_i^+, α_i^− ≥ 0 and β_j ≥ 0 for i = 1, 2, . . . , q and j = 1, 2, . . . , p, in order to ensure the positivity of the conditional standard deviation. If α_i^+ < α_i^−, then a negative shock implies a higher conditional standard deviation (and hence conditional variance). This result coincides with the leverage effect mentioned by [6].
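The GARCH (1, 1) variance recursion in (3) can be sketched as follows; the parameter values in the check below are those used later in the paper's simulation design (α_0 = 0.2, α_1 = 0.1, β_1 = 0.6):

```python
import numpy as np

def garch11_variance(shocks, alpha0, alpha1, beta1, sigma2_init):
    """Conditional-variance recursion of (3) for p = q = 1:
    sigma2_t = alpha0 + alpha1 * a_{t-1}^2 + beta1 * sigma2_{t-1}."""
    sigma2 = np.empty(len(shocks))
    sigma2[0] = sigma2_init
    for t in range(1, len(shocks)):
        sigma2[t] = alpha0 + alpha1 * shocks[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2
```

With zero shocks the recursion contracts towards α_0/(1 − β_1); with α_1 + β_1 < 1 the process satisfies the stationarity condition above.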
Meanwhile, the bilinear GARCH model proposed by [38] captures the asymmetry in time series data through the interaction between past shocks and volatilities. The BLGARCH (p, q) model is defined as

σ_t^2 = α_0 + Σ_{i=1}^{q} α_i a_{t−i}^2 + Σ_{j=1}^{p} β_j σ_{t−j}^2 + Σ_{k=1}^{p*} ζ_k a_{t−k} σ_{t−k},  (6)

where α_i, β_j and ζ_k are the parameters to be estimated and p* = min(p, q). Here, ζ_k is the asymmetry component of the model, and the leverage effect is accounted for by the interaction between past shocks and volatilities. Notice that for the BLGARCH (1, 1) model, 4α_1 β_1 > ζ_1^2 is the necessary and sufficient condition for the positivity of σ_t^2 [38].
2.2. Distributions of innovation. By convention, financial time series models generally assume that the logarithmic asset returns (hereafter, returns) follow a normal distribution. However, many empirical studies show that financial returns are usually non-normal, exhibiting asymmetric, fat-tailed and leptokurtic properties (see [11,28]). In this study, we consider six popular innovation distributions: the normal distribution (NORMD), student-t distribution (STD), generalised error distribution (GED), skewed normal distribution (SNORMD), skewed student-t distribution (SSTD) and skewed generalised error distribution (SGED). The probability density function (pdf) of each standardised distribution takes its usual form, with Γ(·) denoting the gamma function. For instance, the standardised GED pdf is

f(ε_t) = ν exp(−(1/2)|ε_t/λ|^ν) / (λ 2^{1+1/ν} Γ(1/ν)),

where ν > 0 is the tail-thickness parameter and λ = [2^{−2/ν} Γ(1/ν)/Γ(3/ν)]^{1/2} standardises the distribution to unit variance; the remaining pdfs are standard and are omitted here.
The construction of these skewed distributions was explicitly described by [18] and [24].
3. Model selection criteria. All GARCH-Type models are fitted by the maximum likelihood method. To select the best fitted model for the financial data set used, two approaches are considered: one based on the smallest AIC and the other on the agreement of the VaR backtests.
The first approach is grounded on the smallest AIC value of the model. This method has been extensively adopted (see [21,37]) and is used here to obtain the best fitted model for the data set, where the AIC value for each model is computed as

AIC = 2k − 2 LLH,

where k is the number of estimated parameters and LLH is the maximised log-likelihood value.
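For illustration, selecting among candidate fits by the smallest AIC might look like this (the log-likelihood values below are hypothetical):

```python
def aic(llh, k):
    """Akaike Information Criterion: AIC = 2k - 2*LLH (smaller is better)."""
    return 2 * k - 2 * llh

# Hypothetical fitted models: (maximised log-likelihood, number of parameters).
candidates = {"NORMD": (-3050.2, 3), "STD": (-3010.7, 4), "SSTD": (-3001.4, 5)}
best = min(candidates, key=lambda m: aic(*candidates[m]))
```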
The second approach is based on the accuracy of the VaR estimates. Value-at-risk at level α for a time series is defined as the corresponding empirical quantile at α% [19]. For the long and short trading positions, the VaR values at time t are determined by

Long position: VaR^l_{t,α} = µ_t + σ_t q_α,
Short position: VaR^s_{t,α} = µ_t + σ_t q_{1−α},

where µ_t and σ_t denote the conditional mean and conditional standard deviation at time t, respectively, while q_α and q_{1−α} are the corresponding critical values of the α and (1 − α) quantiles of the empirical distribution.
To evaluate the accuracy of the VaR estimates, several VaR backtests are carried out, including the unconditional coverage Kupiec (UCK) test [23], the conditional coverage Christoffersen (CCC) test [13], and the dynamic quantile (DQ) test [17]. These backtests compare the forecasted losses from a fitted model with the actual losses realised at the end of a fixed time horizon. As indicated by [40], the backtest results reveal whether the VaR is underestimated, i.e., whether the losses are greater than the originally expected VaR value.
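As a sketch of the first of these backtests, the Kupiec unconditional-coverage statistic can be computed directly from the exceedance count using the standard likelihood-ratio form (asymptotically chi-square with one degree of freedom under the null):

```python
import math

def kupiec_uc(n, x, alpha):
    """Kupiec unconditional-coverage LR statistic for x exceedances in n
    observations, under H0 that the exceedance probability equals alpha."""
    if x == 0:
        return -2.0 * n * math.log(1.0 - alpha)
    if x == n:
        return -2.0 * n * math.log(alpha)
    pi = x / n  # observed exceedance rate
    ll0 = (n - x) * math.log(1.0 - alpha) + x * math.log(alpha)
    ll1 = (n - x) * math.log(1.0 - pi) + x * math.log(pi)
    return -2.0 * (ll0 - ll1)
```

When the observed rate equals α exactly, the statistic is zero; large departures from α inflate it past the chi-square critical value (3.84 at the 5% level).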
In order to select the best fitted GARCH-Type model based on the VaR backtesting approach, we first calculate the number of actual exceedances of the VaR values: the number of actual exceedances for the long position is the total number of observed returns at time t that are smaller than VaR^l_{t,α} for t = 1, 2, . . . , n. Analogously, the total number of actual exceedances for the short position is the number of observed returns at time t that are greater than VaR^s_{t,α} for t = 1, 2, . . . , n. The best fitted model is then selected based on the minimum absolute difference between the actual and expected exceedances. Note that the expected number of exceedances is nα, where n is the sample size. A model that is inadequate in explaining the data in terms of its VaR measure is expected to perform poorly for both long and short trading positions.
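The exceedance-counting procedure above can be sketched as follows (hypothetical inputs; the empirical-quantile VaR follows the long- and short-position formulas of the previous paragraphs):

```python
import numpy as np

def var_exceedances(returns, mu, sigma, alpha=0.05):
    """Count actual VaR exceedances for the long and short positions, with
    VaR_long = mu + sigma*q_alpha and VaR_short = mu + sigma*q_{1-alpha},
    where q is an empirical quantile of the standardised residuals."""
    eps = (returns - mu) / sigma
    q_lo, q_hi = np.quantile(eps, [alpha, 1 - alpha])
    var_long = mu + sigma * q_lo
    var_short = mu + sigma * q_hi
    n_long = int(np.sum(returns < var_long))    # returns below long-position VaR
    n_short = int(np.sum(returns > var_short))  # returns above short-position VaR
    return n_long, n_short
```

The best fitted model is then the one minimising |actual − nα| for the expected count nα.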
4. Simulation study. Oftentimes, in applications to real data sets, the true underlying innovation distribution of the model cannot be known, so misspecification of the innovation distribution is of high concern. In this study, although the AIC is traditionally found to be instrumental in fitting the best model [21], the in-sample VaR backtest approach to model selection is considered as an alternative.
To commence, we generate an innovation series {ε_t} of length 2,500 under each of the true innovation distributions. Setting the initial value σ_1 = 0.5, the simulated returns are calculated as r_t = a_t = σ_t ε_t, where σ_t is computed from the GARCH (1, 1) model in (3) with mean component µ_t = 0 for simplicity. The first 500 realisations are discarded to avoid dependence on the initial values when simulating the data series, leaving a series of length n = 2,000. The simulated r_t are first fitted with the GARCH (1, 1) model under the six different distributions; the AIC values are then computed and the various VaR backtests are performed on the series. For each GARCH-Type model, the simulation is repeated for N = 100 runs with the six distributions indicated in Section 2. Tables 1 to 4 show the simulation results for the GARCH (1, 1), GJRGARCH (1, 1), TGARCH (1, 1) and BLGARCH (1, 1) models, respectively, reporting the average AIC values and the total number (out of 100) of non-rejections of the null hypothesis that the model is 'correct' at α = 0.05 under the various VaR backtests.
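A minimal sketch of this simulation design, assuming normal innovations for illustration (the paper also uses the five other distributions), with the GARCH (1, 1) parameters of Table 1:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulation parameters as in Table 1: alpha0 = 0.2, alpha1 = 0.1, beta1 = 0.6;
# mu_t = 0 so that r_t = a_t = sigma_t * eps_t.
alpha0, alpha1, beta1 = 0.2, 0.1, 0.6
total, burn = 2500, 500

eps = rng.standard_normal(total)  # illustrative choice of innovation distribution
sigma = np.empty(total)
r = np.empty(total)
sigma[0] = 0.5
r[0] = sigma[0] * eps[0]
for t in range(1, total):
    sigma2 = alpha0 + alpha1 * r[t - 1] ** 2 + beta1 * sigma[t - 1] ** 2
    sigma[t] = np.sqrt(sigma2)
    r[t] = sigma[t] * eps[t]

r = r[burn:]  # discard the first 500 realisations to remove dependence on sigma_1
```

The retained series of length 2,000 would then be refitted under each of the six candidate distributions.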
From Table 1, we observe that when the data are generated from the GARCH (1, 1) model with a given true innovation distribution, the fitted GARCH model with the corresponding true innovation distribution always gives the smallest average AIC value. The same findings are observed for the other GARCH-Type models under study, as shown in Tables 2 to 4. Thus the AIC approach tends to select the true distribution, in terms of both precision and consistency, as anticipated by [21].
Also from Table 1, when the data are generated from the GARCH (1, 1) model with true non-skewed innovation distributions, the majority of the in-sample VaR backtest results comply with non-rejection of the null hypothesis, regardless of whether the fitted model uses a skewed or non-skewed innovation distribution. In contrast, when the generated data come from the model with skewed distributions (SNORMD, SSTD and SGED), the VaR backtest results based on fitted models with non-skewed distributions (NORMD, STD and GED) show about 40% to 100% (out of 100) rejections of the null hypothesis at the α = 0.05 level. However, fitted models with skewed distributions (such as SSTD and SGED) produce at least 81% non-rejections of the null hypothesis, implying that the backtests are capable of distinguishing whether the data stem from a non-skewed or a skewed distribution. Similar patterns are observed for the GJRGARCH (1, 1), TGARCH (1, 1) and BLGARCH (1, 1) models, as shown in Tables 2 to 4.

5. Application.
5.1. Data description. Daily adjusted closing prices of the NASDAQ composite index were used in this study, with the returns of the composite index evaluated via r_t = 100 × log(P_t/P_{t−1}).

Table 1. Average values of AIC and the number of non-rejections of the null hypothesis at the 5% significance level for various backtests, out of 100 runs, for the fitted GARCH (1, 1) model with parameters α_0 = 0.2, α_1 = 0.1 and β_1 = 0.6.

Figure 1 presents the time series plots for the closing prices and the returns of the NASDAQ index. The NASDAQ index shows an overall upward trend from 1984 to 2000, followed by a decreasing trend for about two years after 2000, and tends to rise again after a further overall decline of about two years. The volatility of the prices can also be observed from the fluctuation of the returns. From Figure 1(b), the volatility of the NASDAQ returns changes over time, indicating that GARCH-Type models might be useful in explaining the data.

Table 2. Average values of AIC and the number of non-rejections of the null hypothesis at the 5% significance level for various backtests, out of 100 runs, for the fitted GJRGARCH (1, 1) model with parameters α_0 = 0.2, α_1 = 0.1, γ_1 = 0.2 and β_1 = 0.6.

Table 5 shows the summary statistics of the NASDAQ index returns. The minimum value represents the maximum loss (gain) that investors would bear (obtain) if they held a long (short) position on the index over the specified period; similarly, the maximum value indicates the maximum gain (loss) on a long (short) position. From the perspective of the long position, the maximum loss of the NASDAQ index is −12.04%, while the maximum gain is 13.25%. The mean return is 0.036%, indicating that the NASDAQ index has a slightly positive return overall, with a standard deviation of 1.38%.
It is also noted that the NASDAQ index returns are negatively skewed, with a skewness of −0.234. The excess kurtosis of the NASDAQ returns is 8.771, suggesting that the returns have thicker tails on both the positive and negative sides than the normal distribution. The Jarque-Bera statistic rejects the null hypothesis of normality at the 1% significance level, suggesting that the returns might follow a fat-tailed distribution. Furthermore, the Ljung-Box statistic on the returns has a p-value less than 0.05, supporting the presence of serial correlation in the returns. Meanwhile, the Lagrange Multiplier (LM) test reveals the existence of an ARCH effect in the returns data, leading us to propose that GARCH-Type models might be useful in explaining the data. For graphical display, the histogram and QQ-plot of the NASDAQ returns are depicted in Figure 2.

Table 3. Average values of AIC and the number of non-rejections of the null hypothesis at the 5% significance level for various backtests, out of 100 runs, for the fitted TGARCH (1, 1) model with parameters α_0 = 0.2, α_1^+ = 0.1, α_1^− = 0.3 and β_1 = 0.6.

5.2. Model fitting and model selection.
This section determines the most appropriate model for the NASDAQ returns data on the basis of the modelling performance of the corresponding models. Initially, the sample period is divided into two sub-periods: the first portion, of length 5551, is used for modelling, while the second portion, of length 125, is retained for validation across the models. To proceed, the NASDAQ returns are first fitted to the four GARCH-Type models [GARCH (1, 1), GJRGARCH (1, 1), TGARCH (1, 1) and BLGARCH (1, 1)] under the six different distributions. Hence, there are 24 models in total, and the parameter estimates are computed by the ML method. The best fitted models are then determined based on the smallest in-sample AIC and the VaR backtest values.

Table 4. Average values of AIC and the number of non-rejections of the null hypothesis at the 5% significance level for various backtests, out of 100 runs, for the fitted BLGARCH (1, 1) model with parameters α_0 = 0.2, α_1 = 0.1, β_1 = 0.6 and γ_1 = 0.2.

           NORMD  STD  GED  SNORMD  SSTD  SGED
UCK-long     0     0    0     72    100   100
UCK-short    0     0    0     45     83    93
CCC-long     0     0    0     83    100   100
CCC-short    0     0    0     53     84    94
DQ-long      0     0    0     96     98    98
DQ-short     2     0    2     63     93    97

Table 6 gives the AIC values for the various GARCH-Type models for the NASDAQ index returns. It can be observed that the best fitted model for the NASDAQ returns is the TGARCH (1, 1) under the skewed student-t distribution. Across the same innovation distributions, the GARCH model has the overall largest AIC values compared with the other models, which might indicate that the asymmetric component is essential in modelling the returns of the NASDAQ index.
Turning to Table 7, which shows the p-values of the various VaR backtests for the NASDAQ index returns: across the different α levels and models, most of the tests have p-values smaller than 0.05 when the data are fitted under the non-skewed distributions. This might further suggest that the returns data come from a skewed distribution. Among these models, the selected models whose backtest results are non-significant (boldface) are the TGARCH (1, 1) model with the SNORMD distribution, and the BLGARCH (1, 1) model with the SNORMD, SSTD and SGED distributions. Selecting the best fitted model by the difference between the expected and actual exceedances with α = 0.05, the VaR measures (numbers of exceedances) tabulated in Table 8 reveal that the models with non-skewed distributions are incapable of capturing the large positive and negative returns, which might cause them to overestimate or underestimate the variance of the returns. In light of the VaR backtests, the BLGARCH (1, 1) model with the skewed student-t distribution is found to be the best fitted model.

Notes: Q(10) is the Ljung-Box statistic of order 10 on the returns. ARCH(10) is the Lagrange Multiplier (LM) test of order 10 (Engle, 1982). P-values of the statistics are reported in parentheses. *, ** and *** denote rejection of the null hypothesis at the 10%, 5% and 1% significance levels, respectively.

The estimated parameters given in Table 9 for the two selected models show that all the parameter estimates are significant at the 1% significance level. Overall, the values of α_1^+ are lower than those of α_1^− for the fitted TGARCH (1, 1) model, indicating that negative shocks raise future volatility more than positive shocks of the same magnitude.
Moreover, the negative value of ζ_1 for the BLGARCH (1, 1) model also suggests that the volatility tends to increase under negative shocks. Accordingly, we might conclude that the NASDAQ index exhibits a leverage effect from negative shocks.

5.3. Out-of-sample forecast. In time series analysis, the out-of-sample performance of a model is crucial to academics and practitioners, since market players are more concerned with how well models capture the future trend of market prices than with "information" about historical prices and past performance. A number of studies have investigated this question; for instance, [36] stated that there might be several plausible models in the forecasting process, while [21] argued that the best fitted model (based on AIC), which is not necessarily statistically different from the best forecasting model, might be employed in volatility forecasting. To warrant a better model, we contrast the different GARCH-Type models selected on the basis of the AIC and the in-sample VaR criteria using a rolling-window scheme, where M represents the total number of observations. This technique reserves the last H = 125 observations for forecast evaluation. Starting from the first (M − H) observations, which form the in-sample period, the model is fitted to these (M − H) observations and the one-step-ahead forecast is evaluated. The in-sample window is then rolled forward by adding the next observation and dropping the oldest one, continuing for H steps until every one-step forecast has been computed. The in-sample window size is fixed so that the estimates of the forecast values do not overlap.
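The rolling-window scheme can be sketched as follows; the `fit_and_forecast` callable is a hypothetical stand-in for fitting a GARCH-Type model to the window and producing its one-step-ahead forecast:

```python
import numpy as np

def rolling_one_step_forecasts(series, H, fit_and_forecast):
    """Fixed-size rolling window: fit on (M - H) points, forecast one step
    ahead, then roll the window forward by one observation, H times."""
    M = len(series)
    window = M - H
    forecasts = np.empty(H)
    for h in range(H):
        window_data = series[h : h + window]  # drop oldest, add newest
        forecasts[h] = fit_and_forecast(window_data)
    return forecasts
```

For example, with a trivial "model" that forecasts the window mean, `rolling_one_step_forecasts(np.arange(10.0), 3, lambda w: w.mean())` produces one forecast per rolled window.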
To quantify the forecasting performance of the respective GARCH-Type models, the following six loss functions, in their standard forms with σ_t^2 the actual volatility proxy and σ̂_t^2 the forecast over the H evaluation points, are considered:

1. Mean absolute error: MAE = (1/H) Σ |σ_t^2 − σ̂_t^2|
2. Mean square error: MSE = (1/H) Σ (σ_t^2 − σ̂_t^2)^2
3. Mean absolute percentage error: MAPE = (1/H) Σ |(σ_t^2 − σ̂_t^2)/σ_t^2|
4. Logarithmic loss: LL = (1/H) Σ [log(σ_t^2) − log(σ̂_t^2)]^2
5. Heteroskedasticity-adjusted MAE: HMAE = (1/H) Σ |1 − σ̂_t^2/σ_t^2|
6. Heteroskedasticity-adjusted MSE: HMSE = (1/H) Σ (1 − σ̂_t^2/σ_t^2)^2

A model forecasts well when its loss is low. The LL penalises under-prediction of volatility more severely, while more weight is given to over-prediction of volatility under the HMSE and HMAE. Concurrently, the out-of-sample one-step-ahead VaR prediction values are computed in order to perform the UCK, CCC and DQ tests on the out-of-sample VaR. Another measure, the expected shortfall (ES), is defined as the expected loss conditional on the loss being greater than the VaR [35]. The expected shortfall at level α for the long and short positions is calculated via

ES_long = E[r_t | r_t < VaR^l_{t,α}],  ES_short = E[r_t | r_t > VaR^s_{t,α}].
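A sketch of the six loss functions in the forms assumed above (`actual` is the realised-volatility proxy series, `forecast` the model's variance forecasts):

```python
import numpy as np

def loss_functions(actual, forecast):
    """Six loss functions over the evaluation window; all are averages,
    so H is implicit in np.mean. Smaller values indicate better forecasts."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return {
        "MAE":  np.mean(np.abs(a - f)),
        "MSE":  np.mean((a - f) ** 2),
        "MAPE": np.mean(np.abs((a - f) / a)),
        "LL":   np.mean((np.log(a) - np.log(f)) ** 2),
        "HMAE": np.mean(np.abs(1 - f / a)),
        "HMSE": np.mean((1 - f / a) ** 2),
    }
```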
Table 10 displays the out-of-sample volatility forecasts evaluated under the six loss functions. The model selected by the AIC tends to have better accuracy under the MAE, MSE, HMAE and HMSE, while the BLGARCH model selected by the in-sample VaR is superior to the TGARCH model under the MAPE and LL, implying that no single best fitted model outperforms the others under all the loss functions used. We take the view that the two approaches considered in this study have their own strengths in forecasting the returns. Moreover, Table 11 shows the various tests on the out-of-sample VaR; both models fail to reject the null hypothesis of every test for both the long and short positions, which might suggest that both models are adequate in explaining the returns. For illustration, Figures 3 and 4 present the VaR and ES plots of the NASDAQ index returns for the models selected on the basis of the AIC and VaR measures.

6. Conclusion. To sum up, we observe that misspecification of the innovation distribution in GARCH-Type models greatly affects the model selection process. When the innovations of the returns follow a skewed distribution, inappropriate use of the likelihood function of a non-skewed distribution leads to significant results in all the VaR backtests, which eventually affects the accuracy of the VaR measurement. We have modelled the NASDAQ index returns with various GARCH-Type models under six candidate distributions. Two benchmarks were used to determine the best fitted models, the AIC and the in-sample VaR backtests, of which the AIC is traditionally deemed the more standard approach to model fitting.
The results indicate that the TGARCH (1, 1) model with the skewed student-t distribution is preferred by the AIC, while the BLGARCH (1, 1) model with the skewed student-t distribution is the best fitted model under the in-sample VaR backtests. In addition, the forecasting performances of the two selected models were contrasted under various loss functions. The results suggest that the model selected using the VaR backtests consistently offers smaller MAPE and LL than the model selected by the AIC. The Kupiec, Christoffersen and DQ tests were also applied to investigate the respective out-of-sample VaR forecasting performances. It turns out that the models selected by the two approaches perform comparably, suggesting that each has its own strengths in forecasting the returns.