HIDDEN MARKOV MODELS WITH THRESHOLD EFFECTS AND THEIR APPLICATIONS TO OIL PRICE FORECASTING



(Communicated by Hailiang Yang)
Abstract. In this paper, we propose a Hidden Markov Model (HMM) which incorporates a threshold effect in the observation process. Simulated examples are given to show the accuracy of the estimated model parameters. We also give a detailed implementation of the model using a dataset of crude oil prices over the period 1986-2011. The prediction of the crude oil spot price is an important and challenging issue for both government policy makers and industrial investors, as most of the world's energy comes from the consumption of crude oil. However, many random events and human factors may lead the crude oil price to strongly fluctuating and highly non-linear behavior. To capture these properties, we modulate the mean and the variance of the log-returns of commodity prices by a finite-state Markov chain. The h-day-ahead forecasts generated from our model are compared with those of a regular HMM and the Autoregressive Moving Average (ARMA) model. The results indicate that our proposed HMM with threshold effect outperforms the other models in terms of predictive ability.
1. Introduction. The crude oil price has an important impact on the world economy, as a significant proportion of energy consumption around the globe comes from crude oil. The fluctuation of the crude oil price is a major concern of both government policy makers and industrial investors, especially in countries with fast-growing economies such as China and India. Many factors may affect the crude oil price, for example wars and conflicts, political stability, economic growth, the discovery of new energy sources, and financial crises. The combined effect of these factors renders the modeling and forecasting of crude oil price movements a complex and challenging task. This has a significant impact on stakeholders in energy markets, since investment decisions that depend on future fluctuations in the crude oil price are hard to make.
The prediction of the crude oil price has thus become extraordinarily important, and many different approaches have been proposed for this purpose. In [16], the authors provide probabilistic forecasts of average oil prices by utilizing inherently probabilistic belief network models. In [30,31], a regression model was proposed to forecast the West Texas Intermediate (WTI) price by considering the petroleum inventory level maintained by the Organization for Economic Cooperation and Development (OECD) countries. Later, Kaufman et al. [17] took both OPEC's production policies and OECD's stocks into consideration to forecast the oil price; their results indicate that OPEC's decisions are important price drivers. However, it is almost impossible to predict the OECD's inventory levels or OPEC's decisions. In recent decades, time series models have been widely used, including for the prediction of oil prices. Several authors have tried linear time series models [14,29] to predict oil prices; however, the results indicate that these models cannot capture the nonlinear behavior of the oil price. Recently, De Souza e Silva et al. [2] employed nonlinear time series models and Hidden Markov Models (HMMs) to predict future oil price trends, where high-frequency price movements were first removed by wavelet analysis.
In recent years, HMM filtering methods have found diverse applications in electronics, statistics, physics, finance and commodity markets. Elliott et al. [4] provide self-updating estimates for all model parameters of an HMM. In [11], an HMM filtering-based method was applied to the asset allocation problem using a mean-variance type utility criterion. [5,19] study the application of HMMs in option pricing. In [20], the authors illustrate the optimal filtering of log returns of commodity prices in which both the mean and volatility are modulated by a hidden Markov chain with finite state space. In [21,22], higher-order Markov-switching models were proposed with applications to credit rating and risk measurement. An interactive hidden Markov model was proposed in [1] for modeling default data. Erlwein et al. [13] developed investment strategies relying on HMM approaches. Elliott et al. [7] applied an HMM to the mean-variance portfolio selection problem. Elliott et al. [10] discussed ruin theory where the risk process is described by an HMM.
However, the transition probability matrix in most of the works mentioned above is time-homogeneous, which limits its use in empirical applications. In this paper, we consider an HMM with zero-delay observations and take a threshold effect on the transition matrices into account. The optimal estimation of the parameters is discussed and compared with the actual values in simulated examples. The model is then applied to forecast the crude oil price, where both the mean and variance of the log-return prices are modulated by a finite-state hidden Markov chain. Here we model the observable feedback using the threshold principle [25,26,27]; interested readers may also refer to [6,8,9]. The optimal estimate of the threshold parameter is derived using Akaike's information criterion (AIC) [28].
The remainder of this paper is structured as follows. In Section 2, we introduce the HMM with threshold effect. Section 3 presents a reference probability measure under which the observation process is a sequence of independent and identically distributed random variables. In Section 4, we give recursive formulas for the related processes. Section 5 describes the optimal estimation of all parameters. In Section 6, numerical examples based on simulations are given to test the quality of the estimated parameters, and a further numerical example demonstrates both the effectiveness and efficiency of our proposed model. Finally, concluding remarks are given in Section 7.
2. The Hidden Markov Model with threshold effect. In this section, we present our Hidden Markov Model with threshold effect (THMM). We begin with a complete probability space (Ω, F, P), where P is a real-world probability measure. Let {x_k}_{k≥0} be a discrete-time, finite-state Markov chain on (Ω, F, P) whose state space is the set of standard unit vectors {e_1, e_2, ..., e_N}. We suppose that x_0, or its distribution, is known. Suppose {y_k}_{k≥0} is the observation process on (Ω, F, P) with state space R.
To specify the probability law of the chain {x_k}_{k≥0} under the "historical measure" P, we consider a family of transition matrices {A^{y_k}}_{k≥0}, where A^{y_k} := [a_{ij}(y_k)]_{i,j=1,2,...,N} and a_{ij}(y_k) := P(x_{k+1} = e_i | x_k = e_j, y_k). Here A^{y_k} = A^{(1)} if y_k > r and A^{y_k} = A^{(2)} if y_k ≤ r, where r ∈ R is the threshold parameter. In this way we incorporate the threshold effect using the threshold principle introduced by Tong [25,26,27] in parametric, nonlinear time series analysis. Indeed, the threshold principle plays a fundamental role in nonlinear time series analysis, since it can be used to describe a number of important parametric nonlinear time series models.
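As an illustration, the regime selection implied by the threshold can be sketched as follows. The matrices A1 and A2 below are hypothetical placeholders for A^{(1)} and A^{(2)} (column-stochastic, matching the convention a_{ij} = P(x_{k+1} = e_i | x_k = e_j)), not estimates from the paper.

```python
import numpy as np

# Hypothetical regime transition matrices; each COLUMN sums to one,
# since column j holds P(x_{k+1} = e_i | x_k = e_j).
A1 = np.array([[0.90, 0.05, 0.10],
               [0.05, 0.90, 0.10],
               [0.05, 0.05, 0.80]])   # used when y_k > r
A2 = np.array([[0.70, 0.20, 0.15],
               [0.15, 0.70, 0.15],
               [0.15, 0.10, 0.70]])   # used when y_k <= r

def transition_matrix(y_k, r, A1, A2):
    """Threshold effect: the chain's transition matrix at time k
    depends on whether the observation y_k exceeds the threshold r."""
    return A1 if y_k > r else A2
```

For example, with r = 0, a positive log-return selects the first regime matrix and a non-positive one selects the second.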
Let {F_k}_{k≥0} and {Y_k}_{k≥0} be the P-completed natural filtrations generated by the chain {x_k}_{k≥0} and the observation process {y_k}_{k≥0}, respectively. Write, for each k ≥ 0, G_k := F_k ∨ Y_k. Define a sequence of random variables {v_k}_{k≥1} by putting:

x_{k+1} = A^{y_k} x_k + v_{k+1}.   (1)

Then it is not difficult to see that {v_k}_{k≥1} is a martingale difference sequence with respect to the filtration {G_k}_{k≥0} and the measure P. That is, for each k ≥ 1, E[v_{k+1} | G_k] = 0. Let {w_k}_{k≥0} be a sequence of independent and identically distributed random variables with common distribution N(0, 1), the standard normal distribution. We further assume that there are vectors μ = (μ_1, μ_2, ..., μ_N)' and σ = (σ_1, σ_2, ..., σ_N)' such that μ(x_k) = ⟨μ, x_k⟩ and σ(x_k) = ⟨σ, x_k⟩, where μ_i ∈ R and σ_i > 0 for each i = 1, 2, ..., N. Here H' and v' denote the transpose of the matrix H and the vector v, respectively, and ⟨·,·⟩ is the scalar product in R^N. We suppose that the observation process y is governed by the following dynamics:

y_k = ⟨μ, x_k⟩ + ⟨σ, x_k⟩ w_k.

The above relation implies that what we are considering is a zero-delay model. In [18], a zero-delay setting with a different observation process is discussed. The filtering and estimation method discussed below also applies to the case with delay.
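The state and observation dynamics just described can be simulated directly. This is a minimal sketch, assuming the zero-delay form y_k = μ(x_k) + σ(x_k) w_k and threshold-dependent transitions; all parameter values are illustrative, not the paper's estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-state parameters (Expansion / Neutral / Recession flavor).
mu    = np.array([ 0.002, 0.000, -0.003])
sigma = np.array([ 0.010, 0.020,  0.040])
r = 0.0
A1 = np.array([[0.90, 0.05, 0.10],
               [0.05, 0.90, 0.10],
               [0.05, 0.05, 0.80]])   # used when y_k > r
A2 = np.array([[0.70, 0.20, 0.15],
               [0.15, 0.70, 0.15],
               [0.15, 0.10, 0.70]])   # used when y_k <= r

def simulate_thmm(T, mu, sigma, A1, A2, r, rng):
    """Simulate (x_k, y_k): zero-delay observation y_k = mu(x_k) + sigma(x_k) w_k,
    then the next state is drawn from column i of A1 or A2 depending on y_k vs r."""
    N = len(mu)
    states = np.empty(T, dtype=int)
    y = np.empty(T)
    i = 0                                  # start in state e_1
    for k in range(T):
        states[k] = i
        y[k] = mu[i] + sigma[i] * rng.standard_normal()
        A = A1 if y[k] > r else A2
        i = rng.choice(N, p=A[:, i])       # column i = P(x_{k+1} | x_k = e_i)
    return states, y

states, y = simulate_thmm(2000, mu, sigma, A1, A2, r, rng)
```

A run of 2,000 observations, as in the simulation study below, takes a fraction of a second.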
3. Reference probability approach. In this section, we discuss a reference probability approach for obtaining an optimal filter of the hidden Markov chain {x_k}_{k≥0} given information about the observation process {y_k}_{k≥0}. The key idea of the reference probability approach is to start with a probability measure under which the observation process becomes simpler and does not depend on the hidden Markov chain. The idea of the reference probability approach to filtering was introduced by Zakai [32] and was later applied to filtering hidden Markov models by Elliott et al. [4]. We start with a reference probability measure P̄ on (Ω, F) under which the observation process {y_k}_{k≥0} is a sequence of i.i.d. standard Gaussian random variables independent of the hidden Markov chain {x_k}_{k≥0}, while {x_k}_{k≥0} retains the probability law specified by {A^{y_k}}_{k≥0}. In the sequel, we reconstruct the real-world probability measure P from the reference probability measure P̄ via a density process for a measure change.
Let φ(z) be the probability density function of a standard normal random variable. For l = 0, 1, 2, ..., we define

λ_l := φ((y_l − ⟨μ, x_l⟩)/⟨σ, x_l⟩) / (⟨σ, x_l⟩ φ(y_l)),

where the functions μ and σ are the same as presented in Section 2.
Set Λ_k := ∏_{l=0}^{k} λ_l, which is a {G_k}_{k≥0}-adapted process and will be utilized in changing the probability measure.
Then we reconstruct P by putting dP/dP̄ |_{G_k} := Λ_k. We define a sequence of random variables {w_k}_{k≥1} by putting w_k := (y_k − ⟨μ, x_k⟩)/⟨σ, x_k⟩; one can then show that, under P, {w_k}_{k≥1} is a sequence of i.i.d. N(0, 1) random variables.

Proof. See Appendix A.
Our goal is to derive the optimal filter x̂_k := E[x_k | Y_k]; this filter is optimal in the mean-square sense. By a version of Bayes' rule,

E[x_k | Y_k] = Ē[Λ_k x_k | Y_k] / Ē[Λ_k | Y_k],

where Ē denotes expectation under P̄. Write, for each k ≥ 1, γ_k := Ē[Λ_k x_k | Y_k]. Then it is easy to check that x̂_k = γ_k / ⟨γ_k, 1⟩. Here 1 := (1, 1, ..., 1)' ∈ R^N.

4. Recursive filters and filter-based estimates. In this section, we first derive a recursion for the filter γ_k and then recursive relations for some useful processes. We write, for i = 1, 2, ..., N,

Γ_i(y) := φ((y − μ_i)/σ_i) / (σ_i φ(y)),  D_k := diag(Γ_1(y_k), ..., Γ_N(y_k)),

where diag(z) denotes the diagonal matrix with vector z as its diagonal.

Proposition 1. With the above notation, γ_{k+1} = D_{k+1} A^{y_k} γ_k.
Proof. The proof is given in Appendix B.
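The filter recursion proved in the appendix can be sketched in a few lines. This is a hedged sketch assuming the standard form γ_{k+1} = D_{k+1} A^{y_k} γ_k with D_k diagonal; the parameter values are illustrative.

```python
import numpy as np

mu    = np.array([0.002, 0.000, -0.003])
sigma = np.array([0.010, 0.020,  0.040])
A1 = np.array([[0.90, 0.05, 0.10], [0.05, 0.90, 0.10], [0.05, 0.05, 0.80]])
A2 = np.array([[0.70, 0.20, 0.15], [0.15, 0.70, 0.15], [0.15, 0.10, 0.70]])
r = 0.0

def phi(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def filter_step(gamma, y_prev, y_new, mu, sigma, A1, A2, r):
    """One step of the unnormalized filter: gamma_{k+1} = D_{k+1} A^{y_k} gamma_k,
    where D_{k+1} has diagonal entries Gamma_i(y) = phi((y - mu_i)/sigma_i) / (sigma_i phi(y))
    and A^{y_k} is chosen by the threshold rule."""
    A = A1 if y_prev > r else A2
    Gamma = phi((y_new - mu) / sigma) / (sigma * phi(y_new))
    return Gamma * (A @ gamma)             # elementwise multiply = apply D_{k+1}

def normalized_filter(gamma):
    """Optimal (mean-square) state estimate: x_hat_k = gamma_k / <gamma_k, 1>."""
    return gamma / gamma.sum()
```

For example, starting from a uniform γ_0 and applying `filter_step` along an observed series, `normalized_filter` returns the current state probabilities.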
In order to estimate the parameters A^{y_k} = (a_{ij}(y_k)), μ = (μ_i) and σ = (σ_i), the following processes are required:

J^{rs}_k := Σ_{n=1}^{k} ⟨x_{n−1}, e_r⟩⟨x_n, e_s⟩, the number of jumps from state e_r to e_s up to time k;
P^r_k := Σ_{n=1}^{k} ⟨x_{n−1}, e_r⟩, the number of times that x occupies the state e_r up to time k − 1;
O^r_k := Σ_{n=1}^{k} ⟨x_n, e_r⟩, the number of times that x occupies the state e_r up to time k;
T^r_k(f) := Σ_{n=1}^{k} ⟨x_n, e_r⟩ f(y_n), where f denotes either f(y_k) = y_k or f(y_k) = y_k^2.

We now derive recursive relations for these processes. Writing, for a G_k-adapted process H, γ(H)_k := Ē[Λ_k H_k | Y_k], a version of Bayes' rule, see for instance Elliott et al. [4], gives Ĥ_k := E[H_k | Y_k] = γ(H)_k / ⟨γ_k, 1⟩. The recursive relations for γ(J^{rs} x)_k, γ(P^r x)_k, γ(O^r x)_k and γ(T^r(f) x)_k are given in the following proposition.
Proposition 2. With the above notation, recursive relations hold for γ(J^{rs} x)_k, γ(P^r x)_k, γ(O^r x)_k and γ(T^r(f) x)_k, where a_{r·}(y_{k−1}) and a_{·r}(y_{k−1}) denote the rth row and rth column of the matrix A^{y_{k−1}}, respectively.
Proof. The proof is given in Appendix C.

5. Parameter re-estimation. In this section, we present the estimation formulas for all the model parameters. Write θ := {a_{ij}(y), μ_i, σ_i : 1 ≤ i, j ≤ N} for the set of parameters in our model. Suppose such a set of parameters θ is given. We then wish to determine a new set θ̂ which maximizes the conditional pseudo log-likelihood, as in [4].
From [4], if a_{sr}(y) = 0 then â_{sr}(y) = 0 and we set â_{sr}(y)/a_{sr}(y) = 1; otherwise the optimal estimates are given by

â_{sr}(y) = γ(J^{rs})_k / γ(P^r)_k,

computed with the data of the corresponding regime. For the parameters μ_i, 1 ≤ i ≤ N, we consider a new probability measure P* defined by a suitable density. By maximizing the corresponding pseudo log-likelihood we obtain the optimal estimates

μ̂_i = γ(T^i(y))_k / γ(O^i)_k.

For the parameters σ_i, 1 ≤ i ≤ N, consider the probability measure P** defined analogously. Again by maximizing we obtain the optimal estimates

σ̂_i^2 = γ(T^i(y^2))_k / γ(O^i)_k − μ̂_i^2.

For the threshold parameter r in the transition probability matrix of the hidden sequence, Akaike's Information Criterion (AIC) approach [27,28] is used. In the general case we have

AIC(m) = 2p − 2 ln L_m,

where m = 1, 2, ..., M indexes the candidate values for the threshold parameter r and p is the number of estimated parameters. The likelihood function is given by

L = ∏_k f(y_k | Y_{k−1}),

where f(y_k | Y_{k−1}) denotes the conditional probability density function of y_k given Y_{k−1}. By minimizing the AIC, the optimal r is determined.
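The drift and volatility updates can be sketched as follows. This is a hedged sketch of filter-based EM updates of the type in [4]: `T_y`, `T_y2` and `O` stand for the filtered estimates of T^i(y), T^i(y^2) and O^i, which here are simply assumed to be given.

```python
import numpy as np

def reestimate_mu_sigma(T_y, T_y2, O):
    """EM-style updates from filtered level and occupation sums:
        mu_i    = T_i(y) / O_i
        sigma_i = sqrt( T_i(y^2)/O_i - mu_i^2 )
    i.e. a state-weighted sample mean and standard deviation."""
    mu_hat = T_y / O
    var_hat = T_y2 / O - mu_hat**2
    return mu_hat, np.sqrt(var_hat)
```

For instance, filtered sums T(y) = (1, 4), T(y^2) = (0.2, 1.0) and occupations O = (10, 20) yield state means (0.1, 0.2) with both standard deviations equal to 0.1.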

6.1. Simulated examples.
In this section, we investigate the quality of the estimated parameters. We start from a "true" model with known parameters and simulate the logarithmic returns from this "true" model. The data are then used to estimate the parameters with the filtering algorithm. The first issue to address is the number of states. Many researchers restrict their attention to a two-state (N = 2) or a three-state (N = 3) model. Indeed, in 2005, Taylor [23] pointed out that a two-state model is good enough to distinguish a normal market from one experiencing a crisis. In 2009, Erlwein and Mamon [12] determined the optimal number of regimes using the Akaike Information Criterion (AIC) and found that a two-state Markov chain model outperforms models with more states. We refer interested readers to [24] for more details about order selection for a Markov chain model. However, with the development of the Internet, markets are changing and becoming more complex, and considering only a normal market and one in crisis may not be enough. In the following discussion we set N = 3, i.e., we conduct our research under a three-state model, whose states can be interpreted as Expansion, Neutral and Recession markets, respectively. Under the three-state THMM setting, the implementation procedure starts by assigning initial values for μ_i, σ_i, i = 1, ..., N, the transition probability matrices A^{(1)}, A^{(2)} and the threshold parameter r. After a batch of observations y, new estimates of μ, σ and A are obtained from the above recursive filters, where A^{(1)} is updated using those data with y > r and A^{(2)} using data with y ≤ r. After another batch of y, the threshold parameter r is updated by the AIC method with the new estimates of μ, σ and A; the candidate values for r are all values from −5 to 5 with step size 0.1. These estimates are then used iteratively to update the model parameters until all the data are used up.
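The threshold update step above can be sketched as a grid search minimizing the AIC over the candidate values −5 to 5 in steps of 0.1. Here `loglik_fn` is an assumed placeholder for the batch log-likelihood computed from f(y_k | Y_{k−1}).

```python
import numpy as np

def aic(log_likelihood, n_params):
    """Akaike's information criterion: AIC = 2p - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood

def select_threshold(candidates, loglik_fn, n_params):
    """Pick the threshold r minimizing the AIC over the candidate grid.
    loglik_fn(r) must return the log-likelihood of the batch under threshold r."""
    scores = [aic(loglik_fn(r), n_params) for r in candidates]
    return candidates[int(np.argmin(scores))]

# Candidate grid used in the paper: -5 to 5 with step size 0.1.
grid = np.round(np.arange(-5.0, 5.0 + 0.1, 0.1), 1)
```

With a toy concave log-likelihood peaking at r = 1.2, the search returns the grid point 1.2.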
It was empirically observed that the parameters μ, σ, A^{(1)} and A^{(2)} all converge to certain values, while r does not. Furthermore, this stability appears not to depend on the initial parameter values. For r, we take the mean of all updated values as the estimate.
In the following, 50 groups of "true" parameters are given, and each group generates 2,000 observable values. To indicate the stability of the evolution of μ, σ, A^{(1)} and A^{(2)}, the dynamics of these parameters are shown in Figures 1-4 for one randomly chosen group of observable values from the simulated data. The accuracy of the model parameter estimates under different window sizes is reported in Table 1, where the differences between the "true" parameters and the corresponding estimates are measured using the Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and the Sum of Squares due to Error (SSE). The number of steps between new estimates of the parameters μ, σ, A^{(1)} and A^{(2)} is denoted "It" and takes the values 5, 10 and 15; the corresponding number of steps between updates of the parameter r is denoted "It_r" and is given by 10, 20 and 30, respectively. The data indicate the accuracy of the estimation, and also show that the estimation tends to be more accurate when updating is more frequent. However, this is not the case for the parameter r. This observation is taken into account in what follows, where we employ our method to predict the oil price using real data.
6.2. Implementation to a real data set. The estimation method given above was then applied to the log-return series of daily oil prices recorded from 1986 to 2011. We assume that the drift μ and the volatility σ of the log return are both governed by a hidden Markov chain x with N states. We observe the logarithmic increments y_{k+1} = ln(S_{k+1}/S_k) of the oil price process S_k, with dynamics

y_{k+1} = ⟨μ, x_{k+1}⟩ + ⟨σ, x_{k+1}⟩ w_{k+1}.

The parameters μ, σ, A and r can be estimated by the same procedure as in the last subsection; in turn, they are then used to forecast daily oil prices.
From the representation of x in Eq. (1), it is easy to see that E[x_{k+1} | Y_k] = A^{y_k} x̂_k. For t ∈ R, consider the conditional distribution P(y_{k+1} ≤ t | Y_k). Then the density of y_{k+1} given Y_k is the Gaussian mixture

f(y_{k+1} | Y_k) = Σ_{i=1}^{N} ⟨A^{y_k} x̂_k, e_i⟩ (1/σ_i) φ((y_{k+1} − μ_i)/σ_i).

Therefore, the one-step-ahead forecast of the asset price based on available information is given by the corresponding conditional expectation of S_{k+1}. The two-step-ahead prediction of the price can be obtained from the same relation using the one-step-ahead forecast, and the h-step (h = 3, ..., 20) ahead predictions follow in the same way.
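A one-step-ahead price forecast under a Gaussian-mixture predictive density can be sketched as follows. This is a hedged sketch using the log-normal moment E[e^y] = exp(μ + σ²/2) per state; `p_next` stands for the predicted state probabilities (in the model above, A^{y_k} x̂_k) and is an input here.

```python
import numpy as np

def one_step_price_forecast(S_k, p_next, mu, sigma):
    """One-step-ahead price forecast: with y_{k+1} | Y_k a Gaussian mixture
    with weights p_next, means mu_i and standard deviations sigma_i,
        E[S_{k+1} | Y_k] = S_k * sum_i p_i * exp(mu_i + sigma_i^2 / 2)."""
    return S_k * np.sum(p_next * np.exp(mu + 0.5 * sigma**2))
```

Iterating this with refreshed state probabilities gives the h-step-ahead forecasts; a degenerate single state with μ = σ = 0 simply returns the current price.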
In [20], the predictability performance under the Diebold-Kilian metric of two- and three-state HMMs is compared with that implied by the autoregressive conditional heteroscedasticity (ARCH(1)) and generalized ARCH (GARCH(1,1)) models. Their results indicate that HMMs outperform the ARCH and GARCH models in terms of short-run forecasts. However, they did not consider the Autoregressive Moving Average (ARMA) model, arguably the most popular model in forecasting applications. To show the advantages of our model, the ARMA model is used as a benchmark in the following numerical experiments.
In our numerical experiments, we aim to compare the predictability of the three-state hidden Markov model with threshold parameter (THMM) with the predictability implied by the regular HMM and the benchmark ARMA model. To make the comparison more convincing, we evaluate the forecasting errors for the three different model types using the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), the Harding and Pagan test [15] and the Diebold-Mariano test [3]. The Harding and Pagan test seeks to identify synchronicity in the turning points of two series; it is a statistical measure which captures directionally accurate changes. The test is generated by creating two indicator series, X^F for the forecast (or futures) series and X^S for the actual spot price series:

X^F_t = 1 if F_t ≥ F_{t−n} and 0 otherwise;  X^S_t = 1 if S_t ≥ S_{t−n} and 0 otherwise,

where F and S are the futures and spot price series, respectively, and n is the forecast horizon. For a given forecast horizon, the concordance statistic is determined by

C = (1/T) Σ_{t=1}^{T} [X^S_t X^F_t + (1 − X^S_t)(1 − X^F_t)].

This statistic measures how closely predictions move with actual values in directional terms. The Diebold-Mariano test is designed for the comparison of different forecasting models. Its basic underlying principle is to base the test on the expectation of the difference between loss functions of the forecasting errors for different models. In the following, let {y_t} denote the actual data series and {ŷ^h_{i,t}} the ith competing h-step forecasting series, with forecast errors e_{it} := y_t − ŷ^h_{i,t}. The loss differential between two forecasts is given by d_t = g(e_{it}) − g(e_{jt}), i, j = 1, 2, ..., m, and the two forecasts have equal accuracy if and only if the loss differential has zero expectation for all t. Suppose that the forecasts are h(> 1)-step-ahead. In order to test the null hypothesis H_0 : E(d_t) = 0 for all t, the following statistic is utilized by Diebold and Mariano:

DM = d̄ / sqrt(2π f̂_d(0) / T),

where d̄ is the sample mean of the loss differential, f̂_d(0) is a consistent estimate of the spectral density of the loss differential at frequency 0, and γ_d(k) is the autocovariance of the loss differential at lag k.
The quantity f̂_d(0) is estimated by

f̂_d(0) = (1/2π) Σ_{k=−(h−1)}^{h−1} γ̂_d(k), with γ̂_d(k) = (1/T) Σ_{t=|k|+1}^{T} (d_t − d̄)(d_{t−|k|} − d̄).

Under the null hypothesis, the test statistic DM is asymptotically N(0, 1) distributed. The null hypothesis of no difference is rejected if the computed DM statistic falls outside the range −Z_{α/2} to Z_{α/2}. In our numerical experiment, the number of underlying hidden states is set to 3 (N = 3), and the forecasting errors of the three different model types for the h-step-ahead forecasts (h = 1, 2, ..., 20) are reported in Table 2. By analyzing the autocorrelation functions and the partial autocorrelation functions of the price series, the orders of the ARMA model are given by 8 and 12, respectively. Additionally, the results of the Harding and Pagan test are reported in Table 3 and the values of the DM statistics are presented in Table 4. From Table 2, by comparing the forecasting errors of the THMM with those of the HMM and ARMA models, we find that the THMM and HMM models dominate the ARMA model in terms of the three criteria for all h-step-ahead forecasts (h = 1, 2, ..., 20). Additionally, these criteria indicate that the THMM performs better than the HMM model for short-run forecasts. Nevertheless, the difference in long-run forecasting errors is too small to yield any practical significance, which is confirmed by the corresponding values of the DM statistics in Table 4. On the other hand, from Table 3, we conclude that the ARMA model shows the best directional forecasting ability; however, the THMM outperforms the HMM in directional forecasts over long horizons. By computing the Diebold-Mariano statistics, the null hypothesis of no difference between the THMM and the ARMA model is rejected at the 5% level; for the THMM against the HMM model, the null hypothesis is rejected at the 5% level for short-run forecasts. These results reinforce further the adequacy of the THMM model in capturing the dynamics of the data set.
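The two evaluation tools above can be sketched compactly. The concordance follows the indicator construction described in the text; the DM statistic below uses only the lag-0 variance of the loss differential (a simplification — the paper's version uses the spectral-density estimate f̂_d(0) with autocovariances up to lag h − 1).

```python
import numpy as np

def concordance(forecast, spot, n):
    """Harding-Pagan-style concordance: fraction of periods in which the
    forecast and spot series move in the same direction over horizon n."""
    XF = (forecast[n:] >= forecast[:-n]).astype(int)
    XS = (spot[n:] >= spot[:-n]).astype(int)
    return np.mean(XS * XF + (1 - XS) * (1 - XF))

def dm_statistic(e1, e2, loss=np.square):
    """Simplified Diebold-Mariano statistic on two forecast-error series:
    DM = mean(d) / sqrt(var(d) / T) with d_t = g(e1_t) - g(e2_t).
    A full implementation replaces var(d) by a HAC estimate of 2*pi*f_d(0)."""
    d = loss(e1) - loss(e2)
    return d.mean() / np.sqrt(d.var(ddof=0) / len(d))
```

Two series that always move in the same direction give a concordance of 1; a positive DM value indicates the first forecast incurs the larger loss.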
A comparison of the one-step-ahead forecasts generated from our model with the actual data is depicted in Figure 5, and a zoomed-in view of the year 2011 is given in Figure 6. The two figures show directly how closely the one-step-ahead forecasts follow the actual data.

7. Concluding remarks. In this paper, we have added a threshold parameter to a hidden Markov model for oil prices and considered the zero-delay case. The optimal estimators of the model parameters are given and compared with the corresponding true values. The prediction results show the good forecasting ability of our model. For future research, it will be interesting to extend this model to a higher-order case, where the hidden sequence follows a discrete-time higher-order Markov chain. The embedding technique discussed in [21] can be employed to analyze such a higher-order HMM. The model introduced in this article can also be applied to other research areas, such as optimal portfolio management, option valuation and risk measurement.
The result follows.
Proof of Proposition 1. With the above notation, we have

γ_{k+1} = Σ_{i,j=1}^{N} ⟨γ_k, e_j⟩ Γ_i(y_{k+1}) a_{ij}(y_k) e_i = D_{k+1} A^{y_k} γ_k.

The result follows.