Ergodic control for a mean reverting inventory model

Jingzhen Liu, Ka Fai Cedric Yiu and Alain Bensoussan

In this paper, an inventory control problem with a mean reverting inventory model is considered. The demand is assumed to follow a continuous diffusion process with a mean-reverting component, which accounts for the dependence of the demand on the inventory level. By choosing when and how much to stock, the objective is to minimize the long-run average cost, which consists of a transaction cost for each replenishment and the holding and shortage costs associated with the inventory level. An approach for deriving the average cost value over an infinite time horizon is developed. By applying the theory of stochastic impulse control, we show that a unique (s, S) policy is indeed optimal. The main contribution of this work is a method to derive the (s, S) policy and hence the minimal long-run average cost.


1. Introduction. In the literature, stochastic inventory problems with different demand models have been extensively studied [4,34]. Inventory models have applications in many domains of daily life [1,17,26,28,30,31,32]. They can also be applied to commodities [13,29], agricultural products [7] and energy products [27]. When the long-run average cost is considered, the famous Economic Order Quantity (EOQ) formula is derived [20]. A popular approach is to apply the (s, S) policy, i.e., when the inventory falls below some level s, it is replenished up to a level S > s. Different numerical techniques can be applied to locate such a policy in both discrete and continuous settings under various conditions (see, for example, [10,15,25,12]). However, it is of great importance to show that $(s^*, S^*)$ is really optimal among all policies. The optimality of an (s, S) policy for discrete demands was verified in [33]. For Markovian demands, the optimality of an (s, S) policy was verified in [2,3]; the former deals with a bounded demand, while the latter relaxes this assumption to certain finite moment requirements. For continuous demand, [23] assumed that the demand process is a diffusion process, while [18] assumed that the demand process is the sum of a constant demand rate and a compound Poisson process. [16] considered two-sided actions, which can both increase and decrease the inventory, with the demand fluctuating as a Brownian motion and with a maximal and nonnegativity constraint on the inventory.
Recently, the mean reverting model has attracted much interest [5]. This type of inventory model arises from long-term empirical observation and is applicable to a variety of commodity products, such as oil, metals, energy and many others. Moreover, for inventory, it can be observed that in supermarkets large piles of displayed goods attract customers. In this work, we assume the demand is composed of two terms: one denotes the traditional demand, and the other denotes the demand which depends on the inventory level. For this type of inventory model, the replenishment policy performs two functions, namely to fulfill the stochastic demand and to smooth the inventory level in the presence of the mean reverting property. Each intervention incurs a certain cost. There is a storage cost when the inventory is in stock and a backlog cost when it is out of stock. The objective is to find when and how much to replenish so as to minimize the total cost.
It is possible to derive closed-form solutions to such problems under certain conditions, such as an infinite time horizon. When the time horizon is infinite, the discount rate plays an essential role. The discount rate is often assumed to be the risk-free rate. However, when the discount rate is close to zero, the discount factor is close to 1, and the total-cost objective tends to infinity because of the infinite sum. To avoid this problem, an average cost function [21,22] can be employed instead. For this long-run average cost criterion, in order to make it well defined and to avoid unbounded solutions, additional assumptions on the inventory are usually imposed, such as a bounded inventory constraint [16] or a holding cost that is increased artificially when the inventory is outside a range [24]. It is also possible to impose a bounded demand [2] or a finite moment condition [3]. Furthermore, the long-run average cost problem combined with a mean-reverting stochastic model has not been properly explored in the literature.
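The contrast between the discounted and the average criterion can be illustrated on the simplest possible case. The following worked equation is only a sketch: it assumes a constant running-cost rate $\bar f > 0$ and a discount rate $\alpha > 0$, both of which are hypothetical symbols introduced here for illustration and do not appear in the model of this paper.

```latex
% \bar f > 0: hypothetical constant running-cost rate; \alpha > 0: discount rate.
\[
\int_0^\infty e^{-\alpha t}\,\bar f\,dt \;=\; \frac{\bar f}{\alpha}
\;\longrightarrow\; \infty \quad (\alpha \downarrow 0),
\qquad
\lim_{T\to\infty} \frac{1}{T}\int_0^T \bar f\,dt \;=\; \bar f .
\]
```

In this toy case the discounted total cost blows up as the discount rate vanishes, whereas the time average stays finite, which is the motivation for the long-run average criterion.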
In this paper, we consider the long-run average cost problem with a mean reverting inventory model. There are both fixed and proportional costs associated with each intervention in the inventory level. The objective is to find when and how much to replenish so as to minimize the total cost. Mathematically, the coefficients of the quasi-variational inequality (Q.V.I.) here depend on the state, unlike the constant-coefficient case treated in the literature, which makes the problem more difficult to tackle. Using the mean reverting model, we show that the problem is well defined without imposing any additional constraint on the inventory level or the demand. We then show that, among all policies, an (s, S) policy is optimal for the long-run average cost problem with the mean reverting model. Moreover, we present an approach to derive the optimal $(s^*, S^*)$ policy and the value of the long-run average cost. This paper is structured as follows. In Section 2 we formulate the stochastic mean reverting inventory problem with the long-run average cost criterion and present the main result. In Section 3, we give the proof of the result in Section 2. Conclusions are given in Section 4.
2.1. The model. We assume the demand is composed of two parts, which denote the inventory-independent demand and the inventory-dependent demand, respectively. First we present the inventory-independent demand model. Suppose that $W(t)$ is a Wiener process defined on a probability space $(\Omega, \mathcal{A}, P)$. The demand at time $t$ is then given by
$$D(t) = rt + \sigma W(t),$$
where $r > 0$ is the steady demand per unit time, and $\sigma > 0$ is the standard deviation of the steady demand. Let $\mathcal{F}_t$ be the smallest $\sigma$-field with respect to which $D(s)$ is measurable for all $0 \le s \le t$. An impulse control is a sequence
$$V = (\tau_i, v_i), \quad i = 1, 2, \cdots,$$
where $\tau_i$ is a stopping time with respect to $\mathcal{F}_t$ and $v_i$ is an $\mathcal{F}_{\tau_i}$-measurable random variable. Here $\tau_i$ denotes the time of the $i$th order and $v_i$ denotes the amount ordered at time $\tau_i$. The cost for ordering an amount $v > 0$ is given as
$$K + cv,$$
where $K > 0$ and $c > 0$ are the fixed order cost and the proportional cost, respectively.

The policy $V$ is said to be admissible if it satisfies a $\limsup$ condition on $N(T) := \max\{n : \tau_n \le T\}$, the number of orders placed up to time $T$. We denote the set of admissible policies by $\mathcal{V}$. For an admissible impulse control $V$, the corresponding inventory $y_x(t; V)$ is described by formula (2.4). In that formula, one term denotes the replenishment before time $t$, and the term $k\int_0^t y_x(s; V)\,ds - \gamma t$ describes the inventory-dependent demand, which can be explained as follows: a large inventory in supermarkets attracts more consumers, or goods may deteriorate or change quality during storage, as indicated in Goyal and Giri [8] and Raafat [19].

Write $f(x)$ for the running cost, which denotes the storage cost when $x \ge 0$ and the backlog cost when $x < 0$. The long-run average cost is defined in (2.6). The problem is thus to investigate the optimal policy under the ergodic cost criterion of minimizing (2.6), whose optimal value we denote by $\rho_0$. In this work, we present an approach for obtaining the value of $\rho_0$ and the optimal policy, which is an (s, S) strategy: once the inventory falls to $s$, the decision maker brings the inventory up to $S$.
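To make the ergodic criterion concrete, the long-run average cost of a given (s, S) policy can be estimated by simulation. The sketch below is illustrative only and rests on several assumptions that are not taken from the paper: between orders the inventory is assumed to follow $dy = -(r + ky)\,dt - \sigma\,dW$ (a simple mean-reverting form that ignores the $\gamma$ term and need not coincide with (2.4)), the running cost is assumed to be $f(x) = h\max(x, 0) + p\max(-x, 0)$ with a hypothetical holding rate `h`, the dynamics are discretized by an Euler-Maruyama scheme, and all numerical values are arbitrary.

```python
import math
import random


def simulate_sS_average_cost(s, S, *, r=1.0, k=0.5, sigma=0.3, K=2.0, c=1.0,
                             h=1.0, p=5.0, x0=0.0, T=200.0, dt=0.01, seed=0):
    """Estimate the average cost per unit time of an (s, S) policy.

    Assumed (hypothetical) dynamics between orders:
        dy = -(r + k * y) dt - sigma dW
    Assumed running cost: f(y) = h * max(y, 0) + p * max(-y, 0).
    Each order of size v costs K + c * v.
    Returns (average cost over [0, T], number of orders placed).
    """
    rng = random.Random(seed)
    y = x0
    total_cost = 0.0
    n_orders = 0
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(round(T / dt))):
        # Impulse: once the inventory is at or below s, order up to S.
        if y <= s:
            total_cost += K + c * (S - y)
            y = S
            n_orders += 1
        # Running holding/backlog cost accrued over one time step.
        total_cost += (h * max(y, 0.0) + p * max(-y, 0.0)) * dt
        # Euler-Maruyama step of the assumed mean-reverting dynamics.
        y += -(r + k * y) * dt - sigma * sqrt_dt * rng.gauss(0.0, 1.0)
    return total_cost / T, n_orders


if __name__ == "__main__":
    avg, n = simulate_sS_average_cost(-0.5, 1.0)
    print(f"estimated average cost: {avg:.3f} with {n} orders")
```

Running the function on a grid of candidate pairs (s, S) gives a crude numerical check on any policy obtained analytically; it is a sanity check, not a substitute for the optimality proof.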
Following the approach of inventory control theory and ergodic control theory in [4], we seek the value function of the ergodic problem from the Q.V.I. (2.8), which is stated in terms of an operator $A$. Here, $u(x)$ can be interpreted as the value function for the control problem with the cost function $f(x) - \rho$. By solving the Q.V.I. (2.8), we obtain the main result of this work. The details are given in the next section.
Theorem 2.1. When $ck < p$, let $(s^*, S^*)$ be the solution of (2.12) and (2.13), and let $(\rho, u(x))$ be the solution of the Q.V.I. (2.8) with the $(s^*, S^*)$ policy. Then $\rho = \rho_0$, which is given by (2.14).

3. The proof of Theorem 2.1. The proof of Theorem 2.1 proceeds through a sequence of lemmas and theorems in this section. Let $U$ denote the set of continuously differentiable real-valued functions with bounded derivative and with continuous second derivative at all but a finite set of points in $\mathbb{R}$.
Lemma 3.1. For each admissible policy $V = (t_n, v_n)$, $n = 1, 2, \cdots$, and function $u \in U$, we have
In fact, applying Itô's lemma (see Theorem I.4.57 of Jacod and Shiryaev [11]) with $y_x(T, V)$ given by (2.4), then dividing both sides by $e^{-kT}$ and taking expectations on both sides, we find that $E(y_x(T, V))$ is bounded. Since $u \in U$, the derivative $u'(\cdot)$ is bounded as well, which yields the claim.

The following lemma gives a lower bound of the value function $\rho_0$.
Lemma 3.2. Suppose $u(x) \in U$ and a constant $\rho$ satisfy (3.7). Then $\rho \le \rho_0$.

Proof. If $u(x)$ and $\rho$ satisfy (3.7), then by Itô's lemma, for any policy $V$, we have (3.9). Dividing both sides of (3.9) by $T$ and letting $T \to \infty$, we obtain that $\rho$ is bounded above by the long-run average cost (2.6) for each admissible policy $V$ and initial state $x$, which finishes the proof.
Inspired by the (s, S) policy in the traditional inventory models, the following lemma shows that an (s, S) policy attains $\rho = \rho_0$.
According to the conclusion of Lemma 3.3, if we can construct $(\rho, u)$, with respect to a pair (s, S), that satisfies (3.12), and verify that it also satisfies (3.7), then $\rho$ is equal to the long-run average cost $\rho_0$ and (s, S) is the optimal policy. The steps of this construction are given in detail in the following subsection.

3.1. Construction of the solution. According to Lemma 3.3, we construct the solution to (3.12). As $u_s(x) \in U$, it is continuously differentiable, and therefore we have $u_s'(s) = -c$. From the definition of $S$, it should also satisfy $u_s'(S) = -c$.
Notice that if $(s, u_s(x), \rho)$ satisfies (3.12), then the second equation of (3.12) can be rewritten as

3.1.1. Step (a): Construct $u_s(x)$. First, for any fixed number $s$, we consider $H_s(x)$. It satisfies (3.18). We let $f(0) = 0$. The solution of (3.18) can be derived by considering the Green function, which solves (3.20). It can be solved by considering the function $\chi(x)$, which is the solution of (3.21) with boundary conditions χ(0) = exp(− k σ 2 γ 2 ) and $\chi(+\infty) = 0$. The unique solution of (3.21) then gives the unique solution of (3.20). Now we are ready to construct $H_s(x)$.

Lemma 3.4. The unique bounded solution of (3.18) is given by (3.25).

and therefore $H_s(x)$ constructed by (3.25) is a solution of (3.18). It remains to prove that the bounded solution of (3.18) is unique and to verify that $H_s(x)$ of (3.25) is bounded; then (3.25) is exactly the solution of (3.18).

To show that the bounded solution of (3.18) is unique, it suffices to show that the bounded solution of the corresponding homogeneous equation is identically zero. On the other hand, the definition of $Y(x)$ shows that From the expressions of $\Phi(x)$ and $\vartheta(x)$, we know that $Y(x)\Phi^2(x)\vartheta(x)$ is bounded and approaches 0 as $x \to +\infty$. Due to this property, it follows from (3.33) that With some calculations, we have Therefore, when $x > 0$, which implies that $H_s(x)$ is bounded.

Now we are ready to derive $\rho$ and construct $u_s(x)$. For $x \ge s$, (3.12) can be rewritten as On the other hand, by integrating (3.18) from $s$ to $x$, we have

Proof. From the construction, it can be seen that $u_s(\cdot)$ is continuously differentiable, has bounded derivative, and has a continuous second derivative at all points but $s$. Also, the construction of $u_s(\cdot)$ gives $Au_s(x) + \rho - f(x) = 0$ for $x > s$.

3.1.2. Step (b): Decide the value of $S$. From (3.13), it can be seen that $S$ depends on the value of $s$. For any fixed $s$, we let $S(s)$ be the value where $u_s(x) + cx - cs$ attains its global minimum. We will show that such an $S(s)$ exists and that it satisfies $H_s(S(s)) = -c$.
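Step (b) can be mimicked numerically once $H_s$ is available. The sketch below is an illustration under stated assumptions rather than the construction used in the proof: it reads $H_s$ as the derivative of $u_s$, so that minimizing $u_s(x) + cx - cs$ over $x \ge s$ amounts to minimizing $\int_s^x (H_s(\eta) + c)\,d\eta$; the callable `H`, the search bound `x_max` and the grid size `n` are hypothetical inputs supplied by the user.

```python
from typing import Callable


def locate_S(H: Callable[[float], float], s: float, c: float,
             x_max: float, n: int = 10_000) -> float:
    """Grid search for the global minimizer of G(x) = int_s^x (H(eta) + c) d eta.

    H is a user-supplied (hypothetical) stand-in for H_s, assumed to be the
    derivative of u_s. The integral is accumulated with the trapezoidal rule
    on a uniform grid over [s, x_max].
    """
    step = (x_max - s) / n
    best_x, best_val = s, 0.0      # G(s) = 0 is the starting candidate
    running = 0.0                  # current value of G on the grid
    prev = H(s) + c
    for i in range(1, n + 1):
        x = s + i * step
        cur = H(x) + c
        running += 0.5 * (prev + cur) * step
        if running < best_val:
            best_x, best_val = x, running
        prev = cur
    return best_x


if __name__ == "__main__":
    # Toy stand-in for H_s (an assumption, not the paper's H_s): H(x) = x - 2.
    S = locate_S(lambda x: x - 2.0, s=0.0, c=1.0, x_max=5.0)
    print(S)
```

At an interior minimizer the returned point satisfies $H(S) = -c$ up to the grid resolution, which matches the first-order condition stated above; a grid search is used instead of a root finder because the text asks for the global minimum, and $H(x) = -c$ may have several roots.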
Proof. We prove the first item by means of Lemma 3.3. Lemma 3.5 shows that the first equation of (3.12) is satisfied, and $\Gamma(s^*) = -K$ means that the second equation of (3.12) holds. Then we only need to check the complementary slackness conditions. When $x \le s^*$, (3.52) holds. We also need to show that When $x < s^*$, from the assumption $ck < p$ and the fact that $x < s^* < 0$, we have and $Au_{s^*}(x) + \rho = Au_{s^*}(s^*) + \rho = \rho$; thus, we obtain the first complementary slackness condition $(u_{s^*}(y + \eta) + c(y + \eta) + K)$.
That is, (3.54) holds. It remains to show the case $s^* < x < S^*$. From the assumption that $u'_{s^*}(x) \le -c$ on $x < S^*$, we have which finishes the proof by Lemma 3.3.

It remains to calculate the value of $\rho_0$ and $(s^*, S^*)$. The explicit solution of (3.21) can be rewritten as
Combining this with (3.39) and carrying out the integrals, the expressions of $H_{s^*}(x)$, $\rho_0$ and $(s^*, S^*)$ can be derived. The calculations are given in the Appendix.
Remark 3.1. When $k = 0$, the problem reduces to the one in [23].

If $x > 0$, the first term of the right hand
If $x > 0$, the first term of the right-hand side of (A.2) is
If $x \le 0$, the first term of the right-hand side of (A.2) is
If $x > 0$, the second term of the right-hand side of (A.2) is +g − K 2 [−s * − (−λ erf (−λ ) + e −λ 2 √ π ) + (z 2 (s * )z 1 (s * ) + e −z 2 2 (s * )
If $x \le 0$, the second term of the right-hand side of (A.2) is
If $x < 0$, the second term of the right-hand side of (A.1) is ) +∞ x g (η)χ(η)dη
Similarly, if $x \ge 0$, the second term of the right-hand side of (A.1) is given by (A.6).

Summarizing all the terms of (A.1), we have the explicit expression of $H_{s^*}(x)$. As and $H_{s^*}(s^*) = 0$, we only need to calculate the second term of (A.8).
Since $s^* < 0$, we only consider the case $x < 0$, which gives (A.9).
Substituting $s^*$ for $x$ in (A.9), we get the expression of $\rho_0$.