OPTIMALITY OF (s, S) POLICIES WITH NONLINEAR PROCESSES

It is observed empirically that mean-reverting processes are more realistic in modeling the inventory level of a company. In a typical mean-reverting process, the inventory level is assumed to be linearly dependent on the deviation of the inventory level from the long-term mean. However, when the deviation is large, it is reasonable to assume that the company might want to increase the intensity of its intervention in the inventory level significantly rather than in a linear manner. In this paper, we attempt to model inventory replenishment as a nonlinear continuous feedback process. We study both the infinite-horizon discounted cost and the long-run average cost, and derive the corresponding optimal (s, S) policy.


1. Introduction. In inventory control problems, stochastic models account for the randomness in demands [3]. However, the problem becomes complicated with the addition of a number of parameters [1], such as fixed cost, variable cost, order latency and different decision models [24] in the supply chains. Although numerical techniques can readily be applied to compute inventory policies for both discrete [13,15,27] and continuous models [22,23,25], analytic solutions remain appealing because they give practitioners more insight. Closed-form solutions can be derived under certain conditions, such as an infinite time horizon, and have indeed been obtained under different demand models. In the literature, the optimal policy has been derived for the

2. Problem formulation. We first formulate the stochastic inventory problem under consideration. Suppose that $W(t)$ is a Wiener process defined on a given probability space $(\Omega, \mathcal{A}, P)$. In the absence of intervention, the inventory in the interval $[0, t]$ can be described by the process where $\gamma\,dt + \sigma\,dW(t)$ represents the external accumulated demand per unit time, $f(x)$ is part of the mean reversion process satisfying

In the literature, $f(\cdot)$ usually takes a linear form. The mean reversion of the process can be explained by the assumption that if inventory has recently gone down because of strong demand, one can expect the demand in the near future to be weaker, allowing the inventory to revert toward its preferred target [3,8].
Here we extend $f(x)$ to be a nondecreasing nonnegative function defined on $\mathbb{R}$. The term $f(\cdot)$ captures an inventory-dependent demand, or a materialized deterioration of the inventory. If $f(\cdot)$ takes the linear mean-reverting form, the model reduces to the case discussed in [3]. If $f = 0$, it recovers a demand model in which $\gamma t + \sigma W(t)$ represents the external accumulated demand on $(0, t)$. Denote

An impulse control is a sequence $(\theta_n, v_n)$, $n = 1, 2, \cdots$, where $\theta_n$ is a stopping time with respect to $\mathcal{F}_t$ and $v_n$ is an $\mathcal{F}_{\theta_n}$-measurable random variable. Here $\theta_n$ denotes the time of the $n$th order and $v_n$ denotes the amount ordered at time $\theta_n$. The cost for ordering an amount $v_n > 0$ is given by $K + c v_n$, where $K > 0$ is the fixed set-up cost of ordering and $c$ denotes the unit cost for each item ordered. Let $V$ denote an impulse control; the corresponding inventory level can be described by the formula where $x$ is an initial inventory level, $\gamma > 0$ is a constant rate of demand, $\sigma > 0$ and

Here we first consider a discounted cost objective function. Let $\alpha > 0$ be a specified discount rate. For any given initial inventory level $x$ and an ordering policy $V$, we define the discounted cost as where $g(x) = h x^+ + p x^-$ denotes the storage cost when $x > 0$ and the backlog cost when $x < 0$. Define the value function associated with (5) by

In order that (5) is well defined, we let $\mathcal{V}$ denote the set of all $V$ satisfying the following conditions

The impulse control $V$ is said to be admissible if $V \in \mathcal{V}$. The problem is to find

3. The results with discounted problem. To solve the discounted problem, we first define the operator

From the dynamic programming principle, the value function $u_\alpha(x)$ satisfies the following Q.V.I.: where

The derivation of Q.V.I. (9) follows the standard techniques described in Bensoussan and Lions [5] and Bensoussan [3]. We sketch the proof in the Appendix. We will study (9) in a space of continuous functions and verify, by a classical verification argument, that its solution coincides with the value function (6).
We refer to [5] for the general theory of impulse control and Q.V.I. To simplify the second inequality of (9), we apply the transformation $G_\alpha(x) = u_\alpha(x) + cx$. Then solving (9) is reduced to finding $G_\alpha(x)$ which satisfies where $\tilde{g}(x) := g(x) + c(\alpha x + f(x))$.
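The effect of the substitution $G_\alpha(x) = u_\alpha(x) + cx$ can be checked by direct computation. The sketch below is hedged: it assumes that the operator has the form $A\varphi = -\tfrac12\sigma^2\varphi'' + (\gamma + f(x))\varphi'$ and that Q.V.I. (9) has the usual structure $Au_\alpha + \alpha u_\alpha \le g$ together with $u_\alpha(x) \le K + \inf_{v>0}[cv + u_\alpha(x+v)]$. Neither display is reproduced in the text here, so both forms are assumptions, chosen to be consistent with a demand drift $\gamma + f(x)$ and with the constant $c\gamma$ that appears in the ergodic inequality of Section 4.

```latex
\begin{align*}
u_\alpha(x) &= G_\alpha(x) - cx, \\
A u_\alpha + \alpha u_\alpha
  &= A G_\alpha + \alpha G_\alpha - c\bigl(\gamma + f(x)\bigr) - \alpha c x \;\le\; g(x) \\
\Longleftrightarrow\quad
A G_\alpha + \alpha G_\alpha
  &\le g(x) + c\bigl(\alpha x + f(x)\bigr) + c\gamma \;=\; \tilde g(x) + c\gamma, \\
u_\alpha(x) \le K + \inf_{v>0}\bigl[cv + u_\alpha(x+v)\bigr]
  \quad&\Longleftrightarrow\quad
G_\alpha(x) \le K + \inf_{v>0} G_\alpha(x+v).
\end{align*}
```

Under these assumptions the unit cost $c$ disappears from the impulse inequality and is absorbed into the running cost $\tilde g$, which is what makes the second inequality simpler to handle.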

We require $G_\alpha(\cdot)$ to be $C^1$ with linear growth. Naturally, $u_\alpha(\cdot)$ has the same properties. For any fixed $s$, let $S_\alpha(s)$ denote the point where $G_{\alpha,s}(x)$ attains its smallest minimum; then $G'_{\alpha,s}(S_\alpha(s)) = 0$.

Instead of (10), we will construct a pair $(s, G_{\alpha,s}(x))$, which is the solution of

It follows from (10) that $G'_{\alpha,s}(s) = 0$ and $G'_{\alpha,s}(S_\alpha(s)) = 0$. The construction proceeds in the following three steps:
(a) For any fixed $s$, solve the first equation in (11) with the condition $G'_{\alpha,s}(s) = 0$ and obtain a $C^1$ solution $G_{\alpha,s}(\cdot)$.
(b) Show that $S_\alpha(s)$ exists and satisfies the condition $G'_{\alpha,s}(S_\alpha(s)) = 0$.
(c) Use the second equation in (11) to determine a unique optimal $s$, which is denoted by $s_\alpha$.
This method leads to a unique function, which is $C^1$, and a unique pair $(s_\alpha, S_\alpha)$. These steps are described in detail in Subsections 3.1 to 3.3. Later we will verify that the function constructed in this way satisfies the original Q.V.I. (10) of the inventory problem, which will be given as Theorem 4.2.
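To make the objects of the construction concrete, the discounted cost of a fixed $(s, S)$ pair can be estimated by simulation. The following is a minimal sketch, not the analytic construction of the paper. It assumes dynamics of the form $dy = -(\gamma + f(y))\,dt - \sigma\,dW$ between orders, an order of size $S - y$ whenever $y \le s$, an Euler time step, a truncated horizon, and a hypothetical nonlinear `f_example`; all parameter values and function names are illustrative.

```python
import math
import random


def g(x, h=1.0, p=2.0):
    """Running cost g(x) = h*x^+ + p*x^- (storage if x > 0, backlog if x < 0)."""
    return h * max(x, 0.0) + p * max(-x, 0.0)


def discounted_cost(x0, s, S, f, gamma=1.0, sigma=0.5, alpha=0.1,
                    K=5.0, c=1.0, dt=0.01, horizon=80.0, seed=0):
    """One-path Euler estimate of the discounted cost of an (s, S) policy.

    Assumed dynamics (an assumption, not quoted from the text):
        dy = -(gamma + f(y)) dt - sigma dW,
    with an order of size S - y placed whenever y <= s.
    The infinite horizon is truncated at `horizon`.
    """
    rng = random.Random(seed)
    y, cost = x0, 0.0
    for k in range(round(horizon / dt)):
        disc = math.exp(-alpha * k * dt)
        if y <= s:                        # impulse: order up to S
            cost += disc * (K + c * (S - y))
            y = S
        cost += disc * g(y) * dt          # storage / backlog cost
        y -= (gamma + f(y)) * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
    return cost


def mean_discounted_cost(n_paths=200, **kwargs):
    """Average the one-path estimate over independent seeds."""
    return sum(discounted_cost(seed=i, **kwargs) for i in range(n_paths)) / n_paths


def f_example(x):
    """Hypothetical nonlinear f: nondecreasing and nonnegative on R."""
    return 0.2 * max(x, 0.0) ** 2
```

For instance, `mean_discounted_cost(n_paths=50, x0=1.0, s=0.0, S=2.0, f=f_example)` gives a noisy estimate of the cost for one candidate pair; comparing such estimates across pairs is only a rough numerical check and does not replace the uniqueness argument of Steps (a) to (c).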

3.1. Step 1: For any fixed $s$, construct $G_{\alpha,s}(x)$. Consider

Because of the regularity, we necessarily have $G'_{\alpha,s}(s) = 0$. Denote $H_{\alpha,s}(x) := G'_{\alpha,s}(x)$; then it satisfies

Notice that $\tilde{g}'(x) = -p + c(\alpha + f'(x))$ if $x < 0$; we make the assumption

To solve (13) we take advantage of the Green function, which is derived from

The following lemma shows the solution properties of (15).
(i) If $\Psi_\alpha(x) > 0$ and if there exists $x < \bar{x}$ satisfying $\Psi_\alpha(x) \le 0$, then a local minimum exists, where the first derivative is zero and the second derivative is positive, which contradicts (19).
From Case 1 and Case 2, we get the desired result.
It follows from (17) that .
Therefore, there exists $\bar{x} < 0$ such that, when $x < \bar{x}$, Thus α M , which contradicts the fact that $\lim_{x\to-\infty} \chi_\alpha(x)$ is bounded. This completes the proof.
Remark 3.1. When $f(x)$ takes the linear form as in [3], $\lim_{x\to-\infty} \chi_\alpha(x) = +\infty$ becomes more obvious. The proof in Lemma 3.2 is more involved because it also shows that $\liminf_{x\to-\infty} \chi_\alpha(x) \ge 0$, which is needed not only in the proof of $\lim_{x\to-\infty} \chi_\alpha(x) = +\infty$, but also later in Lemma 3.8.

Now we are ready to construct a bounded solution of (13) by the following lemma. then is a bounded solution of (13).
Proof. Obviously, $H_{\alpha,s}(x)$ constructed by (22) is the solution of (13). The remaining work is to show that it is bounded. We first show that and (23) is the bounded solution of Denote Obviously, $Z_\alpha(x)$ is the bounded solution of (24). To show that (23) holds, we denote (24); it can be seen that On the other hand, the definition of $Y_\alpha(x)$ means that is bounded and approaches 0 when $x \to +\infty$. Thus, it follows from (26) that Substituting (25) into (27) leads to

We can now go to the proof of (22). We prove it directly from the expression of $H_{\alpha,s}(x)$: where the third equation follows from (23). From the relation Thus which means that $H_{\alpha,s}(x)$ is bounded.
The following lemma guarantees that the bounded solution of (13) is unique.

Lemma 3.4. The bounded solution of (13) is unique.
Proof. It is sufficient to show that 0 is the unique solution of In fact, if there exists which contradicts (13). Thus (a) cannot hold.
From (a) and (b), we know that $H_{\alpha,s}(x) > 0$ cannot be true and $H_{\alpha,s}(x)$ cannot be positive when $x > s$; therefore we must have Similarly, $H_{\alpha,s}(x)$ cannot be negative. Therefore, the unique bounded solution of (30) is 0. Consequently the unique bounded solution of (13) is (22).

By Lemma 3.3 and Lemma 3.4, we can conclude that $H_{\alpha,s}(x)$ defined by (22) is the unique bounded solution of (13). We now construct $G_{\alpha,s}(x)$ from $H_{\alpha,s}(x)$. For $x \ge s$, (12) can be rewritten as On the other hand, by integrating (13), we have Comparing (31) and (32) on $[x, s]$, we have which determines the value of $G_{\alpha,s}(s)$. For any given $s$, define Obviously, $G_{\alpha,s}(x)$ constructed by (34) and (33) is in $C^1$.

3.3. Step 3: Determine the value of $s$. Notice that the second equation of (11) determines the value of $s$ satisfying Denote $\Gamma_\alpha(s) := \int_s^{S_\alpha(s)} H_{\alpha,s}(y)\,dy$. We have the following lemma about $\Gamma_\alpha(s)$.
Next we construct the proof of (40) by the following lemma. One way is to approximate $\Phi_\alpha(x)$ as suggested in the previous work [3]; however, this requires more algebra. A shorter proof is provided here.
We denote the pair $(s, S_\alpha(s))$ which satisfies $\Gamma_\alpha(s) = -K$ by $(s_\alpha, S_\alpha)$. Summarizing the lemmas above, we come to the main result of this work.
Theorem 3.1. The function $G_{\alpha,s}(x)$ defined by (11) is equal to the solution $G_\alpha(x)$ of Q.V.I. (10). The strategy $(s_\alpha, S_\alpha)$ is optimal, which can be derived from the relation where and $\chi_\alpha(x)$ is the solution of (17).
Proof. The proof is given in the Appendix.

4. The long run average cost. One important problem is to study the behavior when the discount factor approaches 1. Although the objective function tends to infinity, an average cost function can be employed instead. This is referred to as the ergodic control problem. Define where $N(T) = \max\{i : \theta_i \le T\}$. The problem is thus to investigate the inventory control problem with an ergodic cost criterion of minimizing (53):

Denote $\tilde{u}_\alpha(x) := u_\alpha(x) - u_\alpha(s_\alpha)$. By the "vanishing discount method", we will show that $(\tilde{u}_\alpha(x), \alpha u_\alpha(s_\alpha))$ converges to a pair $(u(x), \rho)$, which is the solution of the Q.V.I.: followed by showing that $\rho$ is equal to the value of $\rho_0$. To simplify the algebra, denote $\tilde{G}$

In the following sections, instead of proving the convergence of $(\tilde{u}_\alpha(x), \alpha u_\alpha(s_\alpha))$ and solving (54), we prove that $(G_\alpha(x), \alpha G_\alpha(s_\alpha))$ converges to the solution $(G(x), \rho)$ of the Q.V.I.: $AG(x) + \rho \le \tilde{g}(x) + c\gamma$, and solve $\rho$ from the transformed problem (55).
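The ergodic criterion can be illustrated numerically by time-averaging the cost along one long simulated path under a fixed $(s, S)$ policy. This is a hedged sketch only: it assumes that the average cost accumulates the running cost $g$ plus $K + c v_n$ per order, divided by the elapsed time, and that the inventory follows $dy = -(\gamma + f(y))\,dt - \sigma\,dW$ between orders. The function name, step size and parameter values are illustrative assumptions.

```python
import math
import random


def average_cost(s, S, f, gamma=1.0, sigma=0.5, K=5.0, c=1.0,
                 h=1.0, p=2.0, dt=0.01, T=2000.0, seed=0):
    """Time-averaged cost of an (s, S) policy over [0, T] on one Euler path.

    Assumed dynamics (an assumption, not quoted from the text):
        dy = -(gamma + f(y)) dt - sigma dW,
    with an order of size S - y placed whenever y <= s.
    """
    rng = random.Random(seed)
    n_steps = round(T / dt)
    y, total = S, 0.0
    for _ in range(n_steps):
        if y <= s:                                   # order up to S
            total += K + c * (S - y)
            y = S
        total += (h * max(y, 0.0) + p * max(-y, 0.0)) * dt
        y -= (gamma + f(y)) * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
    return total / (n_steps * dt)
```

As a sanity check in the degenerate case $\sigma = 0$, $f = 0$, $s \ge 0$ (outside the paper's assumption $\sigma > 0$), each cycle lasts $(S - s)/\gamma$ and the exact average is $\gamma (K + c(S - s))/(S - s) + h(S + s)/2$; with $s = 0$, $S = 2$ and the defaults above this equals 4.5, and the simulated value agrees up to discretization and end-of-horizon effects.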
Proof. We first prove that $H_\alpha(x)$ converges uniformly to $H(x)$ on any compact set of $\mathbb{R}$; then $\tilde{G}_\alpha(x)$ converges to $G(x)$. The idea is to prove that $\Phi_\alpha(x)$ and $\chi_\alpha(x)$ converge uniformly to $\Phi(x)$ and $\chi(x)$. The details of the proof are given in the Appendix.
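The vanishing discount idea, that $\alpha$ times the discounted cost approaches the average cost as $\alpha \to 0$, can be seen explicitly in a toy case. The sketch below assumes $\sigma = 0$, $f = 0$ and $s \ge 0$, so the inventory simply decreases at rate $\gamma$ from $S$ to $s$ and each cycle is identical. This degenerate case lies outside the setting of the paper ($\sigma > 0$) and is used only because the renewal equation for the cost at $S$ can then be solved in closed form; the function names and default values are illustrative.

```python
import math


def alpha_times_discounted_cost(alpha, s=0.0, S=2.0, gamma=1.0,
                                K=5.0, c=1.0, h=1.0):
    """alpha * (discounted cost started at S) for the deterministic (s, S) cycle."""
    tau = (S - s) / gamma                       # cycle length
    d = math.exp(-alpha * tau)                  # discount over one cycle
    one_minus_d = -math.expm1(-alpha * tau)
    # integral over [0, tau] of exp(-alpha t) * h * (S - gamma t) dt
    holding = h * (S * one_minus_d / alpha
                   - gamma * (1.0 - d * (1.0 + alpha * tau)) / alpha ** 2)
    order = d * (K + c * (S - s))               # order placed at the end of the cycle
    # renewal equation: u(S) = holding + d * (K + c (S - s) + u(S))
    u_S = (holding + order) / one_minus_d
    return alpha * u_S


def deterministic_average_cost(s=0.0, S=2.0, gamma=1.0, K=5.0, c=1.0, h=1.0):
    """Cost per unit time of the same deterministic cycle."""
    tau = (S - s) / gamma
    return (K + c * (S - s)) / tau + h * (S + s) / 2.0
```

Evaluating `alpha_times_discounted_cost(a)` for decreasing `a` shows the gap to `deterministic_average_cost()` shrinking roughly linearly in `a`, which is the behavior the convergence argument establishes rigorously for the stochastic nonlinear model.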
Proof. The proof is similar to that of Lemma 3.3 and is omitted here.
Based on Theorem 4.1 and Lemma 4.1, we obtain the following important results.
and

Proof. The proof follows by repeating a procedure similar to that of the discounted case in Theorem 3.1. The procedure is to show first that $(G_s(x), \rho)$ is the solution of (56), and then that it is the solution of Q.V.I. (55).
Obviously, from the relation $G_\alpha(x) = u_\alpha(x) + cx$, we have $G(x) = u(x) + cx$. Moreover, (54) and (55) share the same $\rho$ and the same optimal strategy. The following verification theorem says that the long run average cost $\rho_0$ can be obtained from the value $\rho$ of Q.V.I. (54), and that the $(s, S)$ policy is the optimal control.
The proof is given in the Appendix.

5. Conclusion. In this paper, we have considered a new stochastic inventory control problem under a nonlinear process which varies depending on the current inventory level. With the formulated model, we have derived and proved the optimality of the (s, S) strategy, and shown that the strategy is unique. The strategy reduces to the case in [3] when $f(x)$ takes a linear form. Furthermore, using the long run average cost function, we have considered the limiting case of ergodic control under the nonlinear process when the discount factor vanishes. Again we have derived the (s, S) strategy and proved its optimality. We hope this work can shed light on the nonlinear inventory control process and can be used for practical problems.
6.1. The derivation of (9). From dynamic programming theory (see Bensoussan [3]; Bensoussan and Lions [5]; Fleming and Soner [12]), it can be shown that $u_\alpha$ satisfies the dynamic programming principle,

If an order with quantity $v$ is made at the initial time, then $K + cv$ should be paid and the inventory level becomes $x + v$. If we proceed optimally from now on, the best we can obtain is Therefore,

If no order is made at the initial time, we assume that the first order is made after time $\delta$. Then $\sum_{n=0}^{\delta} (K + c v_n) \exp(-\alpha\theta_n) = 0$.
Substituting $\delta$ with $t$ in (5) results in Applying Itô's lemma to $\exp(-\alpha\delta)\,u_\alpha(y_x(\delta, V))$ leads to

Let $u$ be the solution of (9). By Itô's lemma, for any policy $V \in \mathcal{V}$, we apply the Itô differential rule to $u_\alpha(x)e^{-\alpha t}$: $(Au_\alpha(y_x(s, V))\,ds + \alpha u_\alpha(x_s))\,ds$ $e^{-\alpha\theta_n}(c v_n + K)$. That is, for any admissible $V$, we have and therefore,

Let $V_{s^*,S^*}$ denote the $(s^*, S^*)$ policy. It is defined as follows. Let Otherwise, there is a replenishment, and $u(y_x(\theta_n, V_{s^*,S^*})) = u(y_x(\theta_n^-, V_{s^*,S^*})) + K + c v_n$, $n = 1, \cdots$.
In the next proposition we show $E \int_0^{+\infty}$ Then $\lim_{t\to\infty} E(e^{-\alpha t}|y_x(t, V_{s^*,S^*})|) = 0$, and the inequality of (A.11) becomes an equality. That is, and the policy $(s^*, S^*)$ is optimal. (A.13)

When $n \ge 1$, $\theta_n < t < \theta_{n+1}$, the dynamics $y(t)$ Applying Itô's lemma to $y^2(t)e^{-\alpha t}$ leads to where the inequality follows from the fact that $f$ is nondecreasing, which yields $y f(y) \ge y f(0)$. By integrating $y^2(t)e^{-\alpha t}$ from $\theta_n$ to $\theta_{n+1}^-$ and taking expectation, we have Similarly, we have It follows from (A.17) and (A.18) that That is, $E(e^{-\alpha\theta_n}) < \infty$ when $n \to \infty$, we obtain the desired result. In the remainder we show this property.
It is easy to show that $E e^{-\alpha\tau} = u(S) < 1$. Therefore, which finishes the proof of (A.20).
6.3. The proof of Theorem 4.1. The following Lemma 6.1 and Lemma 6.2 will be needed for the proof of Theorem 6.1 below, which in turn is the main result for proving Theorem 4.1.
So $\chi_{\lim}(x)$ is the solution of (A.25).
Theorem 6.2. There exists $M$ such that $S_\alpha \le M$ for sufficiently small $\alpha$.