OPTIMAL IMPULSE CONTROL OF A MEAN-REVERTING INVENTORY WITH QUADRATIC COSTS

Abstract. In this paper, we analyze an optimal impulse control problem for a stochastic inventory system whose state follows a mean-reverting Ornstein-Uhlenbeck process. The objective of the management is to keep the inventory level as close as possible to a given target. Each intervention in the system incurs a cost that is a quadratic function of the system state. In addition, there are running costs associated with the difference between the inventory level and the target, and these costs are also quadratic. We seek an impulse control that minimizes the expected total discounted sum of the intervention costs and running costs incurred over the infinite time horizon. We solve the problem by using stochastic impulse control theory.

1. Introduction. We investigate an optimal control problem for a stochastic inventory system in which the inventory level follows a mean-reverting Ornstein-Uhlenbeck process. The system incurs both intervention costs and running costs, and both are quadratic. Our objective is to find an optimal impulse control strategy. Under this strategy, the management can determine the inventory levels at which interventions should be made, as well as the magnitudes of the interventions, so that the total expected cost is minimized.
Many authors have studied transaction costs in mathematical finance and economics (see, for example, Cadenillas [7]). In general, there are two types of transaction cost. One is a proportional cost, which depends on the magnitude of the control. The other is a fixed cost, which is independent of the magnitude of the control. Optimization problems of management under uncertainty with such costs can be solved by stochastic control theory (see, for example, Cadenillas and Zapatero [6] and Yao et al. [22]). These works use impulse control theory, which is approached via the quasi-variational inequalities (QVI) introduced by Bensoussan and Lions [1]. However, a linear cost is unreasonable in some situations. According to transaction cost economics (see, for example, Suzanne Young [23]), when certain factors change in the actual production process (such as production or the amount of intervention exceeding a certain range), the unit fixed costs and variable costs increase gradually, so that the total transaction cost is nonlinear. In stochastic optimization theory, impulse control problems with quadratic costs constitute an important category: they model many practical problems and, more importantly, many impulse control problems with nonlinear costs can be reasonably approximated by quadratic costs. From this viewpoint, stochastic optimal impulse control with quadratic costs provides a foundation for problems with general costs.
This type of inventory control is widely applicable, for example to stock values and commodity products. Stock value is represented, directly or indirectly, by the stock price. After the market price deviates from its intrinsic value, it follows a dynamic process, either a bubble or mean reversion. The mean-reverting Ornstein-Uhlenbeck process can be interpreted as saying that the outside supply of and demand for goods remain more or less stable in the medium and long term. That is, if inventory has recently gone down because of strong demand, one can expect demand in the near future to be weaker, so that the inventory reverts toward its preferred target. Another application of a mean-reverting process is the inventory of shares in a particular company held by a specialist who is responsible for trading in that company's shares. We note that the inventory level may move too far from its target, in which case the management should take immediate action to control it (see, for example, Miao et al. [14] and Bouasker et al. [3]).
In this paper, we apply the theory of stochastic impulse control (see the book by Øksendal and Sulem [16]) to study the problem described above. Applications of the impulse control approach to mathematical finance, medical science and economics have been investigated in many studies (see, for example, Cadenillas and Zapatero [5], Constantinides et al. [9], Vela et al. [20] and Gescheidt et al. [11]). Cadenillas et al. [8] studied an inventory model of a company in which the inventory level follows a mean-reverting Ornstein-Uhlenbeck process and the transaction cost is linear. Ohnishi and Tsujimura [15] considered optimal impulse control of a geometric Brownian motion with quadratic transaction cost, in which the management is only allowed to reduce the system state. In our paper, we investigate optimal control of a mean-reverting inventory model in which the cost rate function and the transaction cost function are both quadratic.
We briefly compare our paper with others in the literature and point out what is new. First, in most of the literature (see, for example, Weerasinghe et al. [21], Bertola et al. [2], Feng et al. [10] and Pan et al. [18]), the formulations of inventory control do not treat nonlinear transaction costs, whereas this paper considers quadratic transaction costs. Second, our model is a generalization of the model discussed by Cadenillas et al. in [8]: when $k_{21}$ and $k_{22}$ are equal to zero, our results reduce to the results in [8]. Third, our paper contributes a theoretical analysis of the impulse control approach that allows us to perform a comparative statics analysis with nonlinear transaction costs. More importantly, our model can be used in many economic and management problems such as dividend policy, control of interest rates, cash management and product management.
The remainder of the paper is organized as follows. We first give a detailed description of the inventory model in Section 2, and then we characterize the value function in Section 3. In Section 4 we analyze and obtain the solution. Section 5 is devoted to numerical solutions and comparative statics analysis. Finally, Section 6 concludes and proposes some topics for further research.
2. Model description. We consider an inventory model and use a Brownian motion to characterize the uncertainty in the inventory level. Let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$ denote a complete filtered probability space, where $\{\mathcal{F}_t\}_{t \ge 0}$ satisfies the usual conditions, that is, it is a right-continuous increasing family of complete sub-σ-algebras of $\mathcal{F}$. We assume that $\{\mathcal{F}_t\}_{t \ge 0}$ is the natural filtration of a one-dimensional Brownian motion $W$ in $\mathbb{R}$ such that $\mathcal{F}_0$ contains all $P$-null elements of $\mathcal{F}$. The evolution of the system is described by the stochastic process $X = \{X_t, t \ge 0\}$, where $X_t$ corresponds to the inventory level at time $t$.
We suppose that $X$ is an adapted mean-reverting Ornstein-Uhlenbeck process given by

$$X_t = x + \int_0^t k(\rho - X_s)\,ds + \sigma W_t + \sum_{i=1}^{\infty} \xi_i \mathbf{1}_{\{\tau_i < t\}}, \qquad (1)$$

where $x > 0$ is the initial inventory level, $k > 0$ is the speed of mean reversion, $\rho \in (-\infty, \infty)$ is the long-term mean of the process $X$, $\sigma > 0$ is the volatility, $\tau_i$ is the time of the $i$th intervention, and $\xi_i$ is the intensity of the $i$th intervention. In the special case in which there are no interventions, $X$ is just an Ornstein-Uhlenbeck process. In addition, if $k$ were equal to zero, then $X$ would follow a Brownian motion without drift.
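As an illustration of the uncontrolled dynamics, the following minimal Python sketch samples an Ornstein-Uhlenbeck path on a uniform time grid using the exact Gaussian transition of $dX_t = k(\rho - X_t)\,dt + \sigma\,dW_t$. The function name `simulate_ou`, the grid and the seed are assumptions made for this illustration and are not part of the paper.

```python
import math
import random


def simulate_ou(x0, k, rho, sigma, horizon, n_steps, rng=None):
    """Sample an uncontrolled OU path on a uniform grid.

    Uses the exact one-step transition of dX = k (rho - X) dt + sigma dW:
    X_{t+dt} is Gaussian with mean rho + (X_t - rho) e^{-k dt} and
    variance sigma^2 (1 - e^{-2 k dt}) / (2 k).
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed, chosen only for reproducibility
    dt = horizon / n_steps
    decay = math.exp(-k * dt)
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * k))
    path = [x0]
    for _ in range(n_steps):
        mean = rho + (path[-1] - rho) * decay
        path.append(mean + step_sd * rng.gauss(0.0, 1.0))
    return path
```

With `sigma = 0` the sketch returns the deterministic path $\rho + (x - \rho)e^{-kt}$, which is the mean around which the stochastic paths fluctuate.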
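We assume $\xi_0 = 0$. We also suppose that the management acts at time $\tau_i$ by adding $\xi_i$ to the inventory level, that is,

$$X_{\tau_i+} = X_{\tau_i} + \xi_i.$$

We note that $\xi_i$ can also take negative values, because the management is allowed to reduce the inventory level. However, it is more reasonable to guarantee that the inventory level remains positive. Therefore, we consider only those impulse controls under which the controlled inventory level stays in $(0, \infty)$ at all times with probability one.

Problem 2.1. The management wants to select the pair $v = (T, \xi)$ that minimizes the function $J$ defined by

$$J(x; v) := E\left[\int_0^{\infty} e^{-\lambda t} f(X_t)\,dt + \sum_{i=1}^{\infty} e^{-\lambda \tau_i} K(X_{\tau_i}, \xi_i)\,\mathbf{1}_{\{\tau_i < \infty\}}\right],$$

with

$$f(x) = (x - \rho)^2, \qquad K(x, \xi) = \begin{cases} k_{01} + k_{11}\,\xi + k_{21}\left[(x+\xi)^2 - x^2\right], & \xi > 0, \\ k_{02} - k_{12}\,\xi + k_{22}\left[x^2 - (x+\xi)^2\right], & \xi < 0, \end{cases}$$

where $\lambda > 0$, $k_{01}, k_{02}, k_{21}, k_{22}, k_{11}, k_{12} \in (0, \infty)$, and $\rho \in (0, \infty)$.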
Here, $f$ corresponds to the running cost incurred by deviating from the target inventory level $\rho$ (we consider the square of $x - \rho$ because we assume that upward and downward deviations from the target $\rho$ are equally undesirable), $K$ represents the transaction cost, and $\lambda$ is the discount rate. Furthermore, $k_{01}$ and $k_{02}$ are the fixed costs per intervention when the management pushes the inventory level upward and downward, respectively; $k_{11}$ and $k_{12}$ are the corresponding proportional costs per intervention; and $k_{21}$ and $k_{22}$ are the corresponding quadratic transaction costs per intervention.
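The cost structure can be sketched in a few lines of Python. The form of the transaction cost used here is an assumption read off from the expressions $k_{01} + k_{21}(y^2 - a^2) + k_{11}(y - a)$ and $k_{02} + k_{22}(b^2 - y^2) + k_{12}(b - y)$ that appear in Section 4; the function names and the numerical values in the usage note are hypothetical.

```python
def running_cost(x, rho):
    """Running cost rate f(x) = (x - rho)^2."""
    return (x - rho) ** 2


def transaction_cost(x, xi, k01, k11, k21, k02, k12, k22):
    """Cost of moving the inventory from x to y = x + xi (assumed form).

    Upward move (xi > 0):   k01 + k11 (y - x) + k21 (y^2 - x^2)
    Downward move (xi < 0): k02 + k12 (x - y) + k22 (x^2 - y^2)
    """
    y = x + xi
    if xi > 0:
        return k01 + k11 * (y - x) + k21 * (y * y - x * x)
    if xi < 0:
        return k02 + k12 * (x - y) + k22 * (x * x - y * y)
    raise ValueError("an intervention must have nonzero size")
```

For instance, with the hypothetical coefficients `k01=0.5, k11=0.1, k21=0.01`, raising the inventory from 1 to 3 costs `0.5 + 0.1*2 + 0.01*(9 - 1) = 0.78`.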
In our model, there is a preferred inventory level $\rho$ for the company. Without volatility, the inventory level process would reduce to the deterministic dynamics $dX_t = k(\rho - X_t)\,dt$, so the inventory would move toward the preferred level $\rho$ at speed $k$ without the management paying any costs. When there is too much volatility, however, the inventory level may move far away from the target $\rho$, and the management has to pay transaction costs to steer the inventory toward $\rho$. This interpretation of our model is consistent with some applications of inventory theory in financial economics; interested readers may refer to Madhavan and Smidt [13].
Since our objective is to minimize the function $J$, we consider only those strategies for which $J$ is well defined. For the expectation to be well defined, the two expected values on the right-hand side must be finite. This leads to the integrability conditions (7)-(12) on the controlled process. By the same method that is used to obtain inequality (8), inequality (10) implies inequality (9). Moreover, inequalities (7) and (10) together imply condition (12), as the following lemma verifies.
Lemma 2.2. Let $X^v_t$ be the stochastic process given by (1). Suppose that $X^v_t$ satisfies conditions (7) and (10). Then condition (12) holds, that is,

$$\lim_{t \to \infty} E\left[e^{-\lambda t}\,(X^v_t)^2\right] = 0.$$

Proof. Let $v$ be an impulse control, and let $Y_t$ be the auxiliary stochastic process obtained from (1). Applying Itô's formula and then the formula of integration by parts (see, for instance, Rogers and Williams [19]) yields an identity that holds for every $0 < s \le t < \infty$. Condition (7) implies that $E\left[\int_s^t \sigma e^{-\lambda u} X^v_u\,dW_u\right] = 0$, and conditions (7), (9) and (10) imply that, for every sequence $0 \le t_n \uparrow \infty$, the sequence $E\left[e^{-\lambda t_n}(X^v_{t_n})^2\right]$ is a Cauchy sequence and therefore converges to a nonnegative number. That number must be zero, since otherwise we would obtain a contradiction. This completes the proof.

An impulse control $v$ is called admissible if conditions (7), (11) and (12) are satisfied. We denote by $\mathcal{A}(x)$ the set of admissible impulse controls.
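Example 1. We consider the special case of no intervention, that is, $P\{\tau_1 = \infty\} = 1$. Let $X$ be the inventory level in that case. Then

$$X_t = \rho + (x - \rho)e^{-kt} + \sigma \int_0^t e^{-k(t-s)}\,dW_s,$$

so that

$$E[X_t] = \rho + (x - \rho)e^{-kt}$$

and

$$\mathrm{Var}[X_t] = \frac{\sigma^2}{2k}\left(1 - e^{-2kt}\right).$$

Hence the process $X$ satisfies conditions (7), (11) and (12).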
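For the no-intervention case, the discounted running cost can be evaluated in closed form from the standard Ornstein-Uhlenbeck moments, assuming the running cost $f(x) = (x - \rho)^2$. The sketch below compares the closed form $\frac{(x-\rho)^2}{\lambda + 2k} + \frac{\sigma^2}{\lambda(\lambda + 2k)}$ with a plain midpoint-rule quadrature; the function names and the truncation horizon are assumptions of this illustration.

```python
import math


def second_moment_about_target(t, x, k, rho, sigma):
    """E[(X_t - rho)^2] for the uncontrolled OU process started at x."""
    decay2 = math.exp(-2.0 * k * t)
    return (x - rho) ** 2 * decay2 + sigma ** 2 * (1.0 - decay2) / (2.0 * k)


def no_intervention_cost(x, k, rho, sigma, lam):
    """Closed form of the integral of e^{-lam t} E[(X_t - rho)^2] over [0, inf)."""
    return (x - rho) ** 2 / (lam + 2.0 * k) + sigma ** 2 / (lam * (lam + 2.0 * k))


def no_intervention_cost_quadrature(x, k, rho, sigma, lam, horizon=200.0, n=200_000):
    """Midpoint-rule approximation of the same integral, truncated at `horizon`."""
    dt = horizon / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += math.exp(-lam * t) * second_moment_about_target(t, x, k, rho, sigma)
    return total * dt
```

The two values should agree up to the quadrature and truncation error, which gives a quick sanity check on any implementation of the cost functional.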
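Let $v_0$ represent the control under which the management does not intervene in the system. Then $v_0$ leads to the following expected present value of the cost:

$$J(x; v_0) = \frac{(x - \rho)^2}{\lambda + 2k} + \frac{\sigma^2}{\lambda(\lambda + 2k)}.$$

3. The value function. In this section, we study the value function of Problem 2.1 and present a verification theorem. Let $V$ denote the value function, that is, for every $x \in [0, \infty)$,

$$V(x) := \inf_{v \in \mathcal{A}(x)} J(x; v).$$

Suppose that $\phi : (0, +\infty) \to [0, +\infty)$ is a candidate for the value function. Let $M$ denote the intervention operator on the space of functions $\phi$, defined by

$$M\phi(x) := \inf\left\{\phi(x + \eta) + K(x, \eta) : \eta \in \mathbb{R},\ x + \eta \in (0, \infty)\right\}. \qquad (17)$$

$M\phi(x)$ represents the value of the strategy that consists of choosing the best immediate intervention. Let us define the operator $L$ associated with $X^v$ by

$$L\phi(x) := \tfrac{1}{2}\sigma^2 \phi''(x) + k(\rho - x)\phi'(x) - \lambda\phi(x).$$

In this paper, we assume that $\phi$ is $C^2$ except on the boundary of the considered region, and we use the notion of $\phi$ being stochastically $C^2$ with respect to $X^v$. An equality for such functions will be used later.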
Now we start to characterize the value function and an associated optimal strategy.
Assume that there exists an optimal strategy for each initial point. If the initial state of the system is $x$ and the optimal strategy is followed, then the cost under this strategy is $\phi(x)$. If instead the process starts at $x$, the best immediate intervention is selected, and an optimal strategy is followed afterwards, then the associated cost is $M\phi(x)$. Therefore, $\phi(x) \le M\phi(x)$. Furthermore, these two costs are equal when it is optimal to jump, so $\phi(x) = M\phi(x)$ when it is optimal to intervene. When the manager does not intervene, the dynamic programming principle gives $L\phi(x) + f(x) = 0$.
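The comparison between $\phi$ and $M\phi$ can be made concrete with a small numerical sketch. The function below approximates the intervention operator by minimizing over a finite grid of candidate post-intervention levels; the grid, the helper name `intervention_operator` and the toy functions in the usage note are assumptions of this illustration, not part of the paper.

```python
def intervention_operator(phi, x, grid, cost):
    """Approximate M phi(x) = min over targets y != x of phi(y) + cost(x, y - x).

    `phi` is a candidate value function, `cost(x, xi)` is a transaction cost
    for moving from x by xi, and `grid` is a finite list of admissible
    post-intervention levels (a discretization of (0, infinity)).
    """
    return min(phi(y) + cost(x, y - x) for y in grid if y != x)


def should_intervene(phi, x, grid, cost):
    """Heuristic rule: intervene when jumping is no more costly than waiting."""
    return intervention_operator(phi, x, grid, cost) <= phi(x)
```

With the toy choices `phi = lambda y: (y - 4.0) ** 2`, a pure fixed cost `cost = lambda x, xi: 1.0` and a grid containing 4.0, the rule intervenes at `x = 0.0` (jumping costs 1, waiting costs 16) but not at `x = 4.5` (waiting costs 0.25).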
We summarize the above conclusions in the following two definitions and a theorem.
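We note that a solution $\phi$ of the QVI separates the interval $(0, \infty)$ into two disjoint regions: a continuation region, in which $\phi < M\phi$ and $L\phi + f = 0$, and an intervention region, in which $\phi = M\phi$. From a solution of the QVI, we can construct the following impulse control.

Definition 3.3. (QVI-control) Let $\phi$ be a solution of the QVI. The impulse control $\tilde{v} = \{(\tilde{\tau}_i, \tilde{\xi}_i)\}_{i \ge 1}$ that intervenes each time the controlled process leaves the continuation region, with intervention sizes that attain the minimum in $M\phi$, is called the QVI-control associated with $\phi$. Here, $X^{\tilde{v}}_t$ is the result of applying the impulse control $\tilde{v}$.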
This definition shows that the manager intervenes whenever $\phi$ and $M\phi$ coincide, and that the size of the intervention is the solution of the optimization problem corresponding to $M\phi$. We now need to prove that a QVI-control is an optimal impulse control. To the best of our knowledge, Cadenillas et al. [8] considered a stochastic impulse control problem in which the uncontrolled process is mean-reverting, but with a linear transaction cost, and Ohnishi and Tsujimura [15] considered impulse control of a geometric Brownian motion with quadratic costs. The following Theorem 1 is a minor modification of Theorem 3.1 of Brekke and Øksendal [4]. Since the control they discuss combines continuous control and impulse control, we cannot apply their result to our problem directly. Here we state a modified version of their theorem without proof. Interested readers may refer to Theorem 3.1 in [15] for more details.

Theorem 1. Let $\phi$ be a solution of the QVI that is stochastically $C^2$ with respect to $X^v$, and suppose that
(1) $\lim_{t \to \infty} e^{-\lambda t}\phi(X^v_t) = 0$ a.s. for every $v \in \mathcal{A}(x)$;
(2) the family $\{\phi(X^v_\tau)\}_{\tau < \infty}$ is uniformly integrable with respect to $P$ for every $v \in \mathcal{A}(x)$.
Then for every $x \in (0, \infty)$ and every $v \in \mathcal{A}(x)$, we have $\phi(x) \le J(x; v)$. Furthermore, if the QVI-control $\tilde{v}$ associated with $\phi$ is admissible, that is, $\tilde{v} \in \mathcal{A}(x)$, then $\tilde{v}$ is an optimal impulse control, and for every $x \in (0, \infty)$, $\phi(x) = V(x) = J(x; \tilde{v})$.

4. The solution of the QVI. In this section, we analyze the solution of the QVI and give our main result in Theorem 4.1. We conjecture that there exists an optimal solution $\tilde{v} = (\tilde{T}, \tilde{\xi})$ characterized by four parameters $a, \alpha, \beta, b$ with $0 < a < \alpha < \beta < b < \infty$ such that the optimal strategy is to stay in the band $[a, b]$ and to jump to $\alpha$ and $\beta$ when reaching $a$ and $b$, respectively. Thus, we conjecture that the optimal impulse control is

$$\tilde{\tau}_i = \inf\{t > \tilde{\tau}_{i-1} : \tilde{X}_t \notin (a, b)\}, \qquad (28)$$

$$\tilde{\xi}_i = (\alpha - \tilde{X}_{\tilde{\tau}_i})\,\mathbf{1}_{\{\tilde{X}_{\tilde{\tau}_i} \le a\}} + (\beta - \tilde{X}_{\tilde{\tau}_i})\,\mathbf{1}_{\{\tilde{X}_{\tilde{\tau}_i} \ge b\}}, \qquad (29)$$

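where $\tilde{X}$ denotes the trajectory determined by $(\tilde{T}, \tilde{\xi})$. In addition, we expect that if $x > b$, then the optimal strategy is to jump to $\beta$, and if $x < a$, then the optimal strategy is to jump to $\alpha$. Hence, we conjecture that the value function satisfies

$$V(x) = V(\alpha) + k_{01} + k_{21}(\alpha^2 - x^2) + k_{11}(\alpha - x), \quad x \le a, \qquad (30)$$

and

$$V(x) = V(\beta) + k_{02} + k_{22}(x^2 - \beta^2) + k_{12}(x - \beta), \quad x \ge b. \qquad (31)$$

From (30) and (31), we have

$$V(a) = V(\alpha) + k_{01} + k_{21}(\alpha^2 - a^2) + k_{11}(\alpha - a) \qquad (32)$$

and

$$V(b) = V(\beta) + k_{02} + k_{22}(b^2 - \beta^2) + k_{12}(b - \beta). \qquad (33)$$

In addition, we expect $V$ to be differentiable at $a$, $b$, $\alpha$ and $\beta$, which gives

$$V'(a) = -2k_{21}a - k_{11}, \qquad (34)$$
$$V'(b) = 2k_{22}b + k_{12}, \qquad (35)$$
$$V'(\alpha) = -2k_{21}\alpha - k_{11}, \qquad (36)$$
$$V'(\beta) = 2k_{22}\beta + k_{12}. \qquad (37)$$

Equations (36) and (37) hold because we expect the minimum of $V(y) + k_{01} + k_{21}(y^2 - a^2) + k_{11}(y - a)$ to be attained at $y = \alpha$, and the minimum of $V(y) + k_{02} + k_{22}(b^2 - y^2) + k_{12}(b - y)$ to be attained at $y = \beta$.

We also conjecture that the continuation region is $(a, b)$. Thus, for $x \in (a, b)$, we have

$$\tfrac{1}{2}\sigma^2 V''(x) + k(\rho - x)V'(x) - \lambda V(x) + (x - \rho)^2 = 0.$$

This is a second-order ordinary differential equation, and we obtain its general solution by the power series method. The solution $D(x)$ is

$$D(x) = A\,E(x) + B\,F(x) + \frac{(x - \rho)^2}{\lambda + 2k} + \frac{\sigma^2}{\lambda(\lambda + 2k)},$$

where $A$ and $B$ are real numbers and

$$E(x) = \sum_{n=0}^{\infty} e_n (x - \rho)^{2n}, \qquad F(x) = \sum_{n=0}^{\infty} f_n (x - \rho)^{2n+1}.$$

The coefficients of the series $E$ and $F$ are given by $e_0 = f_0 = 1$ and

$$e_{n+1} = \frac{2(\lambda + 2nk)}{\sigma^2 (2n+1)(2n+2)}\,e_n, \qquad f_{n+1} = \frac{2(\lambda + (2n+1)k)}{\sigma^2 (2n+2)(2n+3)}\,f_n.$$

The power series $E$ converges absolutely in any interval of the form $(\rho - M, \rho + M)$ with $M < \infty$. Similarly, the power series $F$ converges absolutely in any bounded interval. Furthermore, $E(\rho) = 1$, $F(\rho) = 0$, $E'(\rho) = 0$ and $F'(\rho) = 1$. Hence $A = D(\rho) - \sigma^2/(\lambda(\lambda + 2k))$ and $B = D'(\rho)$.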
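The power series construction can be checked numerically. The following Python sketch is a minimal illustration under two assumptions: the running cost is $f(x) = (x - \rho)^2$, and the series coefficients follow the recurrence $c_{m+2} = \frac{2(\lambda + km)}{\sigma^2 (m+1)(m+2)} c_m$ obtained by substituting $\sum_m c_m (x - \rho)^m$ into the homogeneous equation. The function names, the truncation level and the parameter values in the usage note are hypothetical.

```python
def series_solutions(x, k, rho, sigma, lam, n_terms=60):
    """Return (E, E', E'', F, F', F'') at x from truncated series about rho."""
    z = x - rho
    out = []
    for start in (0, 1):      # start=0: even series E, start=1: odd series F
        c, m = 1.0, start     # c is the coefficient of z**m
        val = d1 = d2 = 0.0
        for _ in range(n_terms):
            val += c * z ** m
            if m >= 1:
                d1 += c * m * z ** (m - 1)
            if m >= 2:
                d2 += c * m * (m - 1) * z ** (m - 2)
            # Recurrence from 0.5*sigma^2 u'' - k z u' - lam u = 0.
            c *= 2.0 * (lam + k * m) / (sigma ** 2 * (m + 1) * (m + 2))
            m += 2
        out.extend([val, d1, d2])
    return tuple(out)


def candidate_D(x, A, B, k, rho, sigma, lam):
    """Return (D, D', D'') for D = A E + B F + particular solution."""
    E, dE, d2E, F, dF, d2F = series_solutions(x, k, rho, sigma, lam)
    c = 1.0 / (lam + 2.0 * k)
    D = A * E + B * F + c * (x - rho) ** 2 + sigma ** 2 * c / lam
    dD = A * dE + B * dF + 2.0 * c * (x - rho)
    d2D = A * d2E + B * d2F + 2.0 * c
    return D, dD, d2D


def ode_residual(x, A, B, k, rho, sigma, lam):
    """Residual of 0.5 sigma^2 D'' + k (rho - x) D' - lam D + (x - rho)^2."""
    D, dD, d2D = candidate_D(x, A, B, k, rho, sigma, lam)
    return 0.5 * sigma ** 2 * d2D + k * (rho - x) * dD - lam * D + (x - rho) ** 2
```

For moderate distances from $\rho$ (for example $|x - \rho| \le 4$ with $k = 0.5$, $\sigma = 1$, $\lambda = 0.1$), sixty terms are ample and `ode_residual` is zero up to rounding for any choice of `A` and `B`.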
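In our problem, we note that $A$ should be less than or equal to zero. In summary, we conjecture that the solution of our problem is described by (28) and (29), and that the six unknowns $A, B, a, \alpha, \beta, b$ with $0 < a < \alpha < \beta < b < \infty$ solve the system of six equations (41)-(46) obtained by imposing conditions (32)-(37) on $D$:

$$D(a) = D(\alpha) + k_{01} + k_{21}(\alpha^2 - a^2) + k_{11}(\alpha - a),$$
$$D(b) = D(\beta) + k_{02} + k_{22}(b^2 - \beta^2) + k_{12}(b - \beta),$$
$$D'(a) = -2k_{21}a - k_{11}, \qquad D'(\alpha) = -2k_{21}\alpha - k_{11},$$
$$D'(b) = 2k_{22}b + k_{12}, \qquad D'(\beta) = 2k_{22}\beta + k_{12}.$$

The corresponding candidate for the value function is

$$V(x) = \begin{cases} D(\alpha) + k_{01} + k_{21}(\alpha^2 - x^2) + k_{11}(\alpha - x), & x \le a, \\ D(x), & a < x < b, \\ D(\beta) + k_{02} + k_{22}(x^2 - \beta^2) + k_{12}(x - \beta), & x \ge b. \end{cases} \qquad (47)$$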
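The six conditions can be written as a residual vector whose root gives the unknowns. The sketch below is one possible arrangement, assuming the value-matching and smooth-pasting conditions take the form implied by (32)-(37); the function name `free_boundary_residuals` and the argument layout are hypothetical, and `D` and `dD` stand for any callables that evaluate the candidate solution and its derivative for given $A$ and $B$.

```python
def free_boundary_residuals(D, dD, a, alpha, beta, b, costs):
    """Residuals of the six value-matching and smooth-pasting conditions.

    `costs` is the tuple (k01, k11, k21, k02, k12, k22). All six residuals
    vanish at a solution (A, B, a, alpha, beta, b), where A and B enter
    through the callables D and dD.
    """
    k01, k11, k21, k02, k12, k22 = costs
    return [
        D(a) - D(alpha) - k01 - k21 * (alpha ** 2 - a ** 2) - k11 * (alpha - a),
        D(b) - D(beta) - k02 - k22 * (b ** 2 - beta ** 2) - k12 * (b - beta),
        dD(a) + 2.0 * k21 * a + k11,
        dD(alpha) + 2.0 * k21 * alpha + k11,
        dD(b) - 2.0 * k22 * b - k12,
        dD(beta) - 2.0 * k22 * beta - k12,
    ]
```

In practice this residual vector would be handed to a generic nonlinear root finder; a reasonable starting guess places $a$ and $b$ on either side of $\rho$ with $\alpha$ and $\beta$ between them.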

Remark 1. We observe that if $k_{21}$ and $k_{22}$ were equal to zero, then Problem 2.1 would be consistent with Problem 2.1 in Cadenillas et al. [8].
Next we show that the candidate for the value function of Problem 2.1 satisfies the QVI under some additional conditions. The candidate function is then the value function, and the QVI-control induced by it is optimal. We now prove rigorously that the above conjecture is valid.
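Theorem 4.1. Let $A, B, a, \alpha, \beta, b$ with $0 < a < \alpha < \beta < b < \infty$ be a solution of the system (41)-(46), and suppose that the sufficient conditions referred to in the proof below, including (49)-(53), are satisfied. Then the function $V$ defined in (47) is the value function of Problem 2.1, that is,

$$V(x) = \inf_{v \in \mathcal{A}(x)} J(x; v).$$

Furthermore, the optimal strategy is given by (28) and (29).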
If $x \in (a, b)$, it is clear from equation (39) and the derivation of (47) that $LV(x) + f(x) = 0$. The right-hand side of equation (55), denoted by $h_1(x)$, is a quadratic function of $x$, and we require $h_1(x)$ to be non-negative. Since the coefficient of $x^2$ is positive, it suffices to show that $a < x_-$, where $x_-$ is the smaller root of $h_1(x)$. This inequality is guaranteed by (49), so the required inequality holds. Similarly, if (47) and (50) are satisfied, we obtain the corresponding inequality on the other side of the band.

(ii) Inequality (21). According to the definition of the intervention operator $M$ in (17) and conditions (51)-(53), $V(x) - MV(x)$ is equal to zero in the intervention region $(0, a] \cup [b, \infty)$ and is negative in the continuation region $(a, b)$. Therefore, $V$ satisfies inequality (21).
In part (2), we show that Theorem 4.1 holds. For any $v \in \mathcal{A}(x)$, by part (1), $V$ is stochastically $C^2$ with respect to $X^v$ and is a solution of the QVI. Conditions (1) and (2) of Theorem 1 are satisfied, so $V$ is the value function of Problem 2.1 by Theorem 1. Furthermore, the QVI-control associated with $V$ is admissible. Indeed, the trajectory $X^{\tilde{v}}$ generated by this QVI-control behaves like a mean-reverting process in each random interval $(\tau_n, \tau_{n+1})$ and satisfies $P\{\forall t \in (0, \infty) : X^{\tilde{v}}_t \in [a, b]\} = 1$. Thus, conditions (7), (11) and (12) are satisfied, and the QVI-control associated with $V$ is admissible. Hence, the impulse control given by (28) and (29) is optimal. This completes the proof.
5. Numerical solutions. In this section, we provide some numerical solutions and analyze the effects of given parameters.
Solving Problem 2.1 requires the numerical solution of the system of six equations (41)-(46) in the six unknowns $A$, $B$, $\alpha$, $\beta$, $a$ and $b$. We have written computer code in Matlab to solve the system (41)-(46). We now present an example to illustrate the procedure.

The graphs of $D(x)$ and its derivative in the continuation region $(a, b)$ are given in Figures 1 and 2, respectively. From Figures 1 and 2, we can read off the values of $V$ and $V'$ in the continuation region $(a, b)$. In the interval $(0, a]$, $V'$ is a linear function with slope $-2k_{21}$, and in the interval $[b, +\infty)$, $V'$ is a linear function with slope $2k_{22}$. However, in the interval $(a, \alpha)$, $V'$ is smaller than $-2k_{21}x - k_{11}$, and in the interval $(\beta, b)$, $V'$ is larger than $2k_{22}x + k_{12}$. Therefore, the function $V$ defined in (47) is the value function of our problem according to Theorem 2. Furthermore, the optimal strategy is to stay in the band $[0.7090, 8.1028]$ and to jump to 3.1358 and 5.5454 when reaching 0.7090 and 8.1028, respectively.
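The band strategy reported above can be illustrated by simulation. The following Python sketch applies the thresholds $a = 0.7090$, $\alpha = 3.1358$, $\beta = 5.5454$, $b = 8.1028$ to a discretely monitored Ornstein-Uhlenbeck path. The model parameters passed in the usage note ($k$, $\rho$, $\sigma$), the function name and the time grid are hypothetical, since the parameter values of the example are not reproduced here, and discrete monitoring only approximates the continuous-time control.

```python
import math
import random


def simulate_band_policy(x0, a, alpha, beta, b, k, rho, sigma,
                         horizon, n_steps, seed=0):
    """Simulate an OU inventory under the band policy (a, alpha, beta, b).

    At each grid time the level is reset to alpha if it is at or below a,
    and to beta if it is at or above b. Returns the post-intervention path
    and the list of (time, intervention size) pairs.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    decay = math.exp(-k * dt)
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * k))
    x, path, interventions = x0, [], []
    for i in range(n_steps + 1):
        if x <= a:
            interventions.append((i * dt, alpha - x))
            x = alpha
        elif x >= b:
            interventions.append((i * dt, beta - x))
            x = beta
        path.append(x)
        # Exact OU transition over one grid step.
        x = rho + (x - rho) * decay + step_sd * rng.gauss(0.0, 1.0)
    return path, interventions
```

For example, `simulate_band_policy(9.0, 0.7090, 3.1358, 5.5454, 8.1028, k=0.5, rho=4.0, sigma=2.0, horizon=50.0, n_steps=5000)` starts above the band, so the first recorded intervention moves the level down to 5.5454 at time zero, and every recorded level stays strictly inside the band.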

Figure 2. Function $D'(x)$
Next, we analyze the effects of changes in the parameters on the optimal strategy. Table 1 shows the effects of changes in the speed of mean reversion $k$. If $k$ increases, then $a$ decreases, $b$ increases, $\alpha - a$ increases and $b - \beta$ increases. In other words, when the speed of mean reversion increases, it is optimal to wait longer before intervening, and the sizes of the interventions are larger.

Table 2 describes the effects of changes in the discount rate $\lambda$. If $\lambda$ increases, then $a$ decreases, $\alpha$ decreases, $\beta$ decreases, $b$ increases, $\alpha - a$ increases, and $b - \beta$ increases. This agrees with intuition, because the discount rate basically represents the inflation rate. Therefore, when the discount rate increases, it is optimal to postpone intervention.

Table 3 shows the effects of changes in the volatility $\sigma$. As can be seen from Table 3, the larger the volatility, the higher the upper intervention level $b$ and the lower the lower intervention level $a$. Because there is a fixed cost of intervention, when the volatility increases the management waits longer to intervene and enlarges the magnitude of the intervention.

Table 4 presents the effects of changes in the target $\rho$. When $\rho$ increases, $a$ and $b$ increase, but $\alpha - a$ and $b - \beta$ decrease. This is as expected. Therefore, the manager needs to reduce the size of the interventions.

Table 5 illustrates the effects of changes in the transaction costs $k_{21}$, $k_{22}$, $k_{11}$, $k_{12}$, $k_{01}$ and $k_{02}$. When $k_{21}$, $k_{11}$ and $k_{12}$ increase, $a$ decreases, $b$ increases, $\alpha - a$ decreases, and $b - \beta$ decreases. However, if $k_{22}$ increases, then $a$ decreases, $b$ increases, $\alpha - a$ decreases, and $b - \beta$ increases. This is easily explained: when the transaction costs $k_{21}$, $k_{11}$ and $k_{12}$ increase, it is optimal to wait longer before intervening, but the interventions are smaller. However, when the transaction cost $k_{22}$ increases, the size of the intervention on the increasing side is smaller, and the size of the intervention on the reducing side is larger.
Finally, we study the effect of the fixed costs. As can be seen from Table 5, when the fixed cost $k_{01}$ increases, $a$ decreases, $b$ increases, $\alpha - a$ increases, and $b - \beta$ decreases. Furthermore, when the fixed cost $k_{02}$ increases, $a$ decreases, $b$ increases, $\alpha - a$ decreases, and $b - \beta$ increases. This agrees with intuition: when a fixed cost increases, it is optimal to wait longer before intervening in either direction. It is interesting that a change in the fixed cost of intervention on only one side of the target $\rho$ also affects the optimal strategy on the other side of the target.

6. Conclusion. This paper has analyzed an optimal impulse control problem for a mean-reverting inventory with quadratic costs. The objective is to find an impulse control that minimizes the expected total discounted sum of the intervention costs and running costs incurred over the infinite time horizon. We have proved that the QVI-control is the optimal impulse control under a suitable set of sufficient conditions on the given problem parameters.
There are some interesting topics for further research. First, we could generalize the system state, for example to a Lévy process, which means that the system state may have jumps. Second, this study enables us to consider more general convex forms of the transaction cost function. Third, we could consider the case of a quadratic installation cost in our model. Finally, we could study how to minimize the expected long-run average cost. These extensions are left for future work.