A STOCHASTIC NEWSVENDOR GAME WITH DYNAMIC RETAIL PRICES

We extend stochastic newsvendor games with information lag by including dynamic retail prices, and we characterize their equilibria. We show that the equilibrium wholesale price is a nonincreasing function of the demand, while the retailer’s output increases with demand until we recover the usual equilibrium. In particular, it is then optimal for retailer and wholesaler to have demand at least equal to some threshold level, beyond which the retailer’s output tends to an upper bound which is absent in fixed retail price models. When demand is given by a delayed Ornstein-Uhlenbeck process and price is an affine function of output, we numerically compute the equilibrium output and we show that the lagged case can be seen as a smoothing of the no lag case.


1. Introduction. The newsvendor problem is a two-stage game in which a wholesaler sells products to a retailer, and the retailer sells to consumers in a market according to a demand constraint. The retailer's optimal quantity is a function of the wholesaler's quoted per-unit price. Such problems have yielded a rich literature, see [6] for an overview of known results and possible extensions, such as demand that is a function of retail prices or quantities and variable wholesale prices. An extended model, in which both the retail price and the quantity are under the control of the retailer and demand depends on the retail price, is presented in [4].
Stochastic differential Stackelberg games have been studied via the maximum principle and applied to continuous-time newsvendor problems with exogenously fixed retail prices in [2], where demand is modeled by an Itô-Lévy process and information about the demand is lagged. In this framework, the retailer can only sell a quantity of products limited to a level $D_t \geq 0$ at time $t \in [\delta, T]$, where $\delta > 0$ represents the magnitude of the information lag in the following way: at time $t - \delta$ firms have to decide on their strategies for time $t$ without knowing the demand $D_t$ at time $t$. This approach relies on the vanishing of derivatives of Hamiltonians at any time $t$ in order to determine the equilibrium values of the controls.

IDO POLAK AND NICOLAS PRIVAULT
This paper extends the framework of [2], in which retail prices were exogenously fixed, to newsvendor games in which the maximum price that can be charged in the market when selling quantity $q_t$ at time $t$, $0 \leq q_t \leq D_t$, is given as a function $P(q_t)$. In our framework, purchasing from the wholesaler more than demanded by the market is costly to the retailer, not only because he pays for buying an excess supply, but also because it forces down the price that he can charge to sell off $d_t$ units. The excess, unsold supply will be disposed of at zero cost. Although we present our model in its most general form, we will explicitly treat the case where the market price $P(q_t)$ is of the familiar affine form $P(q_t) = a - bq_t$ for $a, b > 0$, with numerical illustrations. We also treat the limiting case $\delta = 0$, in which everything about the game is deterministic as there is no uncertainty about the demand $d_t$. We rely on optimal stochastic control of diffusions, cf. [1] and [3], where necessary and sufficient maximum principles are derived for both the zero sum and the nonzero sum cases using forward-backward stochastic differential equations, with applications to newsvendor models.
In Theorem 3.1 we present necessary and sufficient conditions for optimality of controls based on the stochastic maximum principle in the delayed case with $\delta > 0$, by allowing retail prices to depend on the quantity of products sold between wholesaler and retailer. When demand $D_t$ is modeled by an Ornstein-Uhlenbeck process we characterise the optimal control of Player 2 (the retailer) at time $t$ in Proposition 3.1, and we also find a characterisation of the optimal control for Player 1 (the wholesaler) at time $t$.
When the market price is an affine function of the output we derive an equation for the equilibrium output based on the conditional distribution function of the demand $D_t$ given the delayed information, which parallels the equation (3) in [6], see (29) below. In this setting we compute numerically the equilibrium output as a function of wholesale prices, cf. Figures 1-3. We note that the retail output remains capped at the level $a/(4b)$ as a function of demand, whereas it remains strictly increasing in the case of fixed prices, cf. Figure 1 below and Figure 3 in [2].
In the no lag case with δ = 0, we are able to obtain closed form solutions based on the parameters of the economic environment in the affine case. We show in Proposition 4.1 that equilibrium output is always below demand, and both the wholesaler and retailer would prefer demand at least equal to a given threshold, after which profits remain constant. In general, if the demand is affected by only a short delay, decisions based on the delayed rule will be very close to the optimal decisions taken in the absence of delay. This is illustrated by the convergence of (30) to (39) below.
We also note from Figures 1-3 that the lagged case can be seen as a smoothing of the deterministic case, and that the solution of the deterministic no lag problem can be obtained as the limit of the lagged setup as δ tends to 0. However, unlike in the Cournot setup [5], we do not observe a collapse from multiple equilibria to a single equilibrium.
2. Notation and setup. We consider a sequential game in which Player 1 is selling a quantity $q_t \geq 0$ of products priced $w_t$ to Player 2, who in turn sells to customers according to the demand rate $D_t$.
Let $t \in [\delta, T]$, where $\delta \geq 0$ is the information delay. In the first stage, Player 1 chooses a price $w_t$ per unit to charge to Player 2, who in the second stage orders the quantity $q_t \in \mathbb{R}_+$ and sells to customers according to a demand $D_t$. Precisely, Player 1 tries to maximise the wholesaler profit function
$$w_t q_t \qquad (1)$$
and Player 2 tries to maximise the retailer profit function
$$P(q_t)\min(d_t^+, q_t) - w_t q_t, \qquad (2)$$
where $d_t^+ = \max(d_t, 0)$ and $P(q_t) \geq 0$ denotes the price function, depending on output, thereby finding the optimal value $q^*_t$ based on the knowledge of $d_t$ and $w_t$. This optimum is known to Player 1, who attempts in turn to maximize the quantity $w_t q^*_t$ using his control variable $w_t$ in the second stage. We assume $w_t \geq 0$ without loss of generality, since $w_t < 0$ implies $q_t > 0$ and therefore $w_t q_t < 0$, which cannot be an equilibrium.
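As a minimal sketch of the two profit functions and of Player 2's best response, the following Python snippet evaluates $w_t q_t$ and $P(q_t)\min(d_t^+, q_t) - w_t q_t$ and locates the retailer's optimum by grid search. The affine price $P(q) = a - bq$, the parameter values, the grid bounds and all function names are assumptions made for illustration only.

```python
def retail_price(q, a=10.0, b=1.0):
    """Affine price P(q) = a - b*q (hypothetical parameter values)."""
    return a - b * q


def wholesaler_profit(w, q):
    """Wholesaler profit rate w_t * q_t."""
    return w * q


def retailer_profit(d, w, q, price=retail_price):
    """Retailer profit rate P(q_t) * min(d_t^+, q_t) - w_t * q_t."""
    return price(q) * min(max(d, 0.0), q) - w * q


def best_response(d, w, price=retail_price, q_max=20.0, n=20001):
    """Grid-search approximation of the retailer's optimal order q*(d, w)."""
    grid = [q_max * i / (n - 1) for i in range(n)]
    return max(grid, key=lambda q: retailer_profit(d, w, q, price))


# Demand 5, wholesale price 4: the optimum is interior, below demand.
q_interior = best_response(5.0, 4.0)
# Demand 2, wholesale price 4: the optimum sits at the demand level.
q_capped = best_response(2.0, 4.0)
```

With these illustrative values the search returns an order near 3 in the first case and near 2 in the second, which is consistent with the retailer never ordering more than the demand.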
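We assume that there exists a map $\psi^*$ which assigns to each wholesale price process $w$ the optimal response $q^* = \psi^*(w)$ of Player 2. This is the case if the profit function (2) is strictly concave in $q_t$ for given $d_t$, $w_t$.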
Since Player 1 knows that Player 2 will act in this rational way, he will choose $w_t$ such that the profit function is maximal for $w_t = w^*_t$. Given two performance functionals that measure the (expected) profit over the time period $\delta \leq t \leq T$, an equilibrium of the game is defined as a pair $(w^*, \psi^*)$, where $w^* = (w^*_t)_{t \in [\delta, T]}$, such that $\psi^*(w)$ maximizes the performance functional of Player 2 for every admissible $w$, and $w^*$ maximizes the performance functional of Player 1 given the response $\psi^*$.
3. Delayed stochastic demand. We consider a demand rate $D_t$ at time $t \in [0, T]$, which is described by a controlled stochastic differential equation (SDE) of the form
$$dD_t = \mu_t(D_t, q_t, w_t)\,dt + \sigma_t(D_t, q_t, w_t)\,dB_t, \qquad (7)$$
where $\mu_\cdot(d, q, w)$, $\sigma_\cdot(d, q, w)$ are given predictable processes for each $d, q, w \in \mathbb{R}$ and $(B_t)_{t \in [0,T]}$ is a standard Brownian motion that generates the $\mathbb{P}$-augmented filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in [0,T]}$ on a filtered probability space $(\Omega, (\mathcal{F}_t)_{t \in [0,T]}, \mathbb{P})$. We assume that (7) has a unique solution.
The information flow available to the players is represented by the filtration
$$\mathcal{E}_t := \mathcal{F}_{t-\delta}, \qquad t \in [\delta, T],$$
where $\delta > 0$ denotes the information delay or lag. In this lagged scenario, Player 2 maximizes the functional
$$E\left[\int_\delta^T \big(P(q_t)\min(D_t^+, q_t) - w_t q_t\big)\,dt\right],$$
where $P(q_t)$ is the market price at time $t \in [\delta, T]$ when output equals $q_t$, in order to find the optimal value $\psi^*_t(w)$ based on the delayed information on the demand and on the wholesale price $w_t$. This optimum is known to Player 1, who attempts in turn to maximize the functional
$$E\left[\int_\delta^T w_t\,\psi^*_t(w)\,dt\right]$$
using Player 1's control variable $w_t$, namely the per unit price he charges to Player 2.
Maximum principle. We base our analysis on the stochastic control framework developed in [1], [3]. The control processes $q = (q_t)_{t \in [\delta, T]} = \psi(w)$ and $w = (w_t)_{t \in [\delta, T]}$ belong to the family $\mathcal{A} = \mathcal{A}^{(1)} \times \mathcal{A}^{(2)}$, where the set $\mathcal{A}^{(i)}$ of admissible control processes for Player $i$ is made of real-valued $(\mathcal{E}_t)_{t \in [0,T]}$-predictable processes, $i = 1, 2$.
We endow $\mathcal{A}^{(i)}$ with the supremum norm, $i = 1, 2$. We note that $\mathcal{A}^{(i)}$ contains processes that may take negative values; however, in the sequel we will only focus on nonnegative equilibrium outputs, which are the economically relevant ones.
In the sequel we work under the following conditions on the set A of control processes.
We now have the following necessary maximum principle for equilibria on $[\delta, T]$, cf. Theorem 1 in [2].
Wholesaler / retailer equilibrium. In the sequel the profit functions $f^{(1)} : \mathbb{R}^3 \to \mathbb{R}$ and $f^{(2)} : \mathbb{R}^3 \to \mathbb{R}$ will be given by (1) and (2), i.e., $f^{(1)}(d_t, w_t, q_t) = w_t q_t$ and $f^{(2)}(d_t, w_t, q_t) = P(q_t)\min(d_t^+, q_t) - w_t q_t$. In this case, during the time period $\delta \leq t \leq T$, Player 1 will get the expected profit
$$E\left[\int_\delta^T w_t q_t\,dt\right]$$
and Player 2 will get the expected profit
$$E\left[\int_\delta^T \big(P(q_t)\min(D_t^+, q_t) - w_t q_t\big)\,dt\right].$$
To find an equilibrium, we use the maximum principle of Theorem 3.1. The Hamiltonian of Player 1 is obtained from (9) with $f^{(1)}(d, w, q) = wq$, and the Hamiltonian of Player 2 is
$$H^{(2)}_t(d, w, q, a^{(2)}, b^{(2)}) = P(q)\min(d^+, q) - wq + a^{(2)}\mu_t(d, q, w) + b^{(2)}\sigma_t(d, q, w).$$
Optimal retailer quantity. By (16), the first order condition (14) yields an equation (17) for the optimal values $q^*_t$. In fact, when demand is controlled, as in this case, we actually need a more advanced maximum principle, requiring more technical assumptions. The interested reader is referred to [2]. We will relax this assumption in the sequel, and we focus on the case where the equation (7) for the demand $D_t$ reduces to an exogenous SDE of the form
$$dD_t = \mu_t(D_t)\,dt + \sigma_t(D_t)\,dB_t, \qquad (18)$$
where the coefficients $\mu_\cdot(d)$ and $\sigma_\cdot(d)$ do not depend on the controls $q$ and $w$. In this case, since $\mu_t(d)$ and $\sigma_t(d)$ do not depend on $q$, (17) simplifies to
$$E\big[P'(q_t)\min(D_t^+, q_t) + P(q_t)\mathbf{1}_{\{D_t > q_t\}} \,\big|\, \mathcal{F}_{t-\delta}\big] = w_t, \qquad (19)$$
which has at most one solution by monotonicity of the conditional expectation. We will set $q^*_t := 0$ when (19) does not admit a solution. In the sequel we denote by $q_t(w_t) = \psi^*_t(w)$ the solution of (19). Assuming that for every $t \in [0, T]$ the function $w \mapsto w q_t(w)$ has a unique maximum at $w = w^*_t$, the equilibrium is $(w^*, \psi^*)$ and we have
$$w^*_t = \operatorname{argmax}_{w \geq 0}\, w\, q_t(w). \qquad (20)$$
The maximization problem (20) can be solved for the optimal value $w^*_t$ from the first order condition
$$w^*_t\, q_t'(w^*_t) + q_t(w^*_t) = 0 \qquad (21)$$
that follows from (13).
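The wholesaler's step, namely maximizing $w \mapsto w\,q_t(w)$ and checking the first order condition $w\,q_t'(w) + q_t(w) = 0$, can be sketched numerically as follows. The response function `q_of_w` used here, $q(w) = \max((a - w)/(2b), 0)$, is only a hypothetical smooth placeholder and not the solution of (19); the parameter values and function names are likewise illustrative assumptions.

```python
def argmax_golden(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximiser of a unimodal f on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) > f(d):
            hi = d   # the maximiser lies in [lo, d]
        else:
            lo = c   # the maximiser lies in [c, hi]
    return (lo + hi) / 2


def foc_residual(q, w, h=1e-6):
    """w * q'(w) + q(w), with a central finite difference for q'(w)."""
    dq = (q(w + h) - q(w - h)) / (2 * h)
    return w * dq + q(w)


a, b = 10.0, 1.0                                   # hypothetical parameters
q_of_w = lambda w: max((a - w) / (2 * b), 0.0)     # placeholder response
w_star = argmax_golden(lambda w: w * q_of_w(w), 0.0, a)
residual = foc_residual(q_of_w, w_star)
```

For this placeholder the maximizer is $a/2$ and the residual of the first order condition is numerically zero; any other response function with a unimodal revenue $w\,q(w)$ could be plugged in the same way.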
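Ornstein-Uhlenbeck setting. Henceforth we assume that the drift $\mu_t(d)$ is affine and mean-reverting in $d$ and that the volatility $\sigma_t(d) = \sigma$ is constant, with parameters $\alpha, \beta, \sigma > 0$, so that the dynamics (7) of $(D_t)_{t \in \mathbb{R}_+}$ become those of an Ornstein-Uhlenbeck process, whose explicit solution is denoted by (22). We note that given $\mathcal{F}_{t-\delta}$, the (unknown) demand $D_t$ is Gaussian, and we let $\Phi : \mathbb{R} \to [0, 1]$ and $\varphi : \mathbb{R} \to \mathbb{R}_+$ respectively denote the standard Gaussian cumulative distribution and probability density functions.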
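To illustrate the Gaussian conditional law of $D_t$ given the delayed information, the sketch below computes the conditional mean and standard deviation of an Ornstein-Uhlenbeck process over a lag $\delta$. The parametrization $dD_t = (\alpha - \beta D_t)\,dt + \sigma\,dB_t$ is an assumption made here for concreteness and may differ from the one intended in the text; function names and numerical values are hypothetical.

```python
import math


def ou_conditional_moments(d_prev, delta, alpha, beta, sigma):
    """Mean and standard deviation of D_t given D_{t-delta} = d_prev,
    under the ASSUMED dynamics dD_t = (alpha - beta * D_t) dt + sigma dB_t."""
    decay = math.exp(-beta * delta)
    mean = d_prev * decay + (alpha / beta) * (1.0 - decay)
    var = sigma ** 2 * (1.0 - decay ** 2) / (2.0 * beta)
    return mean, math.sqrt(var)


def norm_cdf(z):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conditional_demand_cdf(x, d_prev, delta, alpha, beta, sigma):
    """P(D_t <= x | D_{t-delta} = d_prev), for delta > 0, same assumption."""
    mean, std = ou_conditional_moments(d_prev, delta, alpha, beta, sigma)
    return norm_cdf((x - mean) / std)


# With no lag the demand is known exactly; with a long lag only the
# stationary law (mean alpha/beta, variance sigma^2/(2 beta)) remains.
no_lag = ou_conditional_moments(5.0, 0.0, 1.0, 0.5, 0.3)
long_lag = ou_conditional_moments(5.0, 100.0, 1.0, 0.5, 0.3)
```

The conditional standard deviation grows from zero with the lag, which is one way to see why a short delay leaves decisions close to those of the no lag case.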
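Proposition 3.1. The equilibrium action $q_t := \psi(w_t)$ for Player 2 at any $t \in [\delta, T]$ is the solution of the equation (23).
Proof. By the Ornstein-Uhlenbeck solution (22), the demand $D_t$ decomposes into an $\mathcal{F}_{t-\delta}$-measurable term and a Gaussian random variable independent of $\mathcal{F}_{t-\delta}$, so that the equation (19) can be written as an equation (24) involving two conditional expectations. Evaluating the first expectation in (24), and then the second expectation using the relation (22), we find that (19) boils down to (23).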
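One way to make the two Gaussian expectations explicit is sketched below. The symbols $m_t$ and $s_t$ for the conditional mean and standard deviation of $D_t$ given $\mathcal{F}_{t-\delta}$ are notation assumed here, and the first order condition is assumed to take the form $E[P'(q)\min(D_t^+, q) + P(q)\mathbf{1}_{\{D_t > q\}} \mid \mathcal{F}_{t-\delta}] = w_t$ obtained by differentiating the expected retailer profit; the resulting expression is not claimed to coincide term by term with the form used in the proposition.

```latex
% Assumed notation: given F_{t-\delta}, D_t ~ N(m_t, s_t^2), and q >= 0.
\begin{align*}
\min(D_t^+, q) &= D_t^+ - (D_t - q)^+, \\
E\big[(D_t - k)^+ \mid \mathcal{F}_{t-\delta}\big]
  &= (m_t - k)\,\Phi\Big(\frac{m_t - k}{s_t}\Big)
   + s_t\,\varphi\Big(\frac{m_t - k}{s_t}\Big), \\
E\big[\min(D_t^+, q) \mid \mathcal{F}_{t-\delta}\big]
  &= m_t\,\Phi\Big(\frac{m_t}{s_t}\Big) + s_t\,\varphi\Big(\frac{m_t}{s_t}\Big)
   - (m_t - q)\,\Phi\Big(\frac{m_t - q}{s_t}\Big)
   - s_t\,\varphi\Big(\frac{m_t - q}{s_t}\Big), \\
P\big(D_t > q \mid \mathcal{F}_{t-\delta}\big)
  &= \Phi\Big(\frac{m_t - q}{s_t}\Big).
\end{align*}
```

Substituting the last two lines into the assumed first order condition gives a scalar equation in $q$ involving only $\Phi$, $\varphi$, $P$ and $P'$.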
By abuse of notation we will write the solution to (23) as
$$q_t = \psi_{D_{t-\delta}}(w_t),$$
where $\psi_{D_{t-\delta}} : \mathbb{R} \to \mathbb{R}$ is the function used in Figures 2 and 3. Next, we can solve for the optimal $w^*_t$ using the first order condition (21), which will yield
$$w^*_t = -\frac{\psi_{D_{t-\delta}}(w^*_t)}{\psi'_{D_{t-\delta}}(w^*_t)}.$$
Affine prices. In case $P(q_t) = a - bq_t$ with $a, b > 0$, i.e. retail prices are affine functions of output, which is a common assumption, (23) is specialized by taking $P(q_t) = a - bq_t$ and $P'(q_t) = -b$. In particular, when $b = 0$ and retail prices are fixed at $a$, independent of $q_t$, we have
$$F_{D_t \mid \mathcal{F}_{t-\delta}}(q_t) = \frac{a - w_t}{a},$$
where $F_{D_t \mid \mathcal{F}_{t-\delta}}$ is the conditional distribution function of the demand $D_t$ given $\mathcal{F}_{t-\delta}$, which can be seen as an analog of the equation
$$F(q_t) = \frac{a - w_t}{a}$$
in the framework of fixed retail prices, where $F(\cdot)$ is the cumulative distribution function of a random demand, see e.g. the equation (3) in [6].
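A numerical sketch of the retailer's lagged output under affine prices is given below. It assumes that, conditionally on the delayed information, $D_t$ is Gaussian with mean `m` and standard deviation `s` (both hypothetical inputs), and it solves by bisection the condition that the derivative in $q$ of $(a - bq)\,E[\min(D_t^+, q)] - wq$ vanishes. Function names, parameter values and the bracket `q_hi` are illustrative assumptions.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def expected_sales(q, m, s):
    """E[min(D^+, q)] for D ~ N(m, s^2) and q >= 0, using D^+ - (D - q)^+."""
    def call(k):  # E[(D - k)^+]
        z = (m - k) / s
        return (m - k) * norm_cdf(z) + s * norm_pdf(z)
    return call(0.0) - call(q)


def marginal_profit(q, w, m, s, a, b):
    """Derivative in q of (a - b*q) * E[min(D^+, q)] - w*q."""
    return -b * expected_sales(q, m, s) + (a - b * q) * norm_cdf((m - q) / s) - w


def lagged_output(w, m, s, a, b, q_hi, tol=1e-10):
    """Bisection on [0, q_hi]; q_hi must satisfy marginal_profit(q_hi) <= 0
    (for b > 0 the choice q_hi = a / b works). Returns 0 if there is no root."""
    if marginal_profit(0.0, w, m, s, a, b) <= 0.0:
        return 0.0
    lo, hi = 0.0, q_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_profit(mid, w, m, s, a, b) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q_small_noise = lagged_output(4.0, 5.0, 1e-6, 10.0, 1.0, q_hi=10.0)
q_noisy = lagged_output(4.0, 5.0, 1.0, 10.0, 1.0, q_hi=10.0)
q_fixed_price = lagged_output(5.0, 5.0, 1.0, 10.0, 0.0, q_hi=15.0)
```

In the fixed price case `b = 0` the root satisfies `norm_cdf((q - m) / s) == (a - w) / a` up to the tolerance, which is the critical fractile relation stated above; with a negligible conditional variance the output approaches the deterministic value $(a - w)/(2b)$ when this is below the demand.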
4. Deterministic demand with no lag. In this section the respective performance functionals of Players 1 and 2 are given by the time integrals of the profit functions (1) and (2). From (2) it is clear that $q^*_t(d_t, w_t) \leq d_t$, and so $q^*_t$ exists since the functional (2) is continuous on a compact set. A sufficient but not necessary condition for $q^*_t(d_t, w_t)$ is given by the first order condition (33), obtained by differentiating (2) with respect to $q_t$. Therefore, when $q_t \leq d_t$ we have that $q_t$ solves
$$P'(q_t)\,q_t + P(q_t) - w_t = 0.$$
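If on the other hand $q_t > d_t$ then we need $q_t$ to satisfy $P'(q_t)\,d_t - w_t = 0$. In the affine case $P(q_t) = a - bq_t$, the equilibrium output is
$$\frac{a - \max(a - 2bd_t,\, a/2)}{2b} = \frac{a + \min(2bd_t - a,\, -a/2)}{2b} = \frac{\min(2bd_t,\, a/2)}{2b} = \min\left(d_t, \frac{a}{4b}\right).$$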
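The closed form above can be checked numerically. The sketch below assumes the affine price $P(q) = a - bq$, a demand level $d \geq 0$, and the retailer response $\min(d, (a - w)/(2b))$ obtained from the first order condition for $q_t \leq d_t$; the profit expressions are simply (1) and (2) evaluated at the equilibrium, and all names and parameter values are hypothetical.

```python
def equilibrium_no_lag(d, a, b):
    """Closed-form no-lag equilibrium for P(q) = a - b*q and demand d >= 0."""
    w_star = max(a - 2.0 * b * d, a / 2.0)
    q_star = (a - w_star) / (2.0 * b)          # equals min(d, a / (4*b))
    wholesaler = w_star * q_star
    retailer = (a - b * q_star) * q_star - w_star * q_star
    return w_star, q_star, wholesaler, retailer


def brute_force_wholesale_price(d, a, b, n=10001):
    """Grid search over w in [0, a] against the response min(d, (a-w)/(2b))."""
    def response(w):
        return min(d, max((a - w) / (2.0 * b), 0.0))
    grid = [a * i / (n - 1) for i in range(n)]
    return max(grid, key=lambda w: w * response(w))


low_demand = equilibrium_no_lag(1.0, 10.0, 1.0)    # demand below a/(4b) = 2.5
high_demand = equilibrium_no_lag(4.0, 10.0, 1.0)   # demand above a/(4b)
higher_demand = equilibrium_no_lag(6.0, 10.0, 1.0)
```

With these values the output equals the demand when $d < a/(4b)$ and stays at $a/(4b)$ afterwards, and both profits are unchanged between `high_demand` and `higher_demand`, in line with the threshold behaviour described in the introduction.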
The plots in Figures 1-4 have been made for $P(q_t) = a - bq_t$ with $a, b > 0$. From Figure 1 we note that, due to dynamic retail prices, the retail output remains capped at the level $a/(4b)$ as a function of demand, whereas it remains strictly increasing in the case of fixed prices, cf. also Figure 3 in [2].
Figure 4 illustrates the equilibrium wholesaler and retailer profits respectively given by (38),