Discount-sensitive equilibria in zero-sum stochastic differential games

We consider infinite-horizon zero-sum stochastic differential games with average payoff criteria, discount-sensitive criteria, and infinite-horizon undiscounted reward criteria that are sensitive to the growth rate of finite-horizon payoffs. These criteria include average reward optimality, strong 0-discount optimality, strong -1-discount optimality, 0-discount optimality, bias optimality, F-strong average optimality, and overtaking optimality. The main objective is to give conditions under which these criteria are interrelated.

For i = 1, 2, u_i(·) is a U_i-valued stochastic process representing the control actions of player i at each time t ≥ 0.
Notation. For vectors x and matrices A we use the usual Euclidean norms |x|^2 := \sum_i x_i^2 and |A|^2 := Tr(AA'), where A' and Tr(·) denote the transpose and the trace of a square matrix, respectively.

Assumption 1. (a) The action sets U_1 and U_2 are compact.
Let C^2(R^n) be the space of real-valued functions ν(x) on R^n that are twice continuously differentiable. For (u_1, u_2) ∈ U_1 × U_2 and ν in C^2(R^n), let the operator L^{u_1,u_2}ν be given as sketched below, where b_i is the i-th component of b, and a_{ij} is the (i, j)-component of the matrix a(·) defined in Assumption 1(d).
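The defining display does not survive in this excerpt; a standard form, consistent with the references to b and a above, is the usual controlled-diffusion generator:

\[
L^{u_1,u_2}\nu(x) := \sum_{i=1}^{n} b_i(x,u_1,u_2)\,\frac{\partial \nu(x)}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{n} a_{ij}(x)\,\frac{\partial^2 \nu(x)}{\partial x_i \partial x_j}.
\]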
Strategies of the players. Let P(U_1) be the space of probability measures on U_1 endowed with the topology of weak convergence; the space P(U_2) is defined similarly. For our purposes, in which we study noncooperative (Nash) equilibria, we can restrict ourselves to randomized stationary strategies, defined as follows.
Definition 2.1. A stationary randomized strategy for player 1 (resp. player 2) is a probability measure on U_1 (resp. U_2).

Remark 2.
In general, except for a quite restricted class of zero-sum stochastic differential games (SDGs), such as scalar linear SDGs (see [4,20] and the references therein), one cannot ensure the existence of a Nash equilibrium within the set of deterministic (ordinary) strategies; see, for instance, [29]. For this reason, we enlarge the sets of ordinary strategies to include randomized strategies, so that an equilibrium can be found in this larger set.

Technically, our hypotheses ensure the existence of Nash equilibria for the α-discounted criteria (Section 3), as well as for the average criterion (Section 4), in the class of stationary strategies for both players (see, for instance, [3,7,22]), which is crucial in our study. Further, it is worth mentioning that the recurrence and ergodicity properties of the state system (1) can be easily verified under stationary strategies, whereas for more general classes of strategies they can be hard to handle.
The interpretation of a stationary randomized strategy is as follows. If player 1 observes the system (1) at time t ≥ 0, then player 1 chooses his/her action in U_1 according to the probability measure φ ∈ P(U_1).

2.2. Recurrence and ergodicity. The next assumption (a Lyapunov-like condition) guarantees the positive recurrence of the diffusion (1).
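The statement of Assumption 2 does not survive in this excerpt; a standard Lyapunov-like form, consistent with the constants c and d cited in Proposition 1 below, is: there exist a function w ∈ C^2(R^n) with w ≥ 1 and constants d ≥ c > 0 such that

\[
L^{u_1,u_2} w(x) \le -c\, w(x) + d \quad \text{for all } (x,u_1,u_2) \in \mathbb{R}^n \times U_1 \times U_2.
\]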
Definition 2.2. Let w be the function in Assumption 2. Let B_w(R^n) be the normed linear space of real-valued measurable functions ν on R^n with finite w-norm, defined in the first display sketched below.

Assumption 3. Suppose that, for every (φ, ψ) ∈ P(U_1) × P(U_2), the process x^{φ,ψ}(·) in (1) is uniformly w-exponentially ergodic; that is, there exist constants C > 0 and δ > 0 such that the second display sketched below holds for all x ∈ R^n, t ≥ 0, and ν ∈ B_w(R^n). Sufficient conditions for the exponential ergodicity of the process x^{φ,ψ}(·) can be found in [16, Theorem 2.7].
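Both displays are reconstructed here in their standard forms (a sketch, following the w-geometric ergodicity literature, cf. [16]); μ_{φ,ψ} below denotes the invariant probability measure of x^{φ,ψ}(·), an ingredient assumed by this sketch:

\[
\|\nu\|_w := \sup_{x \in \mathbb{R}^n} \frac{|\nu(x)|}{w(x)},
\]

\[
\Big| E_x^{\varphi,\psi}\big[\nu(x(t))\big] - \mu_{\varphi,\psi}(\nu) \Big| \;\le\; C e^{-\delta t}\, \|\nu\|_w\, w(x).
\]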
2.3. Reward rate function. Let r : R^n × U_1 × U_2 → R be a measurable function, which we call the payoff (or reward/cost) rate function; that is, r is the reward rate function for player 1, and it is interpreted as the cost rate function for player 2. The payoff rate satisfies the following conditions (Assumption 4): (a) the function r(x, u_1, u_2) is continuous on R^n × U_1 × U_2 and locally Lipschitz in x, uniformly with respect to (u_1, u_2) ∈ U_1 × U_2; that is, for each R > 0, there exists a constant K(R) > 0 such that the bound sketched below holds. (c) r(x, u_1, u_2) is upper semicontinuous (u.s.c.) and concave in u_1 ∈ U_1 for every (x, u_2) ∈ R^n × U_2, and lower semicontinuous (l.s.c.) and convex in u_2 ∈ U_2 for every (x, u_1) ∈ R^n × U_1.
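The displays for these conditions are reconstructed here in their standard forms. For (a),

\[
|r(x,u_1,u_2) - r(y,u_1,u_2)| \;\le\; K(R)\,|x - y| \quad \text{for all } |x|,|y| \le R,\; (u_1,u_2) \in U_1 \times U_2;
\]

and for the growth condition (b), whose statement is not visible here but which, by the discussion preceding Section 3, places r in B_w(R^n × U_1 × U_2) (a plausible reconstruction),

\[
\sup_{(u_1,u_2) \in U_1 \times U_2} |r(x,u_1,u_2)| \;\le\; M\, w(x) \quad \text{for all } x \in \mathbb{R}^n.
\]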
When the players use the pair of strategies (φ, ψ) ∈ P(U_1) × P(U_2), we write the payoff rate r as in (3); see the sketch following Lemma 2.3. The next lemma provides important facts. Lemma 2.3. Under Assumptions 1 and 4, for each fixed h ∈ C^2(R^n) ∩ B_w(R^n) and x ∈ R^n, the functions r(x, φ, ψ) and L^{φ,ψ}h(x) are upper semicontinuous (u.s.c.) in φ ∈ P(U_1) and lower semicontinuous (l.s.c.) in ψ ∈ P(U_2).
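The convention referenced before Lemma 2.3, in its standard form, is

\[
r(x,\varphi,\psi) := \int_{U_2}\int_{U_1} r(x,u_1,u_2)\, \varphi(du_1)\, \psi(du_2),
\]

and L^{φ,ψ}h(x) is obtained analogously by integrating L^{u_1,u_2}h(x) with respect to φ and ψ.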
Proof. Under Assumptions 1(b) and 4(c), the functions r(x, u_1, u_2) and b(x, u_1, u_2) are upper semicontinuous in u_1 ∈ U_1, lower semicontinuous in u_2 ∈ U_2, and bounded on U_1 × U_2 for each x ∈ R^n. Hence, by the definition of weak convergence of probability measures, the result follows.
Now, let w be the function in Assumption 2. In addition to the space B_w(R^n) in Definition 2.2, we consider the space B_w(R^n × U_1 × U_2), which consists of the real-valued measurable functions v on R^n × U_1 × U_2 satisfying the bound (10) sketched below, where M_v is a positive constant depending on v. By Assumption 4(b), the payoff rate r belongs to B_w(R^n × U_1 × U_2).

3. Discounted optimality. In this section, we are interested in the existence of Nash equilibria for the α-discounted criterion, which is defined as follows.
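The displays (10) and (11) are reconstructed here in their standard forms: the w-growth bound

\[
\sup_{(u_1,u_2)\in U_1\times U_2} |v(x,u_1,u_2)| \;\le\; M_v\, w(x) \quad \text{for all } x \in \mathbb{R}^n, \tag{10}
\]

and, for α > 0 and (φ, ψ) ∈ P(U_1) × P(U_2), the α-discounted v-payoff

\[
V_\alpha(x,\varphi,\psi,v) := E_x^{\varphi,\psi}\left[ \int_0^\infty e^{-\alpha t}\, v(x(t),\varphi,\psi)\, dt \right]. \tag{11}
\]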
The following result gives a bound on the α-discounted v-payoff (11). We omit its proof, because it is a direct consequence of (7) and (8).
Proposition 1. Suppose that Assumption 2, (7), and (10) hold. Then the bound sketched below holds, where c and d are as in Assumption 2, and M_v is the constant in (10).
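The display of the bound is a plausible reconstruction, obtained by combining (10) with a Lyapunov-type moment estimate of the sort usually labeled (7) in this setting, namely E_x^{φ,ψ}[w(x(t))] ≤ e^{-ct} w(x) + (d/c)(1 − e^{-ct}):

\[
|V_\alpha(x,\varphi,\psi,v)| \;\le\; M_v \left( \frac{w(x)}{\alpha + c} + \frac{d}{c\,\alpha} \right)
\quad \text{for all } x \in \mathbb{R}^n,\ (\varphi,\psi) \in P(U_1)\times P(U_2).
\]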
Definition 3.3. The α-discounted game value is defined as the common value in (13); see the sketch below. Definition 3.4. We say that a function v_α in C^2(R^n) ∩ B_w(R^n) and a pair of strategies (φ*, ψ*) ∈ P(U_1) × P(U_2) verify the α-discount Bellman equations if the equations sketched below hold for all x ∈ R^n.
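The displays (13)-(17) are reconstructed here in their standard forms (cf. [3]): the Isaacs-type equality and the game value

\[
\sup_{\varphi}\inf_{\psi} V_\alpha(x,\varphi,\psi,r) \;=\; \inf_{\psi}\sup_{\varphi} V_\alpha(x,\varphi,\psi,r), \tag{13}
\]
\[
V_\alpha(x) := \sup_{\varphi}\inf_{\psi} V_\alpha(x,\varphi,\psi,r), \tag{14}
\]

and the α-discount Bellman equations

\[
\alpha v_\alpha(x) = \sup_{\varphi\in P(U_1)}\,\inf_{\psi\in P(U_2)} \big\{ r(x,\varphi,\psi) + L^{\varphi,\psi} v_\alpha(x) \big\}, \tag{15}
\]
\[
\alpha v_\alpha(x) = \inf_{\psi\in P(U_2)} \big\{ r(x,\varphi^*,\psi) + L^{\varphi^*,\psi} v_\alpha(x) \big\}, \tag{16}
\]
\[
\alpha v_\alpha(x) = \sup_{\varphi\in P(U_1)} \big\{ r(x,\varphi,\psi^*) + L^{\varphi,\psi^*} v_\alpha(x) \big\}. \tag{17}
\]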
Our next theorem states the existence of a saddle-point equilibrium for the α-discounted stochastic differential game. Moreover, it also ensures that the value of the game given in (14) verifies the so-called α-discount Bellman equations (15)-(17). Its proof is given in [3, Theorem 2.1].

4. Average optimality.
In this section we show the existence of Nash equilibria for the average payoff criterion, which is defined as follows. Let J_T(x, φ, ψ, v), given by (18) (sketched below), be the v-total expected payoff of (φ, ψ) over the time interval [0, T] when the initial state is x ∈ R^n. The v-payoff of (φ, ψ) given the initial state x is then defined by (19). For v ≡ r, (19) is called the long-run average payoff (also known as the ergodic payoff).
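The displays (18) and (19) are reconstructed here in their standard forms:

\[
J_T(x,\varphi,\psi,v) := E_x^{\varphi,\psi}\left[ \int_0^T v(x(t),\varphi,\psi)\, dt \right], \tag{18}
\]
\[
J(x,\varphi,\psi,v) := \liminf_{T \to \infty} \frac{1}{T}\, J_T(x,\varphi,\psi,v). \tag{19}
\]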

Definition 4.2.
We say that a pair of strategies (φ*, ψ*) ∈ P(U_1) × P(U_2) is average optimal (abbreviated AO), also known as a saddle point for the average payoff criterion, if the saddle-point inequalities sketched below hold. As in the discounted case, our assumptions also imply that the Isaacs condition holds.
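The defining inequalities are reconstructed here in their standard saddle-point form (player 1 maximizes, player 2 minimizes):

\[
J(x,\varphi,\psi^*) \;\le\; J(x,\varphi^*,\psi^*) \;\le\; J(x,\varphi^*,\psi)
\quad \text{for all } (\varphi,\psi) \in P(U_1)\times P(U_2),\; x \in \mathbb{R}^n,
\]

where J(x, φ, ψ) := J(x, φ, ψ, r).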
Definition 4.3. The game value for the average payoff is given by the common value of the lower and upper values, sketched below. We say that a constant J, a function h ∈ C^2(R^n) ∩ B_w(R^n), and a pair of strategies (φ*, ψ*) ∈ P(U_1) × P(U_2) verify the average payoff optimality equations (also known as the average payoff Bellman equations) if (23)-(25), sketched below, hold for every x ∈ R^n. In this case, the pair of strategies (φ*, ψ*) ∈ P(U_1) × P(U_2) that satisfies (23)-(25) is called a pair of canonical strategies (CS).
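These displays are reconstructed here in their standard forms (cf. [3,7]): the game value

\[
\sup_{\varphi}\inf_{\psi} J(x,\varphi,\psi,r) \;=\; \inf_{\psi}\sup_{\varphi} J(x,\varphi,\psi,r),
\]

and the average payoff optimality equations

\[
J = \sup_{\varphi\in P(U_1)}\,\inf_{\psi\in P(U_2)} \big\{ r(x,\varphi,\psi) + L^{\varphi,\psi} h(x) \big\}, \tag{23}
\]
\[
J = \inf_{\psi\in P(U_2)} \big\{ r(x,\varphi^*,\psi) + L^{\varphi^*,\psi} h(x) \big\}, \tag{24}
\]
\[
J = \sup_{\varphi\in P(U_1)} \big\{ r(x,\varphi,\psi^*) + L^{\varphi,\psi^*} h(x) \big\}. \tag{25}
\]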
The next result, which can be found in [3, Theorem 2.2] and [7, Theorem 4.1], establishes the equivalence between average optimal strategies and canonical strategies. More precisely, [3] deals with the existence of a constant J, a function h, and a Nash equilibrium for the long-run average payoff such that (23)-(25) hold, and [7] proves the equivalence between an average optimal strategy and a canonical strategy.
Theorem 4.5. If the Assumptions of Theorem 3.5 hold, then:

5. F-strong average optimality. The main objective of this section is to prove that F-strong average optimality and average optimality are equivalent concepts. The next definition generalizes the concept of F-strong average optimality given in [13] for discrete-time Markov control processes on Borel spaces. To this end, let J_T(x, φ, ψ, r) be as in (18) with v ≡ r.
hence, by (29) and (30), (φ*, ψ*) is average optimal. Now, suppose that (φ*, ψ*) is average optimal. Then Definition 4.2 yields (31). But we also know (32). So, (26) follows from (31) and (32). By a similar argument we obtain equation (27), implying that (φ*, ψ*) is F-strong average optimal.

DIAGRAM 2. As a consequence of Theorems 4.5 and 5.2, we have:

We conclude this section with a generalization of the m-discount optimality concepts given in [17]. It is evident that strong m-discount optimality implies m-discount optimality. This article is concerned with the cases m = −1, 0.
6. Strong -1-discount optimality. Our main objective in this section is to prove that strong -1-discount optimality and average reward optimality are equivalent concepts (Theorem 6.5). To this end: (a) we define the discrepancy function (37); (b) the α-discount payoff with v ≡ r is expressed in terms of a pair (J, h) satisfying the average payoff optimality equations (23)-(25) (Lemma 6.2); (c) some properties of V_α(x, φ, ψ, v) are given (Lemma 6.3); and (d) Lemma 6.4 provides some further important facts. We prove each of these lemmas in the sequel.

Definition 6.1. Let (J, h) be a pair satisfying the average payoff optimality equations (23)-(25). We define the discrepancy function ∆ by (37), sketched below.

Remark 7. Let (φ*, ψ*) be a pair of average optimal strategies. By (24), ∆(x, φ*, ψ) ≥ 0 for all ψ ∈ P(U_2). Similarly, using equation (25), we obtain ∆(x, φ, ψ*) ≤ 0 for all φ ∈ P(U_1). Moreover, by Theorem 4.5(iii), average optimal strategies satisfy the average payoff optimality equations, in particular equation (23); then ∆(x, φ*, ψ*) = 0.

The following lemma relates the expected α-discounted reward criteria for the different reward rates ∆, h, and r.
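The display (37) is reconstructed here in its standard form (cf. [7,17]):

\[
\Delta(x,\varphi,\psi) := r(x,\varphi,\psi) + L^{\varphi,\psi} h(x) - J,
\qquad (x,\varphi,\psi) \in \mathbb{R}^n \times P(U_1) \times P(U_2). \tag{37}
\]

With this convention, (24) gives ∆(x, φ*, ψ) ≥ 0, (25) gives ∆(x, φ, ψ*) ≤ 0, and (23)-(25) together give ∆(x, φ*, ψ*) = 0, as asserted in Remark 7.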
Proof. Taking v ≡ ∆ in equation (11), we obtain (42). On the other hand, an application of Dynkin's formula to e^{−αt} h(x(t)), t ≥ 0, followed by letting t → ∞, yields (43). Finally, the stated result follows from (42) and (43). Lemma 6.3 (below) gives important facts that we use to prove the equivalence between average optimality and strong -1-discount optimality.
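A sketch of the standard computation behind (42)-(43), assuming the forms of (11) and (37) given above, is as follows. Dynkin's formula gives

\[
E_x^{\varphi,\psi}\big[e^{-\alpha t} h(x(t))\big] - h(x)
= E_x^{\varphi,\psi}\left[ \int_0^t e^{-\alpha s}\big( L^{\varphi,\psi} h - \alpha h \big)(x(s))\, ds \right],
\]

and, letting t → ∞ (the left-hand side vanishes by the w-growth of h and the Lyapunov bound),

\[
V_\alpha(x,\varphi,\psi, L^{\varphi,\psi} h) = \alpha\, V_\alpha(x,\varphi,\psi,h) - h(x).
\]

Combining this with (37) then yields the identity usually stated as Lemma 6.2:

\[
V_\alpha(x,\varphi,\psi,r) = \frac{J}{\alpha} + h(x) - \alpha\, V_\alpha(x,\varphi,\psi,h) + V_\alpha(x,\varphi,\psi,\Delta).
\]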
Proof. (a) Pick an arbitrary pair of strategies (φ, ψ) ∈ P(U_1) × P(U_2) and note that the second inequality in the resulting estimate follows from (8). This implies the result.
7. Strong 0-discount optimality. The main objective in this section is to prove that, under our assumptions: (a) strong 0-discount optimality implies bias optimality, and (b) bias optimality implies 0-discount optimality. To this end, we introduce the concept of bias optimality.

Definition 7.2 (Bias optimality). We say that a pair of average optimal strategies (φ*, ψ*) is bias optimal if the inequalities sketched below hold for every x ∈ R^n and every pair of average optimal strategies (φ, ψ). The function h_{φ*,ψ*} is called the optimal bias function.
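One standard form of the defining inequalities, following the setup of [7], compares the bias functions within the class of average optimal strategies:

\[
h_{\varphi,\psi^*}(x) \;\le\; h_{\varphi^*,\psi^*}(x) \;\le\; h_{\varphi^*,\psi}(x),
\]

for every x ∈ R^n and every pair of average optimal strategies (φ, ψ).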
Proof. Suppose that (φ*, ψ*) is strong 0-discount optimal but not bias optimal. Now, we observe that the following holds for all φ ∈ P(U_1).
A criterion which is "sensitive" to the growth rate of the finite-horizon reward/cost is the criterion of overtaking optimality. We define it because of its relationship with the bias optimality criterion.

Definition 7.5. A pair of strategies (φ*, ψ*) ∈ P(U_1) × P(U_2) is said to be overtaking optimal if for each (φ, ψ) ∈ P(U_1) × P(U_2) and x ∈ R^n the inequality sketched below holds.

Remark 9. (a) Comparing the inequalities in Definitions 4.2 and 7.5 together with (19), we see that if (φ*, ψ*) ∈ P(U_1) × P(U_2) is overtaking optimal in P(U_1) × P(U_2), then it is average optimal. (b) Let (φ*, ψ*) and (φ, ψ) be pairs of average optimal strategies. Then a simple use of Definition 4.2 gives

Remark 10. The authors in [7] identify the set of average optimal strategies and then, within this set, look for strategies that, for instance, maximize/minimize the bias (Definition 7.2). Finally, they show that overtaking optimality implies bias optimality [7, Theorem 6.1], but the converse holds only in the class of average optimal strategies. In our framework, the relationship between overtaking optimality and bias optimality is similar to that obtained in [7]. In fact, applying Dynkin's formula to the bias function h_{φ,ψ} and using the Poisson equation (62), we obtain the equality sketched below.
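Two displays are reconstructed here. First, one standard formulation of overtaking optimality for zero-sum games, following [7] (a sketch; the precise form in the source may differ):

\[
\liminf_{T \to \infty}\, \big[ J_T(x,\varphi^*,\psi,r) - J_T(x,\varphi,\psi^*,r) \big] \;\ge\; 0.
\]

Second, assuming (62) is the Poisson equation r(x, φ, ψ) + L^{φ,ψ} h_{φ,ψ}(x) = J(φ, ψ), Dynkin's formula yields

\[
J_T(x,\varphi,\psi,r) = J(\varphi,\psi)\, T + h_{\varphi,\psi}(x) - E_x^{\varphi,\psi}\big[ h_{\varphi,\psi}(x(T)) \big].
\]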

DIAGRAM 5. Overtaking optimal (in AO) ⟺ Bias optimal; Strong 0-discount optimal =⇒ Bias optimal =⇒ 0-discount optimal.

8. Stochastic differential games with additive structure. To show that bias optimality implies strong 0-discount optimality, in this section we work with a class of stochastic differential games with additive structure and bounded coefficients. This kind of game has recently been studied in [8,18,23]. The following additional assumption is required.
Assumption 5. (a) There exist measurable functions b_1 : R^n × U_1 → R^n and b_2 : R^n × U_2 → R^n, with the same properties as b in Assumption 1 (continuity, linear growth, and Lipschitz), such that the drift coefficient b is additive; see the sketch below. (b) There exist bounded measurable functions v_1 : R^n × U_1 → R and v_2 : R^n × U_2 → R, with the same properties as r in Assumption 4, such that the function v in (10) is additive as well. (c) There exist density functions p_φ(t, x, y) and p_ψ(t, x, y), with the same properties as p_{φ,ψ}(t, x, y), such that p_{φ,ψ} is determined by p_φ and p_ψ as in the corresponding display. Throughout this section, by Assumption 5, the expected α-discounted v-payoff (11) and the v-payoff (19) are expressed as follows.
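The additive-structure displays for (a) and (b) are reconstructed here in their standard forms, consistent with r = r_1 + r_2 as used in Section 9:

\[
b(x,u_1,u_2) = b_1(x,u_1) + b_2(x,u_2), \qquad v(x,u_1,u_2) = v_1(x,u_1) + v_2(x,u_2);
\]

consequently, by linearity of the expectation, the payoffs split as

\[
V_\alpha(x,\varphi,\psi,v) = V_\alpha(x,\varphi,\psi,v_1) + V_\alpha(x,\varphi,\psi,v_2), \qquad
J(x,\varphi,\psi,v) = J(x,\varphi,\psi,v_1) + J(x,\varphi,\psi,v_2).
\]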
Remark 11. All the results in the previous sections are valid without Assumption 5.
8.1. Bias optimality implies strong 0-discount optimality. In this subsection, we prove that if the game has additive structure and bounded coefficients, then bias optimality implies strong 0-discount optimality.
Remark 12. If one of the players, say player 2, fixes a strategy ψ ∈ P(U_2), then we have a control problem for player 1. We say that φ* ∈ P(U_1) is an optimal response to ψ ∈ P(U_2) if the first equality sketched below holds. Similarly, we say that ψ* ∈ P(U_2) is an optimal response to φ ∈ P(U_1) if the second one holds. It is easy to verify that if φ* is an optimal response to ψ* and, conversely, ψ* is an optimal response to φ*, then (φ*, ψ*) is α-discount optimal.
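The natural forms of these definitions, with V_α(x, φ, ψ) := V_α(x, φ, ψ, r), are

\[
V_\alpha(x,\varphi^*,\psi) = \sup_{\varphi \in P(U_1)} V_\alpha(x,\varphi,\psi) \quad \text{for all } x \in \mathbb{R}^n,
\]
\[
V_\alpha(x,\varphi,\psi^*) = \inf_{\psi \in P(U_2)} V_\alpha(x,\varphi,\psi) \quad \text{for all } x \in \mathbb{R}^n.
\]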
The following result plays an important role in our analysis. Its proof is given in the Appendix.
Proof. Suppose that (φ*, ψ*) is bias optimal. Then Lemma 8.2(b) gives (86). Similar arguments show (87). Hence, by (86) and (87), (φ*, ψ*) is strong 0-discount optimal.

9. Appendix: Proof of Lemma 8.1. Throughout this section, we work with a zero-sum stochastic differential game with additive structure. The proof of Lemma 8.1 is based on a result in [18], which is an extension to zero-sum stochastic differential games of Lemma A.16 in Arapostathis and Borkar [1]. We next restate, in our present setting, the theorems in [18], and then we verify that the hypotheses in these theorems indeed hold. To this end, we introduce the following notation.
Let O be a bounded domain in R^n, i.e., an open and connected subset of R^n, and denote its closure by Ō. For every x ∈ R^n, (u_1, u_2) ∈ U_1 × U_2, α > 0, and h in W^{2,p}(O), let the operators L^{u_1,u_2}_α h be defined with a as in Assumption 1(d), and b and v ≡ r as in Assumption 5.
Definition 9.4 (Weak topology on P(U_1)). Let C_b(U_1) be the space of continuous bounded functions on U_1. A sequence {φ_m} in P(U_1) is said to converge weakly to φ ∈ P(U_1), written φ_m → φ, if the convergence sketched below holds for all h ∈ C_b(U_1). The convergence in P(U_2) is defined similarly.
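The standard form of this convergence is

\[
\int_{U_1} h\, d\varphi_m \;\longrightarrow\; \int_{U_1} h\, d\varphi \quad \text{as } m \to \infty, \qquad \text{for all } h \in C_b(U_1).
\]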

Then there exist a function h ∈ W^{2,p}(O) and a subsequence {m_k} ⊂ {1, 2, . . .} such that h_{m_k} → h in the norm of C^{1,β}(Ō), for β < 1 − n/p, as k → ∞. Moreover, the corresponding limit equation holds in O (cf. the analogous conclusion of Theorem 9.7). The following result deals with the convergence presented in (5).
Theorem 9.6. Assume that the conditions of Theorem 9.5 are satisfied, except that we now replace condition (a) of that theorem by the following:

The following result is analogous to Theorem 9.6.
Theorem 9.7. Assume the hypotheses of Theorem 9.5, except for condition (a), which we replace with (a''): L^{2,φ_m}_{α_m} h_m = ξ_m in O for m = 1, 2, . . . Then there exist a function h ∈ W^{2,p}(O) and a subsequence {m_k} ⊂ {1, 2, . . .} satisfying h_{m_k} → h in the norm of C^{1,β}(Ō), for β < 1 − n/p, as k → ∞. Moreover, L^{2,φ}_α h = ξ in O.

9.1. Proof of Lemma 8.1. The vanishing discount technique. We will prove the existence of a Nash equilibrium for the long-run average payoff (19) with v ≡ r (r = r_1 + r_2) using the so-called vanishing discount approach. The idea is to impose conditions on an associated α-discounted payoff model in such a way that, when α ↓ 0, we obtain the average payoff optimality equations (23)-(25). To this end, letting (φ_{α_m}, ψ_{α_m}) be a Nash equilibrium for the α_m-discounted payoff, we define the quantities sketched below. The proof of the following proposition is the same as that of Proposition 4.4 in [18], and is therefore omitted.
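A plausible form of the standard vanishing-discount construction, with x_0 ∈ R^n a fixed reference state (hypothetical here; the source may normalize differently), is

\[
h_{\alpha_m}(x) := V_{\alpha_m}(x,\varphi_{\alpha_m},\psi_{\alpha_m},r) - V_{\alpha_m}(x_0,\varphi_{\alpha_m},\psi_{\alpha_m},r),
\qquad
g_{\alpha_m} := \alpha_m\, V_{\alpha_m}(x_0,\varphi_{\alpha_m},\psi_{\alpha_m},r).
\]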
Proof of Lemma 8.1. Let (φ_{α_m}, ψ_{α_m}) be a Nash equilibrium for the α_m-discounted payoff. Then the pair (φ_{α_m}, ψ_{α_m}) satisfies equations (15)-(17) in R^n for each m = 1, 2, . . .. Replacing h_{α_m}, given in (8), into these equations yields, in terms of the operators in (1), expressions of the form r(x, φ_m, ψ) + L^{φ_m,ψ} h_{α_m}(x) − α_m h_{α_m}(x) (see (9)), as well as their counterparts for player 2. It is easy to verify that the results in Proposition 2, together with equation (9), imply that the hypotheses established in Theorems 9.5, 9.6, and 9.7 hold. Then, by invoking these theorems, and noting (10) and (11), we can claim the existence of a function h ∈ W^{2,p}(B_R) such that h_{α_m} → h as α_m → 0, uniformly on B_R, and the limit expressions r(x, φ*, ψ) + L^{φ*,ψ} h(x) satisfy the corresponding equations. This argument extends to all x ∈ R^n because R > 0 was arbitrary. Finally, since the previous convergence is uniform, for each ε > 0 there exists a natural number N such that, for all m ≥ N,
implying that h is in B_w(R^n).
It only remains to prove that (φ*, ψ*) is indeed a Nash equilibrium for the long-run average payoff criterion with additive structure. To do that, first notice that the third equality in (12) shows that g ≤ r(x, φ*, ψ) + L^{φ*,ψ} h(x), for x ∈ R^n and ψ ∈ P(U_2).
The difference between Diagrams 6 and 7 is that in the latter bias optimality implies strong 0-discount optimality (Theorem 8.3). DIAGRAM 6. Games without additive structure and unbounded coefficients.