TIME-INCONSISTENT RECURSIVE ZERO-SUM STOCHASTIC DIFFERENTIAL GAMES

Abstract. In this paper, a class of time-inconsistent recursive zero-sum stochastic differential game problems is studied by means of a hierarchical backward sequence of time-consistent subgames. The notion of feedback control-strategy law is adopted to constitute a closed-loop formulation. Instead of the time-inconsistent saddle points, a new concept named equilibrium saddle points, which is time-consistent and can be regarded as a local approximate saddle point in a proper sense, is introduced and investigated. Moreover, a couple of equilibrium Hamilton-Jacobi-Bellman-Isaacs equations are obtained to characterize the equilibrium values and construct the equilibrium saddle points.


1.
Introduction. Let (Ω, F, F, P) be a complete filtered probability space on which a d-dimensional standard Brownian motion W(·) is defined, where F = {F_t}_{t≥0} is the natural filtration of W(·) augmented by all the P-null sets. Let T > 0 be a given terminal time. For any t ∈ [0, T] regarded as an initial time, we denote the set of all possible initial states and, for Player i (i = 1, 2), the set of admissible control processes on [t, T]: the F-progressively measurable processes u_i(·) valued in U_i satisfying E ∫_t^T |u_i(r)|^2 dr < ∞, where U_i ⊆ R^{m_i} is a nonempty set which could be bounded or unbounded.
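For reference, the controlled state equation (1.1) can be sketched in the standard form below, consistent with the coefficients b and σ introduced in Section 2 (a reconstruction sketch; the precise regularity conditions on b and σ are those stated in the assumptions):

```latex
\begin{cases}
dX(r) = b\big(r, X(r), u_1(r), u_2(r)\big)\,dr
      + \sigma\big(r, X(r), u_1(r), u_2(r)\big)\,dW(r), & r \in [t, T],\\[2pt]
X(t) = \xi,
\end{cases}
```

where b and σ take values in R^n and R^{n×d}, respectively.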
In the game, Player 1 wants to maximize the functional J_0 defined by (1.3), and Player 2 aims to minimize it. Therefore, the criterion functional J_0 can be regarded as a payoff for Player 1 and a cost for Player 2.
In a special case where the mapping g_0 is independent of (y, z), i.e.,
g_0(r, x, u_1, u_2, y, z) = g_0(r, x, u_1, u_2), the BSDE (1.2) reduces to a trivial one and the criterion functional reads

    J_0(t, ξ; u_1(·), u_2(·)) = E_t[ h_0(X(T)) + ∫_t^T g_0(r, X(r), u_1(r), u_2(r)) dr ],

where E_t[·] := E[· | F_t] for simplicity. This is the classical Bolza type criterion functional. In the general case, however, J_0 defined through the BSDE (1.3) is called a recursive criterion functional. Using BSDEs to describe recursive functionals originated from financial applications. For more details about recursive criterion functionals, please refer to Duffie-Epstein [4], Wei-Yong-Yu [15], and the references therein.
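In the general recursive case, the criterion functional is generated by a BSDE; a sketch of the standard formulation (cf. Duffie-Epstein [4]; the notation matches g_0 and h_0 above) reads:

```latex
\begin{cases}
dY(r) = -\,g_0\big(r, X(r), u_1(r), u_2(r), Y(r), Z(r)\big)\,dr + Z(r)\,dW(r), & r\in[t,T],\\[2pt]
Y(T) = h_0\big(X(T)\big),
\end{cases}
\qquad
J_0\big(t,\xi; u_1(\cdot), u_2(\cdot)\big) := Y(t).
```

When g_0 does not depend on (y, z), taking conditional expectations recovers the Bolza form above.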
In the above, the admissible controls are defined in the open-loop form. To meet the requirements of numerous practical problems, some closed forms of admissible controls and strategies are desired. Now we give some definitions.
Similarly, we can define an admissible feedback control law u 2 for Player 2.
The notion of strategy is widely used in game theory to characterize how a player changes his/her control when the control of the opponent changes. The more general Elliott-Kalton type non-anticipative strategies ([7]) and some related literature are recalled in the next section. In this paper, we combine the "strategy against control" setting and the "feedback" mechanism to constitute a kind of closed-loop form. The control-strategy law (u_1, α_2) (resp. (α_1, u_2)) satisfying (1.4) (resp. (1.5)) is called a saddle point with form (I) (resp. form (II)).
In the next section, we shall derive that Problem (C-SDG) has an important property: a saddle point (u_1, α_2) with form (I) (resp. (α_1, u_2) with form (II)) on a given time interval [t, T] is still a saddle point on the smaller time interval [s, T] for any s ∈ (t, T). Such a property is called the time-consistency of Problem (C-SDG). For a precise statement and a proof, please see Remark 1 and Theorem 2.4.
Although time-consistency is very convenient for mathematical treatment, it is too ideal to be satisfied by most practical problems. Among the various reasons leading to time-inconsistency, a typical one is people's subjective time preference. As a matter of fact, people usually discount the utility of outcomes in the immediate future more heavily. Mathematically, such a situation can be described by the so-called non-exponential discounting. As suggested in [15,16,17,18,19], to incorporate non-exponential discounting into the game problem, one may consider the recursive criterion functional J defined by (1.6), where (Y(·), Z(·)) is the unique solution to the BSDE (1.7). Here, we introduce the notation D[0, T] := {(t, r) ∈ [0, T]² : t ≤ r} (see (1.8)), and the deterministic mappings g and h are defined on D[0, T] × R^n × U_1 × U_2 × R × R^{1×d} and [0, T] × R^n, respectively. We notice that, compared with (1.2), the initial time t enters the new BSDE (1.7) as a parameter in order to characterize the time preference in the criterion functional. A typical example of the functional J defined by (1.6) is a kind of recursive utility/disutility involving the so-called hyperbolic discounting, in which the generator g(t, r, x, u_1, u_2, y, z) incorporates a discount factor of the form 1/(1 + β(r − t)) with β > 0. Similarly, the control-strategy law (u_1, α_2) (resp. (α_1, u_2)) satisfying (1.9) (resp. (1.10)) is called a saddle point with form (I) (resp. form (II)).
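As a concrete sketch of a hyperbolic-discounting specification (the classical choice, cf. Ekeland-Lazrak [5]; the exact kernel used in this paper may differ), one may take, for some β > 0,

```latex
g\big(t, r, x, u_1, u_2, y, z\big) = \frac{1}{1+\beta (r-t)}\, g_0\big(r, x, u_1, u_2, y, z\big),
\qquad
h(t, x) = \frac{1}{1+\beta (T-t)}\, h_0(x),
```

so that the players standing at the initial time t discount the running and terminal terms non-exponentially in the elapsed time r − t.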
Let us do a simple analysis. If there exists a saddle point satisfying the time-consistency property, then, once we fix the feedback control law or the feedback strategy law, Problem (InC-SDG) reduces to a family of recursive stochastic optimal control problems, which was studied in Wei-Yong-Yu [15] (see also Yong [17] for a special case), and the time-consistent saddle point reduces to a time-consistent optimal control. However, the results in [17,15] point out that such a family of optimal control problems is time-inconsistent in general. This contradiction shows that Problem (InC-SDG) is also time-inconsistent.
By now, time-inconsistent optimal control problems have attracted much research. Instead of the time-inconsistent optimal controls, time-consistent equilibrium controls are introduced to deal with time-inconsistent optimal control problems. One major method to investigate such problems is the Stackelberg type multi-person differential games approach, which can be traced back to Pollak [14] in 1968. Later, this approach was further developed by Ekeland-Lazrak [5], Yong [16,17,18,19], Hu-Jin-Zhou [10], Björk-Murgoci [1], Björk-Murgoci-Zhou [2], and so on for various kinds of time-inconsistent optimal control problems.
In this paper, we aim to study time-inconsistent differential game problems (Problem (InC-SDG)). Since the saddle point in the classical sense is no longer time-consistent, inspired by the concept of equilibrium control proposed for time-inconsistent optimal control problems, we shall suggest a new concept named equilibrium saddle point, which is time-consistent and possesses some properties of a local saddle point, to characterize Problem (InC-SDG). Meanwhile, we shall develop the multi-person differential games approach (designed for time-inconsistent optimal control problems) into a new one, called a backward sequence of time-consistent subgames, to investigate Problem (InC-SDG).
We explain the new method briefly. Firstly, we divide the whole time interval [t, T] into N subintervals by a partition Π : t = t_0 < t_1 < · · · < t_N = T. For each k = 1, 2, . . . , N, there is a pair of players (the k-th pair of players, who can be regarded as the future selves of Player 1 and Player 2) who control the system and play a subgame on [t_{k−1}, t_k). In the sequence of subgames, the k-th pair of players take over the system at time t_{k−1} from the (k − 1)-th pair of players, and hand it over to the (k + 1)-th pair of players at t_k. Although the k-th pair of players do not control the system on [t_k, T], they will still "discount" the future payoffs/costs in their own way, which can be interpreted as the time-preference feature of the problem ([17,18,19]). Therefore, the criterion functional for the k-th pair of players is "sophisticated" and recursive. In detail, the sophisticated recursive criterion functional for the k-th pair of players is defined by a BSDE on [t_{k−1}, t_k], whose coefficient/generator depends on its initial pair (t_{k−1}, X_k(t_{k−1})) with X_k(t_{k−1}) equal to X_{k−1}(t_{k−1}) (the terminal state of the (k − 1)-th pair of players), and whose terminal value at t_k equals Θ_k(t_k, X_k(t_k)) with X_k(t_k) being the terminal state of the k-th pair of players. The function Θ_k(·, ·) is constructed based on the assumption that later players will play at the saddle point with respect to their sophisticated recursive criterion functionals. There is then a standard time-consistent recursive stochastic differential game (SDG, for short) problem for each pair of players on each subinterval. The verification theorem for time-consistent recursive SDG problems allows us to find a saddle point on each subinterval. These saddle points on all subintervals constitute a partition-dependent equilibrium saddle point for the whole sequence of subgames. At the same time, we also obtain the partition-dependent equilibrium lower and upper value functions for the sequence of subgames.
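To make the backward construction concrete, here is a minimal numerical sketch: a scalar state on a finite grid, finite control sets, and a hyperbolic discount factor. The dynamics, payoffs, and parameter values are hypothetical choices for illustration only, not the paper's model. On each subinterval, the k-th pair of players solves a one-step lower game (a maximum over u_1 of a minimum over u_2), and the continuation value Θ_k is propagated backward from the terminal payoff.

```python
import numpy as np

def solve_backward_subgames(N=4, T=1.0, beta=0.5):
    """Toy backward sequence of subgames on a partition of [0, T]."""
    ts = np.linspace(0.0, T, N + 1)          # partition t_0 < ... < t_N
    dt = T / N
    xs = np.linspace(-1.0, 1.0, 41)          # state grid (spacing 0.05)
    U1 = U2 = (-1.0, 0.0, 1.0)               # finite control sets (hypothetical)
    theta = xs ** 2                          # Theta_N(t_N, x) = h(x), a toy choice
    for k in range(N, 0, -1):                # k-th pair of players on [t_{k-1}, t_k)
        # each pair of players applies its own hyperbolic discount to the future
        disc = 1.0 / (1.0 + beta * (ts[k] - ts[k - 1]))
        new_theta = np.empty_like(theta)
        for i, x in enumerate(xs):
            outer = -np.inf
            for u1 in U1:                    # Player 1 maximizes
                inner = np.inf
                for u2 in U2:                # Player 2, knowing u1, minimizes
                    x_next = np.clip(x + (u1 - u2) * dt, xs[0], xs[-1])
                    j = int(np.argmin(np.abs(xs - x_next)))   # nearest grid point
                    running = (u1 ** 2 - u2 ** 2) * dt        # toy running payoff
                    inner = min(inner, running + disc * theta[j])
                outer = max(outer, inner)
            new_theta[i] = outer             # lower value of the one-step game
        theta = new_theta                    # hand Theta_{k-1} to the earlier pair
    return ts, xs, theta

ts, xs, V0 = solve_backward_subgames()
```

The returned array V0 plays the role of the partition-dependent equilibrium lower value at the initial time t_0 on the state grid.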
Next, by letting the mesh size of the partition tend to zero, the partition-dependent equilibrium saddle point will bring us a time-consistent equilibrium saddle point of Problem (InC-SDG), and the partition-dependent equilibrium lower (resp. upper) value function will bring us an equilibrium lower (resp. upper) value function of Problem (InC-SDG) (the definitions of equilibrium lower and upper value functions will be given in Section 3). A couple of so-called equilibrium (lower and upper) Hamilton-Jacobi-Bellman-Isaacs equations (HJBI equations, for short) are obtained to characterize the couple of value functions. We also establish the well-posedness of the equilibrium HJBI equations when the control processes u_1(·) and u_2(·) do not enter the diffusion term σ of the state equation (1.1). The general case where σ depends on u_1(·) and u_2(·) is still under our investigation.
We summarize the main innovations and difficulties that have been overcome in this paper. (1) As far as we know, this is the first time that time-inconsistent SDG problems have been studied. (2) As the basis for studying time-inconsistent problems, the theory of the corresponding time-consistent problems plays an important role. The main difficulties in the paper arise when we consider the time-consistent zero-sum SDG problems. To overcome them, we employ a control-strategy law framework and focus on the existence of saddle points instead of the values of the games. With the help of some delicate analysis techniques, we successfully establish a verification theorem (see Theorem 2.4) for time-consistent zero-sum SDGs involving recursive/differential utility. (3) A backward sequence of time-consistent subgames approach is introduced to deal with Problem (InC-SDG). We believe this method can also be used to solve some other time-inconsistent SDG problems.
The rest of this paper is organized as follows. We present some preliminaries in Section 2: we first recall the zero-sum SDGs involving Elliott-Kalton type admissible strategies, and then establish a stochastic verification theorem for Problem (C-SDG). In Section 3, for Problem (InC-SDG), a new concept named equilibrium saddle point is proposed. In Section 4, we introduce a backward sequence of time-consistent subgames, and obtain the local saddle point for each subgame problem. In Section 5, by letting the mesh size of the partition go to zero, we obtain the time-consistent equilibrium saddle points and the equilibrium HJBI equations characterizing the equilibrium value functions. Sections 3, 4 and 5 mainly focus on the equilibrium lower saddle points and the equilibrium lower HJBI equation. A similar analysis leads to the corresponding results on the equilibrium upper saddle points and the equilibrium upper HJBI equation, which are presented in Section 6.

2.
Preliminaries. Firstly, we present some notations and assumptions which will be frequently used in the rest of this paper. For any Euclidean space M, we introduce a couple of spaces of M-valued processes; a typical one consists of the F-progressively measurable processes X(·) with continuous paths satisfying E[sup_{t≤r≤T} |X(r)|²] < ∞. For the mappings b, σ, g, and h appearing in (1.1) and (1.7), we introduce the following assumptions.
Besides the saddle points defined in the statement of Problem (C-SDG), we also recall some other definitions in game theory including admissible strategies, values, and value functions.
1. An admissible strategy (also called an Elliott-Kalton type nonanticipative strategy [7]) for Player 1 is a nonanticipative mapping α_1 from the set of admissible controls of Player 2 on [t, T] into the set of admissible controls of Player 1 on [t, T]. An admissible strategy for Player 2 is defined in the same way. The set of all admissible strategies for Player i is denoted by A_i[t, T] (i = 1, 2).
In the Elliott-Kalton "strategy against control" setting, when the maximizing Player 1 chooses an admissible control u 1 (·), the minimizing Player 2 will choose an admissible strategy α 2 [·]. Asymmetrically, when the minimizing Player 2 chooses an admissible control u 2 (·), the maximizing Player 1 will choose an admissible strategy α 1 [·]. Additionally, if α j is selected to be an admissible feedback strategy law for Player j, it is easy to check that u i (·) → α j (·,X(·), u i (·)) is an admissible strategy for Player j (j = 1, 2).
If the F_t-measurable random variable defined through the optimization of form (I) (a control for Player 1 against a strategy for Player 2) exists, then it is called the lower value of Problem (C-SDG) with the initial pair (t, ξ). If the F_t-measurable random variable defined through the optimization of form (II) exists, then it is called the upper value of Problem (C-SDG) with (t, ξ). If both the lower value and the upper value exist and are equal, we call this common value the value of Problem (C-SDG) with (t, ξ). Moreover, if there exists a deterministic function V⁻ (resp. V⁺) representing the lower (resp. upper) values for all initial pairs, then V⁻ (resp. V⁺) is called the lower (resp. upper) value function of Problem (C-SDG).
Due to this, a saddle point with form (I) is also called a lower saddle point.
If there exists an admissible feedback control-strategy law (α_1, u_2) satisfying (1.5), then the corresponding upper value exists and is given by the criterion functional evaluated under (α_1, u_2). Similarly, a saddle point with form (II) is also called an upper saddle point.
Proof. We only prove conclusion (i); the proof of (ii) is the same. On the one hand, the first equality in (1.4) gives one side of the desired relation. On the other hand, by the second equality in (1.4), and noticing that u_1(·) → α_2(·, X̄(·), u_1(·)) is a special admissible strategy for Player 2, we obtain the other side. We finish the proof.
The time-consistent differential games have been extensively researched. Among the rich literature, we would like to mention the following results, which are closely related to our present work. In 1989, Fleming-Souganidis [9] adopted the Elliott-Kalton type "strategy against control" setting to study zero-sum SDGs for the first time. They proved that the celebrated dynamic programming principle holds true and that the lower and upper value functions are the unique viscosity solutions to the associated HJBI equations. Their work generalized that of Evans-Souganidis [8] from the deterministic framework to the stochastic one. Later, Buckdahn-Li [3] further generalized the work in [9] to zero-sum SDGs with recursive criterion functionals. The above-mentioned works focused on the existence of lower and upper values. Clearly, in view of Proposition 2.3, when a lower (resp. upper) saddle point exists, the lower (resp. upper) value must exist. On the other hand, in general, one should not expect that the existence of the lower (resp. upper) value implies the existence of a lower (resp. upper) saddle point. Recently, for time-consistent zero-sum SDGs in the linear-quadratic case, Yu [21] explicitly constructed a lower (resp. upper) saddle point in the control-strategy law form by virtue of an associated Riccati equation. In the rest of this section, we will focus on the existence and representation of saddle points of Problem (C-SDG), which is much more general compared with the model studied in [21]. Now, we introduce a couple of partial differential equations (PDEs, for short) named HJBI equations, whose classical solutions are sought among the functions v(·, ·) such that v_t(·, ·), v_x(·, ·), and v_xx(·, ·) exist and are continuous.
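A sketch of the lower HJBI equation in its standard form may help fix ideas (a reconstruction from the standard theory, cf. [9]; the Hamiltonian matches the notation of Section 4, with a := ½σσᵀ assumed, and the exact ordering convention follows the paper's form (I)):

```latex
\begin{cases}
v_t(t,x) + \sup_{u_1 \in U_1}\, \inf_{u_2 \in U_2}
  H\big(t, x, u_1, u_2, v(t,x), v_x(t,x), v_{xx}(t,x)\big) = 0,
  & (t,x) \in [0,T] \times \mathbb{R}^n, \\[4pt]
v(T,x) = h(x), & x \in \mathbb{R}^n,
\end{cases}
```

where

```latex
H(t,x,u_1,u_2,\theta,p,P) := \operatorname{tr}\!\big[a(t,x,u_1,u_2)\,P\big]
  + \big\langle b(t,x,u_1,u_2),\, p \big\rangle
  + g\big(t,x,u_1,u_2,\theta,\, p\,\sigma(t,x,u_1,u_2)\big);
```

the upper HJBI equation is obtained by interchanging the sup and the inf.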
Proof. Since the proofs of (i) and (ii) are similar, we only prove (i).
Remark 1. For a given initial pair (t, ξ) ∈ D, let (u_1, α_2) be defined by (2.10), and let X̄(·) be the corresponding state process with the initial pair (t, ξ) under the feedback control-strategy law (u_1, α_2). For any s ∈ [t, T], we denote the restriction of (u_1, α_2) to [s, T] accordingly. It follows from the above verification theorem that, for any s ∈ (t, T), this restriction is still a saddle point with the initial pair (s, X̄(s)). In the special case where U_1 and U_2 are sets containing only one element, the verification theorem reduces to the following result, named a nonlinear Feynman-Kac formula, which was introduced and well studied by Peng [13], Pardoux-Peng [12], Ma-Protter-Yong [11], and so on.

Here, for simplicity, we use the notation H in (2.1) omitting τ, u_1 and u_2. We also introduce a family of FBSDEs parameterized by (t, ξ) ∈ D:

    dX(r) = b(r, X(r)) dr + σ(r, X(r)) dW(r),   r ∈ [t, T],
    dY(r) = −g(r, X(r), Y(r), Z(r)) dr + Z(r) dW(r),   r ∈ [t, T],
    X(t) = ξ,   Y(T) = h(X(T)).                                        (2.28)

Then,

    Θ(t, ξ) = Y(t; t, ξ),  a.s.,  ∀ (t, ξ) ∈ D.                        (2.29)

3.

Time-inconsistent zero-sum stochastic differential game. This section focuses on Problem (InC-SDG). We recall from Section 1 that the difference between Problem (C-SDG) and Problem (InC-SDG) is the appearance of two time variables in the functions g and h, which is the source of the time-inconsistency. In Problem (InC-SDG), the classical saddle point does not keep the time-consistency as time goes by. Therefore, instead of the classical saddle point, we shall propose a new notion called the equilibrium saddle point (which will be proved to be time-consistent) to fit Problem (InC-SDG). The new notion can be regarded as the counterpart of the equilibrium control in the study of time-inconsistent optimal control problems (see [17,15] for example).

4.
A backward sequence of time-consistent subgames. In this section, we shall carry out the idea proposed in Section 1. Let Π : t = t_0 < t_1 < · · · < t_{N−1} < t_N = T be a partition of [t, T], and denote the forthcoming associated backward sequence of time-consistent subgames by Problem (G^Π). There are N pairs of players performing in the N stochastic differential game problems on [t_{k−1}, t_k), 1 ≤ k ≤ N, in total. We will denote by Player k_1 and Player k_2 the first and the second player in the k-th pair of players (1 ≤ k ≤ N), respectively.
Let us begin with the last game problem on the last time interval [t N −1 , t N ].

4.1.
The N-th pair of players - a classical zero-sum differential game.
On the last interval [t_{N−1}, t_N], the controlled system for the N-th pair of players is described by the state equation (4.1). The N-th pair of players take over the system from the previous (N−1)-th pair of players. Therefore, the initial state ξ_{N−1} of the N-th pair of players is just the terminal state of the (N−1)-th pair of players. The recursive criterion functional is given by (4.2). The differential game problem for the N-th pair of players is then the following: for any initial state ξ_{N−1} ∈ L²_{F_{t_{N−1}}}(Ω; R^n), find a lower saddle point, i.e., an admissible feedback control-strategy law (u^Π_1, α^Π_2) satisfying the relations of form (I). Here, t_{N−1} appearing in the functions g and h is a time parameter, which does not lead to any essential difference from the classical Problem (C-SDG). Then we can apply the verification theorem (see Theorem 2.4) developed in Section 2 to solve Problem (C_N). Under some mild conditions, the associated HJBI equation admits a unique smooth solution V^{Π−}(·, ·) ∈ C^{1,2}([t_{N−1}, t_N] × R^n). Let the functions φ_1 and ψ_2 be given in Assumption (A2)-(i), and define (u^Π_1, α^Π_2) through them. By Theorem 2.4, (u^Π_1, α^Π_2) is a lower saddle point of Problem (C_N) with the initial pair (t_{N−1}, ξ_{N−1}). In more detail, let (X̄_N(·), Ȳ_N(·), Z̄_N(·)) be the solution to the associated FBSDE, where, for simplicity of notation, the coefficients are evaluated along the law (u^Π_1, α^Π_2); then X̄_N(·) is the corresponding state under (u^Π_1, α^Π_2), and the corresponding lower value is given by Ȳ_N(t_{N−1}).

4.2.
The (N−1)-th pair of players - a sophisticated differential game. The (N−1)-th pair of players only control the system on [t_{N−2}, t_{N−1}), and they will hand the system over to the N-th pair of players at time t_{N−1}. Moreover, although it is known by the (N−1)-th pair of players that the N-th pair of players will act by carrying out the lower saddle point (u^Π_1, α^Π_2) in (4.4), due to the subjective time-preference, the (N−1)-th pair of players still "discount" the future payoffs (or costs) in their own way.
Therefore, based on the above viewpoint, the controlled system of the (N−1)-th pair of players is given by a state equation on [t_{N−2}, t_{N−1}], where u_i(·) is the control process of Player (N−1)_i, i = 1, 2, respectively. Different from Problem (C_N), the criterion functional for the (N−1)-th pair of players is a bit more complicated. Precisely, we define the sophisticated recursive criterion functional for the (N−1)-th pair of players by (4.7). The differential game problem for the (N−1)-th pair of players is posed as follows: for any ξ_{N−2} ∈ L²_{F_{t_{N−2}}}(Ω; R^n), find a lower saddle point, i.e., an admissible feedback control-strategy law (u^Π_1, α^Π_2) satisfying the relations of form (I). Since on the time interval [t_{N−1}, t_N] the controls are fixed to obey the law (u^Π_1, α^Π_2), we would like to use the nonlinear Feynman-Kac formula to simplify Problem (C_{N−1}). To this end, we introduce the PDE (4.8). If this PDE admits a classical solution Θ_{N−1}(·, ·) ∈ C^{1,2}([t_{N−1}, t_N] × R^n), then Y_{N−1}(t_{N−1}) admits the representation Y_{N−1}(t_{N−1}) = Θ_{N−1}(t_{N−1}, X(t_{N−1})). Consequently, the controlled system can be restricted to [t_{N−2}, t_{N−1}] as in (4.9), and Problem (C_{N−1}) with the initial state ξ_{N−2} turns out to be a standard recursive zero-sum stochastic differential game problem on [t_{N−2}, t_{N−1}).
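The passage through the nonlinear Feynman-Kac formula can be sketched as follows, writing b̄, σ̄, ḡ for the coefficients on [t_{N−1}, t_N] with the law (u^Π_1, α^Π_2) plugged in (an illustrative shorthand, not the paper's notation):

```latex
\begin{aligned}
&\partial_r \Theta_{N-1}(r,x)
  + \tfrac12 \operatorname{tr}\!\big[\bar\sigma\bar\sigma^{\!\top}(r,x)\, \partial_{xx}\Theta_{N-1}(r,x)\big]
  + \big\langle \bar b(r,x),\, \partial_x \Theta_{N-1}(r,x) \big\rangle \\
&\qquad + \bar g\big(r, x, \Theta_{N-1}(r,x),\, \partial_x\Theta_{N-1}(r,x)\,\bar\sigma(r,x)\big) = 0,
  \qquad (r,x) \in [t_{N-1}, t_N] \times \mathbb{R}^n,
\end{aligned}
```

together with a terminal condition at t_N inherited from the N-th subgame. The resulting representation of Y_{N−1}(t_{N−1}) through Θ_{N−1} then allows the (N−1)-th subgame to be posed on [t_{N−2}, t_{N−1}] alone.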

4.3.
The k-th pair of players and equilibrium saddle point of Problem (G^Π). Generally, the k-th pair of players, who take over the system from the (k−1)-th pair of players, control the system on [t_{k−1}, t_k), and then hand it over to the (k+1)-th pair of players at time t_k.
Suppose that the equilibrium lower saddle point (u^Π_1, α^Π_2) of Problem (G^Π) on [t_k, t_N] has been constructed. Although the k-th pair of players know that the future players will control the system obeying the law (u^Π_1, α^Π_2) on [t_k, t_N], they still "discount" in their own way. According to this viewpoint, for any pair of admissible controls, we have the FBSDE (4.14). The associated sophisticated criterion functional of the k-th pair of players is given by (4.15). Then the zero-sum game problem for the k-th pair of players is formulated as follows: for any ξ_{k−1} ∈ L²_{F_{t_{k−1}}}(Ω; R^n), find a lower saddle point, i.e., an admissible feedback control-strategy law (u^Π_1, α^Π_2) such that (4.16) holds. Now, by the same method as in Subsection 4.2, we give the solution to Problem (C_k). Let Θ_k(·, ·) ∈ C^{1,2}([t_k, t_N] × R^n) be a classical solution to the PDE (4.17), and let V^{Π−}(·, ·) ∈ C^{1,2}([t_{k−1}, t_k) × R^n) be a classical solution to the HJBI equation (4.18). The nonlinear Feynman-Kac formula (Corollary 1) and the verification theorem (Theorem 2.4) imply that (u^Π_1, α^Π_2) is a lower saddle point for the k-th pair of players on [t_{k−1}, t_k). Let (X̄_k(·), Ȳ_k(·), Z̄_k(·)) be the solution to the FBSDE (4.20); then X̄_k(·) is the corresponding state under (u^Π_1, α^Π_2) on [t_{k−1}, t_k), and the corresponding lower value is Ȳ_k(t_{k−1}). Additionally, the equilibrium lower saddle point of Problem (G^Π) is extended to the time interval [t_{k−1}, t_N]. In the above, the functions V^{Π−}(·, ·) and Θ_k(·, ·) (1 ≤ k ≤ N−1) are constructed recursively. Through a careful observation of (4.17), (4.18), and (4.19), we find that V^{Π−}(·, ·) and Θ_k(·, ·) can be represented in a unified form.
In detail, firstly, the extended function Θ_k : [t_{k−1}, t_N] × R^n → R satisfies the PDE (4.23). Secondly, we sum up all Θ_k(·, ·) from k = 1 to N by introducing a new time variable τ as in (4.24). With the help of the notation

    H^Π(τ, r, x, u_1, u_2, θ, p, P) = tr[a(r, x, u_1, u_2) P] + ⟨b(r, x, u_1, u_2), p⟩ + g(τ, r, x, u_1, u_2, θ, p σ(r, x, u_1, u_2)),

Θ^Π(·, ·, ·) satisfies a unified PDE, and the control law u^Π_1 in the lower saddle point can be rewritten as (4.27). Similarly, the strategy law α^Π_2 in the lower saddle point can be rewritten as (4.28). Obeying the control-strategy law (u^Π_1, α^Π_2) given by (4.27) and (4.28), the corresponding equilibrium state process X^Π(·) satisfies the SDE (4.29). We notice that, for any 1 ≤ k ≤ N, X^Π(·) coincides on [t_{k−1}, t_k) with the equilibrium state of the k-th subgame. Moreover, for any t_k ∈ Π\{t_N}, let (Y^Π(t_k, ·), Z^Π(t_k, ·)) be the solution to the BSDE (4.30); the corresponding lower value is then represented through (Y^Π, Z^Π), and a similar relationship holds on each subinterval. Putting (4.29) and (4.30) together, we actually obtain the decoupled FBSDE system (3.2).

5.
Equilibrium saddle points and equilibrium HJBI equations. In Section 4, for any initial pair (t, ξ) ∈ D and any partition Π of the time interval [t, T], a backward sequence of time-consistent subgames (Problem (G^Π)) associated with Problem (InC-SDG) was studied. In this section, we focus on the limit behaviors of Problem (G^Π) as the mesh size of the partition Π tends to zero, which will provide a solution to Problem (InC-SDG).
5.1. The formal limits. In this subsection, we study the limit behaviors formally to derive the limit equations. In Subsection 5.2, we will show that the formal limits can be made rigorous under some conditions. Temporarily, we impose the following assumption.
In fact, the appearance of Θ(r, r, x), Θ_x(r, r, x) and Θ_xx(r, r, x) in (5.4) brings some difficulties in studying the regularity properties which are necessary for the well-posedness of the equilibrium HJBI equation. At the moment, we are not able to overcome the difficulty stemming from Θ_xx(r, r, x). To avoid this difficulty, we need the following Assumption (A4):

    (A4) σ(r, x, u_1, u_2) = σ(r, x),   (r, x, u_1, u_2) ∈ [0, T] × R^n × U_1 × U_2.
Compared with (5.4), Θ_xx(r, r, x) does not appear in equation (5.6), which significantly reduces the difficulty of the solvability of the equation. Moreover, the well-posedness of (5.6) was already established in [15], and the result will be recalled below. For the proof and more details, interested readers may refer to [15].
The following assumption is also introduced to obtain some estimates in the proof of the unique solvability of (5.6).
In the previous subsection, we introduced Assumption (TA) temporarily to ensure some associated convergences. Now it is time to get rid of it. Let us introduce the following

(TA'). There exists some Θ(·, ·, ·) ∈ C^{0,0,2}(D[0, T] × R^n) satisfying the convergence properties required in Subsection 5.1.

Due to Assumption (A4), the functions φ_1 and ψ_2 are independent of the variable P ∈ S^n. Therefore, if we replace Assumption (TA) by the new Assumption (TA'), all the convergences and results in Subsection 5.1 still hold. Furthermore, an analysis similar to [15, Theorem 6.2] leads to the following result. Here, we omit the proof.