ON CLASS OF NON-TRANSFERABLE UTILITY COOPERATIVE DIFFERENTIAL GAMES WITH CONTINUOUS UPDATING

Abstract. This paper describes the class of cooperative differential games with non-transferable utility and continuous updating. It is the first detailed study of the continuous updating approach applied to non-transferable utility differential games. We describe how to construct a Pareto optimal strategy with continuous updating and the corresponding Pareto optimal trajectory. Another important contribution is that the subgame consistency property is adapted to the class of games with continuous updating. A resource extraction game model is used as an example: the Pareto optimal strategies and the corresponding trajectory are constructed, and the set of Pareto optimal strategies satisfying the subgame consistency property is presented. Numerical simulation results obtained in the MATLAB environment are demonstrated, and conclusions are drawn.


1. Introduction. Most conflict-driven processes in real life evolve continuously in time, and their participants continuously receive updated information and adapt accordingly. The principal models considered in classical differential game theory are associated with problems defined on a fixed time interval (players have all the information on a closed time interval) [7], problems defined on an infinite time interval with discounting (players have all the information specified over an infinite time interval) [1], and problems defined on a random time interval (players have information over a given time interval, but the duration of this interval is a random variable) [22]. One of the first papers in the theory of differential games was devoted to a pursuit differential game (the player's payoff depends on the time of capture) [12]. In all the above models and approaches, it is assumed that at the beginning of the game players use all the information about the game dynamics (motion equations) and the players' preferences (payoff functions). However, these approaches do not take into account that in many real-life conflict-controlled processes the players do not have all the information about the game at the initial time instant. Therefore, classical approaches for defining optimal strategies, such as the Nash equilibrium, the Hamilton-Jacobi-Bellman equation [2], or the Pontryagin maximum principle [13], cannot be directly used to construct an extensive range of real game-theoretic models. Another interesting application of dynamic and differential games is to networks [5].
This paper is devoted to the class of cooperative non-transferable utility games, where players cooperate by choosing a joint outcome. The fundamental questions in the theory of cooperative differential games with non-transferable utility are the formulation of optimal behaviour for players or economic agents, the design of Pareto optimal trajectories, the computation of the corresponding solution, and the analysis of its subgame consistency. A well-known solution in games with non-transferable utility is the Nash bargaining solution. Haurie analyzed the problem of dynamic instability of Nash bargaining solutions in differential games [6]. The notion of time consistency of solutions of differential games was formalized mathematically by Petrosyan [14]. In [23], the authors derive subgame consistent solutions for a class of cooperative stochastic differential games with non-transferable utility. This approach is based on the set of Pareto optimal strategies; an algorithm for determining them in differential games is presented in [24]. Another technique for the construction of a subgame consistent solution is presented in [15].
In this paper, the concept of Pareto optimality for the class of games with continuous updating is defined, and optimality conditions in the form of a Hamilton-Jacobi-Bellman equation are derived. The corresponding trajectory is also constructed. The subgame consistency property is formulated for the class of games with continuous updating, and a subgame consistent Pareto optimal solution is defined. The presented approach is tested on a non-renewable resource extraction game model of two firms. In game models with continuous updating, it is assumed that
1. at each current time $t \in [t_0, +\infty)$, players only have or use information on the interval $[t, t+T]$, where $0 < T < \infty$ is the length of the information horizon;
2. as time $t \in [t_0, +\infty)$ evolves, the information about the game is updated, and players receive the updated information.
Games with dynamic updating were studied in [16], [17], [18], [19], [20], [25], which set the foundations for further study of this class of games. There it is assumed that the information about motion equations and payoff functions is updated at discrete time instants, and the interval on which players know the information is defined by the value of the information horizon. The non-cooperative setting with dynamic updating was considered, and the concept of Nash equilibrium with dynamic updating was introduced. The cooperative case of game models with dynamic updating was also studied in these papers, and the Shapley value for this setting was constructed. However, the class of games with continuous updating provides new theoretical results. The class of differential games with continuous updating was introduced in [21], [8], where it is supposed that the updating process evolves continuously in time. In [21], the system of Hamilton-Jacobi-Bellman equations is derived for the Nash equilibrium in a game with continuous updating. In [8], the class of linear-quadratic differential games with continuous updating is considered, and the explicit form of the Nash equilibrium is obtained.
The continuous updating approach has some similarities with a related series of papers on stabilizing control [11], [9], [10], where similar approaches were considered for the class of linear-quadratic optimal control problems. However, the aim is different: in the current paper and in other papers on the continuous updating approach, the main goal is to model the behaviour of players when information about the process is updated continuously in time.
The paper is organized as follows. Section 2 describes the differential game model with continuous updating. Section 3 is devoted to the classical Pareto optimality principle, which is adapted for the class of games with continuous updating. In Section 4, a new type of Hamilton-Jacobi-Bellman equations for Pareto optimal strategies in the class of games with continuous updating is presented. In Section 5, the subgame consistency property with continuous updating is defined. Finally, in Section 6, the results are demonstrated using the non-renewable resource extraction model and MATLAB simulation.
2. Differential game model with continuous updating. Consider an $n$-player differential game with continuous updating in which, at every current time instant $t \in [t_0, +\infty)$, the players face the subgame $\Gamma(x, t, T)$ defined on the interval $[t, t+T]$ and starting from the state $x$.

The motion equation (1) of the subgame $\Gamma(x, t, T)$ governs the trajectory $x^t(s)$, $s \in [t, t+T]$, under the strategies $u^t(s, x)$, and the expected payoff (2) of player $i$ in the subgame $\Gamma(x, t, T)$ is evaluated along them. Here $x^t(s)$ and $u^t(s, x)$ are the trajectories and strategies in the game $\Gamma(x, t, T)$, and $\dot{x}^t(s)$ is the derivative with respect to $s$, $s \in [t, t+T]$.

The strategy profile $u(t, x)$ in the differential game with continuous updating has the form

$$u(t, x) = u^t(s, x)\big|_{s=t}, \quad t \in [t_0, +\infty), \tag{3}$$

where $u^t(s, x)$, $s \in [t, t+T]$, are the strategies in the subgame $\Gamma(x, t, T)$. In the game defined by (1), (2), the strategy profile $u^t(s, x)$ is defined for every fixed current time instant $t$ and for $s \in [t, t+T]$. Therefore, the time parameter $s \in [t, t+T]$ can be considered an imaginary time parameter, because it is used only to construct the expected payoffs of the players. As the current time instant $t$ changes, the time interval $[t, t+T]$ also changes, and the function $u^t(s, x)$, as a function of $s$, can be different for another time instant $t$.

The trajectory $x(t)$ in the differential game with continuous updating is determined in accordance with the motion equation (4), where $u = u(t, x)$ are the strategies in the game with continuous updating (3) and $\dot{x}(t)$ is the derivative with respect to $t$. We suppose that the strategy with continuous updating obtained using (3) is admissible, that is, problem (4) has a unique and continuous solution.
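The construction in this section can be illustrated with a short numerical sketch. It is not taken from the paper: the function names, the scalar control, the explicit Euler scheme, and the toy subgame strategy $u^t(s, x) = x / (t + T - s)$ with dynamics $\dot{x} = -u$ are all assumptions chosen for illustration. The sketch only shows the mechanism: at each current time $t$ the subgame strategy is evaluated at the imaginary time $s = t$, and the resulting control drives the real trajectory.

```python
import math
from typing import Callable, List, Tuple


def simulate_continuous_updating(
    subgame_strategy: Callable[[float, float, float], float],
    dynamics: Callable[[float, float, float], float],
    x0: float,
    t0: float,
    t_end: float,
    dt: float = 1e-3,
) -> List[Tuple[float, float, float]]:
    """Euler integration of the state under u(t, x) = u^t(s, x) taken at s = t.

    subgame_strategy(t, s, x) is a hypothetical callable returning the
    subgame control for current time t and imaginary time s.
    dynamics(t, x, u) is a hypothetical right-hand side of the motion equation.
    Returns a list of (time, state, control) samples.
    """
    steps = int(round((t_end - t0) / dt))
    t, x = t0, x0
    path = []
    for _ in range(steps):
        u = subgame_strategy(t, t, x)  # only the value at s = t is ever applied
        path.append((t, x, u))
        x += dt * dynamics(t, x, u)    # explicit Euler step in real time t
        t += dt
    path.append((t, x, subgame_strategy(t, t, x)))
    return path


# Toy illustration (assumed, not from the paper): information horizon T = 0.5,
# subgame strategy x / (t + T - s), dynamics dx/dt = -u.
T = 0.5
path = simulate_continuous_updating(
    subgame_strategy=lambda t, s, x: x / (t + T - s),
    dynamics=lambda t, x, u: -u,
    x0=5.0,
    t0=0.0,
    t_end=1.0,
)
# With these toy choices u(t, x) = x / T, so the exact state is x0 * exp(-(t - t0) / T).
print(path[-1][1], 5.0 * math.exp(-1.0 / T))
```

In this toy case the subgame strategy depends on $s$, but the applied control $u(t, x) = x / T$ does not depend on time explicitly, because only the value at $s = t$ is used.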
3. Pareto optimal strategies with continuous updating. In the framework of continuously updated information, it is important to model the behaviour of players. To do this, we use the concept of Pareto optimality. For the class of differential games with continuous updating, we would like to have it in the following form: for any fixed $t \in [t_0, +\infty)$, the strategy profile $u^P(t, x) = (u^P_1(t, x), \ldots, u^P_n(t, x))$ coincides with the Pareto optimal strategy profile in the game (1), (2) defined on the interval $[t, t+T]$ at the instant $t$.

However, direct application of the classical approaches for finding Pareto optimal strategies is not possible. To construct such strategies, we introduce the concept of generalized Pareto optimal strategies as the principle of optimality that is used below for the construction of the strategies $u^P(t, x)$.

Definition 3.1. Strategy profile $u^P(t, s, x) = (u^P_1(t, s, x), \ldots, u^P_n(t, s, x))$ is a generalized Pareto optimal strategy profile in the game with continuous updating if, for any fixed $t \in [t_0, +\infty)$, the strategy profile $u^P(t, s, x)$ is a Pareto optimal strategy profile in the game $\Gamma(x, t, T)$.

Definition 3.2. Strategy profile $u^P(t, x)$ is called the Pareto optimal strategy with continuous updating if it is defined in the following way:

$$u^P(t, x) = u^P(t, s, x)\big|_{s=t}, \tag{6}$$

where $u^P(t, s, x)$ is the generalized Pareto optimal strategy profile defined in Definition 3.1.

The trajectory $x^*(t)$ corresponding to the Pareto optimal strategy profile with continuous updating $u^P(t, x)$ can be obtained from the system (4).

4. Hamilton-Jacobi-Bellman equation with continuous updating. In order to construct Pareto optimal strategies and the corresponding trajectories with continuous updating, we consider, for every vector of weights $\alpha$ with $\alpha_i \in (0, 1)$, $\sum_{i=1}^{n} \alpha_i = 1$, the optimization problem (7) [24], in which $t \in [t_0, +\infty)$ is the current time instant. In what follows, we denote the Pareto optimal strategy profile by $u^\alpha(t, x)$ and the generalized Pareto optimal strategy profile by $u^\alpha(t, s, x)$. In order to solve (7) for a fixed vector of weights $\alpha$, that is, to define the strategy profile $u^\alpha(t, x)$, it is necessary to determine the generalized Pareto optimal strategy profile $u^\alpha(t, s, x)$; for this purpose we use a modified version of dynamic programming. By combining $u^\alpha(t, x)$ over all possible weights $\alpha$, we obtain the set of Pareto optimal strategy profiles with continuous updating. Classically, the weights $\alpha$ define the agreement between the players, and in principle they can be reconsidered during the game. In this paper, however, the weights are fixed at the beginning of the game: $\alpha(t) = \alpha$.
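For orientation, a weighted-sum problem of the kind considered here for each weight vector $\alpha$ can be sketched as follows. The symbol $K_i^t$ for the payoff of player $i$ in the subgame $\Gamma(x, t, T)$ and the list of its arguments are notational assumptions made for this sketch; it is not a reproduction of the paper's display (7).

```latex
\max_{u^t = (u^t_1, \dots, u^t_n)} \;
  \sum_{i=1}^{n} \alpha_i \, K_i^t\bigl(x, t, T; u^t\bigr),
\qquad \alpha_i \in (0, 1), \quad \sum_{i=1}^{n} \alpha_i = 1 .
```

Any maximizer of such a problem with strictly positive weights is Pareto optimal in the subgame: a profile that made one player strictly better off without making any other player worse off would strictly increase the weighted sum, contradicting maximality.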
By $W^\alpha(t, s, x)$ denote the Bellman function in the subgame starting at the time instant $s$ of the game starting at the current time $t$. The Hamilton-Jacobi-Bellman (HJB) equation for $W^\alpha(t, s, x)$ is the system (9) used in the following theorem.

Theorem 1. A strategy profile $u^\alpha(t, s, x)$ is the generalized Pareto optimal strategy profile in the differential game with continuous updating if there exist functions $W^\alpha(t, s, x) : [t_0, +\infty) \times [t, t+T] \times \mathbb{R} \to \mathbb{R}$, continuously differentiable in $s$ and $x$, satisfying the system of partial differential equations (9), where $u^\alpha_{-i}(\varphi_i) = (u^{\alpha_1}_1, \ldots, \varphi_i, \ldots, u^{\alpha_n}_n)$.

Proof. According to the definition, the generalized Pareto optimal strategy profile $u^\alpha(t, s, x)$ should be Pareto optimal in the subgame $\Gamma(x, t, T)$ for any fixed $t$. By fixing $t$ in the formulation of Theorem 1, and in particular in (9), we obtain the classical sufficient conditions for a Pareto optimal strategy profile in the differential game with prescribed duration $[t, t+T]$ presented in [1]. Therefore, for any fixed $t$, the conditions in the definition of the generalized Pareto optimal strategy profile are satisfied. The theorem is proved.
We consider only the class of generalized Pareto optimal strategy profiles such that, for the Pareto optimal strategy profile with continuous updating, the solution of the system (4) satisfies the conditions of existence, uniqueness, and continuability in the sense of A. F. Filippov [4]. If it is possible to obtain the generalized Pareto optimal strategy profile $u^\alpha(t, s, x)$ using equations (9), then by applying the procedure (6) we obtain the desired strategy profile $u^\alpha(t, x)$.
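As a point of reference, the textbook dynamic programming equation for a weighted-sum problem on $[t, t+T]$ can be written as below. The subgame dynamics $\dot{x}^t(s) = f(s, x^t(s), \varphi)$ and the running payoffs $g_i(s, x, \varphi)$ are placeholder symbols assumed for this sketch, and a zero terminal payoff is assumed as well. The display is the standard form for a fixed current time $t$ and is not claimed to coincide with the system (9).

```latex
-\frac{\partial W^{\alpha}(t,s,x)}{\partial s}
  = \max_{\varphi = (\varphi_1,\dots,\varphi_n)}
    \left\{ \sum_{i=1}^{n} \alpha_i \, g_i(s,x,\varphi)
            + \frac{\partial W^{\alpha}(t,s,x)}{\partial x} \, f(s,x,\varphi) \right\},
\qquad W^{\alpha}(t,\, t+T,\, x) = 0 .
```

In this form the current time $t$ enters only as a parameter: the equation is solved in $s \in [t, t+T]$ for each fixed $t$, and only the maximizer at $s = t$ is retained by the procedure (6).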

5. Subgame consistency with continuous updating. Under cooperation with non-transferable payoffs, the players negotiate to establish an agreement (optimality principle) on how to play the cooperative game. In particular, the chosen optimality principle has to satisfy (i) group optimality and (ii) individual rationality along the chosen trajectory $x^*(t)$ (in our case, the Pareto optimal trajectory). Subgame consistency requires that the extension of the solution to a later starting time and to any possible state brought about by prior optimal behaviour of the players remains optimal. Both group optimality and individual rationality are required. Group optimality requires the players to seek a set of cooperative strategies or controls that yields a Pareto optimal solution. Individual rationality requires that every player obtains at least as much payoff in the cooperative case as in the individual (non-cooperative) case.
According to the procedure in (6), Pareto optimal strategies and trajectories with continuous updating satisfy the group optimality property. However, the individual rationality property is not always satisfied [24]. For the class of games with continuous updating, the individual rationality property has the following form:

$$K_i^{t,NE}\bigl(x^*(t), t; u^{NE}(t, s, x)\bigr) \le K_i^{t,\alpha}\bigl(x^*(t), t; u^{\alpha}(t, s, x)\bigr), \quad \forall i \text{ and } \forall t,$$

where $K_i^{t,NE}(x^*(t), t; u^{NE}(t, s, x))$ is the individual payoff of player $i$ in the Nash equilibrium in the game defined on the interval $[t, t+T]$ starting along the Pareto optimal trajectory $x^*(t)$, and $K_i^{t,\alpha}(x^*(t), t; u^{\alpha}(t, s, x))$ is the individual payoff under cooperation (6) in the game starting along the Pareto optimal trajectory $x^*(t)$.

ZEYANG WANG AND OVANES PETROSIAN

A formal definition of subgame consistency can be stated as follows.

Definition 5.1. Pareto optimal strategy profile $u^\alpha(t, x)$ is called subgame consistent if the corresponding generalized Pareto optimal strategy profile $u^\alpha(t, s, x)$ is such that the following conditions are satisfied: (i) Group optimality: $K_i^{t,\alpha}(x^*(t), t; u^{\alpha}(t, s, x))$, $i \in N$, is Pareto optimal; (ii) Individual rationality: $K_i^{t,\alpha}(x^*(t), t; u^{\alpha}(t, s, x)) \ge K_i^{t,NE}(x^*(t), t; u^{NE}(t, s, x))$, $\forall i$ and $\forall t$.
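Conditions of this kind can be checked numerically on a grid of weights and time instants. The sketch below is a hypothetical helper, not the authors' procedure: `coop_payoff` and `nash_payoff` are assumed callables returning the cooperative and Nash equilibrium payoffs of player `i` in the subgame that starts at time `t` on the Pareto optimal trajectory generated by the weights `alpha`. Group optimality is taken for granted here, since the strategies are built as Pareto optimal ones.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Weights = Tuple[float, ...]


def individually_rational_weights(
    candidates: Iterable[Weights],
    coop_payoff: Callable[[Weights, int, float], float],
    nash_payoff: Callable[[Weights, int, float], float],
    times: Sequence[float],
    tol: float = 1e-12,
) -> List[Weights]:
    """Keep the weight vectors for which every player, at every sampled time,
    gets at least the Nash equilibrium payoff under cooperation."""
    kept = []
    for alpha in candidates:
        ok = all(
            coop_payoff(alpha, i, t) >= nash_payoff(alpha, i, t) - tol
            for i in range(len(alpha))
            for t in times
        )
        if ok:
            kept.append(alpha)
    return kept


# Toy illustration with made-up payoffs (assumed, not from the paper):
# the cooperative payoff of player i equals its weight, the Nash payoff is 0.25.
demo = individually_rational_weights(
    candidates=[(0.1, 0.9), (0.5, 0.5), (0.7, 0.3)],
    coop_payoff=lambda alpha, i, t: alpha[i],
    nash_payoff=lambda alpha, i, t: 0.25,
    times=[0.0, 0.5, 1.0],
)
print(demo)
```

A check on a finite time grid can only reject weights. Accepting a weight vector for all $t$ requires an analytical argument or a sufficiently fine grid.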
Suppose that there exists a set $A$ of weights $\alpha$ such that conditions (i) and (ii) are satisfied. The set of Pareto optimal strategies $u^\alpha(t, x)$, where $\alpha \in A$, is called the subgame consistent cooperative solution with continuous updating.

6. Differential game of non-renewable resource extraction. As an illustrative example, consider a differential game model with continuous updating for the extraction of a non-renewable resource (see [3]).
6.1. Initial game model. By $x(t)$ denote the state variable indicating the resource stock at time $t$, and by $u_i(t, x)$ the extraction rate of player $i$ at time $t$ if the resource stock is equal to $x$. We assume that $u_i(t, x) \ge 0$ and that, if $x(t) = 0$, then the only feasible rate of extraction is $u_i(t, x) = 0$.
The dynamics of the stock is given by a motion equation with coefficients $b_i > 0$ for all $i = 1, \ldots, n$ and initial stock $x_0 > 0$, and each player $i$ has an individual payoff function.

6.2. Pareto optimal strategies with continuous updating. According to Section 3, to determine Pareto optimal strategies in a game with continuous updating, we consider the family of auxiliary subgames $\Gamma(x, t, T)$ of duration $T$, starting at the moment $t$ from the state $x$. To define the Pareto optimal strategies $u^\alpha(t, s, x)$ in an auxiliary subgame $\Gamma(x, t, T)$, we use the dynamic programming technique. By $W^\alpha(t, s, x)$ denote the Bellman function in the subgame for the current time instant $t$ starting at $s$, defined as the value of the weighted optimization problem of Section 4 subject to the stock dynamics. The corresponding Hamilton-Jacobi-Bellman (HJB) equation is (13).

The solution of (13) is sought in the form

$$W^\alpha(t, s, x) = A(t, s) \ln x + B(t, s).$$

Its partial derivatives are given by

$$\frac{\partial W^\alpha(t, s, x)}{\partial s} = \dot{A}(t, s) \ln x + \dot{B}(t, s), \qquad \frac{\partial W^\alpha(t, s, x)}{\partial x} = \frac{A(t, s)}{x},$$

where $\dot{A}(t, s)$, $\dot{B}(t, s)$ are the derivatives with respect to $s$. Maximizing the expression on the right-hand side of (13) and using (14), we obtain the maximizers (15). Substituting (14), (15) into (13), we obtain the system of differential equations (16) for $A(t, s)$ and $B(t, s)$. Solving (16), we finally obtain the Pareto optimal strategies in the auxiliary subgame $\Gamma(x, t, T)$, together with the expression (21), which holds for $s \in [t, t+T)$.

The resulting solution shows under what conditions the Pareto optimal strategy profile with continuous updating satisfies the subgame consistency property.

Consider a numerical example with $n = 2$, $x_0 = 5$, $b_i = 2$, $T = 0.5$, $\alpha_i = 0.1, 0.2, \ldots, 1$. Figure 1 compares the trajectories in the initial game and in the game with continuous updating. Figure 2 shows the difference between the Pareto optimal strategies in the initial and continuous updating game models for $\alpha_1 = 0.664$, $\alpha_2 = 0.336$. Figure 3 shows the Pareto optimal strategy of player $i$ for different values of $\alpha$. Figure 4 shows the payoff functions (26) for different weights $\alpha$ and the Nash equilibrium payoff (27).

7. Conclusion. In this paper, the process of constructing the Pareto optimal strategy profile for a class of differential games with continuous updating is described. An important part of the paper is devoted to subgame consistency and to the construction of a subgame consistent set of Pareto optimal strategies. Pareto optimal strategies are defined using special weights $\alpha$ that reflect the bargaining process of the game. In the current model, it is supposed that the weights do not change in time, as if the players agreed once for the whole game. The resource extraction game model is used as an example, and numerical simulation results obtained in the MATLAB environment are demonstrated. In the future, we plan to consider dynamic weights and to determine when it is profitable for the players to change the weights and when it is not.
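The resource extraction example of Section 6 can be explored numerically along the following lines. The sketch rests on assumptions that are not stated in the text above: stock dynamics $\dot{x} = -\sum_i b_i u_i$, subgame payoff $\int_t^{t+T} \ln u_i \, ds$ for player $i$, and initial time $t_0 = 0$. Under these assumptions the logarithmic form $W^\alpha = A(t, s) \ln x + B(t, s)$ gives $A(t, s) = t + T - s$ and the subgame strategy $u_i^\alpha(t, s, x) = \alpha_i x / (b_i (t + T - s))$, so that the strategy with continuous updating is $u_i^\alpha(t, x) = \alpha_i x / (b_i T)$ and the trajectory is $x^*(t) = x_0 e^{-(t - t_0)/T}$. The parameter values are those of the numerical example in Section 6; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def subgame_strategy(alpha: Sequence[float], b: Sequence[float], T: float,
                     t: float, s: float, x: float) -> List[float]:
    """Extraction rates in the auxiliary subgame, for s in [t, t + T).

    Closed form under the ASSUMED model: dx/ds = -sum(b_i * u_i),
    payoff of player i = integral of ln(u_i) over [t, t + T].
    """
    remaining = t + T - s  # this is A(t, s) in W = A ln x + B
    return [a * x / (bi * remaining) for a, bi in zip(alpha, b)]


def updated_strategy(alpha: Sequence[float], b: Sequence[float], T: float,
                     t: float, x: float) -> List[float]:
    """Strategy with continuous updating: the subgame strategy taken at s = t."""
    return subgame_strategy(alpha, b, T, t, t, x)


def simulate(alpha: Sequence[float], b: Sequence[float], T: float, x0: float,
             t0: float, t_end: float, dt: float = 1e-4) -> List[Tuple[float, float]]:
    """Explicit Euler integration of the stock under the updated strategy."""
    steps = int(round((t_end - t0) / dt))
    t, x = t0, x0
    out = [(t, x)]
    for _ in range(steps):
        u = updated_strategy(alpha, b, T, t, x)
        x += dt * (-sum(bi * ui for bi, ui in zip(b, u)))
        t += dt
        out.append((t, x))
    return out


# Parameters from the numerical example; t0 = 0 and t_end = 1 are assumptions.
alpha, b, T, x0 = (0.664, 0.336), (2.0, 2.0), 0.5, 5.0
traj = simulate(alpha, b, T, x0, t0=0.0, t_end=1.0)
exact_end = x0 * math.exp(-1.0 / T)  # closed-form stock at t = 1 under the assumptions
print(traj[-1][1], exact_end)
```

Under these assumptions the stock trajectory does not depend on the weights, because $\sum_i \alpha_i = 1$. The weights only determine how the total extraction is split between the players.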