SOME PARTIALLY OBSERVED MULTI-AGENT LINEAR EXPONENTIAL QUADRATIC STOCHASTIC DIFFERENTIAL GAMES

Abstract. Some multi-agent stochastic differential games described by a stochastic linear system driven by a Brownian motion and having an exponential quadratic payoff for the agents are formulated and solved. The agents have either complete observations or partial observations of the system state. The agents act independently of one another, and the explicit optimal feedback control strategies form a Nash equilibrium. In the partially observed problem the observations are the same for all agents, which occurs in broadcast situations. The optimal control strategies and optimal payoffs are given explicitly. The method of solution for both problems requires solving neither Hamilton-Jacobi-Isaacs equations nor backward stochastic differential equations.


1. Introduction. Two-person, zero-sum stochastic differential games developed as a natural generalization of (one-person) stochastic control problems and minimax control problems. Isaacs [16] obtained nonlinear partial differential equations that determine the lower and the upper values of the game. If these two values of the game are equal, then the game is said to have a value and the two nonlinear partial differential equations become one. Two well-known general methods are available for solving stochastic differential games. The first method uses the Hamilton-Jacobi-Isaacs (HJI) equations, which can be considered as the game analogue of the Hamilton-Jacobi-Bellman equations of stochastic optimal control and the Hamilton-Jacobi equations of deterministic optimal control. A number of results (e.g. Fleming and Hernandez-Hernandez [13], Fleming and Souganidis [14]) have provided conditions for the existence of solutions of HJI equations for stochastic differential games. The second method uses the solution of backward stochastic differential equations, which generalizes the use of backward stochastic equations from stochastic optimal control [15]. This latter approach is more recent than the HJI equation approach and fewer results about these backward equations are available (e.g. Buckdahn and Li [5]). Two general references on differential games are the monographs [2] and [3].

TYRONE E. DUNCAN
Exponential quadratic cost functionals with linear stochastic systems have been considered for problems of optimal control with complete or partial observations (e.g. [4,7,17,19,21]). These problems are often called risk sensitive quadratic control. Two-person linear stochastic differential games with exponential quadratic payoffs and complete observations have been explicitly solved [9]. Many situations described by stochastic differential games require more than two agents, so it is natural to consider multi-agent linear stochastic differential games with exponential quadratic payoffs. Furthermore, it is important to study partially observed stochastic differential games. In this paper some multi-agent games are solved where the payoff is an exponential quadratic functional for each of the players and the observations are either complete or partial. The optimal control strategies for both types of problems form Nash equilibria [20]. The methods of solution for the explicit optimal control strategies do not require solutions of Hamilton-Jacobi-Isaacs equations or backward stochastic differential equations. It seems that there is a limited amount of work on linear-quadratic n-person games [1] and no work on linear-exponential-quadratic n-person games, particularly for partially observed problems.
Alternatively a sequence of stopping times (T_n, n ∈ N) could be defined where the payoff is finite for each T_n and these stopping times converge to T.
Agents with the control strategies (U_1, ..., U_{k_1}) seek to minimize the payoff J_μ and agents with the control strategies (V_1, ..., V_{k_2}) seek to maximize the payoff J_μ. Both families of agents use feedback strategies determined by (A1).
The Riccati equation for the completely observed stochastic differential game is given by (5). It is assumed that this Riccati equation has a unique, positive, symmetric solution, and the parameter μ is constrained to ensure this uniqueness of the Riccati solution.
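Since the displayed Riccati equation (5) does not survive in this excerpt, the following sketch illustrates only the generic structure of a risk-sensitive game Riccati equation in the scalar case; the equation form and all coefficient values (a, q, b, r, c, s, μ, σ, m) are illustrative assumptions, not the paper's equation (5).

```python
# Backward Euler integration of a scalar Riccati equation of the generic
# risk-sensitive game type (illustrative; not the paper's equation (5)):
#   dp/dt + 2*a*p + q - (b**2/r - c**2/s - mu*sigma**2) * p**2 = 0,  p(T) = m
a, q = -1.0, 1.0          # state drift and running state weight
b, r = 1.0, 1.0           # minimizing agents' input gain and control weight
c, s = 1.0, 2.0           # maximizing agents' input gain and control weight
mu, sigma = 0.1, 0.5      # risk parameter and noise intensity
m, T, n = 1.0, 1.0, 1000  # terminal weight, horizon, number of time steps

k = b**2 / r - c**2 / s - mu * sigma**2
# the constraint on mu keeps this coefficient positive, which preserves a
# unique, positive solution on [0, T]
assert k > 0

dt = T / n
p = m                     # integrate backward from the terminal condition
for _ in range(n):
    p += dt * (2 * a * p + q - k * p**2)

print(p)                  # p(0): stays positive and bounded on [0, T]
```

The assertion on k mirrors the role of the constraint on μ in the text: if μ is too large, the quadratic coefficient can change sign and the solution may escape in finite time.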
The solution of the Riccati equation arises naturally in this optimization problem. The natural geometric setting for the problem (1) and (3) is the Lagrangian Grassmannian, usually denoted Λ(n) for an appropriate positive integer n. It is a Grassmannian of n-planes in 2n-dimensional Euclidean space with a closed, nondegenerate two-form denoted ω. The space Λ(n) has dimension n(n+1)/2 and it can also be described as the homogeneous space U(n)/O(n), where U(n) is the group of unitary transformations on C^n = R^{2n} and O(n) is the orthogonal group on R^n. A (nonsingular) plane in Λ(n) can be described as (x, Px) where P is a symmetric (nonsingular) matrix. This description of Λ(n) shows that dim Λ(n) = n(n+1)/2, and Λ(n) is a natural setting for determining optimal control strategies from the solution of a Riccati equation and the symplectic geometry of optimization.
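The dimension count stated above follows directly from the homogeneous space description; the standard computation is included for completeness.

```latex
\dim \Lambda(n) \;=\; \dim U(n) - \dim O(n)
\;=\; n^{2} - \frac{n(n-1)}{2} \;=\; \frac{n(n+1)}{2}.
```

For example, Λ(1) = U(1)/O(1) is a circle, of dimension 1, and dim Λ(2) = 3.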
3. Completely observed stochastic differential games. Initially a completely observed multi-agent stochastic differential game with an exponential quadratic payoff is solved. The solution can be considered as a natural generalization of the control problem [7] and the two-agent problem [9] with some additions. The following proposition provides optimal feedback control strategies for the multiple agents, and these strategies are shown to form a Nash equilibrium for the game.

Proposition 1. The multi-agent stochastic differential game given by (1) and (3) with the families of admissible control strategies (A1) has a Nash equilibrium with optimal control strategies given explicitly, for i = 1, ..., k_1 and j = 1, ..., k_2, in terms of P_μ, the unique symmetric, positive solution of the Riccati equation (5). The optimal payoff is also given explicitly.

Proof. The proof is motivated by a method that is used to solve a linear exponential quadratic Gaussian control problem [7] and a two-person game [9]. In both of these publications the Brownian motions should have the dimension of the state, and the linear transformations acting on these Brownian motions appear in the stochastic equation (1); P_μ is the unique positive symmetric solution of (5). The dependence of Y on μ is suppressed for notational simplicity. Apply the change of variables formula of Ito to the process (Y(t), t ∈ [0, T]) to obtain the following equality.
Let L_μ be the quadratic functional that appears in the exponential of the payoff functional (3), so that the payoff J_μ can be expressed in terms of L_μ. The following equality is determined from the definition of L_μ and (9).
To eliminate the stochastic integral term in (12) when it and the other terms are exponentiated, the stochastic integral and the term following it in (12) (an increasing process) are identified as a Radon-Nikodym derivative that is used to define a transformation of the Wiener measure by absolute continuity. The additional term in the Riccati equation (5), compared with the Riccati equation for a linear-quadratic payoff, is used to determine the increasing process. Let P̃ be the probability measure obtained from the Radon-Nikodym derivative (13) as dP̃ = M dP, where X is the solution of (1) and the linear feedback strategies from (12) are used. The process M(·) describes a Radon-Nikodym derivative by the strong dichotomy of the absolute continuity for Gaussian measures (Feldman-Hajek theorem), so it follows that E[M] = 1 and P̃ is indeed a probability measure. To verify the optimality of the strategies determined by making the quadratic integrands of the integrals of the agents' strategies zero in (12), some variations of these strategies are made that preserve absolute continuity. It follows directly that suitably small variations of these strategies determine a solution X and a measure that is absolutely continuous with respect to Wiener measure, so that a new measure is obtained from a process that corresponds to M(·) in (13). Thus let Ū_i and V̄_j for i ∈ (1, ..., k_1) and j ∈ (1, ..., k_2) be given as perturbed strategies, where U_{i,1}(t) = α_i for t ∈ [t_0^i, t_1^i], the family α_i are F(t_0^i) measurable and bounded random variables for i = 1, ..., k_1, and similarly for V̄_j, j = 1, ..., k_2.
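The argument above hinges on the exponential functional M(·) being a true Radon-Nikodym derivative with E[M] = 1. The following Monte Carlo sketch checks this property for a toy exponential functional with a constant integrand φ; in the game the integrand is a linear feedback of the state, so the constant φ and all numerical values here are illustrative assumptions only.

```python
import math
import random

# Monte Carlo check that the exponential functional
#   M = exp( integral(phi dW) - (1/2) integral(phi**2 dt) )
# has E[M] = 1, so that dP~ = M dP defines a probability measure.
# A constant integrand phi is used purely for illustration.
random.seed(0)
phi, T, n_steps, n_paths = 0.5, 1.0, 100, 20000
dt = T / n_steps

total = 0.0
for _ in range(n_paths):
    stoch_int = 0.0                    # integral of phi dW along the path
    for _ in range(n_steps):
        stoch_int += phi * random.gauss(0.0, math.sqrt(dt))
    total += math.exp(stoch_int - 0.5 * phi**2 * T)

mean_M = total / n_paths
print(mean_M)   # should be close to 1
```

The compensating term −(1/2)∫φ² dt is exactly the increasing process mentioned in the text; without it the sample mean would drift away from 1.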
For the control strategies Ū_i and V̄_j, i ∈ (1, ..., k_1) and j ∈ (1, ..., k_2), the exponential that corresponds to the exponential (13) is also a Radon-Nikodym derivative, so it follows by comparing the two payoffs that the optimal strategies are determined by making the quadratic integrands in the agents' strategies zero; that is, the optimal strategies, for i = 1, ..., k_1 and j = 1, ..., k_2, are expressed in terms of P_μ, the unique, symmetric, positive solution of the Riccati equation (5), for μ > 0, and the inequalities are reversed for μ < 0. Clearly the optimal payoff follows.

4. Partially observed stochastic differential games. In this section some partially observed stochastic differential games are formulated and explicitly solved. The stochastic model is the linear stochastic differential equation (1) and the payoff is the exponential of the quadratic functional (3) as given in the previous section. However, in this case the players have only noisy partial observations of the state. Again the solution is obtained directly without requiring the solutions of Hamilton-Jacobi-Isaacs equations or backward stochastic differential equations. This result can be viewed as a generalization of the partially observed linear exponential quadratic control problem [4] with a more direct proof that is motivated by [11]. It is assumed that the players have only noisy partial observations of the state that are the same for all players. This situation occurs in a broadcasting model or other models where all players have access to the same information. The common observation equation is (20), where Y(t) ∈ R^p, H(t) ∈ L(R^n, R^p), and G(t) ∈ L(R^p, R^p) is invertible for each t ∈ [0, T], and P̂_μ is defined below. The appropriate estimation equation for the exponential quadratic payoff (3), often called the information filter (e.g.
[12]), is given by (22), where (K_μ(t), t ∈ [0, T]) is the unique, positive, symmetric solution of the Riccati equation (23), and μ is suitably constrained to ensure a unique, positive, symmetric solution of this Riccati equation. The dependence of Z on μ is suppressed for notational convenience. It is useful to note for some subsequent computations that (∫_0^t (dY − HZ ds), G(t), t ∈ [0, T]) is a Brownian motion with incremental covariance GG^T.
The control strategies of the multiple agents are adapted to the observation filtration. The uniqueness assumption for the Riccati equation places a restriction on μ, which is satisfied if it is also assumed that μ is constrained so that, uniformly for t ∈ [0, T], H^T(t)(G(t)G^T(t))^{-1}H(t) − μQ(t) > 0; that is, there is a constant c that does not depend on t ∈ [0, T] such that ⟨(H^T(t)(G(t)G^T(t))^{-1}H(t) − μQ(t))x, x⟩ ≥ c⟨x, x⟩ for each t ∈ [0, T] and x ∈ R^n. The estimation process Z is defined by (22), where H is the family of square integrable, G(·) progressively measurable processes on [0, T], μ > 0, and h(·) is a generic element of H, e.g. [18]. The family of admissible feedback strategies is generically assumed to satisfy
(B1) {U : [0, T] → R^i where U is adapted to the observation filtration (G(t), t ∈ [0, T]) and ∫_0^T |U|^2 dt < ∞ a.s.},
where i is a suitable positive integer, as well as the finiteness of the payoff given in (A1) to prevent indeterminate forms.
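To make the estimation step concrete, the following sketch simulates a scalar Kalman-Bucy filter, the risk-neutral (μ = 0) special case of the information filter; the paper's equations (22)-(23) carry an additional μ-dependent term that is not reproduced in this excerpt, and all parameter values below are illustrative assumptions.

```python
import math
import random

# Scalar Kalman-Bucy filter (risk-neutral special case) for
#   dX = a*X dt + sigma dW,    dY = h*X dt + g dV
# with the filter equations
#   dZ = a*Z dt + (K*h/g**2) * (dY - h*Z dt)
#   dK/dt = 2*a*K + sigma**2 - K**2 * h**2 / g**2.
random.seed(1)
a, sigma, h, g = -1.0, 1.0, 1.0, 1.0
T, n = 5.0, 5000
dt = T / n

X, Z, K = 0.0, 0.0, 0.0    # X(0) is known, so Z(0) = X(0) and K(0) = 0
for _ in range(n):
    dW = random.gauss(0.0, math.sqrt(dt))
    dV = random.gauss(0.0, math.sqrt(dt))
    dY = h * X * dt + g * dV          # common (broadcast) observation
    X += a * X * dt + sigma * dW      # state evolution
    Z += a * Z * dt + (K * h / g**2) * (dY - h * Z * dt)  # estimate
    K += dt * (2 * a * K + sigma**2 - (K * h / g) ** 2)   # error variance

print(K)   # approaches the positive root of 2*a*K + sigma**2 = K**2*h**2/g**2
```

The innovations dY − hZ dt driving Z are exactly the Brownian increments noted in the text, which is what allows every agent to base its feedback on the common estimate Z.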
Let (P̂_μ(t), t ∈ [0, T]) be the unique, positive, symmetric solution of the Riccati equation (26), where K_μ is the solution of (23), and it is assumed that μ is also chosen to ensure uniqueness for this Riccati equation, e.g. if a suitable inequality is satisfied. The following result describes explicit optimal control strategies for the agents for this partially observed problem and shows that these strategies form a Nash equilibrium.
Theorem 4.1. The partially observed, multi-agent stochastic differential game given by (1), (3), and (20) with the admissible control strategies (B1) has a Nash equilibrium given by explicit optimal control strategies for i = 1, ..., k_1 and j = 1, ..., k_2, where Z is the solution of the information filter (22) and P̂_μ is the unique symmetric, positive solution of the Riccati equation (26).
where K_μ is the solution of the filtering Riccati equation (23).
Proof. The proof for the optimal control strategies uses a refinement of a technique from the solution of the completely observed stochastic differential game given above. Initially the evaluation of the payoff is restricted to computations with the process Z instead of the process X. The G(·) progressively measurable version of the integral quadratic term of the state in the payoff functional is given in (24). This expression follows because (Z(t), t ∈ [0, T]) satisfies (22) and is a result from Gaussian measures, e.g. [18]. Now an expression for the conditional expectation of the final condition is given in (32); it contains the determinant factor |det(I − μMK_μ(T))|^{1/2}. Thus the problem is reduced to considering the process Z. Apply the Ito formula to the process ((1/2)⟨P̂_μ(t)Z(t), Z(t)⟩, t ∈ [0, T]), where (Z(t), t ∈ [0, T]) satisfies (22) and P̂_μ is the solution of the Riccati equation (26), to obtain (33). Recall that X(0) = Z(0) because X(0) is a constant vector. Let L_μ(U_1, ..., U_{k_1}, V_1, ..., V_{k_2}) be the integral terms in the quadratic functional that appear in the exponential of the payoff functional (3) with X replaced by Z. This quadratic functional, L_μ, and (33) can be combined to express the Z measurable payoff, where Ẽ is the expectation for the measure P̃ given by (38). Recall from the observation equation (20) and the likelihood function result [6] that (∫_0^t (dY − HZ dt), G(t), t ∈ [0, T]) is a Brownian motion with the incremental covariance GG^T. The fact that the exponential in (38) is a Radon-Nikodym derivative, that is, it integrates to one, follows from the strong dichotomy for the absolute continuity of Gaussian measures (Feldman-Hajek theorem).
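The determinant factor in the conditional expectation of the terminal term comes from the Gaussian formula for the expectation of an exponential quadratic. The following sketch checks the scalar analogue of that formula by Monte Carlo; the parameter values are illustrative assumptions, not values from the paper.

```python
import math
import random

# Scalar analogue of the Gaussian identity behind the determinant factor:
# for x ~ N(z, k) and mu*m*k < 1,
#   E[exp((mu/2) * m * x**2)]
#     = (1 - mu*m*k)**(-1/2) * exp(mu*m*z**2 / (2 * (1 - mu*m*k))).
random.seed(2)
mu, m, z, k, n = 0.2, 1.0, 0.3, 0.5, 200000

closed_form = ((1 - mu * m * k) ** -0.5
               * math.exp(mu * m * z**2 / (2 * (1 - mu * m * k))))

mc = sum(math.exp(0.5 * mu * m * random.gauss(z, math.sqrt(k)) ** 2)
         for _ in range(n)) / n

print(closed_form, mc)   # the two values should agree closely
```

In the matrix-valued setting of the proof, z plays the role of Z(T), k of K_μ(T), and the prefactor becomes the determinant term that appears in the optimal payoff.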
It is claimed that the strategies (U_i**, V_j**, i = 1, ..., k_1; j = 1, ..., k_2) that are defined by making the quadratic terms in the last equality of (36) zero are optimal. To verify the optimality of these strategies, let (Ũ_i, Ṽ_j, i = 1, ..., k_1; j = 1, ..., k_2) be defined as perturbed strategies, where α_i is G(t_0^i) measurable and uniformly bounded for i = 1, ..., k_1, and similarly Ṽ_{j,1} = β_j for t ∈ [t_0^j, t_1^j] for j = 1, ..., k_2. The corresponding solution Z of the information filter has a Radon-Nikodym derivative corresponding to the Radon-Nikodym derivative (38), so it is only necessary to consider the quadratic forms in the exponential terms from the last equality of (36). From these quadratic terms it follows directly that (U_i**, V_j**, i = 1, ..., k_1; j = 1, ..., k_2) are optimal control strategies and form a Nash equilibrium for μ > 0. Thus the optimal payoff is achieved by choosing the (optimal) feedback control strategies. The determinant term arises from (32). Thus the optimal control strategies are suitable linear functions of the solution of the information filter.
5. Concluding remarks. The explicit solutions of some multi-agent linear-exponential-quadratic stochastic differential games with complete or partial observations are obtained using a direct method that does not require solving either nonlinear partial differential equations (Hamilton-Jacobi-Isaacs equations) or backward stochastic differential equations. This direct method is naturally related to the method for a two-person, completely observed linear-quadratic stochastic differential game [9]. Furthermore, this direct method of verification is not limited to linear or finite dimensional stochastic differential equations e.g. [8], [10].