Optimal Control and Zero-Sum Games for Markov Chains of Mean-Field Type

We show existence of an optimal control and of a saddle point for zero-sum games associated with payoff functionals of mean-field type, when the dynamics is driven by a class of Markov chains of mean-field type.


INTRODUCTION
A Markov chain of mean-field type (also known as a nonlinear Markov chain) is a pure jump process with a discrete state space whose jump intensities depend also on the marginal law of the process. It is obtained as the limit of a system of pure jump processes with mean-field interaction, as the system size tends to infinity. The marginal law of the nonlinear process, obtained as the deterministic limit of the sequence of empirical distributions representing the states of the finite systems, satisfies a 'nonlinear' Fokker-Planck or master equation called the McKean-Vlasov equation. In a sense, it represents the law of a typical trajectory in the underlying collection of interacting jump processes. In particular, optimal control and games based on the nonlinear process dynamics give insight into the design of control and game strategies for large systems of interacting jump processes.
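In the Markovian case, where the intensities are deterministic functions λ_ij(t, µ_t) of time and of the marginal law µ_t := P ∘ x^{-1}(t), the McKean-Vlasov equation mentioned above takes the following explicit form (a standard identity, sketched here in our notation):
\[
\frac{d}{dt}\, \mu_t(j) = \sum_{i \neq j} \mu_t(i)\, \lambda_{ij}(t, \mu_t) \;-\; \mu_t(j) \sum_{i \neq j} \lambda_{ji}(t, \mu_t), \qquad j \in I, \quad \mu_0 = \xi.
\]
It is 'nonlinear' in the sense that the coefficients themselves depend on the unknown µ_t.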
This class of processes is widely used for modeling purposes in chemistry, physics, biology and economics. Nicolis and Prigogine [NP77] proposed such a class of nonlinear processes as a mean-field model of a chemical reaction with spatial diffusion. It plays the same role as nonlinear diffusions play in the study of diffusions, and more generally of Lévy processes with mean-field interaction (see Sznitman [Szn91], Jourdain et al. [JMW08] and the references therein). Mean-field models of the so-called first and second Schlögl processes [Sch72] and of the auto-catalytic process, which are widely used to model chemical reactions, provide interesting examples of Markov chains of mean-field type with unbounded jump intensities; they have been studied in depth in Dawson and Zheng [DZ91], Feng and Zhang [FZ92] and Feng [Fen94]. These nonlinear processes are obtained as limits of systems of birth-and-death processes with mean-field interaction. For applications to the spread of epidemics see e.g. Léonard [Léo90], Djehiche and Kaj [DK95], Djehiche and Schied [DS98]. For an account of existence and uniqueness of such nonlinear jump processes with bounded jump intensities we refer to Oelschläger [Oel84]; see [Léo95] for the case of unbounded jump intensities.
In this paper we give another proof of existence and uniqueness of Markov chains of mean-field type, using a fixed-point argument based on a Girsanov-type change of measure and the Csiszár-Kullback-Pinsker inequality. Moreover, we consider optimal control and zero-sum games associated with payoff functionals of mean-field type, where the nonlinear Markov chain is controlled through its jump intensities. More precisely, we consider pure jump processes x whose jump intensities at time t may depend on the whole path of x over the time interval [0, T] and on the marginal law of x(t), as long as they remain predictable. In a sense, this way of constructing a nonlinear jump process generalizes the classical thinning procedure for point processes. A similar program for controlled diffusion processes is carried out in [DH16].
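To illustrate the thinning construction alluded to above, the following minimal Python sketch simulates a pure jump process on a finite state space whose intensities may depend on the running path and on a given (frozen) flow of marginal laws. Everything here, the intensity function lam, the bound lam_max and the toy parameters, is an illustrative assumption, not an object from the paper.

import numpy as np

def simulate_thinning(lam, lam_max, x0, T, flow, rng):
    """Simulate a jump process on a finite state space by thinning.

    lam(t, path, mu, i, j): jump intensity from i to j; assumed <= lam_max.
    flow(t): marginal law mu_t (a probability vector), held fixed here.
    """
    t, x = 0.0, x0
    path = [(0.0, x0)]                       # piecewise-constant trajectory
    K = len(flow(0.0))                       # number of states
    while True:
        # candidate jump times from a dominating Poisson clock of rate lam_max*(K-1)
        t += rng.exponential(1.0 / (lam_max * (K - 1)))
        if t >= T:
            break
        mu = flow(t)
        rates = np.array([lam(t, path, mu, x, j) if j != x else 0.0
                          for j in range(K)])
        # accept the candidate with probability total_rate / (lam_max*(K-1))
        if rng.uniform() < rates.sum() / (lam_max * (K - 1)):
            x = rng.choice(K, p=rates / rates.sum())   # choose the target state
            path.append((t, x))
    return path

rng = np.random.default_rng(0)
uniform = lambda t: np.ones(3) / 3.0                   # a constant flow of marginals
lam = lambda t, path, mu, i, j: 1.0 + mu[j]            # toy mean-field intensity, <= 2
print(simulate_thinning(lam, lam_max=2.0, x0=0, T=1.0, flow=uniform, rng=rng))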
After a short section of preliminaries, we introduce in Section 3 the class of Markov chains of mean-field type and prove their existence and uniqueness under rather weak conditions on the underlying unbounded jump intensities. In Section 4, we consider the optimal control problem and prove existence of an optimal control. Finally, in Section 5, we consider a related zero-sum game and show existence of a saddle point under the so-called Isaacs condition. In both cases, the main results are derived using techniques involving Markov chain backward stochastic differential equations.
To x we associate the indicator processes I_i(t) := 1_{\{x(t)=i\}}, whose value is 1 if the chain is in state i at time t and 0 otherwise, and the counting processes N_ij(t), i ≠ j, defined by
\[
N_{ij}(t) := \#\big\{ s \in (0, t] :\ x(s^-) = i,\ x(s) = j \big\},
\]
which count the number of jumps from state i into state j during the time interval (0, t]. Since x is right-continuous with left limits, both I_i and N_ij are right-continuous with left limits. Moreover, by the relationships
\[
x(t) = \sum_{i \in I} i\, I_i(t), \qquad I_i(t) = I_i(0) + \sum_{j \neq i} \big( N_{ji}(t) - N_{ij}(t) \big),
\]
the state process, the indicator processes and the counting processes carry the same information, which is represented by the natural filtration F^0 := (F^0_t, 0 ≤ t ≤ T) of x. Below, C and c_1 denote generic positive constants which may change from line to line.
In view of e.g. Theorem 7.3 in [EK09], or Theorem 20.6 in [RW00] (for the finite state-space and time-independent case), given the Q-matrix G and a probability measure ξ over I, there exists a unique probability measure P on (Ω, F) under which the coordinate process x is a time-inhomogeneous Markov chain with intensity matrix G and initial distribution ξ, i.e. such that P ∘ x^{-1}(0) = ξ. Equivalently, P solves the martingale problem for G with initial probability distribution ξ, meaning that, for every f on I, the process defined by
\[
M_f(t) := f(x(t)) - f(x(0)) - \int_0^t \sum_{i \neq j} \big( f(j) - f(i) \big)\, g_{ij}(s)\, I_i(s)\, ds \tag{2.2}
\]
is a local martingale under P. By Lemma 21.13 in [RW00], the compensated processes associated with the counting processes N_ij, defined by
\[
M_{ij}(t) := N_{ij}(t) - \int_0^t g_{ij}(s)\, I_i(s^-)\, ds, \qquad i \neq j, \tag{2.4}
\]
are zero-mean, square-integrable and mutually orthogonal P-martingales whose predictable quadratic variations are
\[
\langle M_{ij} \rangle(t) = \int_0^t g_{ij}(s)\, I_i(s^-)\, ds.
\]
Moreover, at jump times t we have
\[
\Delta N_{ij}(t) = I_i(t^-)\, I_j(t), \qquad i \neq j. \tag{2.6}
\]
Thus, the optional variation of M is
\[
[M_{ij}](t) = \sum_{0 < s \le t} \big( \Delta M_{ij}(s) \big)^2 = N_{ij}(t).
\]
We call M := {M_ij, i ≠ j} the accompanying martingale of the counting process N := {N_ij, i ≠ j}, or of the Markov chain x.
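As a quick numerical sanity check of the compensation identity (2.4), the following hypothetical Python sketch simulates a two-state time-homogeneous chain with made-up rates and verifies that E[M_01(T)] = E[N_01(T) - g_01 ∫_0^T I_0(s) ds] is approximately zero; the rates and horizon are illustrative assumptions.

import numpy as np

def sample_M01(g01, g10, T, rng):
    """One sample of M_01(T) = N_01(T) - g01 * (time spent in state 0)."""
    t, x = 0.0, 0
    n01, occ0 = 0, 0.0                     # jump count 0->1, occupation time of state 0
    while t < T:
        rate = g01 if x == 0 else g10
        dt = rng.exponential(1.0 / rate)   # exponential holding time
        if x == 0:
            occ0 += min(dt, T - t)
        if t + dt < T:                     # jump occurs before the horizon
            if x == 0:
                n01 += 1
            x = 1 - x
        t += dt
    return n01 - g01 * occ0

rng = np.random.default_rng(1)
samples = [sample_M01(g01=2.0, g10=3.0, T=1.0, rng=rng) for _ in range(20000)]
print(np.mean(samples))                    # close to 0, as the martingale property predicts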
We denote by F := (F_t)_{0 ≤ t ≤ T} the completion of (F^0_t)_{t ≤ T} with the P-null sets of Ω. Hereafter, a process from [0, T] × Ω into a measurable space is said to be predictable (resp. progressively measurable) if it is measurable w.r.t. the predictable σ-field on [0, T] × Ω (resp. progressively measurable w.r.t. F).
For a real-valued matrix m := (m_ij, i, j ∈ I) indexed by I × I, we let
\[
|m|_g^2 := \sum_{i \neq j} |m_{ij}|^2\, g_{ij}\, I_i.
\]
If m is time-dependent, we simply write |m(t)|_g^2. Let (Z_ij, i ≠ j) be a family of predictable processes and set
\[
|Z(t)|_g^2 := \sum_{i \neq j} |Z_{ij}(t)|^2\, g_{ij}(t)\, I_i(t^-). \tag{2.9}
\]
Consider the local martingale
\[
W(t) := \sum_{i \neq j} \int_0^t Z_{ij}(s)\, dM_{ij}(s). \tag{2.11}
\]
Then the optional variation of the local martingale W is
\[
[W](t) = \sum_{i \neq j} \int_0^t |Z_{ij}(s)|^2\, dN_{ij}(s),
\]
and its compensator is
\[
\langle W \rangle(t) = \int_0^t |Z(s)|_g^2\, ds.
\]
Provided that
\[
E\Big[ \int_0^T |Z(t)|_g^2\, dt \Big] < \infty, \tag{2.14}
\]
W is a square-integrable martingale and its optional variation satisfies
\[
E\big[ [W](T) \big] = E\big[ \langle W \rangle(T) \big] = E\Big[ \int_0^T |Z(t)|_g^2\, dt \Big]. \tag{2.15}
\]
Moreover, the following Doob inequality holds:
\[
E\Big[ \sup_{t \le T} |W(t)|^2 \Big] \le 4\, E\Big[ \int_0^T |Z(t)|_g^2\, dt \Big]. \tag{2.16}
\]
If \bar Z is another predictable process that satisfies (2.14), setting
\[
\overline{W}(t) := \sum_{i \neq j} \int_0^t \bar Z_{ij}(s)\, dM_{ij}(s), \tag{2.17}
\]
and considering the martingale W − \overline{W}, it is easy to see that
\[
E\big[ W(T)\, \overline{W}(T) \big] = E\Big[ \int_0^T \sum_{i \neq j} Z_{ij}(s)\, \bar Z_{ij}(s)\, g_{ij}(s)\, I_i(s^-)\, ds \Big].
\]
Since the filtration F generated by the chain x coincides with the filtration generated by the family of counting processes {N_ij, i ≠ j}, we may state the following martingale representation theorem (see e.g. [Brè81], Theorem T11, or [RW00], IV-21, Theorem 21.15).

Theorem 2.1. Every square-integrable F-martingale m admits the representation
\[
m(t) = m(0) + \sum_{i \neq j} \int_0^t Z_{ij}(s)\, dM_{ij}(s), \qquad 0 \le t \le T,
\]
for some predictable process Z = (Z_ij, i ≠ j) satisfying (2.14).
In particular, at jump times t, we have
\[
\Delta m(t) = \sum_{i \neq j} Z_{ij}(t)\, \Delta N_{ij}(t).
\]
Next, we give an important application of Theorem 2.1 to the local martingale M_f given by (2.2), for which an explicit form of the process Z can be displayed in terms of the function f. At jump times t we have
\[
\Delta M_f(t) = f(x(t)) - f(x(t^-)) = \sum_{i \neq j} \big( f(j) - f(i) \big)\, \Delta N_{ij}(t),
\]
since, by (2.6), at a jump time t, x(t^-) = i and x(t) = j precisely when ΔN_ij(t) = 1; hence M_f corresponds to the choice Z_ij(t) = f(j) − f(i).

2.2. Probability measures on I. Let P(I) denote the set of probability measures on I. For µ, ν ∈ P(I), the total variation distance is defined by the formula
\[
d_{TV}(\mu, \nu) := \sum_{i \in I} |\mu_i - \nu_i|.
\]
Furthermore, let P_2(Ω) be the space of probability measures P on Ω such that
\[
E^P\Big[ \sup_{0 \le t \le T} |x(t)|^2 \Big] < \infty.
\]
Similarly, on the filtration F, we define the total variation metric between two probability measures P and Q as
\[
D_t(P, Q) := 2 \sup_{A \in \mathcal{F}_t} |P(A) - Q(A)|, \qquad 0 \le t \le T.
\]
It satisfies D_s(P, Q) ≤ D_t(P, Q) for s ≤ t. Endowed with the total variation metric D_T, P_2(Ω) is a complete metric space; moreover, convergence in D_T implies the usual weak convergence. For P, Q ∈ P_2(Ω) with time marginals P_t := P ∘ x^{-1}(t) and Q_t := Q ∘ x^{-1}(t), the total variation distance between P_t and Q_t satisfies
\[
d_{TV}(P_t, Q_t) \le D_t(P, Q). \tag{2.28}
\]
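As a small numerical illustration of the total variation distance, and of the Csiszár-Kullback-Pinsker inequality d_TV(µ, ν) ≤ √(2 H(µ|ν)) used in the next section, here is a hypothetical Python check on a three-point state space (the two laws are made up):

import numpy as np

mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.4, 0.4, 0.2])

d_tv = np.abs(mu - nu).sum()                   # total variation distance, as defined above
kl = np.sum(mu * np.log(mu / nu))              # relative entropy H(mu | nu)
print(d_tv, np.sqrt(2 * kl), d_tv <= np.sqrt(2 * kl))   # CKP: d_tv <= sqrt(2 * KL), prints True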

JUMP PROCESSES OF MEAN-FIELD TYPE
In this section we prove existence of a unique probability measure P on (Ω, F) under which the coordinate process x is a jump process with intensities λ_ij(t, x, P ∘ x^{-1}(t)), i, j ∈ I, where we allow the jump intensities at time t to depend on the whole path of x over the time interval [0, T] and on the marginal law of x(t), as long as the intensities are predictable. Because of the dependence of the jump intensities on the marginal law, we call x a jump process of mean-field type. If the intensities are deterministic functions of t and of the marginal law of x(t), i.e. of the form λ_ij(t, P ∘ x^{-1}(t)), i, j ∈ I, we call x a Markov chain of mean-field type, or simply a nonlinear Markov chain.
The probability measure P is constructed as follows. We start from the probability measure P that solves the martingale problem associated with G = (g_ij), where the intensities g_ij are assumed time-independent, making the coordinate process x a time-homogeneous Markov chain. Then, using a Girsanov-type change of measure in terms of a Doléans-Dade exponential martingale for jump processes, involving the intensities λ_ij and g_ij, we obtain the desired probability measure. It is also possible to choose G time-dependent, but it is easier to deal with time-independent intensities.
Remark 3.1. Assumption (A4) is needed to guarantee that the chain has a finite second moment.

Example 3.2 (A mean-field Schlögl model). In the mean-field version of the Schlögl model (cf. [DZ91], [FZ92] and [Fen94]), the birth and death intensities are polynomial in the state and depend on the marginal law of the process through its moments; in particular, they are unbounded.

Let P be the probability measure under which x is a time-homogeneous Markov chain such that P ∘ x^{-1}(0) = ξ, with Q-matrix (g_ij)_{ij} satisfying (2.1), the g_ij being time-independent. Assume further that
\[
g_{ij} > 0 \quad \text{whenever} \quad \lambda_{ij} > 0, \qquad i \neq j. \tag{3.3}
\]
We impose this condition because we are going to use a Girsanov-type change of measure between two probability measures under which the chain has jump intensities λ_ij and g_ij, respectively. This amounts to taking into account only nonzero jump intensities.
To ease notation, we set
\[
\ell^Q_{ij}(t) := \frac{\lambda_{ij}(t, x, Q_t)}{g_{ij}}, \qquad i \neq j, \quad Q_t := Q \circ x^{-1}(t). \tag{3.4}
\]
Let P^Q be the measure on (Ω, F) defined by
\[
dP^Q := L^Q_T\, dP, \tag{3.5}
\]
where
\[
L^Q_t := \Big( \prod_{0 < s \le t} \prod_{i \neq j} \big( \ell^Q_{ij}(s) \big)^{\Delta N_{ij}(s)} \Big) \exp\Big( \sum_{i \neq j} \int_0^t \big( 1 - \ell^Q_{ij}(s) \big)\, g_{ij}\, I_i(s^-)\, ds \Big) \tag{3.6}
\]
is the Doléans-Dade exponential. It is the solution of the following linear stochastic integral equation:
\[
L^Q_t = 1 + \sum_{i \neq j} \int_0^t L^Q_{s^-} \big( \ell^Q_{ij}(s) - 1 \big)\, dM_{ij}(s). \tag{3.8}
\]
By the Girsanov theorem, if L^Q is a P-martingale, then P^Q is a probability measure on (Ω, F) under which the coordinate process x is a jump process with intensity matrix λ^Q := (λ^Q_ij(t))_{i,j}, where λ^Q_ij(t) := λ_ij(t, x, Q_t), and initial distribution ξ, i.e. P^Q ∘ x^{-1}(0) = ξ. In particular, the compensated processes associated with the counting processes N_ij, defined by
\[
M^Q_{ij}(t) := N_{ij}(t) - \int_0^t \lambda^Q_{ij}(s)\, I_i(s^-)\, ds, \tag{3.9}
\]
are zero-mean, square-integrable and mutually orthogonal P^Q-martingales whose predictable quadratic variations are
\[
\langle M^Q_{ij} \rangle(t) = \int_0^t \lambda^Q_{ij}(s)\, I_i(s^-)\, ds.
\]
Using (3.8), we may write M^Q_ij in terms of M_ij as follows:
\[
M^Q_{ij}(t) = M_{ij}(t) - \int_0^t \big( \ell^Q_{ij}(s) - 1 \big)\, g_{ij}\, I_i(s^-)\, ds. \tag{3.11}
\]
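To make the change of measure concrete: between jump times, the Doléans-Dade exponential (3.6) evolves through the compensator difference, and at each jump i → j it picks up the factor ℓ_ij. The following hypothetical Python sketch computes L_T along one recorded two-state trajectory under this product formula; the reference rates g and target rates lam are made-up constants, and the snippet is an illustration, not the paper's construction.

import math

def likelihood_ratio(path, T, g, lam):
    """L_T along a piecewise-constant path given as [(t_0=0, x_0), (t_1, x_1), ...].

    g[i][j], lam[i][j]: constant reference and target intensities, i != j.
    """
    logL = 0.0
    times = [t for t, _ in path] + [T]
    states = [x for _, x in path]
    for k, i in enumerate(states):
        dt = times[k + 1] - times[k]
        # exponential part: -(sum_j (lam_ij - g_ij)) * time spent in state i
        logL -= sum(lam[i][j] - g[i][j] for j in range(len(g)) if j != i) * dt
        if k + 1 < len(states):                    # jump i -> states[k+1] at times[k+1]
            j = states[k + 1]
            logL += math.log(lam[i][j] / g[i][j])  # product part over jump times
    return math.exp(logL)

g   = [[0.0, 1.0], [1.0, 0.0]]                     # reference rates (illustrative)
lam = [[0.0, 2.0], [0.5, 0.0]]                     # target rates (illustrative)
print(likelihood_ratio([(0.0, 0), (0.4, 1), (0.9, 0)], T=1.0, g=g, lam=lam))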
Now, since L^Q is a positive P-local martingale, it is a supermartingale; thus E[L^Q_T] ≤ 1. In order to show that L^Q is a P-martingale, we need to show that E[L^Q_T] = 1. We note that the conditions (A1)-(A4) imposed on the intensity matrix λ^Q do not match the assumptions found in the literature, ranging from [Brè81], Theorem T11, to [SH15], Theorem 2.4, which guarantee that L^Q is a P-martingale.
Proposition 3.4. Under (A1)-(A4), L^Q is a P-martingale; in particular, P^Q is a probability measure on (Ω, F).

Proof. The proof is inspired by that of Proposition (A.1) in [EKH03]. As mentioned above, it suffices to prove that E[L^Q_T] = 1. For n ≥ 0, let λ^n be the predictable intensity matrix given by
\[
\lambda^n_{ij}(t) := \lambda^Q_{ij}(t)\, 1_{\{|x|_t \le n\}}, \qquad |x|_t := \sup_{s \le t} |x(s)|,
\]
let L^n be the associated Doléans-Dade exponential, and let P^n be the positive measure defined by dP^n = L^n_T dP. Noting that, for i, j ∈ I with i ≠ j, we have |i − j| ≥ 1, by (A4) we get
\[
\lambda_{ij}(t, w, \mu) \le C\Big( 1 + |w|_t + \int |y|\, \mu(dy) \Big).
\]
Thus, for every n ≥ 1, λ^n_ij(t) ≤ C(1 + n + ‖Q‖_2), i.e. λ^n_ij is bounded. In view of [Brè81], Theorem T11, L^n is then a P-martingale; in particular, E[L^n_T] = 1 and P^n is a probability measure. By (3.13), |x|_T < ∞ P-a.s. Therefore, on the set {ω : |x(ω)|_T ≤ n_0}, we have L^n_T = L^{n_0}_T for all n ≥ n_0. Denoting by E^n the expectation w.r.t. P^n, we have
\[
E^n\big[ |x|_T \big] \le C, \tag{3.18}
\]
where, by (3.12), C does not depend on n.
Let η > 0 and choose m_0 ≥ 1 such that C/m_0 < η. For all n ≥ m_0, by the Markov inequality,
\[
E\big[ L^n_T\, 1_{\{|x|_T > a\}} \big] = P^n\big( |x|_T > a \big) \le \frac{C}{a}. \tag{3.19}
\]
So there exists a_0 > 0 such that, whenever a > a_0, E[L^n_T 1_{{|x|_T > a}}] < η uniformly in n, in view of (3.18) and (3.19). Hence the family (L^n_T)_{n ≥ 0} is uniformly integrable and, since L^n_T → L^Q_T P-a.s., we obtain E[L^Q_T] = lim_n E[L^n_T] = 1. This finishes the proof, since η is arbitrary.
Next, we show that there is Q such that P^Q = Q, i.e. Q is a fixed point of the map Q ↦ P^Q. It is the probability measure under which the coordinate process is a jump process of mean-field type.
Theorem 3.5. The map
\[
\Phi : \mathcal{P}_2(\Omega) \to \mathcal{P}_2(\Omega), \qquad \Phi(Q) := P^Q,
\]
admits a unique fixed point Q, which satisfies the second-moment estimate
\[
E^Q\Big[ \sup_{t \le T} |x(t)|^2 \Big] \le C\big( 1 + \xi_2^2 \big), \tag{3.20}
\]
for some constant C > 0 depending only on T.

Proof. First, we note that if Q ∈ P_2(Ω) then, by (3.12), E^{Φ(Q)}[sup_{t ≤ T} |x(t)|^2] < ∞, which implies that Φ(Q) ∈ P_2(Ω), since ξ_2^2 < +∞ by (A5). Next, we show the contraction property of the map Φ. To this end, given Q, Q̃ ∈ P_2(Ω), we use an estimate of the total variation distance D_t(Φ(Q), Φ(Q̃)) in terms of the relative entropy H(Φ(Q)|Φ(Q̃)) between Φ(Q) and Φ(Q̃), given by the celebrated Csiszár-Kullback-Pinsker inequality:
\[
D_t^2\big( \Phi(Q), \Phi(\tilde Q) \big) \le 2\, H_t\big( \Phi(Q)\, \big|\, \Phi(\tilde Q) \big). \tag{3.21}
\]
In view of (3.6), we have
\[
H_t\big( \Phi(Q)\, \big|\, \Phi(\tilde Q) \big) = E^{\Phi(Q)}\Big[ \log \frac{L^Q_t}{L^{\tilde Q}_t} \Big].
\]
Taking expectation w.r.t. Φ(Q) and using (3.9), we obtain
\[
H_t\big( \Phi(Q)\, \big|\, \Phi(\tilde Q) \big) = E^{\Phi(Q)}\Big[ \sum_{i \neq j} \int_0^t \Big( \tau\big( \ell^Q_{ij}(s) \big) - \tau\big( \ell^{\tilde Q}_{ij}(s) \big) - \tau'\big( \ell^{\tilde Q}_{ij}(s) \big)\big( \ell^Q_{ij}(s) - \ell^{\tilde Q}_{ij}(s) \big) \Big)\, g_{ij}\, I_i(s^-)\, ds \Big], \tag{3.22}
\]
where τ(x) := x log x − x + 1, x > 0, is a convex function. We note that the r.h.s. of this last equality is non-negative since, by convexity, we have
\[
\tau(x) - \tau(y) \ge \tau'(y)(x - y),
\]
where τ'(y) = log y. Using a Taylor expansion we get
\[
\tau(x) - \tau(y) - \tau'(y)(x - y) = \tfrac{1}{2}\, \tau''(z)\, (x - y)^2
\]
for some z ∈ {tx + (1 − t)y : t ∈ (0, 1)}, where τ''(z) = 1/z. Taking x, y such that x, y ≥ c_1 > 0, as in (A2), we obtain
\[
\tau(x) - \tau(y) - \tau'(y)(x - y) \le \frac{1}{2 c_1}\, (x - y)^2.
\]
Applying this bound to the entropy (3.22) and combining with (3.21), we obtain
\[
D_t^2\big( \Phi(Q), \Phi(\tilde Q) \big) \le \frac{1}{c_1}\, E^{\Phi(Q)}\Big[ \sum_{i \neq j} \int_0^t \big( \ell^Q_{ij}(s) - \ell^{\tilde Q}_{ij}(s) \big)^2\, g_{ij}\, I_i(s^-)\, ds \Big].
\]
We may use (A3) to obtain
\[
\big( \ell^Q_{ij}(s) - \ell^{\tilde Q}_{ij}(s) \big)^2\, g_{ij} \le C\, d_{TV}^2\big( Q_s, \tilde Q_s \big) \le C\, D_s^2\big( Q, \tilde Q \big).
\]
Therefore,
\[
D_t^2\big( \Phi(Q), \Phi(\tilde Q) \big) \le C \int_0^t D_s^2\big( Q, \tilde Q \big)\, ds. \tag{3.23}
\]
Iterating this inequality, we obtain, for every N > 0,
\[
D_T^2\big( \Phi^N(Q), \Phi^N(\tilde Q) \big) \le \frac{(CT)^N}{N!}\, D_T^2\big( Q, \tilde Q \big),
\]
where Φ^N denotes the N-fold composition of the map Φ. Hence, for N large enough, Φ^N is a contraction, which implies that Φ admits a unique fixed point.
Finally, using (3.17) with Φ(Q) = Q and applying Gronwall's inequality, we obtain the estimate (3.20).

Corollary 3.6. The mapping t ↦ P ∘ x^{-1}(t) is continuous in total variation. More precisely, we have
\[
d_{TV}\big( P_t, P_s \big) \le C\, |t - s|, \qquad 0 \le s \le t \le T. \tag{3.24}
\]
Proof. The inequality (3.24) follows by applying the above estimates to the martingale (2.2) with f(x) = 1_{\{x ∈ A\}}, A ⊂ I, where we use the matrix λ instead of G.
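The contraction argument behind Theorem 3.5 suggests a simple numerical scheme: freeze a flow of marginals, solve the forward (master) equation with intensities evaluated along that flow, and iterate the resulting map Φ. Here is a hypothetical Python sketch for a toy mean-field intensity on three states; the rates, step sizes and the intensity itself are illustrative assumptions.

import numpy as np

K, T, n_steps, dt = 3, 1.0, 200, 1.0 / 200

def lam(mu, i, j):
    """Toy mean-field intensity: the jump rate i -> j increases with the mass at j."""
    return 1.0 + mu[j] if i != j else 0.0

def phi(mu_flow):
    """One application of Phi: solve the forward equation with intensities
    frozen along the input flow mu_flow (an array of shape (n_steps+1, K))."""
    out = np.zeros_like(mu_flow)
    out[0] = mu_flow[0]                                 # keep the initial law fixed
    for n in range(n_steps):
        Q = np.array([[lam(mu_flow[n], i, j) for j in range(K)] for i in range(K)])
        np.fill_diagonal(Q, -Q.sum(axis=1))             # Q-matrix rows sum to zero
        out[n + 1] = out[n] + dt * (out[n] @ Q)         # Euler step of mu' = mu Q
    return out

flow = np.tile(np.array([1.0, 0.0, 0.0]), (n_steps + 1, 1))  # initial guess, mu_0 = xi
for it in range(20):
    new_flow = phi(flow)
    gap = np.abs(new_flow - flow).sum(axis=1).max()     # sup_t total variation gap
    flow = new_flow
print("fixed-point gap after 20 iterations:", gap)      # should be close to 0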
3.1. Markov chain BSDEs. An important consequence of Theorem 2.1 is the solvability of Markov chain backward stochastic differential equations (BSDEs), defined on (Ω, F, F, P) by
\[
Y_t = \xi_T + \int_t^T f\big( s, Y_s, Z_s \big)\, ds - \sum_{i \neq j} \int_t^T Z_{ij}(s)\, dM_{ij}(s), \qquad 0 \le t \le T. \tag{3.25}
\]
It is easily seen that if (Y, Z) solves (3.25), then it admits the following representation:
\[
Y_t = E\Big[ \xi_T + \int_t^T f\big( s, Y_s, Z_s \big)\, ds\, \Big|\, \mathcal{F}_t \Big], \qquad 0 \le t \le T.
\]
Hence, we may write Y_0 = E[ξ_T + ∫_0^T f(s, Y_s, Z_s) ds]. Existence and uniqueness results for solutions of Markov chain BSDEs (3.25), based on the martingale representation theorem (the L²-theory), have been studied in a series of papers by Cohen and Elliott (see e.g. [CE12] and the references therein). Their approach essentially adapts the method for solving Brownian-motion-driven BSDEs first established in [PP90]. Recently, Confortola et al. [CFJ14] derived existence and uniqueness results for more general classes of BSDEs driven by marked point processes under only L¹-integrability conditions. In this paper we use the L²-theory, as we want to use the martingale representation theorem in our optimal control problem.
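For intuition, in the Markovian case the representation above can be computed by a backward recursion in time: Y(t, i) is propagated through the generator of the chain plus the driver. The following hypothetical Python sketch does this for a two-state chain; the rates, driver and terminal condition are illustrative assumptions, and Z is recovered from the jumps of Y, in line with the martingale representation.

import numpy as np

g = np.array([[0.0, 2.0], [3.0, 0.0]])          # reference rates (illustrative)
T, N = 1.0, 1000
dt = T / N

def driver(t, i, y, z):
    """Illustrative driver f(t, x, y, z), Lipschitz in (y, z)."""
    return -0.5 * y + 0.1 * sum(z[i][j] for j in range(2) if j != i)

h = np.array([1.0, 0.0])                        # terminal condition xi_T = h(x(T))

Y = h.copy()
for n in reversed(range(N)):
    t = n * dt
    Z = [[Y[j] - Y[i] for j in range(2)] for i in range(2)]    # Z_ij = Y(j) - Y(i)
    Ynew = np.empty(2)
    for i in range(2):
        jump = sum(g[i][j] * (Y[j] - Y[i]) for j in range(2))  # generator term
        Ynew[i] = Y[i] + (jump + driver(t, i, Y[i], Z)) * dt   # backward Euler step
    Y = Ynew
print("Y_0 as a function of the initial state:", Y)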
Below, we establish existence of an optimal control and a saddle-point for the zero-sum game using some properties of the following class of BSDEs which is a special case of (3.25).
(M_ij)_{ij} is the P-martingale given in (2.4), and the driver φ is essentially of the form
\[
\varphi(t, p) := \sum_{i \neq j} \big( \ell_{ij}(t) - 1 \big)\, g_{ij}\, I_i(t^-)\, p_{ij},
\]
where p := (p_ij, i, j ∈ I) is a real-valued matrix indexed by I × I and ℓ = (ℓ_ij, i, j ∈ I) is given by
\[
\ell_{ij}(t) := \frac{\lambda_{ij}(t, x)}{g_{ij}}, \qquad i \neq j,
\]
where the predictable process λ(t, x) = (λ_ij(t, x), i, j ∈ I) is the intensity matrix of the chain x under a probability measure P̃ on (Ω, F) given by a formula similar to (3.5)-(3.6).
In particular, as in (3.11), the processes
\[
\tilde M_{ij}(t) := M_{ij}(t) - \int_0^t \big( \ell_{ij}(s) - 1 \big)\, g_{ij}\, I_i(s^-)\, ds, \qquad i \neq j,
\]
are P̃-martingales. Moreover, φ satisfies a 'stochastic' Lipschitz condition. More precisely, we make the following assumptions on the driver φ and the terminal value ξ_T.
Using Proposition 3.7, the proof of Theorem 3.8 is similar to that for Brownian-motion-driven BSDEs in [HL95], Theorem I-3: one uses an approximation by an increasing sequence of standard Markov chain BSDEs, for which existence, uniqueness and comparison results are analogous to those for Brownian-motion-driven BSDEs derived in [PP90] and [EKPQ97], together with the properties (2.15) and (2.16) of the martingale W displayed in (2.11) and Itô's formula for semimartingales driven by counting processes. We omit the details.

OPTIMAL CONTROL OF JUMP PROCESSES OF MEAN-FIELD TYPE
Let (U, δ) be a compact metric space with its Borel σ-field B(U), and let U denote the set of F-progressively measurable processes u = (u(t), 0 ≤ t ≤ T) with values in U. We call U the set of admissible controls. In this section we consider a control problem for the jump process of mean-field type introduced above, where the control enters the jump intensities.
For u ∈ U, let P^u be the probability measure on (Ω, F) under which the coordinate process x is a jump process with intensities λ^u_ij(t) := λ_ij(t, x, P^u ∘ x^{-1}(t), u(t)), i, j ∈ I, satisfying the following assumptions, similar to (A1)-(A5).
(B4) For p = 1, 2 and for every t ∈ [0, T], w ∈ Ω, u ∈ U and µ ∈ P_2(I),
\[
\lambda_{ij}^p(t, w, \mu, u) \le C\Big( 1 + |w|_t^p + \int |y|^p\, \mu(dy) \Big).
\]
(B5) The probability measure ξ on I has finite second moment: ξ_2^2 := ∫ |y|^2 ξ(dy) < ∞.

Existence of P^u such that P^u ∘ x^{-1}(0) = ξ is derived as a fixed point of a map Φ^u, defined in the same way as in Theorem 3.5, except that the intensities λ_ij(·) further depend on u, which does not raise any major issues.
Let P be the probability measure on (Ω, F) under which x is a time-homogeneous Markov chain such that P ∘ x^{-1}(0) = ξ, with Q-matrix (g_ij)_{ij} satisfying (2.1) and (3.3). We have
\[
dP^u = L^u_T\, dP, \qquad L^u_t = 1 + \sum_{i \neq j} \int_0^t L^u_{s^-}\, \big( \ell^u_{ij}(s) - 1 \big)\, dM_{ij}(s), \tag{4.3}
\]
where ℓ^u_ij(s) := ℓ_ij(s, x, P^u ∘ x^{-1}(s), u(s)) is given by the formula
\[
\ell^u_{ij}(s) = \frac{\lambda_{ij}\big( s, x, P^u \circ x^{-1}(s), u(s) \big)}{g_{ij}},
\]
and (M_ij)_{ij} is the P-martingale given in (2.4). Moreover, in a similar way as in (3.11), the accompanying martingale M^u = (M^u_ij)_{ij} satisfies
\[
M^u_{ij}(t) = M_{ij}(t) - \int_0^t \big( \ell^u_{ij}(s) - 1 \big)\, g_{ij}\, I_i(s^-)\, ds.
\]
We first derive continuity of the map u ↦ P^u and then state the optimal control problem we want to solve.
Let E^u denote the expectation w.r.t. P^u. Using (3.20), we have, for every u ∈ U,
\[
E^u\Big[ \sup_{t \le T} |x(t)|^2 \Big] \le C. \tag{4.7}
\]
We further have the following estimate of the total variation distance between P^u and P^v.
Lemma 4.1. For every u, v ∈ U,
\[
D_T^2\big( P^u, P^v \big) \le C\, E\Big[ \int_0^T \delta^2\big( u(t), v(t) \big)\, dt \Big]. \tag{4.8}
\]
In particular, the function u ↦ P^u from U into P_2(Ω) is Lipschitz continuous: for every u, v ∈ U,
\[
D_T\big( P^u, P^v \big) \le C\, \delta(u, v). \tag{4.9}
\]
Moreover,
\[
\sup_{u \in U} E^u\Big[ \sup_{t \le T} |x(t)|^2 \Big] \le C, \tag{4.10}
\]
for some constant C > 0 that depends only on T and ξ.
Proof. A similar estimate to (3.23), combined with (B3), yields
\[
D_t^2\big( P^u, P^v \big) \le C \int_0^t \Big( D_s^2\big( P^u, P^v \big) + E\big[ \delta^2\big( u(s), v(s) \big) \big] \Big)\, ds.
\]
By (2.28) and Gronwall's inequality we finally obtain (4.8). Inequality (4.9) follows from (4.8) by letting u(t) ≡ u ∈ U and v(t) ≡ v ∈ U. It remains to show (4.10); this follows from (4.7) and the continuity of the function u ↦ P^u from the compact set U into P_2(Ω).
Let f be a measurable function from [0, T] × Ω × P_2(I) × U into R and h a measurable function from I × P_2(I) into R such that

(B6) For any u ∈ U and Q ∈ P_2(Ω), the process (φ(t, x, Q ∘ x^{-1}(t), u(t)), 0 ≤ t ≤ T) is progressively measurable and (µ, u) ↦ φ(t, w, µ, u) is continuous, for φ ∈ {f, h}.

(B7) f and h are uniformly bounded.
The cost functional J(u), u ∈ U, associated with the jump process controlled through the intensities λ_ij(t, x, P^u ∘ x^{-1}(t), u(t)) is
\[
J(u) := E^u\Big[ \int_0^T f\big( t, x, P^u \circ x^{-1}(t), u(t) \big)\, dt + h\big( x(T), P^u \circ x^{-1}(T) \big) \Big],
\]
where f and h satisfy (B5), (B6) and (B7) above.
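When f and h are Markovian, the cost of a constant control can be evaluated deterministically from the marginal flow, since J(u) = ∫_0^T Σ_i µ^u_t(i) f(t, i, µ^u_t, u) dt + Σ_i µ^u_T(i) h(i, µ^u_T). Here is a hypothetical Python sketch with illustrative intensities, running cost u² + E[x_t] and terminal cost E[x_T]; all of these data are made-up assumptions.

import numpy as np

K, T, N = 2, 1.0, 500
dt = T / N

def lam(mu, i, j, u):
    """Illustrative controlled mean-field intensity."""
    return u + mu[j] if i != j else 0.0

def cost(u, xi):
    mu = xi.copy()
    J = 0.0
    for n in range(N):
        f = u**2 + mu @ np.arange(K)          # running cost: control energy + mean state
        J += f * dt
        Q = np.array([[lam(mu, i, j, u) for j in range(K)] for i in range(K)])
        np.fill_diagonal(Q, -Q.sum(axis=1))
        mu = mu + dt * (mu @ Q)               # Euler step of the McKean-Vlasov equation
    return J + mu @ np.arange(K)              # terminal cost: mean of x(T)

xi = np.array([1.0, 0.0])
print({u: round(cost(u, xi), 4) for u in (0.5, 1.0, 2.0)})   # compare a few controls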
The optimal control problem is to minimize J over U, i.e. to find û ∈ U such that
\[
J(\hat u) = \inf_{u \in \mathcal{U}} J(u). \tag{4.13}
\]
Any û ∈ U satisfying (4.13) is called an optimal control. The corresponding optimal dynamics is given by the probability measure P̂ on (Ω, F) defined by
\[
d\hat P = L^{\hat u}_T\, dP, \tag{4.14}
\]
where L^{û} is given by the same expression as (4.3), and under which the coordinate process x is a jump process with intensities λ_ij(t, x, P̂ ∘ x^{-1}(t), û(t)). We want to prove existence of such an optimal control and characterize the optimal cost J(û).
Next, we show that the cost functional J(u), u ∈ U, can be expressed by means of the solution of a linear BSDE. For (t, w, µ, u) ∈ [0, T] × Ω × P_2(I) × U and matrices p = (p_ij) with real-valued entries, define the Hamiltonian
\[
H(t, w, \mu, p, u) := \sum_{i \neq j} \big( \ell_{ij}(t, w, \mu, u) - 1 \big)\, g_{ij}\, I_i(t^-)\, p_{ij} + f(t, w, \mu, u),
\]
which is linear in p.
Proposition 4.2. For every u ∈ U, the BSDE
\[
Y^u_t = h\big( x(T), P^u \circ x^{-1}(T) \big) + \int_t^T H\big( s, x, P^u \circ x^{-1}(s), Z^u_s, u(s) \big)\, ds - \sum_{i \neq j} \int_t^T Z^u_{ij}(s)\, dM_{ij}(s) \tag{4.18}
\]
admits a solution (Y^u, Z^u) consisting of an F-adapted process Y^u which is right-continuous with left limits and a predictable process Z^u, satisfying
\[
E\Big[ \sup_{t \le T} |Y^u_t|^2 + \int_0^T |Z^u(t)|_g^2\, dt \Big] < \infty. \tag{4.19}
\]
This solution is unique up to indistinguishability for Y^u and up to equality dP × g_ij I_i(s^-) ds-almost everywhere for Z^u. Moreover, Y^u_0 = J(u).
Proof. Since the Hamiltonian H(t, w, µ, p, u) is linear in p, by Theorem 3.8, existence and uniqueness of the solution of the BSDE (4.18) satisfying (4.19) follow from (4.17), the boundedness of h(x(T), P^u ∘ x^{-1}(T)) and the boundedness of H(t, x, P^u ∘ x^{-1}(t), 0, u(t)), both of which follow from (B7). It remains to show that Y^u_0 = J(u). Indeed, in terms of the (F, P^u)-martingales
\[
M^u_{ij}(t) = M_{ij}(t) - \int_0^t \big( \ell^u_{ij}(s) - 1 \big)\, I_i(s^-)\, g_{ij}\, ds,
\]
the process (Y^u, Z^u) satisfies, for 0 ≤ t ≤ T,
\[
Y^u_t = h\big( x(T), P^u \circ x^{-1}(T) \big) + \int_t^T f\big( s, x, P^u \circ x^{-1}(s), u(s) \big)\, ds - \sum_{i \neq j} \int_t^T Z^u_{ij}(s)\, dM^u_{ij}(s).
\]
Therefore,
\[
Y^u_t = E^u\Big[ h\big( x(T), P^u \circ x^{-1}(T) \big) + \int_t^T f\big( s, x, P^u \circ x^{-1}(s), u(s) \big)\, ds\, \Big|\, \mathcal{F}_t \Big].
\]
In particular, Y^u_0 = J(u).

4.1. Existence of an optimal control. In the remaining part of this section we want to find û ∈ U such that û = arg min_{u ∈ U} J(u). A way to find such an optimal control is to proceed as in Proposition 4.2 and introduce a linear BSDE whose solution Y* satisfies Y*_0 = inf_{u ∈ U} J(u). Then, by comparison (cf. Proposition 3.7), the problem reduces to minimizing the corresponding Hamiltonian w.r.t. the control u.
We may use (2.28), (4.9) and (4.16) to see that the function u ↦ H(t, w, P^u ∘ w^{-1}(t), p, u) is continuous on the compact set U. Thus, for each (t, w, p),
\[
H^*(t, w, p) := \inf_{u \in U} H\big( t, w, P^u \circ w^{-1}(t), p, u \big)
\]
is finite. Moreover, the set of minima
\[
U^*(t, w, p) := \Big\{ u \in U :\ H\big( t, w, P^u \circ w^{-1}(t), p, u \big) = H^*(t, w, p) \Big\}
\]
is not empty.
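Since U is compact and u ↦ H is continuous, H* and a minimizer can be approximated by simply discretizing U. Here is a hypothetical Python sketch with an illustrative Hamiltonian that is linear in p, as in our setting; the intensity, cost and grid are made-up assumptions.

import numpy as np

U_grid = np.linspace(0.0, 1.0, 101)            # discretization of the compact set U

def hamiltonian(t, i, mu, p, u):
    """Illustrative Hamiltonian, linear in p:
    H = sum_j lam_ij(t, mu, u) * p_ij + f(t, i, mu, u)."""
    lam = lambda j: 1.0 + u + mu[j]            # toy controlled intensity
    f = u**2 + i                               # toy running cost
    return sum(lam(j) * p[i][j] for j in range(len(mu)) if j != i) + f

def H_star(t, i, mu, p):
    values = [hamiltonian(t, i, mu, p, u) for u in U_grid]
    k = int(np.argmin(values))
    return values[k], U_grid[k]                # (minimal value, minimizing control u*)

mu = np.array([0.5, 0.5])
p = [[0.0, -1.0], [2.0, 0.0]]
print(H_star(0.0, 0, mu, p))                   # here u* balances -p against the u^2 cost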
Furthermore, by (2.28), (4.9) and (B6), the function u ↦ h(w_T, P^u ∘ w^{-1}(T)) is continuous. Therefore, g*(w_T) := inf_{u ∈ U} h(w_T, P^u ∘ w^{-1}(T)) is finite and the set of minima of h is not empty.
We have the following

Lemma 4.3. For every (t, w) ∈ [0, T] × Ω, the function p ↦ H*(t, w, p) from the set of I × I matrices with real entries into R is Lipschitz continuous. More precisely, we have
\[
\big| H^*(t, w, p) - H^*(t, w, p') \big| \le C\, |p - p'|_g(t). \tag{4.22}
\]
Proof. By (4.17), we have, for every u ∈ U,
\[
\big| H\big( t, w, P^u \circ w^{-1}(t), p, u \big) - H\big( t, w, P^u \circ w^{-1}(t), p', u \big) \big| \le C\big( 1 + K_T \big)\, |p - p'|_g(t),
\]
where, by the continuity of u ↦ P^u, K_T := sup_{u ∈ U} ‖P^u‖_2 is finite.
Combining (4.23) and (4.24), it is easily seen that there exists a progressively measurable function u*, obtained by a measurable selection (Beneš' selection theorem [Ben71]), defined by
\[
u^*(t, w, p) \in U^*(t, w, p), \qquad H\big( t, w, P^{u^*(t, w, p)} \circ w^{-1}(t), p, u^*(t, w, p) \big) = H^*(t, w, p). \tag{4.25}
\]
In the next theorem we characterize the set of optimal controls associated with (4.13).
Theorem 4.4. The BSDE
\[
Y^*_t = g^*\big( x(T) \big) + \int_t^T H^*\big( s, x, Z^*_s \big)\, ds - \sum_{i \neq j} \int_t^T Z^*_{ij}(s)\, dM_{ij}(s) \tag{4.27}
\]
admits a solution (Y*, Z*) consisting of an F-adapted process Y* which is right-continuous with left limits and a predictable process Z*, satisfying
\[
E\Big[ \sup_{t \le T} |Y^*_t|^2 + \int_0^T |Z^*(t)|_g^2\, dt \Big] < \infty. \tag{4.28}
\]
This solution is unique up to indistinguishability for Y* and up to equality dP × g_ij I_i(s^-) ds-almost everywhere for Z*.
Moreover,
\[
Y^*_0 = \inf_{u \in \mathcal{U}} Y^u_0 = J(u^*),
\]
where Y^u is the solution of the BSDE (4.18), and the process u*_t := u*(t, x, Z*_t), given by (4.25), is an optimal control for the problem (4.13).

Proof. Existence and uniqueness of the solution (Y*, Z*) of (4.27) satisfying (4.28) follow from the estimates (4.22) and (3.13), the boundedness of the function H*(t, x, 0), which follows from (B7), and the boundedness of g*(x(T)), which also follows from (B7). Furthermore, since g*(x(T)) ≤ h(x(T), P^u ∘ x^{-1}(T)) and H*(t, x, p) ≤ H(t, x, P^u ∘ x^{-1}(t), p, u(t)), Theorem 3.8 yields Y*_t ≤ Y^u_t a.s. for every u ∈ U. Hence Y*_t ≤ ess inf_{u ∈ U} Y^u_t a.s. In view of (4.25), (4.26) and the uniqueness of the solution of (4.27), we have Y* = Y^{u*}; in particular, Y*_0 = Y^{u*}_0 = J(u*) = inf_{u ∈ U} J(u). This finishes the proof of the theorem.

THE TWO-PLAYER ZERO-SUM GAME PROBLEM
In this section we consider a two-player zero-sum game. Let U (resp. V) be the set of admissible U-valued (resp. V-valued) control processes for the first (resp. second) player, where (U, δ_1) and (V, δ_2) are compact metric spaces.
The distance δ((u, v), (u', v')) := δ_1(u, u') + δ_2(v, v') defines a metric on the compact space U × V.
As in the previous section, let P be the probability measure on (Ω, F ) under which x is a time-homogeneous Markov chain such that P • x −1 (0) = ξ and with Q-matrix (g ij ) ij satisfying (2.1) and (3.3).
(C9) f and h are uniformly bounded.

The performance functional J(u, v), (u, v) ∈ U × V, associated with the controlled Markov chain is
\[
J(u, v) := E^{u, v}\Big[ \int_0^T f\big( t, x, P^{u, v} \circ x^{-1}(t), u(t), v(t) \big)\, dt + h\big( x(T), P^{u, v} \circ x^{-1}(T) \big) \Big]. \tag{5.5}
\]
The zero-sum game we consider is between two players, where the first player (with control u) wants to minimize the payoff (5.5), while the second player (with control v) wants to maximize it. Solving the zero-sum game boils down to showing existence of a saddle point for the game, i.e. to showing existence of a pair (û, v̂) of strategies such that
\[
J(\hat u, v) \le J(\hat u, \hat v) \le J(u, \hat v) \tag{5.6}
\]
for each (u, v) ∈ U × V. The corresponding optimal dynamics is given by the probability measure P̂ on (Ω, F) defined by
\[
d\hat P = L^{\hat u, \hat v}_T\, dP, \tag{5.7}
\]
under which the chain has intensity λ^{û, v̂}. For (t, w, µ, u, v) ∈ [0, T] × Ω × P_2(I) × U × V and matrices p = (p_ij) with real-valued entries, we introduce the Hamiltonian associated with the performance functional (5.5):
\[
H(t, w, \mu, p, u, v) := \sum_{i \neq j} \big( \ell_{ij}(t, w, \mu, u, v) - 1 \big)\, g_{ij}\, I_i(t^-)\, p_{ij} + f(t, w, \mu, u, v). \tag{5.8}
\]
In a similar way as for (4.16) and (4.17), whenever |p|_g(t) and |p'|_g(t) are finite, the Hamiltonian H satisfies
\[
\big| H(t, w, \mu, p, u, v) - H(t, w, \mu, p', u, v) \big| \le C\, |p - p'|_g(t). \tag{5.11}
\]
In view of (2.28), (4.9), (5.9) and (C7), the functions
\[
(u, v) \mapsto H\big( t, w, P^{u, v} \circ w^{-1}(t), p, u, v \big), \qquad (u, v) \mapsto h\big( w_T, P^{u, v} \circ w^{-1}(T) \big)
\]
are continuous on the compact set U × V. Thus, both sides of (5.11) are finite.
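For reference, Isaacs' condition invoked below is the standard requirement that the lower and upper values of the pointwise game on the Hamiltonian coincide; written as a sketch in our notation, it reads
\[
\inf_{u \in U}\, \sup_{v \in V}\, H\big( t, w, \mu, p, u, v \big) = \sup_{v \in V}\, \inf_{u \in U}\, H\big( t, w, \mu, p, u, v \big) \qquad \text{for all } (t, w, \mu, p).
\]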
Using Beneš' selection theorem ([Ben71]), with a construction similar to the one that yielded (4.25), Isaacs' condition is equivalent to the existence of a progressively measurable pair (u*, v*) = (u*(t, w, p), v*(t, w, p)) such that, for every (t, w, p) and every (u, v) ∈ U × V,
\[
H\big( t, w, P^{u^*, v} \circ w^{-1}(t), p, u^*, v \big) \le H\big( t, w, P^{u^*, v^*} \circ w^{-1}(t), p, u^*, v^* \big) \le H\big( t, w, P^{u, v^*} \circ w^{-1}(t), p, u, v^* \big). \tag{S}
\]

Existence of a saddle-point for the zero-sum game.
In the remaining part of this section, we want to find (û, v̂) ∈ U × V satisfying the relation (5.6). A way to find such a saddle point is to proceed as in Proposition 4.2 and introduce a linear BSDE associated with each of the Hamiltonians and terminal values involved in the inequality (S). Then, by the comparison theorem for BSDEs (cf. Proposition 3.7), we show that (u*, v*) satisfies (5.6), i.e. it is a saddle point for the game.
We start by showing that the performance functional J(u, v), (u, v) ∈ U × V, can be expressed by means of solutions of a linear BSDE.
Proposition 5.2. For every (u, v) ∈ U × V, the BSDE
\[
Y^{u, v}_t = h\big( x(T), P^{u, v} \circ x^{-1}(T) \big) + \int_t^T H\big( s, x, P^{u, v} \circ x^{-1}(s), Z^{u, v}_s, u(s), v(s) \big)\, ds - \sum_{i \neq j} \int_t^T Z^{u, v}_{ij}(s)\, dM_{ij}(s) \tag{5.13}
\]
admits a solution (Y^{u,v}, Z^{u,v}) consisting of an F-adapted process Y^{u,v} which is right-continuous with left limits and a predictable process Z^{u,v}, satisfying
\[
E\Big[ \sup_{t \le T} |Y^{u, v}_t|^2 + \int_0^T |Z^{u, v}(t)|_g^2\, dt \Big] < \infty. \tag{5.14}
\]
This solution is unique up to indistinguishability for Y^{u,v} and up to equality dP × g_ij I_i(s^-) ds-almost everywhere for Z^{u,v}. Moreover, Y^{u,v}_0 = J(u, v).

Proof. Since the Hamiltonian H(t, w, µ, p, u, v) is linear in p, existence and uniqueness of the solution of the BSDE (5.13) satisfying (5.14) follow from (5.10) and (3.13), together with the boundedness of H(t, x, P^{u,v} ∘ x^{-1}(t), 0, u(t), v(t)), which follows from (C7) and (C8), and the boundedness of h. The remaining part of the proof is similar to that of Proposition 4.2.
We have the following lemma, whose proof is similar to that of Lemma 4.3 and is therefore omitted.
Theorem 5.4. The BSDE
\[
\hat Y_t = \hat h\big( x(T) \big) + \int_t^T \hat H\big( s, x, \hat Z_s \big)\, ds - \sum_{i \neq j} \int_t^T \hat Z_{ij}(s)\, dM_{ij}(s), \tag{5.15}
\]
where Ĥ(t, w, p) := H(t, w, P^{u*, v*} ∘ w^{-1}(t), p, u*(t, w, p), v*(t, w, p)) and ĥ(x(T)) := h(x(T), P^{u*, v*} ∘ x^{-1}(T)), admits a solution (Ŷ, Ẑ) consisting of an F-adapted process Ŷ which is right-continuous with left limits and a predictable process Ẑ, satisfying
\[
E\Big[ \sup_{t \le T} |\hat Y_t|^2 + \int_0^T |\hat Z(t)|_g^2\, dt \Big] < \infty. \tag{5.16}
\]
This solution is unique up to indistinguishability for Ŷ and up to equality dP × g_ij I_i(s^-) ds-almost everywhere for Ẑ.
Furthermore, the pair of processes (û, v̂), given by û(t) := u*(t, x, Ẑ_t) and v̂(t) := v*(t, x, Ẑ_t), is a saddle point for the zero-sum game associated with (5.5).
Proof. Existence and uniqueness of the solution (Ŷ, Ẑ) of (5.15) satisfying (5.16) follow from Lemma 5.3, the boundedness of Ĥ(t, x, 0), which follows from (C7) and (C8), and the boundedness of ĥ.