Discrete Mean Field Games: Existence of Equilibria and Convergence

We consider mean field games with discrete state spaces (called discrete mean field games in the following) and we analyze these games in continuous and discrete time, over finite as well as infinite time horizons. We prove the existence of a mean field equilibrium assuming continuity of the cost and of the drift. These conditions are more general than those assumed in existing papers on finite state space mean field games. Besides, we also study the convergence of the equilibria of N-player games to mean field equilibria in our four settings. On the one hand, we define a class of strategies in which any sequence of equilibria of the finite games converges weakly to a mean field equilibrium when the number of players goes to infinity. On the other hand, we exhibit equilibria outside this class that do not converge to mean field equilibria and for which the value of the game does not converge. In discrete time, this non-convergence phenomenon implies that the Folk theorem does not scale to the mean field limit.


Introduction
Mean field games were introduced by Lasry and Lions [34] as well as Huang, Caines and Malhamé [30] to model interactions between a large number of strategic agents (players) and have been highly successful ever since. Since the seminal works [32,33,34,30], a large variety of papers have investigated mean field games. Most of the literature concerns continuous state spaces and describes a mean field game as a coupling between a Hamilton-Jacobi-Bellman equation and a Fokker-Planck equation (see for example [28,7,9,24,10,25,22,23,3]). Here, we are interested in mean field games with a finite number of states and a finite number of actions per player. In this case, the analog of the Hamilton-Jacobi-Bellman equation is the Bellman equation, and the discrete version of the Fokker-Planck equation is the Kolmogorov equation.
Finite state space mean field games in discrete time (a.k.a. with synchronous players) were previously studied in [20]. In their work, the strategy of the players is the probability matrix of the Kolmogorov equation. This implies that each player can choose her dynamics independently of the state of the others: the behavior of players is only coupled via their costs. In that case, the Kolmogorov equation becomes linear.
Finite state space mean field games in continuous time (a.k.a. with asynchronous players) have also been analyzed in [21,27,5,13]. In these models, the players also completely control the transition rate matrix, so that the dynamics are again linear once the actions of the players are given. Again, players do not interact with each other directly in these models, but only through their costs.
The models we study here, in both the synchronous and asynchronous cases, cover non-linear dynamics: we consider that the players do not have the power to choose the rate matrix and that their actions only have a limited effect on their state. Here, the transition rate matrix may depend not only on the actions taken by the player, but also on the population distribution of the system. This introduces an explicit interaction between the players (and not just through their costs). These non-linear dynamics are called the relaxed case in [14]. We claim that the model with explicit interactions covers several natural phenomena, such as information/infection propagation or resource congestion, where not only the cost but also the state dynamics of a player depend on the states of all the others. This type of behavior is classical in systems with a large number of interacting objects [6] and cannot be handled using previous mean field game models. For instance, in the classical SIR (Susceptible, Infected, Recovered) infection model [39], the rate of infection of one individual depends on the proportion of individuals already infected. Similarly, in a model of congestion, a player typically cannot use a resource that is already used to full capacity.
We show that the only requirement needed to guarantee the existence of a mean field equilibrium in mixed strategies is that the cost is continuous with respect to the population distribution (convexity is not needed). This result nicely mimics the conditions for the existence of a Nash equilibrium in the simpler case of static population games (see [36]). The existence of a mean field equilibrium in mixed strategies has previously been shown in [31,12] in the diffusion case. In [27], the existence of a mean field equilibrium is proven under the assumption that the cost of a player is strictly convex w.r.t. her strategy, and in [21] the authors also consider uniformly convex functions. These conditions are rather strong because they are not satisfied in the important case of linear and/or expected costs. In [14], the existence of a Nash equilibrium is also proved under mere continuity assumptions and with a compact action space (more general than the simplex used here). However, the main difference between the two approaches is the type of mean field limit that is used. In [14], the trajectories of the states of the players are considered, while we only consider the state at time t. The first approach uses arguments in line with propagation of chaos, while the second one is closer to the work in [4,38]. While the convergence of trajectories is in general a finer notion than point-wise convergence, it is not needed here. Indeed, for mean field games, costs are associated with states and actions and not with trajectories. Therefore, the point-wise mean field approach is sufficient. Another difference with [14] is that [14] needs an additional assumption on the uniqueness of the argmin in some parts of the convergence proof, as well as for existence (in the feedback case). This assumption is not needed here, so the two papers do not cover exactly the same set of games.
As in most existence proofs, our proof is based on a version of Kakutani's fixed point theorem in infinite dimension (see for example [13], where such an extended version of the fixed point theorem is used in a mean field game model with minor and major players). Here, however, we do not consider the best response operator but the evolution of the population distribution instead, as in [14]. Out of the four cases (asynchronous/synchronous, finite/infinite horizons), we mainly detail the asynchronous case, for which we prove the existence of a mean field equilibrium over an infinite horizon with discounted costs. We also show, more briefly, how these results extend to a finite horizon, and to a finite or infinite horizon in the synchronous case.
Our second contribution concerns the convergence of finite games to mean field limits. Different authors have studied the convergence of N-player game equilibria to mean field equilibria, e.g. [29,1,37,38]. The type of strategies considered in these papers differs from ours: they consider that the strategy of a player only depends on her internal state (these are called stationary policies in [38]), whereas here we allow time dependence in these policies. The model in [38] does include state dynamics that depend on the population distribution, but it only considers stationary strategies that do not depend on time and hence cannot depend on the population dynamics.
In all four combinations (finite/infinite horizon, synchronous/asynchronous), a mean field equilibrium is always an ε-approximation of an equilibrium of a corresponding game with a finite number N of players, where ε goes to 0 when N goes to infinity. This is the discrete counterpart of similar results for continuous games [11]. However, we also show that not all equilibria of the finite version converge to a Nash equilibrium of the mean field limit of the game. We provide several counterexamples to illustrate this fact. They are all based on the following idea: the "tit for tat" principle allows one to define many equilibria in repeated games with N players. However, when the number of players is infinite, the deviation of a single player is not visible to the population, which therefore cannot punish her in retaliation for her deviation. This implies that while games with N players may have many equilibria, as stated by the folk theorem, this may not be the case for the limit game. This fact is well known for large repeated games (see examples of anti-folk theorems in [35,2]). However, to our knowledge, these results have not yet been investigated in the mean field game framework.
Finally, our four models of dynamic games face neither the issue of the order of play nor that of partial information. Thus, we avoid two difficulties of dynamic games: the information structure of each player and the existence of a value [15]. In our case, all players are similar, so the order of play is irrelevant, and we only consider the full information case: players know the strategy of the other players and the current global state (more details on this are given in Section 3.2).
The rest of the article is organized as follows. We introduce mean field games with explicit interactions in continuous time in Section 2, where we mainly focus on the infinite horizon with discounted costs. We describe the evolution of the state of the players, the cost function, and the best response operator. In both cases (finite and infinite horizon), we prove the existence of an equilibrium. We show in Section 3 that this equilibrium is an approximation of an equilibrium of the game with a finite number of players. We then study an example of an N-player game inspired by the prisoner's dilemma whose equilibria are not always equilibria of the limit mean field game. We focus on the synchronous case (where all players play at the same time) in Section 5. In this case, N-player games can be seen as classical stochastic games in discrete time. We derive the mean field limit dynamics and the existence of an equilibrium. Here, counterexamples of equilibria of finite games that do not carry over to the limit are easier to find. Indeed, the folk theorem applies, and equilibria based on retaliation cannot be equilibria in the limit.

Notations and Definitions
A discrete mean field game $G$ is a tuple $G = (E, A, \{Q_a\}, m_0, \{c_a\}, \beta)$, where $E$ is the state space, $A$ the action set, $\{Q_a\}$ the transition rate matrices, $m_0$ the initial state (population distribution), $\{c_a\}$ the cost functions and $\beta > 0$ the discount factor.
The game is described as follows.
State and action sets. We consider a population made of an infinite number of homogeneous players that evolve in continuous time. Each player has a finite state space denoted by $E = \{1, \dots, E\}$ and a finite action set $A = \{1, \dots, A\}$.
We denote by $P(A)$ (resp. $P(E)$) the set of probability measures over $A$ (resp. $E$). Since $A$ is finite, $P(A)$ is the probability simplex of $\mathbb{R}^A$.

Set of strategies.
A mixed strategy (or strategy for short) is a measurable function $\pi : E \times \mathbb{R}_+ \to P(A)$ that associates with each state $i \in E$ and each time $t \ge 0$ a probability measure $\pi_i(t) \in P(A)$ on the set of possible actions. We also denote by $\pi_{i,a}(t)$ the probability that, at time $t$, a player in state $i$ takes action $a$ under strategy $\pi$. For all $t \ge 0$ and all $i \in E$, we have $\sum_{a\in A} \pi_{i,a}(t) = 1$. The set of all possible strategies is denoted by $S$.
We say that a strategy is pure if, for every state $i$ and every $t \ge 0$, there exists an action $a \in A$ such that $\pi_{i,a}(t) = 1$ and $\pi_{i,a'}(t) = 0$ for all $a' \neq a$.
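The set $S$ is a bounded subset of the Hilbert space of functions $E \times \mathbb{R}_+ \to \mathbb{R}^A$ equipped with the exponentially weighted inner product $\langle f, g \rangle = \int_0^\infty \sum_{i\in E}\sum_{a\in A} f_{i,a}(t)\, g_{i,a}(t)\, e^{-\beta t}\, dt$. Since $S$ is also convex and closed, it is weakly compact, where the weak topology is defined as follows: a sequence of strategies $\pi^n$ converges to a strategy $\pi$ if, for any bounded function $g$,
$$\lim_{n\to\infty} \int_0^\infty \sum_{i\in E}\sum_{a\in A} \pi^n_{i,a}(t)\, g_{i,a}(t)\, e^{-\beta t}\, dt = \int_0^\infty \sum_{i\in E}\sum_{a\in A} \pi_{i,a}(t)\, g_{i,a}(t)\, e^{-\beta t}\, dt. \quad (1)$$
Rate matrices. We denote by $m^\pi(t) \in P(E)$ the population distribution at time $t$.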
As the state space is finite, $m^\pi(t)$ is a vector whose $i$-th component, $m^\pi_i(t)$, is the proportion of players in state $i$ at time $t$. The evolution over time of the population distribution is driven by the rate matrices $\{Q_a(m^\pi(t))\}_{a\in A}$. By definition, $Q_{ija}(m^\pi(t))$ is the rate at which a player in state $i$ moves to state $j$ when choosing action $a$, when the population distribution is $m^\pi(t)$. Note that, by definition, $\sum_{j\in E} Q_{ija}(m^\pi(t)) = 0$ for all $i$ and $a$, and $Q_{ija}(m^\pi(t))$ is non-negative for all $j \neq i$ and all $a$.
In the following, we assume that for all $i, j, a$, $Q_{ija}(m)$ is Lipschitz-continuous in $m$ with constant $L$.
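The initial condition is $m^\pi(0) = m_0$. For $t \ge 0$, the population distribution $m^\pi(t)$ is the solution of the following differential equation, which depends on the strategy $\pi$:
$$\frac{d}{dt} m^\pi_j(t) = \sum_{i\in E}\sum_{a\in A} m^\pi_i(t)\, \pi_{i,a}(t)\, Q_{ija}(m^\pi(t)), \quad j \in E. \quad (2)$$
The rationale behind this differential equation is that a fraction $\pi_{i,a}(t)$ of the players in state $i$ use action $a \in A$ and move to state $j$ with rate $Q_{ija}(m^\pi(t))$.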
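If the strategy $\pi_i(t)$ is not continuous in time, the differential equation (2) may not be well-posed at the time points where $\pi_i$ is discontinuous. The existence of a continuous solution of (2) is guaranteed by Carathéodory's existence theorem. The Lipschitz condition on $Q$ further implies that this solution is unique, because any solution of (2) must be a fixed point of the integral equation
$$m^\pi_j(t) = m_{0,j} + \int_0^t \sum_{i\in E}\sum_{a\in A} m^\pi_i(s)\, \pi_{i,a}(s)\, Q_{ija}(m^\pi(s))\, ds, \quad j \in E, \quad (3)$$
whose right-hand side is Lipschitz in $m^\pi$ uniformly in time, so that uniqueness follows from Grönwall's lemma. The same properties (existence and uniqueness of the solution) also hold for the differential equation (4) introduced below.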
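The population dynamics (2) can be integrated numerically. The following Python sketch uses an explicit Euler scheme on a hypothetical two-state infection model (state 0 susceptible, state 1 infected, action 1 "protect" halves the infection rate); the names `sis_rates` and `kolmogorov_euler` and all numerical values are illustrative assumptions, not part of the model above. Because each row of $Q_a(m)$ sums to zero, the scheme conserves total mass up to rounding.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def sis_rates(m: Sequence[float], a: int) -> Matrix:
    """Hypothetical rate matrix Q_a(m) on E = {0: susceptible, 1: infected}.

    The infection rate is proportional to the infected fraction m[1] (an explicit
    interaction); action 1 ("protect") halves it. Each row sums to zero.
    """
    infection = (2.0 if a == 0 else 1.0) * m[1]
    recovery = 1.0
    return [[-infection, infection], [recovery, -recovery]]


def kolmogorov_euler(
    m0: Sequence[float],
    policy: Callable[[int, float], Sequence[float]],
    rates: Callable[[Sequence[float], int], Matrix],
    n_actions: int,
    T: float,
    dt: float,
) -> List[Tuple[float, List[float]]]:
    """Explicit Euler scheme for dm_j/dt = sum_i sum_a m_i pi_{i,a}(t) Q_{ija}(m)."""
    m = list(m0)
    n_states = len(m)
    trajectory = [(0.0, list(m))]
    for k in range(int(round(T / dt))):
        t = k * dt
        drift = [0.0] * n_states
        for a in range(n_actions):
            Qa = rates(m, a)
            for i in range(n_states):
                weight = m[i] * policy(i, t)[a]  # mass in state i that uses action a
                for j in range(n_states):
                    drift[j] += weight * Qa[i][j]
        m = [m[j] + dt * drift[j] for j in range(n_states)]
        trajectory.append(((k + 1) * dt, list(m)))
    return trajectory


def never_protect(i: int, t: float) -> Sequence[float]:
    return (1.0, 0.0)


def always_protect(i: int, t: float) -> Sequence[float]:
    return (0.0, 1.0)


traj0 = kolmogorov_euler([0.9, 0.1], never_protect, sis_rates, 2, T=30.0, dt=0.01)
traj1 = kolmogorov_euler([0.9, 0.1], always_protect, sis_rates, 2, T=30.0, dt=0.01)
print(traj0[-1][1], traj1[-1][1])
```

With these rates, never protecting gives $\dot m_1 = m_1(1 - 2m_1)$, whose stable point is $1/2$, while always protecting gives $\dot m_1 = -m_1^2$, i.e. $m_1(t) = 1/(10 + t)$ from $m_1(0) = 0.1$.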
Remark 1 (Explicit interactions). In this model, the rate matrix $Q_{ija}(m^\pi(t))$ depends explicitly on the population distribution: the rate to go from state $i$ to state $j$ under action $a$ depends on how the whole population is distributed among the states of the system. Other mean field models, such as [20], only consider the special case where the rates do not depend on the population distribution, i.e., $Q_{ija}(m) = Q_{ija}$ for all $m$. This restricts the population dynamics given in (2) to linear dynamics.
Cost function. We now concentrate on a particular player, called Player 0. Player 0 chooses her own strategy $\pi^0 : E \times \mathbb{R}_+ \to P(A)$. We denote by $x^{\pi^0,m}(t) \in P(E)$ the probability distribution of the state of Player 0 at time $t$ when Player 0 uses strategy $\pi^0$ against a population with distribution $m$. For a given state $i \in E$, $x^{\pi^0,m}_i(t)$ denotes the probability for Player 0 to be in state $i$ at time $t$. The distribution $x^{\pi^0,m}$ evolves over time according to the following differential equation, with $x^{\pi^0,m}(0) = m_0$:
$$\frac{d}{dt} x^{\pi^0,m}_j(t) = \sum_{i\in E}\sum_{a\in A} x^{\pi^0,m}_i(t)\, \pi^0_{i,a}(t)\, Q_{ija}(m(t)), \quad j \in E. \quad (4)$$
If Player 0 is in state $i$ and takes action $a$, she incurs an instantaneous cost $c_{i,a}(m(t))$ that depends on the population distribution at time $t$. We assume that the cost is always continuous in $m$. Given a population distribution $m$ and a strategy $\pi^0$ of Player 0, we define the discounted cost of Player 0 as
$$W(\pi^0, m) = \int_0^\infty \sum_{i\in E}\sum_{a\in A} x^{\pi^0,m}_i(t)\, \pi^0_{i,a}(t)\, c_{i,a}(m(t))\, e^{-\beta t}\, dt, \quad (5)$$
where $\beta > 0$ is the discount factor.
We also introduce the notation $V(\pi^0, \pi)$ for the discounted cost of Player 0 when the population plays strategy $\pi$, that is, the cost (5) evaluated at $m = m^\pi$:
$$V(\pi^0, \pi) = \int_0^\infty \sum_{i\in E}\sum_{a\in A} x^{\pi^0,m^\pi}_i(t)\, \pi^0_{i,a}(t)\, c_{i,a}(m^\pi(t))\, e^{-\beta t}\, dt.$$
Best response. The best response of Player 0 to $\pi$ is to choose a strategy $\pi^0 \in S$ that minimizes her discounted cost (5) when the rest of the population plays strategy $\pi$. For a given population strategy $\pi$, we denote by $BR(\pi)$ the set of best responses of Player 0 to $\pi$, that is, the set of strategies that minimize her discounted cost:
$$BR(\pi) = \arg\min_{\pi^0 \in S} V(\pi^0, \pi). \quad (6)$$
Note that the best response is well defined, in other words, that the minimum in Equation (6) is attained by some strategy. To see this, we prove in Section 2.3 that the function $V$ is continuous for the weak topology; as $S$ is weakly compact, the minimum over $\pi^0$ is attained.
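When the population distribution is frozen at a constant value $m$, the best response problem (6) reduces to a discounted continuous-time Markov decision problem whose value satisfies the Bellman equation $\beta v_i = \min_a [c_{i,a}(m) + \sum_j Q_{ija}(m) v_j]$. The sketch below solves it by value iteration after uniformization. This stationary simplification, the infection model and the protection cost `kappa` are hypothetical assumptions for illustration; for a time-varying $m$ the best response is in general time dependent.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def sis_rates(m: Sequence[float], a: int) -> Matrix:
    """Hypothetical rates on E = {0: susceptible, 1: infected}; action 1 halves infection."""
    infection = (2.0 if a == 0 else 1.0) * m[1]
    return [[-infection, infection], [1.0, -1.0]]


def sis_cost(i: int, a: int, m: Sequence[float], kappa: float) -> float:
    """Hypothetical cost: 1 per unit time while infected, plus kappa while protecting."""
    return (1.0 if i == 1 else 0.0) + (kappa if a == 1 else 0.0)


def stationary_best_response(m, rates, cost, n_states, n_actions, beta, tol=1e-12):
    """Value iteration for beta*v_i = min_a [c_{i,a}(m) + sum_j Q_{ija}(m) v_j] with m frozen.

    Uniformization with a rate lam >= every exit rate turns the equation into
    (beta + lam) v_i = min_a [c + lam*v_i + sum_j Q_ij v_j], a contraction with
    factor lam / (beta + lam).
    """
    Q = [rates(m, a) for a in range(n_actions)]
    lam = max(-Q[a][i][i] for a in range(n_actions) for i in range(n_states))
    v = [0.0] * n_states
    while True:
        new_v, policy = [], []
        for i in range(n_states):
            candidates = []
            for a in range(n_actions):
                jump = sum(Q[a][i][j] * v[j] for j in range(n_states))
                candidates.append((cost(i, a, m) + lam * v[i] + jump) / (beta + lam))
            best = min(range(n_actions), key=candidates.__getitem__)
            new_v.append(candidates[best])
            policy.append(best)
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v, policy
        v = new_v


m_frozen = [0.5, 0.5]
for kappa in (0.3, 0.05):
    v, pol = stationary_best_response(
        m_frozen, sis_rates, lambda i, a, m: sis_cost(i, a, m, kappa), 2, 2, beta=1.0)
    print(kappa, [round(x, 4) for x in v], pol)
```

For $m = (1/2, 1/2)$ and $\beta = 1$, a protection cost of 0.3 makes protecting unprofitable ($v = (1/3, 2/3)$), whereas a cost of 0.05 makes it optimal in the susceptible state ($v = (0.24, 0.62)$). The discounted cost of a player starting from $m_0$ is then $\sum_i m_{0,i} v_i$.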
Proposition 1. The function $V$, defined from Equation (5), is continuous in $\pi^0$ and $\pi$ (for the weak topology on $S$).
Mean field equilibrium. We then define a mean field equilibrium as a strategy $\pi^{MFE}$ such that, when the population strategy is $\pi^{MFE}$, a selfish Player 0 would also choose the same strategy $\pi^{MFE}$ as her best response.

Definition 1 (Mean Field Equilibrium).
A strategy $\pi$ is called a mean field equilibrium if it is a fixed point of the best response function, i.e.,
$$\pi \in BR(\pi). \quad (7)$$
A mean field equilibrium is pure if it is a pure strategy.
The rationale behind this definition is that the population is formed by players who each take selfish decisions. As the population is homogeneous, the best response of each player is the same as that of Player 0. In other words, for a given population strategy $\pi$, all the rational players of the population choose a strategy in $BR(\pi)$. As in classical games, a mean field equilibrium is a situation where no player has an incentive to deviate unilaterally from the common strategy.

Existence of Mean Field Equilibrium
We now show that, under very general assumptions, all discrete mean field games admit a mean field equilibrium. As for classical games, these equilibria are not necessarily pure. Like most existence proofs for equilibria, our proof relies on a generalization of Kakutani's fixed point theorem to infinite dimensional spaces. However, the classical approach, which consists in showing that the best response function $BR(\pi)$ is a Kakutani map, does not work here when the cost function is not strictly convex. Therefore, in our approach, we focus on the state of the game instead of the best response function.
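As mentioned before, the differential equations (2) and (4) and the cost equation (5) are all well defined under our running assumption:
(A1) For all $i, j \in E$ and $a \in A$, the rate $Q_{ija}(m)$ is Lipschitz-continuous in $m$ and the cost $c_{i,a}(m)$ is continuous in $m$.
In particular, since $P(E)$ is compact, this assumption implies that the costs and the rates are all bounded by a finite value.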
Theorem 1. Any discrete mean field game $G$ whose rates and costs satisfy Assumption (A1) admits a mean field equilibrium.
Note that the best response function $\pi \mapsto BR(\pi)$ is, in general, neither continuous nor hemi-continuous under (A1). In particular, the best response set $BR(\pi)$ may not be convex. This makes it difficult to apply classical fixed point theorems to the best response function. As a result, our proof formulates the fixed point problem in an alternative manner, by considering a fixed point in $m$.

Proof of Proposition 1
For a strategy $\pi$, the function $m^\pi$ satisfies the differential equation (2). As $m^\pi(t)$ lives in the compact set $P(E)$ and the functions $Q$ are continuous, the right-hand side of this differential equation is bounded. This shows that there exists a constant $L'$ such that, for any strategy $\pi$, the function $m^\pi$ is Lipschitz-continuous with constant $L'$. Similarly, the function $x^{\pi^0,m}$ is also Lipschitz-continuous with constant $L'$. Let $M$ be the set of functions from $\mathbb{R}_+$ to $P(E)$ that are Lipschitz-continuous with constant $L'$. We equip this set with the exponentially weighted $L^\infty$-norm
$$\|m\| = \sup_{t \ge 0}\, e^{-\beta t} \max_{i\in E} |m_i(t)|. \quad (8)$$
By the Arzelà-Ascoli theorem, $M$ is a compact space.
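To prove that $V$ is continuous in $\pi$ and $\pi^0$, it suffices to show that the mapping $\pi \mapsto m^\pi$ is continuous (for the weak topology) and that the mapping $(\pi^0, m) \mapsto x^{\pi^0,m}$ is continuous. To prove the continuity of $\pi \mapsto m^\pi$, let $(\pi_n)$ be a sequence of strategies that converges weakly to a strategy $\pi$. As $M$ is compact, there exist a function $m \in M$ and a subsequence of $(m^{\pi_n})$ that converges to $m$; to simplify notation, we still index this subsequence by $n$. Moreover, for all $j \in E$ and $t \ge 0$, we have
$$m_j(t) = \lim_{n\to\infty} m^{\pi_n}_j(t) = \lim_{n\to\infty}\Big(m_{0,j} + \int_0^t \sum_{i\in E}\sum_{a\in A} m^{\pi_n}_i(s)\, (\pi_n)_{i,a}(s)\, Q_{ija}(m^{\pi_n}(s))\, ds\Big) = m_{0,j} + \int_0^t \sum_{i\in E}\sum_{a\in A} m_i(s)\, \pi_{i,a}(s)\, Q_{ija}(m(s))\, ds, \quad (9)$$
where the last equality holds because $\pi_n$ converges weakly to $\pi$ and $m^{\pi_n}$ converges uniformly on all compact sets to $m$.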
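Equation (9) shows that $m$ solves the integral form of the differential equation (2); since this solution is unique, $m = m^\pi$. As every subsequence of $(m^{\pi_n})$ has a further subsequence converging to the same limit $m^\pi$, the whole sequence converges to $m^\pi$. This shows that $\pi \mapsto m^\pi$ is continuous, which, combined with the continuity of $(\pi^0, m) \mapsto x^{\pi^0,m}$ and of the costs, implies that $V$ is continuous in $\pi$.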
The proof that (π 0 , m) → x π 0 ,m is continuous is very similar to the above proof and we therefore omit it.
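Weak convergence of strategies is much weaker than convergence in norm, which is why the proof above goes through the compactness of $M$. The following sketch uses a hypothetical, rapidly oscillating strategy $\pi_n(t) = (1 + \sin(nt))/2$ for the probability of action 1 and checks numerically that $\langle \pi_n, g\rangle$ approaches $\langle 1/2, g\rangle = 1/(2\beta)$ for $g \equiv 1$, that $\|\pi_n - 1/2\|^2$ stays close to $1/(8\beta)$, and that the induced population paths nevertheless approach the path of the limit strategy $1/2$ uniformly on $[0, 10]$. The dynamics are a hypothetical two-state infection model where action 1 halves the infection rate; all names and values are illustrative assumptions.

```python
import math


def weighted_inner(f, g, beta, horizon=60.0, steps=200_000):
    """Midpoint rule for <f, g> = int_0^inf f(t) g(t) exp(-beta t) dt, truncated at `horizon`."""
    h = horizon / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += f(t) * g(t) * math.exp(-beta * t)
    return total * h


def protect_prob(n):
    """Hypothetical oscillating strategy: probability of action 1 at time t."""
    return lambda t: 0.5 * (1.0 + math.sin(n * t))


def infected_path(p, m1=0.1, T=10.0, dt=1e-3):
    """Euler path of dm1/dt = (2 - p(t)) (1 - m1) m1 - m1: infection rate 2*m1 without
    protection, 1*m1 with protection (action 1, probability p), recovery rate 1."""
    path = [m1]
    for k in range(int(round(T / dt))):
        lam = 2.0 - p(k * dt)  # action-averaged infection coefficient
        m1 += dt * (lam * (1.0 - m1) * m1 - m1)
        path.append(m1)
    return path


one = lambda t: 1.0
limit_path = infected_path(lambda t: 0.5)  # path under the weak limit pi = 1/2
results = {}
for n in (5, 50):
    p = protect_prob(n)
    gap = lambda t, p=p: p(t) - 0.5
    pairing = weighted_inner(p, one, beta=1.0)  # exact: 0.5 + n / (2 (n^2 + 1))
    dist2 = weighted_inner(gap, gap, beta=1.0)  # exact: 1/8 - 1 / (8 (1 + 4 n^2))
    sup_err = max(abs(x - y) for x, y in zip(infected_path(p), limit_path))
    results[n] = (pairing, dist2, sup_err)
    print(n, round(pairing, 4), round(dist2, 4), round(sup_err, 5))
```

The pairing tends to $1/2$ and the trajectory error shrinks as $n$ grows, while the squared distance to $1/2$ does not vanish: $\pi_n$ converges weakly but not in norm, yet $m^{\pi_n}$ converges.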

Proof of Theorem 1
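Recall that for a given population distribution $m \in M$, the cost of a strategy $\pi^0$ is defined as
$$W(\pi^0, m) = \int_0^\infty \sum_{i\in E}\sum_{a\in A} x_i(t)\, \pi^0_{i,a}(t)\, c_{i,a}(m(t))\, e^{-\beta t}\, dt, \quad (10)$$
where $x$ satisfies, for all $j \in E$,
$$\frac{d}{dt} x_j(t) = \sum_{i\in E}\sum_{a\in A} x_i(t)\, \pi^0_{i,a}(t)\, Q_{ija}(m(t)), \qquad x(0) = m_0, \qquad \pi^0 \in S. \quad (11)$$
We now define the function $\Phi : M \to 2^M$ as the best response to a population distribution $m$. It is a mapping that associates with a population distribution $m \in M$ the set of all state distributions that can be induced by an optimal strategy:
$$\Phi(m) = \big\{ x^{\pi^0,m} : \pi^0 \in \arg\min_{\pi \in S} W(\pi, m) \big\}. \quad (12)$$
In the remainder of the proof, we show that for all $m \in M$, $\Phi(m)$ is well defined and non-empty (i.e., the minimum is attained), convex and compact. Moreover, we also show that the function $\Phi(\cdot)$ is upper-semicontinuous. As $M$ is compact [8, Prop. 11.11], this shows that $\Phi(\cdot)$ satisfies the conditions of the fixed point theorem given in [26, Theorem 8.6] and therefore has a fixed point $m^* \in \Phi(m^*)$. By the definition of $\Phi$, there exists a best response $\pi^0$ to $m^*$ such that $x^{\pi^0,m^*} = m^*$. Comparing (11) with (2), $m^*$ then solves the population dynamics (2) under $\pi^0$, so that $m^* = m^{\pi^0}$ by uniqueness. Hence $\pi^0$ is a best response to $m^{\pi^0}$, which implies that $\pi^0$ is a mean field equilibrium.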
Definition of Φ(m). It can be shown that $W$ is continuous (using a reasoning similar to the one used for $V$ in Proposition 1). Hence there exists $\pi^0$ that attains the minimum on the right-hand side of Equation (12), which shows that $\Phi(m)$ is well defined and non-empty.

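Convexity and compactness of Φ(m). Let us consider the following optimization problem:
$$\min_{x,z}\ \int_0^\infty \sum_{i\in E}\sum_{a\in A} z_{i,a}(t)\, c_{i,a}(m(t))\, e^{-\beta t}\, dt \quad (13)$$
such that $z$ satisfies
$$\frac{d}{dt} x_j(t) = \sum_{i\in E}\sum_{a\in A} z_{i,a}(t)\, Q_{ija}(m(t)), \quad x(0) = m_0, \quad z_{i,a}(t) \ge 0, \quad \sum_{a\in A} z_{i,a}(t) = x_i(t). \quad (14)$$
The above problem is a linear problem, which implies that the set of optimal solutions is convex and compact. Let us show that the projection on $x$ of the set of optimal solutions of the optimization problem (13) is $\Phi(m)$; as the projection of a convex compact set, $\Phi(m)$ is then convex and compact. To show this, remark that the constraints (11) are equivalent to the constraints (14) after replacing the variables $x_i(t)\pi^0_{i,a}(t)$ by $z_{i,a}(t)$. The constraint $\pi^0 \in S$ of (11), which corresponds to $\pi^0_i(t) \in P(A)$, is replaced by $z_{i,a}(t) \ge 0$ and $\sum_a z_{i,a}(t) = x_i(t)$. Conversely, any feasible $(x, z)$ is induced, with the same cost, by the strategy $\pi^0_{i,a}(t) = z_{i,a}(t)/x_i(t)$ when $x_i(t) > 0$ (and by an arbitrary distribution when $x_i(t) = 0$).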
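The change of variables $z_{i,a}(t) = x_i(t)\pi^0_{i,a}(t)$ is what makes (13) linear: the instantaneous cost and the drift, bilinear in $(x, \pi^0)$, become linear in $z$, and $\pi^0$ can be recovered from $z$ wherever $x_i(t) > 0$. A minimal sketch at a single time instant, with hypothetical values of $x$, $\pi^0$ and $c_{i,a}(m(t))$ (the uniform fallback for empty states is an arbitrary choice):

```python
def to_occupation(x, pi):
    """z_{i,a} = x_i * pi_{i,a}: joint state-action mass at one time instant."""
    return [[x[i] * pi[i][a] for a in range(len(pi[i]))] for i in range(len(x))]


def to_strategy(x, z, n_actions):
    """Inverse map: pi_{i,a} = z_{i,a} / x_i when x_i > 0, uniform when state i is empty."""
    return [
        [z[i][a] / x[i] for a in range(n_actions)] if x[i] > 0 else [1.0 / n_actions] * n_actions
        for i in range(len(x))
    ]


def cost_in_pi(x, pi, c):
    """sum_i sum_a x_i pi_{i,a} c_{i,a}: bilinear in (x, pi)."""
    return sum(x[i] * pi[i][a] * c[i][a] for i in range(len(x)) for a in range(len(c[i])))


def cost_in_z(z, c):
    """sum_i sum_a z_{i,a} c_{i,a}: linear in z."""
    return sum(z[i][a] * c[i][a] for i in range(len(z)) for a in range(len(c[i])))


x = [0.7, 0.3, 0.0]
pi = [[0.25, 0.75], [1.0, 0.0], [0.5, 0.5]]
c = [[1.0, 2.0], [0.5, 3.0], [4.0, 0.0]]  # hypothetical c_{i,a}(m(t)) at a fixed t
z = to_occupation(x, pi)
print(z, cost_in_pi(x, pi, c), cost_in_z(z, c))
```

The constraints $z_{i,a} \ge 0$ and $\sum_a z_{i,a} = x_i$ hold by construction, and both cost expressions agree.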
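Upper-semicontinuity of Φ. To prove that $\Phi$ is upper-semicontinuous, let us show that the graph of $\Phi$ is closed (as $M$ is compact, a closed graph implies upper-semicontinuity). Let $(m_n, x_n)$ be a sequence such that $x_n \in \Phi(m_n)$ for all $n$, converging to some $(m_\infty, x_\infty)$. By definition of $\Phi$, for all $n$, there exists a strategy $\pi_n$ that minimizes $W(\cdot, m_n)$ and such that $x_n = x^{\pi_n,m_n}$. As the set $S$ is weakly compact, this sequence of strategies has a subsequence that converges weakly to a strategy $\pi^*$. Moreover, we have:
• As $W$ is continuous and $W(\pi_n, m_n) \le W(\pi, m_n)$ for all $\pi \in S$, passing to the limit gives $W(\pi^*, m_\infty) \le W(\pi, m_\infty)$ for all $\pi \in S$, i.e., $\pi^*$ minimizes $W(\cdot, m_\infty)$.
• The solution of (11) is continuous in $\pi$ and $m$, which shows that $x_\infty = x^{\pi^*,m_\infty}$.
Combining these two facts shows that $x_\infty \in \Phi(m_\infty)$, which implies that the graph of $\Phi$ is closed.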
Remark 2. The continuity assumption (A1) is tight in the following sense:
1. If the rate $Q$ is not Lipschitz-continuous in $m$, then the evolution of the population is not well defined, in the sense that the evolution equation (2) may have several solutions or no solution at all.
2. There exist games with non-continuous cost functions that do not admit any mean field equilibrium. For example, consider the following mean field game: Assume that this game has a mean field equilibrium and let us denote by $m(t)$ the state at equilibrium. By definition of $Q_a$ and $Q_b$, $m_2(t)$ is a non-decreasing function.

Convergence of Finite Games to Mean Field Games
Mean field games are often presented as a limit of a sequence of finite games as the number N of players goes to infinity. In this section, we present positive and negative results linking finite games and mean field games.

Markov Game with N Exchangeable Players
To any discrete mean field game $G = (E, A, \{Q_a\}, m_0, \{c_a\}, \beta)$, one can associate a stochastic $N$-player game $G_N = (N, E, A, \{Q_a\}, m_0, \{c_a\}, \beta)$ as follows. The finite stochastic game $G_N$ has the same state and action spaces $E$, $A$, the same rate matrices $Q_a$, the same cost functions $c_a$, the same discount factor $\beta$, and the same initial state as $G$. The time evolution of the finite game is as follows. At any time $t$, each player (say Player $n$) chooses a (possibly randomized) action $A_n(t) \in A$.
We consider a mean field interaction model between the players, which means that the behavior of one player only depends on the states of the other players through the proportion of players in each state. To be more precise, we denote by $M(t) \in P(E)$ the population distribution of the system at time $t$. As the set $E$ is finite, $M(t)$ is a vector with $|E|$ components and, for all $i \in E$, $M_i(t)$ is the fraction of players in state $i$ at time $t$:
$$M_i(t) = \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}_{\{X_n(t) = i\}},$$
where $X_n(t)$ denotes the state of Player $n$ at time $t$. The state of one player (say Player $n$) follows a continuous time Markov chain whose rates vary over time. The only dependence between players is through the rates, which depend on the population distribution.
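More precisely, conditionally on $\mathcal{F}_t$, the natural filtration of the process, the state of Player $n$ satisfies, for every player $n \in \{1, \dots, N\}$ and all states $i \neq j$,
$$\mathbb{P}\big(X_n(t + dt) = j \mid \mathcal{F}_t,\ X_n(t) = i\big) = Q_{ijA_n(t)}(M(t))\, dt + o(dt), \quad (17)$$
where $A_n(t)$ is the action taken by Player $n$ at time $t$.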
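The convergence of the empirical distribution $M(t)$ to the mean field $m(t)$ can be observed by simulation. The sketch below runs an exact (Gillespie) simulation of the chain (17) for a hypothetical SIS-type model in which every player uses the same constant local strategy, so that only the number of infected players needs to be tracked, and compares $M_1(5)$ with the Euler solution of the limit equation. The model, its parameters and the function names are assumptions for illustration.

```python
import math
import random


def simulate_fraction_infected(N, infected0, lam, mu, T, rng):
    """Exact (Gillespie) simulation of a hypothetical N-player SIS chain: each susceptible
    player is infected at rate lam * M_1(t), each infected player recovers at rate mu.
    Players are exchangeable, so tracking the infected count suffices. Returns M_1(T)."""
    t, infected = 0.0, infected0
    while True:
        rate_inf = (N - infected) * lam * infected / N
        rate_rec = infected * mu
        total = rate_inf + rate_rec
        if total == 0.0:  # absorbing state: nobody is infected
            return infected / N
        t += rng.expovariate(total)  # waiting time until the next transition
        if t > T:
            return infected / N
        if rng.random() * total < rate_inf:
            infected += 1
        else:
            infected -= 1


def mean_field_fraction(m1, lam, mu, T, dt=1e-3):
    """Euler scheme for the mean field limit dm1/dt = lam (1 - m1) m1 - mu m1."""
    for _ in range(int(round(T / dt))):
        m1 += dt * (lam * (1.0 - m1) * m1 - mu * m1)
    return m1


mf = mean_field_fraction(0.1, lam=2.0, mu=1.0, T=5.0)
closed_form = 0.5 / (1.0 + 4.0 * math.exp(-5.0))  # logistic solution for these parameters
rng = random.Random(0)
for N in (100, 1000, 10000):
    print(N, simulate_fraction_infected(N, N // 10, 2.0, 1.0, 5.0, rng), mf, closed_form)
```

The fluctuations of $M_1(T)$ around $m_1(T)$ are expected to shrink roughly like $1/\sqrt{N}$.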
At any time $t$, Player $n$ incurs an instantaneous cost that is a function of her state $X_n(t)$, the action $A_n(t)$ that she takes and the population distribution $M(t)$. We write this instantaneous cost $c_{X_n(t),A_n(t)}(M(t))$.
The objective of Player $n$ is to choose a strategy $\pi_n$ from some set of admissible strategies $\Pi$ in order to minimize her expected discounted cost, knowing the strategies of the others. As before, the discount factor is denoted by $\beta$. Given a strategy $\pi_n \in \Pi$ used by Player $n$ and a strategy $\pi \in \Pi$ used by all the others, we denote by $V_N(\pi_n, \pi)$ the expected discounted cost of Player $n$:
$$V_N(\pi_n, \pi) = \mathbb{E}\Big[\int_0^\infty c_{X_n(t),A_n(t)}(M(t))\, e^{-\beta t}\, dt\Big].$$
A symmetric Nash equilibrium of this game is a strategy $\pi$ such that, when all the other players use $\pi$, Player $n$ has no admissible strategy that leads to a strictly lower cost. This notion naturally depends on the set of admissible strategies.
Definition 2 (Equilibrium of the N-player game). For a given set of strategies $\Pi$, a strategy $\pi \in \Pi$ is called a symmetric equilibrium in $\Pi$ if for any strategy $\pi_n \in \Pi$:
$$V_N(\pi, \pi) \le V_N(\pi_n, \pi).$$
We will also use the notion of ε-equilibrium:
Definition 3 (ε-equilibrium of the N-player game). For a given set of strategies $\Pi$, a strategy $\pi \in \Pi$ is called an ε-symmetric equilibrium in $\Pi$ if for any strategy $\pi_n \in \Pi$: $V_N(\pi, \pi) \le V_N(\pi_n, \pi) + \varepsilon$.

Subsets of Admissible Strategies
In a full information setting, $A_n(t)$ is a (possibly random) function of the states $X_{n'}(t')$ for $t' \le t$ and of all the actions $A_{n'}(t')$ taken in the past, for $t' < t$ and $n' \in \{1, \dots, N\}$. Such strategies are, however, hard to analyze. Therefore, in the following, we consider two natural subclasses of admissible strategies, depending on the information available to the players:

• (Markov) A strategy $\pi$ is called a Markov strategy if it induces a choice of $A_n(t)$ that is a (possibly random) measurable function of only $t$, $M(t)$ and $X_n(t)$, i.e., $A_n(t)$ is drawn from a distribution $\pi_{X_n(t)}(t, M(t)) \in P(A)$. This definition is motivated by the fact that, as indicated by Equation (17), the behavior of one player depends on the others only through the value of $M(t)$. This implies that, when all the other players use a Markov strategy, the set of Markov strategies is dominant among the set of full-information strategies: there exists a full-information best response for Player $n$ that is a Markov strategy. Furthermore, any Markov game admits a Markovian Nash equilibrium (see [17]).
• (Local) A strategy $\pi$ is a local strategy if the choice of the action only depends on the player's internal state and on time.
If a player uses a local strategy, her actions may depend on time and hence may track the law of the population $M(t)$ (but not $M(t)$ itself). Also notice that a local strategy is not necessarily stationary, because of its dependence on time.
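The two classes can be made concrete as function signatures: a local strategy maps $(i, t)$ to a distribution over actions, whereas a Markov strategy may also read the current empirical distribution $M(t)$. In the sketch below, both rules are hypothetical examples on a two-state, two-action model, and `as_markov` expresses that every local strategy is a Markov strategy that ignores $M$. Only Markov strategies can react to an individual deviation, which changes $M(t)$ by $1/N$; this is the mechanism exploited by the counterexamples below.

```python
from typing import Callable, Sequence, Tuple

Dist = Tuple[float, float]  # distribution over the two actions (0, 1)


def local_strategy(i: int, t: float) -> Dist:
    """Hypothetical local strategy: uses only the own state i and the time t.
    In state 0, play action 1 with probability min(1, t/10); in state 1, play action 0."""
    p = min(1.0, t / 10.0)
    return (1.0 - p, p) if i == 0 else (1.0, 0.0)


def markov_strategy(i: int, t: float, M: Sequence[float]) -> Dist:
    """Hypothetical Markov strategy: may also read M(t).
    In state 0, play action 1 as soon as more than 30% of the players are in state 1."""
    return (0.0, 1.0) if i == 0 and M[1] > 0.3 else (1.0, 0.0)


def as_markov(local: Callable[[int, float], Dist]) -> Callable[[int, float, Sequence[float]], Dist]:
    """Every local strategy is a Markov strategy that ignores M."""
    return lambda i, t, M: local(i, t)


print(as_markov(local_strategy)(0, 5.0, [0.1, 0.9]), markov_strategy(0, 0.0, [0.6, 0.4]))
```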

Nash Equilibria Limits
The next theorem provides a relation between local equilibria of finite games and mean field equilibria of the limit mean field game.In particular, it shows that mean field equilibria are a good approximation of local equilibria.However, as we will show later, this result does not hold for Markovian equilibria.
Theorem 2. Consider a finite stochastic game $G_N$ with $N$ players and assume that (A1) holds for its rate matrices $Q_a$ and its cost functions $c_a$. Then:
(i) Let $\pi$ be a mean field equilibrium of the associated mean field game $G$. For every $\varepsilon > 0$, there exists $N_0$ such that for all $N \ge N_0$, $\pi$ is a local ε-equilibrium of the $N$-player game.
(ii) Let $(\pi_N)_{N\in\mathbb{N}}$ be a sequence of local strategies such that $\pi_N$ is an $\varepsilon_N$-equilibrium of the $N$-player game, with $\varepsilon_N \to 0$. Then any subsequence of $(\pi_N)$ has a further subsequence that converges weakly to a mean field equilibrium of $G$.
Proof.First, V N (π n , π) converges to V (π n , π) uniformly in (π n , π).Uniform convergence follows from Theorem 3.3.2 in [38] (The theorem is stated for stationary strategies, but local strategies as defined here are equivalent to stationary strategies, as defined in [38]).
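Thus, for any $\varepsilon > 0$, there exists $N_0$ such that $N \ge N_0$ implies $|V_N(\pi_n, \pi) - V(\pi_n, \pi)| \le \varepsilon/2$ for all local strategies $\pi_n$ and $\pi$. Hence, if $\pi$ is a mean field equilibrium, then for any local strategy $\pi_n$:
$$V_N(\pi, \pi) \le V(\pi, \pi) + \varepsilon/2 \le V(\pi_n, \pi) + \varepsilon/2 \le V_N(\pi_n, \pi) + \varepsilon.$$
This shows (i).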
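For (ii), if $(\pi_N)$ is a sequence of local strategies, then any subsequence has a further subsequence that converges weakly to some local strategy $\pi_\infty$. Along this subsequence, $V_N(\pi_N, \pi_N) \le V_N(\pi_n, \pi_N) + \varepsilon_N$ for every local strategy $\pi_n$. By the uniform convergence of $V_N$ to $V$ and the continuity of $V(\pi_n, \pi)$ in $\pi_n$ and $\pi$ (for the weak topology), letting $N \to \infty$ yields $V(\pi_\infty, \pi_\infty) \le V(\pi_n, \pi_\infty)$ for every local strategy $\pi_n$, i.e., $\pi_\infty$ is a mean field equilibrium.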

Markov Equilibria May Not Converge to Mean Field Equilibria
We now show that Theorem 2-(ii) does not generalize to Markov strategies. The following example was first presented in [16]. The main ingredient used to construct the following counterexample is the "tit-for-tat" principle. This principle can be used to construct equilibria of N-player games but cannot be used in mean field games. This approach has been used in papers on repeated games (see for example the examples in [35], further generalized in [2]). To our knowledge, this type of behavior has not yet been described in the mean field game framework.
Let us consider a mean field version of the classical prisoner's dilemma. The state space of a player is $E = \{C, D\}$ (which stand for Cooperate and Defect), and the action set is the same, $A = E$. At each time step, one player is chosen. If she selects an action $a \in A$, her state becomes $a$ at the next time step.
The instantaneous cost $c_i(m)$ of Player $n$ depends only on her state $i$ and on the mean field $m$ (not on her action). At each time step, this cost function corresponds to a matching game where a player plays against a randomly assigned opponent and incurs a cost that corresponds to the following matrix: The strategy D dominates the strategy C, so that $c_D(m) < c_C(m)$ for every $m$. This implies that playing D is the unique mean field equilibrium. Indeed, using the fact that $\sum_{a\in A} \pi^0_{i,a}(t) = 1$ and $x_C + x_D = 1$, the expected cost (given by (5)) of a Player 0 with state vector $x$ while the mean field is $m$ is
$$\int_0^\infty \big(x_C(t)\, c_C(m(t)) + x_D(t)\, c_D(m(t))\big)\, e^{-\beta t}\, dt = \int_0^\infty \big(c_D(m(t)) + x_C(t)\,(c_C(m(t)) - c_D(m(t)))\big)\, e^{-\beta t}\, dt.$$
Since $c_C - c_D > 0$, this cost is minimized when $x_C$ is minimal, which occurs when the strategy is to choose action D regardless of the current state. This shows that the only mean field equilibrium is when all players choose action D.
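Let us now consider the game with $N$ players and the following Markov strategy, in which a player cooperates as long as the whole population cooperates and defects otherwise:
$$\pi^N_i(t, M) = \begin{cases} C & \text{if } M_C = 1, \\ D & \text{otherwise,} \end{cases}$$
and let us show that, for $\beta < 1$ and $N$ large, $\pi^N$ is a Markov Nash equilibrium.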
Assume that all players except Player $n$ play the strategy $\pi^N$, and let us compute the best response of Player $n$. If $M_C < 1$ at time 0, the best response of Player $n$ is to play D. On the other hand, if $M_C = 1$, then:
• If Player $n$ applies $\pi^N$, she incurs a cost $\int_0^\infty e^{-\beta t}\, dt = 1/\beta$.
• If Player $n$ deviates from $\pi^N$ and chooses action D, all players will also deviate after that time. This implies that $M_D(t) \approx 1 - e^{-t}$ and that Player $n$ incurs a cost approximately equal to $\int_0^\infty c_D(M(t))\, e^{-\beta t}\, dt = 2/(\beta(\beta + 1))$.
When $\beta < 1$, we have $2/(\beta(\beta + 1)) > 1/\beta$, so that Player $n$ has no incentive to deviate from the strategy $\pi^N$; therefore, $\pi^N$ is a Nash equilibrium. We also observe that, in this example, the value of the finite game does not converge to that of the mean field game.
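The comparison between $1/\beta$ and $2/(\beta(\beta+1))$ can be checked by quadrature. In the sketch below, the deviator's cost rate $2M_D(t) = 2(1 - e^{-t})$ is a hypothetical choice, consistent with the value $2/(\beta(\beta+1))$ stated above since $\int_0^\infty (1 - e^{-t})e^{-\beta t}\, dt = 1/(\beta(\beta+1))$; it is not read off the cost matrix of the example.

```python
import math


def discounted_cost(rate, beta, horizon=80.0, steps=100_000):
    """Midpoint rule for int_0^inf rate(t) exp(-beta t) dt, truncated at `horizon`."""
    h = horizon / steps
    return h * sum(rate((k + 0.5) * h) * math.exp(-beta * (k + 0.5) * h) for k in range(steps))


def cooperate(t):
    """Cost rate 1 while the whole population plays C."""
    return 1.0


def deviate(t):
    """Hypothetical cost rate after a deviation: 2 * M_D(t) with M_D(t) = 1 - exp(-t)."""
    return 2.0 * (1.0 - math.exp(-t))


for beta in (0.5, 0.9, 1.5):
    stay, dev = discounted_cost(cooperate, beta), discounted_cost(deviate, beta)
    verdict = "deviation unprofitable" if dev > stay else "deviation profitable"
    print(beta, round(stay, 4), round(1 / beta, 4), round(dev, 4),
          round(2 / (beta * (beta + 1)), 4), verdict)
```

At $\beta = 1$ the two costs coincide; for $\beta < 1$ deviating is more costly, which is the regime in which $\pi^N$ is an equilibrium.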
In conclusion, one can argue that this counterexample should not be surprising: in mean field games, punishment is possible against a fraction of the population that deviates, but not against an individual deviation, because such a deviation is not visible in the population distribution.
As a final remark, as in the case of repeated games, continuity of the strategies with respect to $m$ (which fails here) is critical for convergence (see [35]).

Finite Horizon Case
Let us now consider mean field games over a finite time horizon $T$. These games are similar to the games with discounted costs defined previously, but they only run for a finite duration $T$. As in the discounted case, the evolution over time of the population distribution $m^\pi$ is given by (2) and the evolution of Player 0's distribution is given by (4).
Given the population strategy $\pi$ and the strategy $\pi^0$ of Player 0, the expected cost of Player 0 in the finite horizon case is defined as follows:
$$V_T(\pi^0, \pi) = \int_0^T \sum_{i\in E}\sum_{a\in A} x^{\pi^0,m^\pi}_i(t)\, \pi^0_{i,a}(t)\, c_{i,a}(m^\pi(t))\, dt. \quad (18)$$
In the literature, similar models have been studied, considering continuous time finite state space mean field games with a finite horizon. The authors of [21] consider uniformly convex cost functions, and in [27] cost functions are assumed to be strictly convex. In our model, we only assume that the costs are continuous in the population distribution. It can also be observed that the instantaneous cost of Player 0 is linear in $\pi^0$. Therefore, the model we study in this work is not covered by these papers.
We define the notion of mean field equilibrium for the finite horizon case as in the discounted case, by replacing the cost function (5) by (18). The proof of the existence result (Theorem 1) then applies mutatis mutandis and shows that any continuous time mean field game over a finite horizon that satisfies Assumption (A1) has a mean field equilibrium.

Convergence to a Mean Field Equilibrium
The counter-example to convergence with an infinite time horizon given in §3.4 cannot be directly adapted to the finite horizon case. In the finite horizon version of the game defined in §3.4, the strategy $\pi^N$ is not a Nash equilibrium of the N-player game, because in the last time slot the best response of Player n to any strategy is to play D. By backward induction on the number of time slots, the only Nash equilibrium of the N-player game is the one in which all players play D, which coincides with the mean field equilibrium.
Yet, a counter-example also exists for a finite time horizon. The essential idea is to start from a matrix game with two pure Nash equilibria instead of a single one as in the previous example. Let us consider the following cost matrix:
The setting is similar to the previous example: the action set is equal to the state space, E = A = {C, D, P}, and at each time step one player is chosen. If she selects an action a ∈ A, then her state becomes a at the next time step. This game can be viewed as a generalization of the prisoner's dilemma with an additional Nash equilibrium P (which stands for "punish"). Following a path similar to the previous section, one can show that, when T is large enough, the following time-dependent Markovian² strategy, referred to as (19), is a Nash equilibrium:
In this strategy, the state P is used as a stick to deter players from deviating from the imposed strategy. Nobody has an incentive to deviate from this strategy at the last step, because D is also a Nash equilibrium.
The mean field game has only two equilibria: the whole population always plays D, or the whole population always plays P. These are also equilibria of the finite game. Yet, both have a larger cost than the strategy of Equation (19). Hence, the value of the game does not converge: the asymptotic cost of strategy (19) is strictly smaller than the cost of any of the mean field equilibria.

Synchronous Players
As explained in the previous section, mean field games in continuous time appear naturally as the limit of N-player asynchronous games as N goes to infinity. In these asynchronous games, only one player changes state at a time. However, in other situations it is more natural to consider synchronous games, in which all players take an action at each time step.

Synchronous N-Player Games with Exchangeable Players
Here we consider a finite synchronous game $G^N_s = (N, E, A, \{P^a\}, M_0, \{c^a\}, \delta)$ with N identical players. It differs from the model of Section 3.1 in several respects, the main one being that the rate matrices are replaced by stochastic matrices. As before, each Player n has an internal state $X_n(t)$ that belongs to a finite state space E (we write $X(t) = (X_0(t), \dots, X_{N-1}(t))$) and chooses an action from a finite action space A. The main difference with the previous asynchronous model is that, at each time step $t \in \mathbb{Z}_+$, all players choose an action $A_n(t) \in A$ simultaneously. We assume that a player in state i who chooses action a goes to state j with probability $P_{ija}(X(t))$ and that, given X(t), the evolutions of the players are independent. Furthermore, we assume that the players are exchangeable, i.e., for any permutation σ of the N players, $P_{ija}(X_0(t), \dots, X_{N-1}(t)) = P_{ija}(X_{\sigma(0)}(t), \dots, X_{\sigma(N-1)}(t))$. Exchangeability implies that the dependence on X(t) can be replaced by a dependence on the population distribution M(t). More precisely, for any state vectors $x, y \in E^N$ and any action vector $a \in A^N$, one can write:
$$\mathbb{P}\big(X(t+1) = y \mid X(t) = x,\ A(t) = a,\ \mathcal{F}_t\big) = \prod_{n=0}^{N-1} P_{x_n y_n a_n}(m), \tag{20}$$
where $\mathcal{F}_t$ is the natural filtration of the game up to time t, m is the population distribution of x, and, for each $a \in A$, $(P_{ija}(m))_{i,j \in E}$ is a stochastic matrix that is continuous in m.
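The transition rule (20) can be sketched in a few lines: the population distribution is computed once from the current state vector, and each player then draws her next state independently from the row of $P_{\cdot\,\cdot\,a}(M(t))$ selected by her own state and action. This is a minimal illustration, not the authors' code; the function names and `example_kernel` (a hypothetical congestion-style kernel in which action a targets state a) are assumptions introduced here.

```python
import random
from collections import Counter

def population_distribution(states, E):
    """Empirical population distribution M(t): share of players in each state of E."""
    counts = Counter(states)
    return {i: counts[i] / len(states) for i in E}

def synchronous_step(states, actions, E, kernel, rng):
    """One step of the synchronous N-player game.

    All players act at the same time. Given X(t), player n draws her next state
    independently from kernel(X_n(t), A_n(t), M(t)), a dict j -> P_{ija}(m).
    """
    m = population_distribution(states, E)
    next_states = []
    for i, a in zip(states, actions):
        row = kernel(i, a, m)
        next_states.append(rng.choices(list(row), weights=list(row.values()))[0])
    return next_states

def example_kernel(i, a, m):
    """Hypothetical kernel: action a targets state a, and the move succeeds
    with probability 1 - m[a] (harder when the target state is crowded)."""
    if a == i:
        return {i: 1.0}
    return {a: 1.0 - m[a], i: m[a]}

rng = random.Random(0)
E = ["C", "D"]
states = ["C"] * 6 + ["D"] * 4
actions = ["C"] * 10                      # every player targets state C
new_states = synchronous_step(states, actions, E, example_kernel, rng)
print(population_distribution(new_states, E))
```

Passing an explicit `random.Random` instance keeps runs reproducible; with the deterministic kernel `lambda i, a, m: {a: 1.0}` the step reduces to "the next state is the chosen action", which is the repeated-game special case discussed later.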
The instantaneous cost at time t depends on the actions and states at time t − 1 and is symmetric in all players, so it can be written as a function of the population distribution, $c_{X_n(t),A_n(t)}(M(t))$. Costs are discounted by a factor δ at each time step. Given a strategy $\pi^0$ used by Player 0 and a strategy π used by all the others, the expected cost of Player 0 is given by (21), the expected discounted sum of her instantaneous costs.

Corresponding Mean Field Game
Synchronous games also admit mean field limits. To construct this limit, consider a strategy π such that $\pi_{i,a}(m)$ is the probability for a player to choose action a given that she is in state i and that M(t) = m. Assume that M(0) converges in probability to some m(0) as N goes to infinity and that all players except Player 0 apply a strategy π that is continuous in m. As shown in Theorem 1 of [19] (up to differences in notation, the mean field model in [19] is the same as Equation (20)), the population distribution $M^\pi(t)$ converges in probability to a deterministic quantity $m^\pi(t)$ as N goes to infinity, where $m^\pi(t)$ is defined by
$$m^\pi_j(t+1) = \sum_{i \in E} \sum_{a \in A} m^\pi_i(t)\, \pi_{i,a}(m^\pi(t))\, P_{ija}(m^\pi(t)), \qquad j \in E. \tag{22}$$
We denote by $\pi^0$ the strategy of Player 0. The probability $x_j(t)$ that Player 0 is in state $j \in E$ evolves over time according to
$$x_j(t+1) = \sum_{i \in E} \sum_{a \in A} x_i(t)\, \pi^0_{i,a}(m^\pi(t))\, P_{ija}(m^\pi(t)). \tag{23}$$
In this case, the cost of Player 0, given by (21), becomes
$$V(\pi^0, \pi) = (1-\delta) \sum_{t \ge 0} \delta^t \sum_{i \in E} \sum_{a \in A} x_i(t)\, c_{i,a}(m^\pi(t))\, \pi^0_{i,a}(m^\pi(t)).$$
As the evolution of m is deterministic, for any closed loop strategy $\pi_{i,a}(m(t))$ and any initial condition m(0), there exists an open-loop strategy $\pi_{i,a}(t)$ that leads to the same values of $m^\pi(t)$ and the same cost. Hence, in the mean field model, one can replace any state-dependent strategy π(m(t)) in the above equations by a time-dependent strategy π(t).
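The deterministic recursions (22) and (23) translate directly into array code. The sketch below is a minimal illustration using NumPy, assuming open-loop strategies stored as arrays `pi[t, i, a]`, a kernel returned as an array `P[i, j, a]`, a truncation of the discounted sum after `T` steps and the $(1-\delta)$ normalization; the function names, the example kernel `example_P` and the congestion cost `example_c` are hypothetical choices, not part of the model.

```python
import numpy as np

def mean_field_flow(m0, pi, P, T):
    """Population flow (22): m_j(t+1) = sum_{i,a} m_i(t) pi_{i,a}(t) P_{ija}(m(t)).

    m0 has shape (S,), pi has shape (T, S, A) (open-loop strategy) and
    P(m) returns an array of shape (S, S, A) with P[i, j, a] = P_{ija}(m).
    """
    m = np.empty((T + 1, len(m0)))
    m[0] = m0
    for t in range(T):
        m[t + 1] = np.einsum("i,ia,ija->j", m[t], pi[t], P(m[t]))
    return m

def player0_flow(x0, pi0, P, m):
    """Player 0's state distribution (23), driven by the population flow m."""
    T = len(pi0)
    x = np.empty((T + 1, len(x0)))
    x[0] = x0
    for t in range(T):
        x[t + 1] = np.einsum("i,ia,ija->j", x[t], pi0[t], P(m[t]))
    return x

def discounted_cost(x, pi0, c, m, delta):
    """(1 - delta) * sum_{t<T} delta^t sum_{i,a} x_i(t) c_{i,a}(m(t)) pi0_{i,a}(t)."""
    return (1 - delta) * sum(
        delta**t * np.einsum("i,ia,ia->", x[t], c(m[t]), pi0[t])
        for t in range(len(pi0))
    )

def example_P(m):
    """Hypothetical kernel: action a targets state a; the move succeeds w.p. 1 - m[a]."""
    S = len(m)
    P = np.zeros((S, S, S))
    for i in range(S):
        for a in range(S):
            if a == i:
                P[i, i, a] = 1.0
            else:
                P[i, a, a] = 1.0 - m[a]
                P[i, i, a] = m[a]
    return P

def example_c(m):
    """Hypothetical congestion cost c_{i,a}(m) = m_i, independent of the action."""
    return np.repeat(m[:, None], len(m), axis=1)

T, delta = 50, 0.9
pi = np.full((T, 2, 2), 0.5)            # uniform randomization between the 2 actions
m0 = np.array([0.8, 0.2])
m = mean_field_flow(m0, pi, example_P, T)
x = player0_flow(m0, pi, example_P, m)  # Player 0 copies the population strategy
print(np.allclose(x, m))                # True
print(discounted_cost(x, pi, example_c, m, delta))
```

When Player 0 starts from $m(0)$ and uses the population strategy, (23) reproduces (22) step by step, so her distribution tracks $m^\pi(t)$ exactly, which is what the first `print` checks.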
Player 0 chooses the strategy that minimizes her expected cost. When she does so, we say that she uses the best response to the population strategy π:
$$BR(\pi) = \arg\min_{\pi^0} V(\pi^0, \pi),$$
where $V(\pi^0, \pi)$ denotes the expected cost of Player 0. A strategy is said to be a mean field equilibrium if it is a fixed point of the best response function, that is, $\pi \in BR(\pi)$.
One of the difficulties in the analysis of continuous time mean field games is that the objects under consideration (the population distribution, the population strategy, Player 0's strategy, ...) are continuous functions of time. In the discrete time case, the model is significantly simpler, since all these objects are vectors indexed by discrete time. Hence, the proof of the existence of a mean field equilibrium for continuous time mean field games (Theorem 1) can be adapted to show the following result.
Theorem 3 (Mean Field Equilibrium Existence for Synchronous Games). Any synchronous mean field game with discounted cost in which P and c satisfy Assumption (A1) has a mean field equilibrium.
Sketch of proof. We first observe that the set of discrete time open-loop policies is compact and convex. Thus, by Kakutani's fixed point theorem, it remains to show that the best response function has a closed graph and convex values. The former holds because the set of open-loop policies belongs to a finite dimensional space and because of the continuity assumptions (A1). The latter can be shown using the same arguments as in the proof of Theorem 1.

An Important Special Case: Repeated Games
The classical repeated games with discounted costs and identical players form a subclass of synchronous games as defined here. To see this, let us first consider a static N-player matrix game G with symmetric cost: $u(a_1, \dots, a_N)$ is the instantaneous cost of any player when the players use actions $a_1, \dots, a_N$ respectively, and $u(a_1, \dots, a_N) = u(a_{\sigma(1)}, \dots, a_{\sigma(N)})$ for any permutation σ of {1, ..., N}. The players repeat the matrix game infinitely often, and their cost under strategies $\pi^1, \dots, \pi^N$ is the discounted sum of the stage costs:
$$V(\pi^1, \dots, \pi^N) = (1-\delta)\, \mathbb{E}\Big[\sum_{t \ge 0} \delta^t\, u(A_1(t), \dots, A_N(t))\Big]. \tag{24}$$
These games fit in our framework: the state of a player is merely her current action (X(t) = A(t)), and the evolution of the state becomes trivial. Under state x = a and action b, the next state does not depend on the other players and is b with probability one: $P_{abb}(M(t)) = 1$. The cost of a player at each stage is the immediate cost $c_{X_n(t),A_n(t)}(M(t)) = u(X(t))$, since by symmetry the cost u only depends on the population distribution. As for the total cost of a player, (24) coincides with (21) as long as all players in the same state use the same strategy.

The Folk Theorem Does Not Scale
The relation between equilibria of N -player games with their mean field limits is also complex in the discrete time case.
Let us first focus on the performance of mean field equilibria in the N-player game. The situation is similar to the continuous time case and resembles Theorem 2 (i): if π is a mean field equilibrium, then under Assumption (A1), for any ε > 0 there exists $N_0$ such that for all $N \ge N_0$, π is a local ε-equilibrium of the N-player game. The proof is essentially the same as that of Theorem 2.
Let us now consider the Nash equilibria of the N-player game. Here the situation is very different from the continuous time case: in discrete time, the states of all players can change in one time unit, while in continuous time the population state can only change in small steps, one player at a time. This has several consequences on the nature of equilibria under both models. As mentioned before, the Nash equilibria in the continuous time case may depend on the initial population distribution; this is not the case here, which leaves more latitude for designing equilibria.
Let us consider the particular case of repeated games, introduced in Section 5.2.1. For this type of game, the set of equilibria can be characterized using the Folk Theorem for repeated games.
Theorem 4 (Folk theorem, adapted from Theorem A in [18]). Let G be a symmetric matrix game, and let V* be the cost under the strategy that repeats the Nash equilibrium of the static game G. Then for any compatible³ cost V smaller than V*, there exists a discount factor δ ∈ (0, 1) such that V is the cost of an equilibrium of the discounted repeated game.
For any V < V*, the construction of an equilibrium whose cost is V is based on the "tit for tat" principle. We claim that none of these equilibria scale to the mean field limit. Let us consider the following example of a static game. Consider the following strategy (denoted $\pi^N$ in the following) for all players: play D for k rounds, then play C as long as every other player has followed the same pattern, else play D forever. The cost of this strategy is between −1 and −2:
$$(1-\delta)\Big(\sum_{t=0}^{k-1} \delta^t\,(-1) + \sum_{t \ge k} \delta^t\,(-2)\Big) = -1 - \delta^k.$$
The strategy $\pi^N$ is an equilibrium of the finite game if δ is large enough. Indeed, no player wants to deviate in the first k rounds, because her cost would increase: while all other players play D, D is her best response in the static game, and a deviation would in addition trigger the punishment. In the rounds after k, a deviation provides an immediate cost advantage, but at the price of being punished until the end of time, so that a large enough δ makes it non-profitable.
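Using the stage costs of the static game of this example (a player choosing C gets $-2M_C$, a player choosing D gets $-3M_C - M_D$, with proportions that include the player herself), the cost $-1-\delta^k$ of $\pi^N$ and the profitability of a one-shot deviation after round k can be checked numerically. This is a minimal sketch under stated assumptions: the deviation is detected at once, all players play D from the next round on, and normalized continuation costs are compared; the function names are hypothetical.

```python
def stage_cost(action, m_C):
    """Static-game cost of a player choosing `action` when a proportion m_C of all
    players (herself included) plays C and m_D = 1 - m_C plays D."""
    m_D = 1.0 - m_C
    return -2.0 * m_C if action == "C" else -3.0 * m_C - m_D

def cost_pi_N(delta, k):
    """Normalized cost (1 - delta) * sum_t delta^t u_t when everybody follows pi^N:
    all-D for k rounds (stage cost -1), then all-C forever (stage cost -2)."""
    return (1 - delta**k) * stage_cost("D", 0.0) + delta**k * stage_cost("C", 1.0)

def deviation_profitable(delta, N):
    """One-shot deviation to D in some round t >= k of the N-player game.

    ASSUMPTION: the deviation is seen at once and all players play D from t + 1 on.
    Compares normalized continuation costs from round t (lower cost is better).
    """
    conform = stage_cost("C", 1.0)                      # -2 in every round
    deviate = (1 - delta) * stage_cost("D", (N - 1) / N) + delta * stage_cost("D", 0.0)
    return deviate < conform

print(cost_pi_N(0.9, 3))              # approximately -1.729, i.e. -1 - 0.9**3
print(deviation_profitable(0.9, 10))  # False: the punishment outweighs the gain
print(deviation_profitable(0.1, 10))  # True: delta is too small
```

Solving `deviate >= conform` by hand shows that, under these assumptions, the deviation is unprofitable exactly when $\delta \ge (N-2)/(2N-2)$, a threshold that tends to 1/2 as N grows.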
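Let us now consider the mean field game setting. If the whole population uses the strategy $\pi^N$ and Player 0 uses the same strategy, her cost becomes
$$(1-\delta)\Big(\sum_{t=0}^{k-1} \delta^t\,(-1) + \sum_{t \ge k} \delta^t\,(-2)\Big) = -1 - \delta^k.$$
However, in the mean field setting, the best response of Player 0 to $\pi^N$ is not $\pi^N$ but the strategy $\pi^D$ in which she plays D all the time. Indeed, in this case her total cost becomes
$$(1-\delta)\Big(\sum_{t=0}^{k-1} \delta^t\,(-1) + \sum_{t \ge k} \delta^t\,(-3)\Big) = -1 - 2\delta^k < -1 - \delta^k.$$
This shows that $\pi^N$ is not a mean field equilibrium: a "free rider" player can take advantage of the fact that the population will not act against her.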
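In the mean field limit, Player 0 is negligible, so her own action does not change the population proportions: after round k she faces $M_C = 1$ whatever she does. The sketch below, again using the stage costs $-2M_C$ for C and $-3M_C - M_D$ for D from the static game of this example, compares the truncated costs of following $\pi^N$ and of the free rider strategy $\pi^D$. The truncation horizon and the function names are illustrative assumptions.

```python
def stage_cost(action, m_C):
    """Static-game cost for a negligible player facing a proportion m_C of C players."""
    return -2.0 * m_C if action == "C" else -3.0 * m_C - (1.0 - m_C)

def population_m_C(t, k):
    """Under pi^N the population plays D during rounds 0..k-1, then C forever."""
    return 0.0 if t < k else 1.0

def follow_pi_N(t, k):
    """Player 0 copies pi^N."""
    return "D" if t < k else "C"

def free_ride(t, k):
    """Player 0 uses pi^D: always D."""
    return "D"

def mean_field_cost(policy, delta, k, horizon=2000):
    """Truncated normalized cost (1 - delta) * sum_{t < horizon} delta^t u_t.
    Player 0 is negligible, so her action does not change population_m_C."""
    return (1 - delta) * sum(
        delta**t * stage_cost(policy(t, k), population_m_C(t, k)) for t in range(horizon)
    )

delta, k = 0.9, 3
print(mean_field_cost(follow_pi_N, delta, k))  # approximately -1 - delta**k = -1.729
print(mean_field_cost(free_ride, delta, k))    # approximately -1 - 2 * delta**k = -2.458
```

The only difference with the N-player computation is that the deviation has no effect on `population_m_C`, so there is no punishment to offset the gain of playing D against a population of C players.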

Finite Horizon Case
We now focus on mean field games in which objects evolve in discrete time over a finite horizon, from 0 to T. In this case, the population distribution $m^\pi$ is defined by (22), which depends on the population strategy π. Player 0 can choose her own strategy $\pi^0$. Her expected cost is
$$\sum_{t=0}^{T} \sum_{i \in E} \sum_{a \in A} x_i(t)\, c_{i,a}(m^\pi(t))\, \pi^0_{i,a}(m^\pi(t)),$$
where $x_i(t)$ is the probability that Player 0 is in state i at time t. The evolution of $x_i(t)$ over time is described by (23).
Player 0 uses the best response to a given population strategy π, which means that she selects the strategy $\pi^0$ that minimizes her expected cost. We are interested in proving the existence of a mean field equilibrium, that is, of a strategy that is a fixed point of the best response function. In Section 5.2, we showed this for the discounted case. In the finite horizon case, the vectors have finite size, and, as a consequence, the same arguments as in the proof of Theorem 3 immediately show that any discrete time mean field game with finite horizon cost such that P and c satisfy Assumption (A1) has a mean field equilibrium. Again, the proof mimics that of Theorem 1, as in the continuous time finite horizon case.
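Because every object is a finite array in this setting, the best response of Player 0 to a population flow can be computed by backward induction on the Bellman equation, and a candidate strategy can be tested for the fixed-point property by comparing its cost with the best-response cost (its "exploitability"). The sketch below is illustrative only: it assumes costs summed over t = 0, ..., T with no terminal cost, deterministic "go to the chosen state" dynamics and a hypothetical congestion cost $c_{i,a}(m) = m_i + s\,\mathbb{1}[a \ne i]$ with switching cost s; these example dynamics, costs and function names are not taken from the model.

```python
import numpy as np

def forward(m0, pi, P, T):
    """Population flow (22) under the time-dependent strategy pi[t, i, a]."""
    m = [np.asarray(m0, dtype=float)]
    for t in range(T):
        m.append(np.einsum("i,ia,ija->j", m[t], pi[t], P(m[t])))
    return m

def policy_cost(x0, pi0, P, c, m, T):
    """Expected cost of Player 0 using pi0 against the flow m, summed over t = 0..T."""
    x, total = np.asarray(x0, dtype=float), 0.0
    for t in range(T + 1):
        total += np.einsum("i,ia,ia->", x, c(m[t]), pi0[t])
        x = np.einsum("i,ia,ija->j", x, pi0[t], P(m[t]))
    return total

def best_response_value(x0, P, c, m, T):
    """Backward induction: V_t(i) = min_a [c_{i,a}(m(t)) + sum_j P_{ija}(m(t)) V_{t+1}(j)]."""
    V = np.zeros(len(x0))                  # assumed: no terminal cost after time T
    for t in range(T, -1, -1):
        Q = c(m[t]) + np.einsum("ija,j->ia", P(m[t]), V)
        V = Q.min(axis=1)
    return float(np.dot(x0, V))

def exploitability(m0, pi, P, c, T):
    """Gain of a best-responding Player 0 starting from m0; zero iff pi is a best
    response to itself, i.e. a mean field equilibrium."""
    m = forward(m0, pi, P, T)
    return policy_cost(m0, pi, P, c, m, T) - best_response_value(m0, P, c, m, T)

S, s = 2, 0.1

def P_go_to(m):
    """Hypothetical dynamics: choosing action a moves the player to state a."""
    P = np.zeros((S, S, S))
    for i in range(S):
        for a in range(S):
            P[i, a, a] = 1.0
    return P

def c_congestion(m):
    """Hypothetical cost: occupancy of the current state plus a switching cost s."""
    return m[:, None] + s * (1.0 - np.eye(S))

T = 2
stay = np.tile(np.eye(S), (T + 1, 1, 1))  # pi[t, i, a] = 1 if a == i
print(exploitability([0.5, 0.5], stay, P_go_to, c_congestion, T))  # 0.0
print(exploitability([0.9, 0.1], stay, P_go_to, c_congestion, T))  # approximately 1.35
```

With a balanced population, staying put is a best response to itself, so "stay" is an equilibrium; from the unbalanced distribution (0.9, 0.1), players in the crowded state gain by paying the switching cost once, so "stay" is not.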

Conclusions
In this article, we generalize the framework of discrete space mean field games to the cases of non-convex costs and explicit interactions. These games strike a good compromise between tractability (existence of an equilibrium) and modeling power (including propagation and congestion behaviors). The model consists of a finite state space mean field game where the transition rates of the objects and the cost function of a generic object depend not only on the actions taken but also on the population distribution. We also show that there exists a subclass of Nash equilibria of N-player games that converge to mean field equilibria when the number of players goes to infinity. Outside of this class, and in particular for all equilibria based on the "tit for tat" principle on which the Folk theorem relies, convergence does not hold.
For future work, we are interested in finding conditions ensuring the uniqueness of the mean field equilibrium. We believe that monotonicity assumptions similar to those in [21] are required to prove uniqueness in this model. Another interesting open question concerns the convergence of N-player equilibria to mean field equilibria when the number of players grows. We believe that there exist many N-player games for which the only limiting equilibria are mean field equilibria, for example when players have incomplete information about the game. It would be interesting to characterize the subclass of strategies for which convergence to mean field equilibria holds. Obviously, this class includes all local strategies (no information) and excludes some Markovian ones (full information).

1. Each player only has two strategies, D and C. If all players play D, the cost is −1. If all players play C, the cost is −2. If some players play D and others play C, then all the players who play C get $-2M_C$, while the players who play D get $-3M_C - M_D$, where $M_C$ and $M_D$ are the proportions of players playing C and D respectively. These costs correspond to the average costs obtained by a player in a matching game against a random opponent. The unique Nash equilibrium of the static game is the strategy profile (D, D, ..., D). The cost of the corresponding repeated game is $(1-\delta)\sum_{t \ge 0} (-\delta^t) = -1$.