Smale Strategies for Network Prisoner's Dilemma Games

Smale's approach \cite{Smale80} to the classical two-player repeated Prisoner's Dilemma game is revisited here for $N$-player and network games in the framework of Blackwell's approachability, stochastic approximation and differential inclusions.


1. Introduction. It has been well known for many years that mutual cooperation is a Nash equilibrium outcome in a two-player infinitely repeated Prisoner's Dilemma game, even though defection is the dominant strategy of the one-shot game (see e.g. the classical book by Axelrod [2]).
In 1980 Smale [13] studied the two-player repeated Prisoner's Dilemma game under the assumption that both players have limited memory and only keep track of the cumulative average payoffs. In this setting, he showed that a very simple deterministic strategy, called a good strategy, if adopted by one player, leads to cooperation, in the sense that the other player has an interest in cooperating. A good strategy, as defined by Smale, is a strategy under which the player cooperates unless her average payoff to date is significantly less than her opponent's. Later, Benaïm and Hirsch [6] considered the stochastic analogue of Smale's solution. In 2005, Benaïm, Hofbauer and Sorin [7], using tools from stochastic approximation and differential inclusions, showed that the results of Smale, Benaïm and Hirsch can be reinterpreted in the framework of Blackwell's approachability theory [9], and that the assumption that "both" players keep track only of the cumulative average payoff is unnecessary.
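Smale's good strategy lends itself to a short numerical experiment. The sketch below is a minimal interpretation, not Smale's exact formulation: the payoff values $(5, 3, 1, 0)$ and the threshold $\delta = 0.5$ are illustrative choices, and the rule is simply to cooperate unless one's average payoff trails the opponent's by more than $\delta$.

```python
# Minimal sketch of a Smale-style "good strategy" in the repeated
# Prisoner's Dilemma. Payoff values and delta are illustrative choices.

PAYOFF = {  # (my action, opponent's action) -> (my payoff, opponent's payoff)
    ("C", "C"): (3.0, 3.0),
    ("C", "D"): (0.0, 5.0),
    ("D", "C"): (5.0, 0.0),
    ("D", "D"): (1.0, 1.0),
}

def good_strategy(u_self, u_opp, delta):
    """Cooperate unless the average-payoff gap exceeds delta."""
    return "C" if u_self >= u_opp - delta else "D"

def always_defect(u_self, u_opp, delta):
    return "D"

def play(opponent, rounds=2000, delta=0.5):
    """Repeated play; both players only track cumulative average payoffs."""
    u_self = u_opp = 0.0
    for n in range(1, rounds + 1):
        a = good_strategy(u_self, u_opp, delta)
        b = opponent(u_opp, u_self, delta)   # opponent sees the roles swapped
        pa, pb = PAYOFF[(a, b)]
        u_self += (pa - u_self) / n          # incremental average update
        u_opp += (pb - u_opp) / n
    return u_self, u_opp

print(play(good_strategy))   # two good strategies cooperate: (3.0, 3.0)
print(play(always_defect))   # the defector ends near mutual-defection payoffs
```

Against an unconditional defector, the good player's average payoff stays within roughly $\delta$ of the defector's, and the defector's payoff remains far below the mutual-cooperation payoff, in line with the results discussed above.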
The present paper extends these works to variants of the classical Prisoner's Dilemma game, including $N$-player games and games whose underlying structure is a network. It is based on K. Abhyankar's PhD thesis [1], Blackwell's approachability [9] and the stochastic approximation approach to differential inclusions developed in [7]. Section 2 sets up the notation and briefly reviews Blackwell's approachability and some of the results in [7]. Section 3 considers $N$-player Prisoner's Dilemma games, and Section 4 considers Prisoner's Dilemma games in which players are located at the vertices of a symmetric graph and interact only with their neighbors. Smale good strategies are defined for these games and are shown to be Nash equilibria.
2. Notation and background. Let $A$ and $B$ be two finite sets representing, respectively, the action set of some decision maker DM (for instance a player, or a group of players) and the action set of Nature (for instance the player's opponents). Let $U : A \times B \to \mathbb{R}^N$ be a vector-valued payoff function.
Throughout, we let $E \subset \mathbb{R}^N$ denote the convex hull of the payoff vectors:
$$E = \mathrm{conv}\{U(a, b) : (a, b) \in A \times B\}.$$
At discrete times $n = 1, 2, \dots$, DM and Nature choose their actions $(a_n, b_n) \in A \times B$. We assume that:
(a): The sequence $\{(a_n, b_n)\}_{n \geq 0}$ is a random process defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and adapted to some filtration $\{\mathcal{F}_n\}$ (i.e. $\{\mathcal{F}_n\}$ is an increasing family of sub-$\sigma$-fields of $\mathcal{F}$, and for each $n$, $(a_n, b_n)$ is $\mathcal{F}_n$-measurable). Here $\mathcal{F}_n$ has to be understood as the history up to time $n$.
(b): Given the history $\mathcal{F}_n$, DM and Nature act independently:
$$\mathbb{P}(a_{n+1} = a, b_{n+1} = b \mid \mathcal{F}_n) = \mathbb{P}(a_{n+1} = a \mid \mathcal{F}_n)\,\mathbb{P}(b_{n+1} = b \mid \mathcal{F}_n).$$
Let $\mathcal{P}(A)$ (respectively $\mathcal{P}(B)$) denote the set of probabilities over $A$ (respectively, $B$).
A (long term) strategy for DM is a stochastic process $\Theta = \{\Theta_n\}$ adapted to $\{\mathcal{F}_n\}$ taking values in $\mathcal{P}(A)$. We say that DM uses strategy $\Theta$ if
$$\Theta_n(a) = \mathbb{P}(a_{n+1} = a \mid \mathcal{F}_n) \quad (1)$$
for all $a \in A$. The cumulative average payoff at time $n$ is the vector
$$u_n = \frac{1}{n} \sum_{k=1}^n U(a_k, b_k). \quad (2)$$
Strategy $\Theta$ is said to be payoff-based provided
$$\Theta_n(a) = Q_{u_n}(a)$$
for all $a \in A$, where for each $u \in E$, $Q_u(\cdot)$ is a probability over $A$ and $u \in E \mapsto Q_u \in \mathcal{P}(A)$ is measurable. In this case, the family $Q = \{Q_u\}_{u \in E}$ is identified with DM's strategy.
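As an illustration of the definition, here is a hypothetical payoff-based strategy for the two-player Prisoner's Dilemma: the mixed action at time $n+1$ depends on the history only through the running average payoff $u_n$. The logistic response $u \mapsto Q_u$ is an arbitrary continuous choice made for this sketch, not a strategy from the paper.

```python
# Sketch of a payoff-based strategy: the distribution over A = {C, D}
# is a (here, continuous) function of the average payoff vector u only.
import math
import random

def Q(u):
    """Map the average-payoff vector u = (u_self, u_opp) to a
    distribution over actions; the logistic form is illustrative."""
    gap = u[1] - u[0]                                  # how far we trail
    p_coop = 1.0 / (1.0 + math.exp(4.0 * (gap - 0.5)))
    return {"C": p_coop, "D": 1.0 - p_coop}

def sample(dist, rng):
    return "C" if rng.random() < dist["C"] else "D"

rng = random.Random(0)
u = [0.0, 0.0]
for n in range(1, 501):
    a = sample(Q(u), rng)                # DM's payoff-based mixed action
    b = "D"                              # Nature: an unconditional defector
    pa, pb = {"C": (0.0, 5.0), "D": (1.0, 1.0)}[a]
    u[0] += (pa - u[0]) / n              # cumulative average payoff u_n
    u[1] += (pb - u[1]) / n
print(u)
```

The update $u_n = u_{n-1} + \frac{1}{n}(U(a_n, b_n) - u_{n-1})$ is the stochastic-approximation form of the cumulative average; it is what connects payoff-based strategies to the differential inclusions discussed next.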
Example 1 ($M$-player games). Consider an $M$-player game with $M \geq 2$. Players are denoted $i = 1, \dots, M$. Player $i$ has a finite action set (or pure strategy set) denoted $\Sigma^i$, and a payoff function
$$U^i : \Sigma^1 \times \dots \times \Sigma^M \to \mathbb{R}.$$
At each discrete time $n = 1, 2, \dots$, Player $i$ chooses an action $s^i_n \in \Sigma^i$ and receives the payoff $U^i(s^1_n, \dots, s^M_n)$. Choose DM to be some given subset of players, say $I = \{1, \dots, k\}$. Set $A = \Sigma^1 \times \dots \times \Sigma^k$, $B = \Sigma^{k+1} \times \dots \times \Sigma^M$ and $U(a, b) = (U^1(a, b), \dots, U^M(a, b))$.

SMALE STRATEGIES FOR NETWORK GAMES 143
The limit set theorem. Assume that DM has a payoff-based strategy $Q$. For each $u \in E$ let
$$C(u) = \Big\{ \sum_{a \in A,\, b \in B} Q_u(a)\, q(b)\, U(a, b) : q \in \mathcal{P}(B) \Big\}. \quad (3)$$
The set $C(u)$ is the convex set containing all the average payoffs that are obtained when DM plays the mixed strategy $Q_u$ and Nature plays any mixed action. Let $\mathcal{C} \subset E \times E$ be the intersection of all closed subsets $G \subset E \times E$ for which the fiber $\{y \in E : (x, y) \in G\}$ is convex and contains $C(x)$. The closed-convex extension of $C$, denoted $co(C)$, is defined as
$$co(C)(x) = \{y \in E : (x, y) \in \mathcal{C}\}.$$
For convenience we extend $co(C)$ to a set-valued map on $\mathbb{R}^N$, also denoted $co(C)$, by setting
$$co(C)(x) = co(C)(r(x)), \quad (4)$$
where for all $x \in \mathbb{R}^N$, $r(x) \in E$ denotes the unique point in $E$ closest to $x$. Associated to $co(C)$ is the differential inclusion
$$\dot{\eta} \in F(\eta) := -\eta + co(C)(\eta). \quad (5)$$
A solution to (5) is an absolutely continuous mapping $t \mapsto \eta(t)$ verifying $\dot{\eta}(t) \in F(\eta(t))$ for almost every $t \in \mathbb{R}$. Given such a solution, its initial condition is the point $\eta(0)$. Throughout, we let $S_u \subset C^0(\mathbb{R}, \mathbb{R}^N)$ denote the set of all solutions to (5) with initial condition $u$. By construction, $F$ maps points to nonempty compact convex sets and has a closed graph. Thus, by standard results on differential inclusions, $S_u$ is a nonempty subset of $C^0(\mathbb{R}, \mathbb{R}^N)$ that is compact (for the topology of uniform convergence on compact intervals) and (5) induces a set-valued dynamical system $\Phi = \{\Phi_t\}$ defined for all $t \in \mathbb{R}$ and $u \in \mathbb{R}^N$ by
$$\Phi_t(u) = \{\eta(t) : \eta \in S_u\}.$$
A set $\Lambda \subset \mathbb{R}^N$ is said to be invariant for (5) if for all $u \in \Lambda$ there exists $\eta \in S_u$ such that $\eta(\mathbb{R}) \subset \Lambda$ (see Section 3 of [7] for other notions of invariance, more details and references on set-valued dynamics). A nonempty compact set $\Lambda$ is called an attracting set for $\Phi$ provided there is some neighborhood $U$ of $\Lambda$, called a fundamental neighborhood, with the property that for every $\varepsilon > 0$ there exists $t_\varepsilon > 0$ such that $\Phi_t(U) \subset N^\varepsilon(\Lambda)$ for all $t \geq t_\varepsilon$. Here $N^\varepsilon(\Lambda)$ stands for the $\varepsilon$-neighborhood of $\Lambda$. If in addition $\Lambda$ is invariant, $\Lambda$ is called an attractor. By Proposition 3.10 in [7], every attracting set contains an attractor with the same fundamental neighborhood.
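A one-dimensional toy example may help. For the differential inclusion $\dot{x} \in -x + [1, 2]$ (an illustrative right-hand side, not one arising from a game), the interval $[1, 2]$ is an attractor; the Euler-type scheme below picks an arbitrary selection of the set-valued right-hand side at each step.

```python
# Euler scheme for the toy differential inclusion xdot in -x + [1, 2]:
# at every step an arbitrary point v of C(x) = [1, 2] is selected.
import random

def euler_inclusion(x0, h=0.01, steps=2000, seed=1):
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        v = rng.uniform(1.0, 2.0)   # any selection v in C(x) = [1, 2]
        x += h * (-x + v)           # Euler step for xdot = -x + v
    return x

print(euler_inclusion(10.0))    # ends inside a tiny neighborhood of [1, 2]
print(euler_inclusion(-5.0))
```

Whatever selections are chosen, $x_n = (1-h)^n x_0 + h \sum_k (1-h)^k v_{n-k}$ is pulled into $[1, 2]$, mirroring the attracting-set definition above: trajectories from any initial condition enter every $\varepsilon$-neighborhood of the set.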
The basin of attraction of an attracting set $\Lambda$ is the set
$$B(\Lambda) = \{u \in \mathbb{R}^N : \lim_{t \to \infty} d(\eta(t), \Lambda) = 0 \text{ for all } \eta \in S_u\}.$$
We let $L = L(\{u_n\})$ denote the limit set of the sequence $\{u_n\}$ defined by (2):
$$L = \bigcap_{n \geq 0} \overline{\{u_m : m \geq n\}}.$$
Note that $L$ is a random subset of $E$.

KASHI BEHRSTOCK, MICHEL BENAÏM AND MORRIS W. HIRSCH
A point $p \in \mathbb{R}^N$ is called attainable if for any $n \in \mathbb{N}$ and any neighborhood $U$ of $p$,
$$\mathbb{P}(\exists m \geq n : u_m \in U) > 0.$$
We let $Att(\{u_n\})$ denote the set of attainable points.
Parts (i) and (ii) of the following result follow from Theorems 3.6 and 3.23 in [7], generalizing the limit set theorem obtained for stochastic approximation processes (associated to an ODE) in [3, 4] and for asymptotic pseudotrajectories (of an ODE) in [5]. Part (iii) follows from [10], generalizing a result obtained for stochastic approximation processes (associated to an ODE) in [4].
We refer the reader to [7] for the definition of "internally chain-transitive" sets; this notion will not be used here except for the fact that an internally chain-transitive set is compact and invariant under the differential inclusion (5).
Given a compact subset $\Lambda \subset E$ and $x \in E$, define
$$\Pi_\Lambda(x) = \{y \in \Lambda : \|x - y\| = d(x, \Lambda)\},$$
the set of points of $\Lambda$ closest to $x$. We say that $\Lambda$ is a local B-set for the payoff-based strategy $Q$ (or simply a local B-set) if there exists $r > 0$ such that for all $x \in N^r(\Lambda) \setminus \Lambda$ there exists $y \in \Pi_\Lambda(x)$ such that the hyperplane orthogonal to $[x, y]$ at $y$ separates $x$ from $C(x)$. That is,
$$\langle x - y, v - y \rangle \leq 0 \quad (6)$$
for all $v \in C(x)$, with $C(x)$ as defined by (3). If $\Lambda$ is a local B-set for all $r > 0$ it is simply called a B-set. Blackwell [9] proved that being a B-set is a sufficient condition for approachability.
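The separating-hyperplane test in this definition is easy to check numerically. The sketch below does so for a toy target set, the nonpositive orthant of $\mathbb{R}^2$, an illustrative choice for which the closest-point map has a closed form; none of this is a game from the paper.

```python
# Numerical check of Blackwell's separation condition for the toy
# target set Lambda = {v in R^2 : v_1 <= 0 and v_2 <= 0}.

def project_orthant(x):
    """Closest point of the nonpositive orthant to x (componentwise)."""
    return tuple(min(xi, 0.0) for xi in x)

def separates(x, v):
    """True when the hyperplane orthogonal to [x, y] at y = proj(x)
    separates x from v, i.e. <x - y, v - y> <= 0."""
    y = project_orthant(x)
    return sum((xi - yi) * (vi - yi) for xi, yi, vi in zip(x, y, v)) <= 0.0

x = (2.0, 1.0)
print(separates(x, (-1.0, -1.0)))   # True: v lies on Lambda's side
print(separates(x, (3.0, 0.0)))     # False: v is on the same side as x
```

A payoff-based strategy makes this $\Lambda$ a B-set precisely when every reachable mean payoff $v \in C(x)$ passes the test at every $x \notin \Lambda$, in which case the approachability results above apply.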
Proof. It is proved in [7, Corollary 5.1] that $\Lambda$ contains an attractor for (5) provided inequality (6) holds for all $v \in co(C)(x)$ (rather than merely for $v \in C(x)$). It then suffices to prove that (6) also holds for all $v \in co(C)(x)$.
Let $D(x)$ be the convex hull of $\mathrm{Graph}_x(C)$. It follows from (6) and compactness of $\Lambda$ that $\langle x - y, v - y \rangle \leq 0$ for all $x \in E$, $v \in \mathrm{Graph}_x(C)$ and some $y \in \Pi_\Lambda(x)$. Clearly, this inequality still holds for all $v \in D(x)$. We claim that $co(C)(x) = D(x)$, from which the proof of (i) follows.
Proof of the claim. The inclusion $D(x) \subset co(C)(x)$ follows from the definitions. To prove the opposite inclusion it suffices to verify that $\mathrm{Graph}(D)$ is closed. Let $x_n \to x$ and $y_n \to y$ with $y_n \in D(x_n)$.
By the Carathéodory Theorem (see e.g. Theorem 11.1.8.6 in [8]), the convex hull of a set $G \subset \mathbb{R}^N$ equals the set obtained by taking all convex combinations of $N + 1$ points in $G$. Thus, there exist $\alpha_n \in \Delta^N$ and $w_n = (w_n^0, \dots, w_n^N) \in \mathrm{Graph}_{x_n}(C)^{N+1}$ such that
$$y_n = \sum_{k=0}^N \alpha_n^k w_n^k.$$
By compactness, after replacing sequences by subsequences we can assume that $\alpha_n \to \alpha \in \Delta^N$ and $w_n \to w$. Closedness of $\mathrm{Graph}(C)$ ensures that $w \in \mathrm{Graph}_x(C)^{N+1}$. Thus $y \in D(x)$. This proves the claim.
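Carathéodory's theorem as invoked here can be checked by brute force in the plane, where $N = 2$ and hence $N + 1 = 3$ points suffice. The point set $G$ and the query point below are arbitrary illustrative data.

```python
# Caratheodory's theorem in R^2: every point of conv(G) is a convex
# combination of at most N + 1 = 3 points of G. Brute-force sketch.
from itertools import combinations

def barycentric(p, a, b, c):
    """Coefficients (alpha, beta, gamma) with p = alpha*a + beta*b + gamma*c
    and alpha + beta + gamma = 1, or None if the triangle is degenerate."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    if abs(det) < 1e-12:
        return None
    beta = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    gamma = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return (1.0 - beta - gamma, beta, gamma)

def caratheodory(p, G):
    """Find at most 3 points of G whose convex hull contains p."""
    for a, b, c in combinations(G, 3):
        coeffs = barycentric(p, a, b, c)
        if coeffs and all(t >= -1e-12 for t in coeffs):
            return (a, b, c), coeffs
    return None

G = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0), (2.0, 1.0)]
print(caratheodory((1.0, 1.0), G))
# -> (((0.0, 0.0), (4.0, 0.0), (0.0, 4.0)), (0.5, 0.25, 0.25))
```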
Assertions (a) and (b) are now consequences of Theorem 2.1. The last statement was proved by Blackwell [9]. Note that it also follows from (i).
A straightforward application of this last theorem is given by the following result. It will be used several times in the forthcoming sections.
Corollary 1. Suppose there exist actions $a_1, a_2 \in A$ and numbers $\alpha, \beta$ such that for all $b \in B$, $\mu(U(a_1, b)) \leq \alpha$ and $\mu(U(a_2, b)) \geq \beta$. Let $Q$ be a payoff-based strategy such that

Proof. Equation (6) in this context becomes, for all $u \in E$, $v \in C(u)$. By convexity of the half-spaces $\{\mu(v) \leq \alpha\}$ and $\{\mu(v) \geq \beta\}$, and the definition of $Q$, this is equivalent to the condition given in the statement of the corollary.

We conclude this section with some quantitative estimates given in the excellent recent survey paper by Perchet [12]. The first assertion of the next theorem follows from Corollary 1.1 in [12]; it is a slight variant of a result obtained by Blackwell [9]. The second assertion follows from Corollary 1.5 in [12].

Theorem 2.3. Suppose DM adopts the payoff-based strategy $Q$ and that $\Lambda \subset E$ is a B-set for $Q$. Then for all $\eta > 0$ (i):

This can be seen as a simple model of "free riding". Each player can either Contribute (Cooperate) or Defect from contributing to a public good. Individual contribution costs $c$ and everyone, even a defector, benefits from the good and is paid $f(k)$ when there are $k$ contributors. Note that the assumptions on $f$ imply that conditions (i)-(iv) above are satisfied. The fact that mutual defection is a Nash equilibrium of the one-shot game is reminiscent of Hardin's The Tragedy of the Commons [11].
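The free-riding payoffs just described are straightforward to write down. In the sketch below, the benefit function $f$ and the cost $c$ are illustrative choices (not taken from the paper) satisfying the marginal condition $f(k) - f(k-1) < c$, which is what makes defection dominant in the one-shot game.

```python
# One-shot public-goods payoffs: k contributors each pay c, and every
# player (contributor or not) receives f(k). f and c are assumptions.

def f(k):
    return 2.0 * k           # illustrative public benefit

c = 3.0                      # illustrative cost, with c > f(k) - f(k-1)

def payoff(contributes, k_others):
    """Payoff to one player given her choice and the number of OTHER
    contributors."""
    k = k_others + int(contributes)
    return f(k) - (c if contributes else 0.0)

# Defection is dominant: whatever the others do, defecting pays more...
for k_others in range(5):
    assert payoff(False, k_others) > payoff(True, k_others)

# ...yet with N = 5 players, everyone contributing Pareto-dominates
# everyone defecting, since f(N) - c > f(0).
N = 5
assert f(N) - c > f(0)
print("defection dominant, mutual contribution efficient")
```

This is the tension the repeated game resolves: one-shot incentives point to universal defection even though universal contribution is better for everyone.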
Let $\delta$ be a nonnegative real number. Adapting [13], [6] and [7], we define a $\delta$-good strategy for Player $i$ as a payoff-based strategy $Q^i$ (as defined in Section 2) for the Decision Maker, Player $i$, such that

We call such a strategy continuous whenever the map $u \mapsto Q_u$ is continuous.
The following result shows that, by playing a $\delta$-good strategy, a player (or a group of players) ensures that her opponent's average payoff cannot be much better than hers, nor much better than the Pareto optimal payoff. Under the supplementary condition (iv) she ensures that her payoff cannot be much worse than the payoff resulting from mutual defection. If, furthermore, all the players play a $\delta$-good strategy, one of them being continuous, the outcome is the one given by mutual cooperation. As a consequence (Corollary 2), continuous $\delta$-good strategies form a Nash equilibrium. The proof is postponed to the end of the section.
and, if mutual defection is inefficient,

Let $\varepsilon > 0$. Let $\Theta^i$ be a strategy (as defined by equation (1)) for Player $i$. The strategy profile $(\Theta^1, \dots, \Theta^N)$ is called an $\varepsilon$-Nash equilibrium if for every $i$ and every alternative strategy $\Xi^i$ for $i$, the payoff to $i$ resulting from $(\Theta^1, \dots, \Xi^i, \dots, \Theta^N)$ does not exceed the payoff resulting from $(\Theta^1, \dots, \Theta^N)$ by more than $\varepsilon$. In other words, if all players but $i$ play the equilibrium strategy, Player $i$ cannot improve his payoff by more than $\varepsilon$ if he deviates from $\Theta^i$.
By Corollary 1 and the definition of $Q^1$, this concludes the proof.

By the definition of $Q^1$ and $C^1$,

Now, the proof of Theorem 1 shows that $\mu^1(U(C, s^{-1})) \leq 0$ with equality only if $s = (C, \dots, C)$. Thus

This implies that $h(t) = v^*$ and $\eta(t) = e^{-t}(u - v^*) + v^*$ for all $t \in \mathbb{R}$. By compactness of $L$ we must have $u = v^*$ (for otherwise $\{\eta(t)\}$ would be unbounded).
For each edge $(i, j) \in E$ we are given a map
$$U^{ij} : \Sigma^i \times \Sigma^j \to \mathbb{R}$$
representing the payoff function to Player $i$ against Player $j$.
Let $\mathrm{Neigh}(i) = \{j \in V : (i, j) \in E\}$ and let $N_i$ be its cardinality. The payoff function to $i$ is the map $U^i : \Sigma \to \mathbb{R}^{N_i}$ defined by $U^i(s) = (U^{ij}(s^i, s^j))_{j \in \mathrm{Neigh}(i)}$.
Using the notation of Example 1, set $N = \sum_{i=1}^M N_i$, and define the vector payoff function of the game as $U(s) = (U^i(s))_{i=1,\dots,M}$. The state space of the game is then $E = \mathrm{conv}\{U(s) : s \in \Sigma\} \subset \mathbb{R}^N$.
In addition to these data, we assume given a Markov transition matrix $K = (K_{ij})_{i,j \in V}$ adapted to $(V, E)$. That is, $K_{ij} > 0$ if and only if $(i, j) \in E$, and $\sum_{j \in V} K_{ij} = 1$ for all $i \in V$. The mean payoff to Player $i$ for the strategy profile $s$ is defined as

Irreducibility of the graph $(V, E)$ ensures irreducibility of the transition matrix $K$. Therefore there is a unique invariant probability $\pi$ for $K$. That is,
$$\pi_j = \sum_{i \in V} \pi_i K_{ij} \quad \text{for all } j \in V, \qquad \pi_i > 0, \qquad \sum_{i \in V} \pi_i = 1.$$
Define the weight of edge $(i, j) \in E$ as
$$\omega_{ij} = \pi_i K_{ij}.$$
Such weights will prove to be useful for defining $\delta$-good strategies below. Note that, by invariance of $\pi$,
$$\sum_{j \in V} \omega_{ij} = \sum_{j \in V} \omega_{ji} = \pi_i.$$

Example 3. Suppose $K$ is the simple random walk on the graph: $K_{ij} = 1/N_i$ for $j \in \mathrm{Neigh}(i)$. Then $\pi_i = N_i / \sum_{k \in V} N_k$ and $\omega_{ij} = \omega_{ji} = 1 / \sum_{k \in V} N_k$, so $K$ is reversible with respect to $\pi$.

Network prisoner's dilemma games. We consider now a particular example of network games where each pair of neighboring players is engaged in a two-player Prisoner's Dilemma game. We assume that for each $i \in V$, $\Sigma^i = \{C, D\}$, and that the payoff $U^{ij}$ depends only on the pair of actions played, the four possible values being denoted $CC$, $CD$, $DC$ and $DD$, where
(i): $CD < DD < CC < DC$, as usual for the two-player Prisoner's Dilemma game.
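The invariant probability and the edge weights can be computed numerically. The sketch below runs power iteration on an arbitrary irreducible $3 \times 3$ transition matrix (an illustrative example, not one from the paper) and checks the identities $\sum_j \omega_{ij} = \sum_j \omega_{ji} = \pi_i$ for weights taken, as an assumption consistent with those identities, of the form $\omega_{ij} = \pi_i K_{ij}$.

```python
# Power iteration for the invariant probability pi of a transition
# matrix K, and edge weights w[i][j] = pi[i] * K[i][j] (assumed form).

K = [[0.0, 0.5, 0.5],
     [0.3, 0.0, 0.7],
     [0.6, 0.4, 0.0]]        # irreducible and aperiodic; rows sum to 1

def stationary(K, iters=10000):
    n = len(K)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * K[i][j] for i in range(n)) for j in range(n)]
    return pi

pi = stationary(K)
w = [[pi[i] * K[i][j] for j in range(3)] for i in range(3)]

# Invariance of pi makes row and column sums of w both equal to pi:
for i in range(3):
    assert abs(sum(w[i][j] for j in range(3)) - pi[i]) < 1e-9   # sum_j w_ij
    assert abs(sum(w[j][i] for j in range(3)) - pi[i]) < 1e-9   # sum_j w_ji
print(pi)
```

The row-sum identity uses only the stochasticity of $K$, while the column-sum identity is exactly the invariance of $\pi$.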
(ii): We furthermore assume that the outcome $CC$ is Pareto optimal and that the outcome $DD$ is Pareto inefficient, in the sense that for all $(i, j) \in E$,
$$(\omega_{ij} + \omega_{ji})\,DD < \omega_{ij}\,CD + \omega_{ji}\,DC < (\omega_{ij} + \omega_{ji})\,CC.$$
Remark 3. If $K$ is reversible with respect to $\pi$ (meaning that $\omega_{ij} = \omega_{ji}$), as in Example 3, Pareto inefficiency means
$$2\,DD < CD + DC < 2\,CC.$$
Equivalently, the polygon with vertices is convex and hence equal to $E$.
Given $\delta \geq 0$, a $\delta$-good strategy for Player $i$ is a payoff-based strategy $Q^i$ such that

The following result is similar to Theorem 3.1. It shows that if a group of players use $\delta$-good strategies, their payoffs cannot be much worse than the payoff resulting from mutual defection, and that a weighted average of the other players' payoffs cannot be much better than theirs. If, furthermore, all the players play a $\delta$-good strategy and that of Player $i$ is continuous, then the payoffs of $i$ against $j$ and of $j$ against $i$ both equal $CC$, the payoff given by mutual cooperation.
As a consequence (Corollary 3), continuous δ-good strategies form a Nash equilibrium. The proof is postponed to the end of the section.
Recall that for all δ ≥ 0 we let (ii):  Nash equilibrium.
Proof of Proposition 4. Suppose $s^i = C$. Then

where $t_j = 1$ if $s^j = C$ and $t_j = 0$ otherwise. Thus

Suppose now that $s^i = D$. Then

The result then follows from Corollary 1.
Proof of Theorem 4.1. The proof is similar to the proof of Theorem 3.1. Assertions (i), (ii) and (iii) follow from Propositions 3 and 4. For (iv) we use the fact that if Player 1 plays a continuous $\delta$-good strategy, then the limit set $L$ of $\{u_n\}$ is an invariant set of the differential inclusion $\dot{u} \in -u + C^1(u)$ contained in $\bigcap_i \Lambda_i(0)$. By Proposition 3 and Remark 4, for all $u \in \bigcap_i \Lambda_i(0)$ and $v \in C^1(u)$, $u_{1j} = u_{j1} = CC$. Thus, reasoning as in the proof of Theorem 3.1, invariance of $L$ shows that $u_{1j} = u_{j1} = CC$ for all $u \in L$.