Good Strategies for the Iterated Prisoner's Dilemma: Smale vs. Markov

In 1980 Steven Smale introduced a class of strategies for the Iterated Prisoner's Dilemma which used as data the running average of the previous payoff pairs. This approach is quite different from the Markov chain approach, common before and since, which used as data the outcome of the just previous play, the memory-one strategies. Our purpose here is to compare these two approaches focusing upon good strategies which, when used by a player, assure that the only way an opponent can obtain at least the cooperative payoff is to behave so that both players receive the cooperative payoff. In addition, we prove a version for the Smale approach of the so-called Folk Theorem concerning the existence of Nash equilibria in repeated play. We also consider the dynamics when certain simple Smale strategies are played against one another.


Introduction
The Iterated Prisoner's Dilemma has been an object of considerable study ever since Axelrod's description of the results of computer tournaments [6] and Maynard Smith's application of game theory to evolutionary competition [13]. Most of this work has focused upon what I call here Markov strategies, i.e. memory-one plans. Competition between two such strategies leads to a Markov process on the set of outcomes, e.g. [9], [15] and various surveys cited below. Considerable simulation work has been done to analyze numerically the competition between such strategies, e.g. [12] and [19]. In this context, I characterized in [3] and [5] certain so-called good strategies which ensured that an opponent could receive at least the cooperative payoff only by playing so that both players receive exactly the cooperative payoff. Recently, Mike Shub pointed out to me that Steve Smale had described in 1980 strategies which he called good [18]. These strategies use an entirely different way of aggregating the data of past outcomes. While there has been some work on Smale's procedure, e.g. [1], [8] and [7], most of the game theory literature has ignored it. For example, Smale's work is not referred to in [12], [14], [11], [16] or [17]. Our purpose here is to compare the Smale and Markov procedures and especially to use these to clarify the notion of a good strategy. In addition, we use the Taylor-Jonker equations for evolutionary dynamics [21] to analyze competition among certain simple Smale strategies.
The author would like to express his appreciation to the referee for his helpful comments and corrections.

Plans for the Iterated Prisoner's Dilemma
We will focus mostly on the symmetric version of the Prisoner's Dilemma. Each of the two players, X and Y, has a choice between two strategies, c and d. Thus, there are four outcomes which we list in the order: cc, cd, dc, dd, where, for example, cd is the outcome when X plays c and Y plays d. Either player can use a mixed strategy, randomizing by choosing c with probability p_c and d with the complementary probability 1 − p_c.
Each then receives a payoff. The following 2 × 2 chart describes the payoff to the X player; the transpose is the Y payoff.

(2.1)
            Y plays c    Y plays d
  X plays c     R            S
  X plays d     T            P
Alternatively, we can define the payoff vectors for each player by

(2.2)  S_X = (R, S, T, P) and S_Y = (R, T, S, P).
The payoffs are assumed to satisfy

(2.3)  T > R > P > S and 2R > T + S,

but 2P may lie on either side of T + S.
In the Prisoner's Dilemma, the strategy c is cooperation. When both players cooperate they each receive the reward for cooperation (= R). The strategy d is defection. When both players defect they each receive the punishment for defection (= P ). However, if one player cooperates and the other does not, then the defector receives the large temptation payoff (= T ), while the hapless cooperator receives the very small sucker's payoff (= S). The condition 2R > T + S says that the reward for cooperation is larger than the players would receive by dividing equally the total payoff of a cd or dc outcome. Thus, the maximum total payoff occurs uniquely at cc and that location is a strict Pareto optimum, which means that at every other outcome at least one player does worse. The cooperative outcome cc is clearly where the players "should" end up. If they could negotiate a binding agreement in advance of play, they would agree to play c and each receive R. However, the structure of the game is such that, at the time of play, each chooses a strategy in ignorance of the other's choice. Furthermore, the strategy d strictly dominates strategy c. This means that, whatever Y's choice is, X receives a larger payoff by playing d than by using c. In the array (2.1) each number in the d row is larger than the corresponding number in the c row above it. Hence, X chooses d, and for exactly the same reason, Y chooses d. So they are driven to the dd outcome with payoff P for each.
In the search for a way to avoid the mutually inferior payoff (P, P ), attention has focused upon repeated play. X and Y play repeated rounds of the same game. For each round the players' choices are made independently, but each is aware of all of the previous outcomes. The hope is that the threat of future retaliation will rein in the temptation to defect in the current round. It is this Iterated Prisoner's Dilemma which we will consider here.
After the k-th round the players receive payoffs S^k = (S^k_X, S^k_Y) determined by the payoff matrix. For a single payoff after N rounds of play we use the time average:

(2.4)  s^N = (1/N) Σ_{k=1}^{N} S^k.

Observe that

(2.5)  s^{N+1} = (N/(N+1)) s^N + (1/(N+1)) S^{N+1},

and so

(2.6)  s^{N+1} − s^N = (1/(N+1)) (S^{N+1} − s^N).

The vector s^N = (s^N_X, s^N_Y) lies in S, defined to be the convex hull of the four payoff pairs. If P < (T + S)/2, then S is a quadrilateral with vertices (S, T), (R, R), (P, P), (T, S). If P ≥ (T + S)/2, then S is the triangle with vertices (S, T), (R, R), (T, S), which contains (P, P).
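As a quick sanity check on the recursion (2.5), the incremental update of the running average can be compared with the direct mean; a minimal sketch (the payoff values and variable names are ours):

```python
# Check that the incremental update s^{N+1} = (N/(N+1)) s^N + (1/(N+1)) S^{N+1}
# reproduces the direct time average (2.4) of the payoff pairs.
payoffs = [(3.0, 3.0), (0.0, 5.0), (1.0, 1.0), (5.0, 0.0)]  # sample S^k pairs

s = payoffs[0]  # s^1 is just the first payoff pair
for n, S_next in enumerate(payoffs[1:], start=1):
    # convex combination of the old average and the new payoff, per (2.5)
    s = tuple(n / (n + 1) * si + 1 / (n + 1) * Si for si, Si in zip(s, S_next))

direct = tuple(sum(col) / len(payoffs) for col in zip(*payoffs))
assert all(abs(a - b) < 1e-12 for a, b in zip(s, direct))
```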
The Euclidean diameter of the set S is √2 (T − S). For ε > 0 we will call N* an ε time-step when N* > √2 (T − S)/ε. In that case, for N ≥ N* the distance ||s^{N+1} − s^N|| of the step from s^N to s^{N+1} is less than ε.
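The bound behind the ε time-step definition is a one-line computation from (2.6), using the diameter of S:

```latex
\left\| s^{N+1}-s^{N}\right\|
  =\frac{\left\| S^{N+1}-s^{N}\right\|}{N+1}
  \le \frac{\operatorname{diam}(\mathcal{S})}{N+1}
  =\frac{\sqrt{2}\,(T-S)}{N+1}
  <\epsilon
\qquad\text{whenever } N\ge N^{*}>\sqrt{2}\,(T-S)/\epsilon .
```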
Proposition 2.1. For any infinite sequence of outcomes, the set Ω of limit points of the sequence {s^N} of payoff pairs is a closed, connected subset of S. If U is an open subset of S which contains Ω, then there exists N* such that s^N ∈ U for all N ≥ N*.
Proof: From (2.6) it is clear that ||s^{N+1} − s^N|| → 0 as N → ∞. So the conclusion is immediate from the following well-known lemma. 2

For the lemma we introduce a bit of notation. For A a nonempty, closed subset of a compact metric space X and x ∈ X we let d(x, A) = inf{d(x, a) : a ∈ A}. If B is another nonempty, closed subset we let d(A, B) = inf{d(a, b) : a ∈ A, b ∈ B}. By compactness, there exists a ∈ A such that d(x, A) = d(x, a), a point in A closest to x, and there exist a ∈ A, b ∈ B such that d(A, B) = d(a, b). Given ε > 0 we let V_ε(A) denote the open set {x : d(x, A) < ε}, i.e. the set of points less than ε away from a point of A.
Lemma 2.2. Let {x N } be a sequence in a compact metric space X. If d(x N +1 , x N ) → 0 as N → ∞ then the set of limit points Ω is a nonempty, closed, connected subset of X. If U is an open set containing Ω then there exists N * such that x N ∈ U for all N ≥ N * .
Proof: The set of limit points Ω is the intersection of the decreasing sequence {X_k} with X_k the closure of the tail {x_N : N ≥ k} of the sequence. If U is an open set containing Ω then {U} together with {X \ X_k : k = 1, . . . } is an open cover of X and so has a finite subcover. Since the X_k's are decreasing, it follows that for some N*, {U, X \ X_{N*}} is a cover of X. Hence, {x_N : N ≥ N*} ⊂ U. If Ω were empty we could apply this to U = ∅ and get a contradiction. Now let A_0 and A_1 be disjoint, nonempty, closed subsets of Ω. We will see that Ω \ (A_0 ∪ A_1) is nonempty, and this implies that Ω is connected. Let 3ε be the distance d(A_0, A_1) between the sets. Choose n* so that d(x_{N+1}, x_N) < ε for all N ≥ n*.
The sequence repeatedly approaches arbitrarily closely to each point of Ω. Since A_0 and A_1 are nonempty we can define n_0 to be the minimum n ≥ n* such that x_n lies in V_ε(A_0). Inductively, for k ≥ 0, define n_{2k+1} to be the minimum n ≥ n_{2k} such that x_n ∈ V_ε(A_1), and for k ≥ 1, define n_{2k} to be the minimum n ≥ n_{2k−1} such that x_n ∈ V_ε(A_0). It is clear that x_{n_i − 1} ∈ V_{2ε}(A_0) \ V_ε(A_0) if i is even and is in V_{2ε}(A_1) \ V_ε(A_1) if i is odd. Hence, the subsequence {x_{n_i − 1}} lies in the closed set X \ (V_ε(A_0) ∪ V_ε(A_1)) and so it has limit points, which lie in Ω but not in A_0 ∪ A_1. 2

The choice of play for the first round is the initial play. A strategy is a choice of initial play together with what we will call a plan. A plan is a choice of play, after the first round, to respond to any possible past history of outcomes in the previous rounds.
If X and Y use strategies with only pure strategy choices then the result is an infinite sequence of outcomes. However, if mixed strategy choices are used either on the initial plays or as part of the plan there results a probability measure on the space of all such sequences.
We will consider plans which use just a crucial portion of the past history data.
We will use the label Markov plan for a stationary, memory-one plan which bases its response entirely on the outcome of the previous round. For example, the Tit-for-Tat plan, hereafter TFT, due to Anatol Rapoport, repeats the opponent's play from the previous round.
With the outcomes listed in order as cc, cd, dc, dd, a Markov plan for X is given by a vector p = (p_1, p_2, p_3, p_4) = (p_cc, p_cd, p_dc, p_dd) where p_z is the probability of playing c when the outcome z occurred in the previous round. On the other hand, if Y also uses a Markov plan q = (q_1, q_2, q_3, q_4) then the response vector is (q_cc, q_cd, q_dc, q_dd) = (q_1, q_3, q_2, q_4) and the successive outcomes follow a Markov chain with transition matrix given by:

(2.7)  M_{z, cc} = p_z q'_z,  M_{z, cd} = p_z (1 − q'_z),  M_{z, dc} = (1 − p_z) q'_z,  M_{z, dd} = (1 − p_z)(1 − q'_z),

where z runs over the outcomes cc, cd, dc, dd and q' = (q_1, q_3, q_2, q_4) is the Y response vector. Notice the switch in numbering from the Y strategy q to the Y response vector. This is done because switching the perspective of the players interchanges cd and dc. This way the "same" plan for X and for Y is given by the same vector. For example, TFT for X and for Y is given by p = q = (1, 0, 1, 0), but the response vector for Y is (1, 1, 0, 0). The plan Repeat is given by p = q = (1, 1, 0, 0) with the response vector for Y equal to (1, 0, 1, 0). This plan just repeats the player's own previous play, regardless of what the opponent did.
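The transition matrix is easy to assemble in code; a sketch (the function name is ours), using TFT against itself as a check:

```python
def transition_matrix(p, q):
    """4x4 transition matrix on outcomes (cc, cd, dc, dd) when X uses the
    Markov plan p and Y uses q; Y's response vector swaps the cd/dc entries."""
    q_resp = (q[0], q[2], q[1], q[3])
    return [[pz * qz, pz * (1 - qz), (1 - pz) * qz, (1 - pz) * (1 - qz)]
            for pz, qz in zip(p, q_resp)]

TFT = (1, 0, 1, 0)
M = transition_matrix(TFT, TFT)
# cc and dd are fixed, while cd and dc feed into each other (alternation)
assert M[0] == [1, 0, 0, 0] and M[3] == [0, 0, 0, 1]
assert M[1] == [0, 0, 1, 0] and M[2] == [0, 1, 0, 0]
```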
We can think of a Markov chain on a finite set I (in this case I = {cc, cd, dc, dd}) as representing motion on a directed graph with vertices I and an edge from i_1 to i_2 if there is a positive probability, according to M, of moving from i_1 to i_2, i.e. M_{i_1 i_2} > 0. In particular, there is an edge from i to itself when M_{ii} > 0. A path in the graph is a state sequence i_1, ..., i_n with n > 1 such that there is an edge from i_k to i_{k+1} for k = 1, ..., n − 1. A set of states J ⊂ I is called a closed set when it is nonempty and no path that begins in J can exit J. For example, the entire set of states is closed and, for any i, the set of states accessible via a path that begins at i is a closed set. The subset J is called a terminal set when it is closed and when for i_1, i_2 ∈ J there exists a path from i_1 to i_2. Equivalently, a terminal set is a minimal closed set. Since the set is closed, any path beginning in J moves only on elements of J.
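Terminal sets can be computed mechanically: they are exactly the reachability classes that are strongly connected, i.e. the minimal closed sets. A sketch (the helper name is ours):

```python
def terminal_sets(M):
    """Terminal sets of a finite chain with transition matrix M, found as
    closed reachability classes in which every state reaches back."""
    n = len(M)
    reach = []
    for i in range(n):
        seen, stack = {i}, [i]
        while stack:
            u = stack.pop()
            for v in range(n):
                if M[u][v] > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        reach.append(seen)  # the closed set of states accessible from i
    terms = []
    for i in range(n):
        J = reach[i]
        # J is terminal iff it is strongly connected: every j in J reaches i
        if all(i in reach[j] for j in J) and J not in terms:
            terms.append(J)
    return terms

# TFT vs TFT (outcome order cc, cd, dc, dd): cc and dd are fixed points and
# cd, dc alternate, so there are three terminal sets.
M_tft = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
terms = terminal_sets(M_tft)
```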
Restricted to a terminal set J, the system is ergodic with a unique stationary distribution v_J supported on J, and so for any function f : I → R the sequence of time averages {(1/N) Σ_{n=1}^{N} f(i_n) : N = 1, 2, . . . }, where i_n is the state in round n, converges with probability 1 to the space average Σ_{i∈J} (v_J)_i f(i), the expected value with respect to v_J. That is, such convergence occurs except on a set of outcome sequences which has probability 0. (Think of the outcome sequence for a fair coin such that Heads comes up every time.)
Distinct terminal sets are disjoint. If i ∈ I lies in some terminal set then it is called recurrent; otherwise, it is called transient. If i is transient then for each terminal set J there is a probability, determined by i and M, that the process beginning from i eventually enters J. This probability might be zero, but summed over all of the terminal sets the probabilities add up to 1. If the process enters J then the time average for any function f approaches the expected value with respect to v_J with probability 1.
The matrix M is called convergent when there is a unique terminal set J. In that case, v J is the unique stationary distribution for M and with probability 1, the time average for f : I → R converges to the v J expected value regardless of the initial position.
Suppose X and Y play Markov plans leading to the 4 × 4 matrix M of (2.7) and v_J is the stationary distribution for a terminal set J ⊂ {cc, cd, dc, dd}. If the sequence of outcomes enters J, then it remains in J and with probability 1 the average payoff (s^N_X, s^N_Y) converges to

(2.8)  ( Σ_{z∈J} (v_J)_z (S_X)_z , Σ_{z∈J} (v_J)_z (S_Y)_z ),

the pair of expected payoffs with respect to v_J. If there is more than one terminal set J, then with probability 1 the sequence will enter some terminal set J, with the probabilities for the different J's depending upon the initial plays.
For example, suppose that p satisfies 0 < p i < 1 for i = 1, . . . , 4 and that q satisfies the analogous condition. Every entry of the associated Markov matrix M is positive and {cc, cd, dc, dd} is the unique terminal set. So there is a unique stationary distribution v with v i > 0 for i = 1, . . . , 4 and with probability one the outcome sequence passes repeatedly through each of the four outcomes with the average payoff sequence converging according to (2.8).
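For all-positive plans the stationary distribution and the limiting payoff pair (2.8) can be approximated by power iteration; a sketch with hypothetical plan and payoff values (the construction of M is repeated here so the snippet is self-contained):

```python
def transition_matrix(p, q):
    # outcome order cc, cd, dc, dd; Y's response vector swaps the cd/dc entries
    qr = (q[0], q[2], q[1], q[3])
    return [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
            for a, b in zip(p, qr)]

def stationary(M, iters=20000):
    # power iteration: repeatedly push a distribution through the chain
    v = [0.25] * 4
    for _ in range(iters):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v

p = (0.9, 0.5, 0.5, 0.1)     # every entry strictly between 0 and 1
M = transition_matrix(p, p)
v = stationary(M)

# limiting payoffs (2.8) with illustrative values (R, S, T, P) = (3, 0, 5, 1)
S_X, S_Y = (3.0, 0.0, 5.0, 1.0), (3.0, 5.0, 0.0, 1.0)
limit = (sum(a * b for a, b in zip(v, S_X)), sum(a * b for a, b in zip(v, S_Y)))
```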
Smale in [18] aggregates the data in a different way. He suggests using as data the current average payoff given by (2.4). A Smale plan is a function π : S → [0, 1]. If X uses the Smale plan π then in round N + 1 he plays c with probability π(s^N_X, s^N_Y). Again we have the switch due to the reversal in labeling for the other player. Let Switch : S → S be defined by Switch(s_X, s_Y) = (s_Y, s_X). If Y uses the Smale plan π then she cooperates with probability π • Switch(s^N_X, s^N_Y) = π(s^N_Y, s^N_X). That is, Y responds to s ∈ S by using π • Switch applied to s.
So if X uses π_X and Y uses π_Y then the outcomes cc, cd, dc, dd at time N + 1 have probabilities given by

( π_X(s^N) π_Y(Switch(s^N)), π_X(s^N)(1 − π_Y(Switch(s^N))), (1 − π_X(s^N)) π_Y(Switch(s^N)), (1 − π_X(s^N))(1 − π_Y(Switch(s^N))) ).

After the randomization is applied, we obtain the time N + 1 payoff S^{N+1} and the new average s^{N+1} via (2.5). Smale only uses pure strategy responses, for which π maps to {0, 1}. For the most part we will follow him. We will call π^{−1}(1) ⊂ S the cooperation zone for π and π^{−1}(0) ⊂ S the defection zone.
The following Separation Theorems will be used repeatedly.

Lemma 2.3. Let L : R^2 → R be an affine map and let M be the maximum value of |L| on S. Assume that, for every N ≥ N*, L(s^N) > 0 implies L(S^{N+1}) ≤ 0. Then

(2.10)  L(s^N) ≤ M N*/N for all N ≥ N*.

Proof: Since an affine map commutes with convex combinations, (2.5) implies

(2.11)  L(s^{N+1}) = (N/(N+1)) L(s^N) + (1/(N+1)) L(S^{N+1}).

In particular, if L(s^N) > 0, then L(S^{N+1}) ≤ 0 and so

(2.12)  L(s^{N+1}) ≤ (N/(N+1)) L(s^N).

On the other hand, since N* ≥ 1,

(2.13)  L(s^N) ≤ 0 implies L(s^{N+1}) ≤ (1/(N+1)) M ≤ (N*/(N+1)) M.

Finally, observe that L(s^{N*}) ≤ M = M N*/N*. So inequality (2.10) follows from (2.12) and (2.13) by mathematical induction. 2

Lemma 2.4. Let L : R^2 → R be a nonconstant affine map and let M be the maximum value of |L| on S. X and Y use general strategies.
(a) If L(P, P ), L(T, S) < 0, then there is a positive integer k such that for all N there exists n with N ≤ n ≤ kN such that either L(s n ) < 0 or X plays c on round n.
(b) If L(R, R), L(S, T ) > 0, then there is a positive integer k such that for all N there exists n with N ≤ n ≤ kN such that either L(s n ) > 0 or X plays d on round n.
Proof: (a) Let m = min{−L(P, P), −L(T, S)} and let k be an integer with k ≥ 2 such that M/(k − 1) < m. Assume that X plays d in rounds N, . . . , kN. Then in each of these rounds the outcome is either dc or dd, and so L(S^n) ≤ −m for n = N + 1, . . . , kN. On the other hand, L(S^n) ≤ M for n = 1, . . . , N. Hence,

(2.14)  L(s^{kN}) ≤ (1/kN)[ N M − (kN − N) m ] = M/k − ((k − 1)/k) m < 0.

Thus, either X plays c in some round n with N ≤ n ≤ kN, or L(s^{kN}) < 0. The proof of (b) is completely analogous. 2

Theorem 2.5. Let L : R^2 → R be a nonconstant affine map and let M be the maximum value of |L| on S. Player X uses a Smale plan π from round N* on, player Y uses an arbitrary plan, and X and Y use arbitrary initial plays.

(a) Assume that L(s) > 0 implies π(s) = 0, i.e. {L > 0} is contained in the defection zone of π.

(i) For all N ≥ N*,

(2.15)  L(s^N) ≤ M N*/N.

So lim sup_{N→∞} L(s^N) ≤ 0.

(ii) If L(P, P), L(T, S) < 0 then there is a positive integer k so that for all N ≥ N* there exists n with N ≤ n ≤ kN such that L(s^n) ≤ 0.

(iii) If L(R, R), L(S, T) > 0 then there is a positive integer k so that for all N ≥ N* there exists n with N ≤ n ≤ kN such that X plays d on round n.

(b) Assume that L(s) < 0 implies π(s) = 1, i.e. {L < 0} is contained in the cooperation zone of π.

(i) For all N ≥ N*,

(2.16)  L(s^N) ≥ −M N*/N.

So lim inf_{N→∞} L(s^N) ≥ 0.

(ii) If L(R, R), L(S, T) > 0 then there is a positive integer k so that for all N ≥ N* there exists n with N ≤ n ≤ kN such that L(s^n) ≥ 0.

(iii) If L(P, P), L(T, S) < 0 then there is a positive integer k so that for all N ≥ N* there exists n with N ≤ n ≤ kN such that X plays c on round n.
Proof: (a)(i) If L(s N ) > 0 then X plays d and so the N + 1 outcome is either dc or dd. Hence, S N +1 is either (T, S) or (P, P ) which implies L(S N +1 ) ≤ 0. So we can apply Lemma 2.3 to get (2.15).
(a)(ii) Apply Lemma 2.4(a) to obtain k so that for some n between N and kN either L(s^n) < 0 or X plays c on round n. By assumption, if X plays c on round n then L(s^n) ≤ 0.
(a)(iii) Apply Lemma 2.4(b) to obtain k so that for some n between N and kN either L(s^n) > 0 or X plays d on round n. If L(s^n) > 0 then X plays d on round n.
The proof of (b) is completely analogous to that of (a), applying Lemma 2.3 to −L to get (2.16). 2

We will say that a player eventually uses a plan when there exists N* so that the plan is used for all N ≥ N*.
Notation: For distinct points A, B ∈ R 2 we will use [A, B] for the closed segment connecting the points, [A, B) for the half-open segment [A, B] \ B, etc. We will denote by )A, B( the line through A and B and use [A, B( for the ray from A through B. In general, for a finite set of points {A 1 , . . . , A n } we will use [A 1 , . . . , A n ] for the convex hull. We will call the line )(P, P ), (R, R)( the diagonal and the line )(S, T ), (T, S)( the co-diagonal. If 1 and 2 are non-parallel lines, then we will let 1 ∩ 2 denote the point of intersection, abusively identifying the singleton set with the point contained therein.
Any line ℓ in R^2 is the intersection of two closed half-planes H^+ and H^−. When the line is not vertical we will use H^+ for the upper half-plane. We will then refer to H^+ \ ℓ as the set of points above ℓ, with H^− \ ℓ the points below ℓ. Up to multiplication by a positive constant, an affine map L : R^2 → R is uniquely defined by the conditions that L is zero on ℓ and is positive on H^+ \ ℓ. We will say that L is associated with ℓ and vice-versa.
A line is called a separation line for the game when (S, T) and (R, R) lie in one half-plane while (P, P) and (T, S) lie in the other.

Corollary 2.6. Assume that eventually X plays a Smale plan π, Y uses an arbitrary plan and that the initial plays are arbitrary. Let Ω be the limit point set of an associated sequence of outcomes. Let C ⊂ S be a closed, convex set and ℓ be a separation line.

(a) If (P, P), (T, S) ∈ C and S \ C is contained in the defection zone π^{−1}(0), then Ω ⊂ C.

(b) If (R, R), (S, T) ∈ C and S \ C is contained in the cooperation zone π^{−1}(1), then Ω ⊂ C.

(c) If S ∩ ℓ ⊂ C, π(s) = 0 for every s ∈ S \ C above ℓ, and π(s) = 1 for every s ∈ S \ C below ℓ, then Ω ⊂ C.
Proof: (a): For s ∈ S \ C, let s' be the closest point in C. The line ℓ' through s' which is perpendicular to )s, s'( is a line of support for C. That is, if L is an affine function associated with ℓ' such that L(s) > 0, then L ≤ 0 on C, and so {L > 0} ∩ S is contained in S \ C and hence in the defection zone. From Theorem 2.5 (a)(i) it follows that L ≤ 0 on Ω. In particular, s ∉ Ω.
(b): Proceed as above, using Theorem 2.5 (b)(i) instead.

(c): Let C^+ consist of the points of S on or above ℓ together with C, and C^− consist of the points of S on or below ℓ together with C. To be precise, if H^± are the half-planes associated with ℓ, C^± = C ∪ (H^± ∩ S). These are each closed, convex sets. Because ℓ is a separation line, (P, P), (T, S) ∈ C^− and (R, R), (S, T) ∈ C^+. Moreover, S \ C^− lies above ℓ and outside C, so it is contained in the defection zone, while S \ C^+ lies below ℓ and outside C, so it is contained in the cooperation zone. From (a) it follows that Ω ⊂ C^− and from (b) that Ω ⊂ C^+. Thus, Ω ⊂ C^− ∩ C^+ = C. 2

Definition 2.7. The map π : S → [0, 1] is a simple Smale plan with separation line ℓ if, for an associated affine function L for ℓ,

π(s) = 0 when L(s) > 0 and π(s) = 1 when L(s) < 0.

That is, the π player responds with d if s is above the line and with c if s is below the line. We do not specify the value of π on the line.

Corollary 2.8. Assume that from some round N* on, player X uses a simple Smale plan with separation line ℓ and associated affine function L. Let M be the maximum of |L| on S. Assume that player Y uses an arbitrary plan and that the initial plays are arbitrary. Then for all N ≥ N*,

|L(s^N)| ≤ M N*/N.

So lim_{N→∞} L(s^N) = 0 and the limit point set Ω is contained in ℓ. Thus, if eventually X plays a simple Smale plan and player Y uses an arbitrary plan, then after the randomization for mixed strategies has been applied, a sequence of outcomes is obtained and the limit point set Ω of the corresponding payoff sequence {s^N} is a point or closed segment in the separation line, by Proposition 2.1.
The plan All-C, which always cooperates and so has π = 1 on S, is a simple Smale plan with separation line )(R, R), (S, T)(. If P ≤ ½(T + S) then All-D, which always defects and so has π = 0 on S, is a simple Smale plan with separation line )(P, P), (T, S)(. However, if P > ½(T + S) then every simple Smale plan cooperates below the line )(P, P), (T, S)( and so, in particular, has π = 1 on a neighborhood of the point ½(T + S, T + S).
If P ≤ E ≤ R then the horizontal line {s Y = E} is a separation line. The associated simple Smale plan is the Smale version of an equalizer plan introduced in [9] and also described by Press and Dyson [15]. If X uses this equalizer plan then the limiting payoff for Y is E regardless of Y's play. On the other hand, the payoff to X can be anything between R and P, or even lower.
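The equalizer behavior is easy to see in simulation; a sketch (payoff values, E, and variable names are ours) in which Y plays All-D, with error O(1/N) by Corollary 2.8:

```python
# X plays the simple Smale plan with separation line {s_Y = E}: defect when the
# opponent's running average payoff exceeds E, cooperate otherwise. Y plays All-D.
R, S, T, P = 3.0, 0.0, 5.0, 1.0
E = 2.0

sx = sy = 0.0
x_play = 'c'  # arbitrary initial play for X
for n in range(1, 200_001):
    px, py = (S, T) if x_play == 'c' else (P, P)  # Y always defects
    sx += (px - sx) / n          # running averages, per (2.6)
    sy += (py - sy) / n
    x_play = 'd' if sy > E else 'c'

assert abs(sy - E) < 1e-3  # Y's limiting payoff is pinned to E
```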
If, eventually, X and Y play simple Smale plans π_X and π_Y with separation lines ℓ_X and ℓ_Y, respectively, then Y responds with π_Y • Switch and so the set Ω of limiting payoffs lies on the intersection ℓ_X ∩ Switch(ℓ_Y). Except for the extreme cases with ℓ_X = Switch(ℓ_Y) equal to the diagonal or equal to the co-diagonal, the intersection is a single point and so the payoff sequence {s^N} converges to the intersection point ℓ_X ∩ Switch(ℓ_Y).

Proposition 2.9. (a) If, eventually, X plays the plan All-C then for any strategy for Y and any initial plays, the limit point set Ω is contained in the segment [(R, R), (S, T)].
(b) If, eventually, X plays the plan All-D then for any strategy for Y and any initial plays, the limit point set Ω is contained in the segment [(P, P), (T, S)].

Corollary 2.10. Assume that, eventually, X uses a Smale plan π. Let Y use an arbitrary strategy and let X and Y use arbitrary initial plays.
If Ω is the limit point set, then Ω is not contained in the S-interior of π^{−1}(1) \ [(R, R), (S, T)] and Ω is not contained in the S-interior of π^{−1}(0) \ [(P, P), (T, S)].
Proof: Assume X uses π from some time N*_1 on, and let U be the S-interior of π^{−1}(1) \ [(R, R), (S, T)]. If Ω ⊂ U then by Proposition 2.1 there exists N* ≥ N*_1 such that s^N ∈ U for all N ≥ N*. This implies that for every round beyond N*, X plays c. So the sequence of payoffs is the same as though X plays All-C starting from round N*. So by Proposition 2.9 (a) the limit point set would be contained in [(R, R), (S, T)]. This contradiction shows that Ω ⊂ U is impossible.
The second assertion similarly follows from Proposition 2.9 (b). 2

Good Plans
We describe informally the conditions that we would like a good plan to satisfy.
• (Cooperation Condition) If the players X and Y use fixed good strategies, i.e. good plans together with initial cooperation, then for every N, S^N = (R, R) and so, of course, the time averages s^N = (R, R) for all N as well.

• (Protection Condition) If X eventually plays a fixed good plan, and s* = (s*_X, s*_Y) is a limit point for the sequence {s^N} with arbitrary initial play and with Y using any plan, then s*_Y ≥ R implies s* = (R, R).

• (Robustness Condition) If eventually the players X and Y use fixed good plans, then regardless of the earlier play, lim_{N→∞} s^N = (R, R), at least with probability one.

A Markov plan is called agreeable if the response to a cc outcome is always c. That is, p satisfies p_1 = p_cc = 1. A Markov plan is called firm if the response to a dd outcome is always d. That is, p satisfies p_4 = p_dd = 0. For example, the TFT plan with p = (1, 0, 1, 0) is both agreeable and firm.
If both X and Y use Markov plans then {cc} is a terminal set if and only if both plans are agreeable.
We call a general plan weakly agreeable if the response is c when every previous outcome is cc. A general plan is called weakly firm if the response is d when every previous outcome is dd. For example, a Smale plan π is weakly agreeable if and only if π(R, R) = 1 and is weakly firm if and only if π(P, P) = 0. Clearly, a Markov plan is weakly agreeable if and only if it is agreeable and is weakly firm if and only if it is firm.
If X and Y both use weakly agreeable plans and initially cooperate then every outcome is cc and s N = (R, R) for all N .
So to obtain the Cooperation Condition we demand that a good plan be weakly agreeable.
The agreeable Markov plans which satisfy the Protection Condition can be completely characterized.
Theorem 3.1. Let p be an agreeable Markov plan, so that p_1 = 1. The plan p satisfies the Protection Condition if and only if the inequalities (3.2) hold.
Proof: See [5] Theorem 1.5, where a plan is called good if it is agreeable and satisfies the Protection Condition. See also [3]. 2

For a Markov plan p we will refer to (3.2), together with the assumption p_1 = 1, as the protection inequalities. Thus, a Markov plan is agreeable and satisfies the Protection Condition exactly when the protection inequalities hold.
For Smale plans we have the following.

Theorem 3.2. Suppose that the Smale plan π satisfies π(s) = 0 for all s above a line ℓ, where ℓ is a line through (R, R) with slope m satisfying 0 < m ≤ 1. Then π satisfies the Protection Condition.

In particular, if π is a simple Smale plan with such a separation line ℓ, then π satisfies the Protection Condition.
Proof: The line ℓ contains (R, R) and, since m > 0, the point (P, R) lies above ℓ. Suppose s* is a limit point with s*_Y ≥ R. Since π = 0 above ℓ, Theorem 2.5 (a)(i) implies that s* lies on or below ℓ, so s*_Y − R ≤ m(s*_X − R). Since s*_Y ≥ R and m > 0, it follows that s*_X ≥ R. On S we have s_X + s_Y ≤ 2R, and so s*_X = s*_Y = R, i.e. s* = (R, R). 2

If ℓ is a line through (R, R) with slope m satisfying 0 < m ≤ 1 then ℓ is a separation line, and we call such a line a protection line for π if π = 0 above the line. The above result says exactly that if a Smale plan admits a protection line then it satisfies the Protection Condition.
Definition 3.3. A Markov plan p is called generous when it is agreeable and satisfies 0 < p_2 < 1 and p_4 > 0.

That is, a generous plan is agreeable and with positive probability responds to an opponent's defection with cooperation, but does not always cooperate from a cd outcome (which had payoff (S, T)).
Theorem 3.4. Assume that X and Y, eventually, play Markov plans p and q respectively. If both plans are generous then {cc} is the only terminal set for the associated Markov chain. So from any initial play, with probability one, eventually the outcome sequence is constant at cc and so lim N →∞ s N = (R, R).
Proof: Since the two plans are agreeable, {cc} is a terminal set. Since p_4, q_4 > 0, there is a positive probability of moving from dd to cc and so dd is a transient state. From cd, 1 > p_2 > 0 implies that X plays either c or d with positive probability. If Y plays c (or d) with positive probability then from cd there is a positive probability of moving to cc (resp. to dd and thence to cc). Hence, cd is transient. Recall that Y uses the response vector (q_1, q_3, q_2, q_4) and so cooperates after dc with probability q_2. Thus, a symmetric argument shows that dc is transient.
If the players use p and q from time N* on then from that point the play follows the Markov chain given by M, and so with probability one the sequence of outcomes eventually arrives at the unique terminal set {cc}. 2

Definition 3.5. A Smale plan π is called generous when there is a neighborhood U in S of the diagonal segment [½(T + S, T + S), (R, R)] such that π(s) = 1 for s ∈ U. That is, the cooperation zone contains U and the points on and below the diagonal line.
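Theorem 3.4 can be illustrated by direct simulation; a sketch with our own (hypothetical) choice of generous plans:

```python
import random

# Both players use the generous Markov plan p = (1, 1/2, 1/2, 1/2):
# agreeable (p1 = 1), 0 < p2 < 1, and p4 > 0. Starting from dd, the chain is
# absorbed at the unique terminal set {cc} with probability one.
random.seed(0)
p = q = (1.0, 0.5, 0.5, 0.5)
idx = {'cc': 0, 'cd': 1, 'dc': 2, 'dd': 3}

outcome, history = 'dd', []
for _ in range(2000):
    x = 'c' if random.random() < p[idx[outcome]] else 'd'
    # Y reads the previous outcome with the roles swapped (cd <-> dc)
    y = 'c' if random.random() < q[idx[outcome[1] + outcome[0]]] else 'd'
    outcome = x + y
    history.append(outcome)

assert history[-10:] == ['cc'] * 10  # once cc occurs, both plans repeat c forever
```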
The following is essentially a part of [18] Theorem 1.
Theorem 3.6. Assume that, eventually, X and Y play Smale plans π_X and π_Y, respectively. If both plans are generous then, from any initial plays, lim_{N→∞} s^N = (R, R).

Proof: Let L_0 be an affine function associated with the diagonal. Since L_0(s) < 0 implies π_X(s) = 1, we can apply Theorem 2.5 (b) to L_0 and π_X to get L_0(s^N) ≥ −M_0 N*/N, where M_0 is the maximum value of |L_0| on S. The Y player uses π_Y • Switch, and so we apply the theorem to L_0 • Switch = −L_0 and π_Y • Switch to get −L_0(s^N) ≥ −M_0 N*/N. Thus, the limit point set Ω is contained in the diagonal.

Let L_1 be an affine function associated with the co-diagonal. Observe that everywhere in S either X or Y plays c, and so the only outcomes are cd, dc and cc. Hence, L_1(S^N) ≥ 0 for all N > N*. We can apply Lemma 2.3 to −L_1 to get L_1(s^N) ≥ −M_1 N*/N, with M_1 the maximum of |L_1| on S. Hence, Ω is contained in the diagonal segment [½(T + S, T + S), (R, R)].

Let U_X and U_Y be the neighborhoods of this segment on which π_X = 1 and π_Y = 1, respectively, and let U = U_X ∩ Switch(U_Y), an open neighborhood of the segment and so of Ω. By Proposition 2.1 there exists N** such that s^N ∈ U for all N ≥ N**. By assumption, if s^N ∈ U then both players play c with outcome cc at time N + 1. So S^{N+1} = (R, R) for all N ≥ N**, and the averages s^N converge to (R, R). 2
It would be nice to show that if X plays a generous Smale plan and Y plays a generous Markov plan, then with probability one lim_{N→∞} s^N = (R, R). I don't know if the conjecture is true in full generality. However, we get the result we want if we strengthen the assumption.

Definition 3.7. A generous Smale plan π is called convex-generous when the cooperation zone C = π^{−1}(1) is a closed, convex subset of S which does not contain (S, T).
Theorem 3.8. Assume X eventually plays a convex-generous Smale plan and Y plays a generous Markov plan. With probability one there is a time after which both players play c and so lim N →∞ s N = (R, R).
Proof: Let X play π with convex cooperation zone C = π^{−1}(1) and let Y use q = (q_1, q_2, q_3, q_4) with q_1 = 1 and ε < q_2, ε < q_4 for some ε > 0. Recall that for Y, q_2 is the probability of cooperating in a round following the outcome dc.
Claim 1: With probability one, for every N there exists n ≥ N such that s n ∈ C.
Proof: It suffices to show that for every N the event

E_N = {s^n ∉ C for all n ≥ N}

has probability zero. Let ℓ_1 = )V, (R, R)(, which is a separation line, and let L_1 be an affine function associated with ℓ_1. Clearly, (P, P) and (T, S) lie below the line and so L_1(P, P), L_1(T, S) < 0. By Lemma 2.4 (a) there exists a positive constant k_1 (which depends only on L_1) such that for some N_1 between N and k_1 N either L_1(s^{N_1}) < 0 or X plays c on round N_1. Note that the latter is equivalent to s^{N_1} ∈ C. Let ∆ = {s ∈ S \ C : L_1(s) ≤ 0}. Assuming E_N, s^{N_1} lies in ∆ since it is not in C. For n ≥ N_1, if s^n ∈ ∆ then, since it is not in C, the payoff S^{n+1} is either (P, P) or (T, S) and so by (2.5) L_1(s^{n+1}) ≤ (n/(n+1)) L_1(s^n) ≤ 0. Since s^{n+1} ∉ C, it follows that s^{n+1} ∈ ∆. Inductively, we have s^n ∈ ∆ for all n ≥ N_1 and so for all n ≥ k_1 N.

Now if among the rounds k_1 N, . . . , k_1 N + M − 1, Y plays c exactly r times then

s^{k_1 N + M} = (1/(k_1 N + M)) [ k_1 N s^{k_1 N} + r (T, S) + (M − r)(P, P) ]

and so the vector from (P, P) to s^{k_1 N + M} is the nonzero vector

(1/(k_1 N + M)) [ k_1 N (s^{k_1 N} − (P, P)) + r (T − P, S − P) ].

Since T − S > 0, if r were unbounded then s^n would eventually satisfy s^n_Y < s^n_X, i.e. lie below the diagonal and hence in C, because π is generous, contradicting E_N. It follows that, assuming E_N, there exists R < ∞ such that Y plays c at most R times, and so from some N_2 onward Y always plays d. For each N_2 ≥ k_1 N the event

E_{N, N_2} = E_N and Y plays d on every round n with n ≥ N_2

has probability zero, because at each such round Y is responding to dd by playing d. These are independent events, each with probability at most 1 − ε. So E_N = ∪_{N_2 ≥ k_1 N} E_{N, N_2} has probability zero.
This completes the proof of Claim 1. If Y plays c in a round N + 1 for which s^N ∈ C, then X plays c as well and the outcome of round N + 1 is cc with payoff (R, R). Since C is convex and (R, R) ∈ C, s^{N+1} ∈ C, and so X next plays c and Y next plays c because q is agreeable. Inductively, cc is the outcome and s^n ∈ C for every round n with n ≥ N.
From Claim 1, it suffices to show that the event Ẽ, that Y plays d in every round N + 1 for which s^N ∈ C, has probability zero.
Claim 2: Assuming Ẽ, for every N there exists n ≥ N such that s^n ∈ S \ C.
Proof: If we assume Ẽ, then whenever X plays c, Y plays d, leading to payoff (S, T). If for some N, s^n ∈ C for all n ≥ N and Ẽ holds, then for all n > N we have S^n = (S, T) and the sequence {s^n} would converge to (S, T). But the complement of C is a neighborhood of (S, T), and so eventually s^n ∉ C. The contradiction proves Claim 2.
Assuming Ẽ, s^n ∈ C and s^n ∈ S \ C each occur infinitely often, by Claim 1 and Claim 2. Now let N_k be the k-th return time to C from S \ C. This is an infinite sequence of Markov times. Since at time N_k − 1 the payoff sequence was in S \ C, X played d and so at time N_k Y plays d in response to either a dc or a dd. Playing d in these cases has probability at most 1 − ε. Furthermore, the Y play at time N_k is independent of the previous plays. Thus, again Ẽ requires an infinite sequence of independent events, each with probability at most 1 − ε. Hence, Ẽ has probability zero. 2

Remark: Our assumptions on q allow the possibility q_3 = 0. So if (S, T) were in the interior C° then, starting from a neighborhood of (S, T), (S, T) would be a limit point following an infinite sequence of cd outcomes. Notice that the assumption (S, T) ∉ C is analogous to the assumption that p_2 < 1 for a generous Markov plan. If p_2 = 1, p_3 = 0 and q satisfies the analogous condition then {cd} and {dc} are terminal sets when X plays p and Y plays q.

Definition 3.9. We call a Markov plan good when it satisfies the protection inequalities (and so is agreeable) and is generous.
We call a Smale plan good (or convex-good) when it is weakly agreeable, admits a protection line and is generous (resp. convex-generous).
For example, if π is a simple Smale plan with separation line ℓ then π is good if and only if it is weakly agreeable (i.e. π(R, R) = 1) and ℓ is a line through (R, R) with slope strictly between 0 and 1, so that ℓ is a protection line. It is convex-good if and only if, in addition, π = 1 on ℓ ∩ S. Such a good simple Smale plan is the Smale version of what is called in [10] a complier strategy (also called a generous zero-determinant strategy, as in e.g. [20]). Any limit point s * other than (R, R) on the separation line has s * Y larger than s * X , albeit less than R.
On the other hand, if ℓ is a line through (P, P ) with slope between 0 and 1 then it is a separation line and the associated simple Smale plan is, when P ≤ (T + S)/2, the Smale version of what Press and Dyson [15] call an extortionate strategy. Any limit point s * other than (P, P ) on the separation line satisfies s * Y < s * X . Thus, if Y plays to avoid the (P, P ) payoff she always obtains less than X does from the change in policy. The best reply to such an extortionate strategy is All-C. The payoff point is then the intersection point B = ℓ ∩ ((R, R), (T, S)) with P < B Y < R < B X .
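As a numerical illustration of this extortion geometry, the following sketch (not part of the original argument) iterates a simple Smale plan whose separation line passes through (P, P) with slope 1/2 against All-C. The payoff values T = 5, R = 3, P = 1, S = 0 are illustrative choices satisfying T > R > P > S and P ≤ (T + S)/2, not values taken from the text.

```python
# Illustrative payoffs satisfying T > R > P > S and P <= (T+S)/2.
T, R, P, S = 5.0, 3.0, 1.0, 0.0

# Payoff pairs (to X, to Y) indexed by the joint moves (X's, Y's).
payoff = {('c', 'c'): (R, R), ('c', 'd'): (S, T),
          ('d', 'c'): (T, S), ('d', 'd'): (P, P)}

def below(s, anchor, slope):
    """True when the average-payoff pair s is strictly below the line
    through `anchor` with the given slope."""
    return s[1] < anchor[1] + slope * (s[0] - anchor[0])

# X uses the simple Smale plan with separation line through (P, P) of
# slope 1/2: cooperate exactly when the running average is below it.
# Y plays All-C.  Initial play: dd.
sx, sy = P, P
s = (P, P)
for n in range(2, 500001):
    x = 'c' if below(s, (P, P), 0.5) else 'd'
    px, py = payoff[(x, 'c')]
    sx += px
    sy += py
    s = (sx / n, sy / n)

# s approaches B, the intersection of the separation line with the
# segment from (R, R) to (T, S); here B = (3.5, 2.25), so X's limiting
# payoff exceeds R while Y's lies strictly between P and R.
print(s)
```

The averages oscillate across the separation line (cooperation pulls toward (R, R) above it, defection toward (T, S) below it) and settle at the intersection point B.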
In [5] a Markov plan, there called a memory-one plan, is called good when it satisfies the protection inequalities, or, equivalently, when it is agreeable and satisfies the Protection Condition. The TFT plan with p = (1, 0, 1, 0) satisfies the protection inequalities but is firm rather than generous. If both X and Y use the TFT plan then from initial outcome cc the sequence of outcomes is fixed at cc, but from an initial dd, the state dd is fixed. Following cd or dc the two states alternate, leading to convergence of the payoff sequence to (1/2)(T + S, T + S). These phenomena illustrate the failure of robustness in the absence of generosity.
In [18] a Smale plan π is called good if it is generous, and so satisfies the Robustness Condition, and π(s) = 0 when s Y > R. The line {s Y = R} is an equalizer line and so Smale's conditions allow the possibility of a limit outcome (s X , R) with s X < R.
In addition, Smale imposed the condition that π(s) = 0 when s X < P . If P > (T + S)/2, this would contradict the condition that π = 1 when s X > s Y . Smale was only considering the case with P < (T + S)/2 and we will examine that situation first.
Assume P < (T + S)/2. Choose ℓ 1 a line through (R, R) with slope strictly between 0 and 1, so that the weakly agreeable simple Smale plan with separation line ℓ 1 is good. Let A be the point of intersection of ℓ 1 and the open segment ((P, P ), (S, T )). Choose a point V on the open segment ((R, R), A) with V X ≥ P . Let ℓ 2 be the line )(P, P ), V (. Let π be a Smale plan such that π(s) = 0 if s is above ℓ 1 or above ℓ 2 and π(s) = 1 at (R, R), below the diagonal and on some open set containing [(1/2)(T + S, T + S), (R, R)). Thus, π is generous and, since ℓ 1 is a protection line, it is a good Smale strategy. It follows from Corollary 2.6(c) that if X eventually plays π against any plan for Y and any initial plays, then the limit point set Ω is contained in the triangle [(P, P ), (R, R), V ].
The advantage of such a plan is that it excludes points above ℓ 2 from the limit. In contrast, against the good simple Smale strategy with separation line ℓ 1 , any point of [A, (R, R)] can occur as the limit if Y plays a suitable simple Smale strategy. For example, recall that All-D is a simple Smale strategy with separation line ℓ = )(P, P ), (T, S)(. Thus, if Y plays All-D then the limit point is the intersection point A of ℓ 1 and )(P, P ), (S, T )(. This sort of possible cost can always occur with a generous Markov plan p. If Y plays All-D, which is a Markov plan with q = (0, 0, 0, 0), then the unique terminal set is {cd, dd} with stationary distribution v = (0, p 4 , 0, (1 − p 2 ))/[p 4 + (1 − p 2 )]. The limiting average payoff (s * X , s * Y ) given by (2.8) satisfies (s * X , s * Y ) − (P, P ) = ε(S − P, T − P ) with ε = p 4 /[p 4 + (1 − p 2 )] > 0. So s * Y > P > s * X . On the other hand, for plans such as π above we have seen that the limit point set Ω against any Y play is a connected set contained in the triangle [(P, P ), V, (R, R)]. However:

Example 3.10. The limit set Ω need not be a point or interval and it may contain points in the interior of the triangle.
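The stationary distribution computation can be checked numerically. The sketch below (illustrative only) uses sample payoffs T = 5, R = 3, P = 1, S = 0 and a hypothetical generous plan p with p 2 < 1, pushes a distribution through the Markov chain induced by playing p against All-D, and compares the result with the formulas above.

```python
T, R, P, S = 5.0, 3.0, 1.0, 0.0
p = [1.0, 0.3, 1.0, 0.2]   # hypothetical memory-one plan (p1, p2, p3, p4)

# Against All-D the next outcome is (x, d), where X cooperates with the
# probability attached to the current outcome.  States: cc, cd, dc, dd.
M = [[0.0, pi, 0.0, 1.0 - pi] for pi in p]   # row-stochastic transitions

v = [1.0, 0.0, 0.0, 0.0]                     # start from cc
for _ in range(2000):                        # power iteration
    v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]

# Predicted stationary distribution (0, p4, 0, 1 - p2)/[p4 + (1 - p2)].
norm = p[3] + (1.0 - p[1])
predicted = [0.0, p[3] / norm, 0.0, (1.0 - p[1]) / norm]

# Limiting payoff: (P, P) + eps*(S - P, T - P) with eps = p4/norm.
eps = p[3] / norm
sX = v[1] * S + v[3] * P                     # X's payoff: S in cd, P in dd
sY = v[1] * T + v[3] * P
print(v, (sX, sY))                           # sY > P > sX
```

With these sample numbers the generous opponent of All-D ends up below P while All-D ends up above it, exactly the cost described in the text.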
Assume P < (T + S)/2. Choose V ∈ S with R > V Y > V X ≥ P . Let ℓ 1 = )(R, R), V ( and ℓ 2 = )(P, P ), V (. Let π X be a Smale plan with π X (s) = 0 for s above ℓ 1 or above ℓ 2 and π X (s) = 1 otherwise. In that case, π X is convex-good with C the quadrilateral [(P, P ), V, (R, R), (T, S)]. Let C′ ⊂ C denote the triangle [(P, P ), V, (R, R)]. As we saw above, when X plays π X against any strategy for Y, the limit point set Ω is contained in C′. Let C̃ be the quadrilateral [(P, P ), (Q, Q), W′, W ]. Define π Y for Y to be the Smale plan with π Y (s) = 1 if s lies in Switch(C̃) and = 0 otherwise. Recall that Y responds with π Y ◦ Switch and so Y plays c if s ∈ C̃ and plays d otherwise.
We prove that if, eventually, X plays π X and Y plays π Y then, regardless of the initial play, the limit point set Ω is the boundary of the quadrilateral L̄ = [V, W′, W, V′] (if V′ = V , Ω is the boundary of the triangle [V, W′, W ]).
Proof: Assume that X and Y play π X and π Y , respectively, beyond time N * .
The lines ℓ 1 , ℓ 2 , ℓ, and the diagonal subdivide S into twelve polyhedral regions. For any ε > 0, let N ε > N * be an ε step-time after which every move from s N to s N +1 has distance less than ε, i.e. let N ε be greater than max(N * , √2(T − S)/ε). Let ε 0 > 0 be smaller than the distance between any two non-intersecting regions and let N 0 = N ε 0 . Thus, such a small move cannot jump between non-intersecting regions.
Let C̄ = C̃ ∩ C, which is the quadrilateral [(P, P ), W′, W, (Q, Q)]. Claim: For every N ≥ N * there exists n ≥ N such that s n ∈ C̄.
First we show that the sequence of payoffs must enter C̄. If not, then for every round beyond N , Y plays d. As in the proof of Proposition 2.10, the results after N are the same as though Y plays All-D, which is a simple Smale plan with separation line )(P, P ), (T, S)(. Then Ω is contained in the intersection of the triangle C′ with Switch( )(P, P ), (T, S)( ) = )(P, P ), (S, T )(. This intersection contains only the point (P, P ). If Ω were just (P, P ) then for any small neighborhood U of (P, P ) eventually s n ∈ U . If s n is on or above the diagonal then s n ∈ C̄. If s n is below the diagonal then the sequence of payoffs moves toward (S, T ) and eventually makes a small jump into C̄. Either way, this contradicts the assumption that the sequence never enters C̄.

Now for Z on the open segment ((P, P ), W ) let U Z consist of the points of S which are below both of the lines )(S, T ), Z( and )(T, S), Z(. These are convex open neighborhoods of (P, P ) which converge to (P, P ) as Z → (P, P ). Since (P, P ) ∈ C̄, if s n ∉ C̄ then there exists Z such that s n ∉ U Z , i.e. s n ∈ K 1 = C′ \ (C̄ ∪ U Z ). Assume n > N 0 . From such a point the sequence moves toward (T, S). During the motion it remains above U Z . If the sequence does not enter C̄ from C′ \ C̄ then it jumps to below the diagonal to land in K 2 , the set of points outside U Z which are on or below the diagonal. From such points the sequence moves back toward (S, T ). If it jumps over C̄ then it re-enters K 1 . This alternation cannot continue indefinitely.
Notice that K 1 and K 2 are a positive distance ε Z apart. Once N ≥ N ε Z the move from K 1 or K 2 must land in C̄.
This completes the proof of the Claim. For any ε > 0, let s n ∈ C̄ with n > N ε . From this point the sequence moves toward (R, R), exiting C̄ at a point above, and close to, the line ℓ. The sequence now moves toward (S, T ), close to and above the line ℓ. It exits C′ at a point above ℓ 1 and close to V . Now the sequence moves toward (P, P ), entering L̄ on or below, and close to, ℓ 1 and then moving back toward (S, T ). These alternating motions toward (P, P ) and then (S, T ) may continue for a long time, but they must eventually cease since the sequence must eventually return to C̄. The exit occurs when the sequence lands above ℓ 2 , close to V′. The sequence then moves toward (P, P ) above and close to the line ℓ 2 until it enters C̃ \ C̄, close to W′. The sequence then moves toward (T, S), crossing ℓ 2 close to W to re-enter C̄, now below and close to the line ℓ.
As N → ∞, ε → 0 and the motion gets close to motion from W to W′, from W′ to V , from V to V′, and then from V′ back to W . □

Turning now to the case when P ≥ (T + S)/2, we see that the additional condition imposed by Smale does not work so well.
Suppose we demand that π satisfy π(s) = 0 when s X < P .

Proposition 3.11. Suppose that π(S, T ) = 0, π(T, S) = 1 and π(s) = 1 when s lies on the diagonal. If both X and Y play Smale plans which satisfy these conditions then for any initial plays, lim s N = (R, R).
Proof: If the initial outcome is cc or dd then the initial payoff lies on the diagonal and so every successive outcome is cc. If the initial outcome is cd, with payoff (S, T ), then the next outcome is dc with payoff (T, S), so that s 2 = (1/2)(T + S, T + S), which lies on the diagonal. Hence, in any case, the outcome on the n th round is cc for n ≥ 3 and the limit result follows. □

This version of "robustness" is very unsatisfying. For real robustness one wants approach to (R, R) even if errors occur in the computations and if the plans are adopted only from some time N * on.
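The short proof above can be traced mechanically. In the sketch below (sample payoffs T = 5, R = 3, P = 1, S = 0, which are not from the text) both players use the particular plan π(s) = 1 iff s X ≥ s Y , which satisfies π(S, T) = 0, π(T, S) = 1 and π = 1 on the diagonal; from every initial outcome the averages converge to (R, R).

```python
T, R, P, S = 5.0, 3.0, 1.0, 0.0
payoff = {('c', 'c'): (R, R), ('c', 'd'): (S, T),
          ('d', 'c'): (T, S), ('d', 'd'): (P, P)}

def pi(s):
    """One plan meeting the hypotheses: cooperate iff s_X >= s_Y."""
    return s[0] >= s[1]

def run(first, rounds=10000):
    """Iterate the game from a given initial joint move."""
    sx, sy = payoff[first]
    for n in range(2, rounds + 1):
        s = (sx / (n - 1), sy / (n - 1))
        x = 'c' if pi(s) else 'd'
        y = 'c' if pi((s[1], s[0])) else 'd'   # Y responds via Switch
        px, py = payoff[(x, y)]
        sx += px
        sy += py
    return (sx / rounds, sy / rounds)

results = {first: run(first) for first in payoff}
print(results)   # every initial outcome leads toward (R, R) = (3, 3)
```

As in the proof, a cd start produces one dc round, after which the average sits on the diagonal and cooperation locks in.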
Example 3.12. Returning to the case with P > (T + S)/2, we describe what seems to me to be the best version of robustness we can hope for if we demand protection against payoffs below P .
Let ℓ be a protection line with slope less than 1 so that (P, P ) lies below ℓ. Let V ∈ ℓ with P ≤ V X < R, ℓ 1 = )(P, P ), V ( and ℓ 2 the vertical line {s X = P }. Let W be the point of intersection ℓ 2 ∩ )(S, T ), (T, S)(. Assume that π X (s) = 0 whenever s is above ℓ or ℓ 1 , or if it is on or to the left of ℓ 2 . Otherwise, π X (s) = 1.
It follows from Corollary 2.6(a) and (b) that if X eventually plays π X then, against any Y play, the limit point set Ω is contained in the region on or below ℓ and ℓ 1 . Now assume that Y also eventually plays such a plan π Y with lines ℓ′, ℓ′ 1 , ℓ′ 2 = ℓ 2 and with points V′ and W′. Let D consist of the set of points of S which are either on or below )(P, P ), (S, T )( or on or below )(P, P ), (T, S)(. Let D 1 = [W, Switch(W ), (P, P )], the set of s ∈ S with s X , s Y ≤ P . It is easy to check that if s N * ∈ D then s n ∈ D for all n ≥ N * and that for some N * * ≥ N * , s N * * ∈ D 1 . All subsequent outcomes are dd and so lim s N = (P, P ). On the other hand, if {s N } does not converge to (P, P ) then it is easy to check that some s N * * lies in [(P, P ), V, (R, R), Switch(V )] \ {(P, P )}. All subsequent outcomes are cc and so lim s N = (R, R). Thus, we always have either convergence to (P, P ) or to (R, R). □

Return now to consider a simple Smale plan with separation line ℓ. For such a plan the choices on the line were, in general, left unspecified. Recall that if π X and π Y are simple Smale plans with separation lines ℓ X and ℓ Y then, except for the extreme cases when both ℓ X and ℓ Y are the diagonal or both the co-diagonal (which requires P ≤ (T + S)/2), if, eventually, X uses π X and Y uses π Y , the sequence {s N } converges to the point of intersection ℓ X ∩ Switch(ℓ Y ) regardless of earlier play and regardless of the choices on the separation lines.
To illustrate where the choices on the lines become important, let us consider the extreme cases.
Suppose P < (T + S)/2 and π is a simple Smale plan with separation line the co-diagonal, ℓ = )(S, T ), (T, S)(. If both players use π from time N * on, then if s N * is above ℓ the sequence remains above ℓ, converging to (T, S). Similarly, if s N * is below ℓ then the sequence remains below ℓ and converges to (S, T ). If s N * ∈ ℓ then the result depends on the choice of π on ℓ. If π(s N * ) = π(Switch(s N * )) = 1 then s N * +1 is above ℓ, with convergence to (T, S). If π(s N * ) = π(Switch(s N * )) = 0, then s N * +1 is below ℓ, with convergence to (S, T ). Suppose instead that π(s) = 0 if s ∈ ℓ with s above the diagonal and π(s) = 1 if s ∈ ℓ on or below the diagonal; then we again have alternating (S, T ) and (T, S) motion with limit (1/2)(T + S, T + S) unless the sequence lands on the point (1/2)(T + S, T + S) itself.

Tit-for-Tat: now let π be a simple Smale plan with separation line ℓ the diagonal, the Smale version of Tit-for-Tat. Suppose both players use π for N ≥ N * and s N * ∉ ℓ. We obtain alternating motion towards (T, S) and (S, T ) with limit point (1/2)(T + S, T + S) unless at some time N ≥ N * , s N = (Q, Q) ∈ ℓ. If Q ≥ (1/2)(T + S) then we obtain outcomes cc for all rounds after N with convergence to (R, R). If Q < (1/2)(T + S) then we obtain outcomes dd for all rounds after N with convergence to (P, P ).
Thus, in both extreme cases, the limit results depend upon the π choices on the separation line.

Separation Paths and the Folk Theorem
This section is the result of some suggestions and questions raised by Christian Hilbe in response to an earlier version.
Definition 4.1. We call a pair of Smale plans π X , π Y a Nash Equilibrium when the following hold: (a) There is a point s * = (s * X , s * Y ) such that if X eventually plays π X and Y eventually plays π Y then the outcome sequence {s N } converges to s * . We then call s * the payoff to the pair. (b) If Y eventually plays π Y then any limit point V of an outcome sequence satisfies V X ≤ s * X , regardless of the play of X. If X eventually plays π X then any limit point V of an outcome sequence satisfies V Y ≤ s * Y , regardless of the play of Y. That is, neither player can obtain an improved payoff by unilaterally changing plans.
The so-called Folk Theorem of Iterated Play in this context should say that for any s * ∈ S with s * X , s * Y ≥ P , there exists a Nash Equilibrium with payoff s * .
With R ≥ s * X , s * Y ≥ P this is easy to obtain. If π X and π Y are the equalizer simple Smale plans with horizontal separation lines ℓ X given by s Y = s * Y and ℓ Y given by s Y = s * X , then the pair π X , π Y is a Nash equilibrium. In fact, as long as Y plays π Y , the limit set Ω lies in the vertical line Switch(ℓ Y ) and X obtains s * X regardless of his play. Consequently, X has no incentive to use π X . Similarly for Y. So we would like to strengthen condition (b) to the analogue of the Protection Condition in this context.
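The equalizer phenomenon is easy to see numerically. In this sketch (illustrative payoffs T = 5, R = 3, P = 1, S = 0 and target s * X = 2, none of which come from the text), Y cooperates exactly when X's running average payoff is below the target, and X plays at random; X's average payoff is driven to the target regardless.

```python
import random

T, R, P, S = 5.0, 3.0, 1.0, 0.0
payoff = {('c', 'c'): (R, R), ('c', 'd'): (S, T),
          ('d', 'c'): (T, S), ('d', 'd'): (P, P)}
target = 2.0            # s*_X, any value with P <= target <= R
random.seed(0)

sx, sy = P, P           # initial play dd
s = (P, P)
for n in range(2, 200001):
    x = random.choice('cd')                  # X plays arbitrarily
    y = 'c' if s[0] < target else 'd'        # Y's equalizer threshold
    px, py = payoff[(x, y)]
    sx += px
    sy += py
    s = (sx / n, sy / n)

print(s[0])   # pinned near the target, whatever X did
```

When Y cooperates, X's stage payoff (R or T) exceeds the target; when Y defects, X's stage payoff (S or P) falls below it, so the running average is squeezed onto the target from both sides.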

Definition 4.2.
We call a pair of Smale plans π X , π Y a Strong Nash Equilibrium when the following hold: (a) There is a point s * = (s * X , s * Y ) such that if X eventually plays π X and Y eventually plays π Y , then the outcome sequence {s N } converges to s * . (b') If Y eventually plays π Y , then the point V = s * is the only limit point V of an outcome sequence with V X ≥ s * X , regardless of the play of X. If X eventually plays π X , then the point V = s * is the only limit point V of an outcome sequence with V Y ≥ s * Y , regardless of the play of Y.
Also, we would like to obtain Nash equilibrium results when s * X or s * Y is greater than R.
To deal with the cases when P > (T + S)/2, we again let P̄ = min(P, (T + S)/2). If C ⊂ S is nonempty, then the distance from s to C is d(s, C) = inf{||s − ŝ|| : ŝ ∈ C}. For ε ≥ 0, let C ε = {s ∈ S : d(s, C) ≤ ε}. Observe that C 0 is the closure of C.

(b) If C is upper (or lower) full then C ε is upper full (resp. lower full). In particular, the closure of C is upper full (resp. lower full).
(c) The union C + (or C − ) of the upper triangles (resp. the lower quadrangles) with vertices in C is an upper full (resp. lower full) subset, which is closed if C is.

Because upper and lower fullness are generalizations of convexity, the following is an extension of Corollary 2.10.
Proposition 4.5. Assume that eventually X plays a Smale plan π, Y uses an arbitrary plan and that the initial play is arbitrary. Let Ω be the limit point set of an associated sequence of outcomes. Let C ⊂ S be a nonempty closed set.
(a) If C is lower full and S \ C is contained in the defection zone π −1 (0), then Ω ⊂ C.
(b) If C is upper full and S \ C is contained in the cooperation zone π −1 (1), then Ω ⊂ C.
Proof: (a) Assume X plays π X after N * . For ε > 0 let N ε > N * be an ε step-time. If for all N ≥ N ε , s N ∈ S \ C, then after N ε the outcomes are just as though X played All-D. Hence, by Proposition 2.9 (b), Ω ⊂ [(T, S), (P, P )] ⊂ C. So we may assume that at some time N 1 > N ε , s N 1 ∈ C. We show by induction that for all N ≥ N 1 , s N ∈ C ε .
If s N ∈ C, then since N > N ε , ||s N +1 − s N || < ε, and so s N +1 ∈ C ε . If s N ∈ C ε \ C ⊂ π −1 (0), then S N +1 ∈ [(T, S), (P, P )] and so s N +1 is in Q(s N ) and so is contained in the lower full set C ε .
Since, eventually, s N is in the closed set C ε , it follows that Ω ⊂ C ε . As ε was arbitrary, Ω ⊂ ∩ ε>0 C ε = C.
The proof of (b) is completely analogous. □

Note that if s ∈ [(S, T ), (R, R)] then T (s) has empty interior and so T (s) • is disjoint from C.
Theorem 4.7. Assume that C ⊂ S is a separation path with s ∈ C. Let proj : C → [S, T ] be the restriction to C of the first coordinate projection, i.e. proj(s) = s X .

If ℓ is a separation line, then (S, T ) and (R, R) are on or above ℓ and so for s ∈ ℓ ∩ S, T (s) ⊂ H + and T (s) • ⊂ H + \ ℓ. Also, (P, P ) and (T, S) are on or below ℓ. Since (R, R) is on or above ℓ, it follows that (P̄ , P̄ ) is also on or below ℓ. Thus, Q(s) ⊂ H − and Q(s) • ⊂ H − \ ℓ. It follows that ℓ ∩ S is a separation path. Call ℓ a strict separation line when it is a separation line which does not contain (S, T ), (R, R), (T, S) or (P, P ). That is, for a strict separation line the points (S, T ) and (R, R) are strictly above ℓ and the points (T, S) and (P, P ) are strictly below ℓ. Furthermore, (P̄ , P̄ ) is strictly below ℓ as well, because if (P̄ , P̄ ) = (P, P ) then P > (T + S)/2 and the only separation line with (P̄ , P̄ ) on or above it is the diagonal, which contains (P, P ). Thus, with L an affine map associated with ℓ, we have L(S, T ), L(R, R) > 0 and L(T, S), L(P, P ), L(P̄ , P̄ ) < 0. It follows that ℓ ∩ S satisfies: s ∈ C =⇒ (T (s) ∪ Q(s)) ∩ C = {s}. When this condition holds, we will call C a strict separation path.
Recall that a function γ : [a, b] → R is called piecewise C 1 when γ is continuous and there is a finite sequence a = a 0 < a 1 < · · · < a n = b so that γ is continuously differentiable on each subinterval [a i−1 , a i ] for i = 1, . . . , n. Then at each point (t, γ(t)) with t ∈ [a, b) there is a tangent line from the right and at each point with t ∈ (a, b] there is a tangent line from the left. Except at the points with t = a i for i = 0, . . . , n, these two agree and are both the true tangent line at the point.

Proof: (a) As it is a continuous bijection on a compact set, proj : C → [a, b] is a homeomorphism. If γ is the composition of proj −1 with the projection to the s Y coordinate, then γ is a continuous map and proj −1 is given by t → (t, γ(t)).
In none of these cases can ℓ be a strict separation line. This proves (4.6) for any t ≠ t 1 . This in turn implies that C is a separation path.
For any s ∈ S, (T (s) ∪ Q(s)) ∪ Switch(T (s) • ∪ Q(s) • ) = S. Assume that C 1 is a strict separation path, that s ∈ C 1 ∩ Switch(C 2 ) and that s 1 ∈ S \ {s}. If s 1 ∈ T (s) ∪ Q(s) then s 1 ∉ C 1 . If s 1 ∈ Switch(T (s) • ∪ Q(s) • ) then s 1 ∉ Switch(C 2 ). It follows that s 1 ∉ C 1 ∩ Switch(C 2 ). □

Remark: It is possible to extend (c) somewhat. It is easy to check that the set of separation paths is itself closed in the space of closed subsets of S equipped with the Hausdorff topology. So if {γ n } is a sequence of piecewise C 1 functions which satisfy the conditions of (c) and the sequence of graphs converges in the appropriate sense to the graph of γ, then the graph of γ is a separation path.
Using simple geometric arguments similar to those in (c) above, we can describe what occurs in a non-strict separation path. Since we will not need the results, we leave the proof to the interested reader. □

Example 4.10. A differential equations construction for strict separation paths.
For the example, we will restrict to the case P < (T + S)/2 and we will just sketch the argument, leaving the details to the reader.

Since the intersection of S with a separation line is a separation path, the following extends Corollary 2.8.
Corollary 4.11. Assume that eventually X plays a Smale plan π, Y uses an arbitrary plan and that the initial play is arbitrary. Let Ω be the limit point set of an associated sequence of outcomes. If C is a separation path such that C + \ C ⊂ π −1 (0) and C − \ C ⊂ π −1 (1), then Ω ⊂ C.
Proof: C + is upper full with S \ C + = C − \ C contained in the cooperation zone, and C − is lower full with S \ C − = C + \ C contained in the defection zone. It follows from Proposition 4.5 that Ω ⊂ C + ∩ C − = C. □

Proposition 4.12. If s * = (s * X , s * Y ) ∈ S with P < s * Y < R then there is a strict separation path C such that s * is the unique point s ∈ C with s Y ≥ s * Y .
Choose V a point on the open segment ((S, T ), (P̄ , P̄ )) with P < V Y < s * Y and so that )V, s * ( intersects ((R, R), (T, S)). Choose W a point on the open segment ((R, R), (T, S)) with P < W Y < s * Y and so that )s * , W ( intersects ((S, T ), (P̄ , P̄ )). The lines )V, s * ( and )s * , W ( are strict separation lines. Let C = [V, s * ] ∪ [s * , W ]. While it is easy to check directly that C satisfies (4.6), it follows from Theorem 4.8(c) that C is a strict separation path. Clearly, s * is the unique point of C with maximum height.
If s * ∈ [(S, T ), (P̄ , P̄ )] then choose W as above so that ℓ = )s * , W ( is a strict separation line. If s * ∈ [(R, R), (T, S)] then choose V as above so that ℓ = )s * , V ( is a strict separation line. In either case, s * is the point of maximum height on the segment ℓ ∩ S. □

Theorem 4.13. If s * = (s * X , s * Y ) ∈ S with P < s * X , s * Y then there is a pair of Smale plans π X , π Y which is a strong Nash equilibrium with s * the payoff to the pair.
Proof: Case 1 (P < s * X , s * Y < R): Apply Proposition 4.12 to choose C X a strict separation path with maximum point s * and C Y a strict separation path with maximum point Switch(s * ) = (s * Y , s * X ). Let π X and π Y be Smale plans with π X = 0 on C + X \ C X and = 1 on C − X \ C X , and similarly for π Y with C Y (4.8). Corollary 4.11 and Theorem 4.8 (d) imply that when X plays π X against π Y the payoff is s * = C X ∩ Switch(C Y ). If X uses an alternative plan, then the limit set is contained in Switch(C Y ), and similarly if Y varies against π X .
Case 2 (s * X = s * Y = R): This is the good plan case studied in Section 3. Use π X = π Y a simple Smale plan with the separation line through (R, R) with slope m satisfying 0 < m < 1.
Case 3: Notice that if s ∈ S with s Y > R, then s X < R. Furthermore, for the case P < s * Y < R ≤ s * X it suffices to apply Switch to the other case. So we will assume that P < s * X < R ≤ s * Y . To begin with we will also assume that s * ∉ [(R, R), (S, T )].
First we use Proposition 4.12, or more precisely its proof, to choose C Y a separation path with maximum point Switch(s * ) = (s * Y , s * X ). That is, C Y = [Switch(V ), Switch(s * )] ∪ [Switch(s * ), Switch(W )] with V ∈ ((P̄ , P̄ ), (T, S)), W ∈ ((R, R), (S, T )), with P < V X , W X < s * X and with the other conditions as described above. Let ℓ = )V, s * (. Let π Y = 0 on C + Y \ C Y and = 1 on C − Y \ C Y . So regardless of the X plan, the Switched limit set Switch(Ω) is contained in C Y , which has Switch(s * ) as its unique point of maximum height.
Next choose W′ ∈ ((R, R), (T, S)) and V′ ∈ [(S, T ), (P̄ , P̄ )] with C X = [V′, s * ] ∪ [s * , W′]. Let π X = 0 above C X and = 1 below C X . Now despite the labeling, C X is not a separation path. The lower full set C − X is the set of points on or below each of the lines )V′, s * ( and )s * , W′(, but the set of points on or above C X is not upper full. For every point r ∈ [V′, s * ), (r, s * ] ⊂ T (r) • . On the other hand, Corollary 2.8 (c) applies to π X with C ⊂ C − X equal to the triangle [V′, s * , W′]. So regardless of the Y plan, the limit set Ω is contained in C, which has s * as its unique point of maximum height.
To complete the proof we must show that if, after some N * , X uses π X and Y uses π Y then the solution sequence converges to s * , i.e. Ω = {s * }. We know that Ω is contained in C ∩ Switch(C Y ). Let s̄ be the point such that C ∩ Switch(C Y ) is the segment K = [s * , s̄]. Thus, given an arbitrary δ > 0 there is an N δ > N * after which the sequence remains δ-close to K.
Let δ > 0 be arbitrary. Within the δ ball V δ (s * ) we will build a box which contains s * in its interior. Choose A ∈ (V′, s * ) ∩ V δ (s * ) close enough to s * that the line ℓ 1 = )(S, T ), A( crosses K within V δ (s * ), i.e. B = ℓ 1 ∩ K lies in V δ (s * ).
We may choose A close enough that the horizontal segment from B to (s * , V ) is also in V δ (s * ). Now let ℓ 2 = )(P, P ), B( and ℓ 3 = )(P, P ), A(. Because (P, P ) lies below )V′, s * ( and above ℓ = )V, s * (, it follows that s * is below ℓ 3 and above ℓ 2 . We can choose A close enough to s * that B′ = ℓ 2 ∩ (s * , W′) and A′ = ℓ 1 ∩ (s * , W′) lie in V δ (s * ). This is where we use s * ∉ [(R, R), (S, T )]. The quadrilateral Box = [A, B, B′, A′] then lies in V δ (s * ) and contains s * in its interior.
The various lines cut S into a finite number of regions and we choose ε < δ to be smaller than the distances between any two disjoint such regions, small enough that the open neighborhood V ε (K) is contained in the union of Box and C − (the convex set of points of S in C or below it) and, finally, small enough that V ε (Box) is contained in V δ (s * ).
Let N ε > N * be an ε step-time so that for every N ≥ N ε the length ||s N +1 − s N || is less than ε. Hence, no such small move crosses between disjoint regions. Since Ω ⊂ K we can also choose N ε so that for every N ≥ N ε , s N ∈ V ε (K).
Suppose that for some N ≥ N ε , s N ∉ Box. If s N lies to the left of ℓ then S N +1 = (R, R) and if s N lies to the right of ℓ then S N +1 = (S, T ). For s N ∈ ℓ either S N +1 = (R, R) or S N +1 = (S, T ). For any such s the (negative) slope of )(S, T ), s( is less than the slope of )(R, R), s(. Thus from the left we move toward (R, R) and after crossing ℓ we move toward (S, T ). The successive crossings are higher on K and thus our net motion is upward until we enter Box. From Box it is easy to see that exit could only occur from the triangle [s̄, B, s * ], landing above the horizontal line through B, close to Box and so still in V δ (s * ). If it lands to the left of ℓ then the motion toward (R, R) is closer to Box and so [s N , s N +1 ] ⊂ V δ (s * ). If it lands to the right of ℓ then the motion toward (S, T ) is upward, remaining above the B horizontal line, and so [s N , s N +1 ] ⊂ V δ (s * ). Subsequent alternating moves are above these initial ones and so remain in V δ (s * ) until the sequence re-enters Box. It follows that eventually the sequence lies in V δ (s * ) and so the limit point set Ω is contained in the closed ball of radius δ about s * . As δ > 0 was arbitrary, it follows that Ω = {s * } as required.
Finally, we adjust the proof to deal with the case s * ∈ [(S, T ), (R, R)]. If s * = (R, R) we are in Case 2. Since s * X > P , we have s * ≠ (S, T ). We use C X as before. Now

Competition Among Simple Smale Plans
In this section we move beyond the classical question which motivated our original interest in good strategies. We consider now the evolutionary dynamics among simple Smale plans. We follow Hofbauer and Sigmund [11] Chapter 9 and Akin [2].
The dynamics that we consider takes place in the context of a symmetric two-person game, but generalizing our initial description, we merely assume that there is a set of strategies indexed by a finite set I. When players X and Y use strategies with index i, j ∈ I, respectively, then the payoff to player X is given by A ij and the payoff to Y is A ji . Thus, the game is described by the payoff matrix {A ij }. We imagine a population of players each using a particular strategy for each encounter and let ξ i denote the ratio of the number of i players to the total population. The frequency vector {ξ i } lives in the unit simplex ∆ ⊂ R I , i.e. the entries are nonnegative and sum to 1. The vertex v(i) associated with i ∈ I corresponds to a population consisting entirely of i players. Thus, ξ = v(i) exactly when ξ i = 1. We assume the population is large so that we can regard ξ as changing continuously in time.
Now we regard the payoff in units of fitness. That is, when an i player meets a j player in an interval of time dt, the payoff A ij is an addition to the background reproductive rate ρ of the members of the population. So the i player is replaced by 1 + (ρ + A ij )dt i players. Averaging over the current population distribution, the expected relative reproductive rate for the subpopulation of i players is ρ + A iξ , where A iξ = Σ j A ij ξ j . The resulting dynamical system on ∆ is given by the Taylor-Jonker Game Dynamics Equations introduced in Taylor and Jonker [21]:

(5.2) dξ i /dt = ξ i (A iξ − A ξξ ), where A ξξ = Σ i ξ i A iξ .
This system is an example of the replicator equation systems studied in great detail in Hofbauer and Sigmund [11].
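A minimal discretization makes the system concrete. The following Euler-step sketch of the Taylor-Jonker dynamics uses a hypothetical 2 × 2 payoff matrix of Prisoner's Dilemma type (cooperate vs. defect, with the sample values R = 3, S = 0, T = 5, P = 1); none of the numbers come from the text.

```python
# A[i][j] is the payoff to an i player meeting a j player:
# index 0 = cooperate, 1 = defect (sample values R=3, S=0, T=5, P=1).
A = [[3.0, 0.0],
     [5.0, 1.0]]

def replicator_step(xi, A, dt=0.01):
    """One Euler step of d(xi_i)/dt = xi_i (A_{i xi} - A_{xi xi})."""
    n = len(xi)
    Ai = [sum(A[i][j] * xi[j] for j in range(n)) for i in range(n)]
    Abar = sum(xi[i] * Ai[i] for i in range(n))
    xi = [xi[i] * (1.0 + dt * (Ai[i] - Abar)) for i in range(n)]
    total = sum(xi)                  # renormalize against rounding drift
    return [x / total for x in xi]

xi = [0.9, 0.1]                      # 90% cooperators, 10% defectors
for _ in range(5000):
    xi = replicator_step(xi, A)
print(xi)                            # defection fixates: xi near (0, 1)
```

Since defection strictly dominates in this matrix, the defector share grows monotonically and the cooperators are eliminated in the limit, as in the domination results below.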
We will need some general game dynamic results for later application. Fix the game matrix {A ij }.
A subset A of ∆ is called invariant if ξ(0) ∈ A implies that the entire solution path lies in A. That is, ξ(t) ∈ A for all t ∈ R. An invariant point is an equilibrium.
Each nonempty subset J of I determines the face ∆ J of the simplex, consisting of those ξ ∈ ∆ such that ξ i = 0 for all i ∉ J. Each face of the simplex is invariant because ξ i = 0 implies that dξ i /dt = 0. In particular, for each i ∈ I the vertex v(i), which represents fixation at the i strategy, is an equilibrium.
In general, ξ is an equilibrium when, for all i, j ∈ I, ξ i , ξ j > 0 imply A iξ = A jξ , or, equivalently, A iξ = A ξξ for all i such that ξ i > 0, i.e. for all i in the support of ξ.
An important example of an invariant set is the omega limit point set of an orbit. Given an initial point ξ ∈ ∆ with associated solution path ξ(t), it is defined by intersecting the closures of the tail values.
By compactness this set is nonempty. A point is in ω(ξ) iff it is the limit of some sequence {ξ(t n )} with {t n } tending to infinity. The set ω(ξ) consists of a single point ξ * iff lim t→∞ ξ(t) = ξ * . In that case, ξ * is an invariant point, i.e. an equilibrium. Notice that this is the analogue for the solution path of the limit point set Ω of a payoff sequence, considered in the previous sections.
Definition 5.1. We call a strategy i * an evolutionarily stable strategy (hereafter, an ESS) when

(5.4) A i * i * > A j i * for all j ≠ i * .

Remark: We follow [5] in labeling this condition ESS although it is stronger than the condition originally introduced in [13]. In [11] precisely this condition is called a strict Nash equilibrium and so this language requires a bit of justification.
In the pure game theory context we regard a distribution ξ over I as a mixed strategy rather than a population distribution of pure strategists. Then a pair ξ 1 , ξ 2 is a Nash equilibrium when each is a best reply against the other. That is, for all distributions η,

(5.5) A ξ 1 ξ 2 ≥ A ηξ 2 and A ξ 2 ξ 1 ≥ A ηξ 1 .

Notice that from this we see that

(5.6) ξ 1 i > 0 =⇒ A iξ 2 = A ξ 1 ξ 2 and ξ 2 i > 0 =⇒ A iξ 1 = A ξ 2 ξ 1 .

That is, all of the pure strategies active in ξ 1 are best replies to ξ 2 and vice-versa. In the context of Smale strategies this is the concept used in the previous section.
Following [2] we call the pair a regular Nash equilibrium when, in addition to (5.5),

A iξ 2 = A ξ 1 ξ 2 =⇒ ξ 1 i > 0 and A iξ 1 = A ξ 2 ξ 1 =⇒ ξ 2 i > 0.

That is, the pure strategies active in ξ 1 are all of the pure strategies which give the best reply to ξ 2 and vice-versa.
Returning to the dynamic context, a distribution ξ is called a (regular) Nash equilibrium when the pair ξ, ξ is a (regular) Nash equilibrium in the above sense. From (5.6) we see that a Nash equilibrium is an equilibrium for the Taylor-Jonker equations as defined above. When ξ is the vertex v(i * ) then it is a regular Nash equilibrium exactly when it is a strict Nash equilibrium as defined on page 62 of [11], or, equivalently, (5.4) holds.
Proposition 5.2. If i * is an ESS then the vertex v(i * ) is an attractor, i.e. a locally stable equilibrium, for the system (5.2). In fact, there exists ε > 0 such that

(5.8) 1 − ε ≤ ξ i * < 1 =⇒ dξ i * /dt > 0.

Thus, near the equilibrium v(i * ), which is characterized by ξ i * = 1, ξ i * (t) increases monotonically, converging to 1, and the alternative strategies are eliminated from the population in the limit.
Proof: When i* is an ESS, A_{i*i*} > A_{ji*} for all j ≠ i*. It then follows for ε > 0 sufficiently small that ξ_{i*} ≥ 1 − ε implies A_{i*ξ} > A_{jξ} for all j ≠ i*. If also 1 > ξ_{i*}, then A_{i*ξ} > A_{ξξ}. So (5.2) implies (5.8). □

For J a nonempty subset of I we say a strategy i weakly dominates a strategy j in J when i, j ∈ J and

(5.9)   A_{ik} ≥ A_{jk} for all k ∈ J,

with strict inequality for k = i or k = j. If the inequalities are strict for all k then we say that i dominates j in J.
We say that i ∈ J weakly dominates a sequence {j_1, ..., j_n} in J when there exists 1 ≤ m ≤ n such that i weakly dominates j_p in J for p = 1, ..., m and, for p = m + 1, ..., n, i dominates j_p in J \ {j_1, ..., j_{p−1}}.
When J equals all of I we will omit the phrase "in J".

Proposition 5.4. Assume ξ_i(0) > 0. (a) If i weakly dominates j then lim_{t→∞} ξ_j(t) = 0. (b) If i weakly dominates the sequence {j_1, ..., j_n} then lim_{t→∞} ξ_{j_p}(t) = 0 for p = 1, ..., n.

Proof: (a): The face {ξ : ξ_j = 0} is invariant. So if ξ_j(0) = 0 then ξ_j(t) = 0 for all t and so it is 0 in the limit. Thus, we may assume ξ_j(0) > 0.
For i, j ∈ I, define the open set Q_ij and on it the real valued function H_ij by

(5.10)   Q_ij = {ξ : ξ_i, ξ_j > 0} and H_ij(ξ) = ln(ξ_i) − ln(ξ_j).

From (5.2) and weak domination (5.9) we obtain

(5.11)   dH_ij/dt = A_{iξ} − A_{jξ} > 0 on Q_ij.

Hence, H_ij(ξ(t)) is a strictly increasing function of t on the open invariant set Q_ij. Thus, as t tends to infinity, H_ij(ξ(t)) approaches a limit h_∞ ≤ ∞. We must prove that ξ_j = 0 on the omega limit set. Assume instead that ξ* ∈ ω(ξ(0)) with ξ*_j > 0. If ξ*_i were 0 then H_ij(ξ(t)) would not be bounded below on {ξ(t) : t ≥ 0}. Hence, ξ* lies in Q_ij with h_∞ = H_ij(ξ*) < ∞. So on the invariant set ω(ξ(0)) ∩ Q_ij, which contains ξ* and so is nonempty, H_ij would be constantly h_∞ < ∞. Since this set is invariant, dH_ij/dt would equal zero. This contradicts (5.11), which implies that the derivative is positive on ω(ξ(0)) ∩ Q_ij.
The proof of (b) is a variation of the proof of (a). We refer to [5], Proposition 4.6. An obvious adjustment of the initial step in the inductive proof of (b) there yields the proof here. □

Corollary 5.5. Assume I = {i*, j_1, ..., j_n} and i* ∈ I weakly dominates the sequence {j_1, ..., j_n}. If ξ_{i*}(0) > 0 then lim_{t→∞} ξ_{i*}(t) = 1.
Proof: By Proposition 5.4, ξ_{j_p}(t) → 0 for all p = 1, ..., n and so ξ_{i*}(t) = 1 − Σ_{p=1}^n ξ_{j_p}(t) → 1. □

In [5], see also [3], we examined competition among certain special Markov plans called Zero-Determinant Plans. It was proved that good Markov plans among them are attractors when competing against plans which are not agreeable. In addition, global stability was proved when the class of competitors was further restricted. Here we will similarly consider competition among simple Smale plans. Let I = {i*, j_1, ..., j_n} index a list of simple Smale plans with π_i associated with separation line ℓ_i for i ∈ I. Except for the extreme cases the intersection ℓ_i ∩ Switch(ℓ_j) is a single point. For π the plan with the diagonal we will assume that the plan is weakly agreeable and that the initial play is c. So if X and Y both play π the payoff is (R, R). If P ≤ (T + S)/2 then the co-diagonal is a separation line and we will adopt the convention that if both players use co-diagonal plans then the payoff is ((T + S)/2, (T + S)/2). If X plays π_i and Y plays π_j we will let (A_ij, A_ji) be the coordinates of the payoff point, with the above conventions in the extreme cases. We will then use (5.2) to represent the dynamics of the competition with ξ_i the fraction of the π_i players in the population.
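As a computational aside, not part of the original development, the matrix entries A_ij can be found by intersecting lines, since the payoff point for π_i against π_j is the single point ℓ_i ∩ Switch(ℓ_j) in the non-extreme cases. The sketch below assumes the conventional payoff values (T, R, P, S) = (5, 3, 1, 0) purely as an example and represents a non-extreme separation line as s_Y = a + b·s_X, with Switch reflecting a line across the diagonal.

```python
# Sketch: payoff matrix entries for two non-extreme simple Smale plans as the
# intersection ell_i ∩ Switch(ell_j).  Lines are (a, b) with s_Y = a + b*s_X;
# we assume ell_i and Switch(ell_j) are not parallel (the non-extreme case).
# The payoff values (T, R, P, S) = (5, 3, 1, 0) are a hypothetical example.

def payoff_point(line_i, line_j):
    """Intersection of ell_i with Switch(ell_j); returns (A_ij, A_ji)."""
    a_i, b_i = line_i          # ell_i : y = a_i + b_i * x
    a_j, b_j = line_j          # Switch(ell_j) : x = a_j + b_j * y
    # Solve y = a_i + b_i * (a_j + b_j * y):
    y = (a_i + b_i * a_j) / (1.0 - b_i * b_j)
    x = a_j + b_j * y
    return (x, y)              # (X's payoff, Y's payoff)

R = 3.0
# A protection line through (R, R) with slope 1/2: y - R = (1/2)(x - R).
prot = (R / 2.0, 0.5)
# A horizontal (equalizer) line {s_Y = 2}, with P <= 2 < R as in Theorem 5.7.
eq = (2.0, 0.0)

print(payoff_point(prot, prot))  # both play the protection plan -> (3.0, 3.0)
print(payoff_point(prot, eq))    # against the equalizer -> (2.0, 2.5)
```

Consistent with the protection property, the protection plan against itself yields the cooperative payoff (R, R), while against the equalizer the payoff point is not (R, R) and both coordinates fall below R.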
Notice that if Y plays an equalizer plan π_j with ℓ_j horizontal then A_{ij} = A_{i′j} for any plans π_i and π_{i′} for X. In particular, if all of the plans are equalizer plans then A_{iξ} = A_{i′ξ} for all i, i′ ∈ I and so A_{iξ} = A_{ξξ} for all i ∈ I and for any population state ξ. Thus, the dynamics is trivial and every state ξ is an equilibrium.

Now we consider the case when ℓ_{i*} is a protection line. That is, ℓ_{i*} is a line through (R, R) with slope m satisfying 0 < m ≤ 1. Notice that m = 1 is the diagonal line case. If m < 1 and π_{i*}(R, R) = 1 then π_{i*} is a good simple Smale plan.
Theorem 5.6. If ℓ_{i*} is a protection line and (R, R) ∉ ℓ_j for any j ∈ I \ {i*}, then i* is an ESS and so fixation at i* is an attractor.
Proof: A_{i*i*} = R. If Y plays π_{i*} and X plays π_j for j ∈ I \ {i*} then the payoff point is not (R, R) and so A_{ji*} and A_{i*j} are both less than R, because ℓ_{i*} is a protection line. This implies (5.4) and so the result follows from Proposition 5.2. □

Since ξ_{i*}(0) = 0 implies ξ_{i*}(t) = 0 for all t, the best stability result we can hope for is that every solution with ξ_{i*}(0) > 0 converges to fixation at i*. We will call this global stability.
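The fixation described in Proposition 5.2 and Theorem 5.6 is easy to observe numerically. The following is a minimal sketch, assuming that (5.2) is the standard Taylor-Jonker replicator equation dξ_i/dt = ξ_i(A_{iξ} − A_{ξξ}); the 3×3 payoff matrix is a hypothetical example, chosen so that strategy 0 satisfies the ESS condition (indeed it dominates the other strategies).

```python
# Sketch: Euler integration of the Taylor-Jonker replicator dynamics
# dxi_i/dt = xi_i * (A_{i,xi} - A_{xi,xi}).  The payoff matrix is a
# hypothetical example with A[0][k] > A[j][k] for all k, so strategy 0
# is in particular an ESS (A[0][0] > A[j][0] for j != 0).

def replicator_step(xi, A, dt=0.01):
    n = len(xi)
    Ax = [sum(A[i][j] * xi[j] for j in range(n)) for i in range(n)]  # A_{i,xi}
    avg = sum(xi[i] * Ax[i] for i in range(n))                       # A_{xi,xi}
    new = [x + dt * x * (a - avg) for x, a in zip(xi, Ax)]
    total = sum(new)
    return [x / total for x in new]   # renormalize to guard against drift

A = [[3.0, 4.0, 5.0],
     [2.0, 1.0, 4.0],
     [1.0, 2.0, 2.0]]

xi = [0.4, 0.3, 0.3]
for _ in range(20000):                # integrate up to t = 200
    xi = replicator_step(xi, A)
print(xi)  # xi[0] is close to 1: fixation at the vertex v(0)
```

ESS only guarantees local stability in general; the dominance built into this example matrix is what makes fixation occur from this particular interior starting point.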
As an illustration we describe a very special case.
Theorem 5.7. Assume that ℓ_{i*} is a protection line. If for every j ∈ I \ {i*}, ℓ_j is a horizontal line {s_Y = C_j} with P ≤ C_j < R, then for every j ∈ I \ {i*}, i* weakly dominates j and so the system exhibits global stability.
Proof: Because ℓ_{i*} is a protection line we have, as in Theorem 5.6, A_{ji*} < A_{i*i*} = R. For any k ∈ I \ {i*}, C_k = A_{jk} = A_{i*k} for all j ∈ I, and weak domination, (5.9), follows. A fortiori, i* weakly dominates the sequence {j_1, ..., j_n} and the result follows from Corollary 5.5. □

We will show that we achieve global stability if π_{i*} is good, (R, R) ∉ ℓ_j for any j ∈ I \ {i*} and, in addition, all the lines ℓ_j have positive slope. This requires a bit of geometry.
Lemma 5.8. Assume that ℓ_{i*} is a protection line, that (R, R) ∉ ℓ_k for any k ∈ I \ {i*} and that ℓ_k has non-negative slope for every k ∈ I. If for some ī ∈ I the segment ℓ_ī ∩ S lies below ℓ_{i*}, then i* weakly dominates ī.
Proof: As usual, A_{ī i*} < A_{i*i*} = R. For any k ∈ I the line Switch(ℓ_k) is either vertical or has positive slope. Let V̄ and V* be the intersection points of Switch(ℓ_k) ∩ ℓ_ī and Switch(ℓ_k) ∩ ℓ_{i*}, respectively. If Switch(ℓ_k) is vertical then the X coordinates of V̄ and V* are equal. If Switch(ℓ_k) has positive slope then V* is above and to the right of V̄ and so has a larger X coordinate. Thus, A_{ī k} ≤ A_{i* k}, proving weak domination. □

Now we assume that the slope of ℓ_{i*} is less than 1, i.e. that ℓ_{i*} is not the diagonal. Let V be the point of intersection of ℓ_{i*} with the open segment ((S, T), (P̄, P̄)), where P̄ = min(P, (T + S)/2). Thus, ℓ_{i*} ∩ S \ {(R, R)} = [V, (R, R)) and the entire half-open segment lies above the diagonal. Now let ℓ_j be a separation line which does not contain (R, R). So it contains a point A ∈ ((R, R), (T, S)]. Let B be the intersection point of ℓ_j with ((S, T), (P̄, P̄)). If B lies below ℓ_{i*} then the entire segment ℓ_j ∩ S lies below ℓ_{i*}. Otherwise, B ∈ [V, (S, T)] and this is the situation we wish to examine.
Since A is below ℓ_{i*} and B is on or above ℓ_{i*}, it follows that ℓ_j intersects ℓ_{i*} at a point V_j of S, with V_j^X its X coordinate. Notice that the portion of ℓ_j to the right of V_j lies below ℓ_{i*}. In any case, Switch(ℓ_j) intersects ℓ_{i*} at a point W_j with X coordinate W_j^X.
Lemma 5.9. If ℓ_j has non-negative slope then V_j^X < W_j^X.

Proof: The lines ℓ_j and Switch(ℓ_j) meet the diagonal at a common point (Q, Q) = ℓ_j ∩ Switch(ℓ_j). To the right of {s_X = Q} the line ℓ_j lies below the diagonal, because A is below the diagonal. On the other hand, all of ℓ_{i*} ∩ S \ {(R, R)} lies above the diagonal. Hence, V_j^X < Q. Similarly, Switch(ℓ_j) intersects ℓ_{i*} above the diagonal. Since Switch(ℓ_j) is either vertical or has positive slope, it follows that Q ≤ W_j^X. □

From this we obtain the main result of this section.
Theorem 5.10. Let {π_i : i ∈ I} be a finite indexed collection of simple Smale plans with ℓ_i the separation line for π_i. Assume that for some i* ∈ I, ℓ_{i*} is a line through (R, R) with slope strictly between 0 and 1 and that (R, R) ∉ ℓ_j for any j ∈ I \ {i*}. If ℓ_i has non-negative slope for all i ∈ I and, in addition, ℓ_i ∩ S lies below ℓ_{i*} for those i ∈ I with ℓ_i horizontal, then fixation at i* is a globally stable equilibrium. That is, if ξ_{i*}(0) > 0 then lim_{t→∞} ξ_{i*}(t) = 1.
Proof: We choose a numbering j_1, ..., j_n of the n elements of I \ {i*}, with 0 ≤ m ≤ n, so that ℓ_j ∩ S lies below ℓ_{i*} if and only if j = j_p for some p ≤ m. If no such j exist then m = 0 and the set is empty.
For the remaining ℓ_j's the slope is positive and the numbers V_j^X and W_j^X are defined as above. Number them so that V_{j_p}^X ≤ V_{j_{p+1}}^X for m < p < n.
By Corollary 5.5 it suffices to show that i* weakly dominates the sequence {j_1, ..., j_n}.
To begin with, i* weakly dominates each j_p for p ≤ m by Lemma 5.8. We must show that if m < p ≤ n then i* dominates j_p in {i*, j_p, j_{p+1}, ..., j_n}.
Because of the chosen numbering and Lemma 5.9 we have, for k ∈ {j_p, ..., j_n}, V_{j_p}^X ≤ V_k^X < W_k^X. That is, the intersection point W_k of Switch(ℓ_k) ∩ ℓ_{i*} lies to the right of V_{j_p}. The slope of Switch(ℓ_k) is greater than 1 and the slope of ℓ_{i*} is less than 1. Hence, Switch(ℓ_k) is above ℓ_{i*} to the right of W_k and below ℓ_{i*} to the left. It follows that Switch(ℓ_k) intersects the vertical line {s_X = V_{j_p}^X} below V_{j_p} and so below the line ℓ_{j_p}, because V_{j_p} lies on ℓ_{j_p}. Again Switch(ℓ_k) has slope greater than 1 and ℓ_{j_p} has slope less than one. So Switch(ℓ_k) intersects ℓ_{j_p} to the right of this vertical line. To the right of this vertical line, ℓ_{j_p} lies below ℓ_{i*}. As in Lemma 5.8, the intersection point Switch(ℓ_k) ∩ ℓ_{j_p} lies below and to the right of W_k = Switch(ℓ_k) ∩ ℓ_{i*}. That is, A_{j_p k} < A_{i* k}. Thus, i* dominates j_p in {i*, j_p, j_{p+1}, ..., j_n}, as required. □

We now consider variations on the averaging procedure. Let {w_n} be a sequence of positive weights with partial sums W_N = Σ_{n=1}^N w_n, and consider the following conditions:

(1) lim_{N→∞} w_{N+1}/W_{N+1} = 0.
(2) lim_{N→∞} W_N = ∞.
(3) lim_{N→∞} (1/W_N)[w_N + Σ_{n=2}^N |w_n − w_{n−1}|] = 0.

If the sequence {w_n} is monotonically non-increasing or non-decreasing, then Conditions (1) and (2) imply Condition (3).
Since the averaging procedure uses ratios we may multiply by a positive constant and so assume w_1 = 1, and hence W_N ≥ 1 for all N. Now assume that {w_n} is a positive sequence with w_1 = 1 and that Conditions (1) and (2) hold.
We replace our previous averaging of the payoff sequence in (2.4) and define

(6.1)   s_N = (1/W_N) Σ_{n=1}^N w_n S_n.

We obtain the analogues of (2.5) and (2.6):
(6.2)   s_{N+1} = (w_{N+1}/W_{N+1}) S_{N+1} + (W_N/W_{N+1}) s_N,

and so

(6.3)   s_{N+1} − s_N = (w_{N+1}/W_{N+1}) (S_{N+1} − s_N).

By Condition (1), (6.1) implies that ||s_{N+1} − s_N|| → 0 and so the limit point set is connected as before. However, the crucial fact is (6.2), which says that s_{N+1} is on the segment [S_{N+1}, s_N] with the weight on s_N approaching 1 as N → ∞. Consequently, all of the linear estimates for Smale plans go through as before. The only change is that the numerical estimates M N*/N are replaced by M W_{N*}/W_N, which tends to 0 as N → ∞ by Condition (2). In particular, when two non-extreme, simple Smale plans compete we obtain convergence to the intersection point regardless of the averaging procedure.
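The weighted-average recursion is easy to check numerically. The sketch below uses the hypothetical weights w_n = √n, which are monotone and satisfy Conditions (1) and (2), and updates s_N incrementally by writing each new average as a convex combination of the previous average and the new payoff pair.

```python
# Sketch: weighted running averages s_N = (1/W_N) sum_n w_n S_n of a payoff
# sequence, computed via the recursion
#   s_{N+1} = (W_N/W_{N+1}) s_N + (w_{N+1}/W_{N+1}) S_{N+1},
# so each s_{N+1} lies on the segment [S_{N+1}, s_N].  The weights w_n = sqrt(n)
# and the alternating payoff sequence are hypothetical illustrations.

import math

def weighted_averages(payoffs, weights):
    s, W = None, 0.0
    out = []
    for S, w in zip(payoffs, weights):
        W_new = W + w
        if s is None:
            s = S                                  # s_1 = S_1
        else:
            s = tuple((W / W_new) * si + (w / W_new) * Si
                      for si, Si in zip(s, S))
        W = W_new
        out.append(s)
    return out

N = 5000
payoffs = [(3.0, 3.0) if n % 2 == 0 else (5.0, 0.0) for n in range(1, N + 1)]
weights = [math.sqrt(n) for n in range(1, N + 1)]
avgs = weighted_averages(payoffs, weights)
step = max(abs(a - b) for a, b in zip(avgs[-1], avgs[-2]))
print(avgs[-1], step)  # averages near (4.0, 1.5); steps shrink toward 0
```

The shrinking step size reflects ||s_{N+1} − s_N|| → 0 under Condition (1), while the averages settle because the weight on s_N tends to 1.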
It is the similar result for Markov plans that requires Condition (3). Suppose that M is the Markov matrix when X plays p and Y plays q. Let v_1 be the initial distribution and v_{n+1} = v_n M the distribution after round n + 1. Define

(6.4)   v̄_N = (1/W_N) Σ_{n=1}^N w_n v_n.

It follows that

(6.5)   v̄_N M − v̄_N = (1/W_N)[w_N v_{N+1} − w_1 v_1 + Σ_{n=2}^N (w_{n−1} − w_n) v_n].

Since the length of a distribution is at most 1 we have that

(6.6)   ||v̄_N M − v̄_N|| ≤ (1/W_N)[w_N + w_1 + Σ_{n=2}^N |w_n − w_{n−1}|].

From Condition (3) it follows that any limit point of the sequence {v̄_N} is a stationary distribution. In particular, if there is a unique terminal set, and so a unique stationary distribution v, then {v̄_N} converges to v. If J is one of several terminal sets then with probability p_J, depending only on the initial distribution v_1, the sequence of outcomes enters J. The conditional distributions, assuming entrance into J, then converge to the unique stationary distribution on J.
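A small simulation shows why averaging is needed here at all. The two-state matrix below is a hypothetical periodic chain, chosen because the distributions v_n themselves oscillate and never converge; their weighted averages nevertheless approach the unique stationary distribution (1/2, 1/2).

```python
# Sketch: weighted averages (1/W_N) sum_n w_n v_n of the distributions of a
# Markov chain.  The period-2 matrix M is a hypothetical example: v_n flips
# between (1,0) and (0,1), yet the averages approach the stationary (0.5, 0.5).

import math

def mat_vec(v, M):
    return tuple(sum(v[i] * M[i][j] for i in range(len(v)))
                 for j in range(len(M[0])))

M = [[0.0, 1.0],
     [1.0, 0.0]]          # period-2 chain: the two states alternate
v = (1.0, 0.0)            # initial distribution v_1

W, avg = 0.0, (0.0, 0.0)
for n in range(1, 2001):
    w = math.sqrt(n)      # hypothetical monotone weights satisfying (1)-(3)
    W += w
    avg = tuple(a + w * vi for a, vi in zip(avg, v))
    v = mat_vec(v, M)     # v_{n+1} = v_n M
avg = tuple(a / W for a in avg)
print(avg)  # close to the stationary distribution (0.5, 0.5)
```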
In contrast with all this, there is another sort of natural averaging which does not work. Suppose we use the weights in reverse order, discounting the past:

(6.7)   s_N = (1/W_N) Σ_{n=1}^N w_{N+1−n} S_n.

With Condition (3) one can still show that ||s_{N+1} − s_N|| → 0, but this time s_{N+1} is not on the segment [S_{N+1}, s_N] except when all the w_n are equal, in which case the two sorts of averaging agree (this is the original w_n = 1 for all n case). So the results from the first section will not carry over.
The other variation to consider is an asymmetric version of the Prisoner's Dilemma with payoffs given by (R_X, R_Y) for the outcome cc, (S_X, T_Y) for cd, (T_X, S_Y) for dc and (P_X, P_Y) for dd, and with inequalities for X and for Y analogous to those of (2.3). This is a real issue because in the classic version of the Prisoner's Dilemma the payoffs are not in units of dollars, time reduced from a prison sentence or population fitness, but in terms of utility, and there is no reason that the two players would have the same Von Neumann-Morgenstern utility functions.
At first glance, there is no problem. In [4] the good Markov strategies are characterized for the asymmetric case. In [18] Smale points out that the theory will work the same way for the asymmetric case. Now one must describe separate Smale strategies for Y, rather than using π ∘ Switch, but as he indicates the mathematics is essentially the same.
There is, however, an underlying philosophical problem. In [4] the inequalities for a good plan for X use the payoffs for Y, which, in theory, X does not know. In the Markov case, this is not too bad because only a rough estimate is needed to ensure that the strategy is good.
In the Smale case, the running averages use the payoffs to both players. Perhaps the best way to proceed would be to begin again and operate, not in the two dimensional convex set generated by the payoff pairs, but in the three dimensional simplex of outcomes. That is, let

(6.9)   e_cc = (1, 0, 0, 0), e_cd = (0, 1, 0, 0), e_dc = (0, 0, 1, 0), e_dd = (0, 0, 0, 1).

The convex hull S with these vertices is the simplex of distributions on the four outcomes. The data we use from the sequence of outcomes {o_1, ..., o_N} is the frequency of past outcomes:

s_N = (1/N) Σ_{n=1}^N e_{o_n},

so that, analogous with (2.5),

s_{N+1} − s_N = (1/(N+1)) (e_{o_{N+1}} − s_N).

A plan for X is then a map π : S → [0, 1] with π(s) the probability of cooperating in response to position s. So a pure strategy plan, of the sort Smale uses, would be a map π : S → {0, 1}.
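The outcome-simplex bookkeeping can be sketched in a few lines: the frequency vector obeys a recursion of the same form as before, moving from s_N toward the vertex of the latest outcome with step 1/(N+1). The threshold plan π below is purely illustrative and not one proposed in the text.

```python
# Sketch: tracking the state in the outcome simplex S over the four outcomes
# (cc, cd, dc, dd).  Each new outcome moves the frequency vector along the
# segment toward the corresponding vertex:
#   s_{N+1} = s_N + (e_{o_{N+1}} - s_N) / (N + 1).
# The plan pi below is a hypothetical pure plan pi: S -> {0, 1}.

OUTCOMES = ("cc", "cd", "dc", "dd")

def update(freq, outcome, N):
    """Frequency vector after appending outcome number N+1 to N recorded ones."""
    e = [1.0 if o == outcome else 0.0 for o in OUTCOMES]
    return [f + (ei - f) / (N + 1) for f, ei in zip(freq, e)]

def pi(freq):
    """Hypothetical pure plan: cooperate iff the observed cc-frequency > 1/2."""
    return 1 if freq[0] > 0.5 else 0

freq = [1.0, 0.0, 0.0, 0.0]        # start at the vertex e_cc
for N, outcome in enumerate(["cc", "cc", "dd", "cc"], start=1):
    freq = update(freq, outcome, N)
print(freq, pi(freq))  # 4 of the 5 recorded outcomes are cc, so pi cooperates
```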
Linear results analogous to those of Section 2 can then be carried over. Nonetheless, determining what is a good plan would still require some estimate of the opponent's payoffs. This is a task for another day.