ASYMPTOTICALLY OPTIMAL STRATEGIES IN REPEATED GAMES WITH INCOMPLETE INFORMATION AND VANISHING WEIGHTS

Abstract. We construct asymptotically optimal strategies in two-player zero-sum repeated games with incomplete information on both sides in which stages have vanishing weights. Our construction, inspired by Heuer (IJGT 1992), proves the convergence of the values for these games, thus extending the results established by Mertens and Zamir (IJGT 1971) for n-stage games and discounted games to the case of arbitrary vanishing weights.

The convergence of the values v_θ(p, q), as sup_{m≥1} θ_m tends to 0, was established using viscosity-solution techniques, which are typical of continuous-time games, but only for the independent case. The dependent case was settled by Oliu-Barton [15], using the so-called splitting game: a stochastic game in which the players jointly control the (martingale of) posterior beliefs. This approach was recently extended by Laraki and Renault [10] to the case of strongly acyclic independent gambling houses (which include the splitting game, in the independent case), where an extension of the Mertens and Zamir characterisation was established. The literature mentioned so far was mainly concerned with the values: their existence and characterisation. But what about the players' optimal strategies? There are essentially two approaches: optimal strategies for a fixed evaluation (n-stage games, for instance) and asymptotically optimal strategies, that is, a "way of playing" which becomes closer and closer to the optimum as the game gets longer. The difference between the two approaches is that, where the first provides an exact optimal strategy for each fixed length of the game, the second is more robust, for it provides ε-optimal strategies in any game which is long enough.
Concerning the former, Aumann and Maschler [1] provided a recursive formula for the values of n-stage repeated games with incomplete information on one side, from which the informed player (and only him) can deduce an optimal strategy recursively. The notion of dual game was introduced by De Meyer in [3,4] in order to provide a similar construction for the uninformed player. These duality techniques were later extended by De Meyer and Marino [5] to repeated games with incomplete information on both sides, in the independent case, and by Gensbittel and Oliu-Barton [6] to the general case.
Concerning the asymptotic approach, Aumann and Maschler [1] noticed that, in the case of repeated games with incomplete information on one side, the following strategy is asymptotically optimal for the informed player: use his private information at the first stage (in order to reach Cav u(p)) and then play non-revealing for the rest of the game. For the uninformed player, an asymptotically optimal strategy is obtained through weak approachability. Heuer [7] provided asymptotically optimal strategies for n-stage repeated games with incomplete information on both sides, combining the geometric properties of the solutions to (MZ) and approachability techniques. Oliu-Barton [15] described uniformly optimal strategies in the splitting game for which, moreover, the payoff and state variable remain constant. A similar approach was extended to acyclic gambling houses by Laraki and Renault [10].
The main contribution of this paper is to extend Heuer's procedure in order to provide asymptotically optimal strategies in repeated games with incomplete information on both sides, for a vanishing evaluation. Both players having symmetric roles, we will focus on the second player only. The strategy provided combines the construction of a martingale and approachability techniques, two aspects that were already present in games with incomplete information on one side, albeit one at a time. The idea behind the strategy is that, as long as the player can "do well" without using his information, he uses an approachability (non-revealing) strategy. When this is no longer possible, he uses his information optimally, generating an appropriate martingale of posterior beliefs. Our construction provides an alternative, strategy-based proof of the convergence of the values of two-player zero-sum repeated games with incomplete information on both sides, as the weights on each stage go to zero, to the unique solution of the Mertens and Zamir system.

Organization of the paper. In Section 2 we introduce the model of repeated games with incomplete information, state our main contributions and provide an informal description of the strategy. Section 3 is devoted to the formal construction of an asymptotically optimal strategy for player 2. Both players having symmetric roles, the construction of asymptotically optimal strategies for player 1 is analogous.

2. Model and main result.
2.1. The model. Let K, L, I, J be finite sets. A repeated game with incomplete information (on both sides) is described by a family of I × J-matrix games {G^{kℓ}, (k, ℓ) ∈ K × L}, a probability π ∈ ∆(K × L) and a sequence of weights θ ∈ ∆(N*). The game is played as follows:
• A pair of parameters (or types) (k, ℓ) ∈ K × L is drawn according to the joint probability measure π ∈ ∆(K × L): player 1 is informed of k, while player 2 is informed of ℓ.
• Then, the game G^{kℓ} is played repeatedly: at each stage m ≥ 1, knowing the past actions and their private information, the players choose actions (i_m, j_m) ∈ I × J. The weight of the stage payoff G^{kℓ}(i_m, j_m) in the overall payoff function is θ_m.
• At the end of the game, player 1 receives ∑_{m≥1} θ_m G^{kℓ}(i_m, j_m), while player 2 receives the opposite amount.
A (behavioral) strategy for player 1 is a function from the set of his past observations into the set ∆(I). A strategy is defined similarly for player 2. As already mentioned, we assume that the players observe and remember the history of the play h_m = (i_1, j_1, . . . , i_{m−1}, j_{m−1}) at every stage m ≥ 1. The set of possible histories at stage m is thus H_m := (I × J)^{m−1}, and H := ∪_{m≥1} H_m is the set of all possible (finite) histories. The information available to player 1 (resp. 2) at stage m is (k, h_m) (resp. (ℓ, h_m)). Formally, strategies are defined as follows.
Definition 1 (Strategies). Let S (resp. T ) be the set of mappings s : H → ∆(I) (resp. t : H → ∆(J)) from the set of finite histories to the set of mixed actions. A strategy for player 1 (resp. 2) is an element in S K (resp. T L ).
Let P^π_{ŝ,t̂} be the unique probability distribution over K × L × (I × J)^∞ induced by π ∈ ∆(K × L) and (ŝ, t̂) ∈ S^K × T^L on the σ-algebra generated by the cylinders. The payoff function in G_θ(π) is defined by
γ_θ(π, ŝ, t̂) := E^π_{ŝ,t̂}[∑_{m≥1} θ_m G^{kℓ}(i_m, j_m)],
where E^π_{ŝ,t̂} is the expectation with respect to P^π_{ŝ,t̂}. The game G_θ(π) has a value, denoted by v_θ(π), which satisfies:
v_θ(π) = max_{ŝ∈S^K} min_{t̂∈T^L} γ_θ(π, ŝ, t̂) = min_{t̂∈T^L} max_{ŝ∈S^K} γ_θ(π, ŝ, t̂).
The value function is denoted by v_θ : ∆(K × L) → R.
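As a concrete illustration of the θ-weighted evaluation, the following minimal sketch averages the stage payoffs over a joint prior for a fixed, type-independent action path; all game data is illustrative and not taken from the paper.

```python
# Sketch: the theta-weighted payoff for a fixed, type-independent action path.
# All game data below is illustrative.

# Two types per player; G[k, l] is a 2x2 payoff matrix for player 1.
G = {(0, 0): [[1, -1], [-1, 1]],
     (0, 1): [[0, 2], [2, 0]],
     (1, 0): [[2, 0], [0, 2]],
     (1, 1): [[-1, 1], [1, -1]]}
pi = {kl: 0.25 for kl in G}          # uniform joint prior on K x L

def gamma(theta, actions):
    """Expected theta-weighted payoff when the same action path (i_m, j_m)_m
    is played regardless of the realized types (k, l)."""
    return sum(prob * sum(th * G[kl][i][j]
                          for th, (i, j) in zip(theta, actions))
               for kl, prob in pi.items())

theta = [0.5, 0.3, 0.2]              # weights, summing to 1
path = [(0, 0), (0, 1), (1, 1)]      # (i_m, j_m) for m = 1, 2, 3
value = gamma(theta, path)
```

Only the weighting mechanism is illustrated here; actual strategies depend on types and histories.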
The following additional notation will be used in the rest of this paper:
• v : ∆(K × L) → R is some continuous solution of the (MZ) system.
• For any finite or countable set E, we denote by ∆(E) the set of probability distributions over E, that is, ∆(E) = {(β_e)_{e∈E} | β_e ≥ 0 for all e ∈ E and ∑_{e∈E} β_e = 1}.
• For any π ∈ ∆(K × L):
– D(π) is the average I × J-matrix game ∑_{(k,ℓ)∈K×L} π^{kℓ} G^{kℓ}. It corresponds to the (one-shot) game in which the players cannot use their private information; it is sometimes referred to as the non-revealing game. u(π) is the value of D(π).
– The marginals of π are denoted by p ∈ ∆(K) and q ∈ ∆(L), and the matrices of conditionals by P ∈ ∆(K)^L and Q ∈ ∆(L)^K respectively, so that π^{kℓ} = p^k Q(ℓ|k) = q^ℓ P(k|ℓ) for all (k, ℓ) ∈ K × L.
• When dealing with a sequence (π_m)_m in ∆(K × L), the marginals and conditionals will be denoted by p_m, q_m, Q_m and P_m.
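The notation above can be checked mechanically: the sketch below disintegrates an illustrative joint law π into marginals and conditionals, verifying π^{kℓ} = p^k Q(ℓ|k) = q^ℓ P(k|ℓ), and then builds the average game D(π) and computes its value u(π), restricting to 2×2 games so that the value has a closed form (the closed form is the standard one for 2×2 zero-sum games; all numbers are illustrative).

```python
# (a) Disintegration of a joint law pi into marginals and conditionals.
K = [0, 1]; L = [0, 1]
pi = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}
p = {k: sum(pi[(k, l)] for l in L) for k in K}          # marginal on K
q = {l: sum(pi[(k, l)] for k in K) for l in L}          # marginal on L
Q = {(l, k): pi[(k, l)] / p[k] for k in K for l in L}   # Q(l|k)
P = {(k, l): pi[(k, l)] / q[l] for k in K for l in L}   # P(k|l)
ok = all(abs(pi[(k, l)] - p[k] * Q[(l, k)]) < 1e-12 and
         abs(pi[(k, l)] - q[l] * P[(k, l)]) < 1e-12 for k in K for l in L)

# (b) The non-revealing game D(pi) and its value u(pi), for 2x2 matrices.
G = {(0, 0): [[1, -1], [-1, 1]], (0, 1): [[0, 0], [0, 0]],
     (1, 0): [[0, 0], [0, 0]], (1, 1): [[-1, 1], [1, -1]]}

def average_game(G, pi):
    """D(pi) = sum over (k, l) of pi[k, l] * G[k, l], entrywise."""
    return [[sum(prob * G[kl][i][j] for kl, prob in pi.items())
             for j in range(2)] for i in range(2)]

def value_2x2(D):
    """Value of a 2x2 zero-sum game (row player maximizes)."""
    maximin = max(min(row) for row in D)
    minimax = min(max(D[0][j], D[1][j]) for j in range(2))
    if maximin == minimax:                      # pure saddle point
        return maximin
    (a, b), (c, d) = D
    return (a * d - b * c) / (a + d - b - c)    # mixed-equilibrium value

D = average_game(G, pi)   # a scaled matching-pennies matrix
u = value_2x2(D)          # value of the non-revealing game
```

In this example the two "pennies" games pull in opposite directions, so D(π) has value 0: without using their information, the players can only guarantee the value of the averaged matrix.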

2.2. Main result.
The main contribution of this paper is the construction of a family of strategies for player 2 (more precisely, a strategy t̂_θ ∈ T^L for each sequence of weights θ ∈ ∆(N*)) satisfying the following robustness condition: for any ε > 0 there exists δ > 0 such that, for all θ ∈ ∆(N*) satisfying ‖θ‖ ≤ δ,
sup_{ŝ∈S^K} γ_θ(π, ŝ, t̂_θ) ≤ v(π) + ε.
As a corollary, we obtain a strategy-based proof for the convergence of the values in zero-sum repeated games with incomplete information on both sides, as the weights on each stage tend to 0.
N.B. Our construction differs from the so-called uniform approach, where one looks for a family of strategies {t̂_ε ∈ T^L, ε > 0} having a similar property, that is, sup_{ŝ∈S^K} γ_θ(π, ŝ, t̂_ε) ≤ v_θ(π) + ε for all θ satisfying ‖θ‖ ≤ δ. Note, however, a crucial difference: the strategy t̂_ε is independent of θ. The fact that such strategies fail to exist for repeated games with incomplete information on both sides was proved by Aumann and Maschler in the late sixties (see [1]).

2.3. Informal description of asymptotically optimal strategies. In this section we provide an asymptotically optimal strategy t̂*_θ ∈ T^L for player 2 in the repeated game G_θ(π), which guarantees the asymptotic value v(π) up to a small error (which vanishes as ‖θ‖ tends to 0). The strategy combines the construction of a martingale with approachability techniques. The idea is that, as long as the player can "do well" without using his information, he uses an approachability strategy by playing non-revealing, i.e. in the average game D(π). When this is no longer possible, he uses his private information optimally in order to generate an appropriate martingale of posterior beliefs. Our construction specifies when to reveal, how much information to reveal, and which direction to target when playing non-revealing. The geometry of the solution to the (MZ) system plays an important role in the construction.

Idea of the strategy. Let us give here an idea of the strategy, leaving all the technicalities to the next sections. Let θ ∈ ∆(N*) and π = p ⊗ Q ∈ ∆(K × L) be fixed throughout this section, where p and Q denote, respectively, the marginal of π on K and the conditional probabilities on L given k, i.e. π^{kℓ} = p^k Q(ℓ|k) for all (k, ℓ) ∈ K × L.
Let B(Q) and B(p, Q) denote, respectively, the set of hyperplanes above, and of supporting hyperplanes to, the function p ↦ v(p ⊗ Q) from ∆(K) to R:
B(Q) := {z ∈ R^K | ⟨z, p′⟩ ≥ v(p′ ⊗ Q) for all p′ ∈ ∆(K)},
B(p, Q) := {x ∈ B(Q) | ⟨x, p⟩ = v(p ⊗ Q)}.
The aim of player 2 (the minimizer) is to obtain an overall vector payoff z satisfying z ≤ x_1, for some x_1 ∈ B(p, Q), meaning that, whatever player 1's type k is, he obtains no more than x^k_1, so that his expected payoff is ⟨z, p⟩ ≤ v(π).
• Play some arbitrary action τ ∈ ∆(J) at stage 1, and observe player 1's action i = i_1.
• Compute the vector G^Q_{iτ} ∈ R^K of expected payoffs of stage 1, defined by
(G^Q_{iτ})^k := ∑_{ℓ∈L} Q(ℓ|k) ∑_{j∈J} τ(j) G^{kℓ}(i, j) for all k ∈ K,
where, this vector being computed conditionally on i, the mixed strategy σ ∈ ∆(I)^K used by player 1 at the first stage plays no role.
• The remaining stages of the game have a total weight of ∑_{m≥2} θ_m = 1 − θ_1.
Thus, to reach the target x_1 (or, equivalently, the region x_1 − R^K_+) in the remaining stages of the game, player 2 needs to reach some x_2 ∈ R^K satisfying θ_1 G^Q_{iτ} + (1 − θ_1) x_2 ≤ x_1. Thus, the new target is set as x_2 := (1 − θ_1)^{−1}(x_1 − θ_1 G^Q_{iτ}).
• It may very well be the case that the new target is no longer achievable (i.e. x_2 ∉ B(p, Q)), meaning that, unless player 2 uses part of his private information, he will not reach the region x_2 − R^K_+. Thus, player 2 needs to adjust his target by using some of his private information.
• A new target x_2(j) is defined for each action j = j_1 played by player 2 at stage 1, by setting x_2(j) := x_2 + Z_1(j). Note that Z_1 is a random variable with values in R^K, centered in each coordinate.
• The choice of an appropriate Z_1, ensuring that the new target is achievable for each j_1 and optimal, is the key issue here.
• The tower property of conditional expectations ensures that θ_1 G^Q_{iτ} + (1 − θ_1) E[x_2(j_1)] ≤ x_1. In other words, equation (3) still holds, but only in expectation.
• The strategy is then constructed inductively, defining targets x_2, x_3, . . . and random variables Z_2, Z_3, . . . . The formal details are given in Section 3.
• This strategy ensures that, in expectation, the overall payoff G_T up to stage T, which is a random variable depending on the realizations of Z_1, . . . , Z_T and on the past actions (i_1, j_1, . . . , i_T, j_T), belongs to x_1 − R^K_+ up to an error of order sup_{m≥1} θ_m^{1/2}. It follows that ⟨G_T, p⟩ ≤ v(π) + ε for any ‖θ‖ small enough.
Comments. The targets x_1, . . . , x_m play the role of the dual variable in the dual game introduced by De Meyer [3], whereas the random variables Z_1, . . . , Z_m provide the necessary splittings, as in the recursive formulas of De Meyer [5] and Gensbittel and Oliu-Barton [6], from which optimal strategies can be constructed for a fixed evaluation.
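The first update step and the tower property of the split targets can be checked numerically. The sketch below assumes the update x_2 = (x_1 − θ_1 g)/(1 − θ_1) and a centered perturbation Z_1 drawn according to a hypothetical action law; all numbers are illustrative.

```python
# Sketch of the first target update: pay theta_1 * g now, aim at x2 with the
# remaining weight 1 - theta_1, then split x2 into per-action targets
# x2(j) = x2 + Z1(j) with Z1 centered. All data is illustrative.

theta1 = 0.1
x1 = [2.0, 1.0]                      # initial target in R^K, K = 2
g = [1.0, -1.0]                      # expected stage-1 payoff vector

# theta1 * g + (1 - theta1) * x2 = x1  =>  x2 = (x1 - theta1 * g) / (1 - theta1)
x2 = [(x1[k] - theta1 * g[k]) / (1 - theta1) for k in range(2)]

# Centered perturbation: Z1[j] is the shift after action j, with law lam.
lam = [0.25, 0.75]                   # hypothetical law of j_1
Z1 = [[1.5, -0.3], [-0.5, 0.1]]      # chosen so that E[Z1] = 0 coordinatewise
x2_of_j = [[x2[k] + Z1[j][k] for k in range(2)] for j in range(2)]

# Tower property: averaging the split targets recovers x2.
mean = [sum(lam[j] * x2_of_j[j][k] for j in range(2)) for k in range(2)]
```

The two assertions behind this sketch are exactly the two displayed relations above: the budget identity θ_1 g + (1 − θ_1)x_2 = x_1, and E[x_2(j_1)] = x_2.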
3. Formal construction of the strategy.
3.1. Preliminary results. Let us start by introducing some notation and giving two important preliminary results.
Linear homogeneous extension. For technical reasons, it will be useful to consider the linear homogeneous extension of u and v to R^{K×L}_+, by setting v(π) := ‖π‖_1 v(π/‖π‖_1) and u(π) := ‖π‖_1 u(π/‖π‖_1) for any π ≠ 0, and v(0) = u(0) = 0. In these expressions, ‖·‖_1 denotes the L¹-norm in R^{K×L}. Both v and u are Lipschitz continuous on R^{K×L}_+. The system (MZ) can easily be transposed to R^{K×L}_+, and v remains a continuous solution. In particular, p ↦ v(p ⊗ Q) and q ↦ v(q ⊗ P) are, respectively, concave and convex. For any (p, Q) ∈ R^K_+ × (R^L_+)^K, let B(Q) and B(p, Q) be, respectively, the set of vectors above and supporting the function p ↦ v(p ⊗ Q). Note that, for any p ∈ ∆(K), the inclusion B(p, Q) ⊂ B(Q) holds.
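The homogeneous extension can be sanity-checked with any function on the simplex; in the sketch below the base function is an arbitrary concave stand-in for v (it is not the Mertens–Zamir solution), used only to verify that the extension is 1-homogeneous.

```python
# Sketch: the 1-homogeneous extension f(pi) := ||pi||_1 * f(pi / ||pi||_1)
# to the nonnegative orthant, with f(0) = 0.

def v_simplex(pi):
    """Illustrative concave stand-in for v on the simplex."""
    return min(pi)                   # a minimum of linear maps is concave

def v_ext(pi):
    n = sum(pi)                      # L1-norm on the nonnegative orthant
    if n == 0:
        return 0.0
    return n * v_simplex([x / n for x in pi])

pi = [0.2, 0.3, 0.5]
c = 4.0                              # scaling factor for the homogeneity check
```

Homogeneity means v_ext(c·π) = c·v_ext(π) for every c ≥ 0, which is what the check below verifies.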
By a slight abuse, and to avoid heavy notation, we let u, v, B(Q) and B(p, Q) stand for their homogeneous extensions in the sequel.
Functions on a product set. For any function f : R^{K×L}_+ → R, we write f_1(x, Q) := f(x ⊗ Q) for any x ∈ R^K_+ and Q ∈ (R^L_+)^K. Similarly, we will use the notation f_2(y, P) := f(y ⊗ P) for any y ∈ R^L_+ and P ∈ (R^K_+)^L.
The next result is obtained similarly.
Then v_2(·, P) is linear on C.

3.3. Description of the strategy. Let θ ∈ ∆(N*) denote a fixed sequence of weights. Let us define the strategy t̂ := t̂_θ recursively. At stage 1, let x_1 ∈ B(p, Q) be the initial target, and let Q_1 := Q be the initial matrix of conditional probabilities. Thus x_1 ∈ B(Q_1) (see Figure 1). Suppose that the strategy has been constructed at stages 1, . . . , m. Let x_1, . . . , x_m ∈ R^K and Q_1, . . . , Q_m ∈ ∆(L)^K denote the targets and conditionals defined inductively. Let h_m denote the history, from player 2's point of view, at stage m, i.e. h_m contains his type, the past actions, and the realizations of some centered random variables.
• Let z_m ∈ argmin_{z∈B(Q_m)} ‖x_m − z‖² be the projection of x_m on B(Q_m). The set being upper comprehensive, p_m := z_m − x_m ∈ R^K_+ (see Figure 2). Next, define π_m ∈ (R_+)^{K×L} by the relation π_m := p_m ⊗ Q_m. Finally, let q_m ∈ R^L_+ be the marginal of π_m on R^L_+, and let P_m ∈ (R^K_+)^L be such that π_m := p_m ⊗ Q_m = q_m ⊗ P_m.
• We distinguish two cases, depending on whether v(π_m) ≥ u(π_m) or not.
Case 1: v(π_m) ≥ u(π_m). In this case, playing an optimal strategy in the non-revealing game is convenient for player 2. Indeed, he guarantees u(π_m), which is below the limit value. We set t̂(h_m) := τ_m, where τ_m is some optimal strategy in the average game D(π_m). Notice that this case includes the situation where x_m ∈ B(Q_m), so that π_m = 0. Since any strategy is optimal in D(0), player 2 can play anything whenever x_m belongs to B(Q_m) (this is the case at stage 1).
Case 2: v(π_m) < u(π_m). Let (R, α_m, {q_m(r), r ∈ R}) be given by Proposition 1 at (q_m, P_m). Use the splitting lemma [18, Proposition 2.3] to generate the posteriors q_m(r) with probability α^r_m, for each r ∈ R. Precisely, define a probability µ_m ∈ ∆(R)^L by µ_m(r|ℓ) := α^r_m q_m(r)^ℓ / q^ℓ_m for all (r, ℓ) ∈ R × L. The choice of q_m(r) ensures that v(q_m(r) ⊗ P_m) = u(q_m(r) ⊗ P_m) for each r ∈ R. Given r, we set t̂(h_m, r) := τ_m(r), where τ_m(r) is an optimal strategy in the non-revealing game D(q_m(r) ⊗ P_m). See Figure 2 for an illustration.
• Notice that, w.l.o.g., the set of signals R used in the construction of the strategy in Case 2 can be taken to be some fixed set (thanks to Carathéodory's theorem). Besides, note that the player can use a splitting in Case 1 too, by setting q_m(r) = q_m for all r ∈ R. With this convention, the strategy t̂ generates a private signal r_m ∈ R at every stage.
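The splitting step of Case 2 can be made concrete. The sketch below implements the standard splitting-lemma transition µ(r|ℓ) = α^r q_r^ℓ / q^ℓ (as in [18, Proposition 2.3]) and checks that the resulting signal has unconditional law α and induces exactly the posteriors q_r; the posteriors and weights are illustrative.

```python
# Sketch of the splitting lemma: from a prior q = sum_r alpha_r * q_r,
# player 2 draws a signal r with probability mu(r|l) given his type l,
# so that r has law alpha and the posterior on L given r is q_r.

alpha = [0.5, 0.5]
q_r = [[0.8, 0.2], [0.2, 0.8]]        # target posteriors on L = {0, 1}
q = [sum(alpha[r] * q_r[r][l] for r in range(2)) for l in range(2)]  # prior

# Splitting transition: mu[l][r] = mu(r | l) = alpha_r * q_r[l] / q[l].
mu = [[alpha[r] * q_r[r][l] / q[l] for r in range(2)] for l in range(2)]

# Law of the signal: P(r) = sum_l q[l] * mu(r|l), which equals alpha_r.
law_r = [sum(q[l] * mu[l][r] for l in range(2)) for r in range(2)]

# Bayes posterior on L given r: q[l] * mu(r|l) / P(r), which equals q_r[l].
post = [[q[l] * mu[l][r] / law_r[r] for l in range(2)] for r in range(2)]
```

This is precisely the mechanism by which player 2 "generates the posteriors q_m(r) with probability α^r_m" using only his private type ℓ.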

Remark 1.
If |L| = 1, then v(p) = Cav u(p) holds for all p ∈ ∆(K) [1]. In particular, v(p_m) ≥ u(p_m) for all m, so that Case 1 applies at every stage. The strategy then reduces to a weak-approachability strategy [19], or the so-called extremal aiming method of differential games [8,11].
Figure 1. Duality between the hyperplanes above v_1(·, Q) (resp. supporting v_1(·, Q) at p ∈ ∆(K)) and the set B(Q) (resp. B(p, Q)). Here, z ∈ B(Q) corresponds to the hyperplane p ↦ ⟨z, p⟩ and x ∈ B(p, Q) corresponds to p ↦ ⟨x, p⟩.

Figure 2. Illustration of the strategy at stage m, in the case v(π_m) < u(π_m), where player 2 needs to use his private information. In the figure, R = {r, r′} and α_m = (α, α′) ∈ ∆(R). The vectors z_m(r) and z_m(r′) belong, respectively, to B(p_m(r), Q_m(r)) and B(p_m(r′), Q_m(r′)). The construction is trivial in the case v(π_m) ≥ u(π_m), for it is enough to take z_m(r) = z_m(r′) = z_m.
As is shown in Lemma 4 below, Q_m corresponds to the matrix of conditional probabilities on L, given k ∈ K, generated by player 2's strategy.
For any r ∈ R, define p_m(r) ∈ R^K_+ by the relation p^k_m(r) := p^k_m Φ_m(r, k) for all k ∈ K.

3.4.2. The target x_m. For any m ≥ 1, let t_m := ∑_{n=1}^{m−1} θ_n be the weight of the past stages up to stage m. Given r_m ∈ R, define x_{m+1} := x_m(r_m), where
x_m(r) := (1 − t_{m+1})^{−1}((1 − t_m) x_m − θ_m g_m(r)) + z_m(r) − z_m,
and where g_m(r) and z_m(r) are two vectors in R^K that need to be specified. The former corresponds to the stage payoff (from player 2's perspective) at stage m, after obtaining the outcome r in the private state-dependent lottery µ_m at stage m.
That is, g^k_m(r) := ∑_{ℓ∈L} Q_m(r)(ℓ|k) ∑_{j∈J} τ_m(r)(j) G^{kℓ}(i_m, j) for all k ∈ K. The vectors {z_m(r), r ∈ R} are chosen so that they satisfy:
(i) z_m(r) ∈ B(p_m(r), Q_m(r)) for all r ∈ R;
(ii) ∑_{r∈R} α^r_m Φ_m(r, k) z^k_m(r) = z^k_m for all k ∈ K.
The existence of such vectors is proved in Section 3.5 and is a crucial part of the construction. Condition (ii) ensures that the conditional expectation of z_m(r_m) − z_m is zero, so that, on average, one has an approachability scheme. Condition (i), on the other hand, ensures that the distance between x_m(r) and B(Q_m(r)) is smaller than the distance between x_m and B(Q_m). This is crucial because, in order to obtain v(π), player 2 needs the target x_N to be close to B(Q_N) for some large N.
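The accounting behind the target updates can be checked numerically in the trivial case z_m(r) = z_m. The sketch below assumes an update of the form x_{m+1} = ((1 − t_m)x_m − θ_m g_m)/(1 − t_{m+1}); under that assumption the relation telescopes to x_1 = ∑_{m<N} θ_m g_m + (1 − t_N)x_N, which is the kind of accounting used in the proof of Proposition 2. All numbers are illustrative.

```python
# Sketch: telescoping of the target updates in the trivial case z_m(r) = z_m,
# assuming x_{m+1} = ((1 - t_m) * x_m - theta_m * g_m) / (1 - t_{m+1}).
# Indices are 0-based here; all numbers are illustrative.

theta = [0.4, 0.3, 0.2, 0.1]          # weights of stages 1..4
g = [1.0, -2.0, 0.5, 3.0]             # one coordinate of the stage payoffs g_m

t = [0.0]                             # t[m] = theta[0] + ... + theta[m-1]
for th in theta:
    t.append(t[-1] + th)

x = [5.0]                             # x[0] is the initial target coordinate
for m in range(3):                    # build x[1], x[2], x[3]
    x.append(((1 - t[m]) * x[m] - theta[m] * g[m]) / (1 - t[m + 1]))

N = 3                                 # stop while 1 - t[N] = theta[3] > 0
lhs = x[0]
rhs = sum(theta[m] * g[m] for m in range(N)) + (1 - t[N]) * x[N]
```

Each update step is the identity (1 − t_{m+1})x_{m+1} = (1 − t_m)x_m − θ_m g_m; summing over m < N and cancelling the intermediate terms gives the relation checked below.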

Technical comments.
This section provides some technical comments which will be useful in the proofs below. It can be skipped at a first reading. 1. Note that, by definition of the (g_m)_m, one has g^k_m = E^π_{ŝ,t̂}[G^{kℓ}(i_m, j_m) | k, h_m, i_m, r_m] for every m ≥ 1 and every k ∈ K, so that the θ-weighted sums of the G^{kℓ}(i_m, j_m) and of the g^k_m have the same expectation.
2. The following explicit definition will be convenient in the sequel:
Φ_m(r, k) := (α^r_m)^{−1} ∑_{ℓ∈L} µ_m(r|ℓ) Q_m(ℓ|k) for all (r, k) ∈ R × K.
This factor corresponds to the updating of the marginal on K, since p^k_m(r) = p^k_m Φ_m(r, k) for any (r, k) ∈ R × K (see Comment 5 below).
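The consistency of the correction factor Φ_m can be verified numerically. The sketch below assumes the formula Φ(r, k) = (α^r)^{−1} ∑_ℓ µ(r|ℓ) Q(ℓ|k), with α^r the unconditional law of the signal, and checks that ∑_r α^r Φ(r, k) = 1 for every k; all probabilities are illustrative.

```python
# Sketch: the correction factor Phi(r, k), with
#   alpha_r = P(r) = sum_{k,l} p[k] * Q(l|k) * mu(r|l),
#   Phi(r, k) = sum_l Q(l|k) * mu(r|l) / alpha_r.
# All probabilities are illustrative.

K = [0, 1]; L = [0, 1]; R = [0, 1]
p = {0: 0.4, 1: 0.6}
Q = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}   # Q[(l, k)] = Q(l|k)
mu = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.3, (1, 1): 0.7}  # mu[(r, l)] = mu(r|l)

# Unconditional law of the signal r.
alpha = {r: sum(p[k] * Q[(l, k)] * mu[(r, l)] for k in K for l in L) for r in R}

# Correction factor Phi(r, k).
Phi = {(r, k): sum(Q[(l, k)] * mu[(r, l)] for l in L) / alpha[r]
       for r in R for k in K}

# For each k, the weights alpha_r * Phi(r, k) form a probability on R.
check = [sum(alpha[r] * Phi[(r, k)] for r in R) for k in K]
```

The identity ∑_r α^r Φ(r, k) = 1 is what makes condition (ii) a genuine splitting of z^k_m for each coordinate k.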

3. Notice that, by definition, ∑_{r∈R} α^r_m Φ_m(r, k) = 1 for all k ∈ K, so that (ii) can be written as E^π_{ŝ,t̂}[z^k_m(r_m) | k, h_m] = z^k_m for all k ∈ K.
4. When v(π_m) ≥ u(π_m), the splitting is trivial and one can take µ_m(r|ℓ) = α^r_m for all (r, ℓ) ∈ R × L. In particular, µ_m(·|ℓ) is independent of ℓ and, as a consequence, Q_m(r) = Q_m and Φ_m(r, k) = 1 for all (r, k) ∈ R × K. Therefore, there is a trivial way to make the vectors {z_m(r), r ∈ R} satisfy (i) and (ii): setting z_m(r) = z_m for all r ∈ R. The martingale component thus fades away, and the strategy becomes a weak-approachability strategy.
5. Note that condition (ii) gives, for each k ∈ K, a splitting of z^k_m. A straightforward computation gives (see the proof of Corollary 1) P^π_{ŝ,t̂}(r_m = r | k, h_m) = α^r_m Φ_m(r, k) for all (r, k) ∈ R × K, so that the equality in (13) holds. As a consequence, P^π_{ŝ,t̂}(r, k | h_m) = p^k_m α^r_m Φ_m(r, k) = α^r_m p^k_m(r). Thus, in particular, p^k_m(r) = p^k_m Φ_m(r, k) for all (r, k) ∈ R × K.
6. In the independent case, Φ_m(r, k) = 1 for all (r, k) ∈ R × K. Then, the equality (13) reduces to ∑_{r∈R} α^r_m z_m(r) = z_m. In other words, there is one and the same splitting for each coordinate k ∈ K.

3.5. Existence of the strategy. The existence of the strategy defined above relies on the existence of a set of vectors {z_m(r), r ∈ R} satisfying (i) and (ii). This section is devoted to this result.
Proposition 1. Let π = q ⊗ P = p ⊗ Q be such that v(π) < u(π), and let (R, α, (q_r)_r) be given by Lemma 1. For all (r, k) ∈ R × K, let Φ(r, k) := ∑_{ℓ∈L} Q(ℓ|k) q^ℓ_r / q^ℓ, and let Q_r ∈ (R^L_+)^K be defined by Q_r(ℓ|k) := Q(ℓ|k) q^ℓ_r / (q^ℓ Φ(r, k)). Let p_r ∈ R^K_+ be such that π_r := q_r ⊗ P = p_r ⊗ Q_r. Then:
B(p, Q) ⊂ B* := {z ∈ R^K | ∃ z_r ∈ B(p_r, Q_r), r ∈ R, such that z^k = ∑_{r∈R} α^r Φ(r, k) z^k_r for all k ∈ K}.
Remark 2. In the independent case, i.e. π = p ⊗ q, the last statement reduces to B(p, q) ⊂ ∑_{r∈R} α^r B(p, q_r). The converse inclusion ∑_{r∈R} α^r B(p, q_r) ⊂ B(p, q) is easily obtained from the convexity of q ↦ v(p ⊗ q).
Proposition 1 is quite technical. For this reason, we have preferred to leave its proof to the Appendix at the end of this chapter. The following result shows that z m ∈ B(p m , Q m ), for any m ≥ 1. Thus, one can apply Proposition 1 to obtain a representation of z m with elements z m (r) ∈ B(p m (r), Q m (r)), r ∈ R.
Lemma 3. Let Q ∈ (R^L_+)^K and let x ∉ B(Q). Let z ∈ argmin_{y∈B(Q)} ‖x − y‖ be its projection on B(Q). Then z − x ∈ R^K_+ and z ∈ B(z − x, Q).
Proof. The convexity and upper comprehensiveness of B(Q) ensure that the projection z is well defined and that z − x ∈ R^K_+. Besides, by construction, the vector z − x is normal to B(Q) at z, so that ⟨z − x, y − z⟩ ≥ 0 for all y ∈ B(Q), and in particular for any y ∈ B(z − x, Q). But then v((z − x) ⊗ Q) = ⟨z − x, y⟩ ≥ ⟨z − x, z⟩ ≥ v((z − x) ⊗ Q), so that the inequalities are equalities. Thus, z ∈ B(z − x, Q).
The next result states that Q_m corresponds to the true matrix of conditional probabilities, and that it can be monitored by player 2. Besides, the proof shows that the two definitions of Q_m(r) given above, i.e. (6) and (11), are equivalent.
The result follows since Q m (r) = Q m+1 .
Proof. By Lemma 3, z_m ∈ B(p_m, Q_m). Both (9) and (10) are clear in Case 1. Indeed, Q_m(r) = Q_m for all r ∈ R, and it is enough to take z_m(r) = z_m for all r ∈ R. In Case 2, apply Proposition 1 to obtain {z_m(r), r ∈ R} such that z_m(r) ∈ B(p_m(r), Q_m(r)) and ∑_{r∈R} α^r_m Φ_m(r, k) z^k_m(r) = z^k_m, which gives the second equality in (10). The first equality comes from a straightforward computation.

3.6. The error term. The next result gives an estimate for the payoff guaranteed by the split-and-approach strategy constructed above.
Proposition 2. Let t̂* be the strategy described in Section 3.3. There exists a constant C ≥ 0, independent of θ, such that for all ŝ ∈ S^K:
γ_θ(π, ŝ, t̂*) ≤ v(π) + C‖θ‖^{1/2}.
Proof. Fix some m ≥ 1 such that t_m < 1. Fix r ∈ R and i_m ∈ I. Recall that g_m(r) depends on i_m. By the choice of τ_m(r), i.e. an optimal strategy in D(p_m(r) ⊗ Q_m(r)), one has:
⟨p_m(r), g_m(r)⟩ ≤ u(p_m(r) ⊗ Q_m(r)) = u(q_m(r) ⊗ P_m).
The choice of q_m(r) and z_m(r) then yields ⟨p_m(r), z_m(r)⟩ = v(p_m(r) ⊗ Q_m(r)) = u(q_m(r) ⊗ P_m). Combining these two estimates, one obtains:
⟨p_m(r), g_m(r) − z_m(r)⟩ ≤ 0. (16)
For any m ≥ 1, define d_m := (1 − t_m)‖x_m − z_m‖². Then, for any (h_m, i_m) ∈ H_m × I, the conditional expectation E^π_{ŝ,t̂*}[d_{m+1} | h_m, i_m] can be bounded as follows. The first two equalities follow from the definition of the conditional expectation and from the definition of x_{m+1}, i.e. x_{m+1} = x_m(r) with probability α^r_m. The third line follows from the fact that z_m(r) ∈ B(Q_m(r)) and z_{m+1} is the orthogonal projection of x_m(r) onto this set. The next two lines are obtained using the definition of x_m(r) and rearranging the terms. Now, developing the square of the Euclidean norm, noting that the scalar product ⟨z_m − x_m, g_m(r) − z_m(r)⟩ is nonpositive by (16), and using the equality p_m = z_m − x_m and the inequality ‖z_m(r)‖_∞ ≤ ‖G‖, which hold by definition and because z_m(r) is a supergradient of a ‖G‖-Lipschitz function, respectively, one obtains a bound for d_{m+1}. Taking expectations and putting κ := 2K‖G‖², one has:
E^π_{ŝ,t̂*}[d_{m+1}] ≤ E^π_{ŝ,t̂*}[d_m] + κ θ_m² / (1 − t_{m+1}). (17)
Adding the inequalities (17) from m = 1 to N, for some N such that 1 − t_N > 0, and using the fact that d_1 = 0, one obtains a bound for E^π_{ŝ,t̂*}[d_N]. On the other hand, by construction,
(1 − t_{m+1}) x_{m+1} = (1 − t_m) x_m − θ_m g_m(r_m) + (1 − t_{m+1})(z_m(r_m) − z_m),
which is equivalent, rearranging the terms and summing over m, to
x_1 = ∑_{m<N} θ_m g_m + (1 − t_N) x_N + ∑_{m<N} (1 − t_{m+1})(z_m − z_m(r_m)). (19)
Take the expectation in (19). Clearly, E^π_{ŝ,t̂*}[x^k_1] = ⟨p, x_1⟩ = v(π) (by the choice of x_1), and E^π_{ŝ,t̂*}[∑_{m≥N} θ_m g^k_m] ≤ (1 − t_N)‖G‖. By Corollary 1, E^π_{ŝ,t̂*}[z^k_m(r_m) − z^k_m] = 0. Finally, write x_N = (x_N − z_N) + z_N, and notice that E^π_{ŝ,t̂*}[‖x_N − z_N‖] is controlled by E^π_{ŝ,t̂*}[d_N], while E^π_{ŝ,t̂*}[‖z_N‖] ≤ √K‖G‖ ≤ √κ. An easy way to obtain x^k_N, z^k_N ≤ ‖G‖ for all k ∈ K is as follows: replace, if necessary, x_N with the vector of coordinates min{x^k_N, ‖G‖}, k ∈ K. The equality in (19) then becomes a ≤-inequality, and each coordinate of its projection onto B(Q_N) is bounded by ‖G‖. Putting these estimates together, one gets the announced bound, up to a term of order 1 − t_N. The last term vanishes when θ has infinite support, by letting N → ∞. For θ of finite support, take N such that t_N < t_{N+1} = 1. Then, 1 − t_N = θ_N ≤ ‖θ‖, and the result follows.