
Average optimal strategies for zero-sum Markov games with poorly known payoff function on one side

Abstract
  • We are concerned with two-person zero-sum Markov games with Borel spaces under a long-run average criterion. The payoff function is possibly unbounded and depends on a parameter that is unknown to one of the players. The parameter, and hence the payoff function, can be estimated using statistical methods. Our main objective is therefore to combine such an estimation procedure with a variant of the so-called vanishing discount approach to construct an average optimal pair of strategies for the game. Our results are applied to a class of zero-sum semi-Markov games.
    Mathematics Subject Classification: Primary: 91A15; Secondary: 62F10.
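
    A minimal sketch of the objects named in the abstract, in standard notation that is assumed here rather than taken from the paper: $x_t$ is the state, $a_t$ and $b_t$ are the players' actions, $r_\theta$ is the one-stage payoff with parameter $\theta$, and $\pi^1$, $\pi^2$ are the players' strategies. The long-run average payoff and its $\alpha$-discounted counterpart are typically written as

    \begin{equation}
    J(x,\pi^1,\pi^2) = \liminf_{n\to\infty} \frac{1}{n}\, E_x^{\pi^1,\pi^2}\Big[\sum_{t=0}^{n-1} r_\theta(x_t,a_t,b_t)\Big],
    \qquad
    V_\alpha(x,\pi^1,\pi^2) = E_x^{\pi^1,\pi^2}\Big[\sum_{t=0}^{\infty} \alpha^t\, r_\theta(x_t,a_t,b_t)\Big], \quad \alpha\in(0,1).
    \end{equation}

    In its usual form, the vanishing discount approach recovers the average value $\rho^*$ from the discounted values $V_\alpha^*$ by letting the discount factor tend to one; under suitable ergodicity conditions (assumed, not stated in the abstract),

    \begin{equation}
    \lim_{\alpha\uparrow 1}\, (1-\alpha)\, V_\alpha^*(x) = \rho^*.
    \end{equation}

    A player who does not know $\theta$ would, loosely speaking, work with $r_{\theta_n}$ in place of $r_\theta$, where $\theta_n$ denotes a hypothetical estimate of $\theta$ available at stage $n$.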

