January 2014, 1(1): 105-119. doi: 10.3934/jdg.2014.1.105

Average optimal strategies for zero-sum Markov games with poorly known payoff function on one side

1. Departamento de Matemáticas, Universidad de Sonora, Rosales s/n, Centro, C.P. 83000, Hermosillo, Sonora, Mexico

2. Departamento de Matemáticas, Universidad de Sonora, Rosales s/n, Centro, C.P. 83000, Hermosillo, Sonora, Mexico

Received January 2012; Revised June 2012; Published June 2013

We study two-person zero-sum Markov games on Borel spaces under a long-run average payoff criterion. The payoff function is possibly unbounded and depends on a parameter that is unknown to one of the players. The parameter and the payoff function can be estimated using statistical methods. Our main objective is therefore to combine such an estimation procedure with a variant of the so-called vanishing discount approach in order to construct an average optimal pair of strategies for the game. Our results are applied to a class of zero-sum semi-Markov games.
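For orientation, the criteria named in the abstract can be written in the standard textbook form below. This is only a sketch under assumed notation, none of which appears in the abstract itself: $x_t$ is the state, $a_t$ and $b_t$ are the players' actions, $r_\theta$ is the payoff function with unknown parameter $\theta$, $\pi^1$ and $\pi^2$ are the players' strategies, and $\alpha$ is a discount factor. The paper's own definitions, and the specific variant of the vanishing discount approach it uses, may differ.

```latex
\begin{align*}
  % Long-run expected average payoff (one common convention uses liminf)
  J(x,\pi^1,\pi^2) &:= \liminf_{n\to\infty} \frac{1}{n}\,
     E_x^{\pi^1,\pi^2}\Big[\sum_{t=0}^{n-1} r_\theta(x_t,a_t,b_t)\Big], \\
  % alpha-discounted payoff, with 0 < alpha < 1
  V_\alpha(x,\pi^1,\pi^2) &:= E_x^{\pi^1,\pi^2}\Big[\sum_{t=0}^{\infty}
     \alpha^t\, r_\theta(x_t,a_t,b_t)\Big], \\
  % Vanishing discount heuristic: the average value j^* is recovered from the
  % value V_alpha^* of the discounted game as alpha increases to 1
  j^* &= \lim_{\alpha\uparrow 1}\,(1-\alpha)\,V_\alpha^*(x).
\end{align*}
```

The last relation holds only under suitable ergodicity and growth conditions, which the abstract does not state. In an adaptive setting, the player who does not know $\theta$ would work with an estimate $\hat\theta_n$ in place of $\theta$ at stage $n$.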
Citation: Fernando Luque-Vásquez, J. Adolfo Minjárez-Sosa. Average optimal strategies for zero-sum Markov games with poorly known payoff function on one side. Journal of Dynamics & Games, 2014, 1 (1) : 105-119. doi: 10.3934/jdg.2014.1.105

