July  2014, 1(3): 347-361. doi: 10.3934/jdg.2014.1.347

Policy improvement for perfect information additive reward and additive transition stochastic games with discounted and average payoffs

1. 

Department of Mathematics and Statistics, Loyola University Chicago, 1032 W. Sheridan Road, Chicago, IL 60660, United States

2. 

Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, 322 Science and Engineering Offices (M/C 249), 851 S. Morgan Street, Chicago, IL 60607-7045, United States

Received  December 2012 Revised  December 2013 Published  July 2014

We give a policy improvement algorithm for additive reward, additive transition (ARAT) zero-sum two-player stochastic games for both discounted and average payoffs. The class of ARAT games includes perfect information games.
Citation: Matthew Bourque, T. E. S. Raghavan. Policy improvement for perfect information additive reward and additive transition stochastic games with discounted and average payoffs. Journal of Dynamics & Games, 2014, 1 (3) : 347-361. doi: 10.3934/jdg.2014.1.347
References:
[1]

M. Akian, J. Cochet-Terrasson, S. Detournay and S. Gaubert, Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information, preprint, , ().   Google Scholar

[2]

K. Avrachenkov, L. Cottatellucci and L. Maggi, Algorithms for uniform optimal strategies in two-player zero-sum stochastic games with perfect information,, Operations Research Letters, 40 (2012), 56.  doi: 10.1016/j.orl.2011.10.005.  Google Scholar

[3]

D. Blackwell, Discrete dynamic programming,, The Annals of Mathematical Statistics, 33 (1962), 719.  doi: 10.1214/aoms/1177704593.  Google Scholar

[4]

J. Cochet-Terrasson and S. Gaubert, A policy iteration algorithm for zero-sum stochastic games with mean payoff,, Comptes Rendus Mathematique, 343 (2006), 377.  doi: 10.1016/j.crma.2006.07.011.  Google Scholar

[5]

J. Cochet-Terrasson, G. Cohen, S. Gaubert, M. Mc Gettrick and J.-p. Quadrat, Numerical computation of spectral elements in max-plus algebra,, in Proceedings SSC'98 (IFAC Conference on System Structure and Control), (1998), 667.   Google Scholar

[6]

J. A. Filar and B. Tolwinski, On the algorithm of Pollatschek and Avi-ltzhak,, in Stochastic Games And Related Topics (eds. T. E. S. Raghavan, (1991), 59.  doi: 10.1007/978-94-011-3760-7_6.  Google Scholar

[7]

J, A. Filar and K, Vrieze, Competitive Markov Decision Processes,, Springer, (1997).   Google Scholar

[8]

B. L. Fox and D. M. Landi, An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix,, Commun. ACM, 11 (1968), 619.   Google Scholar

[9]

A. Hordijk, R. Dekker and L. C. M. Kallenberg, Sensitivity-analysis in discounted Markovian decision problems,, OR Spektrum, 7 (1985), 143.  doi: 10.1007/BF01721353.  Google Scholar

[10]

R. A. Howard, Dynamic Programming and Markov Process,, The Technology Press of M.I.T., (1960).   Google Scholar

[11]

R. G. Jeroslow, Asymptotic linear programming,, Operations Research, 21 (1973), 1128.  doi: 10.1287/opre.21.5.1128.  Google Scholar

[12]

J. G. Kemeny and J. L. Snell, Finite Markov Chains,, The University Series in Undergraduate Mathematics, (1960).   Google Scholar

[13]

E. Kohlberg, Invariant half-lines of nonexpansive piecewise-linear transformations,, Mathematics of Operations Research, 5 (1980), 366.  doi: 10.1287/moor.5.3.366.  Google Scholar

[14]

J.-F. Mertens and A. Neyman, Stochastic games have a value,, Proceedings of the National Academy of Sciences of the United States of America, 79 (1982), 2145.  doi: 10.1073/pnas.79.6.2145.  Google Scholar

[15]

T. Parthasarathy and T. E. S. Raghavan, An orderfield property for stochastic games when one player controls transition probabilities,, Journal of Optimization Theory and Applications, 33 (1981), 375.  doi: 10.1007/BF00935250.  Google Scholar

[16]

T. E. S. Raghavan, S. H. Tijs and O. J. Vrieze, On stochastic games with additive reward and transition structure,, Journal of Optimization Theory and Applications, 47 (1985), 451.  doi: 10.1007/BF00942191.  Google Scholar

[17]

T. E. S. Raghavan and Z. Syed, A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information,, Mathematical Programming, 95 (2003), 513.  doi: 10.1007/s10107-002-0312-3.  Google Scholar

[18]

L. S. Shapley, Stochastic games,, Proceedings of the National Academy of Sciences of the United States of America, 39 (1953), 1095.  doi: 10.1073/pnas.39.10.1095.  Google Scholar

[19]

Z. U. Syed, Algorithms for Stochastic Games and Related Topics,, Ph.D thesis, (1999).   Google Scholar

[20]

A. F. Veinott, On finding optimal policies in discrete dynamic programming with no discounting,, The Annals of Mathematical Statistics, 37 (1966), 1284.  doi: 10.1214/aoms/1177699272.  Google Scholar

show all references

References:
[1]

M. Akian, J. Cochet-Terrasson, S. Detournay and S. Gaubert, Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information, preprint, , ().   Google Scholar

[2]

K. Avrachenkov, L. Cottatellucci and L. Maggi, Algorithms for uniform optimal strategies in two-player zero-sum stochastic games with perfect information,, Operations Research Letters, 40 (2012), 56.  doi: 10.1016/j.orl.2011.10.005.  Google Scholar

[3]

D. Blackwell, Discrete dynamic programming,, The Annals of Mathematical Statistics, 33 (1962), 719.  doi: 10.1214/aoms/1177704593.  Google Scholar

[4]

J. Cochet-Terrasson and S. Gaubert, A policy iteration algorithm for zero-sum stochastic games with mean payoff,, Comptes Rendus Mathematique, 343 (2006), 377.  doi: 10.1016/j.crma.2006.07.011.  Google Scholar

[5]

J. Cochet-Terrasson, G. Cohen, S. Gaubert, M. Mc Gettrick and J.-p. Quadrat, Numerical computation of spectral elements in max-plus algebra,, in Proceedings SSC'98 (IFAC Conference on System Structure and Control), (1998), 667.   Google Scholar

[6]

J. A. Filar and B. Tolwinski, On the algorithm of Pollatschek and Avi-ltzhak,, in Stochastic Games And Related Topics (eds. T. E. S. Raghavan, (1991), 59.  doi: 10.1007/978-94-011-3760-7_6.  Google Scholar

[7]

J, A. Filar and K, Vrieze, Competitive Markov Decision Processes,, Springer, (1997).   Google Scholar

[8]

B. L. Fox and D. M. Landi, An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix,, Commun. ACM, 11 (1968), 619.   Google Scholar

[9]

A. Hordijk, R. Dekker and L. C. M. Kallenberg, Sensitivity-analysis in discounted Markovian decision problems,, OR Spektrum, 7 (1985), 143.  doi: 10.1007/BF01721353.  Google Scholar

[10]

R. A. Howard, Dynamic Programming and Markov Process,, The Technology Press of M.I.T., (1960).   Google Scholar

[11]

R. G. Jeroslow, Asymptotic linear programming,, Operations Research, 21 (1973), 1128.  doi: 10.1287/opre.21.5.1128.  Google Scholar

[12]

J. G. Kemeny and J. L. Snell, Finite Markov Chains,, The University Series in Undergraduate Mathematics, (1960).   Google Scholar

[13]

E. Kohlberg, Invariant half-lines of nonexpansive piecewise-linear transformations,, Mathematics of Operations Research, 5 (1980), 366.  doi: 10.1287/moor.5.3.366.  Google Scholar

[14]

J.-F. Mertens and A. Neyman, Stochastic games have a value,, Proceedings of the National Academy of Sciences of the United States of America, 79 (1982), 2145.  doi: 10.1073/pnas.79.6.2145.  Google Scholar

[15]

T. Parthasarathy and T. E. S. Raghavan, An orderfield property for stochastic games when one player controls transition probabilities,, Journal of Optimization Theory and Applications, 33 (1981), 375.  doi: 10.1007/BF00935250.  Google Scholar

[16]

T. E. S. Raghavan, S. H. Tijs and O. J. Vrieze, On stochastic games with additive reward and transition structure,, Journal of Optimization Theory and Applications, 47 (1985), 451.  doi: 10.1007/BF00942191.  Google Scholar

[17]

T. E. S. Raghavan and Z. Syed, A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information,, Mathematical Programming, 95 (2003), 513.  doi: 10.1007/s10107-002-0312-3.  Google Scholar

[18]

L. S. Shapley, Stochastic games,, Proceedings of the National Academy of Sciences of the United States of America, 39 (1953), 1095.  doi: 10.1073/pnas.39.10.1095.  Google Scholar

[19]

Z. U. Syed, Algorithms for Stochastic Games and Related Topics,, Ph.D thesis, (1999).   Google Scholar

[20]

A. F. Veinott, On finding optimal policies in discrete dynamic programming with no discounting,, The Annals of Mathematical Statistics, 37 (1966), 1284.  doi: 10.1214/aoms/1177699272.  Google Scholar

[1]

Beatris Adriana Escobedo-Trujillo, José Daniel López-Barrientos. Nonzero-sum stochastic differential games with additive structure and average payoffs. Journal of Dynamics & Games, 2014, 1 (4) : 555-578. doi: 10.3934/jdg.2014.1.555

[2]

Beatris Adriana Escobedo-Trujillo, Alejandro Alaffita-Hernández, Raquiel López-Martínez. Constrained stochastic differential games with additive structure: Average and discount payoffs. Journal of Dynamics & Games, 2018, 5 (2) : 109-141. doi: 10.3934/jdg.2018008

[3]

Helena Rifà-Pous, Josep Rifà, Lorena Ronquillo. $\mathbb{Z}_2\mathbb{Z}_4$-additive perfect codes in Steganography. Advances in Mathematics of Communications, 2011, 5 (3) : 425-433. doi: 10.3934/amc.2011.5.425

[4]

Boling Guo, Guoli Zhou. On the backward uniqueness of the stochastic primitive equations with additive noise. Discrete & Continuous Dynamical Systems - B, 2019, 24 (7) : 3157-3174. doi: 10.3934/dcdsb.2018305

[5]

Xiaojun Li, Xiliang Li, Kening Lu. Random attractors for stochastic parabolic equations with additive noise in weighted spaces. Communications on Pure & Applied Analysis, 2018, 17 (3) : 729-749. doi: 10.3934/cpaa.2018038

[6]

Isabelle Kuhwald, Ilya Pavlyukevich. Bistable behaviour of a jump-diffusion driven by a periodic stable-like additive process. Discrete & Continuous Dynamical Systems - B, 2016, 21 (9) : 3175-3190. doi: 10.3934/dcdsb.2016092

[7]

Serhii Dyshko. On extendability of additive code isometries. Advances in Mathematics of Communications, 2016, 10 (1) : 45-52. doi: 10.3934/amc.2016.10.45

[8]

Daoyi Xu, Yumei Huang, Zhiguo Yang. Existence theorems for periodic Markov process and stochastic functional differential equations. Discrete & Continuous Dynamical Systems - A, 2009, 24 (3) : 1005-1023. doi: 10.3934/dcds.2009.24.1005

[9]

Lin Xu, Rongming Wang, Dingjun Yao. Optimal stochastic investment games under Markov regime switching market. Journal of Industrial & Management Optimization, 2014, 10 (3) : 795-815. doi: 10.3934/jimo.2014.10.795

[10]

Hussain Alazki, Alexander Poznyak. Robust output stabilization for a class of nonlinear uncertain stochastic systems under multiplicative and additive noises: The attractive ellipsoid method. Journal of Industrial & Management Optimization, 2016, 12 (1) : 169-186. doi: 10.3934/jimo.2016.12.169

[11]

Dingshi Li, Kening Lu, Bixiang Wang, Xiaohu Wang. Limiting behavior of dynamics for stochastic reaction-diffusion equations with additive noise on thin domains. Discrete & Continuous Dynamical Systems - A, 2018, 38 (1) : 187-208. doi: 10.3934/dcds.2018009

[12]

Henri Schurz. Stochastic heat equations with cubic nonlinearity and additive space-time noise in 2D. Conference Publications, 2013, 2013 (special) : 673-684. doi: 10.3934/proc.2013.2013.673

[13]

Henri Schurz. Stochastic wave equations with cubic nonlinearity and Q-regular additive noise in $\mathbb{R}^2$. Conference Publications, 2011, 2011 (Special) : 1299-1308. doi: 10.3934/proc.2011.2011.1299

[14]

Takeshi Taniguchi. The existence and decay estimates of the solutions to $3$D stochastic Navier-Stokes equations with additive noise in an exterior domain. Discrete & Continuous Dynamical Systems - A, 2014, 34 (10) : 4323-4341. doi: 10.3934/dcds.2014.34.4323

[15]

Henri Schurz. Analysis and discretization of semi-linear stochastic wave equations with cubic nonlinearity and additive space-time noise. Discrete & Continuous Dynamical Systems - S, 2008, 1 (2) : 353-363. doi: 10.3934/dcdss.2008.1.353

[16]

Cem Güneri, Ferruh Özbudak, Funda ÖzdemIr. On complementary dual additive cyclic codes. Advances in Mathematics of Communications, 2017, 11 (2) : 353-357. doi: 10.3934/amc.2017028

[17]

Mathias Staudigl. A limit theorem for Markov decision processes. Journal of Dynamics & Games, 2014, 1 (4) : 639-659. doi: 10.3934/jdg.2014.1.639

[18]

Shahad Al-azzawi, Jicheng Liu, Xianming Liu. Convergence rate of synchronization of systems with additive noise. Discrete & Continuous Dynamical Systems - B, 2017, 22 (2) : 227-245. doi: 10.3934/dcdsb.2017012

[19]

Yongluo Cao, De-Jun Feng, Wen Huang. The thermodynamic formalism for sub-additive potentials. Discrete & Continuous Dynamical Systems - A, 2008, 20 (3) : 639-657. doi: 10.3934/dcds.2008.20.639

[20]

Anna Mummert. The thermodynamic formalism for almost-additive sequences. Discrete & Continuous Dynamical Systems - A, 2006, 16 (2) : 435-454. doi: 10.3934/dcds.2006.16.435

 Impact Factor: 

Metrics

  • PDF downloads (11)
  • HTML views (0)
  • Cited by (1)

Other articles
by authors

[Back to Top]