American Institute of Mathematical Sciences

July  2014, 1(3): 347-361. doi: 10.3934/jdg.2014.1.347

Policy improvement for perfect information additive reward and additive transition stochastic games with discounted and average payoffs

 1 Department of Mathematics and Statistics, Loyola University Chicago, 1032 W. Sheridan Road, Chicago, IL 60660, United States 2 Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, 322 Science and Engineering Offices (M/C 249), 851 S. Morgan Street, Chicago, IL 60607-7045, United States

Received  December 2012 Revised  December 2013 Published  July 2014

We give a policy improvement algorithm for additive reward, additive transition (ARAT) zero-sum two-player stochastic games for both discounted and average payoffs. The class of ARAT games includes perfect information games.
Citation: Matthew Bourque, T. E. S. Raghavan. Policy improvement for perfect information additive reward and additive transition stochastic games with discounted and average payoffs. Journal of Dynamics and Games, 2014, 1 (3) : 347-361. doi: 10.3934/jdg.2014.1.347
References:
 [1] M. Akian, J. Cochet-Terrasson, S. Detournay and S. Gaubert, Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information, preprint, , (). [2] K. Avrachenkov, L. Cottatellucci and L. Maggi, Algorithms for uniform optimal strategies in two-player zero-sum stochastic games with perfect information, Operations Research Letters, 40 (2012), 56-60. doi: 10.1016/j.orl.2011.10.005. [3] D. Blackwell, Discrete dynamic programming, The Annals of Mathematical Statistics, 33 (1962), 719-726. doi: 10.1214/aoms/1177704593. [4] J. Cochet-Terrasson and S. Gaubert, A policy iteration algorithm for zero-sum stochastic games with mean payoff, Comptes Rendus Mathematique, 343 (2006), 377-382. doi: 10.1016/j.crma.2006.07.011. [5] J. Cochet-Terrasson, G. Cohen, S. Gaubert, M. Mc Gettrick and J.-p. Quadrat, Numerical computation of spectral elements in max-plus algebra, in Proceedings SSC'98 (IFAC Conference on System Structure and Control), Pergamon, Nantes, France, (1998), 667-674. [6] J. A. Filar and B. Tolwinski, On the algorithm of Pollatschek and Avi-ltzhak, in Stochastic Games And Related Topics (eds. T. E. S. Raghavan, T. S. Ferguson, T. Parthasarathy, O. J. Vrieze, W. Leinfellner, G. Eberlein and S. H. Tijs), Theory and Decision Library, 7, Springer, Netherlands, 1991, 59-70. doi: 10.1007/978-94-011-3760-7_6. [7] J, A. Filar and K, Vrieze, Competitive Markov Decision Processes, Springer, 1997. [8] B. L. Fox and D. M. Landi, An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix, Commun. ACM, 11 (1968), 619-621. [9] A. Hordijk, R. Dekker and L. C. M. Kallenberg, Sensitivity-analysis in discounted Markovian decision problems, OR Spektrum, 7 (1985), 143-151. doi: 10.1007/BF01721353. [10] R. A. Howard, Dynamic Programming and Markov Process, The Technology Press of M.I.T., Cambridge, Mass.; John Wiley & Sons, Inc., New York-London, 1960. [11] R. G. Jeroslow, Asymptotic linear programming, Operations Research, 21 (1973), 1128-1141. doi: 10.1287/opre.21.5.1128. [12] J. G. Kemeny and J. L. Snell, Finite Markov Chains, The University Series in Undergraduate Mathematics, D. Van Nostrand Co., Inc., Princeton, N.J.-Toronto-London-New York, 1960. [13] E. Kohlberg, Invariant half-lines of nonexpansive piecewise-linear transformations, Mathematics of Operations Research, 5 (1980), 366-372. doi: 10.1287/moor.5.3.366. [14] J.-F. Mertens and A. Neyman, Stochastic games have a value, Proceedings of the National Academy of Sciences of the United States of America, 79 (1982), 2145-2146. doi: 10.1073/pnas.79.6.2145. [15] T. Parthasarathy and T. E. S. Raghavan, An orderfield property for stochastic games when one player controls transition probabilities, Journal of Optimization Theory and Applications, 33 (1981), 375-392. doi: 10.1007/BF00935250. [16] T. E. S. Raghavan, S. H. Tijs and O. J. Vrieze, On stochastic games with additive reward and transition structure, Journal of Optimization Theory and Applications, 47 (1985), 451-464. doi: 10.1007/BF00942191. [17] T. E. S. Raghavan and Z. Syed, A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information, Mathematical Programming, 95 (2003), 513-532. doi: 10.1007/s10107-002-0312-3. [18] L. S. Shapley, Stochastic games, Proceedings of the National Academy of Sciences of the United States of America, 39 (1953), 1095-1100. doi: 10.1073/pnas.39.10.1095. [19] Z. U. Syed, Algorithms for Stochastic Games and Related Topics, Ph.D thesis, University of Illinois at Chicago, Chicago, IL, USA, 1999. [20] A. F. Veinott, On finding optimal policies in discrete dynamic programming with no discounting, The Annals of Mathematical Statistics, 37 (1966), 1284-1294. doi: 10.1214/aoms/1177699272.

show all references

References:
 [1] M. Akian, J. Cochet-Terrasson, S. Detournay and S. Gaubert, Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information, preprint, , (). [2] K. Avrachenkov, L. Cottatellucci and L. Maggi, Algorithms for uniform optimal strategies in two-player zero-sum stochastic games with perfect information, Operations Research Letters, 40 (2012), 56-60. doi: 10.1016/j.orl.2011.10.005. [3] D. Blackwell, Discrete dynamic programming, The Annals of Mathematical Statistics, 33 (1962), 719-726. doi: 10.1214/aoms/1177704593. [4] J. Cochet-Terrasson and S. Gaubert, A policy iteration algorithm for zero-sum stochastic games with mean payoff, Comptes Rendus Mathematique, 343 (2006), 377-382. doi: 10.1016/j.crma.2006.07.011. [5] J. Cochet-Terrasson, G. Cohen, S. Gaubert, M. Mc Gettrick and J.-p. Quadrat, Numerical computation of spectral elements in max-plus algebra, in Proceedings SSC'98 (IFAC Conference on System Structure and Control), Pergamon, Nantes, France, (1998), 667-674. [6] J. A. Filar and B. Tolwinski, On the algorithm of Pollatschek and Avi-ltzhak, in Stochastic Games And Related Topics (eds. T. E. S. Raghavan, T. S. Ferguson, T. Parthasarathy, O. J. Vrieze, W. Leinfellner, G. Eberlein and S. H. Tijs), Theory and Decision Library, 7, Springer, Netherlands, 1991, 59-70. doi: 10.1007/978-94-011-3760-7_6. [7] J, A. Filar and K, Vrieze, Competitive Markov Decision Processes, Springer, 1997. [8] B. L. Fox and D. M. Landi, An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix, Commun. ACM, 11 (1968), 619-621. [9] A. Hordijk, R. Dekker and L. C. M. Kallenberg, Sensitivity-analysis in discounted Markovian decision problems, OR Spektrum, 7 (1985), 143-151. doi: 10.1007/BF01721353. [10] R. A. Howard, Dynamic Programming and Markov Process, The Technology Press of M.I.T., Cambridge, Mass.; John Wiley & Sons, Inc., New York-London, 1960. [11] R. G. Jeroslow, Asymptotic linear programming, Operations Research, 21 (1973), 1128-1141. doi: 10.1287/opre.21.5.1128. [12] J. G. Kemeny and J. L. Snell, Finite Markov Chains, The University Series in Undergraduate Mathematics, D. Van Nostrand Co., Inc., Princeton, N.J.-Toronto-London-New York, 1960. [13] E. Kohlberg, Invariant half-lines of nonexpansive piecewise-linear transformations, Mathematics of Operations Research, 5 (1980), 366-372. doi: 10.1287/moor.5.3.366. [14] J.-F. Mertens and A. Neyman, Stochastic games have a value, Proceedings of the National Academy of Sciences of the United States of America, 79 (1982), 2145-2146. doi: 10.1073/pnas.79.6.2145. [15] T. Parthasarathy and T. E. S. Raghavan, An orderfield property for stochastic games when one player controls transition probabilities, Journal of Optimization Theory and Applications, 33 (1981), 375-392. doi: 10.1007/BF00935250. [16] T. E. S. Raghavan, S. H. Tijs and O. J. Vrieze, On stochastic games with additive reward and transition structure, Journal of Optimization Theory and Applications, 47 (1985), 451-464. doi: 10.1007/BF00942191. [17] T. E. S. Raghavan and Z. Syed, A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information, Mathematical Programming, 95 (2003), 513-532. doi: 10.1007/s10107-002-0312-3. [18] L. S. Shapley, Stochastic games, Proceedings of the National Academy of Sciences of the United States of America, 39 (1953), 1095-1100. doi: 10.1073/pnas.39.10.1095. [19] Z. U. Syed, Algorithms for Stochastic Games and Related Topics, Ph.D thesis, University of Illinois at Chicago, Chicago, IL, USA, 1999. [20] A. F. Veinott, On finding optimal policies in discrete dynamic programming with no discounting, The Annals of Mathematical Statistics, 37 (1966), 1284-1294. doi: 10.1214/aoms/1177699272.
 [1] Beatris Adriana Escobedo-Trujillo, José Daniel López-Barrientos. Nonzero-sum stochastic differential games with additive structure and average payoffs. Journal of Dynamics and Games, 2014, 1 (4) : 555-578. doi: 10.3934/jdg.2014.1.555 [2] Beatris Adriana Escobedo-Trujillo, Alejandro Alaffita-Hernández, Raquiel López-Martínez. Constrained stochastic differential games with additive structure: Average and discount payoffs. Journal of Dynamics and Games, 2018, 5 (2) : 109-141. doi: 10.3934/jdg.2018008 [3] Gregorio Díaz, Jesús Ildefonso Díaz. Stochastic energy balance climate models with Legendre weighted diffusion and an additive cylindrical Wiener process forcing. Discrete and Continuous Dynamical Systems - S, 2021  doi: 10.3934/dcdss.2021165 [4] Helena Rifà-Pous, Josep Rifà, Lorena Ronquillo. $\mathbb{Z}_2\mathbb{Z}_4$-additive perfect codes in Steganography. Advances in Mathematics of Communications, 2011, 5 (3) : 425-433. doi: 10.3934/amc.2011.5.425 [5] Boling Guo, Guoli Zhou. On the backward uniqueness of the stochastic primitive equations with additive noise. Discrete and Continuous Dynamical Systems - B, 2019, 24 (7) : 3157-3174. doi: 10.3934/dcdsb.2018305 [6] Xiaojun Li, Xiliang Li, Kening Lu. Random attractors for stochastic parabolic equations with additive noise in weighted spaces. Communications on Pure and Applied Analysis, 2018, 17 (3) : 729-749. doi: 10.3934/cpaa.2018038 [7] Xiaobin Yao. Asymptotic behavior for stochastic plate equations with memory and additive noise on unbounded domains. Discrete and Continuous Dynamical Systems - B, 2022, 27 (1) : 443-468. doi: 10.3934/dcdsb.2021050 [8] Isabelle Kuhwald, Ilya Pavlyukevich. Bistable behaviour of a jump-diffusion driven by a periodic stable-like additive process. Discrete and Continuous Dynamical Systems - B, 2016, 21 (9) : 3175-3190. doi: 10.3934/dcdsb.2016092 [9] Daoyi Xu, Yumei Huang, Zhiguo Yang. Existence theorems for periodic Markov process and stochastic functional differential equations. Discrete and Continuous Dynamical Systems, 2009, 24 (3) : 1005-1023. doi: 10.3934/dcds.2009.24.1005 [10] Liqiang Jin, Yanqing Liu, Yanyan Yin, Kok Lay Teo, Fei Liu. Design of probabilistic $l_2-l_\infty$ filter for uncertain Markov jump systems with partial information of the transition probabilities. Journal of Industrial and Management Optimization, 2021  doi: 10.3934/jimo.2021070 [11] Serhii Dyshko. On extendability of additive code isometries. Advances in Mathematics of Communications, 2016, 10 (1) : 45-52. doi: 10.3934/amc.2016.10.45 [12] Lin Xu, Rongming Wang, Dingjun Yao. Optimal stochastic investment games under Markov regime switching market. Journal of Industrial and Management Optimization, 2014, 10 (3) : 795-815. doi: 10.3934/jimo.2014.10.795 [13] Jinsen Zhuang, Yan Zhou, Yonghui Xia. Synchronization analysis of drive-response multi-layer dynamical networks with additive couplings and stochastic perturbations. Discrete and Continuous Dynamical Systems - S, 2021, 14 (4) : 1607-1629. doi: 10.3934/dcdss.2020279 [14] Hussain Alazki, Alexander Poznyak. Robust output stabilization for a class of nonlinear uncertain stochastic systems under multiplicative and additive noises: The attractive ellipsoid method. Journal of Industrial and Management Optimization, 2016, 12 (1) : 169-186. doi: 10.3934/jimo.2016.12.169 [15] Dingshi Li, Kening Lu, Bixiang Wang, Xiaohu Wang. Limiting behavior of dynamics for stochastic reaction-diffusion equations with additive noise on thin domains. Discrete and Continuous Dynamical Systems, 2018, 38 (1) : 187-208. doi: 10.3934/dcds.2018009 [16] Henri Schurz. Stochastic heat equations with cubic nonlinearity and additive space-time noise in 2D. Conference Publications, 2013, 2013 (special) : 673-684. doi: 10.3934/proc.2013.2013.673 [17] Henri Schurz. Stochastic wave equations with cubic nonlinearity and Q-regular additive noise in $\mathbb{R}^2$. Conference Publications, 2011, 2011 (Special) : 1299-1308. doi: 10.3934/proc.2011.2011.1299 [18] Takeshi Taniguchi. The existence and decay estimates of the solutions to $3$D stochastic Navier-Stokes equations with additive noise in an exterior domain. Discrete and Continuous Dynamical Systems, 2014, 34 (10) : 4323-4341. doi: 10.3934/dcds.2014.34.4323 [19] Xiaohu Wang, Dingshi Li, Jun Shen. Wong-Zakai approximations and attractors for stochastic wave equations driven by additive noise. Discrete and Continuous Dynamical Systems - B, 2021, 26 (5) : 2829-2855. doi: 10.3934/dcdsb.2020207 [20] Fuzhi Li, Dongmei Xu. Regular dynamics for stochastic Fitzhugh-Nagumo systems with additive noise on thin domains. Discrete and Continuous Dynamical Systems - B, 2021, 26 (7) : 3517-3542. doi: 10.3934/dcdsb.2020244

Impact Factor: