July  2014, 1(3): 347-361. doi: 10.3934/jdg.2014.1.347

Policy improvement for perfect information additive reward and additive transition stochastic games with discounted and average payoffs

1. 

Department of Mathematics and Statistics, Loyola University Chicago, 1032 W. Sheridan Road, Chicago, IL 60660, United States

2. 

Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, 322 Science and Engineering Offices (M/C 249), 851 S. Morgan Street, Chicago, IL 60607-7045, United States

Received  December 2012 Revised  December 2013 Published  July 2014

We give a policy improvement algorithm for additive reward, additive transition (ARAT) zero-sum two-player stochastic games for both discounted and average payoffs. The class of ARAT games includes perfect information games.
Citation: Matthew Bourque, T. E. S. Raghavan. Policy improvement for perfect information additive reward and additive transition stochastic games with discounted and average payoffs. Journal of Dynamics & Games, 2014, 1 (3) : 347-361. doi: 10.3934/jdg.2014.1.347
References:
[1]

M. Akian, J. Cochet-Terrasson, S. Detournay and S. Gaubert, Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information, preprint, , ().   Google Scholar

[2]

K. Avrachenkov, L. Cottatellucci and L. Maggi, Algorithms for uniform optimal strategies in two-player zero-sum stochastic games with perfect information,, Operations Research Letters, 40 (2012), 56.  doi: 10.1016/j.orl.2011.10.005.  Google Scholar

[3]

D. Blackwell, Discrete dynamic programming,, The Annals of Mathematical Statistics, 33 (1962), 719.  doi: 10.1214/aoms/1177704593.  Google Scholar

[4]

J. Cochet-Terrasson and S. Gaubert, A policy iteration algorithm for zero-sum stochastic games with mean payoff,, Comptes Rendus Mathematique, 343 (2006), 377.  doi: 10.1016/j.crma.2006.07.011.  Google Scholar

[5]

J. Cochet-Terrasson, G. Cohen, S. Gaubert, M. Mc Gettrick and J.-p. Quadrat, Numerical computation of spectral elements in max-plus algebra,, in Proceedings SSC'98 (IFAC Conference on System Structure and Control), (1998), 667.   Google Scholar

[6]

J. A. Filar and B. Tolwinski, On the algorithm of Pollatschek and Avi-ltzhak,, in Stochastic Games And Related Topics (eds. T. E. S. Raghavan, (1991), 59.  doi: 10.1007/978-94-011-3760-7_6.  Google Scholar

[7]

J, A. Filar and K, Vrieze, Competitive Markov Decision Processes,, Springer, (1997).   Google Scholar

[8]

B. L. Fox and D. M. Landi, An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix,, Commun. ACM, 11 (1968), 619.   Google Scholar

[9]

A. Hordijk, R. Dekker and L. C. M. Kallenberg, Sensitivity-analysis in discounted Markovian decision problems,, OR Spektrum, 7 (1985), 143.  doi: 10.1007/BF01721353.  Google Scholar

[10]

R. A. Howard, Dynamic Programming and Markov Process,, The Technology Press of M.I.T., (1960).   Google Scholar

[11]

R. G. Jeroslow, Asymptotic linear programming,, Operations Research, 21 (1973), 1128.  doi: 10.1287/opre.21.5.1128.  Google Scholar

[12]

J. G. Kemeny and J. L. Snell, Finite Markov Chains,, The University Series in Undergraduate Mathematics, (1960).   Google Scholar

[13]

E. Kohlberg, Invariant half-lines of nonexpansive piecewise-linear transformations,, Mathematics of Operations Research, 5 (1980), 366.  doi: 10.1287/moor.5.3.366.  Google Scholar

[14]

J.-F. Mertens and A. Neyman, Stochastic games have a value,, Proceedings of the National Academy of Sciences of the United States of America, 79 (1982), 2145.  doi: 10.1073/pnas.79.6.2145.  Google Scholar

[15]

T. Parthasarathy and T. E. S. Raghavan, An orderfield property for stochastic games when one player controls transition probabilities,, Journal of Optimization Theory and Applications, 33 (1981), 375.  doi: 10.1007/BF00935250.  Google Scholar

[16]

T. E. S. Raghavan, S. H. Tijs and O. J. Vrieze, On stochastic games with additive reward and transition structure,, Journal of Optimization Theory and Applications, 47 (1985), 451.  doi: 10.1007/BF00942191.  Google Scholar

[17]

T. E. S. Raghavan and Z. Syed, A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information,, Mathematical Programming, 95 (2003), 513.  doi: 10.1007/s10107-002-0312-3.  Google Scholar

[18]

L. S. Shapley, Stochastic games,, Proceedings of the National Academy of Sciences of the United States of America, 39 (1953), 1095.  doi: 10.1073/pnas.39.10.1095.  Google Scholar

[19]

Z. U. Syed, Algorithms for Stochastic Games and Related Topics,, Ph.D thesis, (1999).   Google Scholar

[20]

A. F. Veinott, On finding optimal policies in discrete dynamic programming with no discounting,, The Annals of Mathematical Statistics, 37 (1966), 1284.  doi: 10.1214/aoms/1177699272.  Google Scholar

show all references

References:
[1]

M. Akian, J. Cochet-Terrasson, S. Detournay and S. Gaubert, Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information, preprint, , ().   Google Scholar

[2]

K. Avrachenkov, L. Cottatellucci and L. Maggi, Algorithms for uniform optimal strategies in two-player zero-sum stochastic games with perfect information,, Operations Research Letters, 40 (2012), 56.  doi: 10.1016/j.orl.2011.10.005.  Google Scholar

[3]

D. Blackwell, Discrete dynamic programming,, The Annals of Mathematical Statistics, 33 (1962), 719.  doi: 10.1214/aoms/1177704593.  Google Scholar

[4]

J. Cochet-Terrasson and S. Gaubert, A policy iteration algorithm for zero-sum stochastic games with mean payoff,, Comptes Rendus Mathematique, 343 (2006), 377.  doi: 10.1016/j.crma.2006.07.011.  Google Scholar

[5]

J. Cochet-Terrasson, G. Cohen, S. Gaubert, M. Mc Gettrick and J.-p. Quadrat, Numerical computation of spectral elements in max-plus algebra,, in Proceedings SSC'98 (IFAC Conference on System Structure and Control), (1998), 667.   Google Scholar

[6]

J. A. Filar and B. Tolwinski, On the algorithm of Pollatschek and Avi-ltzhak,, in Stochastic Games And Related Topics (eds. T. E. S. Raghavan, (1991), 59.  doi: 10.1007/978-94-011-3760-7_6.  Google Scholar

[7]

J, A. Filar and K, Vrieze, Competitive Markov Decision Processes,, Springer, (1997).   Google Scholar

[8]

B. L. Fox and D. M. Landi, An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix,, Commun. ACM, 11 (1968), 619.   Google Scholar

[9]

A. Hordijk, R. Dekker and L. C. M. Kallenberg, Sensitivity-analysis in discounted Markovian decision problems,, OR Spektrum, 7 (1985), 143.  doi: 10.1007/BF01721353.  Google Scholar

[10]

R. A. Howard, Dynamic Programming and Markov Process,, The Technology Press of M.I.T., (1960).   Google Scholar

[11]

R. G. Jeroslow, Asymptotic linear programming,, Operations Research, 21 (1973), 1128.  doi: 10.1287/opre.21.5.1128.  Google Scholar

[12]

J. G. Kemeny and J. L. Snell, Finite Markov Chains,, The University Series in Undergraduate Mathematics, (1960).   Google Scholar

[13]

E. Kohlberg, Invariant half-lines of nonexpansive piecewise-linear transformations,, Mathematics of Operations Research, 5 (1980), 366.  doi: 10.1287/moor.5.3.366.  Google Scholar

[14]

J.-F. Mertens and A. Neyman, Stochastic games have a value,, Proceedings of the National Academy of Sciences of the United States of America, 79 (1982), 2145.  doi: 10.1073/pnas.79.6.2145.  Google Scholar

[15]

T. Parthasarathy and T. E. S. Raghavan, An orderfield property for stochastic games when one player controls transition probabilities,, Journal of Optimization Theory and Applications, 33 (1981), 375.  doi: 10.1007/BF00935250.  Google Scholar

[16]

T. E. S. Raghavan, S. H. Tijs and O. J. Vrieze, On stochastic games with additive reward and transition structure,, Journal of Optimization Theory and Applications, 47 (1985), 451.  doi: 10.1007/BF00942191.  Google Scholar

[17]

T. E. S. Raghavan and Z. Syed, A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information,, Mathematical Programming, 95 (2003), 513.  doi: 10.1007/s10107-002-0312-3.  Google Scholar

[18]

L. S. Shapley, Stochastic games,, Proceedings of the National Academy of Sciences of the United States of America, 39 (1953), 1095.  doi: 10.1073/pnas.39.10.1095.  Google Scholar

[19]

Z. U. Syed, Algorithms for Stochastic Games and Related Topics,, Ph.D thesis, (1999).   Google Scholar

[20]

A. F. Veinott, On finding optimal policies in discrete dynamic programming with no discounting,, The Annals of Mathematical Statistics, 37 (1966), 1284.  doi: 10.1214/aoms/1177699272.  Google Scholar

[1]

Guangjun Shen, Xueying Wu, Xiuwei Yin. Stabilization of stochastic differential equations driven by G-Lévy process with discrete-time feedback control. Discrete & Continuous Dynamical Systems - B, 2021, 26 (2) : 755-774. doi: 10.3934/dcdsb.2020133

[2]

Xin Zhao, Tao Feng, Liang Wang, Zhipeng Qiu. Threshold dynamics and sensitivity analysis of a stochastic semi-Markov switched SIRS epidemic model with nonlinear incidence and vaccination. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2021010

[3]

Yueyang Zheng, Jingtao Shi. A stackelberg game of backward stochastic differential equations with partial information. Mathematical Control & Related Fields, 2020  doi: 10.3934/mcrf.2020047

[4]

Mehdi Bastani, Davod Khojasteh Salkuyeh. On the GSOR iteration method for image restoration. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 27-43. doi: 10.3934/naco.2020013

[5]

Yiling Chen, Baojun Bian. Optimal dividend policy in an insurance company with contagious arrivals of claims. Mathematical Control & Related Fields, 2021, 11 (1) : 1-22. doi: 10.3934/mcrf.2020024

[6]

Laura Aquilanti, Simone Cacace, Fabio Camilli, Raul De Maio. A Mean Field Games model for finite mixtures of Bernoulli and categorical distributions. Journal of Dynamics & Games, 2020  doi: 10.3934/jdg.2020033

[7]

Maicon Sônego. Stable transition layers in an unbalanced bistable equation. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2020370

[8]

Angelica Pachon, Federico Polito, Costantino Ricciuti. On discrete-time semi-Markov processes. Discrete & Continuous Dynamical Systems - B, 2021, 26 (3) : 1499-1529. doi: 10.3934/dcdsb.2020170

[9]

Wai-Ki Ching, Jia-Wen Gu, Harry Zheng. On correlated defaults and incomplete information. Journal of Industrial & Management Optimization, 2021, 17 (2) : 889-908. doi: 10.3934/jimo.2020003

[10]

Nicolas Rougerie. On two properties of the Fisher information. Kinetic & Related Models, 2021, 14 (1) : 77-88. doi: 10.3934/krm.2020049

[11]

Parikshit Upadhyaya, Elias Jarlebring, Emanuel H. Rubensson. A density matrix approach to the convergence of the self-consistent field iteration. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 99-115. doi: 10.3934/naco.2020018

[12]

Tian Ma, Shouhong Wang. Topological phase transition III: Solar surface eruptions and sunspots. Discrete & Continuous Dynamical Systems - B, 2021, 26 (1) : 501-514. doi: 10.3934/dcdsb.2020350

[13]

Zsolt Saffer, Miklós Telek, Gábor Horváth. Analysis of Markov-modulated fluid polling systems with gated discipline. Journal of Industrial & Management Optimization, 2021, 17 (2) : 575-599. doi: 10.3934/jimo.2019124

[14]

Shengxin Zhu, Tongxiang Gu, Xingping Liu. AIMS: Average information matrix splitting. Mathematical Foundations of Computing, 2020, 3 (4) : 301-308. doi: 10.3934/mfc.2020012

[15]

Zonghong Cao, Jie Min. Selection and impact of decision mode of encroachment and retail service in a dual-channel supply chain. Journal of Industrial & Management Optimization, 2020  doi: 10.3934/jimo.2020167

[16]

Honglin Yang, Jiawu Peng. Coordinating a supply chain with demand information updating. Journal of Industrial & Management Optimization, 2020  doi: 10.3934/jimo.2020181

[17]

Chao Xing, Jiaojiao Pan, Hong Luo. Stability and dynamic transition of a toxin-producing phytoplankton-zooplankton model with additional food. Communications on Pure & Applied Analysis, 2021, 20 (1) : 427-448. doi: 10.3934/cpaa.2020275

[18]

Xiao-Xu Chen, Peng Xu, Jiao-Jiao Li, Thomas Walker, Guo-Qiang Yang. Decision-making in a retailer-led closed-loop supply chain involving a third-party logistics provider. Journal of Industrial & Management Optimization, 2020  doi: 10.3934/jimo.2021014

[19]

Chuan Ding, Da-Hai Li. Angel capitalists exit decisions under information asymmetry: IPO or acquisitions. Journal of Industrial & Management Optimization, 2021, 17 (1) : 369-392. doi: 10.3934/jimo.2019116

[20]

Hui Gao, Jian Lv, Xiaoliang Wang, Liping Pang. An alternating linearization bundle method for a class of nonconvex optimization problem with inexact information. Journal of Industrial & Management Optimization, 2021, 17 (2) : 805-825. doi: 10.3934/jimo.2019135

 Impact Factor: 

Metrics

  • PDF downloads (89)
  • HTML views (0)
  • Cited by (1)

Other articles
by authors

[Back to Top]