
Policy improvement for perfect information additive reward and additive transition stochastic games with discounted and average payoffs

Abstract
We give a policy improvement algorithm for additive reward, additive transition (ARAT) zero-sum two-player stochastic games, for both discounted and average payoffs. The class of ARAT games includes perfect-information games.

Mathematics Subject Classification: Primary: 91A15, 91A05; Secondary: 90C40.


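The abstract does not spell out the algorithm itself, but the general flavor of policy improvement for perfect-information discounted games can be sketched as a Hoffman–Karp-style strategy iteration: fix the minimizer's stationary policy, evaluate it against a best-responding maximizer, then improve it greedily until no improvement exists. This is a standard variant for illustration only, not necessarily the authors' ARAT algorithm; all function names and the toy data are assumptions.

```python
import numpy as np

def best_response_value(sigma, R, P, beta, is_max, tol=1e-12):
    """Value vector when the minimizer's stationary policy `sigma`
    is fixed and the maximizer plays a best response.  The resulting
    one-player problem is an ordinary discounted MDP, solved here by
    value iteration for brevity."""
    n = R.shape[0]
    v = np.zeros(n)
    while True:
        q = R + beta * (P @ v)                    # q[s, a]
        w = np.array([q[s].max() if is_max[s] else q[s, sigma[s]]
                      for s in range(n)])
        if np.max(np.abs(w - v)) < tol:
            return w
        v = w

def strategy_iteration(R, P, beta, is_max):
    """Policy-improvement loop for a perfect-information discounted
    game: R[s, a] is the reward, P[s, a, s'] the transition law,
    is_max[s] marks states controlled by the maximizer.  Each round
    evaluates the minimizer's current policy against a best-responding
    maximizer, then improves it where a strictly better action exists."""
    n = R.shape[0]
    sigma = np.zeros(n, dtype=int)                # arbitrary initial policy
    while True:
        v = best_response_value(sigma, R, P, beta, is_max)
        q = R + beta * (P @ v)
        changed = False
        for s in range(n):
            a = int(q[s].argmin())
            if not is_max[s] and q[s, a] < q[s, sigma[s]] - 1e-9:
                sigma[s] = a                      # strict improvement only
                changed = True
        if not changed:
            return v, sigma                       # no improvement: optimal
```

For discounted payoffs this loop terminates: the minimizer's value decreases strictly at each improvement, and there are finitely many stationary policies. Extending such schemes to average payoffs (and to the full ARAT class) is precisely the kind of refinement the paper addresses.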
  • [1]

    M. Akian, J. Cochet-Terrasson, S. Detournay and S. Gaubert, Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information, preprint arXiv:1208.0446


    K. Avrachenkov, L. Cottatellucci and L. Maggi, Algorithms for uniform optimal strategies in two-player zero-sum stochastic games with perfect information, Operations Research Letters, 40 (2012), 56-60.doi: 10.1016/j.orl.2011.10.005.


    D. Blackwell, Discrete dynamic programming, The Annals of Mathematical Statistics, 33 (1962), 719-726.doi: 10.1214/aoms/1177704593.


    J. Cochet-Terrasson and S. Gaubert, A policy iteration algorithm for zero-sum stochastic games with mean payoff, Comptes Rendus Mathematique, 343 (2006), 377-382.doi: 10.1016/j.crma.2006.07.011.


    J. Cochet-Terrasson, G. Cohen, S. Gaubert, M. Mc Gettrick and J.-p. Quadrat, Numerical computation of spectral elements in max-plus algebra, in Proceedings SSC'98 (IFAC Conference on System Structure and Control), Pergamon, Nantes, France, (1998), 667-674.


    J. A. Filar and B. Tolwinski, On the algorithm of Pollatschek and Avi-ltzhak, in Stochastic Games And Related Topics (eds. T. E. S. Raghavan, T. S. Ferguson, T. Parthasarathy, O. J. Vrieze, W. Leinfellner, G. Eberlein and S. H. Tijs), Theory and Decision Library, 7, Springer, Netherlands, 1991, 59-70.doi: 10.1007/978-94-011-3760-7_6.


    J, A. Filar and K, Vrieze, Competitive Markov Decision Processes, Springer, 1997.


    B. L. Fox and D. M. Landi, An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix, Commun. ACM, 11 (1968), 619-621.


    A. Hordijk, R. Dekker and L. C. M. Kallenberg, Sensitivity-analysis in discounted Markovian decision problems, OR Spektrum, 7 (1985), 143-151.doi: 10.1007/BF01721353.


    R. A. Howard, Dynamic Programming and Markov Process, The Technology Press of M.I.T., Cambridge, Mass.; John Wiley & Sons, Inc., New York-London, 1960.


    R. G. Jeroslow, Asymptotic linear programming, Operations Research, 21 (1973), 1128-1141.doi: 10.1287/opre.21.5.1128.


    J. G. Kemeny and J. L. Snell, Finite Markov Chains, The University Series in Undergraduate Mathematics, D. Van Nostrand Co., Inc., Princeton, N.J.-Toronto-London-New York, 1960.


    E. Kohlberg, Invariant half-lines of nonexpansive piecewise-linear transformations, Mathematics of Operations Research, 5 (1980), 366-372.doi: 10.1287/moor.5.3.366.


    J.-F. Mertens and A. Neyman, Stochastic games have a value, Proceedings of the National Academy of Sciences of the United States of America, 79 (1982), 2145-2146.doi: 10.1073/pnas.79.6.2145.


    T. Parthasarathy and T. E. S. Raghavan, An orderfield property for stochastic games when one player controls transition probabilities, Journal of Optimization Theory and Applications, 33 (1981), 375-392.doi: 10.1007/BF00935250.


    T. E. S. Raghavan, S. H. Tijs and O. J. Vrieze, On stochastic games with additive reward and transition structure, Journal of Optimization Theory and Applications, 47 (1985), 451-464.doi: 10.1007/BF00942191.


    T. E. S. Raghavan and Z. Syed, A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information, Mathematical Programming, 95 (2003), 513-532.doi: 10.1007/s10107-002-0312-3.


    L. S. Shapley, Stochastic games, Proceedings of the National Academy of Sciences of the United States of America, 39 (1953), 1095-1100.doi: 10.1073/pnas.39.10.1095.


    Z. U. Syed, Algorithms for Stochastic Games and Related Topics, Ph.D thesis, University of Illinois at Chicago, Chicago, IL, USA, 1999.


    A. F. Veinott, On finding optimal policies in discrete dynamic programming with no discounting, The Annals of Mathematical Statistics, 37 (1966), 1284-1294.doi: 10.1214/aoms/1177699272.
