-
Previous Article
Pure and Random strategies in differential game with incomplete informations
- JDG Home
- This Issue
-
Next Article
Asymptotic behavior of compositions of under-relaxed nonexpansive operators
Policy improvement for perfect information additive reward and additive transition stochastic games with discounted and average payoffs
1. | Department of Mathematics and Statistics, Loyola University Chicago, 1032 W. Sheridan Road, Chicago, IL 60660, United States |
2. | Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, 322 Science and Engineering Offices (M/C 249), 851 S. Morgan Street, Chicago, IL 60607-7045, United States |
References:
[1] |
M. Akian, J. Cochet-Terrasson, S. Detournay and S. Gaubert, Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information, preprint, , (). Google Scholar |
[2] |
K. Avrachenkov, L. Cottatellucci and L. Maggi, Algorithms for uniform optimal strategies in two-player zero-sum stochastic games with perfect information,, Operations Research Letters, 40 (2012), 56.
doi: 10.1016/j.orl.2011.10.005. |
[3] |
D. Blackwell, Discrete dynamic programming,, The Annals of Mathematical Statistics, 33 (1962), 719.
doi: 10.1214/aoms/1177704593. |
[4] |
J. Cochet-Terrasson and S. Gaubert, A policy iteration algorithm for zero-sum stochastic games with mean payoff,, Comptes Rendus Mathematique, 343 (2006), 377.
doi: 10.1016/j.crma.2006.07.011. |
[5] |
J. Cochet-Terrasson, G. Cohen, S. Gaubert, M. Mc Gettrick and J.-p. Quadrat, Numerical computation of spectral elements in max-plus algebra,, in Proceedings SSC'98 (IFAC Conference on System Structure and Control), (1998), 667. Google Scholar |
[6] |
J. A. Filar and B. Tolwinski, On the algorithm of Pollatschek and Avi-ltzhak,, in Stochastic Games And Related Topics (eds. T. E. S. Raghavan, (1991), 59.
doi: 10.1007/978-94-011-3760-7_6. |
[7] |
J, A. Filar and K, Vrieze, Competitive Markov Decision Processes,, Springer, (1997).
|
[8] |
B. L. Fox and D. M. Landi, An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix,, Commun. ACM, 11 (1968), 619. Google Scholar |
[9] |
A. Hordijk, R. Dekker and L. C. M. Kallenberg, Sensitivity-analysis in discounted Markovian decision problems,, OR Spektrum, 7 (1985), 143.
doi: 10.1007/BF01721353. |
[10] |
R. A. Howard, Dynamic Programming and Markov Process,, The Technology Press of M.I.T., (1960).
|
[11] |
R. G. Jeroslow, Asymptotic linear programming,, Operations Research, 21 (1973), 1128.
doi: 10.1287/opre.21.5.1128. |
[12] |
J. G. Kemeny and J. L. Snell, Finite Markov Chains,, The University Series in Undergraduate Mathematics, (1960).
|
[13] |
E. Kohlberg, Invariant half-lines of nonexpansive piecewise-linear transformations,, Mathematics of Operations Research, 5 (1980), 366.
doi: 10.1287/moor.5.3.366. |
[14] |
J.-F. Mertens and A. Neyman, Stochastic games have a value,, Proceedings of the National Academy of Sciences of the United States of America, 79 (1982), 2145.
doi: 10.1073/pnas.79.6.2145. |
[15] |
T. Parthasarathy and T. E. S. Raghavan, An orderfield property for stochastic games when one player controls transition probabilities,, Journal of Optimization Theory and Applications, 33 (1981), 375.
doi: 10.1007/BF00935250. |
[16] |
T. E. S. Raghavan, S. H. Tijs and O. J. Vrieze, On stochastic games with additive reward and transition structure,, Journal of Optimization Theory and Applications, 47 (1985), 451.
doi: 10.1007/BF00942191. |
[17] |
T. E. S. Raghavan and Z. Syed, A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information,, Mathematical Programming, 95 (2003), 513.
doi: 10.1007/s10107-002-0312-3. |
[18] |
L. S. Shapley, Stochastic games,, Proceedings of the National Academy of Sciences of the United States of America, 39 (1953), 1095.
doi: 10.1073/pnas.39.10.1095. |
[19] |
Z. U. Syed, Algorithms for Stochastic Games and Related Topics,, Ph.D thesis, (1999).
|
[20] |
A. F. Veinott, On finding optimal policies in discrete dynamic programming with no discounting,, The Annals of Mathematical Statistics, 37 (1966), 1284.
doi: 10.1214/aoms/1177699272. |
show all references
References:
[1] |
M. Akian, J. Cochet-Terrasson, S. Detournay and S. Gaubert, Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information, preprint, , (). Google Scholar |
[2] |
K. Avrachenkov, L. Cottatellucci and L. Maggi, Algorithms for uniform optimal strategies in two-player zero-sum stochastic games with perfect information,, Operations Research Letters, 40 (2012), 56.
doi: 10.1016/j.orl.2011.10.005. |
[3] |
D. Blackwell, Discrete dynamic programming,, The Annals of Mathematical Statistics, 33 (1962), 719.
doi: 10.1214/aoms/1177704593. |
[4] |
J. Cochet-Terrasson and S. Gaubert, A policy iteration algorithm for zero-sum stochastic games with mean payoff,, Comptes Rendus Mathematique, 343 (2006), 377.
doi: 10.1016/j.crma.2006.07.011. |
[5] |
J. Cochet-Terrasson, G. Cohen, S. Gaubert, M. Mc Gettrick and J.-p. Quadrat, Numerical computation of spectral elements in max-plus algebra,, in Proceedings SSC'98 (IFAC Conference on System Structure and Control), (1998), 667. Google Scholar |
[6] |
J. A. Filar and B. Tolwinski, On the algorithm of Pollatschek and Avi-ltzhak,, in Stochastic Games And Related Topics (eds. T. E. S. Raghavan, (1991), 59.
doi: 10.1007/978-94-011-3760-7_6. |
[7] |
J, A. Filar and K, Vrieze, Competitive Markov Decision Processes,, Springer, (1997).
|
[8] |
B. L. Fox and D. M. Landi, An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix,, Commun. ACM, 11 (1968), 619. Google Scholar |
[9] |
A. Hordijk, R. Dekker and L. C. M. Kallenberg, Sensitivity-analysis in discounted Markovian decision problems,, OR Spektrum, 7 (1985), 143.
doi: 10.1007/BF01721353. |
[10] |
R. A. Howard, Dynamic Programming and Markov Process,, The Technology Press of M.I.T., (1960).
|
[11] |
R. G. Jeroslow, Asymptotic linear programming,, Operations Research, 21 (1973), 1128.
doi: 10.1287/opre.21.5.1128. |
[12] |
J. G. Kemeny and J. L. Snell, Finite Markov Chains,, The University Series in Undergraduate Mathematics, (1960).
|
[13] |
E. Kohlberg, Invariant half-lines of nonexpansive piecewise-linear transformations,, Mathematics of Operations Research, 5 (1980), 366.
doi: 10.1287/moor.5.3.366. |
[14] |
J.-F. Mertens and A. Neyman, Stochastic games have a value,, Proceedings of the National Academy of Sciences of the United States of America, 79 (1982), 2145.
doi: 10.1073/pnas.79.6.2145. |
[15] |
T. Parthasarathy and T. E. S. Raghavan, An orderfield property for stochastic games when one player controls transition probabilities,, Journal of Optimization Theory and Applications, 33 (1981), 375.
doi: 10.1007/BF00935250. |
[16] |
T. E. S. Raghavan, S. H. Tijs and O. J. Vrieze, On stochastic games with additive reward and transition structure,, Journal of Optimization Theory and Applications, 47 (1985), 451.
doi: 10.1007/BF00942191. |
[17] |
T. E. S. Raghavan and Z. Syed, A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information,, Mathematical Programming, 95 (2003), 513.
doi: 10.1007/s10107-002-0312-3. |
[18] |
L. S. Shapley, Stochastic games,, Proceedings of the National Academy of Sciences of the United States of America, 39 (1953), 1095.
doi: 10.1073/pnas.39.10.1095. |
[19] |
Z. U. Syed, Algorithms for Stochastic Games and Related Topics,, Ph.D thesis, (1999).
|
[20] |
A. F. Veinott, On finding optimal policies in discrete dynamic programming with no discounting,, The Annals of Mathematical Statistics, 37 (1966), 1284.
doi: 10.1214/aoms/1177699272. |
[1] |
Guangjun Shen, Xueying Wu, Xiuwei Yin. Stabilization of stochastic differential equations driven by G-Lévy process with discrete-time feedback control. Discrete & Continuous Dynamical Systems - B, 2021, 26 (2) : 755-774. doi: 10.3934/dcdsb.2020133 |
[2] |
Xin Zhao, Tao Feng, Liang Wang, Zhipeng Qiu. Threshold dynamics and sensitivity analysis of a stochastic semi-Markov switched SIRS epidemic model with nonlinear incidence and vaccination. Discrete & Continuous Dynamical Systems - B, 2020 doi: 10.3934/dcdsb.2021010 |
[3] |
Yueyang Zheng, Jingtao Shi. A stackelberg game of backward stochastic differential equations with partial information. Mathematical Control & Related Fields, 2020 doi: 10.3934/mcrf.2020047 |
[4] |
Mehdi Bastani, Davod Khojasteh Salkuyeh. On the GSOR iteration method for image restoration. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 27-43. doi: 10.3934/naco.2020013 |
[5] |
Yiling Chen, Baojun Bian. Optimal dividend policy in an insurance company with contagious arrivals of claims. Mathematical Control & Related Fields, 2021, 11 (1) : 1-22. doi: 10.3934/mcrf.2020024 |
[6] |
Laura Aquilanti, Simone Cacace, Fabio Camilli, Raul De Maio. A Mean Field Games model for finite mixtures of Bernoulli and categorical distributions. Journal of Dynamics & Games, 2020 doi: 10.3934/jdg.2020033 |
[7] |
Maicon Sônego. Stable transition layers in an unbalanced bistable equation. Discrete & Continuous Dynamical Systems - B, 2020 doi: 10.3934/dcdsb.2020370 |
[8] |
Angelica Pachon, Federico Polito, Costantino Ricciuti. On discrete-time semi-Markov processes. Discrete & Continuous Dynamical Systems - B, 2021, 26 (3) : 1499-1529. doi: 10.3934/dcdsb.2020170 |
[9] |
Wai-Ki Ching, Jia-Wen Gu, Harry Zheng. On correlated defaults and incomplete information. Journal of Industrial & Management Optimization, 2021, 17 (2) : 889-908. doi: 10.3934/jimo.2020003 |
[10] |
Nicolas Rougerie. On two properties of the Fisher information. Kinetic & Related Models, 2021, 14 (1) : 77-88. doi: 10.3934/krm.2020049 |
[11] |
Parikshit Upadhyaya, Elias Jarlebring, Emanuel H. Rubensson. A density matrix approach to the convergence of the self-consistent field iteration. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 99-115. doi: 10.3934/naco.2020018 |
[12] |
Tian Ma, Shouhong Wang. Topological phase transition III: Solar surface eruptions and sunspots. Discrete & Continuous Dynamical Systems - B, 2021, 26 (1) : 501-514. doi: 10.3934/dcdsb.2020350 |
[13] |
Zsolt Saffer, Miklós Telek, Gábor Horváth. Analysis of Markov-modulated fluid polling systems with gated discipline. Journal of Industrial & Management Optimization, 2021, 17 (2) : 575-599. doi: 10.3934/jimo.2019124 |
[14] |
Shengxin Zhu, Tongxiang Gu, Xingping Liu. AIMS: Average information matrix splitting. Mathematical Foundations of Computing, 2020, 3 (4) : 301-308. doi: 10.3934/mfc.2020012 |
[15] |
Zonghong Cao, Jie Min. Selection and impact of decision mode of encroachment and retail service in a dual-channel supply chain. Journal of Industrial & Management Optimization, 2020 doi: 10.3934/jimo.2020167 |
[16] |
Honglin Yang, Jiawu Peng. Coordinating a supply chain with demand information updating. Journal of Industrial & Management Optimization, 2020 doi: 10.3934/jimo.2020181 |
[17] |
Chao Xing, Jiaojiao Pan, Hong Luo. Stability and dynamic transition of a toxin-producing phytoplankton-zooplankton model with additional food. Communications on Pure & Applied Analysis, 2021, 20 (1) : 427-448. doi: 10.3934/cpaa.2020275 |
[18] |
Xiao-Xu Chen, Peng Xu, Jiao-Jiao Li, Thomas Walker, Guo-Qiang Yang. Decision-making in a retailer-led closed-loop supply chain involving a third-party logistics provider. Journal of Industrial & Management Optimization, 2020 doi: 10.3934/jimo.2021014 |
[19] |
Chuan Ding, Da-Hai Li. Angel capitalists exit decisions under information asymmetry: IPO or acquisitions. Journal of Industrial & Management Optimization, 2021, 17 (1) : 369-392. doi: 10.3934/jimo.2019116 |
[20] |
Hui Gao, Jian Lv, Xiaoliang Wang, Liping Pang. An alternating linearization bundle method for a class of nonconvex optimization problem with inexact information. Journal of Industrial & Management Optimization, 2021, 17 (2) : 805-825. doi: 10.3934/jimo.2019135 |
Impact Factor:
Tools
Metrics
Other articles
by authors
[Back to Top]