This paper introduces a probability criterion for two-person zero-sum stochastic games. The criterion is the probability that the payoff accumulated before the first passage time to a given target state set exceeds a level set by both players; this probability measures the security for player 1 and the risk for player 2. For the game model based on discrete-time Markov chains, under a suitable condition on the game's primitive data, we establish the Shapley equation, from which the existence of the value of the game and of a pair of optimal policies follows. We also provide a recursive method for computing (or at least approximating) the value of the game. Finally, we illustrate the main result with an inventory system.