2017, 4(4): 369-383. doi: 10.3934/jdg.2017020

A probability criterion for zero-sum stochastic games

1. 

School of Computer Science and Network Security, Dongguan University of Technology, Dongguan, 523808, China

a. 

School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China

b. 

Sun Yat-sen Business School, Sun Yat-Sen University, Guangzhou, 510275, China

* Corresponding author: Xianping Guo

Received  April 2017 Revised  July 2017 Published  October 2017

This paper introduces a probability criterion for two-person zero-sum stochastic games, and focuses on the probability that the payoff before the first passage time to some target state set exceeds a level formulated by both players, which shows the security for player 1 and the risk for player 2. For the game model based on discrete-time Markov chains, under a suitable condition on the game's primitive data, we establish the Shapley equation, from which the existences of the value of the game and a pair of optimal policies are ensured. We also provide a recursive way of computing (or at least approximating) the value of the game. At last, the application of our main result is exhibited via an inventory system.

Citation: Xiangxiang Huang, Xianping Guo, Jianping Peng. A probability criterion for zero-sum stochastic games. Journal of Dynamics & Games, 2017, 4 (4) : 369-383. doi: 10.3934/jdg.2017020
References:
[1]

K. Fan, Minimax theorems, Proc. Nat. Acad. Sci., 39 (1953), 42-47. doi: 10.1073/pnas.39.1.42.

[2]

E. A. Feinberg and J. Fei, An inequality for variances of the discounted rewards, J. Appl. Probab., 46 (2009), 1209-1212. doi: 10.1017/S0021900200006240.

[3]

X. P. Guo and O. Hernández-Lerma, Zero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates, J. Appl. Probab., 40 (2003), 327-345. doi: 10.1017/S0021900200019331.

[4]

X.P. Guo and O. Hernández-Lerma, Nonzero-sum games for continuous-time Markov chains with unbounded discounted payoffs, J. Appl. Probab., 42 (2005), 303-320. doi: 10.1017/S002190020000036X.

[5]

X. P. Guo and O. Hernández-Lerma, Zero-sum games for continuous-time jump Markov processes in Polish spaces: discounted payoffs, Adv. in Appl. Probab., 39 (2007), 645-668. doi: 10.1017/S0001867800001981.

[6]

X. P. Guo and O. Hernández-Lerma, Continuous-Time Markov Decision Processes: Theory and Applications, Springer-Verlag, Berlin, 2009.

[7]

X. P. GuoM. Vykertas and Y. Zhang, Absorbing continuous-time Markov decision processes with total cost criteria, Adv. in Appl. Probab., 45 (2013), 490-519. doi: 10.1017/S0001867800006418.

[8]

O. Hernández-Lerma and J. B. Lasserre, Zero-sum stochastic games in Borel spaces: average payoff criterion, SIAM J. Control Optim., 39 (2000), 1520-1539. doi: 10.1137/S0363012999361962.

[9]

O. Hernández-Lerma and J. B. Lasserre, Discrete-time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, New York, 1996.

[10]

O. Hernández-Lerma and J. B. Lasserre, Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York, 1999.

[11]

Y. H. HuangX. P. Guo and X. Y. Song, Performance analysis for controlled semi-Markov systems with application to maintenance, J. Optim. Theory Appl., 150 (2011), 395-415. doi: 10.1007/s10957-011-9813-7.

[12]

Y. H. HuangX. P. Guo and Z. F. Li, Minimum risk probability for finite horizon semi-Markov decision processes, J. Math. Anal. Appl., 402 (2013), 378-391. doi: 10.1016/j.jmaa.2013.01.021.

[13]

A. Jaśkiewicz and A. S. Nowak, Zero-sum ergodic stochastic games with Feller transition probabilities, SIAM J. Control Optim., 45 (2006), 773-789. doi: 10.1137/S0363012904443257.

[14]

A. Jaśkiewicz, Zero-sum ergodic semi-Markov games with weakly continuous transition probabilities, J. Optim. Theory Appl., 141 (2009), 321-347. doi: 10.1007/s10957-008-9491-2.

[15]

A. Jaśkiewicz and A. S. Nowak, Non-Zero-Sum Stochastic Games, In: Basar T, Zaccour G (eds.) Handbook of Dynamic Games, 2016.

[16]

A. S. Nowak, Optimal strategies in a class of zero-sum ergodic stochastic games, Math. Methods Oper. Res., 50 (1999), 399-419. doi: 10.1007/s001860050078.

[17]

A. S. Nowak, Measurable selection theorems for minimax stochastic optimization problems, SIAM J.Control Optim., 23 (1985), 466-476. doi: 10.1137/0323030.

[18]

Y. Ohtsubo, Minimizing risk models in stochastic shortest path problems, Math. Methods Oper. Res., 57 (2003), 79-88. doi: 10.1007/s001860200246.

[19]

Y. Ohtsubo, Optimal threshold probability in undiscounted Markov decision processes with a target set, Appl. Math. Comput., 149 (2004), 519-532. doi: 10.1016/S0096-3003(03)00158-9.

[20]

T. Parthasarathy and S. Sinha, Existence of equilibrium stationary strategies in nonzero-sum discounted stochastic games with uncountable state space and state independent transitions, Internat. J. Game Theory, 18 (1989), 189-194. doi: 10.1007/BF01268158.

[21]

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, Inc., New York, 1994.

[22]

S. Saha, Zero-sum stochastic games with partial information and average payoff, J. Optim. Theory Appl., 160 (2014), 344-354. doi: 10.1007/s10957-013-0359-8.

[23]

M. Sakaguchi and Y. Ohtsubo, Optimal threshold probability and expectation in semi-Markov decision processes, Appl. Math. Comput., 216 (2010), 2947-2958. doi: 10.1016/j.amc.2010.04.007.

[24]

L. I. Sennott, Nonzero-sum stochastic games with unbounded costs: discounted and average cost cases, Z. Oper. Res., 40 (1994), 145-162. doi: 10.1007/BF01432807.

[25]

O. Vega-Amaya, Zero-sum average semi-Markov games: fixed-point solutions of the Shapley equation, SIAM J. Control Optim., 42 (2003), 1876-1894. doi: 10.1137/S0363012902408423.

show all references

References:
[1]

K. Fan, Minimax theorems, Proc. Nat. Acad. Sci., 39 (1953), 42-47. doi: 10.1073/pnas.39.1.42.

[2]

E. A. Feinberg and J. Fei, An inequality for variances of the discounted rewards, J. Appl. Probab., 46 (2009), 1209-1212. doi: 10.1017/S0021900200006240.

[3]

X. P. Guo and O. Hernández-Lerma, Zero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates, J. Appl. Probab., 40 (2003), 327-345. doi: 10.1017/S0021900200019331.

[4]

X.P. Guo and O. Hernández-Lerma, Nonzero-sum games for continuous-time Markov chains with unbounded discounted payoffs, J. Appl. Probab., 42 (2005), 303-320. doi: 10.1017/S002190020000036X.

[5]

X. P. Guo and O. Hernández-Lerma, Zero-sum games for continuous-time jump Markov processes in Polish spaces: discounted payoffs, Adv. in Appl. Probab., 39 (2007), 645-668. doi: 10.1017/S0001867800001981.

[6]

X. P. Guo and O. Hernández-Lerma, Continuous-Time Markov Decision Processes: Theory and Applications, Springer-Verlag, Berlin, 2009.

[7]

X. P. GuoM. Vykertas and Y. Zhang, Absorbing continuous-time Markov decision processes with total cost criteria, Adv. in Appl. Probab., 45 (2013), 490-519. doi: 10.1017/S0001867800006418.

[8]

O. Hernández-Lerma and J. B. Lasserre, Zero-sum stochastic games in Borel spaces: average payoff criterion, SIAM J. Control Optim., 39 (2000), 1520-1539. doi: 10.1137/S0363012999361962.

[9]

O. Hernández-Lerma and J. B. Lasserre, Discrete-time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, New York, 1996.

[10]

O. Hernández-Lerma and J. B. Lasserre, Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York, 1999.

[11]

Y. H. HuangX. P. Guo and X. Y. Song, Performance analysis for controlled semi-Markov systems with application to maintenance, J. Optim. Theory Appl., 150 (2011), 395-415. doi: 10.1007/s10957-011-9813-7.

[12]

Y. H. HuangX. P. Guo and Z. F. Li, Minimum risk probability for finite horizon semi-Markov decision processes, J. Math. Anal. Appl., 402 (2013), 378-391. doi: 10.1016/j.jmaa.2013.01.021.

[13]

A. Jaśkiewicz and A. S. Nowak, Zero-sum ergodic stochastic games with Feller transition probabilities, SIAM J. Control Optim., 45 (2006), 773-789. doi: 10.1137/S0363012904443257.

[14]

A. Jaśkiewicz, Zero-sum ergodic semi-Markov games with weakly continuous transition probabilities, J. Optim. Theory Appl., 141 (2009), 321-347. doi: 10.1007/s10957-008-9491-2.

[15]

A. Jaśkiewicz and A. S. Nowak, Non-Zero-Sum Stochastic Games, In: Basar T, Zaccour G (eds.) Handbook of Dynamic Games, 2016.

[16]

A. S. Nowak, Optimal strategies in a class of zero-sum ergodic stochastic games, Math. Methods Oper. Res., 50 (1999), 399-419. doi: 10.1007/s001860050078.

[17]

A. S. Nowak, Measurable selection theorems for minimax stochastic optimization problems, SIAM J.Control Optim., 23 (1985), 466-476. doi: 10.1137/0323030.

[18]

Y. Ohtsubo, Minimizing risk models in stochastic shortest path problems, Math. Methods Oper. Res., 57 (2003), 79-88. doi: 10.1007/s001860200246.

[19]

Y. Ohtsubo, Optimal threshold probability in undiscounted Markov decision processes with a target set, Appl. Math. Comput., 149 (2004), 519-532. doi: 10.1016/S0096-3003(03)00158-9.

[20]

T. Parthasarathy and S. Sinha, Existence of equilibrium stationary strategies in nonzero-sum discounted stochastic games with uncountable state space and state independent transitions, Internat. J. Game Theory, 18 (1989), 189-194. doi: 10.1007/BF01268158.

[21]

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, Inc., New York, 1994.

[22]

S. Saha, Zero-sum stochastic games with partial information and average payoff, J. Optim. Theory Appl., 160 (2014), 344-354. doi: 10.1007/s10957-013-0359-8.

[23]

M. Sakaguchi and Y. Ohtsubo, Optimal threshold probability and expectation in semi-Markov decision processes, Appl. Math. Comput., 216 (2010), 2947-2958. doi: 10.1016/j.amc.2010.04.007.

[24]

L. I. Sennott, Nonzero-sum stochastic games with unbounded costs: discounted and average cost cases, Z. Oper. Res., 40 (1994), 145-162. doi: 10.1007/BF01432807.

[25]

O. Vega-Amaya, Zero-sum average semi-Markov games: fixed-point solutions of the Shapley equation, SIAM J. Control Optim., 42 (2003), 1876-1894. doi: 10.1137/S0363012902408423.

[1]

Alexander J. Zaslavski. Turnpike properties of approximate solutions of dynamic discrete time zero-sum games. Journal of Dynamics & Games, 2014, 1 (2) : 299-330. doi: 10.3934/jdg.2014.1.299

[2]

Alexander J. Zaslavski. Structure of approximate solutions of dynamic continuous time zero-sum games. Journal of Dynamics & Games, 2014, 1 (1) : 153-179. doi: 10.3934/jdg.2014.1.153

[3]

Fernando Luque-Vásquez, J. Adolfo Minjárez-Sosa. Average optimal strategies for zero-sum Markov games with poorly known payoff function on one side. Journal of Dynamics & Games, 2014, 1 (1) : 105-119. doi: 10.3934/jdg.2014.1.105

[4]

Valery Y. Glizer, Oleg Kelis. Singular infinite horizon zero-sum linear-quadratic differential game: Saddle-point equilibrium sequence. Numerical Algebra, Control & Optimization, 2017, 7 (1) : 1-20. doi: 10.3934/naco.2017001

[5]

Marianne Akian, Stéphane Gaubert, Antoine Hochart. Ergodicity conditions for zero-sum games. Discrete & Continuous Dynamical Systems - A, 2015, 35 (9) : 3901-3931. doi: 10.3934/dcds.2015.35.3901

[6]

Sie Long Kek, Kok Lay Teo, Mohd Ismail Abd Aziz. Filtering solution of nonlinear stochastic optimal control problem in discrete-time with model-reality differences. Numerical Algebra, Control & Optimization, 2012, 2 (1) : 207-222. doi: 10.3934/naco.2012.2.207

[7]

Yuefen Chen, Yuanguo Zhu. Indefinite LQ optimal control with process state inequality constraints for discrete-time uncertain systems. Journal of Industrial & Management Optimization, 2017, 13 (4) : 1-18. doi: 10.3934/jimo.2017082

[8]

Sie Long Kek, Mohd Ismail Abd Aziz. Output regulation for discrete-time nonlinear stochastic optimal control problems with model-reality differences. Numerical Algebra, Control & Optimization, 2015, 5 (3) : 275-288. doi: 10.3934/naco.2015.5.275

[9]

Sie Long Kek, Mohd Ismail Abd Aziz, Kok Lay Teo, Rohanin Ahmad. An iterative algorithm based on model-reality differences for discrete-time nonlinear stochastic optimal control problems. Numerical Algebra, Control & Optimization, 2013, 3 (1) : 109-125. doi: 10.3934/naco.2013.3.109

[10]

Ka Chun Cheung, Hailiang Yang. Optimal investment-consumption strategy in a discrete-time model with regime switching. Discrete & Continuous Dynamical Systems - B, 2007, 8 (2) : 315-332. doi: 10.3934/dcdsb.2007.8.315

[11]

Agnieszka B. Malinowska, Tatiana Odzijewicz. Optimal control of the discrete-time fractional-order Cucker-Smale model. Discrete & Continuous Dynamical Systems - B, 2018, 23 (1) : 347-357. doi: 10.3934/dcdsb.2018023

[12]

Zhi-Wei Sun. Unification of zero-sum problems, subset sums and covers of Z. Electronic Research Announcements, 2003, 9: 51-60.

[13]

Beatris A. Escobedo-Trujillo. Discount-sensitive equilibria in zero-sum stochastic differential games. Journal of Dynamics & Games, 2016, 3 (1) : 25-50. doi: 10.3934/jdg.2016002

[14]

Sylvain Sorin, Guillaume Vigeral. Reversibility and oscillations in zero-sum discounted stochastic games. Journal of Dynamics & Games, 2015, 2 (1) : 103-115. doi: 10.3934/jdg.2015.2.103

[15]

Qiuying Li, Lifang Huang, Jianshe Yu. Modulation of first-passage time for bursty gene expression via random signals. Mathematical Biosciences & Engineering, 2017, 14 (5-6) : 1261-1277. doi: 10.3934/mbe.2017065

[16]

Piotr Oprocha. Chain recurrence in multidimensional time discrete dynamical systems. Discrete & Continuous Dynamical Systems - A, 2008, 20 (4) : 1039-1056. doi: 10.3934/dcds.2008.20.1039

[17]

Yayun Zheng, Xu Sun. Governing equations for Probability densities of stochastic differential equations with discrete time delays. Discrete & Continuous Dynamical Systems - B, 2017, 22 (9) : 3615-3628. doi: 10.3934/dcdsb.2017182

[18]

Bara Kim, Jeongsim Kim. Explicit solution for the stationary distribution of a discrete-time finite buffer queue. Journal of Industrial & Management Optimization, 2016, 12 (3) : 1121-1133. doi: 10.3934/jimo.2016.12.1121

[19]

Ciprian Preda. Discrete-time theorems for the dichotomy of one-parameter semigroups. Communications on Pure & Applied Analysis, 2008, 7 (2) : 457-463. doi: 10.3934/cpaa.2008.7.457

[20]

Lih-Ing W. Roeger. Dynamically consistent discrete-time SI and SIS epidemic models. Conference Publications, 2013, 2013 (special) : 653-662. doi: 10.3934/proc.2013.2013.653

 Impact Factor: 

Metrics

  • PDF downloads (14)
  • HTML views (81)
  • Cited by (0)

Other articles
by authors

[Back to Top]