2014, 1(4): 639-659. doi: 10.3934/jdg.2014.1.639

A limit theorem for Markov decision processes

1. 

Center for Mathematical Economics, University Bielefeld, Postfach 100131, 33501 Bielefeld, Germany

Received  January 2014 Revised  June 2014 Published  November 2014

In this paper I prove a deterministic approximation theorem for a sequence of Markov decision processes with finitely many actions and general state spaces as they appear frequently in economics, game theory and operations research. Using viscosity solution methods no a-priori differentiabililty assumptions are imposed on the value function.
Citation: Mathias Staudigl. A limit theorem for Markov decision processes. Journal of Dynamics & Games, 2014, 1 (4) : 639-659. doi: 10.3934/jdg.2014.1.639
References:
[1]

R. J. Aumann and M. Maschler, Repeated Games with Incomplete Information,, MIT Press, (1995).

[2]

M. Bardi and I. Capuzzo-Dolcetta, Optimal control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations,, Birkhäuser - Systems & Control: Foundations & Applications, (1997). doi: 10.1007/978-0-8176-4755-1.

[3]

D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case,, Academic Press, (1978).

[4]

P. Billingsley, Convergence of Probability Measures,, Wiley Series in Probability and Statistics, (1999). doi: 10.1002/9780470316962.

[5]

I. Capuzzo-Dolcetta and H. Ishii, Approximate solutions of the Bellman equation of deterministic control theory,, Applied Mathematics & Optimization, 11 (1984), 161. doi: 10.1007/BF01442176.

[6]

P. Cardaliaguet, C. Rainer, D. Rosenberg and N. Vieille, Markov games with frequent actions and incomplete information,, 2013. arXiv:1307.3365v1 [math.OC]., ().

[7]

N. El Karoui, D. Nguyen and M. Jeanblanc-Picqué, Compactification methods in the control of degenerate diffusions: Existence of an optimal control,, Stochastics, 20 (1987), 169. doi: 10.1080/17442508708833443.

[8]

S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence,, Wiley, (1986). doi: 10.1002/9780470316658.

[9]

M. Falcone, A numerical approach to the infinite horizon problem of deterministic control theory,, Applied Mathematics & Optimization, 15 (1987), 1. doi: 10.1007/BF01442644.

[10]

W. H. Fleming, The convergence problem for differential games,, Journal of Mathematical Analysis and Applications, 3 (1961), 102. doi: 10.1016/0022-247X(61)90009-9.

[11]

N. Gast, B. Gaujal and J.-Y. Le Boudec, Mean field for markov decision processes: From discrete to continuous optimization,, IEEE Transactions on Automatic Control, 57 (2012), 2266. doi: 10.1109/TAC.2012.2186176.

[12]

F. Gensbittel, Continuous-time Limit of Dynamic Games with Incomplete Information and a More Informed Player,, hal-00910970, (2013).

[13]

R. Gonzalez and E. Rofman, On deterministic control problems: An approximation procedure for the optimal cost I. the stationary problem,, SIAM Journal on Control and Optimization, 23 (1985), 242. doi: 10.1137/0323018.

[14]

X. Guo and O. Hernández-Lerma, Nonzero-sum games for continuous-time markov chains with unbounded discounted payoffs,, Journal of Applied Probability, 42 (2005), 303. doi: 10.1239/jap/1118777172.

[15]

O. Hernández-Lerma and J. B. Laserre, Discrete-Time Markov Control Processes: Basic Optimality criteria,, Springer-Verlag, (1996). doi: 10.1007/978-1-4612-0729-0.

[16]

J. Hörner, T. Sugaya, S. Takahashi and N. Vieille, Recursive methods in discounted stochastic games: An algorithm for $\delta\rightarrow 1$ and a folk theorem,, Econometrica, 79 (2011), 1277. doi: 10.3982/ECTA9004.

[17]

O. Kallenberg, Foundations of Modern Probability,, Springer, (2002). doi: 10.1007/978-1-4757-4015-8.

[18]

I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus,, Springer-Verlag, (2000).

[19]

H. J. Kushner, Weak convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems,, Birkhäuser - Systems & Control: Foundations & Applications, (1990). doi: 10.1007/978-1-4612-4482-0.

[20]

H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time,, Springer, (2001). doi: 10.1007/978-1-4613-0007-6.

[21]

A. Neyman, Stochastic games with short-stage duration,, Dynamic Games and Applications, 3 (2013), 236. doi: 10.1007/s13235-013-0083-x.

[22]

A. S. Nowak and T. E. S. Raghavan, Existence of stationary correlated equilibria with symmetric information for discounted stochastic games,, Mathematics of Operations Research, 17 (1992), 519. doi: 10.1287/moor.17.3.519.

[23]

W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, I: Theory,, Unpublished manuscript, (2014).

[24]

W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, II: The logit model,, Unpublished manuscript, (2014).

[25]

Y. Sannikov, Games with imperfectly observable actions in continuous time,, Econometrica, 75 (2007), 1285. doi: 10.1111/j.1468-0262.2007.00795.x.

[26]

S. Soulaimani, M. Quincampoix and S. Sorin, Repeated games and qualitative differential games: Approachability and comparison of strategies,, SIAM Journal on Control and Optimization, 48 (2009), 2461. doi: 10.1137/090749098.

[27]

M. Staudigl, Stochastic stability in asymmetric binary choice coordination games,, Games and Economic Behavior, 75 (2012), 372. doi: 10.1016/j.geb.2011.11.003.

[28]

M. Staudigl and J.-H. Steg, On repeated games with imperfect public monitoring: From discrete to continuous time,, Bielefeld University, (2014).

[29]

J. Warga, Optimal Control of Differential and Functional Equations,, Academic Press, (1972).

show all references

References:
[1]

R. J. Aumann and M. Maschler, Repeated Games with Incomplete Information,, MIT Press, (1995).

[2]

M. Bardi and I. Capuzzo-Dolcetta, Optimal control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations,, Birkhäuser - Systems & Control: Foundations & Applications, (1997). doi: 10.1007/978-0-8176-4755-1.

[3]

D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case,, Academic Press, (1978).

[4]

P. Billingsley, Convergence of Probability Measures,, Wiley Series in Probability and Statistics, (1999). doi: 10.1002/9780470316962.

[5]

I. Capuzzo-Dolcetta and H. Ishii, Approximate solutions of the Bellman equation of deterministic control theory,, Applied Mathematics & Optimization, 11 (1984), 161. doi: 10.1007/BF01442176.

[6]

P. Cardaliaguet, C. Rainer, D. Rosenberg and N. Vieille, Markov games with frequent actions and incomplete information,, 2013. arXiv:1307.3365v1 [math.OC]., ().

[7]

N. El Karoui, D. Nguyen and M. Jeanblanc-Picqué, Compactification methods in the control of degenerate diffusions: Existence of an optimal control,, Stochastics, 20 (1987), 169. doi: 10.1080/17442508708833443.

[8]

S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence,, Wiley, (1986). doi: 10.1002/9780470316658.

[9]

M. Falcone, A numerical approach to the infinite horizon problem of deterministic control theory,, Applied Mathematics & Optimization, 15 (1987), 1. doi: 10.1007/BF01442644.

[10]

W. H. Fleming, The convergence problem for differential games,, Journal of Mathematical Analysis and Applications, 3 (1961), 102. doi: 10.1016/0022-247X(61)90009-9.

[11]

N. Gast, B. Gaujal and J.-Y. Le Boudec, Mean field for markov decision processes: From discrete to continuous optimization,, IEEE Transactions on Automatic Control, 57 (2012), 2266. doi: 10.1109/TAC.2012.2186176.

[12]

F. Gensbittel, Continuous-time Limit of Dynamic Games with Incomplete Information and a More Informed Player,, hal-00910970, (2013).

[13]

R. Gonzalez and E. Rofman, On deterministic control problems: An approximation procedure for the optimal cost I. the stationary problem,, SIAM Journal on Control and Optimization, 23 (1985), 242. doi: 10.1137/0323018.

[14]

X. Guo and O. Hernández-Lerma, Nonzero-sum games for continuous-time markov chains with unbounded discounted payoffs,, Journal of Applied Probability, 42 (2005), 303. doi: 10.1239/jap/1118777172.

[15]

O. Hernández-Lerma and J. B. Laserre, Discrete-Time Markov Control Processes: Basic Optimality criteria,, Springer-Verlag, (1996). doi: 10.1007/978-1-4612-0729-0.

[16]

J. Hörner, T. Sugaya, S. Takahashi and N. Vieille, Recursive methods in discounted stochastic games: An algorithm for $\delta\rightarrow 1$ and a folk theorem,, Econometrica, 79 (2011), 1277. doi: 10.3982/ECTA9004.

[17]

O. Kallenberg, Foundations of Modern Probability,, Springer, (2002). doi: 10.1007/978-1-4757-4015-8.

[18]

I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus,, Springer-Verlag, (2000).

[19]

H. J. Kushner, Weak convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems,, Birkhäuser - Systems & Control: Foundations & Applications, (1990). doi: 10.1007/978-1-4612-4482-0.

[20]

H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time,, Springer, (2001). doi: 10.1007/978-1-4613-0007-6.

[21]

A. Neyman, Stochastic games with short-stage duration,, Dynamic Games and Applications, 3 (2013), 236. doi: 10.1007/s13235-013-0083-x.

[22]

A. S. Nowak and T. E. S. Raghavan, Existence of stationary correlated equilibria with symmetric information for discounted stochastic games,, Mathematics of Operations Research, 17 (1992), 519. doi: 10.1287/moor.17.3.519.

[23]

W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, I: Theory,, Unpublished manuscript, (2014).

[24]

W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, II: The logit model,, Unpublished manuscript, (2014).

[25]

Y. Sannikov, Games with imperfectly observable actions in continuous time,, Econometrica, 75 (2007), 1285. doi: 10.1111/j.1468-0262.2007.00795.x.

[26]

S. Soulaimani, M. Quincampoix and S. Sorin, Repeated games and qualitative differential games: Approachability and comparison of strategies,, SIAM Journal on Control and Optimization, 48 (2009), 2461. doi: 10.1137/090749098.

[27]

M. Staudigl, Stochastic stability in asymmetric binary choice coordination games,, Games and Economic Behavior, 75 (2012), 372. doi: 10.1016/j.geb.2011.11.003.

[28]

M. Staudigl and J.-H. Steg, On repeated games with imperfect public monitoring: From discrete to continuous time,, Bielefeld University, (2014).

[29]

J. Warga, Optimal Control of Differential and Functional Equations,, Academic Press, (1972).

[1]

A. Mittal, N. Hemachandra. Learning algorithms for finite horizon constrained Markov decision processes. Journal of Industrial & Management Optimization, 2007, 3 (3) : 429-444. doi: 10.3934/jimo.2007.3.429

[2]

Vincent Renault, Michèle Thieullen, Emmanuel Trélat. Optimal control of infinite-dimensional piecewise deterministic Markov processes and application to the control of neuronal dynamics via Optogenetics. Networks & Heterogeneous Media, 2017, 12 (3) : 417-459. doi: 10.3934/nhm.2017019

[3]

Qiuli Liu, Xiaolong Zou. A risk minimization problem for finite horizon semi-Markov decision processes with loss rates. Journal of Dynamics & Games, 2018, 5 (2) : 143-163. doi: 10.3934/jdg.2018009

[4]

Alexander Mielke. Weak-convergence methods for Hamiltonian multiscale problems. Discrete & Continuous Dynamical Systems - A, 2008, 20 (1) : 53-79. doi: 10.3934/dcds.2008.20.53

[5]

N. U. Ahmed. Weak solutions of stochastic reaction diffusion equations and their optimal control. Discrete & Continuous Dynamical Systems - S, 2018, 11 (6) : 1011-1029. doi: 10.3934/dcdss.2018059

[6]

Roberto C. Cabrales, Gema Camacho, Enrique Fernández-Cara. Analysis and optimal control of some solidification processes. Discrete & Continuous Dynamical Systems - A, 2014, 34 (10) : 3985-4017. doi: 10.3934/dcds.2014.34.3985

[7]

Zhen Lei, Yi Zhou. BKM's criterion and global weak solutions for magnetohydrodynamics with zero viscosity. Discrete & Continuous Dynamical Systems - A, 2009, 25 (2) : 575-583. doi: 10.3934/dcds.2009.25.575

[8]

Karl Kunisch, Markus Müller. Uniform convergence of the POD method and applications to optimal control. Discrete & Continuous Dynamical Systems - A, 2015, 35 (9) : 4477-4501. doi: 10.3934/dcds.2015.35.4477

[9]

Naïla Hayek. Infinite-horizon multiobjective optimal control problems for bounded processes. Discrete & Continuous Dynamical Systems - S, 2018, 11 (6) : 1121-1141. doi: 10.3934/dcdss.2018064

[10]

Wael Bahsoun, Paweł Góra. SRB measures for certain Markov processes. Discrete & Continuous Dynamical Systems - A, 2011, 30 (1) : 17-37. doi: 10.3934/dcds.2011.30.17

[11]

Chunjuan Hou, Yanping Chen, Zuliang Lu. Superconvergence property of finite element methods for parabolic optimal control problems. Journal of Industrial & Management Optimization, 2011, 7 (4) : 927-945. doi: 10.3934/jimo.2011.7.927

[12]

Christian Heinemann, Christiane Kraus. Existence of weak solutions for a PDE system describing phase separation and damage processes including inertial effects. Discrete & Continuous Dynamical Systems - A, 2015, 35 (6) : 2565-2590. doi: 10.3934/dcds.2015.35.2565

[13]

Xian Chen, Zhi-Ming Ma. A transformation of Markov jump processes and applications in genetic study. Discrete & Continuous Dynamical Systems - A, 2014, 34 (12) : 5061-5084. doi: 10.3934/dcds.2014.34.5061

[14]

A. M. Vershik. Polymorphisms, Markov processes, quasi-similarity. Discrete & Continuous Dynamical Systems - A, 2005, 13 (5) : 1305-1324. doi: 10.3934/dcds.2005.13.1305

[15]

H.Thomas Banks, Shuhua Hu. Nonlinear stochastic Markov processes and modeling uncertainty in populations. Mathematical Biosciences & Engineering, 2012, 9 (1) : 1-25. doi: 10.3934/mbe.2012.9.1

[16]

Steven Richardson, Song Wang. The viscosity approximation to the Hamilton-Jacobi-Bellman equation in optimal feedback control: Upper bounds for extended domains. Journal of Industrial & Management Optimization, 2010, 6 (1) : 161-175. doi: 10.3934/jimo.2010.6.161

[17]

Shanjian Tang, Fu Zhang. Path-dependent optimal stochastic control and viscosity solution of associated Bellman equations. Discrete & Continuous Dynamical Systems - A, 2015, 35 (11) : 5521-5553. doi: 10.3934/dcds.2015.35.5521

[18]

Ryan Loxton, Qun Lin, Volker Rehbock, Kok Lay Teo. Control parameterization for optimal control problems with continuous inequality constraints: New convergence results. Numerical Algebra, Control & Optimization, 2012, 2 (3) : 571-599. doi: 10.3934/naco.2012.2.571

[19]

Matthias Gerdts, Martin Kunkel. Convergence analysis of Euler discretization of control-state constrained optimal control problems with controls of bounded variation. Journal of Industrial & Management Optimization, 2014, 10 (1) : 311-336. doi: 10.3934/jimo.2014.10.311

[20]

Filippo Gazzola, Mirko Sardella. Attractors for families of processes in weak topologies of Banach spaces. Discrete & Continuous Dynamical Systems - A, 1998, 4 (3) : 455-466. doi: 10.3934/dcds.1998.4.455

 Impact Factor: 

Metrics

  • PDF downloads (2)
  • HTML views (0)
  • Cited by (0)

Other articles
by authors

[Back to Top]