October 2014, 1(4): 639-659. doi: 10.3934/jdg.2014.1.639

A limit theorem for Markov decision processes

Mathias Staudigl

Center for Mathematical Economics, Bielefeld University, Postfach 100131, 33501 Bielefeld, Germany

Received January 2014; Revised June 2014; Published November 2014.

In this paper I prove a deterministic approximation theorem for a sequence of Markov decision processes with finitely many actions and general state spaces, as they frequently appear in economics, game theory and operations research. Using viscosity solution methods, no a priori differentiability assumptions are imposed on the value function.
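To fix ideas, the following is a minimal sketch of the type of limit the abstract refers to, under illustrative assumptions (a discount rate $r>0$, a bounded running reward $g$, a finite action set $A$, and a drift $f$ describing the mean motion of the controlled state); the precise hypotheses and the form of the Hamiltonian in the paper may differ. For a stage length $h>0$, the discounted value function $V_h$ of the discrete-time problem satisfies the dynamic programming equation
$$V_h(x) \;=\; \max_{a\in A}\Big\{\, h\,g(x,a) + (1-rh)\,\mathbb{E}\big[V_h(X^{x,a}_h)\big] \,\Big\},$$
where $X^{x,a}_h$ denotes the state after one stage started at $x$ under action $a$, with $\mathbb{E}\big[X^{x,a}_h - x\big] \approx h\,f(x,a)$. A deterministic approximation theorem of this type asserts that, as $h\to 0$, $V_h$ converges (locally uniformly) to the unique bounded continuous viscosity solution $V$ of the Hamilton-Jacobi-Bellman equation
$$r\,V(x) \;=\; \max_{a\in A}\big\{\, g(x,a) + \langle f(x,a), \nabla V(x)\rangle \,\big\},$$
a characterization that requires no a priori differentiability of $V$.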
Citation: Mathias Staudigl. A limit theorem for Markov decision processes. Journal of Dynamics & Games, 2014, 1 (4) : 639-659. doi: 10.3934/jdg.2014.1.639
References:
[1] R. J. Aumann and M. Maschler, Repeated Games with Incomplete Information, MIT Press, Cambridge, MA, 1995.
[2] M. Bardi and I. Capuzzo-Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Birkhäuser, Systems & Control: Foundations & Applications, 1997. doi: 10.1007/978-0-8176-4755-1.
[3] D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case, Academic Press, 1978.
[4] P. Billingsley, Convergence of Probability Measures, Wiley Series in Probability and Statistics, John Wiley & Sons, Inc., New York, 1999. doi: 10.1002/9780470316962.
[5] I. Capuzzo-Dolcetta and H. Ishii, Approximate solutions of the Bellman equation of deterministic control theory, Applied Mathematics & Optimization, 11 (1984), 161-181. doi: 10.1007/BF01442176.
[6] P. Cardaliaguet, C. Rainer, D. Rosenberg and N. Vieille, Markov games with frequent actions and incomplete information, preprint, 2013. arXiv:1307.3365v1 [math.OC].
[7] N. El Karoui, D. Nguyen and M. Jeanblanc-Picqué, Compactification methods in the control of degenerate diffusions: Existence of an optimal control, Stochastics, 20 (1987), 169-219. doi: 10.1080/17442508708833443.
[8] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence, Wiley, New York, 1986. doi: 10.1002/9780470316658.
[9] M. Falcone, A numerical approach to the infinite horizon problem of deterministic control theory, Applied Mathematics & Optimization, 15 (1987), 1-13. doi: 10.1007/BF01442644.
[10] W. H. Fleming, The convergence problem for differential games, Journal of Mathematical Analysis and Applications, 3 (1961), 102-116. doi: 10.1016/0022-247X(61)90009-9.
[11] N. Gast, B. Gaujal and J.-Y. Le Boudec, Mean field for Markov decision processes: From discrete to continuous optimization, IEEE Transactions on Automatic Control, 57 (2012), 2266-2280. doi: 10.1109/TAC.2012.2186176.
[12] F. Gensbittel, Continuous-time limit of dynamic games with incomplete information and a more informed player, preprint, hal-00910970, version 1, 2013.
[13] R. Gonzalez and E. Rofman, On deterministic control problems: An approximation procedure for the optimal cost I. The stationary problem, SIAM Journal on Control and Optimization, 23 (1985), 242-266. doi: 10.1137/0323018.
[14] X. Guo and O. Hernández-Lerma, Nonzero-sum games for continuous-time Markov chains with unbounded discounted payoffs, Journal of Applied Probability, 42 (2005), 303-320. doi: 10.1239/jap/1118777172.
[15] O. Hernández-Lerma and J. B. Lasserre, Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, 1996. doi: 10.1007/978-1-4612-0729-0.
[16] J. Hörner, T. Sugaya, S. Takahashi and N. Vieille, Recursive methods in discounted stochastic games: An algorithm for $\delta\rightarrow 1$ and a folk theorem, Econometrica, 79 (2011), 1277-1318. doi: 10.3982/ECTA9004.
[17] O. Kallenberg, Foundations of Modern Probability, 2nd edition, Springer, New York, 2002. doi: 10.1007/978-1-4757-4015-8.
[18] I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus, 2nd edition, Springer-Verlag, 2000.
[19] H. J. Kushner, Weak Convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems, Birkhäuser, Systems & Control: Foundations & Applications, Boston, MA, 1990. doi: 10.1007/978-1-4612-4482-0.
[20] H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time, 2nd edition, Springer, New York, 2001. doi: 10.1007/978-1-4613-0007-6.
[21] A. Neyman, Stochastic games with short-stage duration, Dynamic Games and Applications, 3 (2013), 236-278. doi: 10.1007/s13235-013-0083-x.
[22] A. S. Nowak and T. E. S. Raghavan, Existence of stationary correlated equilibria with symmetric information for discounted stochastic games, Mathematics of Operations Research, 17 (1992), 519-526. doi: 10.1287/moor.17.3.519.
[23] W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, I: Theory, unpublished manuscript, University of Wisconsin and Bielefeld University, 2014.
[24] W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, II: The logit model, unpublished manuscript, University of Wisconsin and Bielefeld University, 2014.
[25] Y. Sannikov, Games with imperfectly observable actions in continuous time, Econometrica, 75 (2007), 1285-1329. doi: 10.1111/j.1468-0262.2007.00795.x.
[26] S. Soulaimani, M. Quincampoix and S. Sorin, Repeated games and qualitative differential games: Approachability and comparison of strategies, SIAM Journal on Control and Optimization, 48 (2009), 2461-2479. doi: 10.1137/090749098.
[27] M. Staudigl, Stochastic stability in asymmetric binary choice coordination games, Games and Economic Behavior, 75 (2012), 372-401. doi: 10.1016/j.geb.2011.11.003.
[28] M. Staudigl and J.-H. Steg, On repeated games with imperfect public monitoring: From discrete to continuous time, unpublished manuscript, Bielefeld University, 2014.
[29] J. Warga, Optimal Control of Differential and Functional Equations, Academic Press, New York-London, 1972.
