American Institute of Mathematical Sciences

October  2014, 1(4): 639-659. doi: 10.3934/jdg.2014.1.639

A limit theorem for Markov decision processes

 1 Center for Mathematical Economics, University Bielefeld, Postfach 100131, 33501 Bielefeld, Germany

Received  January 2014 Revised  June 2014 Published  November 2014

In this paper I prove a deterministic approximation theorem for a sequence of Markov decision processes with finitely many actions and general state spaces as they appear frequently in economics, game theory and operations research. Using viscosity solution methods no a-priori differentiabililty assumptions are imposed on the value function.
Citation: Mathias Staudigl. A limit theorem for Markov decision processes. Journal of Dynamics & Games, 2014, 1 (4) : 639-659. doi: 10.3934/jdg.2014.1.639
References:
 [1] R. J. Aumann and M. Maschler, Repeated Games with Incomplete Information,, MIT Press, (1995).   Google Scholar [2] M. Bardi and I. Capuzzo-Dolcetta, Optimal control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations,, Birkhäuser - Systems & Control: Foundations & Applications, (1997).  doi: 10.1007/978-0-8176-4755-1.  Google Scholar [3] D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case,, Academic Press, (1978).   Google Scholar [4] P. Billingsley, Convergence of Probability Measures,, Wiley Series in Probability and Statistics, (1999).  doi: 10.1002/9780470316962.  Google Scholar [5] I. Capuzzo-Dolcetta and H. Ishii, Approximate solutions of the Bellman equation of deterministic control theory,, Applied Mathematics & Optimization, 11 (1984), 161.  doi: 10.1007/BF01442176.  Google Scholar [6] P. Cardaliaguet, C. Rainer, D. Rosenberg and N. Vieille, Markov games with frequent actions and incomplete information,, 2013. arXiv:1307.3365v1 [math.OC]., ().   Google Scholar [7] N. El Karoui, D. Nguyen and M. Jeanblanc-Picqué, Compactification methods in the control of degenerate diffusions: Existence of an optimal control,, Stochastics, 20 (1987), 169.  doi: 10.1080/17442508708833443.  Google Scholar [8] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence,, Wiley, (1986).  doi: 10.1002/9780470316658.  Google Scholar [9] M. Falcone, A numerical approach to the infinite horizon problem of deterministic control theory,, Applied Mathematics & Optimization, 15 (1987), 1.  doi: 10.1007/BF01442644.  Google Scholar [10] W. H. Fleming, The convergence problem for differential games,, Journal of Mathematical Analysis and Applications, 3 (1961), 102.  doi: 10.1016/0022-247X(61)90009-9.  Google Scholar [11] N. Gast, B. Gaujal and J.-Y. Le Boudec, Mean field for markov decision processes: From discrete to continuous optimization,, IEEE Transactions on Automatic Control, 57 (2012), 2266.  doi: 10.1109/TAC.2012.2186176.  Google Scholar [12] F. Gensbittel, Continuous-time Limit of Dynamic Games with Incomplete Information and a More Informed Player,, hal-00910970, (2013).   Google Scholar [13] R. Gonzalez and E. Rofman, On deterministic control problems: An approximation procedure for the optimal cost I. the stationary problem,, SIAM Journal on Control and Optimization, 23 (1985), 242.  doi: 10.1137/0323018.  Google Scholar [14] X. Guo and O. Hernández-Lerma, Nonzero-sum games for continuous-time markov chains with unbounded discounted payoffs,, Journal of Applied Probability, 42 (2005), 303.  doi: 10.1239/jap/1118777172.  Google Scholar [15] O. Hernández-Lerma and J. B. Laserre, Discrete-Time Markov Control Processes: Basic Optimality criteria,, Springer-Verlag, (1996).  doi: 10.1007/978-1-4612-0729-0.  Google Scholar [16] J. Hörner, T. Sugaya, S. Takahashi and N. Vieille, Recursive methods in discounted stochastic games: An algorithm for $\delta\rightarrow 1$ and a folk theorem,, Econometrica, 79 (2011), 1277.  doi: 10.3982/ECTA9004.  Google Scholar [17] O. Kallenberg, Foundations of Modern Probability,, Springer, (2002).  doi: 10.1007/978-1-4757-4015-8.  Google Scholar [18] I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus,, Springer-Verlag, (2000).   Google Scholar [19] H. J. Kushner, Weak convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems,, Birkhäuser - Systems & Control: Foundations & Applications, (1990).  doi: 10.1007/978-1-4612-4482-0.  Google Scholar [20] H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time,, Springer, (2001).  doi: 10.1007/978-1-4613-0007-6.  Google Scholar [21] A. Neyman, Stochastic games with short-stage duration,, Dynamic Games and Applications, 3 (2013), 236.  doi: 10.1007/s13235-013-0083-x.  Google Scholar [22] A. S. Nowak and T. E. S. Raghavan, Existence of stationary correlated equilibria with symmetric information for discounted stochastic games,, Mathematics of Operations Research, 17 (1992), 519.  doi: 10.1287/moor.17.3.519.  Google Scholar [23] W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, I: Theory,, Unpublished manuscript, (2014).   Google Scholar [24] W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, II: The logit model,, Unpublished manuscript, (2014).   Google Scholar [25] Y. Sannikov, Games with imperfectly observable actions in continuous time,, Econometrica, 75 (2007), 1285.  doi: 10.1111/j.1468-0262.2007.00795.x.  Google Scholar [26] S. Soulaimani, M. Quincampoix and S. Sorin, Repeated games and qualitative differential games: Approachability and comparison of strategies,, SIAM Journal on Control and Optimization, 48 (2009), 2461.  doi: 10.1137/090749098.  Google Scholar [27] M. Staudigl, Stochastic stability in asymmetric binary choice coordination games,, Games and Economic Behavior, 75 (2012), 372.  doi: 10.1016/j.geb.2011.11.003.  Google Scholar [28] M. Staudigl and J.-H. Steg, On repeated games with imperfect public monitoring: From discrete to continuous time,, Bielefeld University, (2014).   Google Scholar [29] J. Warga, Optimal Control of Differential and Functional Equations,, Academic Press, (1972).   Google Scholar

show all references

References:
 [1] R. J. Aumann and M. Maschler, Repeated Games with Incomplete Information,, MIT Press, (1995).   Google Scholar [2] M. Bardi and I. Capuzzo-Dolcetta, Optimal control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations,, Birkhäuser - Systems & Control: Foundations & Applications, (1997).  doi: 10.1007/978-0-8176-4755-1.  Google Scholar [3] D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case,, Academic Press, (1978).   Google Scholar [4] P. Billingsley, Convergence of Probability Measures,, Wiley Series in Probability and Statistics, (1999).  doi: 10.1002/9780470316962.  Google Scholar [5] I. Capuzzo-Dolcetta and H. Ishii, Approximate solutions of the Bellman equation of deterministic control theory,, Applied Mathematics & Optimization, 11 (1984), 161.  doi: 10.1007/BF01442176.  Google Scholar [6] P. Cardaliaguet, C. Rainer, D. Rosenberg and N. Vieille, Markov games with frequent actions and incomplete information,, 2013. arXiv:1307.3365v1 [math.OC]., ().   Google Scholar [7] N. El Karoui, D. Nguyen and M. Jeanblanc-Picqué, Compactification methods in the control of degenerate diffusions: Existence of an optimal control,, Stochastics, 20 (1987), 169.  doi: 10.1080/17442508708833443.  Google Scholar [8] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence,, Wiley, (1986).  doi: 10.1002/9780470316658.  Google Scholar [9] M. Falcone, A numerical approach to the infinite horizon problem of deterministic control theory,, Applied Mathematics & Optimization, 15 (1987), 1.  doi: 10.1007/BF01442644.  Google Scholar [10] W. H. Fleming, The convergence problem for differential games,, Journal of Mathematical Analysis and Applications, 3 (1961), 102.  doi: 10.1016/0022-247X(61)90009-9.  Google Scholar [11] N. Gast, B. Gaujal and J.-Y. Le Boudec, Mean field for markov decision processes: From discrete to continuous optimization,, IEEE Transactions on Automatic Control, 57 (2012), 2266.  doi: 10.1109/TAC.2012.2186176.  Google Scholar [12] F. Gensbittel, Continuous-time Limit of Dynamic Games with Incomplete Information and a More Informed Player,, hal-00910970, (2013).   Google Scholar [13] R. Gonzalez and E. Rofman, On deterministic control problems: An approximation procedure for the optimal cost I. the stationary problem,, SIAM Journal on Control and Optimization, 23 (1985), 242.  doi: 10.1137/0323018.  Google Scholar [14] X. Guo and O. Hernández-Lerma, Nonzero-sum games for continuous-time markov chains with unbounded discounted payoffs,, Journal of Applied Probability, 42 (2005), 303.  doi: 10.1239/jap/1118777172.  Google Scholar [15] O. Hernández-Lerma and J. B. Laserre, Discrete-Time Markov Control Processes: Basic Optimality criteria,, Springer-Verlag, (1996).  doi: 10.1007/978-1-4612-0729-0.  Google Scholar [16] J. Hörner, T. Sugaya, S. Takahashi and N. Vieille, Recursive methods in discounted stochastic games: An algorithm for $\delta\rightarrow 1$ and a folk theorem,, Econometrica, 79 (2011), 1277.  doi: 10.3982/ECTA9004.  Google Scholar [17] O. Kallenberg, Foundations of Modern Probability,, Springer, (2002).  doi: 10.1007/978-1-4757-4015-8.  Google Scholar [18] I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus,, Springer-Verlag, (2000).   Google Scholar [19] H. J. Kushner, Weak convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems,, Birkhäuser - Systems & Control: Foundations & Applications, (1990).  doi: 10.1007/978-1-4612-4482-0.  Google Scholar [20] H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time,, Springer, (2001).  doi: 10.1007/978-1-4613-0007-6.  Google Scholar [21] A. Neyman, Stochastic games with short-stage duration,, Dynamic Games and Applications, 3 (2013), 236.  doi: 10.1007/s13235-013-0083-x.  Google Scholar [22] A. S. Nowak and T. E. S. Raghavan, Existence of stationary correlated equilibria with symmetric information for discounted stochastic games,, Mathematics of Operations Research, 17 (1992), 519.  doi: 10.1287/moor.17.3.519.  Google Scholar [23] W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, I: Theory,, Unpublished manuscript, (2014).   Google Scholar [24] W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, II: The logit model,, Unpublished manuscript, (2014).   Google Scholar [25] Y. Sannikov, Games with imperfectly observable actions in continuous time,, Econometrica, 75 (2007), 1285.  doi: 10.1111/j.1468-0262.2007.00795.x.  Google Scholar [26] S. Soulaimani, M. Quincampoix and S. Sorin, Repeated games and qualitative differential games: Approachability and comparison of strategies,, SIAM Journal on Control and Optimization, 48 (2009), 2461.  doi: 10.1137/090749098.  Google Scholar [27] M. Staudigl, Stochastic stability in asymmetric binary choice coordination games,, Games and Economic Behavior, 75 (2012), 372.  doi: 10.1016/j.geb.2011.11.003.  Google Scholar [28] M. Staudigl and J.-H. Steg, On repeated games with imperfect public monitoring: From discrete to continuous time,, Bielefeld University, (2014).   Google Scholar [29] J. Warga, Optimal Control of Differential and Functional Equations,, Academic Press, (1972).   Google Scholar
 [1] A. Mittal, N. Hemachandra. Learning algorithms for finite horizon constrained Markov decision processes. Journal of Industrial & Management Optimization, 2007, 3 (3) : 429-444. doi: 10.3934/jimo.2007.3.429 [2] Vincent Renault, Michèle Thieullen, Emmanuel Trélat. Optimal control of infinite-dimensional piecewise deterministic Markov processes and application to the control of neuronal dynamics via Optogenetics. Networks & Heterogeneous Media, 2017, 12 (3) : 417-459. doi: 10.3934/nhm.2017019 [3] Qiuli Liu, Xiaolong Zou. A risk minimization problem for finite horizon semi-Markov decision processes with loss rates. Journal of Dynamics & Games, 2018, 5 (2) : 143-163. doi: 10.3934/jdg.2018009 [4] Alexander Mielke. Weak-convergence methods for Hamiltonian multiscale problems. Discrete & Continuous Dynamical Systems - A, 2008, 20 (1) : 53-79. doi: 10.3934/dcds.2008.20.53 [5] N. U. Ahmed. Weak solutions of stochastic reaction diffusion equations and their optimal control. Discrete & Continuous Dynamical Systems - S, 2018, 11 (6) : 1011-1029. doi: 10.3934/dcdss.2018059 [6] Roberto C. Cabrales, Gema Camacho, Enrique Fernández-Cara. Analysis and optimal control of some solidification processes. Discrete & Continuous Dynamical Systems - A, 2014, 34 (10) : 3985-4017. doi: 10.3934/dcds.2014.34.3985 [7] Zhen Lei, Yi Zhou. BKM's criterion and global weak solutions for magnetohydrodynamics with zero viscosity. Discrete & Continuous Dynamical Systems - A, 2009, 25 (2) : 575-583. doi: 10.3934/dcds.2009.25.575 [8] Karl Kunisch, Markus Müller. Uniform convergence of the POD method and applications to optimal control. Discrete & Continuous Dynamical Systems - A, 2015, 35 (9) : 4477-4501. doi: 10.3934/dcds.2015.35.4477 [9] Naïla Hayek. Infinite-horizon multiobjective optimal control problems for bounded processes. Discrete & Continuous Dynamical Systems - S, 2018, 11 (6) : 1121-1141. doi: 10.3934/dcdss.2018064 [10] Wael Bahsoun, Paweł Góra. SRB measures for certain Markov processes. Discrete & Continuous Dynamical Systems - A, 2011, 30 (1) : 17-37. doi: 10.3934/dcds.2011.30.17 [11] Artur Stephan, Holger Stephan. Memory equations as reduced Markov processes. Discrete & Continuous Dynamical Systems - A, 2019, 39 (4) : 2133-2155. doi: 10.3934/dcds.2019089 [12] James Broda, Alexander Grigo, Nikola P. Petrov. Convergence rates for semistochastic processes. Discrete & Continuous Dynamical Systems - B, 2019, 24 (1) : 109-125. doi: 10.3934/dcdsb.2019001 [13] Chunjuan Hou, Yanping Chen, Zuliang Lu. Superconvergence property of finite element methods for parabolic optimal control problems. Journal of Industrial & Management Optimization, 2011, 7 (4) : 927-945. doi: 10.3934/jimo.2011.7.927 [14] Z. Foroozandeh, Maria do rosário de Pinho, M. Shamsi. On numerical methods for singular optimal control problems: An application to an AUV problem. Discrete & Continuous Dynamical Systems - B, 2019, 24 (5) : 2219-2235. doi: 10.3934/dcdsb.2019092 [15] Martin Benning, Elena Celledoni, Matthias J. Ehrhardt, Brynjulf Owren, Carola-Bibiane Schönlieb. Deep learning as optimal control problems: Models and numerical methods. Journal of Computational Dynamics, 2019, 6 (2) : 171-198. doi: 10.3934/jcd.2019009 [16] H.Thomas Banks, Shuhua Hu. Nonlinear stochastic Markov processes and modeling uncertainty in populations. Mathematical Biosciences & Engineering, 2012, 9 (1) : 1-25. doi: 10.3934/mbe.2012.9.1 [17] Xian Chen, Zhi-Ming Ma. A transformation of Markov jump processes and applications in genetic study. Discrete & Continuous Dynamical Systems - A, 2014, 34 (12) : 5061-5084. doi: 10.3934/dcds.2014.34.5061 [18] A. M. Vershik. Polymorphisms, Markov processes, quasi-similarity. Discrete & Continuous Dynamical Systems - A, 2005, 13 (5) : 1305-1324. doi: 10.3934/dcds.2005.13.1305 [19] Pablo Ochoa, Julio Alejo Ruiz. A study of comparison, existence and regularity of viscosity and weak solutions for quasilinear equations in the Heisenberg group. Communications on Pure & Applied Analysis, 2019, 18 (3) : 1091-1115. doi: 10.3934/cpaa.2019053 [20] Christian Heinemann, Christiane Kraus. Existence of weak solutions for a PDE system describing phase separation and damage processes including inertial effects. Discrete & Continuous Dynamical Systems - A, 2015, 35 (6) : 2565-2590. doi: 10.3934/dcds.2015.35.2565

Impact Factor: