October  2014, 1(4): 639-659. doi: 10.3934/jdg.2014.1.639

A limit theorem for Markov decision processes

1. 

Center for Mathematical Economics, University Bielefeld, Postfach 100131, 33501 Bielefeld, Germany

Received  January 2014 Revised  June 2014 Published  November 2014

In this paper I prove a deterministic approximation theorem for a sequence of Markov decision processes with finitely many actions and general state spaces as they appear frequently in economics, game theory and operations research. Using viscosity solution methods no a-priori differentiabililty assumptions are imposed on the value function.
Citation: Mathias Staudigl. A limit theorem for Markov decision processes. Journal of Dynamics & Games, 2014, 1 (4) : 639-659. doi: 10.3934/jdg.2014.1.639
References:
[1]

R. J. Aumann and M. Maschler, Repeated Games with Incomplete Information,, MIT Press, (1995).   Google Scholar

[2]

M. Bardi and I. Capuzzo-Dolcetta, Optimal control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations,, Birkhäuser - Systems & Control: Foundations & Applications, (1997).  doi: 10.1007/978-0-8176-4755-1.  Google Scholar

[3]

D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case,, Academic Press, (1978).   Google Scholar

[4]

P. Billingsley, Convergence of Probability Measures,, Wiley Series in Probability and Statistics, (1999).  doi: 10.1002/9780470316962.  Google Scholar

[5]

I. Capuzzo-Dolcetta and H. Ishii, Approximate solutions of the Bellman equation of deterministic control theory,, Applied Mathematics & Optimization, 11 (1984), 161.  doi: 10.1007/BF01442176.  Google Scholar

[6]

P. Cardaliaguet, C. Rainer, D. Rosenberg and N. Vieille, Markov games with frequent actions and incomplete information,, 2013. arXiv:1307.3365v1 [math.OC]., ().   Google Scholar

[7]

N. El Karoui, D. Nguyen and M. Jeanblanc-Picqué, Compactification methods in the control of degenerate diffusions: Existence of an optimal control,, Stochastics, 20 (1987), 169.  doi: 10.1080/17442508708833443.  Google Scholar

[8]

S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence,, Wiley, (1986).  doi: 10.1002/9780470316658.  Google Scholar

[9]

M. Falcone, A numerical approach to the infinite horizon problem of deterministic control theory,, Applied Mathematics & Optimization, 15 (1987), 1.  doi: 10.1007/BF01442644.  Google Scholar

[10]

W. H. Fleming, The convergence problem for differential games,, Journal of Mathematical Analysis and Applications, 3 (1961), 102.  doi: 10.1016/0022-247X(61)90009-9.  Google Scholar

[11]

N. Gast, B. Gaujal and J.-Y. Le Boudec, Mean field for markov decision processes: From discrete to continuous optimization,, IEEE Transactions on Automatic Control, 57 (2012), 2266.  doi: 10.1109/TAC.2012.2186176.  Google Scholar

[12]

F. Gensbittel, Continuous-time Limit of Dynamic Games with Incomplete Information and a More Informed Player,, hal-00910970, (2013).   Google Scholar

[13]

R. Gonzalez and E. Rofman, On deterministic control problems: An approximation procedure for the optimal cost I. the stationary problem,, SIAM Journal on Control and Optimization, 23 (1985), 242.  doi: 10.1137/0323018.  Google Scholar

[14]

X. Guo and O. Hernández-Lerma, Nonzero-sum games for continuous-time markov chains with unbounded discounted payoffs,, Journal of Applied Probability, 42 (2005), 303.  doi: 10.1239/jap/1118777172.  Google Scholar

[15]

O. Hernández-Lerma and J. B. Laserre, Discrete-Time Markov Control Processes: Basic Optimality criteria,, Springer-Verlag, (1996).  doi: 10.1007/978-1-4612-0729-0.  Google Scholar

[16]

J. Hörner, T. Sugaya, S. Takahashi and N. Vieille, Recursive methods in discounted stochastic games: An algorithm for $\delta\rightarrow 1$ and a folk theorem,, Econometrica, 79 (2011), 1277.  doi: 10.3982/ECTA9004.  Google Scholar

[17]

O. Kallenberg, Foundations of Modern Probability,, Springer, (2002).  doi: 10.1007/978-1-4757-4015-8.  Google Scholar

[18]

I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus,, Springer-Verlag, (2000).   Google Scholar

[19]

H. J. Kushner, Weak convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems,, Birkhäuser - Systems & Control: Foundations & Applications, (1990).  doi: 10.1007/978-1-4612-4482-0.  Google Scholar

[20]

H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time,, Springer, (2001).  doi: 10.1007/978-1-4613-0007-6.  Google Scholar

[21]

A. Neyman, Stochastic games with short-stage duration,, Dynamic Games and Applications, 3 (2013), 236.  doi: 10.1007/s13235-013-0083-x.  Google Scholar

[22]

A. S. Nowak and T. E. S. Raghavan, Existence of stationary correlated equilibria with symmetric information for discounted stochastic games,, Mathematics of Operations Research, 17 (1992), 519.  doi: 10.1287/moor.17.3.519.  Google Scholar

[23]

W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, I: Theory,, Unpublished manuscript, (2014).   Google Scholar

[24]

W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, II: The logit model,, Unpublished manuscript, (2014).   Google Scholar

[25]

Y. Sannikov, Games with imperfectly observable actions in continuous time,, Econometrica, 75 (2007), 1285.  doi: 10.1111/j.1468-0262.2007.00795.x.  Google Scholar

[26]

S. Soulaimani, M. Quincampoix and S. Sorin, Repeated games and qualitative differential games: Approachability and comparison of strategies,, SIAM Journal on Control and Optimization, 48 (2009), 2461.  doi: 10.1137/090749098.  Google Scholar

[27]

M. Staudigl, Stochastic stability in asymmetric binary choice coordination games,, Games and Economic Behavior, 75 (2012), 372.  doi: 10.1016/j.geb.2011.11.003.  Google Scholar

[28]

M. Staudigl and J.-H. Steg, On repeated games with imperfect public monitoring: From discrete to continuous time,, Bielefeld University, (2014).   Google Scholar

[29]

J. Warga, Optimal Control of Differential and Functional Equations,, Academic Press, (1972).   Google Scholar

show all references

References:
[1]

R. J. Aumann and M. Maschler, Repeated Games with Incomplete Information,, MIT Press, (1995).   Google Scholar

[2]

M. Bardi and I. Capuzzo-Dolcetta, Optimal control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations,, Birkhäuser - Systems & Control: Foundations & Applications, (1997).  doi: 10.1007/978-0-8176-4755-1.  Google Scholar

[3]

D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case,, Academic Press, (1978).   Google Scholar

[4]

P. Billingsley, Convergence of Probability Measures,, Wiley Series in Probability and Statistics, (1999).  doi: 10.1002/9780470316962.  Google Scholar

[5]

I. Capuzzo-Dolcetta and H. Ishii, Approximate solutions of the Bellman equation of deterministic control theory,, Applied Mathematics & Optimization, 11 (1984), 161.  doi: 10.1007/BF01442176.  Google Scholar

[6]

P. Cardaliaguet, C. Rainer, D. Rosenberg and N. Vieille, Markov games with frequent actions and incomplete information,, 2013. arXiv:1307.3365v1 [math.OC]., ().   Google Scholar

[7]

N. El Karoui, D. Nguyen and M. Jeanblanc-Picqué, Compactification methods in the control of degenerate diffusions: Existence of an optimal control,, Stochastics, 20 (1987), 169.  doi: 10.1080/17442508708833443.  Google Scholar

[8]

S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence,, Wiley, (1986).  doi: 10.1002/9780470316658.  Google Scholar

[9]

M. Falcone, A numerical approach to the infinite horizon problem of deterministic control theory,, Applied Mathematics & Optimization, 15 (1987), 1.  doi: 10.1007/BF01442644.  Google Scholar

[10]

W. H. Fleming, The convergence problem for differential games,, Journal of Mathematical Analysis and Applications, 3 (1961), 102.  doi: 10.1016/0022-247X(61)90009-9.  Google Scholar

[11]

N. Gast, B. Gaujal and J.-Y. Le Boudec, Mean field for markov decision processes: From discrete to continuous optimization,, IEEE Transactions on Automatic Control, 57 (2012), 2266.  doi: 10.1109/TAC.2012.2186176.  Google Scholar

[12]

F. Gensbittel, Continuous-time Limit of Dynamic Games with Incomplete Information and a More Informed Player,, hal-00910970, (2013).   Google Scholar

[13]

R. Gonzalez and E. Rofman, On deterministic control problems: An approximation procedure for the optimal cost I. the stationary problem,, SIAM Journal on Control and Optimization, 23 (1985), 242.  doi: 10.1137/0323018.  Google Scholar

[14]

X. Guo and O. Hernández-Lerma, Nonzero-sum games for continuous-time markov chains with unbounded discounted payoffs,, Journal of Applied Probability, 42 (2005), 303.  doi: 10.1239/jap/1118777172.  Google Scholar

[15]

O. Hernández-Lerma and J. B. Laserre, Discrete-Time Markov Control Processes: Basic Optimality criteria,, Springer-Verlag, (1996).  doi: 10.1007/978-1-4612-0729-0.  Google Scholar

[16]

J. Hörner, T. Sugaya, S. Takahashi and N. Vieille, Recursive methods in discounted stochastic games: An algorithm for $\delta\rightarrow 1$ and a folk theorem,, Econometrica, 79 (2011), 1277.  doi: 10.3982/ECTA9004.  Google Scholar

[17]

O. Kallenberg, Foundations of Modern Probability,, Springer, (2002).  doi: 10.1007/978-1-4757-4015-8.  Google Scholar

[18]

I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus,, Springer-Verlag, (2000).   Google Scholar

[19]

H. J. Kushner, Weak convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems,, Birkhäuser - Systems & Control: Foundations & Applications, (1990).  doi: 10.1007/978-1-4612-4482-0.  Google Scholar

[20]

H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time,, Springer, (2001).  doi: 10.1007/978-1-4613-0007-6.  Google Scholar

[21]

A. Neyman, Stochastic games with short-stage duration,, Dynamic Games and Applications, 3 (2013), 236.  doi: 10.1007/s13235-013-0083-x.  Google Scholar

[22]

A. S. Nowak and T. E. S. Raghavan, Existence of stationary correlated equilibria with symmetric information for discounted stochastic games,, Mathematics of Operations Research, 17 (1992), 519.  doi: 10.1287/moor.17.3.519.  Google Scholar

[23]

W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, I: Theory,, Unpublished manuscript, (2014).   Google Scholar

[24]

W. H. Sandholm and M. Staudigl, Stochastic stability in the small noise double limit, II: The logit model,, Unpublished manuscript, (2014).   Google Scholar

[25]

Y. Sannikov, Games with imperfectly observable actions in continuous time,, Econometrica, 75 (2007), 1285.  doi: 10.1111/j.1468-0262.2007.00795.x.  Google Scholar

[26]

S. Soulaimani, M. Quincampoix and S. Sorin, Repeated games and qualitative differential games: Approachability and comparison of strategies,, SIAM Journal on Control and Optimization, 48 (2009), 2461.  doi: 10.1137/090749098.  Google Scholar

[27]

M. Staudigl, Stochastic stability in asymmetric binary choice coordination games,, Games and Economic Behavior, 75 (2012), 372.  doi: 10.1016/j.geb.2011.11.003.  Google Scholar

[28]

M. Staudigl and J.-H. Steg, On repeated games with imperfect public monitoring: From discrete to continuous time,, Bielefeld University, (2014).   Google Scholar

[29]

J. Warga, Optimal Control of Differential and Functional Equations,, Academic Press, (1972).   Google Scholar

[1]

Zuliang Lu, Fei Huang, Xiankui Wu, Lin Li, Shang Liu. Convergence and quasi-optimality of $ L^2- $norms based an adaptive finite element method for nonlinear optimal control problems. Electronic Research Archive, 2020, 28 (4) : 1459-1486. doi: 10.3934/era.2020077

[2]

Hong Niu, Zhijiang Feng, Qijin Xiao, Yajun Zhang. A PID control method based on optimal control strategy. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 117-126. doi: 10.3934/naco.2020019

[3]

Lars Grüne, Matthias A. Müller, Christopher M. Kellett, Steven R. Weller. Strict dissipativity for discrete time discounted optimal control problems. Mathematical Control & Related Fields, 2020  doi: 10.3934/mcrf.2020046

[4]

Cheng He, Changzheng Qu. Global weak solutions for the two-component Novikov equation. Electronic Research Archive, 2020, 28 (4) : 1545-1562. doi: 10.3934/era.2020081

[5]

Youming Guo, Tingting Li. Optimal control strategies for an online game addiction model with low and high risk exposure. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2020347

[6]

Bo Chen, Youde Wang. Global weak solutions for Landau-Lifshitz flows and heat flows associated to micromagnetic energy functional. Communications on Pure & Applied Analysis, , () : -. doi: 10.3934/cpaa.2020268

[7]

Bernard Bonnard, Jérémy Rouot. Geometric optimal techniques to control the muscular force response to functional electrical stimulation using a non-isometric force-fatigue model. Journal of Geometric Mechanics, 2020  doi: 10.3934/jgm.2020032

[8]

Thierry Horsin, Mohamed Ali Jendoubi. On the convergence to equilibria of a sequence defined by an implicit scheme. Discrete & Continuous Dynamical Systems - S, 2020  doi: 10.3934/dcdss.2020465

[9]

Predrag S. Stanimirović, Branislav Ivanov, Haifeng Ma, Dijana Mosić. A survey of gradient methods for solving nonlinear optimization. Electronic Research Archive, 2020, 28 (4) : 1573-1624. doi: 10.3934/era.2020115

[10]

Zonghong Cao, Jie Min. Selection and impact of decision mode of encroachment and retail service in a dual-channel supply chain. Journal of Industrial & Management Optimization, 2020  doi: 10.3934/jimo.2020167

[11]

Parikshit Upadhyaya, Elias Jarlebring, Emanuel H. Rubensson. A density matrix approach to the convergence of the self-consistent field iteration. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 99-115. doi: 10.3934/naco.2020018

[12]

José Madrid, João P. G. Ramos. On optimal autocorrelation inequalities on the real line. Communications on Pure & Applied Analysis, , () : -. doi: 10.3934/cpaa.2020271

[13]

Xuefei He, Kun Wang, Liwei Xu. Efficient finite difference methods for the nonlinear Helmholtz equation in Kerr medium. Electronic Research Archive, 2020, 28 (4) : 1503-1528. doi: 10.3934/era.2020079

[14]

Xin Guo, Lei Shi. Preface of the special issue on analysis in data science: Methods and applications. Mathematical Foundations of Computing, 2020, 3 (4) : i-ii. doi: 10.3934/mfc.2020026

[15]

Hui Lv, Xing'an Wang. Dissipative control for uncertain singular markovian jump systems via hybrid impulsive control. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 127-142. doi: 10.3934/naco.2020020

[16]

Antoine Benoit. Weak well-posedness of hyperbolic boundary value problems in a strip: when instabilities do not reflect the geometry. Communications on Pure & Applied Analysis, 2020, 19 (12) : 5475-5486. doi: 10.3934/cpaa.2020248

[17]

Shenglan Xie, Maoan Han, Peng Zhu. A posteriori error estimate of weak Galerkin fem for second order elliptic problem with mixed boundary condition. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2020340

[18]

Wenbin Li, Jianliang Qian. Simultaneously recovering both domain and varying density in inverse gravimetry by efficient level-set methods. Inverse Problems & Imaging, , () : -. doi: 10.3934/ipi.2020073

[19]

Tommi Brander, Joonas Ilmavirta, Petteri Piiroinen, Teemu Tyni. Optimal recovery of a radiating source with multiple frequencies along one line. Inverse Problems & Imaging, 2020, 14 (6) : 967-983. doi: 10.3934/ipi.2020044

[20]

Awais Younus, Zoubia Dastgeer, Nudrat Ishaq, Abdul Ghaffar, Kottakkaran Sooppy Nisar, Devendra Kumar. On the observability of conformable linear time-invariant control systems. Discrete & Continuous Dynamical Systems - S, 2020  doi: 10.3934/dcdss.2020444

 Impact Factor: 

Metrics

  • PDF downloads (43)
  • HTML views (0)
  • Cited by (0)

Other articles
by authors

[Back to Top]