[1] A. Almudevar, Approximate fixed point iteration with an application to infinite horizon Markov decision processes, SIAM Journal on Control and Optimization, 47 (2008), 2303-2347. doi: 10.1137/S0363012904441520.
[2] E. F. Arruda, M. D. Fragoso and J. B. R. do Val, Approximate dynamic programming via direct search in the space of value function approximations, European Journal of Operational Research, 211 (2011), 343-351. doi: 10.1016/j.ejor.2010.11.019.
[3] D. P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall, Englewood Cliffs, NJ, 1987.
[4] D. P. Bertsekas, Approximate policy iteration: A survey and some new methods, Journal of Control Theory and Applications, 9 (2011), 310-335. doi: 10.1007/s11768-011-1005-3.
[5] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.
[6] L. Beutel, H. Gonska and D. Kacsó, On variation-diminishing Schoenberg operators: New quantitative statements, in Multivariate Approximation and Interpolation with Applications (ed. M. Gasca), Monografías de la Academia de Ciencias de Zaragoza, 20 (2002), 9-58.
[7] R. A. DeVore, The Approximation of Continuous Functions by Positive Linear Operators, Lecture Notes in Mathematics 293, Springer-Verlag, Berlin-Heidelberg, 1972.
[8] F. Dufour and T. Prieto-Rumeau, Approximation of infinite horizon discounted cost Markov decision processes, in Optimization, Control, and Applications of Stochastic Systems: In Honor of Onésimo Hernández-Lerma (eds. D. Hernández-Hernández and J. A. Minjárez-Sosa), Birkhäuser, (2012), 59-76. doi: 10.1007/978-0-8176-8337-5_4.
[9] D. P. de Farias and B. Van Roy, On the existence of fixed points for approximate value iteration and temporal difference learning, Journal of Optimization Theory and Applications, 105 (2000), 589-608. doi: 10.1023/A:1004641123405.
[10] G. J. Gordon, Stable function approximation in dynamic programming, in Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, CA, 1995, 261-268. doi: 10.1016/B978-1-55860-377-6.50040-2.
[11] O. Hernández-Lerma, Adaptive Markov Control Processes, Springer-Verlag, NY, 1989. doi: 10.1007/978-1-4419-8714-3.
[12] O. Hernández-Lerma and J. B. Lasserre, Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, NY, 1996. doi: 10.1007/978-1-4612-0729-0.
[13] O. Hernández-Lerma and J. B. Lasserre, Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, NY, 1999. doi: 10.1007/978-1-4612-0561-6.
[14] D. R. Jiang and W. B. Powell, An approximate dynamic programming algorithm for monotone value functions, Operations Research, 63 (2015), 1489-1511. doi: 10.1287/opre.2015.1425.
[15] R. Munos, Performance bounds in $L_p$-norm for approximate value iteration, SIAM Journal on Control and Optimization, 46 (2007), 541-561. doi: 10.1137/040614384.
[16] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, John Wiley & Sons Inc., 2007. doi: 10.1002/9780470182963.
[17] W. B. Powell and J. Ma, A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications, Journal of Control Theory and Applications, 9 (2011), 336-352. doi: 10.1007/s11768-011-0313-y.
[18] W. B. Powell, Perspectives of approximate dynamic programming, Annals of Operations Research, 241 (2016), 319-356. doi: 10.1007/s10479-012-1077-6.
[19] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, NY, 1994.
[20] J. Rust, Numerical dynamic programming in economics, in Handbook of Computational Economics (eds. H. Amman, D. Kendrick and J. Rust), Elsevier, 1 (1996), 619-729.
[21] J. Stachurski, Continuous state dynamic programming via nonexpansive approximation, Computational Economics, 31 (2008), 141-160. doi: 10.1007/s10614-007-9111-5.
[22] J. Stachurski, Economic Dynamics: Theory and Computation, MIT Press, Cambridge, MA, 2009.
[23] S. Stidham Jr. and R. Weber, A survey of Markov decision models for control of networks of queues, Queueing Systems, 13 (1993), 291-314. doi: 10.1007/BF01158935.
[24] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[25] B. Van Roy, Performance loss bounds for approximate value iteration with state space aggregation, Mathematics of Operations Research, 31 (2006), 234-244. doi: 10.1287/moor.1060.0188.
[26] O. Vega-Amaya and R. Montes-de-Oca, Application of average dynamic programming to inventory systems, Mathematical Methods of Operations Research, 47 (1998), 451-471. doi: 10.1007/BF01198405.
[27] D. J. White, Real applications of Markov decision processes, Interfaces, 15 (1985), 73-83. doi: 10.1287/inte.15.6.73.
[28] D. J. White, Further real applications of Markov decision processes, Interfaces, 18 (1988), 55-61. doi: 10.1287/inte.18.5.55.
[29] D. J. White, A survey of applications of Markov decision processes, The Journal of the Operational Research Society, 44 (1993), 1073-1096.
[30] S. Yakowitz, Dynamic programming applications in water resources, Water Resources Research, 18 (1982), 673-696. doi: 10.1029/WR018i004p00673.
[31] W. W.-G. Yeh, Reservoir management and operation models: A state-of-the-art review, Water Resources Research, 21 (1985), 1797-1818. doi: 10.1029/WR021i012p01797.