# American Institute of Mathematical Sciences

July  2016, 3(3): 261-278. doi: 10.3934/jdg.2016014

## A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces

 1 Departamento de Matemáticas, Universidad de Sonora, Rosales s/n, Col. Centro, 83000, Hermosillo, Sonora, Mexico

Received: December 2015. Revised: July 2016. Published: August 2016.

The present paper gives computable performance bounds for the approximate value iteration (AVI) algorithm when the approximation operators used satisfy the following properties: (i) they are positive linear operators; (ii) constant functions are fixed points of the operators; (iii) they satisfy a certain continuity property. Such operators define transition probabilities on the state space of the controlled system. This has two important consequences: (a) the approximating function can be viewed as the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be thought of as a perturbation of the original Markov model. These two facts allow us to give finite-time bounds on the performance of the AVI algorithm that depend on the accuracy with which the operators approximate the cost function and the transition law of the system. The results are illustrated with numerical approximations for a class of inventory systems.
Citation: Óscar Vega-Amaya, Joaquín López-Borbón. A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces. Journal of Dynamics & Games, 2016, 3 (3) : 261-278. doi: 10.3934/jdg.2016014
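To make the abstract's setup concrete, the following is a minimal illustrative sketch, not the paper's algorithm or its bounds. It runs AVI on a toy discounted inventory problem, using piecewise-linear interpolation on a coarse grid as the approximation operator: interpolation is a positive linear operator that fixes constant functions, so each approximated value is a probability-weighted average of grid values, i.e., an expectation under an induced transition probability. All names, costs, and parameters below are assumptions chosen for the example.

```python
# Hedged sketch of approximate value iteration (AVI) for a toy discounted
# inventory model.  The operator L (piecewise-linear interpolation on a
# coarse grid) is positive, linear, and fixes constants, so L f(x) is an
# average of f over grid points -- the structure the paper exploits.

BETA = 0.9                            # discount factor (assumed)
CAPACITY = 10.0                       # maximum stock level (assumed)
DEMANDS = [(2.0, 0.5), (5.0, 0.5)]    # (demand, probability) pairs (assumed)
HOLD, SHORT, ORDER = 1.0, 4.0, 2.0    # holding / shortage / ordering costs
GRID = [CAPACITY * i / 20 for i in range(21)]   # coarse state grid for L

def interp(grid, vals, x):
    """Piecewise-linear interpolation: positive linear, fixes constants."""
    if x <= grid[0]:
        return vals[0]
    if x >= grid[-1]:
        return vals[-1]
    for i in range(len(grid) - 1):
        if grid[i] <= x <= grid[i + 1]:
            t = (x - grid[i]) / (grid[i + 1] - grid[i])
            return (1 - t) * vals[i] + t * vals[i + 1]

def bellman(vals, x):
    """One-step expected cost-to-go, minimized over a finite action grid."""
    best = float("inf")
    for a in [0.0, 2.0, 4.0, 6.0]:            # admissible order quantities
        if x + a > CAPACITY:
            continue
        exp_cost = 0.0
        for d, p in DEMANDS:
            nxt = max(x + a - d, 0.0)         # next stock level
            stage = ORDER * a + HOLD * nxt + SHORT * max(d - x - a, 0.0)
            exp_cost += p * (stage + BETA * interp(GRID, vals, nxt))
        best = min(best, exp_cost)
    return best

# AVI loop: apply the Bellman operator at grid points, then re-extend
# off-grid values via the approximation operator L inside `bellman`.
v = [0.0] * len(GRID)
for _ in range(200):
    v = [bellman(v, x) for x in GRID]
```

Because the composed operator (Bellman update followed by L) is still a beta-contraction in the sup norm, the iterates converge geometrically to a fixed point near the true value function, which is the situation the paper's finite-time bounds quantify.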


