
Neural network approaches for parameterized optimal control

  • *Corresponding author: Deepanshu Verma
Abstract
  • We consider numerical approaches for deterministic, finite-dimensional optimal control problems whose dynamics depend on unknown or uncertain parameters. We seek to amortize the solution over a set of relevant parameters in an offline stage to enable rapid decision-making and the ability to react to changes in the parameter in the online stage. To tackle the curse of dimensionality arising when the state and/or parameter are high-dimensional, we represent the policy using neural networks. We compare two training paradigms: first, our model-based approach leverages the dynamics and the definition of the objective function to learn the value function of the parameterized optimal control problem and obtain the policy using a feedback form (a generic sketch of the feedback form appears below); second, we use actor-critic reinforcement learning to approximate the policy in a data-driven way. Using an example involving a two-dimensional convection-diffusion equation, which features high-dimensional state and parameter spaces, we investigate the accuracy and efficiency of both training paradigms. While both paradigms lead to a reasonable approximation of the policy, the model-based approach is more accurate and considerably reduces the number of PDE solves.

    Mathematics Subject Classification: Primary: 35F21, 49M41, 68T07.

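As context for the model-based paradigm described in the abstract, the feedback form can be sketched in a generic Hamilton-Jacobi-Bellman setting. The running cost $ L $, dynamics $ f $, control operator $ g $, and value function $ \Phi $ below are assumed notation for illustration and are not taken from the paper; the state $ \boldsymbol{z}(s) $, parameter $ \boldsymbol{y} $, and time $ s $ follow the figure captions.

\begin{equation}
\begin{aligned}
-\partial_s \Phi(s, \boldsymbol{z}; \boldsymbol{y}) &= \min_{\boldsymbol{u}} \Big\{ L(s, \boldsymbol{z}, \boldsymbol{u}; \boldsymbol{y}) + \nabla_{\boldsymbol{z}} \Phi(s, \boldsymbol{z}; \boldsymbol{y})^{\top} f(s, \boldsymbol{z}, \boldsymbol{u}; \boldsymbol{y}) \Big\}, \\
\boldsymbol{u}^{*}(s, \boldsymbol{z}; \boldsymbol{y}) &= \operatorname*{arg\,min}_{\boldsymbol{u}} \Big\{ L(s, \boldsymbol{z}, \boldsymbol{u}; \boldsymbol{y}) + \nabla_{\boldsymbol{z}} \Phi(s, \boldsymbol{z}; \boldsymbol{y})^{\top} f(s, \boldsymbol{z}, \boldsymbol{u}; \boldsymbol{y}) \Big\}.
\end{aligned}
\end{equation}

Once a network approximates $ \Phi $ offline over the relevant parameters $ \boldsymbol{y} $, the second line yields the control online by a pointwise minimization; for a quadratic control cost and control-affine dynamics $ f = f_0 + g\,\boldsymbol{u} $, this reduces to the closed form $ \boldsymbol{u}^{*} = -g(s, \boldsymbol{z}; \boldsymbol{y})^{\top} \nabla_{\boldsymbol{z}} \Phi(s, \boldsymbol{z}; \boldsymbol{y}) $.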
  • Figure 1.  Network architectures for the actor (left) and critic (right) components of the RL models. Both networks receive input arrays containing the values of $ \boldsymbol{z}(s) $, $ \boldsymbol{y} $, and $ s $. Convolutional and max-pooling layers (blue) process the data received from the PDE environment to extract features, which are then flattened and passed to dense layers (grey) to form the position, variance, and value predictions (a schematic code sketch of this layout follows the figure list below)

    Figure 2.  Example evolution of advection-diffusion system

    Figure 3.  Horizontal problem setup: (left) Validation loss during training and (right) number of PDE solves required for different target accuracies of control objective

    Figure 4.  Sinusoidal problem setup: (left) Validation loss during training and (right) number of PDE solves required for different target accuracies of control objective

    Figure 5.  Suboptimality, relative to the baseline, on validation problems for the horizontal (left column) and sinusoidal (right column) problem setups
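
Below is a minimal PyTorch-style sketch of the layout described in the Figure 1 caption: a convolution/max-pooling feature extractor for the discretized state $ \boldsymbol{z}(s) $, with the parameter $ \boldsymbol{y} $ and time $ s $ appended before dense heads that output the position, variance, and value predictions. The layer sizes, channel counts, and class names (ConvFeatures, Actor, Critic) are illustrative assumptions and do not reproduce the paper's implementation.

# A hedged PyTorch sketch of the actor/critic layout in Figure 1. Grid resolution,
# channel counts, layer widths, action dimension, and all class names are
# illustrative assumptions, not the paper's actual hyperparameters or code.
import torch
import torch.nn as nn


class ConvFeatures(nn.Module):
    """Convolution + max-pooling stack mapping the discretized state field to a flat feature vector."""

    def __init__(self, in_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
        )

    def forward(self, field):
        # field: batch of discretized states z(s), shape (batch, channels, height, width)
        return self.net(field)


class Actor(nn.Module):
    """Predicts the mean ('position') and log-variance of a Gaussian policy."""

    def __init__(self, feat_dim, aux_dim, action_dim):
        super().__init__()
        self.features = ConvFeatures()
        self.head = nn.Sequential(
            nn.Linear(feat_dim + aux_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * action_dim),  # mean and log-variance per action component
        )

    def forward(self, field, aux):
        # aux: parameter y and time s, concatenated into a flat vector per sample
        h = torch.cat([self.features(field), aux], dim=-1)
        mean, log_var = self.head(h).chunk(2, dim=-1)
        return mean, log_var


class Critic(nn.Module):
    """Predicts a scalar value estimate from the same inputs as the actor."""

    def __init__(self, feat_dim, aux_dim):
        super().__init__()
        self.features = ConvFeatures()
        self.head = nn.Sequential(
            nn.Linear(feat_dim + aux_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, field, aux):
        h = torch.cat([self.features(field), aux], dim=-1)
        return self.head(h)

Here feat_dim must match the flattened output of ConvFeatures for the chosen grid resolution (for example, 32 * 8 * 8 = 2048 for a 32-by-32 grid after two pooling stages). During RL rollouts, a Gaussian policy would sample actions from the predicted mean and exp(log_var); the specific distribution and training algorithm are not fixed by this sketch.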
