doi: 10.3934/era.2021075
Online First


Accelerating reinforcement learning with a Directional-Gaussian-Smoothing evolution strategy

Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA

* Corresponding author: Guannan Zhang (zhangg@ornl.gov)

Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Received: November 2020. Revised: August 2021. Early access: September 2021.

The objective of reinforcement learning (RL) is to find an optimal strategy for solving a dynamical control problem. Evolution strategies (ES) have shown great promise in many challenging RL tasks, where the underlying dynamical system is accessible only as a black box, so that adjoint methods cannot be used. However, existing ES methods have two limitations that hinder their applicability to RL. First, most existing methods rely on Monte Carlo based gradient estimators to generate search directions. Because of the low accuracy of Monte Carlo estimators, RL training converges slowly and requires many iterations to reach the optimal solution. Second, the landscape of the reward function can be deceptive and may contain many local maxima, causing ES algorithms to converge prematurely and preventing them from exploring other parts of the parameter space with potentially greater rewards. In this work, we employ a Directional Gaussian Smoothing Evolution Strategy (DGS-ES) to accelerate RL training; the method is well suited to address these two challenges through its ability to (i) provide gradient estimates with high accuracy, and (ii) find nonlocal search directions that emphasize large-scale variation of the reward function and disregard local fluctuations. On several benchmark RL tasks reported herein, we show that the DGS-ES method is highly scalable, achieves superior wall-clock performance, and attains reward scores competitive with popular policy gradient and ES approaches.
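
To make the two ingredients above concrete, the following is a minimal sketch of a DGS gradient estimator written from this description (it is not the authors' released implementation): the reward J is smoothed along each direction of a random orthonormal basis, and each smoothed directional derivative is approximated with an M-point Gauss-Hermite quadrature rule. The helper name dgs_gradient and the defaults sigma=1.0, M=7 are illustrative choices.

```python
# A minimal sketch of a DGS gradient estimator (not the authors' released code).
import numpy as np

def dgs_gradient(J, theta, sigma=1.0, M=7, rng=None):
    """Nonlocal gradient estimate of a black-box function J at theta."""
    rng = np.random.default_rng() if rng is None else rng
    d = theta.size
    # Random orthonormal basis; its rows are the search directions xi_1..xi_d.
    Xi, _ = np.linalg.qr(rng.standard_normal((d, d)))
    # Nodes/weights of Gauss-Hermite quadrature for integrals against exp(-t^2).
    nodes, weights = np.polynomial.hermite.hermgauss(M)
    dirs = np.zeros(d)
    for i in range(d):
        xi = Xi[i]
        # Evaluate J along the line theta + sqrt(2)*sigma*t*xi at the quadrature nodes.
        vals = np.array([J(theta + np.sqrt(2.0) * sigma * t * xi) for t in nodes])
        # Derivative of the 1D Gaussian-smoothed cross-section of J along xi.
        dirs[i] = np.sum(weights * np.sqrt(2.0) * nodes * vals) / (np.sqrt(np.pi) * sigma)
    # Assemble the estimate back in the original coordinates.
    return Xi.T @ dirs
```

Because each directional derivative uses a deterministic quadrature rule rather than a Monte Carlo average, the estimate can be made accurate with a modest number of reward evaluations, and a large smoothing radius sigma makes it respond to the large-scale trend of the landscape rather than local fluctuations (compare the two panels of Figure 1 below). For a minimization problem, one would step along the negative of the returned vector.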

Citation: Jiaxin Zhang, Hoang Tran, Guannan Zhang. Accelerating reinforcement learning with a Directional-Gaussian-Smoothing evolution strategy. Electronic Research Archive, doi: 10.3934/era.2021075
References:
[1] M. Abramowitz and I. Stegun (eds.), Handbook of Mathematical Functions, Dover, New York, 1972.
[2] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, P. Abbeel and W. Zaremba, Hindsight experience replay, in Advances in Neural Information Processing Systems, (2017), 5048–5058.
[3] A. S. Berahas, L. Cao, K. Choromanski and K. Scheinberg, A theoretical and empirical comparison of gradient approximations in derivative-free optimization, arXiv: 1905.01332.
[4] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang and W. Zaremba, OpenAI Gym, arXiv preprint, arXiv: 1606.01540.
[5] K. Choromanski, A. Pacchiano, J. Parker-Holder and Y. Tang, From complexity to simplicity: Adaptive ES-active subspaces for blackbox optimization, NeurIPS.
[6] K. Choromanski, A. Pacchiano, J. Parker-Holder and Y. Tang, Provably robust blackbox optimization for reinforcement learning, arXiv: 1903.02993.
[7] E. Conti, V. Madhavan, F. P. Such, J. Lehman, K. O. Stanley and J. Clune, Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents, NIPS.
[8] E. Coumans and Y. Bai, PyBullet, a Python module for physics simulation for games, robotics and machine learning, GitHub repository.
[9] P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, Y. Wu and P. Zhokhov, OpenAI Baselines, https://github.com/openai/baselines, 2017.
[10] A. D. Flaxman, A. T. Kalai and H. B. McMahan, Online convex optimization in the bandit setting: Gradient descent without a gradient, in Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York, (2005), 385–394.
[11] S. Fujimoto, H. van Hoof and D. Meger, Addressing function approximation error in actor-critic methods, arXiv preprint, arXiv: 1802.09477.
[12] N. Hansen, The CMA evolution strategy: A comparing review, in Towards a New Evolutionary Computation, Springer, 192 (2006), 75–102. doi: 10.1007/3-540-32494-1_4.
[13] N. Hansen and A. Ostermeier, Completely derandomized self-adaptation in evolution strategies, Evolutionary Computation, 9 (2001), 159–195. doi: 10.1162/106365601750190398.
[14] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver and D. Wierstra, Continuous control with deep reinforcement learning, ICLR.
[15] N. Maheswaranathan, L. Metz, G. Tucker, D. Choi and J. Sohl-Dickstein, Guided evolutionary strategies: Augmenting random search with surrogate gradients, in Proceedings of the 36th International Conference on Machine Learning.
[16] F. Meier, A. Mujika, M. M. Gauy and A. Steger, Improving gradient estimation in evolutionary strategies with past descent directions, Optimization Foundations for Reinforcement Learning Workshop at NeurIPS.
[17] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver and K. Kavukcuoglu, Asynchronous methods for deep reinforcement learning, ICML, 1928–1937.
[18] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature, 518 (2015), 529–533. doi: 10.1038/nature14236.
[19] P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan, et al., Ray: A distributed framework for emerging AI applications, in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), (2018), 561–577.
[20] Y. Nesterov and V. Spokoiny, Random gradient-free minimization of convex functions, Found. Comput. Math., 17 (2017), 527–566. doi: 10.1007/s10208-015-9296-2.
[21] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga and A. Lerer, Automatic differentiation in PyTorch.
[22] A. Quarteroni, R. Sacco and F. Saleri, Numerical Mathematics, Texts in Applied Mathematics, 37, Springer-Verlag, Berlin, 2007. doi: 10.1007/b98885.
[23] T. Salimans, J. Ho, X. Chen, S. Sidor and I. Sutskever, Evolution strategies as a scalable alternative to reinforcement learning, arXiv preprint, arXiv: 1703.03864.
[24] J. Schulman, S. Levine, P. Abbeel, M. I. Jordan and P. Moritz, Trust region policy optimization, ICML, 1889–1897.
[25] J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov, Proximal policy optimization algorithms, arXiv preprint, arXiv: 1707.06347.
[26] F. Sehnke, C. Osendorfer, T. Rückstieß, A. Graves, J. Peters and J. Schmidhuber, Parameter-exploring policy gradients, Neural Networks, 23 (2010), 551–559. doi: 10.1016/j.neunet.2009.12.004.
[27] O. Sigaud and F. Stulp, Robot skill learning: From reinforcement learning to evolution strategies, Paladyn Journal of Behavioral Robotics, 4 (2013), 49–61.
[28] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, et al., Mastering the game of Go with deep neural networks and tree search, Nature, 529 (2016), 484–489. doi: 10.1038/nature16961.
[29] F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley and J. Clune, Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning, arXiv preprint, arXiv: 1712.06567.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, Second edition, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, 2018.
[31] K. Tian, L. Huang, Y. Sun, K. Du, P. Hao and B. Wang, Fast buckling load numerical prediction method for imperfect shells under axial compression based on POD and vibration correlation technique, Composite Structures, 252 (2020), 112721. doi: 10.1016/j.compstruct.2020.112721.
[32] K. Tian, Z. Li, L. Huang, K. Du, L. Jiang and B. Wang, Enhanced variable-fidelity surrogate-based optimization framework by Gaussian process regression and fuzzy clustering, Comput. Methods Appl. Mech. Engrg., 366 (2020), 113045, 19 pp. doi: 10.1016/j.cma.2020.113045.
[33] J. Zhang, H. Tran, D. Lu and G. Zhang, Enabling long-range exploration in minimization of multimodal functions, in Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI).

Figure 1.  Illustration of the DGS gradient direction and the local gradient direction at 50 random locations on the surface of the 2D Ackley function. (Left) The standard deviation $ \boldsymbol \sigma $ is set to 0.01, so that the DGS gradient aligns with the local gradient at most locations. (Right) The standard deviation $ \boldsymbol \sigma $ is set to 1.0, so that most DGS gradients point toward the global minimum at $ (0,0) $
Figure 2.  Illustration of the dimension dependence of the convergence rate of the DGS-ES method. The convergence rate, i.e., the number of iterations needed to converge, is independent of the dimension for convex functions such as the Sphere function, and this property empirically carries over to the non-convex Lévy and Rastrigin functions
Figure 3.  Comparison of different blackbox optimization methods on four 2000-dimensional benchmark functions
Figure 4.  Comparison between DGS-ES and two baselines, i.e., ES and ASEBO, on three problems from OpenAI Gym. The colored curves show the average return over 5 repeated runs with different random seeds, and the shaded regions span the interval between the maximum and minimum returns
Figure 5.  Comparison between the DGS-ES method and the baselines, i.e., ES, ASEBO, PPO, TRPO, DDPG and TD3, on three problems from the PyBullet library. The colored curves show the average return over 20 runs with different random seeds, and the shaded regions span the interval between the maximal and minimal rewards
Figure 6.  Illustration of the effect of the radius $ \boldsymbol \sigma $ of $ \widetilde{\nabla}^M_{\boldsymbol \sigma, \boldsymbol \Xi}[J](\boldsymbol \theta) $ on the performance of the DGS-ES method in solving the Reacher-v0 problem
Figure 7.  Illustration and design optimization of hierarchical stiffened shell structures in aerospace engineering, e.g., rockets
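
As context for the RL benchmarks in Figures 4-6, the sketch below shows one common way to expose an episode return as the black-box objective $ J(\boldsymbol \theta) $ that an ES-type method maximizes. It is a hypothetical setup, not the paper's experimental code: it assumes the classic OpenAI Gym reset/step interface and a deterministic linear policy, and the environment id "Pendulum-v0", horizon, step size, and iteration count are placeholders chosen only for illustration. The dgs_gradient helper refers to the sketch given after the abstract.

```python
# A hypothetical sketch of wrapping an RL rollout as a black-box objective and
# training it with the DGS gradient (classic Gym API and linear policy assumed).
import gym
import numpy as np

def make_objective(env_id="Pendulum-v0", horizon=1000):
    env = gym.make(env_id)
    obs_dim = env.observation_space.shape[0]
    act_dim = env.action_space.shape[0]

    def J(theta):
        # Flat parameter vector -> linear policy mapping observations to actions.
        W = theta.reshape(act_dim, obs_dim)
        obs = env.reset()
        total = 0.0
        for _ in range(horizon):
            action = np.clip(W @ obs, env.action_space.low, env.action_space.high)
            obs, reward, done, _ = env.step(action)
            total += reward
            if done:
                break
        return total

    return J, act_dim * obs_dim

J, dim = make_objective()
theta = np.zeros(dim)
for _ in range(100):
    # Ascent step along the nonlocal DGS gradient of the episode return.
    theta = theta + 0.05 * dgs_gradient(J, theta, sigma=1.0, M=7)
```

The reward evaluations performed inside dgs_gradient are mutually independent, so in a full implementation they can be distributed across parallel workers, which is what makes the approach amenable to the scalable, wall-clock-efficient training highlighted in the abstract.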