# American Institute of Mathematical Sciences

November 2019, 2(4): 315-331. doi: 10.3934/mfc.2019020

## A Sim2real method based on DDQN for training a self-driving scale car

1. School of Information Science and Technology, North China University of Technology, Beijing 100144, China
2. State Key Laboratory of Turbulence and Complex Systems, College of Engineering, Peking University, Beijing 100871, China

* Corresponding author: Tao Du

Published December 2019

Self-driving based on deep reinforcement learning, as an important application of artificial intelligence, has become a popular research topic. Most current self-driving methods focus on directly learning an end-to-end control strategy from raw sensory data. Essentially, this control strategy can be regarded as a mapping between images and driving behavior, which usually suffers from low generalization ability. To improve the generalization of the driving behavior, reinforcement learning requires extrinsic rewards from the real environment, but real-world exploration may damage the car. To obtain good generalization ability safely, a virtual simulation environment in which different driving scenes can be constructed is designed in Unity. A theoretical model is established and analyzed in this virtual environment and trained with a double deep Q-network (DDQN). The trained model is then transferred to a scale car in the real world, a process known as sim2real. This sim2real training method efficiently addresses both problems. Simulations and experiments are carried out to evaluate the performance and effectiveness of the proposed algorithm. Finally, it is demonstrated that the scale car acquires the capability for autonomous driving in the real world.
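The core of the DDQN training described above is the decoupling of action selection from action evaluation: the online network picks the greedy next action while the target network scores it, which reduces the overestimation bias of plain deep Q-learning. The following is a minimal sketch of that target computation (function and argument names are illustrative, not from the paper):

```python
import numpy as np

def ddqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double-DQN bootstrap targets for a batch of transitions.

    next_q_online / next_q_target: arrays of shape (batch, n_actions)
    holding Q-values for the next states from the two networks.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # Online network selects the greedy next action...
    best_actions = np.argmax(np.asarray(next_q_online), axis=1)
    # ...while the target network evaluates that action.
    evaluated = np.asarray(next_q_target)[np.arange(len(rewards)), best_actions]
    # Terminal transitions (done=1) bootstrap nothing.
    return rewards + gamma * (1.0 - dones) * evaluated
```

The resulting targets are regressed against the online network's Q-values for the actions actually taken, exactly as in standard DQN; only the target construction differs.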

Citation: Qi Zhang, Tao Du, Changzheng Tian. A Sim2real method based on DDQN for training a self-driving scale car. Mathematical Foundations of Computing, 2019, 2 (4) : 315-331. doi: 10.3934/mfc.2019020
##### References:

[1] H. Abraham, C. Lee, S. Brady, C. Fitzgerald, B. Mehler, B. Reimer and J. F. Coughlin, Autonomous vehicles, trust, and driving alternatives: A survey of consumer preferences, Massachusetts Inst. Technol., AgeLab, Cambridge, (2016), 1–16.
[2] K. J. Aditya, Working model of self-driving car using Convolutional Neural Network, Raspberry Pi and Arduino, in 2018 Second International Conference on Electronics, Communication and Aerospace Technology, IEEE, 2018, 1630–1635.
[3] P. Andhare and S. Rawat, Pick and place industrial robot controller with computer vision, in 2016 International Conference on Computing Communication Control and Automation, 2016, 1–4. doi: 10.1109/ICCUBEA.2016.7860048.
[4] C. Chen, A. Seff, A. Kornhauser and J. Xiao, DeepDriving: Learning affordance for direct perception in autonomous driving, in IEEE International Conference on Computer Vision, 2015, 2722–2730. doi: 10.1109/ICCV.2015.312.
[5] Z. Chen and X. Huang, End-to-end learning for lane keeping of self-driving cars, in IEEE Intelligent Vehicles Symposium, IEEE, 2017, 1856–1860. doi: 10.1109/IVS.2017.7995975.
[6] F. Codevilla, M. Müller, A. López, V. Koltun and A. Dosovitskiy, End-to-end driving via conditional imitation learning, in IEEE International Conference on Robotics and Automation, IEEE, 2018, 4693–4700. doi: 10.1109/ICRA.2018.8460487.
[7] D. Dorr, D. Grabengiesser and F. Gauterin, Online driving style recognition using fuzzy logic, in 17th International IEEE Conference on Intelligent Transportation Systems, IEEE, 2014, 1021–1026. doi: 10.1109/ITSC.2014.6957822.
[8] X. Liang, T. Wang, L. Yang and E. Xing, CIRL: Controllable imitative reinforcement learning for vision-based self-driving, in Proceedings of the European Conference on Computer Vision, 2018, 604–620. doi: 10.1007/978-3-030-01234-2_36.
[9] L. J. Lin, Reinforcement Learning for Robots Using Neural Networks, Ph.D. thesis, Carnegie Mellon University, Pittsburgh, 1993.
[10] R. R. Meganathan, A. A. Kasi and S. Jagannath, Computer vision based novel steering angle calculation for autonomous vehicles, in 2018 Second IEEE International Conference on Robotic Computing, 2018, 143–146.
[11]
[12] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra and M. Riedmiller, Playing Atari with deep reinforcement learning, preprint, arXiv: 1312.5602.
[13] V. Mnih et al., Human-level control through deep reinforcement learning, Nature, 518 (2015), 529–533. doi: 10.1038/nature14236.
[14] C. J. Pretorius, M. C. du Plessis and J. W. Gonsalves, The transferability of evolved hexapod locomotion controllers from simulation to real hardware, in 2017 IEEE International Conference on Real-time Computing and Robotics, 2017, 567–574. doi: 10.1109/RCAR.2017.8311923.
[15] Understanding the Fatal Tesla Accident on Autopilot and the NHTSA Probe, Electrek, 2016. Available from: https://electrek.co/2016/07/01/understanding-fatal-tesla-accident-autopilot-nhtsa-probe/.
[16] M. Sadeghzadeh, D. Calvert and H. A. Abdullah, Self-learning visual servoing of robot manipulator using explanation-based fuzzy neural networks and Q-learning, Journal of Intelligent and Robotic Systems, 78 (2015), 83–104. doi: 10.1007/s10846-014-0151-5.
[17] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd edition, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, 2018.
[18]
[19]
[20] H. van Hasselt, A. Guez and D. Silver, Deep reinforcement learning with double Q-learning, in Proceedings of the AAAI Conference on Artificial Intelligence, 2016, 2094–2100.
[21] D. Wang, J. Wen, Y. Wang, X. Huang and F. Pei, End-to-end self-driving using deep neural networks with multi-auxiliary tasks, Automotive Innovation, 2 (2019), 127–136. doi: 10.1007/s42154-019-00057-1.
[22] C. J. Watkins and P. Dayan, Q-learning, Machine Learning, 8 (1992), 279–292. doi: 10.1007/BF00992698.
[23] T. Yamawaki and M. Yashima, Application of Adam to iterative learning for an in-hand manipulation task, ROMANSY 22 Robot Design, Dynamics and Control, 584 (2019), 272–279. doi: 10.1007/978-3-319-78963-7_35.

Figure 1. The reinforcement learning scale car based on DDQN.
Figure 2. A 1:16 scale car. There is an open-source DIY self-driving platform for small scale cars called donkeycar [18].
Figure 3. The process of reinforcement learning.
Figure 4. The architecture of the network.
Figure 5. Examples of raw images transferred to segmented images.
Figure 6. The learning curve of average reward versus training episode.
Figure 7. The scale car in the Unity simulation.
Figure 8. The road for the self-driving scale car, which contains two sharp curves and two gentle curves.
Figure 9. The trained self-driving scale car.
Figure 10. An obstacle added on the road; the car's angle of view is shown in the lower left of the figure.
Figure 11. Five obstacles in the left figure; three obstacles in the right figure.
Table 1. Performance of CNN and DDQN on the same road. The numbers are the times the car ran outside the road.

| Laps (condition) | CNN | DDQN |
| --- | --- | --- |
| 5 (night) | 3 | 0 |
| 10 (daylight) | 4 | 1 |
| 10 (night) | 8 | 2 |
| 15 (daylight) | 9 | 1 |
| 15 (night) | 12 | 3 |
Table 2. Performance of CNN and DDQN on the same road with obstacle(s), over five laps. The numbers are the times the car hit the obstacle(s).

| Obstacles | CNN | DDQN |
| --- | --- | --- |
| 3 | 3 | 0 |
| 5 | 4 | 0 |