
Branching improved Deep Q Networks for solving pursuit-evasion strategy solution of spacecraft

  • * Corresponding author: Bingyan Liu, Academy of Military Science of the People's Liberation Army, Beijing 100091, PR China

  • As space rendezvous technology develops, the orbital pursuit-evasion differential game between spacecraft has attracted increasing attention. We propose a pursuit-evasion game algorithm based on branching improved Deep Q Networks to obtain a rendezvous strategy for a spacecraft and a non-cooperative target. First, we transform the optimal control problem of rendezvous between the spacecraft and the non-cooperative target into a survival-type differential game. Next, to solve this game, we construct a Nash equilibrium strategy and examine its existence and uniqueness. Then, to avoid the curse of dimensionality that Deep Q Networks face in a continuous action space, we construct a TSK fuzzy inference model to represent the continuous space. Finally, because self-learning over discrete action sets is complex and time-consuming, we improve the Deep Q Networks algorithm and propose a branching architecture with multiple groups of parallel neural networks and shared decision modules. The simulation results show that the algorithm combines optimal control with game theory and further improves the ability to learn discrete actions. The algorithm holds a comparative advantage in continuous-space action decisions, handles the continuous-space pursuit game effectively, and offers a new approach to solving spacecraft orbital pursuit-evasion strategies.
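
The pairing of a TSK fuzzy model with per-branch discrete choices described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Gaussian membership functions, the rule centers, the candidate direction angles, and every function name below are assumptions made for illustration. Each fuzzy rule is treated as one branch that greedily picks a discrete action from its own Q-values, and a zero-order TSK weighted average blends those picks into a single continuous control value.

```python
import math


def gaussian_membership(x, center, width):
    """Degree to which x belongs to a fuzzy set centered at `center` (assumed Gaussian)."""
    return math.exp(-((x - center) / width) ** 2)


def firing_strengths(state, rule_centers, width):
    """Normalized firing strength of each rule: product of per-dimension memberships."""
    raw = []
    for centers in rule_centers:
        w = 1.0
        for x, c in zip(state, centers):
            w *= gaussian_membership(x, c, width)
        raw.append(w)
    total = sum(raw)
    if total == 0.0:
        # State is far from every rule center: fall back to equal weights.
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]


def greedy_branch_actions(branch_q_values):
    """Each branch independently picks the index of its own highest Q-value."""
    return [max(range(len(q)), key=q.__getitem__) for q in branch_q_values]


def tsk_output(strengths, chosen, action_set):
    """Zero-order TSK defuzzification: firing-strength-weighted average of rule actions."""
    return sum(w * action_set[a] for w, a in zip(strengths, chosen))


# Hypothetical example: one state variable, two rules, three candidate angles (degrees).
state = [0.5]
rule_centers = [[0.0], [1.0]]
action_set = [-30.0, 0.0, 30.0]
# Stand-in Q-values; in a trained agent each list would come from one network branch.
branch_q_values = [[0.1, 0.4, 0.2], [0.0, 0.3, 0.9]]

strengths = firing_strengths(state, rule_centers, width=1.0)
chosen = greedy_branch_actions(branch_q_values)
command = tsk_output(strengths, chosen, action_set)
print(chosen, command)
```

With the state halfway between the two rule centers, both rules fire equally, so the blended command is the midpoint of the two chosen angles (15 degrees here). The point of the branching layout is that each branch only searches its own small action list, so the number of Q-values grows with the sum of the branch sizes rather than their product.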

    Mathematics Subject Classification: Primary: 58F15, 58F17; Secondary: 53C35.


  • Figure 1.  Coordinate diagram of spacecraft and non-cooperative target

    Figure 2.  Schematic diagram of direction angle of behavior control quantity

    Figure 3.  TSK fuzzy inference model of pursuit-evasion behavior

    Figure 4.  Branching Deep Q Networks architecture

    Figure 5.  Sharing behavior decision diagram based on improved Deep Q Networks

    Figure 6.  The interactive flow of pursuit-evasion game

    Figure 7.  The error function value comparison of the two algorithms

    Figure 8.  The reward value comparison of the two algorithms

    Figure 9.  The pursuit-evasion trajectory after learning 0 times

    Figure 10.  The pursuit-evasion trajectory after learning 400 times

    Figure 11.  Probability distribution of pursuit-evasion behavior

    Figure 12.  The pursuit-evasion trajectory after learning 800 times

    Table 1.  The initial state of the spacecraft and the non-cooperative target

         x (km)   y (km)   z (km)   $ \dot{x} $ (km/s)   $ \dot{y} $ (km/s)   $ \dot{z} $ (km/s)
    P    0        0        0        -0.0563              0.0418               0
    E    70       70       0        -0.0425              0.0314               0
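
As a quick check on the initial geometry in Table 1, the short Python sketch below computes the starting separation and range rate. It assumes that row P is the pursuing spacecraft and row E is the evading non-cooperative target, and that both rows are expressed in the same Cartesian frame; the variable and function names are illustrative only.

```python
import math

# Initial states from Table 1 (positions in km, velocities in km/s).
# Assumption: P is the pursuer (spacecraft), E is the evader (target).
pursuer = {"pos": (0.0, 0.0, 0.0), "vel": (-0.0563, 0.0418, 0.0)}
evader = {"pos": (70.0, 70.0, 0.0), "vel": (-0.0425, 0.0314, 0.0)}


def relative(a, b):
    """Component-wise difference b - a."""
    return tuple(y - x for x, y in zip(a, b))


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))


rel_pos = relative(pursuer["pos"], evader["pos"])  # evader relative to pursuer
rel_vel = relative(pursuer["vel"], evader["vel"])

separation = norm(rel_pos)  # km
# Range rate: projection of relative velocity onto the line of sight (km/s).
range_rate = sum(p * v for p, v in zip(rel_pos, rel_vel)) / separation

print(f"separation = {separation:.2f} km, range rate = {range_rate:.4f} km/s")
```

Under these assumptions the two vehicles start about 99.0 km apart, and the range rate is slightly positive (about 0.0024 km/s), so the gap is initially widening before any control is applied.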

    Table 2.  Experimental environment parameters

    Computing platform         Environment configuration
    CPU                        Intel Core i5-7300HQ CPU @ 2.50 GHz
    RAM                        8 GB
    System                     Windows 10
    Programming language       Python 3.6
    Compiling environment      PyCharm 2018
    Deep learning framework    TensorFlow 0.12.0