Continuous Space Pursuit-evasion Game Algorithm Based on Multi-group Deep Q Network

doi:10.3969/j.issn.1000-1093.2021.03.024

Abstract

A continuous-space pursuit-evasion game (PEG) algorithm based on multi-group deep reinforcement learning is proposed to solve pursuit-evasion problems in continuous space. To avoid the curse of dimensionality that continuous spaces impose on traditional reinforcement learning, a Takagi-Sugeno-Kang (TSK) fuzzy inference model is established to represent the continuous space. To address the complexity and long training time of self-learning over discrete actions, a PEG algorithm based on multi-group deep reinforcement learning is designed. Taking the PEG problem of a four-wheel vehicle as an example, a simulation environment and a motion model were designed, and simulation experiments were carried out in comparison with the Q-learning algorithm, a reinforcement learning algorithm based on eligibility traces, and a reward-based genetic algorithm. The simulation results show that the proposed algorithm solves the continuous-space PEG problem well, that its performance keeps improving as the number of learning iterations increases, and that it has the comparative advantages of shorter self-learning time and shorter application time.
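The way a TSK fuzzy model can turn a finite set of rule outputs into a continuous control value may be easier to see in a small example. The following is a minimal sketch of generic zero-order TSK inference, not the authors' implementation: the triangular membership functions, the three-set partition `FUZZY_SETS`, the rule outputs `CONSEQUENTS`, and the interpretation of the input as a heading error are all assumptions made for illustration.

```python
def triangular(x, left, center, right):
    """Membership degree of x in a triangular fuzzy set peaking at center."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


def tsk_output(x, fuzzy_sets, consequents):
    """Zero-order TSK inference: firing-strength-weighted average of rule outputs."""
    weights = [triangular(x, *fs) for fs in fuzzy_sets]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("input lies outside every fuzzy set")
    return sum(w * c for w, c in zip(weights, consequents)) / total


# Hypothetical partition of a heading error (radians) into three fuzzy sets,
# each given as (left, center, right).
FUZZY_SETS = [(-2.0, -1.0, 0.0), (-1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]

# Hypothetical discrete rule outputs (steering commands), one per fuzzy set.
# In a learning setting these would be the values selected by the learner.
CONSEQUENTS = [-0.5, 0.0, 0.5]

steer = tsk_output(0.5, FUZZY_SETS, CONSEQUENTS)
print(steer)
```

Because neighbouring fuzzy sets overlap, an input between two rule centers yields a blend of their outputs, so a small discrete set of rule outputs produces a smoothly varying command.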

Key words: pursuit-evasion game, continuous space, deep Q network, neural network, differential game, intelligent vehicle

CLC Number: TP181

LIU Bingyan, YE Xiongbing, YUE Zhihong, DONG Xianzhou, ZHANG Qiyang. Continuous Space Pursuit-evasion Game Algorithm Based on Multi-group Deep Q Network[J]. Acta Armamentarii, 2021, 42(3): 663-672.



