Continuous Space Pursuit-evasion Game Algorithm Based on Multi-group Parallel Deep Q Network

doi:10.3969/j.issn.1000-1093.2021.03.024

Abstract

Abstract: To solve the continuous-space pursuit-evasion game (PEG) problem, a continuous-space PEG algorithm based on multi-group parallel deep Q networks (DQN) is proposed. To avoid the curse of dimensionality that traditional reinforcement learning suffers in a continuous action space, a Takagi-Sugeno-Kang (TSK) fuzzy inference model is constructed to represent the continuous space. To address the complexity and long duration of self-learning over discrete action sets, a PEG algorithm based on multi-group parallel DQNs is designed. Taking the PEG problem of four-wheeled combat vehicles as an example, a simulation environment and a motion model are designed, and the simulated results are compared with those of the Q-learning algorithm, a reinforcement learning algorithm based on eligibility traces, and a reward-based genetic algorithm. The simulation results show that the proposed algorithm solves the continuous-space PEG problem well, that its problem-handling ability improves as the number of learning episodes increases, and that it has the comparative advantages of a shorter self-learning time and a shorter pursuit time in application.
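The abstract's idea of representing a continuous space with a TSK fuzzy model can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian membership functions, the set centers and width, the discrete steering set, and the per-rule action choice below are all assumptions made for illustration. In the multi-group parallel DQN setting, the per-rule choice would presumably come from learned Q values instead of the fixed stand-in used here.

```python
import numpy as np

def gaussian_membership(x, centers, sigma):
    """Degree to which scalar x belongs to each Gaussian fuzzy set."""
    return np.exp(-0.5 * ((x - centers) / sigma) ** 2)

def rule_strengths(state, centers, sigma):
    """Normalized firing strength of every rule (product t-norm over inputs)."""
    # One membership vector per state variable.
    memberships = [gaussian_membership(x, centers, sigma) for x in state]
    # A rule is one combination of fuzzy sets, one set per state variable.
    grid = np.meshgrid(*memberships, indexing="ij")
    strengths = np.prod(np.stack(grid), axis=0).ravel()
    return strengths / strengths.sum()

def tsk_output(strengths, rule_actions):
    """Zero-order TSK output: strength-weighted average of rule consequents."""
    return float(np.dot(strengths, rule_actions))

# Hypothetical values, chosen only for illustration.
centers = np.array([-1.0, 0.0, 1.0])      # fuzzy-set centers per state variable
sigma = 0.5                                # shared membership width
action_set = np.array([-0.5, 0.0, 0.5])   # discrete steering angles (rad)

state = np.array([0.0, 0.0])               # e.g. heading error and its rate
w = rule_strengths(state, centers, sigma)  # 3 x 3 = 9 rules

# Stand-in for per-rule action selection (assumption, not a learned policy).
rule_choice = np.arange(w.size) % action_set.size
steer = tsk_output(w, action_set[rule_choice])
```

Each rule contributes one discrete action, and the normalized firing strengths blend them into a single continuous command, which is how a finite action set can cover a continuous action space.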

Key words: pursuit-evasion game, continuous space, deep Q network, neural network, differential game, intelligent vehicle

CLC number: TP181

LIU Bingyan, YE Xiongbing, YUE Zhihong, DONG Xianzhou, ZHANG Qiyang. Continuous Space Pursuit-evasion Game Algorithm Based on Multi-group Deep Q Network[J]. Acta Armamentarii, 2021, 42(3): 663-672.



