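1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China
2. Shenyang Aircraft Design and Research Institute, Shenyang 110035, Liaoning, China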
Received: 2022-08-13
Published online: 2023-07-19
Published in print: 2023-06-30
Jiandong ZHANG, Dinghan WANG, Qiming YANG, et al. Multi-Dimensional Decision-Making for UAV Air Combat Based on Hierarchical Reinforcement Learning[J]. Acta Armamentarii, 2023, 44(6): 1547-1563. DOI: 10.12382/bgxb.2022.0711.
To address the intelligent decision-making problem in UAV air combat, a multi-dimensional decision-making model for intelligent UAV air combat is established on a hierarchical reinforcement learning architecture. Autonomous air-combat decision-making is extended from single-dimension maneuver decisions to multiple dimensions, including radar on/off switching, active jamming, formation change, target detection, target tracking, jamming avoidance, and weapon selection, so that the main stages of air combat are decided autonomously. To cope with the higher state-space complexity and low learning efficiency that follow this dimension expansion, a group of meta-strategies is trained and built by combining the Soft Actor-Critic algorithm with expert experience, and the traditional Option-Critic algorithm is improved: the strategy termination function is redesigned and optimized to make strategy switching more flexible and to allow seamless switching among decision dimensions during air combat. The experimental results show that the model performs well in adversarial engagements across the multi-dimensional decisions of the whole UAV air-combat process. It can control the agent to switch flexibly among jamming, search, strike, and avoidance strategies according to the battlefield situation, improving on the performance of traditional algorithms and the efficiency of solving complex decision-making problems.
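The role of a termination function in strategy switching can be illustrated with a minimal sketch of generic call-and-return option execution. This is not the improved Option-Critic algorithm of the paper, whose termination function is not specified in the abstract. The option names are borrowed from the strategies listed above, while the sigmoid form, the option values `q_values`, the termination logits `term_logits`, and the epsilon-greedy selection are all assumptions made for illustration.

```python
import math
import random

# Hypothetical option (meta-strategy) names, taken from the abstract.
OPTIONS = ["jamming", "search", "strike", "avoidance"]


def termination_prob(logit: float) -> float:
    """Assumed sigmoid termination function beta(s, option) in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def select_option(q_values: dict, epsilon: float, rng: random.Random) -> str:
    """Epsilon-greedy policy over options, given option values for one state."""
    if rng.random() < epsilon:
        return rng.choice(OPTIONS)
    return max(OPTIONS, key=lambda o: q_values[o])


def step_option(current: str, q_values: dict, term_logits: dict,
                epsilon: float, rng: random.Random) -> str:
    """Keep the current option unless its termination function fires."""
    beta = termination_prob(term_logits[current])
    if rng.random() < beta:
        # The option terminated: hand control back to the policy over options.
        return select_option(q_values, epsilon, rng)
    return current


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative numbers only; in practice these come from learned networks.
    q = {"jamming": 0.1, "search": 0.2, "strike": 0.9, "avoidance": 0.3}
    logits = {"jamming": 0.0, "search": 2.0, "strike": -2.0, "avoidance": 0.0}
    option = "search"
    for t in range(5):
        option = step_option(option, q, logits, epsilon=0.1, rng=rng)
        print(t, option)
```

In this sketch a high termination logit makes the agent give up the current strategy quickly, and a low one makes it persist. Shaping that probability per state is what determines how flexibly the agent moves between strategies.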