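1. School of Automation, Shenyang Aerospace University, Shenyang 110136, Liaoning, China
2. Xi'an Kewei Industrial Development Co., Ltd., Xi'an 710000, Shaanxi, China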
*wangyu@sau.edu.cn
Received: 2024-10-21
Published online: 2025-08-28
Published in print: 2025-08-31
Yu WANG, Yuanpeng LI, Zhongyu GUO, et al. Hierarchical Decision-making for UAV Air Combat Based on DDQN-D3PG[J]. Acta Armamentarii, 2025, 46(8): 240978. DOI: 10.12382/bgxb.2024.0978.
Reinforcement learning for unmanned aerial vehicle (UAV) air combat faces two challenges: reward functions are rigid, and a single model struggles with complex tasks in a high-dimensional continuous state space. Both severely limit how well the learned decision policy generalizes in dynamic, rapidly changing situations. To address these problems, an autonomous decision-making framework is proposed that combines hierarchical and distributed architectures and integrates the double deep Q-network (DDQN) and deep deterministic policy gradient (DDPG) algorithms. Based on the differences in advantage between the two sides in different situations, a series of DDPG models with different reward-function weight combinations are designed and used to construct a bottom-level distributed DDPG (D3PG) decision-making network. The DDQN algorithm, which is well suited to discrete action spaces, is used to construct a top-level decision-making network that autonomously selects and switches to the most suitable bottom-level policy model as the situation changes in real time, so that decisions are adjusted and optimized immediately. To make the close-range red-versus-blue UAV air combat environment more realistic and challenging, a self-play mechanism is introduced into DDPG training to construct a highly intelligent enemy decision-making model. Experimental results show that the proposed algorithm achieves a win rate of up to 96% against the intelligent opponent, an improvement of more than 20% over D3PG and other baseline algorithms, and that it consistently defeats the opponent under a variety of initial situations, which verifies the effectiveness and advancement of the method.
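The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three reward terms, the weight values in `REWARD_WEIGHTS`, the linear stand-ins for the actor and Q-networks, and all function names are assumptions made for illustration, since the abstract does not give network structures, reward terms, or hyperparameters. The sketch shows (a) bottom-level models distinguished by their reward-weight combination, (b) a top-level discrete choice among them, and (c) the standard Double DQN target, in which the online network picks the next action and the target network evaluates it.

```python
import numpy as np

# Hypothetical weight combinations over three illustrative reward terms
# (for example angle, distance and altitude advantage). Assumed values.
REWARD_WEIGHTS = np.array([
    [0.6, 0.2, 0.2],
    [0.2, 0.6, 0.2],
    [0.2, 0.2, 0.6],
])


def weighted_reward(terms, weights):
    """Scalar reward for one bottom-level model: weighted sum of reward terms."""
    return float(np.dot(weights, terms))


def make_actor(state_dim, action_dim, rng):
    """Stand-in for a trained DDPG actor: a fixed deterministic map into [-1, 1]."""
    w = rng.normal(scale=0.1, size=(action_dim, state_dim))
    return lambda state: np.tanh(w @ state)


def select_policy(q_values, epsilon, rng):
    """Epsilon-greedy choice of a bottom-level model index (discrete action)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def double_dqn_target(reward, gamma, done, q_online_next, q_target_next):
    """Double DQN target: online net selects the action, target net evaluates it."""
    best = int(np.argmax(q_online_next))
    return reward + gamma * (1.0 - float(done)) * float(q_target_next[best])


def hierarchical_step(state, q_weights, actors, epsilon, rng):
    """One decision step: top level picks a model, that model outputs the action."""
    q_values = q_weights @ state  # linear stand-in for the top-level Q-network
    k = select_policy(q_values, epsilon, rng)
    return k, actors[k](state)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state_dim, action_dim = 6, 3
    actors = [make_actor(state_dim, action_dim, rng) for _ in REWARD_WEIGHTS]
    q_weights = rng.normal(size=(len(actors), state_dim))
    state = rng.normal(size=state_dim)
    k, action = hierarchical_step(state, q_weights, actors, epsilon=0.1, rng=rng)
    print("selected bottom-level model:", k, "action:", action)
```

In this sketch the top level treats "which bottom-level model to run" as its discrete action, which is why a value-based method such as DDQN fits that level while the continuous maneuver commands stay with the DDPG-style actors.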