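1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, Shaanxi, China
2. Xi'an Modern Control Technology Research Institute, Xi'an 710065, Shaanxi, China
*Email: libo803@nwpu.edu.cn
Received: 2022-07-25
Published online: 2023-09-25
Published in print: 2023-09-20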
Zenglin LI, Bo LI, Shuangxia BAI, et al. UAV Autonomous Air Combat Decision-making Based on AM-SAC[J]. Acta Armamentarii, 2023, 44(9): 2849-2858. DOI: 10.12382/bgxb.2022.0669.
To address autonomous decision-making for unmanned aerial vehicles (UAVs) in modern air combat, a maneuver decision algorithm based on AM-SAC is proposed, which combines the attention mechanism (AM) with Soft Actor-Critic (SAC), a stochastic-policy algorithm in deep reinforcement learning. For a 1V1 combat scenario, a three-degree-of-freedom UAV motion model and a close-range UAV air combat model are established, and a missile attack zone model is built from the relative distance and relative azimuth angle between the two sides. The attention mechanism is introduced into the SAC algorithm to construct a weight network that dynamically adjusts the reward weights during training, and simulation experiments are designed. Comparison with the SAC algorithm and tests under multiple different initial situations verify that the AM-SAC-based maneuver decision algorithm converges faster and maneuvers more stably, performs better in air combat, and is applicable to a variety of combat scenarios.
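The two ideas named in the abstract, an attack zone defined by relative distance and relative azimuth, and a weight network that rescales reward terms during training, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the distance and angle thresholds, the two-element state, and the linear scoring matrix standing in for a trained weight network are all assumptions made for illustration.

```python
import math


def in_attack_zone(distance_m, azimuth_rad,
                   d_min=1000.0, d_max=6000.0,
                   max_azimuth_rad=math.radians(30.0)):
    """Hypothetical attack-zone test: the target must lie inside a
    distance band and inside an angular cone around the nose.
    All threshold values are illustrative assumptions."""
    return d_min <= distance_m <= d_max and abs(azimuth_rad) <= max_azimuth_rad


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(state, score_matrix):
    """One score per reward term (dot product of the state with one
    row of score_matrix), normalized with softmax. The matrix stands
    in for a trained weight network (assumption)."""
    scores = [sum(w * x for w, x in zip(row, state)) for row in score_matrix]
    return softmax(scores)


def weighted_reward(state, reward_terms, score_matrix):
    """Combine per-term rewards using state-dependent weights."""
    weights = attention_weights(state, score_matrix)
    total = sum(w * r for w, r in zip(weights, reward_terms))
    return total, weights


if __name__ == "__main__":
    state = [0.4, -0.2]            # assumed: normalized distance, normalized azimuth
    reward_terms = [0.5, 1.0]      # assumed: distance reward, angle reward
    score_matrix = [[1.0, 0.0],    # assumed parameters of the scoring layer
                    [0.0, 1.0]]
    total, weights = weighted_reward(state, reward_terms, score_matrix)
    print("weights:", weights, "total reward:", total)
    print("in zone:", in_attack_zone(3000.0, 0.1))
```

Because the weights come from a softmax, they always sum to one, so the combined reward stays on the same scale as the individual terms while the emphasis shifts with the state. In an actual training loop the scoring parameters would be learned rather than fixed.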