UAV Autonomous Air Combat Decision-making Based on AM-SAC

doi:10.12382/bgxb.2022.0669

Abstract

To address autonomous decision-making for unmanned aerial vehicles (UAVs) in modern air combat, the attention mechanism (AM) is combined with Soft Actor-Critic (SAC), a stochastic-policy algorithm in deep reinforcement learning, and a maneuver decision algorithm based on AM-SAC is proposed. For 1V1 combat scenarios, a three-degree-of-freedom UAV motion model and a close-range UAV air combat model are established, and a missile attack zone model is built from the relative distance and relative azimuth angles between the two sides. AM is introduced into SAC to construct a weight network that dynamically adjusts the reward weights during training, and simulation experiments are designed. Comparison with SAC and tests in several environments with different initial situations show that the AM-SAC-based maneuver decision algorithm converges faster and maneuvers more stably, performs better in air combat, and is applicable to a variety of combat scenarios.

Key words: unmanned aerial vehicle, air combat decision-making algorithm, Soft Actor-Critic, attention mechanism
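
The abstract names a three-degree-of-freedom UAV motion model but does not give its equations. As an illustration only, the following is a minimal sketch of common point-mass kinematics integrated with one Euler step. The axis convention, the `UavState` type, the rate commands `dv`, `dpitch`, `dheading`, and the step size `dt` are all assumptions and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class UavState:
    """Hypothetical 3-DOF point-mass state (not the paper's definition)."""
    x: float        # position, m
    y: float        # position, m
    z: float        # position, m
    v: float        # speed, m/s
    pitch: float    # pitch angle, rad
    heading: float  # heading angle, rad


def step(state: UavState, dv: float, dpitch: float, dheading: float,
         dt: float = 0.1) -> UavState:
    """Advance the state by one explicit Euler step.

    dv, dpitch and dheading are assumed rate commands
    (m/s^2, rad/s, rad/s); a real model would derive them from overloads.
    """
    # Position changes along the current velocity direction.
    x = state.x + state.v * math.cos(state.pitch) * math.cos(state.heading) * dt
    y = state.y + state.v * math.cos(state.pitch) * math.sin(state.heading) * dt
    z = state.z + state.v * math.sin(state.pitch) * dt
    # Speed and attitude angles follow the commanded rates.
    return UavState(x, y, z,
                    state.v + dv * dt,
                    state.pitch + dpitch * dt,
                    state.heading + dheading * dt)
```

In a training loop, a function like `step` would be called once per decision step for each aircraft, with the policy output mapped to the rate commands.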

CLC number:

E911
TJ85

LI Zenglin, LI Bo, BAI Shuangxia, MENG Bobo. UAV Autonomous Air Combat Decision-making Based on AM-SAC[J]. Acta Armamentarii, 2023, 44(9): 2849-2858.

Figures and Tables (14)

Fig.1 Air combat situation map

Fig.2 Diagram of missile attack zone
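
According to the abstract, the missile attack zone model is built from the relative distance and relative azimuth angle between the two sides. The check below is a rough sketch of that idea, assuming the zone is a cone around the own velocity vector bounded by a minimum and maximum range. The function names and the thresholds `d_min`, `d_max`, and `max_azimuth_deg` are placeholders, not values from the paper.

```python
import math


def _norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(c * c for c in v))


def relative_azimuth_deg(own_vel, los):
    """Angle in degrees between the own velocity and the line of sight."""
    cos_angle = sum(a * b for a, b in zip(own_vel, los)) / (_norm(own_vel) * _norm(los))
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def in_attack_zone(own_pos, own_vel, target_pos,
                   d_min=1000.0, d_max=6000.0, max_azimuth_deg=30.0):
    """Return True if the target lies inside the assumed attack zone.

    All thresholds are hypothetical; positions are in metres.
    """
    los = tuple(t - o for t, o in zip(target_pos, own_pos))
    distance = _norm(los)
    if distance == 0.0 or _norm(own_vel) == 0.0:
        return False  # geometry undefined, treat as outside the zone
    azimuth = relative_azimuth_deg(own_vel, los)
    return d_min <= distance <= d_max and azimuth <= max_azimuth_deg
```

A condition of this kind could serve as the success test of an episode, for example by requiring the target to remain in the zone for several consecutive steps.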

Fig.3 Schematic diagram of the Q, K, V model
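
The Q, K, V model is the standard query-key-value form of attention. The sketch below shows how scaled dot-product attention could turn a query into normalized weights over several reward terms, which is one plausible reading of the dynamic reward weighting described in the abstract. How the paper's weight network actually produces its queries and keys is not stated here, so the inputs `query`, `keys`, and `reward_terms` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights: softmax(q . k_i / sqrt(d))."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)


def weighted_reward(query, keys, reward_terms):
    """Combine reward terms with attention weights.

    Each key corresponds to one reward term (assumed pairing); the
    reward terms play the role of the values V.
    """
    weights = attention_weights(query, keys)
    total = sum(w * r for w, r in zip(weights, reward_terms))
    return total, weights
```

Because the weights always sum to one, changing the query shifts emphasis between reward terms without changing the overall reward scale.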

Fig.4 Structure diagram of AM-SAC algorithm

Fig.5 Flow of AM-SAC algorithm

Table 1 Initial state of test environment

Relative azimuth of enemy to own side/(°)	Relative azimuth of own side to enemy/(°)	Initial relative distance/km
96.64	171.04	7.46

Table 2 Initial position information of both sides

Side	X/km	Y/km	Z/km	Pitch angle/(°)	Heading angle/(°)	Speed/(m·s^-1)
Red	2	3.5	-3	2	50	70
Blue	-3.5	3	2	1	-40	70
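
As a quick consistency check, the initial relative distance can be computed from the two position rows of Table 2. This sketch assumes that X, Y, and Z are Cartesian coordinates in a common frame, which the table itself does not state. Under that assumption the result is about 7.45 km, close to the 7.46 km listed in Table 1; the small gap may be due to rounding of the listed coordinates.

```python
import math

# Positions in km, taken from the first and second rows of Table 2.
red = (2.0, 3.5, -3.0)
blue = (-3.5, 3.0, 2.0)


def relative_distance_km(a, b):
    """Euclidean distance between two 3D points given in km."""
    return math.dist(a, b)


d = relative_distance_km(red, blue)
print(f"{d:.2f} km")  # prints "7.45 km"
```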

Fig.6 Comparison of reward curves

Fig.7 Comparison of battle trajectories

Fig.8 Curves of state changes

Fig.9 Diagram of weight distribution changes

Table 3 Initial state of multiple environments

Environment	Relative azimuth of enemy to own side/(°)	Relative azimuth of own side to enemy/(°)	Initial relative distance/km	Own pitch angle/(°)	Speed/(m·s^-1)	Enemy pitch angle/(°)
1	96.64	171.04	7.46	2	70	1
2	105.94	8.33	6.17	2	70	1
3	13.20	174.31	7.52	2	70	1
4	75.27	26.24	7.23	2	70	1

Table 4 Training results of AM-SAC algorithm

Environment	Combat success	Steps to combat success	Maximum reward	Reward convergence episode
1	Yes	236	495.06	400
2	Yes	137	641.37	600
3	Yes	214	539.01	300
4	Yes	151	627.08	500

Fig.10 Combat trajectory diagram
