A Cooperative Guidance Law Based on Meta-learning and Reinforcement Learning for Multiple Aerial Vehicles

doi:10.12382/bgxb.2024.0568

Abstract

To address the cooperative guidance problem in which multiple hypersonic re-entry gliding vehicles must hit a target simultaneously at a specified attack angle in a complex environment, a cooperative guidance law based on meta-learning and reinforcement learning is proposed. Accounting for the interference introduced by the complex combat environment, a Markov decision model of the cooperative guidance problem is established, with the gliding vehicles' motion states as the state space and the proportional guidance coefficients as the action space. A reward function is designed that jointly considers the vehicle-target distance, the difference in remaining flight time, and the overload of the gliding vehicles attacking the target. Drawing on meta-learning theory, the proximal policy optimization (PPO) algorithm is combined with gated recurrent units (GRU) to learn the common features of similar cooperative guidance tasks. This improves the accuracy with which the guidance strategy satisfies the attack-angle and attack-time constraints under complex interference, and it also improves the adaptability of the strategy to different combat scenarios. Simulation results indicate that the proposed guidance law enables multiple aerial vehicles to attack a target simultaneously at the specified attack angle in a complex battlefield environment and to adapt quickly to new cooperative guidance tasks. The guidance law maintains good performance even when the cooperative combat scenario changes.

Key words: hypersonic re-entry gliding vehicle, cooperative guidance, meta-learning, reinforcement learning, proximal policy optimization

CLC Number: V249.31

WANG Cuncan, WANG Xiaofang, LIN Hai. A Cooperative Guidance Law Based on Meta-learning and Reinforcement Learning for Multiple Aerial Vehicles[J]. Acta Armamentarii, 2025, 46(7): 240568-.

Figures/Tables 39

Fig.1 Missile-target relative motion diagram

Fig.2 Cooperative guidance interaction process

Fig.3 Structural diagram of PPO algorithm network

Fig.4 Basic structure of GRU network

Fig.5 Structural diagram of GPPO algorithm

Fig.6 Basic structure of Actor and Critic networks

Table 1 Initial parameters of 3 missiles and the target

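Parameter	Initial value
Target initial position/m	(45000,0,0)
M₁ initial position/m	(0,30000,0)
M₁ velocity/(m·s⁻¹)	1200
M₁ initial trajectory inclination angle/(°)	0
M₁ initial trajectory deviation angle/(°)	-5
M₂ initial position/m	(0,30000,1000)
M₂ velocity/(m·s⁻¹)	1200
M₂ initial trajectory inclination angle/(°)	0
M₂ initial trajectory deviation angle/(°)	-20
M₃ initial position/m	(0,30000,-1000)
M₃ velocity/(m·s⁻¹)	1200
M₃ initial trajectory inclination angle/(°)	0
M₃ initial trajectory deviation angle/(°)	15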
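
As a rough orientation for Table 1, the initial missile-target ranges follow directly from the listed positions. The sketch below assumes the three coordinates form a Cartesian frame in metres; the variable names are hypothetical and are not taken from the paper.

```python
import math

# Initial positions from Table 1, in metres (Cartesian frame assumed).
target = (45000.0, 0.0, 0.0)
missiles = {
    "M1": (0.0, 30000.0, 0.0),
    "M2": (0.0, 30000.0, 1000.0),
    "M3": (0.0, 30000.0, -1000.0),
}

# Straight-line (Euclidean) distance from each missile to the target.
initial_range = {name: math.dist(pos, target) for name, pos in missiles.items()}

for name, r in initial_range.items():
    print(f"{name}: {r:.2f} m")
```

Under this assumption the three missiles start at nearly the same range (about 54.1 km), so the differences in attack time come mainly from their different initial trajectory deviation angles.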

Table 2 Reward function parameters

Parameter	Value	Parameter	Value
a	100	b₄	-10
b₁	-0.5	b₅	-50
b₂	30	c₁	0.1
b₃	10	c₂	-0.1

Table 3 Network structure parameters

Parameter	Actor network	Critic network
Input layer	10	10
Hidden layer	64	64
Activation function	Tanh	Tanh
GRU layer	64	64
Activation function	Tanh	Tanh
Output layer	3	1
Activation function	Tanh	-
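
A back-of-the-envelope parameter count for the layer widths in Table 3 can be sketched as follows. It assumes one fully connected hidden layer feeding a single GRU layer and a linear output layer, and it uses the common three-gate GRU parameterisation with separate input and recurrent bias vectors. The exact layer wiring and bias convention are not stated in this listing, so the totals are illustrative only.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def gru_params(n_in, n_hidden):
    """Three gates, each with input weights, recurrent weights and two bias vectors (assumed convention)."""
    per_gate = n_in * n_hidden + n_hidden * n_hidden + 2 * n_hidden
    return 3 * per_gate


def network_params(n_input, n_hidden, n_gru, n_output):
    """Input -> dense hidden -> GRU -> dense output (assumed wiring)."""
    return (
        dense_params(n_input, n_hidden)
        + gru_params(n_hidden, n_gru)
        + dense_params(n_gru, n_output)
    )


# Layer widths from Table 3.
actor_total = network_params(10, 64, 64, 3)
critic_total = network_params(10, 64, 64, 1)
print(actor_total, critic_total)
```

Under these assumptions the GRU layer accounts for most of the parameters in both networks.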

Table 4 Algorithm training hyperparameters

Parameter	Value
Learning rate	0.0003
Reward discount factor γ	0.99
GAE coefficient λ	0.95
Clipping factor ε	0.2
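
To show where the hyperparameters in Table 4 enter a standard PPO update, the following minimal sketch computes generalized advantage estimates (GAE) and the per-sample clipped surrogate term. It follows the textbook PPO formulation rather than the authors' code, and the function names are hypothetical.

```python
GAMMA = 0.99      # reward discount factor (Table 4)
LAMBDA = 0.95     # GAE coefficient (Table 4)
EPSILON = 0.2     # clipping factor (Table 4)


def gae_advantages(rewards, values, gamma=GAMMA, lam=LAMBDA):
    """Backward recursion for GAE.

    `values` must hold one more entry than `rewards`: the value estimate
    of the state reached after the last reward (0.0 if terminal).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratio, advantage, eps=EPSILON):
    """PPO clipped objective for one sample; `ratio` is pi_new / pi_old."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


adv = gae_advantages([1.0, 1.0], [0.0, 0.0, 0.0])
print(adv)
print(clipped_surrogate(1.5, 2.0))
```

The clipping keeps the probability ratio within [0.8, 1.2] whenever moving outside that band would increase the objective, which limits the size of each policy update.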

Table 5 Interference experienced by the missile

Interference	Magnitude
$d_{1i}$ (i=1,2,3)/((°)·s⁻¹)	0.1N(0,1)
$d_{2i}$ (i=1,2,3)/((°)·s⁻¹)	0.1N(0,1)
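
Reading 0.1N(0,1) in Table 5 as a standard normal sample scaled by 0.1, one draw of the interference terms for the three missiles could be generated as in the sketch below. How the terms enter the dynamics is not stated in this listing; the function names and the seeded generator are assumptions made for illustration.

```python
import random
import statistics


def sample_disturbance(rng, scale=0.1):
    """One interference sample in (deg)/s: scale * N(0, 1)."""
    return scale * rng.gauss(0.0, 1.0)


def sample_missile_disturbances(rng, n_missiles=3):
    """Draw d_1i and d_2i for i = 1..n_missiles."""
    return {
        "d1": [sample_disturbance(rng) for _ in range(n_missiles)],
        "d2": [sample_disturbance(rng) for _ in range(n_missiles)],
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
draw = sample_missile_disturbances(rng)

# Sanity check on the scaling: the sample standard deviation should be near 0.1.
check = [sample_disturbance(rng) for _ in range(20000)]
sample_mean = statistics.fmean(check)
sample_std = statistics.pstdev(check)
print(draw)
print(round(sample_mean, 4), round(sample_std, 4))
```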

Fig.7 Reward function curve of PPO guidance network

Fig.8 Missile trajectory

Fig.9 Missile-target relative distance over time

Fig.10 Remaining flight time tgo over time

Fig.11 Missile velocity V over time

Fig.12 Missile trajectory inclination angle θ over time

Fig.13 Missile trajectory deviation angle ψV over time

Fig.14 The curve of guidance coefficient variation under PPO

Fig.15 Missile longitudinal acceleration curve

Fig.16 Missile lateral acceleration curve

Table 6 Missile miss distance, attack time, and attack angle

Guidance law	Missile	Miss distance/m	Attack time/s	Attack angle/(°)
PPO strategy	M₁	0.74	63.45	-65.00
	M₂	3.42	63.46	-64.98
	M₃	2.69	63.45	-65.00
PITCG	M₁	20.85	63.47	-64.98
	M₂	30.21	65.74	-65.05
	M₃	23.57	64.17	-65.04
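
Simultaneity can be read from Table 6 as the spread between the earliest and latest attack times under each guidance law. The short sketch below computes that spread and the largest miss distance from the tabulated values; the dictionary layout is a hypothetical convenience.

```python
# Values copied from Table 6: (miss distance / m, attack time / s) for M1..M3.
table6 = {
    "PPO": [(0.74, 63.45), (3.42, 63.46), (2.69, 63.45)],
    "PITCG": [(20.85, 63.47), (30.21, 65.74), (23.57, 64.17)],
}

summary = {}
for law, rows in table6.items():
    miss = [m for m, _ in rows]
    times = [t for _, t in rows]
    summary[law] = {
        "time_spread_s": max(times) - min(times),  # latest minus earliest arrival
        "max_miss_m": max(miss),
    }

for law, stats in summary.items():
    print(f"{law}: spread {stats['time_spread_s']:.2f} s, max miss {stats['max_miss_m']:.2f} m")
```

With these numbers the attack-time spread is about 0.01 s for the PPO strategy and about 2.27 s for PITCG.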

Fig.17 Schematic diagram of target position

Fig.18 Reward function curve of GPPO guidance network

Table 7 Online performance comparison of guidance laws under case 1

Guidance law	Miss distance/m		Attack time error/s		Attack angle error/(°)	
	Mean	Standard deviation	Mean	Standard deviation	Mean	Standard deviation
PPO	6.79	1.88	0.42	0.76	0.07	0.17
GPPO	5.46	0.49	0.17	0.18	0.03	0.08

Table 8 Online performance comparison of guidance laws under case 2

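Guidance law	Miss distance/m		Attack time error/s		Attack angle error/(°)		Average number of training episodes
	Mean	Standard deviation	Mean	Standard deviation	Mean	Standard deviation	
PPO	4.89	0.89	0.07	0.04	0.03	0.06	85
GPPO	2.57	0.61	0.02	0.02	0.01	0.02	40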
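
The gap between the two guidance laws in Table 8 can be expressed as relative reductions of the mean values. The sketch below is plain arithmetic on the tabulated means; the key names are hypothetical.

```python
# Mean values copied from Table 8 (case 2).
table8 = {
    "PPO": {"miss_m": 4.89, "time_err_s": 0.07, "angle_err_deg": 0.03, "episodes": 85},
    "GPPO": {"miss_m": 2.57, "time_err_s": 0.02, "angle_err_deg": 0.01, "episodes": 40},
}


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


reductions = {
    key: relative_reduction(table8["PPO"][key], table8["GPPO"][key])
    for key in table8["PPO"]
}

for key, value in reductions.items():
    print(f"{key}: {100 * value:.1f} % lower with GPPO")
```

On these figures GPPO needs roughly half as many training episodes as PPO in case 2 and has a mean miss distance that is a little under half as large.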

Fig.19 Miss distance in Case 1

Fig.20 Attack time error in Case 1

Fig.21 Miss distance in Case 2

Fig.22 Attack time error in Case 2

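Table 9 Parameters of Scenario 1

Parameter	Value
Target position/m	(42064,0,3535)
Missile 1 trajectory deviation angle/(°)	2
Missile 2 trajectory deviation angle/(°)	-17
Missile 3 trajectory deviation angle/(°)	5
Desired attack angle/(°)	-65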

Fig.23 Cooperative guidance network reward in Scenario 1

Fig.24 Missile trajectory in Scenario 1

Fig.25 Variation curves of missile guidance coefficients in Scenario 1

Table 10 Performance parameters of PPO2 guidance law

Parameter	Mean	Standard deviation
Miss distance/m	4.28	0.78
Attack time error/s	0.05	0.03
Attack angle/(°)	-65.02	0.07
Number of training episodes	59	-

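Table 11 Parameters of Scenario 2

Parameter	Value
Target position/m	(42000,0,3500)
Missile 1 trajectory deviation angle/(°)	2
Missile 2 trajectory deviation angle/(°)	-15
Missile 3 trajectory deviation angle/(°)	10
Desired attack angle/(°)	-70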

Fig.26 Missile trajectory in Scenario 2

Fig.27 Missile trajectory inclination angle θ over time in Scenario 2

Fig.28 Missile trajectory deviation angle ψV over time in Scenario 2
