Dynamic Penetration Decision of Loitering Munition Group Based on Knowledge-assisted Reinforcement Learning

doi:10.12382/bgxb.2023.0827

Abstract

Loitering munition group penetration control decision (LMGPCD) is key to improving the autonomy and intelligence of loitering munition group combat. Penetration maneuver commands for a loitering munition group are difficult to generate online in a dynamic environment containing interceptors and air defenses. To address this problem, an LMGPCD algorithm based on knowledge-assisted reinforcement learning is proposed. The state space and reward function are refined using domain knowledge and rule knowledge to improve the generalization ability and training convergence speed of the algorithm. An LMGPCD framework based on the soft actor-critic (SAC) algorithm is constructed to increase exploration efficiency. As the number of missiles and threats increases, the solution space narrows and efficient training experience becomes scarce early in training; a method that applies expert experience together with imitation learning is used to compensate for this. Experimental results show that, compared with the other algorithms tested, the proposed algorithm generates more effective penetration maneuver commands in real time in a dynamic environment, which verifies its effectiveness.

Key words: loitering munition group, knowledge-assisted deep reinforcement learning, soft actor-critic algorithm, dynamic environment penetration, control decision

CLC Number: V279

SUN Hao, LI Haiqing, LIANG Yan, MA Chaoxiong, WU Han. Dynamic Penetration Decision of Loitering Munition Group Based on Knowledge-assisted Reinforcement Learning[J]. Acta Armamentarii, 2024, 45(9): 3161-3176.

Figures/Tables 23

Fig.1 Schematic diagram of LMGPCD in dynamic environments

Fig.2 Relative motion between loitering munition and target

Table 1 Angle definition for loitering munition model

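Symbol	Meaning
$\psi_i^{M}$	Angle between the velocity direction vector of the loitering munition and the X axis
$\gamma_i^{M}$	Angle between the velocity direction vector of the loitering munition and the OXY plane
$\beta_i^{M,T}$	Angle between the X axis and the projection onto the OXY plane of the line of sight from the loitering munition to the target
$\varepsilon_i^{M,T}$	Angle between the line of sight from the loitering munition to the target and the OXY plane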
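
The two line-of-sight angles defined in Table 1 can be computed directly from positions. The following is a minimal sketch under stated assumptions: the function name `los_angles`, the tuple-based inputs, and the choice of degrees as output unit are illustrative and not taken from the paper.

```python
import math

def los_angles(munition_pos, target_pos):
    """Return (beta, epsilon) in degrees for the munition-to-target line of sight.

    beta:    angle between the X axis and the projection of the line of
             sight onto the OXY plane.
    epsilon: angle between the line of sight and the OXY plane.
    """
    dx = target_pos[0] - munition_pos[0]
    dy = target_pos[1] - munition_pos[1]
    dz = target_pos[2] - munition_pos[2]
    horizontal = math.hypot(dx, dy)  # length of the projection onto OXY
    beta = math.degrees(math.atan2(dy, dx))
    epsilon = math.degrees(math.atan2(dz, horizontal))
    return beta, epsilon
```

For example, using the Scenario 1 initial positions of loitering munition 1 and the target from Table 5, `los_angles((-3000, 0, 1000), (-1000, 9000, 1000))` gives an azimuth of about 77.5° and an elevation of 0°, since both are at the same altitude.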

Fig.3 Schematic diagram of air defense zone

Table 2 Comparison of knowledge and rules

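Corresponding knowledge	Rule name
Penetration mission domain knowledge	Penetration maneuver rule
	Mission-boundary success/failure rule
	Fuel-limit success/failure rule
Strike mission success/failure rule knowledge	Area-denial success/failure rule
	Dynamic-interception success/failure rule
	Effective-damage success/failure rule
	Target-pointing guidance preference rule
	Target-distance guidance preference rule
Combat maneuvering domain knowledge	Limited-maneuver constraint rule
	Interception-avoidance constraint rule
	Cooperative flight safety constraint rule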

Fig.4 Algorithm framework of KADRL-based LMGPCD

Fig.5 Structure diagram of strategy network

Fig.6 Structure diagram of critic networks and target networks

Fig.7 Pseudocode of LMG penetration decision algorithm

Table 3 Parameters of LMG penetration decision algorithm

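Name	Value
Optimizer	Adam
Policy network learning rate	0.001
Critic network learning rate	0.001
Replay buffer size	100000
Sampled batch size	128
Reward discount factor	0.99
Temperature coefficient	0.2
Moving-average update coefficient	0.995
Action exploration variance	0.5
Random number (seed)	0
Action bounds	[-1,1]
Policy update start step	30000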
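
The hyperparameters in Table 3 can be gathered into a single configuration object. The sketch below is illustrative only: the class name `SacConfig`, its field names, and the helper `soft_update` are hypothetical. Reading the moving-average update coefficient 0.995 as the fraction of the old target-network weights that is retained at each update, and treating the random number entry as a seed, are both assumptions, since the table does not state either.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SacConfig:
    """Values copied from Table 3; field names are illustrative."""
    optimizer: str = "Adam"
    policy_lr: float = 1e-3
    critic_lr: float = 1e-3
    replay_buffer_size: int = 100_000
    batch_size: int = 128
    discount: float = 0.99
    temperature: float = 0.2
    polyak: float = 0.995            # assumed: weight kept from the old target
    exploration_variance: float = 0.5
    seed: int = 0                    # assumed: the "random number" entry
    action_low: float = -1.0
    action_high: float = 1.0
    policy_update_start: int = 30_000

def soft_update(target_params, online_params, polyak):
    """Moving-average update of target parameters (plain lists of floats)."""
    return [polyak * t + (1.0 - polyak) * o
            for t, o in zip(target_params, online_params)]
```

With this reading, `soft_update([1.0], [0.0], SacConfig().polyak)` moves the target parameter from 1.0 to 0.995, so the target networks track the critic networks slowly.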

Table 4 Parameters of LMG penetration scenario

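Name	Value
Scenario boundary $l_x \times l_y \times l_z$ /km	10×10×2
Loitering munition speed $v_i^{M}$ /(m·s^-1)	100
Loitering munition available control overload $n_{y\max}^{M}$ /g	5
Interceptor speed $v_i^{I}$ /(m·s^-1)	200
Interceptor available control overload $n_{y\max}^{I}$ /g	3
Loitering munition cruise altitude H /m	1000
Loitering munition effective kill range R^MT /m	20
Interceptor effective kill range R^IM /m	20
Interceptor maximum operating time $t_{\max}^{I}$ /s	50
Interceptor proportional guidance law coefficient ξ	4
Loitering munition maximum operating time $t_{\max}^{M}$ /s	200
Minimum safe distance between loitering munitions R^MM /m	20
Thickness of the danger boundary of the air defense fire zone L^D-R^D /m	200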
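
To give a feel for what the overload limits in Table 4 mean geometrically, the sketch below converts speed and available overload into a minimum turn radius using the standard relation r = v²/(n·g). Treating the available overload as purely lateral acceleration at constant speed and taking g = 9.81 m/s² are simplifying assumptions that the table does not state; the function name `min_turn_radius` is hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def min_turn_radius(speed_mps, overload_g):
    """Minimum turn radius in metres for a constant-speed turn at full overload."""
    return speed_mps ** 2 / (overload_g * G)

# Speeds and overloads taken from Table 4.
munition_radius = min_turn_radius(100.0, 5.0)
interceptor_radius = min_turn_radius(200.0, 3.0)
```

Under these assumptions the loitering munition can turn within roughly 204 m, while the interceptor needs roughly 1359 m, so the munition is the more agile vehicle even though it is half as fast.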

Table 5 Initial status of typical penetration scenario 1

Name	x₀/m	y₀/m	z₀/m	v₀/(m·s^-1)	φ₀/(°)	R^D/m
Loitering munition 1	-3000	0	1000	100	90
Loitering munition 2	0	0	1000	100	90
Loitering munition 3	2000	0	1000	100	90
Target	-1000	9000	1000	0
Air defense zone 1	-4000	5500	0			1000
Air defense zone 2	-3500	4000	0			1500
Air defense zone 3	2000	5000	0			2000
Interceptor 1	-2000	8000	1000	200	-90
Interceptor 2	-1200	7500	1000	200	-90
Interceptor 3	3000	8200	1000	200	-90

Fig.8 Reward curve of Scenario 1

Fig.9 Success rate curve of LMG mission in Scenario 1

Table 6 Monte Carlo simulation results of Scenario 1

Algorithm	Mission success	Hit by interceptor	Hit obstacle zone	Boundary constraint exceeded	Time constraint exceeded	Mutual collision crash
KASAC	299	1	0	0	0	0
SAC	197	3	99	1	0	0
VAAPF	133	55	88	16	6	2
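
The counts in Table 6 can be turned into success rates with a few lines of Python. This is a minimal sketch: the dictionary layout and the function name `success_rates` are illustrative. Each row sums to 300, which suggests 300 Monte Carlo runs per algorithm; that run count is an inference from the table rather than a figure stated in it.

```python
# Outcome counts per algorithm, in the column order of Table 6:
# success, hit by interceptor, hit obstacle zone,
# boundary exceeded, time exceeded, mutual collision.
scenario1_counts = {
    "KASAC": [299, 1, 0, 0, 0, 0],
    "SAC": [197, 3, 99, 1, 0, 0],
    "VAAPF": [133, 55, 88, 16, 6, 2],
}

def success_rates(table):
    """Map each algorithm to (total runs, success rate in percent)."""
    return {name: (sum(row), 100.0 * row[0] / sum(row))
            for name, row in table.items()}

scenario1_rates = success_rates(scenario1_counts)
for name, (total, rate) in scenario1_rates.items():
    print(f"{name}: {rate:.1f}% of {total} runs")
```

This gives success rates of about 99.7% for KASAC, 65.7% for SAC, and 44.3% for VAAPF in Scenario 1.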

Fig.10 One-step calculation time for penetration strategy generation of Scenario 1

Fig.11 Typical penetration trajectory in Scenario 1

Fig.12 Analysis of LMG penetration situation in Scenario 1

Fig.13 LMG overload variation curve of Scenario 1

Table 7 Initial status of typical penetration scenario 2

Name	x₀/m	y₀/m	z₀/m	v₀/(m·s^-1)	φ₀/(°)	R^D/m
Loitering munition 1	-3000	0	1000	100	90
Loitering munition 2	-500	-500	1000	100	90
Loitering munition 3	1500	0	1000	100	90
Target	-1000	9000	1000	0
Air defense zone 1	3000	40000	0			1800
Air defense zone 2	-3500	4000	0			1600
Air defense zone 3	-1000	6000	0			1200
Interceptor 1	-4000	8000	1000	200	-90
Interceptor 2	-1800	7800	1000	200	-90
Interceptor 3	2000	8200	1000	200	-90

Table 8 Monte Carlo simulation results of Scenario 2

Algorithm	Mission success	Hit by interceptor	Hit obstacle zone	Boundary constraint exceeded	Time constraint exceeded	Mutual collision crash
KASAC	295	0	0	3	2	0
SAC	200	0	100	0	0	0
VAAPF	90	84	103	22	0	1
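
Beyond the raw success counts, Table 8 also shows how each algorithm tends to fail. The sketch below picks the most frequent failure outcome per algorithm; the outcome labels, the dictionary layout, and the function name `dominant_failure` are illustrative choices, not taken from the paper.

```python
# Column order of Table 8.
OUTCOMES = [
    "mission success",
    "hit by interceptor",
    "hit obstacle zone",
    "boundary constraint exceeded",
    "time constraint exceeded",
    "mutual collision crash",
]

scenario2_counts = {
    "KASAC": [295, 0, 0, 3, 2, 0],
    "SAC": [200, 0, 100, 0, 0, 0],
    "VAAPF": [90, 84, 103, 22, 0, 1],
}

def dominant_failure(row):
    """Return (label, count) of the most frequent non-success outcome."""
    failures = row[1:]  # drop the success column
    worst = max(range(len(failures)), key=failures.__getitem__)
    return OUTCOMES[worst + 1], failures[worst]

scenario2_failures = {name: dominant_failure(row)
                      for name, row in scenario2_counts.items()}
```

For Scenario 2 this reports that SAC and VAAPF fail mostly by hitting an obstacle zone (100 and 103 runs), whereas the few KASAC failures are boundary violations (3 runs) and time-limit violations (2 runs).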

Fig.14 Typical penetration trajectory of Scenario 2

Fig.15 LMG overload variation curve of Scenario 2
