Acta Armamentarii ›› 2024, Vol. 45 ›› Issue (11): 3856-3867. doi: 10.12382/bgxb.2023.1048

Intelligent Penetration Policy for Hypersonic Cruise Missiles Based on Virtual Targets

LI Jiashen, WANG Xiaofang*, LIN Hai
Received: 2024-01-26
Online: 2024-01-26
Contact: WANG Xiaofang
LI Jiashen, WANG Xiaofang, LIN Hai. Intelligent Penetration Policy for Hypersonic Cruise Missiles Based on Virtual Targets[J]. Acta Armamentarii, 2024, 45(11): 3856-3867.
| Parameter | Actor | Critic |
| --- | --- | --- |
| Input layer | 7, 10 | 7, 10 |
| Activation function 1 | ReLU | ReLU |
| Hidden layer 1 | 512 | 512 |
| Activation function 2 | ReLU | ReLU |
| Hidden layer 2 | 256 | 256 |
| Activation function 3 | ReLU | ReLU |
| α/β optimization layer | 256 | — |
| Activation functions 4 and 5 | Softplus | — |
| Output layer | 2 | 1 |

Table 1 Hyperparameters of network architecture
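For readers implementing a comparable policy, the following is a minimal PyTorch sketch of an actor-critic pair consistent with Table 1. It assumes a Beta-distribution policy, as suggested by the two Softplus heads that produce α and β; the exact wiring of the 256-unit α/β layer, the choice between the 7- and 10-dimensional state input, and the +1 offset on the Softplus outputs are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn
from torch.distributions import Beta


class Actor(nn.Module):
    """Policy network per Table 1: state -> 512 -> 256 -> 256 -> (alpha, beta)."""

    def __init__(self, state_dim: int = 7):  # 7 or 10 per Table 1
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),   # alpha/beta optimization layer (256 units)
        )
        self.alpha_head = nn.Linear(256, 1)   # output layer: 2 values in total (alpha, beta)
        self.beta_head = nn.Linear(256, 1)
        self.softplus = nn.Softplus()         # activation functions 4 and 5

    def forward(self, state: torch.Tensor) -> Beta:
        h = self.backbone(state)
        # Assumption: +1 keeps both concentrations above 1 so the Beta density stays unimodal.
        alpha = self.softplus(self.alpha_head(h)) + 1.0
        beta = self.softplus(self.beta_head(h)) + 1.0
        return Beta(alpha, beta)


class Critic(nn.Module):
    """Value network per Table 1: state -> 512 -> 256 -> scalar value."""

    def __init__(self, state_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```

A state batch of shape (N, 7) then yields a Beta distribution whose samples lie in [0, 1] and can be rescaled to the actual guidance-command range.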
| Parameter | Value |
| --- | --- |
| Batch size | 140 |
| Mini-batch size | 70 |
| Number of training episodes | 2000 |
| Reward discount factor γ | 0.9 |
| GAE smoothing factor λ | 0.9 |
| Epochs | 4 |
| Policy entropy coefficient k_e | 0.005 |
| Importance-sampling weight clipping factor ε | 0.2 |
| Initial learning rate | 0.015 |
| Learning rate decay milestones | 20, 40, 60, 80 |
| Learning rate decay factor | 0.6 |

Table 2 Hyperparameters for network training
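Table 2 lists PPO-style training hyperparameters (clipping factor ε, GAE λ, entropy coefficient). The sketch below collects them in a configuration object and shows the generic PPO clipped surrogate with an entropy bonus, together with a MultiStepLR schedule using the listed milestones and decay factor. Whether the milestones count episodes or policy updates is an assumption, and `clipped_surrogate_loss` is a hypothetical helper that illustrates the standard form of the objective, not the paper's exact loss.

```python
from dataclasses import dataclass, field
from typing import List

import torch


@dataclass
class PPOConfig:
    """Training hyperparameters transcribed from Table 2."""
    batch_size: int = 140
    mini_batch_size: int = 70
    num_episodes: int = 2000
    gamma: float = 0.9            # reward discount factor
    gae_lambda: float = 0.9       # GAE smoothing factor
    ppo_epochs: int = 4
    entropy_coef: float = 0.005   # policy entropy coefficient k_e
    clip_eps: float = 0.2         # importance-sampling weight clipping factor
    lr_init: float = 0.015
    lr_milestones: List[int] = field(default_factory=lambda: [20, 40, 60, 80])
    lr_decay: float = 0.6


def clipped_surrogate_loss(log_prob_new, log_prob_old, advantage, entropy, cfg: PPOConfig):
    """Generic PPO-Clip objective with an entropy bonus (standard form, not the paper's exact loss)."""
    ratio = torch.exp(log_prob_new - log_prob_old)  # importance-sampling weight
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - cfg.clip_eps, 1.0 + cfg.clip_eps) * advantage
    return -(torch.min(unclipped, clipped).mean() + cfg.entropy_coef * entropy.mean())


# Learning-rate schedule: multiply by 0.6 at the listed milestones.
cfg = PPOConfig()
actor = torch.nn.Linear(7, 2)  # placeholder module for illustration only
optimizer = torch.optim.Adam(actor.parameters(), lr=cfg.lr_init)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=cfg.lr_milestones, gamma=cfg.lr_decay)
```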