Intelligent Penetration Policy for Hypersonic Cruise Missiles Based on Virtual Targets

doi:10.12382/bgxb.2023.1048

Abstract

To address the difficulty of constraining trajectory deviation during maneuvering penetration of hypersonic cruise missiles, and the poor generalization of penetration policies across combat scenarios, an intelligent maneuvering penetration decision algorithm based on virtual targets and a contextual Markov decision process (CMDP) is proposed. Multiple stationary virtual targets are selected within a tubular trajectory envelope whose axis is the planned trajectory, and a deep reinforcement learning algorithm decides their position parameters relative to the planned trajectory. A proportional navigation guidance law then guides the cruise missile to attack these virtual targets one by one, shaping a maneuvering trajectory within the envelope that meets the penetration requirements. Using the CMDP, the optimal penetration policy for a single combat scenario is extended to a probability distribution over combat scenarios, which improves the adaptability of the policy to different scenarios. Simulation results show that the proposed policy achieves penetration while constraining the trajectory deviation, and that it maintains good performance when the interceptor's launch position and maneuvering capability change.

Key words: hypersonic cruise missile, maneuvering penetration, virtual target, contextual Markov decision, reinforcement learning
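The trajectory-shaping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a straight planned trajectory along the x axis, a constant-speed point-mass missile, a navigation gain of 3, and an acceleration limit of 100 m/s². The names `virtual_target`, `pn_acceleration`, and `fly_through`, and all numeric values, are hypothetical. Each virtual target is placed on a cross-section of the tube by a radial distance and a circumferential angle, and proportional navigation steers the missile to each target in turn.

```python
import math

def virtual_target(x_axis, rho, phi, tube_radius):
    """Stationary virtual target on the tube cross-section at downrange x_axis.
    rho is the radial distance from the axis, phi the circumferential angle (rad)."""
    if not 0.0 <= rho <= tube_radius:
        raise ValueError("virtual target must lie inside the envelope")
    return (x_axis, rho * math.cos(phi), rho * math.sin(phi))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def pn_acceleration(pos, vel, target, nav_gain=3.0, a_max=100.0):
    """Pure proportional navigation against a stationary target (assumed form):
    a = N * (omega x v), where omega is the line-of-sight rotation vector."""
    r = [t - p for t, p in zip(target, pos)]
    r2 = sum(c * c for c in r)
    if r2 < 1e-9:                       # already at the target
        return (0.0, 0.0, 0.0)
    v_rel = [-v for v in vel]           # target velocity is zero
    omega = [c / r2 for c in cross(r, v_rel)]
    acc = [nav_gain * c for c in cross(omega, vel)]
    norm = math.sqrt(sum(c * c for c in acc))
    if norm > a_max:                    # saturate the commanded acceleration
        acc = [c * a_max / norm for c in acc]
    return tuple(acc)

def fly_through(targets, speed=2000.0, dt=0.05, max_steps=5000):
    """Euler-integrate the missile through the targets in order; switch to the
    next target once the current target's downrange position is passed."""
    pos, vel = [0.0, 0.0, 0.0], [speed, 0.0, 0.0]
    path, idx = [tuple(pos)], 0
    for _ in range(max_steps):
        if idx == len(targets):
            break
        acc = pn_acceleration(pos, vel, targets[idx])
        vel = [v + a * dt for v, a in zip(vel, acc)]
        pos = [p + v * dt for p, v in zip(pos, vel)]
        path.append(tuple(pos))
        if pos[0] >= targets[idx][0]:
            idx += 1
    return path
```

In this sketch a learned policy would supply `rho` and `phi` for each virtual target; here they are fixed by hand. Because every target lies inside the tube, the resulting path weaves around the axis while staying close to it.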

CLC number:

V249.31

LI Jiashen, WANG Xiaofang, LIN Hai. Intelligent Penetration Policy for Hypersonic Cruise Missiles Based on Virtual Targets[J]. Acta Armamentarii, 2024, 45(11): 3856-3867.

Figures/Tables (18)

Fig.1 Interceptor-cruise missile-virtual target relative motion diagram

Fig.2 Virtual target location diagram

Fig.3 Architecture of the Actor network

Fig.4 Architecture of the Critic network

Fig.5 Training methods for the Actor and Critic networks

Table 1 Hyperparameters of network architecture

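Parameter	Actor	Critic
Input layer	7, 10	7, 10
Activation function 1	ReLU	ReLU
Hidden layer 1	512	512
Activation function 2	ReLU	ReLU
Hidden layer 2	256	256
Activation function 3	ReLU	ReLU
α, β optimization layer	256	-
Activation functions 4 and 5	Softplus	-
Output layer	2	1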
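Table 1 lists Softplus activations on the α, β layer of the Actor. One common reading is that the Actor outputs the parameters of a Beta distribution for each action, but that is an inference from the table and is not stated in this section. The sketch below shows only that output stage in plain Python. The `+ 1` offset, the action bounds, and the names `beta_params` and `sample_action` are assumptions for illustration.

```python
import math
import random

def softplus(x):
    """Numerically stable log(1 + exp(x)); always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def beta_params(raw_alpha, raw_beta):
    """Map two unconstrained network outputs to Beta parameters.
    Adding 1 keeps alpha, beta > 1 so the density is unimodal (an assumption)."""
    return softplus(raw_alpha) + 1.0, softplus(raw_beta) + 1.0

def sample_action(raw_outputs, bounds, rng):
    """raw_outputs: one (raw_alpha, raw_beta) pair per action dimension.
    bounds: one (low, high) pair per action dimension.
    A Beta sample lies in [0, 1] and is rescaled to the action range."""
    action = []
    for (raw_a, raw_b), (low, high) in zip(raw_outputs, bounds):
        alpha, beta = beta_params(raw_a, raw_b)
        u = rng.betavariate(alpha, beta)
        action.append(low + (high - low) * u)
    return action

# Hypothetical bounds: radial distance in [0, 2000] m, rotation angle in [-pi, pi].
BOUNDS = [(0.0, 2000.0), (-math.pi, math.pi)]
```

A Beta-distributed policy has bounded support, so sampled actions never need clipping. That suits actions such as a radial distance that must stay inside an envelope.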

Table 2 Hyperparameters for network training

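Parameter	Value
BatchSize	140
MiniBatchSize	70
Training episodes	2000
Reward discount factor γ	0.9
GAE smoothing factor λ	0.9
Epoch	4
Policy entropy coefficient k_e	0.005
Importance sampling weight clipping factor ε	0.2
Initial learning rate	0.015
Learning rate decay milestones	20, 40, 60, 80
Learning rate decay factor	0.6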
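The values in Table 2 (GAE factor, clipping factor, epochs, mini-batches) are typical of a clipped policy-gradient update in the PPO style, although this section does not name the algorithm. The sketch below shows how γ, λ, ε and the learning-rate decay settings would enter such an update. It is a sketch under that assumption. The function names are hypothetical, and the unit of the decay milestones (episodes, updates, or something else) is not given in the table, so `step` is a placeholder.

```python
GAMMA, LAM, CLIP_EPS = 0.9, 0.9, 0.2              # from Table 2
LR0, LR_DECAY, MILESTONES = 0.015, 0.6, (20, 40, 60, 80)

def gae(rewards, values, gamma=GAMMA, lam=LAM):
    """Generalized advantage estimation.
    values has len(rewards) + 1 entries; the last one is the bootstrap value."""
    advantages, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def clipped_surrogate(ratio, advantage, eps=CLIP_EPS):
    """Per-sample clipped objective: the importance ratio is limited to
    [1 - eps, 1 + eps] and the more pessimistic of the two terms is kept."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

def learning_rate(step, lr0=LR0, decay=LR_DECAY, milestones=MILESTONES):
    """Step decay: multiply by the decay factor at every milestone passed."""
    passed = sum(1 for m in milestones if step >= m)
    return lr0 * decay ** passed
```

With these settings the learning rate falls from 0.015 to 0.015 × 0.6⁴ ≈ 0.0019 after the last milestone. The entropy coefficient k_e would weight an entropy bonus added to the objective; it is omitted here for brevity.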

Fig.6 Average cumulative reward of the policies

Fig.7 hm of the policies in each context of Ctrain

Fig.8 τm of the policies in each context of Ctrain

Fig.9 Performance comparison of four policies

Fig.10 Radial distance of virtual target

Fig.11 Rotation angle of virtual target

Fig.12 Lead angle of velocity vector for interceptor

Fig.13 Change in acceleration of cruise missile

Fig.14 Change in acceleration of interceptor

Fig.15 Maneuvering trajectories of both vehicles

Fig.16 Maneuvering trajectories of the cruise missile and the virtual targets within the envelope
