Adaptive Terminal Guidance for Hypersonic Gliding Vehicles Using Reinforcement Learning

doi:10.12382/bgxb.2024.0222

Abstract

To address the uncertainty of dynamic model parameters in the terminal guidance phase of hypersonic gliding vehicles and the slow convergence of traditional reinforcement learning algorithms, an adaptive guidance method based on reinforcement learning is proposed. The terminal guidance problem for a hypersonic gliding vehicle under nominal conditions is converted into an optimal control problem, which is solved by sequential convex optimization to generate a dataset of state-control pairs. The dataset is fitted through supervised learning to obtain a neural network guidance model. Disturbances such as aerodynamic parameter deviation, uncertainty in the control response delay coefficient, and state measurement noise are then introduced, and the neural network guidance model is further optimized by reinforcement learning through a large number of interactions between the vehicle and the environment. Numerical simulation results indicate that the proposed guidance method is more robust and more accurate than the supervised learning guidance method.

Key words: hypersonic gliding vehicle, supervised learning, reinforcement learning, adaptive terminal guidance

CLC number: V448

XIAO Liujun, LI Yaxuan, LIU Xinfu. Adaptive Terminal Guidance for Hypersonic Gliding Vehicles Using Reinforcement Learning[J]. Acta Armamentarii, 2025, 46(2): 240222-.

Figures and tables (15)

Fig.1 Three-dimensional guidance geometry

Fig.2 Definition of input parameters for neural networks

Table 1 Neural network model structure

Layer	Number of neurons	Activation function
Input layer	5	Tanh
Hidden layer 1	24	Tanh
Hidden layer 2	18	Tanh
Output layer	2
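
The layer sizes in Table 1 can be illustrated with a minimal forward pass. This is a sketch under stated assumptions, not the authors' implementation: the weight initialization, the function names, and the choice of a linear output layer are hypothetical (Table 1 lists no activation for the output layer), and the five inputs are placeholders for the quantities defined in Fig.2. The two outputs presumably correspond to the angle of attack and bank angle compared in Fig.6.

```python
import math
import random

# Layer sizes taken from Table 1: 5 inputs, hidden layers of 24 and 18, 2 outputs.
LAYER_SIZES = [5, 24, 18, 2]


def init_params(sizes, seed=0):
    """Create small random weights and zero biases per layer (illustrative init only)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def forward(params, x):
    """Forward pass: tanh on the hidden layers, linear output layer (assumption)."""
    h = list(x)
    last = len(params) - 1
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        h = z if i == last else [math.tanh(v) for v in z]
    return h


def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)


params = init_params(LAYER_SIZES)
command = forward(params, [0.1, -0.2, 0.3, 0.0, 0.5])  # placeholder input vector
```

With these sizes the network has (5·24+24) + (24·18+18) + (18·2+2) trainable parameters, which is small enough for onboard evaluation at every guidance step.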

Table 2 Initial state parameters for terminal guidance of the vehicle

Initial state parameter	Value range
Velocity v/(m·s^-1)	(1700,1800)
Horizontal missile-target distance d/km	(-40,-30)
Flight path angle γ/(°)	(-7.5,-2.5)
Heading angle ψ/(°)	(5,15)
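
Initial conditions for dataset generation or training episodes could be drawn from the ranges in Table 2 as sketched below. The uniform distribution, the dictionary keys, and the function name are assumptions made for illustration; the source gives only the ranges.

```python
import random

# Ranges copied from Table 2; the key names are hypothetical.
INITIAL_STATE_RANGES = {
    "v_mps": (1700.0, 1800.0),     # velocity, m/s
    "d_km": (-40.0, -30.0),        # horizontal missile-target distance, km
    "gamma_deg": (-7.5, -2.5),     # flight path angle, deg
    "psi_deg": (5.0, 15.0),        # heading angle, deg
}


def sample_initial_state(rng):
    """Draw one initial state, sampling each parameter uniformly in its range (assumed)."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in INITIAL_STATE_RANGES.items()}


rng = random.Random(42)
states = [sample_initial_state(rng) for _ in range(1000)]
```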

Fig.3 Optimal trajectories generated by sequential convex optimization

Fig.4 MSE-epoch variation curve

Table 3 Description of operating conditions

Case	Control response delay/s	Aerodynamic parameter deviation/%	State measurement noise/%
Case 1	0.1	0	0
Case 2	0	10	0
Case 3	0.1	10	1
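
One way the three disturbances in Table 3 might be injected into a simulation is sketched below. The source does not give the disturbance models on this page, so everything here is an assumption: the control response delay is modeled as a first-order lag, the aerodynamic deviation as a uniformly drawn scale factor, and the measurement noise as multiplicative Gaussian noise. All function and key names are hypothetical.

```python
import random

# Values copied from Table 3; the key names are hypothetical.
CASES = {
    1: {"delay_s": 0.1, "aero_dev_pct": 0.0, "noise_pct": 0.0},
    2: {"delay_s": 0.0, "aero_dev_pct": 10.0, "noise_pct": 0.0},
    3: {"delay_s": 0.1, "aero_dev_pct": 10.0, "noise_pct": 1.0},
}


def lag_control(u_actual, u_cmd, delay_s, dt):
    """First-order lag with time constant delay_s (assumed model); no lag if delay_s is 0."""
    if delay_s <= 0.0:
        return u_cmd
    alpha = min(dt / delay_s, 1.0)
    return u_actual + alpha * (u_cmd - u_actual)


def perturb_aero(coeff_nominal, aero_dev_pct, rng):
    """Scale a coefficient by a factor drawn uniformly within +/- aero_dev_pct (assumed)."""
    frac = aero_dev_pct / 100.0
    return coeff_nominal * (1.0 + rng.uniform(-frac, frac))


def measure(state, noise_pct, rng):
    """Multiplicative Gaussian noise with std of noise_pct percent per component (assumed)."""
    sigma = noise_pct / 100.0
    return [x * (1.0 + rng.gauss(0.0, sigma)) for x in state]


rng = random.Random(0)
case = CASES[3]
lift_coeff = perturb_aero(0.5, case["aero_dev_pct"], rng)        # 0.5 is a placeholder value
noisy_state = measure([1750.0, -35.0, -5.0, 10.0], case["noise_pct"], rng)
```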

Table 4 Parameters of DDPG algorithm

Learning rate	Optimizer	Discount factor	Batch size	Replay buffer size
1×10^-4	Adam	0.99	128	1×10^6
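
The hyperparameters in Table 4 map onto the standard DDPG components roughly as follows. This is a minimal sketch assuming the usual DDPG formulation (uniform replay sampling and a one-step bootstrapped critic target); the class, function, and key names are hypothetical, and the actor, critic, and Adam update themselves are omitted.

```python
import random
from collections import deque

# Values copied from Table 4; the key names are hypothetical.
DDPG_CONFIG = {
    "learning_rate": 1e-4,
    "optimizer": "Adam",
    "discount": 0.99,
    "batch_size": 128,
    "buffer_size": 1_000_000,
}


class ReplayBuffer:
    """Fixed-capacity experience pool; the oldest transitions are discarded first."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng):
        """Draw a minibatch uniformly without replacement."""
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def td_target(reward, next_q, done, gamma):
    """Critic target y = r + gamma * Q'(s', mu'(s')), with no bootstrap at terminal states."""
    return reward + gamma * (0.0 if done else next_q)


rng = random.Random(0)
buffer = ReplayBuffer(DDPG_CONFIG["buffer_size"])
for step in range(200):  # dummy transitions, for illustration only
    buffer.add([0.0] * 5, [0.0, 0.0], -1.0, [0.0] * 5, step == 199)
batch = buffer.sample(DDPG_CONFIG["batch_size"], rng)
```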

Table 5 Miss distances of different methods (m)

Method	Case 1	Case 2	Case 3
Open-loop command guidance	154.09	669.11	869.11
Supervised learning guidance	35.86	106.27	139.02
Reinforcement learning adaptive guidance	5.62	12.78	15.45
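
The relative improvement implied by Table 5 can be checked with a few lines of arithmetic. The sketch below only reuses the tabulated miss distances; the dictionary keys and function name are hypothetical.

```python
# Miss distances in metres, copied from Table 5 (Case 1, Case 2, Case 3).
MISS_DISTANCE_M = {
    "open_loop": [154.09, 669.11, 869.11],
    "supervised": [35.86, 106.27, 139.02],
    "reinforcement": [5.62, 12.78, 15.45],
}


def reduction_pct(baseline, improved):
    """Percentage reduction of each improved value relative to its baseline."""
    return [100.0 * (b - i) / b for b, i in zip(baseline, improved)]


rl_vs_sl = reduction_pct(MISS_DISTANCE_M["supervised"], MISS_DISTANCE_M["reinforcement"])
for case, pct in enumerate(rl_vs_sl, start=1):
    print(f"Case {case}: miss distance reduced by {pct:.1f}% relative to supervised learning")
```

In all three cases the reinforcement learning guidance reduces the miss distance by roughly 84% to 89% relative to the supervised learning guidance.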

Fig.5 Comparison of trajectories in Case 3

Fig.6 Comparison of angle of attack and bank angle in Case 3

Fig.7 Comparison of altitude, velocity, flight path angle and flight heading angle in Case 3

Fig.8 Comparison of reward functions

Fig.9 Impact point distributions of two guidance methods without considering the deviation of aerodynamic parameters

Fig.10 Impact point distributions of two guidance methods considering the deviation of aerodynamic parameters
