基于DQN变动力智能决策的轨迹规划

doi:10.12382/bgxb.2023.1009

摘要/Abstract

摘要：

针对航天飞行器气动力不足难以维持应急侧向操纵确保安全避开障碍物的问题,提出一种基于深度Q学习网络(Deep Q-learning Network, DQN)变动力智能决策的轨迹规划方法。根据变动力航天飞行器运动学方程,设计基于航程误差的纵向制导律和考虑避开障碍物的横侧向制导律,用于实时校正倾侧角的幅值和符号,保证终端制导精度和绕飞安全性。从变动力智能决策层面出发,将航天飞行器动力档位调节问题转化为马尔可夫决策过程,以攻角、马赫数以及航天飞行器与障碍物的相对距离为状态空间,以航天飞行器动力档位为动作空间,设计考虑碰撞概率和终端约束偏差的奖励函数,构建DQN网络对智能体进行训练,以得到最佳动力档位。仿真结果表明,所提算法可以赋能航天飞行器在满足终端约束条件下提升运动过程的横向避障能力。

关键词: 航天飞行器, 深度Q学习网络, 变动力, 智能决策, 轨迹规划

Abstract:

Aerospace craft with insufficient aerodynamic force have difficulty sustaining the emergency lateral maneuvers needed to avoid obstacles safely. To address this problem, a trajectory planning method based on variable-power intelligent decision-making with a deep Q-learning network (DQN) is proposed. Based on the kinematic equations of the variable-power aerospace craft, a longitudinal guidance law based on range error and a lateral guidance law based on line-of-sight angle deviation are designed to correct the magnitude and the sign of the bank angle, respectively, in real time, which ensures terminal guidance accuracy and safe obstacle avoidance. At the decision-making level, the power gear adjustment problem of the aerospace craft is formulated as a Markov decision process: the angle of attack, the Mach number, and the relative distance between the craft and the obstacle form the state space, and the power gear of the craft forms the action space. A reward function that accounts for collision probability and terminal constraint deviation is designed, and a DQN is constructed to train the agent to obtain the optimal power gear. Simulation results show that the proposed algorithm improves the lateral obstacle-avoidance capability of the aerospace craft during flight while satisfying the terminal constraints.

Key words: aerospace craft, deep Q-learning network, variable power, intelligent decision-making, trajectory planning

Chinese Library Classification (CLC) number:

V448.235

梅泽伟, 李天任, 朱佳琳, 邵星灵, 丁天雲, 刘俊. 基于DQN变动力智能决策的轨迹规划[J]. 兵工学报, 2024, 45(12): 4395-4406.

MEI Zewei, LI Tianren, ZHU Jialin, SHAO Xingling, DING Tianyun, LIU Jun. A Trajectory Planning Method Based on DQN Variable Dynamic Intelligent Decision[J]. Acta Armamentarii, 2024, 45(12): 4395-4406.

Figures and tables (18)

图1 变动力航天飞行器侧喷发动机分布示意图

Fig.1 Layout of the side-jet engines of the variable power aerospace craft

图2 变动力智能决策的轨迹规划方法结构框图

Fig.2 Block diagram of the trajectory planning method based on variable dynamic intelligent decision

图3 障碍物示意图

Fig.3 Schematic diagram of obstacle

图4 多层前馈神经网络结构

Fig.4 Structure diagram of multi-layer feedforward neural network

图5 DQN训练流程

Fig.5 Training process of DQN

图6 基于DQN的变动力智能决策算法伪代码

Fig.6 Pseudocode of the DQN-based variable dynamic intelligent decision algorithm

表1 DQN的参数设置

Table 1 Parameter setting of DQN

Parameter	Value
Reward values	R₁=200, R₂=R₃=100, R₄=16
Learning rate	0.001
Discount factor	0.99
Mini-batch size	256
Replay buffer capacity	1×10⁵
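
As a rough illustration of how the Table 1 settings fit together, the following Python sketch stores the learning rate, discount factor, mini-batch size, and replay buffer capacity in a configuration object, and pairs a fixed-capacity experience replay buffer with the standard one-step Q-learning target. The names `DQNConfig`, `ReplayBuffer`, and `td_target` are hypothetical and are not taken from the paper; the reward values R₁ to R₄ are left out because their meaning is not defined in this listing.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    """Hyperparameters copied from Table 1 (field names are assumptions)."""
    learning_rate: float = 0.001
    discount: float = 0.99
    batch_size: int = 256
    buffer_capacity: int = 100_000  # 1e5 transitions


class ReplayBuffer:
    """Fixed-capacity experience pool; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self._data.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, as in the basic DQN scheme.
        return rng.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)


def td_target(reward, next_q_values, done, discount):
    """One-step Q-learning target: r if terminal, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + discount * max(next_q_values)
```

In a training loop, a gradient step would typically start only once `len(buffer) >= cfg.batch_size`, with `next_q_values` supplied by a target network over the available power gears.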

表2 仿真可调节参数

Table 2 Adjustable parameters of simulation

Parameter	Value
Minimum bank angle/(°)	0
Maximum bank angle/(°)	80
Heading angle threshold/(°)	8
Minimum range error tolerance/km	10
Minimum Mach number error tolerance	30/v_s
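
The limits in Table 2 can be read as constraints on the bank angle command. The Python sketch below clamps the commanded magnitude to the 0° to 80° range and applies a conventional dead-band bank reversal using the 8° heading angle threshold. The reversal rule, the sign convention (a positive heading error calls for a negative bank angle), and all function names are assumptions made for illustration; the paper's lateral guidance law, which also accounts for obstacle avoidance, may differ.

```python
BANK_MIN_DEG = 0.0            # Table 2: minimum bank angle
BANK_MAX_DEG = 80.0           # Table 2: maximum bank angle
HEADING_THRESHOLD_DEG = 8.0   # Table 2: heading angle threshold


def clamp_bank_magnitude(magnitude_deg):
    """Limit the bank angle magnitude to the range given in Table 2."""
    return min(max(abs(magnitude_deg), BANK_MIN_DEG), BANK_MAX_DEG)


def bank_sign(heading_error_deg, previous_sign):
    """Dead-band reversal (assumed rule): flip only when the error leaves the corridor."""
    if heading_error_deg > HEADING_THRESHOLD_DEG:
        return -1
    if heading_error_deg < -HEADING_THRESHOLD_DEG:
        return 1
    return previous_sign  # inside the corridor: keep the current direction


def bank_command(magnitude_deg, heading_error_deg, previous_sign):
    """Combine the longitudinal magnitude with the lateral sign decision."""
    sign = bank_sign(heading_error_deg, previous_sign)
    return sign * clamp_bank_magnitude(magnitude_deg), sign
```

Keeping the previous sign inside the corridor avoids chattering between left and right banks when the heading error hovers near zero.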

表3 场景1下的仿真结果对比

Table 3 Simulated results in Scenario 1

Algorithm	Altitude error/km	Longitude and latitude error/(°)	Guidance accuracy/%
Proposed algorithm	0.32	(-0.0234,-0.0291)	58.18
Algorithm without power decision-making^[9]	0.36	(0.0321,-0.0824)	0

图7 探测半径内提供应急侧力的训练奖励

Fig.7 Training reward for providing emergency lateral force within the detection radius

图8 在探测半径内提供应急侧向力的横向航迹

Fig.8 Lateral trajectory with emergency lateral force provided within the detection radius

图9 在探测半径内提供应急侧向力结果

Fig.9 Results with emergency lateral force provided within the detection radius

图10 飞行全程提供额外侧向力的训练奖励

Fig.10 Training reward for providing additional lateral force throughout the flight

图11 飞行全程提供额外侧向力的横向航迹

Fig.11 Lateral trajectory with additional lateral force provided throughout the flight

图12 飞行全程提供额外侧向力结果

Fig.12 Results with additional lateral force provided throughout the flight

表4 场景2下的仿真结果对比

Table 4 Simulated results in Scenario 2

Algorithm	Altitude error/km	Longitude and latitude error/(°)	Guidance accuracy/%
Proposed algorithm	0.14	(-0.0038,-0.0201)	98.06
Algorithm without power decision-making^[9]	0.39	(-1.0353,0.3467)	0

图13 与GPM对比横向航迹

Fig.13 Lateral trajectories obtained by the proposed algorithm and GPM

图14 与GPM对比结果

Fig.14 Results obtained by the proposed algorithm and GPM

References (24)

[1]	杜万闪, 周洲, 拜昱, 等. 组合式飞行器多体动力学建模与飞行力学特性[J]. 兵工学报, 2023, 44(8): 2245-2262. doi: 10.12382/bgxb.2022.0282
	DU W S, ZHOU Z, BAI Y, et al. Study on multibody dynamics modeling and flight dynamic characteristics of combined aircraft[J]. Acta Armamentarii, 2023, 44(8): 2245-2262. (in Chinese) doi: 10.12382/bgxb.2022.0282
[2]	张晚晴, 余文斌, 李静琳, 等. 基于纵程解析解的飞行器智能横程机动再入协同制导[J]. 兵工学报, 2021, 42(7): 1400-1411.
	ZHANG W Q, YU W B, LI J L, et al. Cooperative reentry guidance for intelligent lateral maneuver of hypersonic vehicle based on downrange analytical solution[J]. Acta Armamentarii, 2021, 42(7): 1400-1411. (in Chinese) doi: 10.3969/j.issn.1000-1093.2021.07.007
[3]	姜丽敏, 刘海亮, 陈曙暄. 基于姿态反馈实现过载跟踪的飞行器控制方法[J]. 兵工学报, 2022, 43(8): 1835-1844. doi: 10.12382/bgxb.2021.0111
	JIANG L M, LIU H L, CHEN S X. Aircraft control method based on attitude feedback for overload tracking[J]. Acta Armamentarii, 2022, 43(8): 1835-1844. (in Chinese) doi: 10.12382/bgxb.2021.0111
[4]	周亮, 王昊宇, 尚海滨, 等. 基于高斯伪谱法的天基再入飞行器滑翔轨迹优化设计研究[J]. 空天防御, 2020, 3(3): 89-95.
	ZHOU L, WANG H Y, SHANG H B, et al. Research on optimal design of the glide trajectory of space-based reentry vehicle based on Gaussian pseudo spectral method[J]. Air & Space Defense, 2020, 3(3): 89-95. (in Chinese)
[5]	李惠峰, 谢陵. 基于预测校正方法的RLV再入制导律设计[J]. 北京航空航天大学学报, 2009, 35(11): 1344-1348.
	LI H F, XIE L. Reentry guidance law design for RLV based on predictor-corrector method[J]. Journal of Beijing University of Aeronautics and Astronautics, 2009, 35(11): 1344-1348. (in Chinese)
[6]	LI M M, HU J, HUANG H. A segmented and weighted adaptive predictor-corrector guidance method for the ascent phase of hypersonic vehicle[J]. Aerospace Science and Technology, 2020, 106: 106231.
[7]	程阳, 程林, 张庆振, 等. 基于在线约束限制的飞行器预测校正制导[J]. 北京航空航天大学学报, 2017, 43(10): 2143-2153.
	CHENG Y, CHENG L, ZHANG Q Z, et al. Aircraft predictor-corrector guidance based on online constraint limit enforcement[J]. Journal of Beijing University of Aeronautics and Astronautics, 2017, 43(10): 2143-2153. (in Chinese)
[8]	马可, 田江. 主动拦截防护系统探测雷达防弹天线罩设计[J]. 现代雷达, 2021, 43(5): 80-84.
	MA K, TIAN J. Design of bulletproof radome for active interception protection system radar[J]. Modern Radar, 2021, 43(5): 80-84. (in Chinese)
[9]	田若岑, 张庆振, 郭云鹤, 等. 基于禁飞区规避的高超声速飞行器再入制导律设计[J]. 空天防御, 2022, 5(2): 65-74.
	TIAN R C, ZHANG Q Z, GUO Y H, et al. Design of reentry guidance law of hypersonic vehicle based on no-fly zone avoidance[J]. Air & Space Defense, 2022, 5(2): 65-74. (in Chinese)
[10]	章吉力, 刘凯, 樊雅卓, 等. 考虑禁飞区规避的空天飞行器分段预测校正再入制导方法[J]. 宇航学报, 2021, 42(1): 122-131.
	ZHANG J L, LIU K, FAN Y Z, et al. A piecewise predictor-corrector re-entry guidance algorithm with no-fly zone avoidance[J]. Journal of Astronautics, 2021, 42(1): 122-131. (in Chinese)
[11]	赵江, 周锐, 张超. 考虑禁飞区规避的预测校正再入制导方法[J]. 北京航空航天大学学报, 2015, 41(5): 864-870.
	ZHAO J, ZHOU R, ZHANG C. Predictor-corrector reentry guidance satisfying no-fly zone constraints[J]. Journal of Beijing University of Aeronautics and Astronautics, 2015, 41(5): 864-870. (in Chinese)
[12]	WANG Z, AI Y, ZUO Q H, et al. A policy-reuse algorithm based on destination position prediction for aircraft guidance using deep reinforcement learning[J]. Aerospace, 2022, 9(11): 632.
[13]	WANG Y, LI K X, ZHUANG X, et al. A reinforcement learning method based on an improved sampling mechanism for unmanned aerial vehicle penetration[J]. Aerospace, 2023, 10(7): 642.
[14]	XU W F, LI Y H, PEI B B, et al. Coordinated intelligent control of the flight control system and shape change of variable sweep morphing aircraft based on dueling-DQN[J]. Aerospace Science and Technology, 2022, 130: 107898.
[15]	付京博, 邵会兵, 詹韬. 基于深度强化学习的飞行器自抗扰控制技术[J]. 计算机仿真, 2022, 39(10): 54-59.
	FU J B, SHAO H B, ZHAN T. Aircraft active disturbance rejection control technology based on deep reinforcement learning[J]. Computer Simulation, 2022, 39(10): 54-59. (in Chinese)
[16]	黄旭, 柳嘉润, 贾晨辉, 等. 深度确定性策略梯度算法用于无人飞行器控制[J]. 航空学报, 2021, 42(11): 397-407.
	HUANG X, LIU J R, JIA C H, et al. Deep deterministic policy gradient algorithm for UAV control[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(11): 397-407. (in Chinese)
[17]	GAO M J, YAN T, LI Q C, et al. Intelligent pursuit-evasion game based on deep reinforcement learning for hypersonic vehicles[J]. Aerospace, 2023, 10(1): 86.
[18]	闫斌斌, 李勇, 戴沛, 等. 基于增强学习的变体飞行器自适应变体策略与飞行控制方法研究[J]. 西北工业大学学报, 2019, 37(4): 656-663.
	YAN B B, LI Y, DAI P, et al. Adaptive wing morphing strategy and flight control method of a morphing aircraft based on reinforcement learning[J]. Journal of Northwestern Polytechnical University, 2019, 37(4): 656-663. (in Chinese)
[19]	汪韧, 惠俊鹏, 俞启东, 等. 基于LSTM模型的飞行器智能制导技术研究[J]. 力学学报, 2021, 53(7): 2047-2057.
	WANG R, HUI J P, YU Q D, et al. Research of LSTM model-based intelligent guidance of flight aircraft[J]. Chinese Journal of Theoretical and Applied Mechanics, 2021, 53(7): 2047-2057. (in Chinese)
[20]	惠俊鹏, 汪韧, 俞启东. 基于强化学习的再入飞行器“新质”走廊在线生成技术[J]. 航空学报, 2022, 43(9): 623-635.
	HUI J P, WANG R, YU Q D. Generating new quality flight corridor for reentry aircraft based on reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2022, 43(9): 623-635. (in Chinese)
[21]	ZHU Y Y, PAN M X, ZHOU W X, et al. Intelligent direct thrust control for multivariable turbofan engine based on reinforcement and deep learning methods[J]. Aerospace Science and Technology, 2022, 131: 107972.
[22]	SZIROCZAK D, SMITH H. A review of design issues specific to hypersonic flight vehicles[J]. Progress in Aerospace Sciences, 2016, 84: 1-28.
[23]	PENG Z H, WANG D, LI T S, et al. Output-feedback cooperative formation maneuvering of autonomous surface vehicles with connectivity preservation and collision avoidance[J]. IEEE Transactions on Cybernetics, 2020, 50(6): 2527-2535. doi: 10.1109/TCYB.2019.2914717 pmid: 31180878
[24]	RASEKHIPOUR Y, KHAJEPOUR A, CHEN S K, et al. A potential field-based model predictive path-planning controller for autonomous road vehicles[J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 18(5): 1255-1267.