Navigation Ratio Design of Proportional Navigation Law Using Reinforcement Learning

doi:10.12382/bgxb.2021.0631

Abstract

To improve missile guidance performance, the navigation ratio of the proportional navigation law is designed using two reinforcement learning methods: Monte Carlo reinforcement learning and Q-learning. The Monte Carlo method divides the missile's flight into only a few coarse segments; the algorithm is simple and well suited to engineering use. The Q-learning method subdivides the guidance environment further by flight time, line-of-sight rate, expected encounter time, and target characteristics, and adaptively adjusts the navigation ratio as the environment and state change, so as to obtain the best flight guidance strategy. Both methods are applied to a certain type of air defense missile: batches of trajectories are drawn at random from a trajectory library covering the whole airspace, simulated, and compared with the traditional empirical design. The simulation results show that the navigation ratios designed with reinforcement learning significantly reduce the miss distance on boundary trajectories, indicating that the proposed design method effectively improves the missile's guidance and interception capability.

Key words: proportional navigation, Monte Carlo reinforcement learning, Q-learning reinforcement learning, navigation ratio
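
The Q-learning design described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the candidate navigation ratios in `NAV_RATIOS`, the bin widths in `discretize`, the use of only two state variables (flight time and line-of-sight rate), and the reward convention (negative miss distance at the end of a trajectory) are all assumptions made for illustration. The only parts taken as given are the classical proportional navigation command, a = N · V_c · (line-of-sight rate), and the standard Q-learning update rule.

```python
import random

# Hypothetical candidate navigation ratios; each index is one action.
NAV_RATIOS = [2.0, 3.0, 4.0, 5.0]


def pn_acceleration(nav_ratio, closing_speed, los_rate):
    """Classical proportional navigation command: a = N * Vc * LOS rate."""
    return nav_ratio * closing_speed * los_rate


def discretize(flight_time, los_rate, time_step=5.0, rate_step=0.01):
    """Map continuous quantities to a coarse state (assumed bin widths)."""
    return (int(flight_time // time_step), int(abs(los_rate) // rate_step))


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy selection over the indices of NAV_RATIOS."""
    if rng.random() < epsilon:
        return rng.randrange(len(NAV_RATIOS))
    values = [q_table.get((state, a), 0.0) for a in range(len(NAV_RATIOS))]
    return values.index(max(values))


def q_update(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.9, terminal=False):
    """Standard Q-learning update: Q <- Q + alpha * (target - Q)."""
    if terminal:
        best_next = 0.0  # no future value after the end of the trajectory
    else:
        best_next = max(q_table.get((next_state, a), 0.0)
                        for a in range(len(NAV_RATIOS)))
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q_table[(state, action)]


# Usage: one learning step at the end of a simulated trajectory.
q_table = {}
rng = random.Random(0)
state = discretize(flight_time=12.0, los_rate=0.025)
action = choose_action(q_table, state, epsilon=0.1, rng=rng)
accel = pn_acceleration(NAV_RATIOS[action], closing_speed=1000.0, los_rate=0.025)
miss_distance = 10.0  # placeholder value; a real run would take this from simulation
q_update(q_table, state, action, -miss_distance, state, terminal=True)
```

In a full training loop the update would be applied at every guidance step of every trajectory drawn from the trajectory library, and the greedy action in each state would then give the navigation ratio schedule.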

CLC number: TJ765

LI Qingbo, LI Fang, DONG Ruixing, FAN Ruishan, XIE Wenlong. Navigation Ratio Design of Proportional Navigation Law Using Reinforcement Learning[J]. Acta Armamentarii, 2022, 43(12): 3040-3047.

