Acta Armamentarii (兵工学报) ›› 2022, Vol. 43 ›› Issue (12): 3040-3047. doi: 10.12382/bgxb.2021.0631

Navigation Ratio Design of Proportional Navigation Law Using Reinforcement Learning

LI Qingbo, LI Fang, DONG Ruixing, FAN Ruishan, XIE Wenlong

  1. (Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China)
  • Online: 2022-05-18
  • About the author: LI Qingbo (b. 1988), male, senior engineer. E-mail: 18621785390@163.com

Abstract: To improve missile guidance performance, the navigation ratio of the proportional navigation law is designed with two reinforcement learning methods: Monte Carlo reinforcement learning and Q-learning reinforcement learning. The Monte Carlo method only coarsely segments the missile's flight; its algorithm is simple and well suited to engineering use. The Q-learning method further subdivides the guidance environment by flight time, line-of-sight rate, predicted encounter time, and target characteristics, and adaptively adjusts the navigation ratio as the environment and state change, so as to obtain the best flight guidance strategy. Both methods are applied to the navigation ratio design of a certain type of air defense missile. Batches of trajectories drawn at random from a trajectory library covering the whole airspace are simulated, and the results are compared with a traditional empirical design. The simulation results show that the navigation ratio designed with reinforcement learning significantly reduces the miss distance on boundary trajectories, indicating that the proposed design method can effectively improve the missile's guidance and interception capability.

Key words: proportional navigation, Monte Carlo reinforcement learning, Q-learning reinforcement learning, navigation ratio
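
The adaptive scheme described in the abstract can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' implementation: the abstract gives no state bins, candidate navigation ratios, reward definition, or hyperparameters, so `NAV_RATIOS`, the bin widths in `discretize`, and the defaults for `alpha`, `gamma`, and `epsilon` are all hypothetical. The only standard ingredients are the proportional navigation command, a = N · Vc · (line-of-sight rate), and the usual Q-learning update.

```python
import random
from collections import defaultdict

# Hypothetical candidate navigation ratios (not taken from the paper).
NAV_RATIOS = [3.0, 4.0, 5.0]


def pn_command(nav_ratio, closing_speed, los_rate):
    """Proportional navigation: commanded acceleration a = N * Vc * lambda_dot."""
    return nav_ratio * closing_speed * los_rate


def discretize(flight_time, los_rate, time_to_go):
    """Map continuous guidance variables to a coarse state tuple.

    The bin widths (5 s, 0.01 rad/s, 2 s) are illustrative assumptions.
    """
    return (
        int(flight_time // 5.0),
        int(abs(los_rate) // 0.01),
        int(time_to_go // 2.0),
    )


class QTable:
    """Tabular Q-learning over (state, navigation-ratio index) pairs."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def choose(self, state):
        """Epsilon-greedy choice of an index into NAV_RATIOS."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state, done):
        """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        target = reward
        if not done:
            target += self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In a training loop, each simulated trajectory would call `choose` at every guidance step, apply `pn_command` with the selected ratio, and call `update` with a reward. One plausible reward, again an assumption here, is the negative miss distance at the end of the flight.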
