Acta Armamentarii ›› 2021, Vol. 42 ›› Issue (5): 1101-1110. doi: 10.3969/j.issn.1000-1093.2021.05.023

• Paper

Loitering Munition Penetration Control Decision Based on Deep Reinforcement Learning

GAO Ang1, DONG Zhiming1, YE Hongbing2, SONG Jinghua1, GUO Qisheng1

  1. Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072, China; 2. Xiangnan University, Chenzhou 423099, Hunan, China
  • Online: 2021-06-12
  • Corresponding author: DONG Zhiming (born 1977), male, professor, master's supervisor. E-mail: 15689783388@163.com
  • About the author: GAO Ang (born 1988), male, doctoral candidate. E-mail: 236211566@qq.com
  • Funding:
    Military Scientific Research Program projects (41405030302, 41401020301)

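Abstract (translated from the Chinese-language version): The loitering munition penetration control decision (LMPCD) problem is an important research direction under the “multi-domain battle” operational concept. To address it, an LMPCD model based on a Markov decision process is established. The LMPCD function and the flight state-action value function are fitted, an LMPCD framework based on the actor-critic method is constructed, and a method for solving the deep reinforcement learning model with the deep deterministic policy gradient algorithm is given, yielding an optimal decision network for loitering munition penetration control. In 1,000 simulated penetration tests, the loitering munition achieved a mission success rate of 82.1% with an average decision time of 1.48 ms, which verifies the effectiveness of the LMPCD model and its solution process.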

Abstract: Loitering munition penetration control decision (LMPCD) is an important research direction under the “multi-domain battle” concept, and real-time route planning for loitering munition penetration has important military significance. Traditional knowledge-based, reasoning, and planning methods cannot explore or discover new knowledge outside their predefined framework. Bionic optimization methods suit path planning in static environments, such as the traveling salesman problem, but are difficult to apply to loitering munition penetration, which involves a highly dynamic environment and demands real-time decision-making. Given the limitations of these two approaches, the applicability of deep reinforcement learning is analyzed, and domain knowledge of loitering munitions is introduced into each element of the deep reinforcement learning algorithm. The flight motion model of the loitering munition is analyzed; its state space, action space, and reward function are designed; the algorithm framework for penetration control decision is analyzed; and the training process of the decision algorithm is designed. In a simulation test of 1,000 penetration runs, the penetration success rate was 82.1% and the average decision time was 1.48 ms, which verifies the effectiveness of the algorithm training process and the control decision model.

Key words: loitering munition, deep reinforcement learning, Markov decision process, penetration, control decision
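
The abstract names an actor-critic framework and the deep deterministic policy gradient (DDPG) algorithm but gives no implementation detail. As a rough illustration only, the sketch below shows two standard DDPG ingredients in plain Python: the critic's temporal-difference target and the soft (Polyak) update of target-network parameters. The linear actor and critic, the function names, and all numeric values are assumptions made for this example; they are not taken from the paper, which uses neural networks and its own state, action, and reward design.

```python
# Minimal DDPG building blocks (illustrative assumptions, not the paper's code).


def td_target(reward, done, next_q, gamma=0.99):
    """Critic target y = r + gamma * Q'(s', mu'(s')); no bootstrap at episode end."""
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_params, online_params, tau=0.01):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def linear_actor(weights, state):
    """Toy deterministic policy: action = dot(w, s), clipped to [-1, 1]."""
    raw = sum(w * s for w, s in zip(weights, state))
    return max(-1.0, min(1.0, raw))


def linear_critic(weights, state, action):
    """Toy Q-function, linear in the concatenated (state, action) vector."""
    features = list(state) + [action]
    return sum(w * f for w, f in zip(weights, features))


if __name__ == "__main__":
    # Hypothetical two-dimensional state and made-up parameters.
    target_actor_w = [0.2, -0.1]
    target_critic_w = [0.5, 0.3, 1.0]
    next_state = [1.0, 2.0]

    # The target networks evaluate the next state to form the critic's target.
    next_action = linear_actor(target_actor_w, next_state)
    next_q = linear_critic(target_critic_w, next_state, next_action)
    y = td_target(reward=1.0, done=False, next_q=next_q)
    print("TD target:", y)

    # After a gradient step on the online actor, nudge the target toward it.
    online_actor_w = [0.4, 0.0]
    target_actor_w = soft_update(target_actor_w, online_actor_w, tau=0.1)
    print("Updated target actor weights:", target_actor_w)
```

In a full DDPG loop, the critic would be regressed toward `td_target` over minibatches drawn from a replay buffer, and the actor would be updated along the critic's gradient with respect to the action; both steps are omitted here for brevity.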
