
Acta Armamentarii, 2021, Vol. 42, Issue 5: 1101-1110. doi: 10.3969/j.issn.1000-1093.2021.05.023


Loitering Munition Penetration Control Decision Based on Deep Reinforcement Learning

GAO Ang¹, DONG Zhiming¹, YE Hongbing², SONG Jinghua¹, GUO Qisheng¹

1. Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072, China
2. Xiangnan University, Chenzhou 423099, Hunan, China

Online: 2021-06-12

Abstract: Loitering munition penetration control decision (LMPCD) is an important research direction under the concept of “multi-domain warfare”, and real-time route planning for loitering munition penetration has considerable military significance. Traditional knowledge-, reasoning-, and planning-based methods cannot explore or discover new knowledge outside their predefined framework. Bionic optimization methods are suited to path planning in static environments, such as the traveling salesman problem, but are difficult to apply to loitering munition penetration, which involves a highly dynamic environment and demands real-time decision-making. To overcome the limitations of these two classes of methods, the applicability of deep reinforcement learning is analyzed, and domain knowledge of loitering munitions is incorporated into each element of the deep reinforcement learning algorithm. The flight motion model of the loitering munition is analyzed; its state space, action space, and reward function are designed; the algorithm framework for penetration control decision is analyzed; and the training process of the decision algorithm is designed. In a penetration simulation test of 1 000 rounds, the loitering munition achieved a penetration success rate of 82.1% with an average decision time of 1.48 ms, which verifies the effectiveness of the algorithm's training process and of the control decision model.
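The abstract does not give the paper's actual flight motion model, state space, action space, or reward function. Purely as an illustration of how those MDP elements fit together, the sketch below casts penetration as an environment under assumptions of our own: a planar constant-speed kinematic model, three discrete heading-change actions, a single circular threat zone, and a reward combining progress toward the target with terminal bonuses and penalties. Every name and value here (`State`, `Scenario`, `step`, `observe`, `ACTIONS`, the radii, speeds, and reward magnitudes) is hypothetical and is not the authors' design; a deep reinforcement learning agent would learn a policy mapping `observe()` outputs to indices into `ACTIONS`.

```python
import math
from dataclasses import dataclass

# Hypothetical discrete action set: heading change applied at each decision step (rad).
ACTIONS = (-math.radians(15.0), 0.0, math.radians(15.0))


@dataclass
class State:
    x: float        # east position (m)
    y: float        # north position (m)
    heading: float  # heading angle from the x-axis (rad)


@dataclass(frozen=True)
class Scenario:
    # All values are illustrative assumptions, not parameters from the paper.
    target: tuple = (10_000.0, 0.0)  # target position (m)
    threat: tuple = (5_000.0, 0.0)   # single air-defense site (m)
    threat_radius: float = 1_500.0   # engagement radius of the threat (m)
    hit_radius: float = 200.0        # distance at which the target counts as reached (m)
    speed: float = 50.0              # constant airspeed (m/s)
    dt: float = 1.0                  # decision interval (s)


def step(s: State, action: int, sc: Scenario):
    """Advance the planar kinematic model one step; return (next_state, reward, done)."""
    heading = s.heading + ACTIONS[action]
    nx = s.x + sc.speed * sc.dt * math.cos(heading)
    ny = s.y + sc.speed * sc.dt * math.sin(heading)
    nxt = State(nx, ny, heading)

    # Shaped reward: distance closed toward the target, normalized to [-1, 1]
    # (by the triangle inequality it cannot exceed the distance flown in one step).
    d_old = math.hypot(sc.target[0] - s.x, sc.target[1] - s.y)
    d_new = math.hypot(sc.target[0] - nx, sc.target[1] - ny)
    reward = (d_old - d_new) / (sc.speed * sc.dt)

    # Terminal conditions: entering the threat zone fails, reaching the target succeeds.
    if math.hypot(sc.threat[0] - nx, sc.threat[1] - ny) < sc.threat_radius:
        return nxt, reward - 100.0, True
    if d_new < sc.hit_radius:
        return nxt, reward + 100.0, True
    return nxt, reward, False


def observe(s: State, sc: Scenario, scale: float = 10_000.0):
    """Normalized observation vector that a policy network could take as input."""
    return (
        (sc.target[0] - s.x) / scale, (sc.target[1] - s.y) / scale,
        (sc.threat[0] - s.x) / scale, (sc.threat[1] - s.y) / scale,
        math.cos(s.heading), math.sin(s.heading),
    )
```

With these placeholder values, flying straight from the origin toward the target earns the maximum progress reward of 1.0 per step, while either 15° turn earns slightly less; because the straight path crosses the threat zone, it is the terminal penalty that makes a detour worth learning.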

Key words: loitering munition; deep reinforcement learning; Markov decision process; penetration; control decision
