Acta Armamentarii, 2025, Vol. 46, Issue (3): 240251. doi: 10.12382/bgxb.2024.0251

Dynamic Decision-making Method of Unmanned Platform Chaff Jamming for Terminal Defense Based on Multi-agent Deep Reinforcement Learning

LI Chuanhao1, MING Zhenjun1,2,*, WANG Guoxin1,2, YAN Yan1,2, DING Wei1, WAN Silai1, DING Tao1,3

  1. School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China
  2. Yangtze Delta Region Academy of Beijing Institute of Technology (Jiaxing), Jiaxing 314019, Zhejiang, China
  3. Southwest China Research Institute of Electronic Equipment, Chengdu 610036, Sichuan, China
  • Received: 2024-04-07 Online: 2025-03-26
  • Contact: MING Zhenjun

Abstract:

Chaff centroid jamming by unmanned platforms is an important means of terminal defense against missiles. The ability to make intelligent decisions about platform maneuvering and chaff launching is a key factor in whether strategic assets can be protected. Current decision-making methods, such as computational analysis based on mechanism models and space exploration based on heuristic algorithms, suffer from a low degree of intelligence, poor adaptability, and slow decision-making. This paper proposes a dynamic decision-making method for chaff jamming in terminal defense based on multi-agent deep reinforcement learning. First, the problem of cooperative multi-platform chaff jamming for terminal defense is defined and a simulation environment is constructed. Models of the missile guidance and fuze, unmanned jamming platform maneuvering, chaff diffusion, and centroid jamming are established. The centroid jamming decision problem is then formulated as a Markov decision process: a decision-making agent is constructed, the state and action spaces are defined, and a reward function is set. The agent is trained using the multi-agent proximal policy optimization (MAPPO) algorithm. Simulation results show that, compared with the multi-agent deep deterministic policy gradient (MADDPG) algorithm, the proposed method reduces training time by 85.5% and increases the asset protection success rate by 3.84. Compared with the genetic algorithm (GA), it reduces decision time by 99.96% and increases the asset protection success rate by 1.12.
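
The idea behind centroid jamming can be illustrated with a small sketch. The abstract does not give the paper's centroid jamming model, so the following is only a common textbook simplification: it assumes the seeker tracks the radar-cross-section (RCS) weighted centroid of all reflectors inside its resolution cell. The function name `rcs_weighted_centroid`, the 2-D coordinates, and the numeric values are hypothetical and chosen purely for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def rcs_weighted_centroid(positions: Sequence[Point], rcs: Sequence[float]) -> Point:
    """Return the RCS-weighted centroid of reflectors in one resolution cell.

    Assumption (not from the paper): the seeker's aim point is the
    power-weighted mean position of the reflectors it cannot resolve.
    """
    if len(positions) != len(rcs):
        raise ValueError("positions and rcs must have the same length")
    total = sum(rcs)
    if total <= 0:
        raise ValueError("total RCS must be positive")
    # Weight each coordinate by the reflector's RCS, then normalize.
    x = sum(p[0] * w for p, w in zip(positions, rcs)) / total
    y = sum(p[1] * w for p, w in zip(positions, rcs)) / total
    return (x, y)


# Hypothetical example: an asset at the origin (RCS 1000 m^2) and a chaff
# cloud 300 m away along the x-axis (RCS 3000 m^2).
aim_point = rcs_weighted_centroid([(0.0, 0.0), (300.0, 0.0)], [1000.0, 3000.0])
print(aim_point)  # (225.0, 0.0)
```

Under these assumed values the aim point lies 225 m from the asset, which shows why the placement and timing of chaff relative to the asset matter: the larger the chaff RCS and the better its position, the further the tracked centroid is pulled away from the protected asset.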

Key words: unmanned platform, centroid jamming, chaff jamming, terminal defense, multi-agent reinforcement learning, electronic countermeasure