
Acta Armamentarii, 2025, Vol. 46, Issue (3): 240251. doi: 10.12382/bgxb.2024.0251

Dynamic Decision-making Method of Unmanned Platform Chaff Jamming for Terminal Defense Based on Multi-agent Deep Reinforcement Learning

LI Chuanhao1, MING Zhenjun1,2,*, WANG Guoxin1,2, YAN Yan1,2, DING Wei1, WAN Silai1, DING Tao1,3

  1 School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China
  2 Yangtze Delta Region Academy of Beijing Institute of Technology (Jiaxing), Jiaxing 314019, Zhejiang, China
  3 Southwest China Research Institute of Electronic Equipment, Chengdu 610036, Sichuan, China
  • Received: 2024-04-07 Online: 2025-03-26
  • Funding: National Natural Science Foundation of China (62373047)

Abstract:

Chaff centroid jamming by unmanned platforms is an important means of terminal defense against missiles, and the ability to make intelligent decisions on platform maneuvering and chaff launching is a key factor in whether strategic assets are protected successfully. Current decision-making methods, such as computational analysis based on mechanism models and space exploration based on heuristic algorithms, suffer from a low degree of intelligence, poor adaptability, and slow decision-making. A dynamic decision-making method for chaff jamming in terminal defense based on multi-agent deep reinforcement learning is proposed. The problem of cooperative multi-platform chaff jamming for terminal defense is defined and a simulation environment is constructed, comprising a missile guidance and fuze model, an unmanned jamming platform maneuvering model, a chaff diffusion model, and a centroid jamming model. The centroid jamming decision problem is transformed into a Markov decision problem: a decision-making agent is constructed, the state and action spaces are defined, and a reward function is set. The decision-making agent is trained with the multi-agent proximal policy optimization (MAPPO) algorithm. Simulation results show that, compared with the multi-agent deep deterministic policy gradient (MADDPG) algorithm, decision-making with the trained agent reduces training time by 85.5% and increases the asset protection success rate by 3.84 times; compared with the genetic algorithm (GA), it reduces decision time by 99.96% and increases the asset protection success rate by 1.12 times.

Key words: unmanned platform, centroid jamming, chaff jamming, terminal defense, multi-agent reinforcement learning, electronic countermeasure
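
As a rough illustration of the centroid jamming idea named in the abstract, the sketch below computes the point a radar seeker would track when the protected asset and a chaff cloud fall in the same resolution cell. It assumes the common textbook simplification that the tracked point is the radar cross-section (RCS) weighted mean of the echo positions. This is not the paper's centroid jamming model, which the abstract does not specify. The function name `radar_centroid` and all numbers are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def radar_centroid(echoes: Sequence[Tuple[Point, float]]) -> Point:
    """Return the RCS-weighted centroid of (position, rcs) echoes.

    Assumption: the seeker tracks the power centroid of all echoes in
    one resolution cell (a textbook simplification, not the paper's model).
    """
    total_rcs = sum(rcs for _, rcs in echoes)
    if total_rcs <= 0:
        raise ValueError("total RCS must be positive")
    x = sum(pos[0] * rcs for pos, rcs in echoes) / total_rcs
    y = sum(pos[1] * rcs for pos, rcs in echoes) / total_rcs
    return (x, y)


# Hypothetical values: asset at the origin, chaff cloud 300 m away
# with three times the asset's RCS.
asset = ((0.0, 0.0), 1000.0)
chaff = ((300.0, 0.0), 3000.0)
aim_point = radar_centroid([asset, chaff])
print(aim_point)
```

Under this simplification, a chaff cloud with a larger RCS pulls the tracked point further from the asset, which is why platform position and chaff launch timing are the decision variables of interest.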