
Acta Armamentarii (兵工学报)


Regional Air Defense and Anti-Missile Weapon-Target Assignment Based on Multi-Agent Reinforcement Learning

WU Xiang1*, WANG Yuanhao2, ZHANG Baoheng3, FAN Boyang1, BO Yuming1

  1. School of Automation, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
  2. Dalian Changfeng Industrial Corporation, Dalian 116033, Liaoning, China
  3. Shanghai Aerospace System Engineering Institute, Shanghai 201109, China
  • Received: 2025-03-12 Revised: 2025-06-14
  • Corresponding author: *Email: wuxiang1@njust.edu.cn
  • Funding:
    Fundamental Research Funds for the Central Universities (30922010710); Industry-University-Research Cooperation Fund of the Shanghai Academy of Spaceflight Technology (SAST2023-049)

Abstract: In regional air defense and anti-missile operations, tightly coupled operational factors cause the battlefield situation to evolve rapidly and the number of incoming targets to change dynamically. To address these challenges, this study proposes a weapon-target assignment method based on QMIX with Dynamic extension and Spatiotemporal reasoning (QMIX-DS). Each fire unit is treated as an agent, and a decision network is constructed for it to generate the weapon-target assignment strategy. The method makes two core improvements. First, a dynamically extensible feature encoding module is designed for each agent's decision network so that it adaptively handles a varying number of incoming targets, and contrastive learning is introduced to emphasize target category attributes, yielding differentiated feature representations. Second, a two-layer multi-head self-attention mechanism captures the dynamic spatiotemporal dependencies among targets of different categories, enabling rapid reasoning about how the situation evolves during the mission and thereby optimizing the assignment strategy. Simulations at different scales on the MOZI platform show that the proposed method generates effective air defense and anti-missile strategies under dynamically changing battlefield conditions. Compared with the standard QMIX baseline and other mainstream algorithms, QMIX-DS shows advantages in target interception rate, battlefield survival rate, and missile consumption, and exhibits high scalability and generalization across different scenarios.

Key words: regional air defense and anti-missile, multi-agent reinforcement learning, weapon-target assignment, scalable decision network, spatiotemporal reasoning
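
The abstract describes an encoder that copes with a changing number of incoming targets and a two-layer multi-head self-attention stage, but gives no implementation details. The following is only a minimal sketch of that general idea, not the authors' QMIX-DS network: the class name `TargetSetEncoder`, the layer sizes, the mean pooling, and the random weights standing in for trained parameters are all assumptions made for illustration. Contrastive learning and the QMIX mixing network are not shown.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Random weights; a stand-in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class TargetSetEncoder:
    """Hypothetical encoder: shared per-target embedding, masked multi-head
    self-attention layers, then mean pooling to a fixed-size vector."""

    def __init__(self, feat_dim, embed_dim=8, num_heads=2, num_layers=2, seed=0):
        assert embed_dim % num_heads == 0
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # The same embedding weights are applied to every target, so the
        # parameter count does not depend on how many targets are present.
        self.w_embed = rand_matrix(rng, feat_dim, embed_dim)
        self.layers = [
            {name: rand_matrix(rng, embed_dim, embed_dim) for name in ("q", "k", "v", "o")}
            for _ in range(num_layers)
        ]

    def _attention_layer(self, h, mask, params):
        q, k, v = (matmul(h, params[name]) for name in ("q", "k", "v"))
        valid = [j for j, keep in enumerate(mask) if keep]  # real targets only
        mixed = []
        for i in range(len(h)):
            row = []
            for head in range(self.num_heads):
                lo, hi = head * self.head_dim, (head + 1) * self.head_dim
                # Scaled dot-product scores against valid keys; padded slots
                # are skipped, which is equivalent to masking them out.
                scores = [
                    sum(a * b for a, b in zip(q[i][lo:hi], k[j][lo:hi])) / math.sqrt(self.head_dim)
                    for j in valid
                ]
                top = max(scores)
                weights = [math.exp(s - top) for s in scores]  # stable softmax
                total = sum(weights)
                row += [sum(w * v[j][c] for w, j in zip(weights, valid)) / total
                        for c in range(lo, hi)]
            mixed.append(row)
        out = matmul(mixed, params["o"])
        # Residual connection around the attention output.
        return [[a + b for a, b in zip(h_row, o_row)] for h_row, o_row in zip(h, out)]

    def encode(self, targets, mask):
        """targets: feature rows (padded slots allowed); mask: True marks a real target."""
        valid = [i for i, keep in enumerate(mask) if keep]
        if not valid:
            return [0.0] * self.embed_dim
        h = [[math.tanh(x) for x in row] for row in matmul(targets, self.w_embed)]
        for params in self.layers:
            h = self._attention_layer(h, mask, params)
        # Mean over real targets gives a vector whose size is independent of their count.
        return [sum(h[i][c] for i in valid) / len(valid) for c in range(self.embed_dim)]


encoder = TargetSetEncoder(feat_dim=3)
three_targets = [[0.5, -1.0, 2.0], [1.5, 0.2, -0.3], [-0.7, 0.9, 0.1]]
summary = encoder.encode(three_targets, [True, True, True])
```

In this sketch the returned vector always has `embed_dim` entries, whether one target or many are present, and padded slots do not influence it. That is the property a fixed-size input to a downstream Q-network would need when the number of targets changes during an engagement.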

CLC number: