Regional Air Defense and Anti-Missile Weapon-Target Assignment Based on Multi-Agent Reinforcement Learning

doi:10.12382/bgxb.2025.0174

Acta Armamentarii

WU Xiang¹*, WANG Yuanhao², ZHANG Baoheng³, FAN Boyang¹, BO Yuming¹

(1. School of Automation, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China; 2. Dalian Changfeng Industrial Corporation, Dalian 116033, Liaoning, China; 3. Shanghai Aerospace System Engineering Institute, Shanghai 201109, China)

Received: 2025-03-12 Revised: 2025-06-14
Corresponding author: *Email: wuxiang1@njust.edu.cn
Funding:
Fundamental Research Funds for the Central Universities (30922010710); Industry-University-Research Cooperation Fund of the Shanghai Academy of Spaceflight Technology (SAST2023-049)

Abstract

Abstract: In regional air defense and anti-missile operations, the complex coupling among operational elements causes the battlefield situation to evolve rapidly and the number of incoming targets to change dynamically. To address these challenges, this study proposes a weapon-target assignment method based on QMIX with Dynamic extension and Spatiotemporal reasoning (QMIX-DS). The method treats each fire unit as an agent and builds a decision network that generates the weapon-target assignment strategy. It makes two core improvements. First, a dynamically expandable feature encoding module is designed for each agent's decision network so that it can adaptively process a varying number of incoming targets, and contrastive learning is introduced to highlight target category attributes and form differentiated feature representations. Second, a two-layer multi-head self-attention mechanism is constructed to capture the dynamic spatiotemporal dependencies among targets of different categories, which enables rapid reasoning about how the situation evolves during a mission and optimizes the weapon-target assignment strategy. Simulations of scenarios at different scales on the MOZI platform show that the proposed method generates effective air defense and anti-missile strategies under dynamically changing battlefield conditions. Compared with the baseline algorithm (standard QMIX) and other mainstream algorithms, QMIX-DS shows advantages in target interception rate, battlefield survival rate, and missile consumption, and it exhibits high scalability and generalization across different scenarios.

Key words: regional air defense and anti-missile, multi-agent reinforcement learning, weapon-target assignment, scalable decision network, spatiotemporal reasoning
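
The abstract does not give implementation details for the dynamically expandable feature encoding module. As a rough illustration of the general idea only, the sketch below shows how attention pooling can map a varying number of target feature vectors to a fixed-size vector that a decision network could consume. The function name `attention_pool`, the single-head scaled dot-product form, and the toy feature values are all assumptions made for this example; this is not the authors' QMIX-DS architecture.

```python
import math


def attention_pool(query, targets):
    """Pool a variable-length list of target vectors into one fixed-size vector.

    query   -- the agent's own feature vector (length d), hypothetical input
    targets -- list of target feature vectors (each length d); may be empty
    Returns (pooled_vector, attention_weights).
    """
    d = len(query)
    if not targets:
        # No incoming targets: return a zero vector of the same fixed size.
        return [0.0] * d, []
    # Scaled dot-product score between the agent and each target.
    scores = [
        sum(q * t for q, t in zip(query, tgt)) / math.sqrt(d)
        for tgt in targets
    ]
    # Numerically stable softmax over however many targets are present.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum: the output size depends on d only, not on len(targets).
    pooled = [
        sum(w * tgt[k] for w, tgt in zip(weights, targets))
        for k in range(d)
    ]
    return pooled, weights


# Toy usage: the same agent observes two targets, then five.
agent = [1.0, 0.0, 0.5]
two = [[0.2, 0.1, 0.0], [0.9, 0.3, 0.4]]
five = two + [[0.0, 1.0, 0.0], [0.5, 0.5, 0.5], [1.0, 0.0, 1.0]]
pooled2, w2 = attention_pool(agent, two)
pooled5, w5 = attention_pool(agent, five)
```

In both calls the pooled vector has the same length as the agent's feature vector, and reordering the targets does not change the result beyond floating-point rounding. These are the two properties that let a fixed-size network handle a changing set of targets.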

CLC number: TP181

WU Xiang, WANG Yuanhao, ZHANG Baoheng, FAN Boyang, BO Yuming. Regional Air Defense and Anti-Missile Weapon-Target Assignment Based on Multi-Agent Reinforcement Learning[J]. Acta Armamentarii, doi: 10.12382/bgxb.2025.0174.
