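School of Automation, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
Dalian Changfeng Industrial Corporation, Dalian 116033, Liaoning, China
Shanghai Institute of Aerospace System Engineering, Shanghai 201109, China
Corresponding author email: wuxiang1@njust.edu.cn
Received: 2025-03-12
Published online: 2025-12-25
Published in print: 2026-02-28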
WU Xiang, WANG Yuanhao, ZHANG Baoheng, et al. Regional Air Defense and Anti-missile Weapon-Target Assignment Based on Multi-agent Reinforcement Learning[J]. Acta Armamentarii, 2026, 47(2): 250174. DOI: 10.12382/bgxb.2025.0174.
In regional air defense and anti-missile operations, the complex coupling among combat elements causes the battlefield situation to evolve rapidly and the number of incoming targets to change dynamically. To address these problems, this paper proposes a weapon-target assignment method based on QMIX with dynamic extension and spatiotemporal reasoning (QMIX-DS). Each fire unit is treated as an agent with its own decision network, which generates the weapon-target assignment strategy. The core improvements are twofold. First, a dynamically extensible feature encoding module is designed for each agent's decision network to adaptively handle a varying number of incoming targets, and contrastive learning is introduced to highlight target category attributes and form differentiated feature representations. Second, a two-layer multi-head self-attention mechanism is built to capture the dynamic spatiotemporal dependencies among targets of different categories, enabling rapid reasoning about how the situation evolves during the mission and optimizing the weapon-target assignment strategy. Simulations of scenarios at different scales on the MOZI platform show that the proposed method generates effective air defense and anti-missile strategies under dynamically changing battlefield conditions. Compared with the baseline algorithm (standard QMIX) and other mainstream algorithms, QMIX-DS shows advantages in target interception rate, position survival rate, and missile consumption, and exhibits high scalability and generalization across different scenarios.
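The abstract states that each agent's encoder must cope with a changing number of incoming targets and that self-attention is used to relate targets to one another. The following is a minimal sketch of that general idea only, not the authors' QMIX-DS network: the class name `TargetSetEncoder`, the feature layout, the single attention head, the mean pooling, and the random (untrained) weights are all assumptions made for illustration. It shows why shared per-target projections followed by attention and pooling yield a fixed-size vector regardless of how many targets are present.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained network would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TargetSetEncoder:
    """Hypothetical encoder: shared projections, one attention head, mean pooling."""

    def __init__(self, feature_dim, embed_dim, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        # The same three projections are applied to every target, so the
        # parameter count does not depend on how many targets are present.
        self.w_query = random_matrix(embed_dim, feature_dim, rng)
        self.w_key = random_matrix(embed_dim, feature_dim, rng)
        self.w_value = random_matrix(embed_dim, feature_dim, rng)

    def encode(self, targets):
        """Map a list of per-target feature vectors to one embed_dim vector."""
        if not targets:
            return [0.0] * self.embed_dim  # no targets: neutral embedding
        queries = [matvec(self.w_query, t) for t in targets]
        keys = [matvec(self.w_key, t) for t in targets]
        values = [matvec(self.w_value, t) for t in targets]
        scale = math.sqrt(self.embed_dim)
        attended = []
        for q in queries:
            # Scaled dot-product attention of this target over all targets.
            scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
            weights = softmax(scores)
            attended.append([
                sum(w * v[d] for w, v in zip(weights, values))
                for d in range(self.embed_dim)
            ])
        # Mean pooling gives a fixed-size summary that ignores target order.
        count = len(attended)
        return [sum(row[d] for row in attended) / count
                for d in range(self.embed_dim)]


# Illustrative, made-up feature vectors (4 features per target).
encoder = TargetSetEncoder(feature_dim=4, embed_dim=3)
three_targets = [[0.9, 0.1, 0.3, 1.0], [0.2, 0.8, 0.5, 0.0], [0.4, 0.4, 0.9, 1.0]]
five_targets = three_targets + [[0.7, 0.2, 0.1, 0.0], [0.1, 0.6, 0.6, 1.0]]
print(len(encoder.encode(three_targets)), len(encoder.encode(five_targets)))  # 3 3
```

Because no positional information is added, the pooled output does not depend on the order in which targets are listed, which suits a set of targets whose membership changes over time. A practical version would presumably use a deep learning framework, multiple heads, and stacked layers as the abstract describes.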