WU Xiang, WANG Yuanhao, ZHANG Baoheng, et al. Regional Air Defense and Anti-missile Weapon-Target Assignment Based on Multi-agent Reinforcement Learning[J]. Acta Armamentarii, 2026, 47(2): 250174. DOI: 10.12382/bgxb.2025.0174.
Regional Air Defense and Anti-missile Weapon-Target Assignment Based on Multi-agent Reinforcement Learning
Abstract
In regional air defense and anti-missile operations, the complex coupling of combat elements causes the battlefield situation to evolve rapidly and the number of incoming targets to change dynamically. To address these problems, this paper proposes a weapon-target assignment method based on QMIX with dynamic extension and spatiotemporal reasoning (QMIX-DS). The method treats each fire unit as an agent and constructs a decision network for each agent that generates weapon-target assignment strategies. The core improvements are twofold. First, a dynamically expandable feature encoding module is designed for each agent's decision network so that it can adaptively process a varying number of incoming targets, and contrastive learning is introduced to highlight target category attributes, forming differentiated feature representations. Second, a two-layer multi-head self-attention mechanism is built to capture the dynamic spatiotemporal dependencies among different categories of targets, enabling rapid reasoning about how the situation evolves during the mission and optimizing the weapon-target assignment strategy. Simulations of scenarios at different scales on the MOZI platform show that the proposed method generates effective air defense and anti-missile strategies under dynamically changing battlefield conditions. Compared with the standard QMIX baseline and other mainstream algorithms, QMIX-DS shows advantages in target interception rate, position survival rate, and missile consumption, and exhibits high scalability and generalization across different scenarios.
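The abstract does not specify how the dynamically expandable encoder is built, so the following is only a minimal sketch of the general idea and not the authors' network. It assumes a shared per-target projection, a single self-attention head (the paper describes two layers of multi-head attention), and mean pooling; the class name `TargetSetEncoder`, its parameters, and the randomly initialized, untrained weights are all hypothetical. The point it illustrates is that the output has a fixed size however many targets are present.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def make_matrix(rows, cols, rng):
    """Small random matrix; stands in for learned weights (assumption)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TargetSetEncoder:
    """Hypothetical encoder: shared projection, one attention head, mean pooling."""

    def __init__(self, feat_dim, embed_dim, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.w_in = make_matrix(embed_dim, feat_dim, rng)  # shared by all targets
        self.w_q = make_matrix(embed_dim, embed_dim, rng)
        self.w_k = make_matrix(embed_dim, embed_dim, rng)
        self.w_v = make_matrix(embed_dim, embed_dim, rng)

    def encode(self, targets):
        """Map any number of target feature vectors to one fixed-size vector."""
        if not targets:
            return [0.0] * self.embed_dim
        h = [matvec(self.w_in, t) for t in targets]
        q = [matvec(self.w_q, x) for x in h]
        k = [matvec(self.w_k, x) for x in h]
        v = [matvec(self.w_v, x) for x in h]
        scale = math.sqrt(self.embed_dim)
        attended = []
        for qi in q:
            # Scaled dot-product attention of this target over all targets.
            scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
            weights = softmax(scores)
            attended.append(
                [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(self.embed_dim)]
            )
        # Mean pooling removes the dependence on the number of targets.
        n = len(attended)
        return [sum(row[d] for row in attended) / n for d in range(self.embed_dim)]


encoder = TargetSetEncoder(feat_dim=4, embed_dim=8)
few = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0, 0.5]]
many = few + [[0.3, 0.3, 0.3, 0.3], [0.9, -0.2, 0.1, 0.0]]
print(len(encoder.encode(few)), len(encoder.encode(many)))
```

Because the projection weights are shared across targets and the pooling is a mean, the same agent network can in principle be reused when targets appear or disappear during an engagement. How QMIX-DS actually achieves this, and how the contrastive loss is attached, is described in the full paper rather than in this abstract.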