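Rocket Force University of Engineering, Xi'an 710025, Shaanxi, China
*Corresponding author email: guoyang820@foxmail.com
Received: 2025-08-25
Published online first: 2026-02-11
Print publication: 2026-01-31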
SANG Jimao, GUO Yang, YU Chuanqiang, et al. A Meta-reinforcement Learning Approach for Cooperative Decision-making of Multi-interceptors in Adversarial Environments[J]. Acta Armamentarii, 2026, 47(1): 250771. DOI: 10.12382/bgxb.2025.0771.
In strongly adversarial environments, the unpredictable maneuvers of highly maneuverable targets and electromagnetic interference lead to poor environmental adaptability and low training efficiency in multi-interceptor dynamic target allocation. To address these problems, this paper proposes an adaptive meta-reinforcement learning method for target allocation. The method integrates meta-learning with an improved asynchronous advantage actor-critic (A3C) algorithm to establish a dual-mode cooperative decision-making mechanism. First, cross-task policy transfer and rapid environmental adaptation are achieved through a differentiable meta-knowledge base. Second, progressive curriculum learning based on threat levels is introduced to adjust environmental complexity in stages, which enhances the algorithm's robustness against sudden interference and abrupt target maneuvers. Third, hierarchical parameter sharing among heterogeneous agents, delay-aware gradient synchronization, and dual-priority experience replay are employed to improve the sample efficiency and convergence stability of distributed policy training. Simulation results show that the proposed method significantly improves the interception success rate and the real-time performance of cooperative decision-making. It effectively overcomes the poor dynamic adaptability and excessively high computational complexity of traditional game-theoretic methods, as well as the low sample efficiency and insufficient robustness to sudden interference of deep reinforcement learning under partial observability. This work provides effective support for multi-agent cooperative allocation and guidance in complex adversarial scenarios.
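The abstract does not specify how the threat-level curriculum advances between stages. As a minimal sketch only, the Python below assumes one plausible rule: move to the next, harder stage once the moving-average interception success rate over a recent window reaches a threshold. The class names, stage fields (`max_target_accel_g`, `jamming_prob`), threshold, and window length are all hypothetical and are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CurriculumStage:
    """One difficulty stage. All fields and values are illustrative assumptions."""
    name: str
    max_target_accel_g: float  # assumed upper bound on target maneuver acceleration
    jamming_prob: float        # assumed per-step probability of interference


@dataclass
class ThreatCurriculum:
    """Advance to the next stage once the recent success rate is high enough."""
    stages: list
    promote_threshold: float = 0.8  # assumed promotion criterion
    window: int = 50                # number of episodes in the moving average
    index: int = 0
    _recent: deque = field(default_factory=deque)

    @property
    def stage(self) -> CurriculumStage:
        return self.stages[self.index]

    def record(self, success: bool) -> bool:
        """Log one episode outcome; return True if the stage was advanced."""
        self._recent.append(1.0 if success else 0.0)
        if len(self._recent) > self.window:
            self._recent.popleft()
        window_full = len(self._recent) == self.window
        rate = sum(self._recent) / len(self._recent)
        at_last_stage = self.index == len(self.stages) - 1
        if window_full and rate >= self.promote_threshold and not at_last_stage:
            self.index += 1
            self._recent.clear()  # re-measure performance on the harder stage
            return True
        return False
```

In a training loop, the environment would be configured from `curriculum.stage` before each episode, and `curriculum.record(...)` would be called with the episode outcome afterwards. Clearing the window on promotion is a design choice made here so that success on an easier stage does not count towards the next promotion.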