
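1. School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021, Shaanxi
2. Beijing Electro-Mechanical Engineering Institute, Beijing 100083
3. Unit 95810, Beijing 100076
4. School of Armament Science and Technology, Xi'an Technological University, Xi'an 710021, Shaanxi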
Received: 05 September 2023
Published Online: 30 October 2024
Published: 31 October 2024
Yanfang FU, Kailin LEI, Jianing WEI, et al. A Hierarchical Multi-Agent Collaborative Decision-making Method Based on the Actor-critic Framework[J]. Acta Armamentarii, 2024, 45(10): 3385-3396. DOI: 10.12382/bgxb.2023.0862.
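To address unreasonable task allocation and poor decision consistency in multi-agent collaborative decision-making in complex combat environments, a hierarchical multi-agent collaborative decision-making method based on the Actor-Critic (AC) framework is proposed. The decision-making process is divided into different levels, and the AC framework is used to realize information exchange and decision coordination among the agents, so as to improve decision efficiency and combat effectiveness. At the high level, the top-level agent makes task decisions, decomposing the overall task and assigning the resulting subtasks to the bottom-level agents. At the low level, the bottom-level agents make action decisions according to their subtasks and feed the results back to the high level. Experimental results show that the proposed method achieves good performance in a variety of combat simulation scenarios, demonstrating its potential for improving collaborative decision-making capability in military operations.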
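The abstract describes a two-level loop (decompose, assign, act, feed back) without giving its interfaces. The sketch below illustrates only that data flow under stated assumptions: `TopLevelAgent`, `LowLevelAgent`, and `run_episode` are hypothetical names, agents and targets live on a toy 1-D line, and a greedy nearest-agent rule stands in for the learned high-level actor. It is not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LowLevelAgent:
    """Hypothetical bottom-level agent: turns an assigned subtask into primitive actions."""
    name: str
    position: int
    subtask: Optional[int] = None  # target cell assigned by the top level

    def act(self) -> int:
        # Action decision conditioned on the subtask: move one cell toward the target.
        if self.subtask is None or self.subtask == self.position:
            return 0
        return 1 if self.subtask > self.position else -1

    def step(self) -> bool:
        # Execute the action, then report (feed back) whether the subtask is complete.
        self.position += self.act()
        return self.subtask is not None and self.position == self.subtask


class TopLevelAgent:
    """Hypothetical top-level agent: decomposes the overall task and assigns subtasks."""

    def decompose(self, overall_task: List[int]) -> List[int]:
        # Assumed decomposition: one subtask per distinct target cell.
        return sorted(set(overall_task))

    def assign(self, subtasks: List[int], agents: List[LowLevelAgent]) -> Dict[str, int]:
        # Greedy nearest-free-agent matching, a stand-in for a learned high-level actor.
        free = {a.name: a for a in agents}
        plan: Dict[str, int] = {}
        for target in subtasks:
            if not free:
                break
            best = min(free.values(), key=lambda a: (abs(a.position - target), a.name))
            best.subtask = target
            plan[best.name] = target
            del free[best.name]
        return plan


def run_episode(targets: List[int], agents: List[LowLevelAgent],
                max_steps: int = 100) -> Tuple[Dict[str, int], int]:
    top = TopLevelAgent()
    plan = top.assign(top.decompose(targets), agents)
    done = {name: False for name in plan}
    for t in range(1, max_steps + 1):
        for agent in agents:
            if agent.name in plan and not done[agent.name]:
                done[agent.name] = agent.step()  # completion flag fed back upward
        if all(done.values()):
            return plan, t
    return plan, max_steps


if __name__ == "__main__":
    agents = [LowLevelAgent("A", 0), LowLevelAgent("B", 10)]
    plan, steps = run_episode([3, 8], agents)
    print(plan, steps)  # {'A': 3, 'B': 8} 3
```

In a learned version, the greedy `assign` rule and the fixed `act` rule would each be replaced by a trained policy; the sketch keeps them rule-based so the hierarchy itself stays visible.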
A hierarchical multi-agent collaborative decision-making method based on the actor-critic (AC) framework is proposed to address improper task allocation and weak decision consistency in the collaborative decision-making of multiple agents in complex operational environments. The proposed method divides the decision-making process into different levels and uses the AC framework to facilitate information exchange and decision coordination among the agents, thereby enhancing decision efficiency and combat effectiveness. At the higher level, the top-level agents formulate task decisions by decomposing the overall task and assigning subtasks to the lower-level agents. At the lower level, the lower-level agents make action decisions based on their subtasks and provide feedback to the higher level. Experimental results demonstrate that the proposed method performs well in various operational simulation scenarios, showing its potential to enhance collaborative decision-making capability in military operations.
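The abstract does not state the update rules used inside the AC framework. For orientation only, here is a minimal one-step actor-critic in tabular form: a TD(0) critic estimates V(s), and a softmax actor moves its logits along the policy gradient scaled by the TD error δ = r + γV(s′) − V(s). The class name `TabularActorCritic`, the step sizes, the discount, and the one-state toy problem are assumptions, not the paper's configuration; neural networks and multi-agent information exchange are not modeled.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List


class TabularActorCritic:
    """One-step actor-critic: softmax actor over logits, TD(0) state-value critic."""

    def __init__(self, actions: List[str], alpha_actor=0.1, alpha_critic=0.2, gamma=0.9):
        self.actions = actions
        # Actor parameters: one logit per (state, action), created lazily.
        self.theta: Dict[str, List[float]] = defaultdict(lambda: [0.0] * len(actions))
        # Critic parameters: one value estimate per state.
        self.v: Dict[str, float] = defaultdict(float)
        self.alpha_actor, self.alpha_critic, self.gamma = alpha_actor, alpha_critic, gamma

    def policy(self, state: str) -> List[float]:
        # Numerically stable softmax over the state's logits.
        logits = self.theta[state]
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self, state: str, rng: random.Random) -> int:
        return rng.choices(range(len(self.actions)), weights=self.policy(state))[0]

    def update(self, s: str, a: int, r: float, s_next: str, done: bool) -> float:
        # Critic: TD error, with V(s') treated as 0 at episode end.
        target = r + (0.0 if done else self.gamma * self.v[s_next])
        delta = target - self.v[s]
        self.v[s] += self.alpha_critic * delta
        # Actor: d log pi(a|s) / d theta[s][b] = 1[b == a] - pi(b|s), scaled by delta.
        probs = self.policy(s)
        for b in range(len(self.actions)):
            grad = (1.0 if b == a else 0.0) - probs[b]
            self.theta[s][b] += self.alpha_actor * delta * grad
        return delta


if __name__ == "__main__":
    # Assumed toy problem: one state; action 0 ("attack") pays 1, action 1 ("hold") pays 0.
    rng = random.Random(0)
    ac = TabularActorCritic(["attack", "hold"])
    for _ in range(1000):
        a = ac.act("s0", rng)
        ac.update("s0", a, 1.0 if a == 0 else 0.0, "s0", done=True)
    print(ac.policy("s0"))  # probability mass shifts toward "attack"
```

The critic's TD error serves as the actor's learning signal, which is the defining feature of actor-critic methods. How the paper couples such updates across its two levels is not specified in the abstract.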