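1. School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021, Shaanxi, China
2. School of Ordnance Science and Technology, Xi'an Technological University, Xi'an 710021, Shaanxi, China
3. China Ordnance Industry Test and Measurement Research Institute, Huayin 714200, Shaanxi, China
* Email: fuyanfang@xatu.edu.cn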
Received: 2023-09-05
Published online: 2024-01-15
Published in print: 2023-12-30
Zijian CAO, Zelong SUN, Guochuang YAN, et al. Simulation of Reinforcement Learning-based UAV Swarm Adversarial Strategy Deduction[J]. Acta Armamentarii, 2023, 44(S2): 126-134. DOI: 10.12382/bgxb.2023.0877.
UAV swarms are increasingly used in military operations, public safety, and commercial applications, but developing efficient strategies in complex and constantly changing adversarial environments remains a challenge. To enable UAV swarms to autonomously learn and adapt to changes in the adversarial environment, and to improve the efficiency and success rate of task execution, a multi-agent reinforcement learning algorithm framework based on value decomposition is proposed. The behavior of UAV swarms in different adversarial scenarios is simulated on a simulation platform, and reinforcement learning is used to train the swarm's ability to make decisions in different situations so as to optimize the task objectives. The application of different reinforcement learning algorithms to UAV swarm adversarial strategies is discussed and their performance is compared. The experimental results show that the proposed algorithm performs well in a variety of swarm confrontation environments, indicating that it can provide strong support for military UAV swarm confrontation.
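The abstract does not state which value-decomposition method the framework uses, so the following is only a minimal sketch of the general idea, assuming the simplest additive form in which the joint action value is the sum of per-agent action values. The Q-values, the three-UAV setup, and all function names are hypothetical and chosen for illustration. The sketch shows why this decomposition is attractive for swarms: each agent can act greedily on its own values and still recover the action that maximizes the joint value.

```python
import itertools
import numpy as np

def joint_q(per_agent_q, joint_action):
    """Additive mixing (assumed here): Q_tot(a) = sum_i Q_i(a_i)."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))

def decentralized_greedy(per_agent_q):
    """Each agent picks the argmax of its own Q-values, with no communication."""
    return tuple(int(np.argmax(q)) for q in per_agent_q)

def centralized_greedy(per_agent_q):
    """Brute-force argmax over the joint action space (exponential in agent count)."""
    action_ranges = [range(len(q)) for q in per_agent_q]
    return max(itertools.product(*action_ranges),
               key=lambda a: joint_q(per_agent_q, a))

# Hypothetical example: 3 UAVs, 4 discrete actions each, Q-values for one observation.
per_agent_q = [
    np.array([0.1, 0.7, 0.3, -0.2]),
    np.array([1.5, -0.4, 0.2, 0.9]),
    np.array([-1.0, 0.0, 2.0, 0.5]),
]

local_choice = decentralized_greedy(per_agent_q)
global_choice = centralized_greedy(per_agent_q)
print(local_choice, global_choice, joint_q(per_agent_q, local_choice))
```

With an additive decomposition the two selections coincide, so the swarm avoids searching the joint action space at execution time. Other mixing schemes relax the additive assumption while trying to preserve this property, and the paper's actual framework may differ from this sketch.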