Simulation of Reinforcement Learning-based UAV Swarm Adversarial Strategy Deduction

doi:10.12382/bgxb.2023.0877

Abstract

Drone swarms are increasingly used in military operations, public safety, and commercial applications, but developing efficient strategies in complex and changing adversarial environments remains a challenge. To enable drone swarms to learn autonomously and adapt to changes in the adversarial environment, and to improve the efficiency and success rate of task execution, a multi-agent reinforcement learning algorithm framework based on value decomposition is proposed. Drone swarm behavior in different adversarial scenarios is simulated on a simulation platform, and reinforcement learning is used to train the swarm's decision-making in different situations so as to optimize the task objective. The application of different reinforcement learning algorithms to drone swarm adversarial strategies is discussed and their performance is compared. Experimental results show that the proposed algorithm performs well in a variety of swarm confrontation environments, indicating that it can support military drone swarm confrontation.

Key words: drone swarm, adversarial strategy, deep reinforcement learning, value decomposition

CLC number:

TP18
E91

CAO Zijian, SUN Zelong, YAN Guochuang, FU Yanfang, YANG Bo, LI Qinjie, LEI Kailin, GAO Linghang. Simulation of Reinforcement Learning-based UAV Swarm Adversarial Strategy Deduction[J]. Acta Armamentarii, 2023, 44(S2): 126-134.

Figures/Tables 9

Fig.1 Schematic diagram of reinforcement learning

Fig.2 QMIX algorithm framework
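Fig.2 is not reproduced here. As a rough illustration of the value-decomposition idea that QMIX is known for, the sketch below mixes per-agent Q-values into a joint value using state-conditioned, non-negative weights, so the joint value never decreases when any single agent's Q-value increases. It is a minimal NumPy sketch, not the paper's implementation: the function names `init_mixer` and `qmix_mix`, the single-layer hypernetworks, and all dimensions are assumptions made for illustration.

```python
import numpy as np


def init_mixer(state_dim, n_agents, embed_dim, seed=0):
    """Random hypernetwork weights (hypothetical shapes, single linear layers)."""
    rng = np.random.default_rng(seed)
    return {
        "hyper_w1": rng.normal(size=(state_dim, n_agents * embed_dim)),
        "hyper_b1": rng.normal(size=(state_dim, embed_dim)),
        "hyper_w2": rng.normal(size=(state_dim, embed_dim)),
        "hyper_b2": rng.normal(size=(state_dim,)),
    }


def qmix_mix(agent_qs, state, params):
    """Combine per-agent Q-values into one joint value, monotone in each input."""
    n_agents = agent_qs.shape[0]
    # The global state generates the mixing weights; abs() keeps them
    # non-negative, which is what enforces monotonicity.
    w1 = np.abs(state @ params["hyper_w1"]).reshape(n_agents, -1)
    b1 = state @ params["hyper_b1"]
    w2 = np.abs(state @ params["hyper_w2"])
    b2 = state @ params["hyper_b2"]
    hidden = agent_qs @ w1 + b1
    # ELU activation (monotone non-decreasing), written to avoid overflow.
    hidden = np.where(hidden > 0, hidden, np.expm1(np.minimum(hidden, 0.0)))
    return float(hidden @ w2 + b2)


params = init_mixer(state_dim=8, n_agents=3, embed_dim=4)
state = np.linspace(-1.0, 1.0, 8)
q_values = np.array([0.5, -0.2, 1.0])
q_tot = qmix_mix(q_values, state, params)
```

Because the weights are non-negative and the activation is monotone, each agent can pick its own greedy action and the joint choice still maximizes the mixed value, which is what allows decentralized execution after centralized training.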

Fig.3 Algorithm framework diagram

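Table 1 State space

Agent	State space
Red agent	Longitude, latitude, altitude, speed, heading, fuel, damage, ammunition
Blue agent	Longitude, latitude, altitude, speed, heading, fuel, damage, ammunition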

Table 2 Action space

Agent	Action space
Red agent	Bearing, heading angle, attack, altitude, speed
Blue agent	Bearing, heading angle, altitude, speed
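The state and action spaces in Tables 1 and 2 could be encoded roughly as follows. This is a sketch under stated assumptions: the class and field names, the use of plain floats, and the helper `make_action` are hypothetical, and the tables give no units or value ranges, so none are assumed here. The one structural point taken from Table 2 is that only red agents have an attack action.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AgentState:
    """Eight per-agent state variables listed in Table 1 (same for both sides)."""
    longitude: float
    latitude: float
    altitude: float
    speed: float
    heading: float
    fuel: float
    damage: float
    ammunition: float

    def to_vector(self):
        # Flatten to a list in declaration order, e.g. as network input.
        return [getattr(self, f.name) for f in fields(self)]


@dataclass
class AgentAction:
    """Action variables from Table 2; attack is None for agents without it."""
    bearing: float
    heading_angle: float
    altitude: float
    speed: float
    attack: Optional[bool] = None


def make_action(side, bearing, heading_angle, altitude, speed, attack=False):
    """Build an action, rejecting an attack command for blue agents."""
    if side == "red":
        return AgentAction(bearing, heading_angle, altitude, speed, attack)
    if side == "blue":
        if attack:
            raise ValueError("blue agents have no attack action in Table 2")
        return AgentAction(bearing, heading_angle, altitude, speed, None)
    raise ValueError(f"unknown side: {side!r}")


red_state = AgentState(108.9, 34.3, 3000.0, 200.0, 90.0, 0.8, 0.0, 4.0)
red_action = make_action("red", 45.0, 90.0, 3200.0, 220.0, attack=True)
blue_action = make_action("blue", 225.0, 270.0, 2800.0, 180.0)
```

The numeric values in the usage lines are placeholders chosen only to make the example run.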

Fig.4 Flowchart of agent training

Fig.5 Regional segmentation

Fig.6 Reward curve

Fig.7 Win rate curve
