Collaborative Regional Information Collection Strategy Based on MLAT-DRL Algorithm

doi:10.12382/bgxb.2023.1081

Abstract

Unmanned aerial vehicle (UAV) swarms performing collaborative information collection in adversarial environments face a complex environment structure and blocked swarm communication. To address these difficulties, a deep reinforcement learning algorithm based on a multi-level hybrid observation space and an attention mechanism (Multi-Level hybrid observation space with Attention-Deep Reinforcement Learning, MLAT-DRL) is proposed for UAV decision making in information collection tasks. The algorithm adopts the centralized training with decentralized execution (CTDE) paradigm, which enables efficient collaboration of the UAV swarm without communication. A multi-level hybrid observation space method is proposed to form a multi-scale representation of environmental features and to make efficient use of both global information and local observations. A recurrent neural network (RNN) combined with an attention mechanism is introduced into the network structure, which improves the risk perception ability of the UAV swarm. A prioritized experience replay (PER) strategy is employed to improve sample utilization and reduce training difficulty. Simulation experiments show that the MLAT-DRL algorithm outperforms the baseline algorithms in both data collection and risk avoidance.

Key words: unmanned aerial vehicle swarm, regional information collection, multi-agent reinforcement learning, multi-level hybrid observation space, attention mechanism
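The abstract names prioritized experience replay but gives no implementation details. The following is a minimal sketch of the generic proportional variant of the technique, not the authors' code. The class name, the hyperparameters `alpha`, `beta` and `eps`, and the per-batch weight normalization are all assumptions made for illustration.

```python
import random


class PrioritizedReplayBuffer:
    """Generic proportional prioritized replay (illustrative, not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # assumed value: how strongly priorities skew sampling
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New samples get the current maximum priority so they are likely to be replayed.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling;
        # here they are normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, the priority becomes the absolute TD error.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, the learner would multiply each sample's loss by its importance-sampling weight and then call `update_priorities` with the new TD errors, so that transitions with large errors are replayed more often.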

CLC number: V279; TP18

LOU Shuhan, WANG Chongchong, GONG Wei, DENG Liyuan, LI Li. Collaborative Regional Information Collection Strategy Based on MLAT-DRL Algorithm[J]. Acta Armamentarii, 2024, 45(12): 4423-4434.

Figures/Tables 13

Fig.1 Schematic diagram of collaborative information collection by UAV swarm

Fig.2 Initial state of UAV swarm and task region

Fig.3 Multi-level hybrid observation space

Fig.4 Network structure of MLAT-DRL algorithm

Fig.5 MLAT-DRL regional information collection algorithm

Fig.6 3-D heat map of data density in the region

Table 1 Parameters of UAVs

Parameter	Value	Parameter	Value
v/(m·s^-1)	16.7	α	1
ξ	30	β	0.1
η	10	ρ_c/m	1000

Table 2 Parameters of adversary radars

Radar No.	Coordinates/km	Maximum detection radius/km	False alarm probability
Ⅰ	(3.2, 3.2)	1.50	1×10^-6
Ⅱ	(1.25, 1.25)	1.25	1×10^-6
Ⅲ	(5.15, 1.25)	1.25	1×10^-6
Ⅳ	(1.25, 5.15)	1.25	1×10^-6
Ⅴ	(5.15, 5.15)	1.25	1×10^-6
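As a quick way to work with the radar layout in Table 2, the sketch below checks which radars have a given point inside their maximum detection radius. It is only a geometric helper under stated assumptions: the function name `radars_in_range` and the ASCII radar keys are hypothetical, and lying inside the radius is not the same as being detected, since the paper's detection model is not reproduced here.

```python
import math

# (x_km, y_km, max_detection_radius_km), copied from Table 2.
# ASCII keys "I".."V" stand for radars I to V.
RADARS = {
    "I": (3.2, 3.2, 1.50),
    "II": (1.25, 1.25, 1.25),
    "III": (5.15, 1.25, 1.25),
    "IV": (1.25, 5.15, 1.25),
    "V": (5.15, 5.15, 1.25),
}


def radars_in_range(x_km, y_km):
    """Return the IDs of radars whose maximum detection radius covers the point."""
    return [
        rid
        for rid, (rx, ry, radius) in RADARS.items()
        if math.hypot(x_km - rx, y_km - ry) <= radius
    ]
```

For example, `radars_in_range(3.2, 3.2)` returns `["I"]`, while the corner point `(0.0, 0.0)` lies outside every radar's maximum radius and yields an empty list.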

Fig.7 Return curve for ρ_c = 300 m and E = 700

Fig.8 Amount of collected data and swarm remaining energy for ρ_c = 300 m and E = 700

Fig.9 Trajectories of agents

Fig.10 Amount of data collected by agents with different algorithms

Fig.11 Average number of detections per round for agents with different algorithms
