Multi-agent Reinforcement Learning-based Offloading Decision for UAV Cluster Combat Tasks

doi:10.12382/bgxb.2023.0810

Abstract

Task offloading has become a research hotspot in recent years. It is a key technology for efficient cooperative operation of unmanned aerial vehicle (UAV) clusters: by offloading computing tasks to edge-network servers for processing, it overcomes the insufficient computing power and limited energy of a single platform, reducing cost and increasing efficiency. This paper takes UAV cluster-assisted air-ground integrated cooperative reconnaissance as the combat scenario and considers both the complex wartime electromagnetic environment and the time-varying network topology of the cluster. The long-term task offloading problem is decoupled into an online Markov decision process via Lyapunov optimization. To address difficult convergence and low learning efficiency in the hybrid action space, a multi-agent reinforcement learning offloading decision algorithm driven by data-model bi-level optimization is proposed. It combines convex optimization with multi-agent deep deterministic policy gradient (MADDPG) to solve the power allocation and task allocation problems hierarchically. Numerical experiments show that the proposed algorithm adaptively adjusts each agent's task offloading strategy to the time-varying battlefield environment, improving on the performance of traditional algorithms and optimizing complex multi-dimensional objectives.
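
As a rough illustration of how Lyapunov optimization can turn a long-term average constraint into per-slot decisions, the sketch below uses the textbook drift-plus-penalty form with a virtual energy queue. It is not the paper's formulation: the queue update, the two candidate actions, their cost and energy numbers, and the value V = 1.0 are all assumptions chosen for illustration. Only the roles of a Lyapunov weight V and an average energy budget E_avg come from the setting described here.

```python
def update_virtual_queue(q, energy_used, e_avg):
    """Virtual queue: grows when a slot spends more than the average budget."""
    return max(q + energy_used - e_avg, 0.0)


def pick_action(q, candidates, v):
    """Per-slot drift-plus-penalty rule: minimize V * cost + Q * energy."""
    return min(candidates, key=lambda a: v * a["cost"] + q * a["energy"])


# Hypothetical per-slot options; the cost and energy values are made up.
candidates = [
    {"name": "local", "cost": 1.0, "energy": 0.2},
    {"name": "offload", "cost": 0.4, "energy": 0.8},
]

q = 0.0
history = []
for _ in range(5):
    action = pick_action(q, candidates, v=1.0)
    q = update_virtual_queue(q, action["energy"], e_avg=0.5)
    history.append(action["name"])
```

In this toy run the cheaper but energy-hungry action is chosen until the virtual queue has grown enough to make the low-energy action preferable, which is the mechanism that enforces the long-term budget without looking ahead.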

Key words: UAV cluster, task offload, multi-agent reinforcement learning, convex optimization, Lyapunov optimization

CLC Number: TP181

LI Jiajian, SHI Yanjun, YANG Yu, LI Bo, ZHAO Xijun. Multi-agent Reinforcement Learning-based Offloading Decision for UAV Cluster Combat Tasks[J]. Acta Armamentarii, 2023, 44(11): 3295-3309.

Figures/Tables 13

Fig.1 Urban air-ground collaborative reconnaissance combat mission demand

Fig.2 UAV cluster-assisted MEC system for task offloading

Fig.3 Framework of offloading decision optimization algorithm based on multi-agent reinforcement learning

Fig.4 Dataflow of O-MADDPG algorithm driven by two-layer optimization

Table 1 Setting of simulation parameters

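Parameter	Default value
Number of UEs N	5
Number of UAVs K	4
UAV communication radius L/m	250
UAV hovering height H/m	15
Channel bandwidth per UAV B/MHz	1
White noise power σ₀/dBm	-113
UE computing frequency f_l/GHz	2.4
UAV computing frequency f_e/GHz	24
Average energy E_avg/J	0.5
CPU cycles needed by a UE to process 1 bit X^l/(cycle·bit^-1)	5000
CPU cycles needed by a UAV to process 1 bit X^e/(cycle·bit^-1)	5000
Computation required by a time-slot task A_{i,t}	[5000, 40000]
Data volume transmitted when offloading a time-slot task L_{i,t}	[5000, 10000]
Tolerable delay τ^d_{i,t}/s	0.006
System period T/min	10
Time slot length τ/s	0.002
Effective capacitance constant κ	10^-28
Lyapunov weight V	0.001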
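
A few derived quantities follow directly from the Table 1 defaults and may help when reading the later results. The sketch below converts the noise power to watts and counts time slots. The energy-per-cycle line assumes the common dynamic-power model E = κ·f² per CPU cycle, which this page does not state, so treat that part as an assumption; all variable names are illustrative.

```python
import math


def dbm_to_watts(dbm):
    """Convert a power level in dBm to watts."""
    return 10 ** (dbm / 10) / 1000


noise_w = dbm_to_watts(-113)             # white noise power in W
num_slots = round(10 * 60 / 0.002)       # system period T divided by slot length
deadline_slots = round(0.006 / 0.002)    # tolerable delay expressed in slots

kappa = 1e-28
f_ue, f_uav = 2.4e9, 24e9                # computing frequencies in Hz
# Assumed model (not stated in the table): energy per CPU cycle = kappa * f^2.
energy_per_cycle_ue = kappa * f_ue ** 2
energy_per_cycle_uav = kappa * f_uav ** 2
```

With these defaults the system period spans 300,000 slots and a task must finish within 3 slots to meet its tolerable delay.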

Table 2 Parameter settings of actor-critic networks

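Network	Layer	Hyperparameter	Value
Actor	fc(state_dim, 64), relu	Learning rate	0.005
Actor	fc(64, 64), relu	Batch size	1024
Actor	fc(64, 1), tanh	Replay buffer capacity	100000
Critic	fc(state_dim, 64), relu	Discount factor	0.95
Critic	fc(64, 64), relu	Soft-update temperature factor	0.01
Critic	fc(64, 1), tanh	Soft-update interval	1024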
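
The layer stack in Table 2 can be read as a three-layer fully connected network. The following dependency-free sketch runs a forward pass through that shape and applies a soft target update. It is only an illustration: `state_dim = 8`, the weight initialization, and the reading of the temperature factor 0.01 as the usual blending coefficient in θ' ← τθ + (1 − τ)θ' are assumptions, and a real implementation would use a deep learning framework.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layers, x):
    """fc -> relu -> fc -> relu -> fc -> tanh, following the Table 2 layer list."""
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        is_last = idx == len(layers) - 1
        x = [math.tanh(v) if is_last else max(v, 0.0) for v in x]
    return x


def soft_update(target, online, tau=0.01):
    """Blend online parameters into the target: tau * online + (1 - tau) * target."""
    return [
        (
            [[tau * wo + (1 - tau) * wt for wo, wt in zip(ro, rt)]
             for ro, rt in zip(w_on, w_tg)],
            [tau * bo + (1 - tau) * bt for bo, bt in zip(b_on, b_tg)],
        )
        for (w_tg, b_tg), (w_on, b_on) in zip(target, online)
    ]


rng = random.Random(0)
state_dim = 8  # hypothetical; the table leaves state_dim symbolic
sizes = [(state_dim, 64), (64, 64), (64, 1)]
actor = [make_layer(i, o, rng) for i, o in sizes]
target_actor = [make_layer(i, o, rng) for i, o in sizes]

action = forward(actor, [0.5] * state_dim)      # single value in [-1, 1]
target_actor = soft_update(target_actor, actor)
```

The tanh output keeps the single action value bounded, and the small blending coefficient makes the target network trail the online network slowly.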

Fig.5 Reward curves for all UEs under O-MADDPG algorithm

Fig.6 Training efficiency of O-MADDPG under different training parameter settings

Fig.7 Convergence curves of O-MADDPG under different tolerance delay settings

Table 3 Analysis of simulated results

Group	Training round	Task failure rate/%			Task processing delay/ms			System energy consumption/J		
		O-MADDPG	V-MADDPG	O-DDQN	O-MADDPG	V-MADDPG	O-DDQN	O-MADDPG	V-MADDPG	O-DDQN
1	1	6.418	3.893	27.264	3.817	3.611	5.204	0.8797	0.7089	1.3690
2	51	0.418	8.759	0.660	3.884	3.723	4.691	0.0054	0.1727	0.0066
3	101	0.385	11.897	0.649	3.757	3.616	4.692	0.0052	0.2807	0.0065
4	150	0.386	14.437	0.648	3.757	3.591	4.688	0.0052	0.5372	0.0065
5	201	0.362	13.867	0.649	3.655	3.574	4.691	0.0051	0.3072	0.0065
6	251	0.372	13.804	0.647	3.698	3.486	4.678	0.0051	0.3296	0.0065
Mean		1.390	11.109	5.086	3.761	3.600	4.774	0.151	0.3894	0.234
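
The mean row of Table 3 can be checked directly from the six groups. The short sketch below recomputes the task failure rate means using the values copied from the table; the dictionary layout and variable names are illustrative only.

```python
from statistics import mean

# Task failure rate (%) per group, copied from Table 3.
failure_rate = {
    "O-MADDPG": [6.418, 0.418, 0.385, 0.386, 0.362, 0.372],
    "V-MADDPG": [3.893, 8.759, 11.897, 14.437, 13.867, 13.804],
    "O-DDQN": [27.264, 0.660, 0.649, 0.648, 0.649, 0.647],
}

mean_failure = {name: mean(values) for name, values in failure_rate.items()}

# Algorithm with the lowest mean failure rate across the six groups.
best = min(mean_failure, key=mean_failure.get)
```

The recomputed means agree with the reported 1.390, 11.109, and 5.086 to within rounding. The first group dominates the O-MADDPG and O-DDQN means, since both drop below 1% from the second group onward.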

Fig.8 Comparison of MEC performance under different algorithms

Fig.9 Reward performances of different algorithms with varying task computational load

Fig.10 Comparison of MEC performance between O-MADDPG algorithm and different offloading modes
