Acta Armamentarii ›› 2023, Vol. 44 ›› Issue (11): 3295-3309. doi: 10.12382/bgxb.2023.0810

Special topic: Swarm Collaboration and Autonomy Technology

Multi-agent Reinforcement Learning-based Offloading Decision for UAV Cluster Combat Tasks

LI Jiajian1,2, SHI Yanjun1,2,*, YANG Yu1,3, LI Bo3,4, ZHAO Xijun3,4

    1 School of Mechanical Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China
    2 State Key Laboratory of High-performance Precision Manufacturing, Dalian 116024, Liaoning, China
    3 China North Artificial Intelligence & Innovation Research Institute, Beijing 100072, China
    4 Collective Intelligence & Collaboration Laboratory, Beijing 100072, China
  • Received: 2023-08-29 Online: 2023-11-05
  • * Corresponding author
  • Funding: Open Fund Project of the Collective Intelligence & Collaboration Laboratory (QXZ23014301)

Abstract:

Task offloading, one of the key technologies for ensuring efficient cooperative operations of unmanned aerial vehicle (UAV) clusters, has become a research hotspot in recent years. It aims to overcome the limited computing power and energy of a single platform by offloading computing tasks to edge-network servers for processing, thereby reducing cost and increasing efficiency. This paper takes UAV cluster-assisted air-ground integrated cooperative reconnaissance as the combat scenario and considers both the complex, changeable wartime electromagnetic environment and the time-varying network topology of the cluster. Lyapunov optimization is used to decouple the long-term task offloading problem into an online Markov decision process. To address the difficult convergence and low learning efficiency in the hybrid action space, convex optimization is combined with a multi-agent deep deterministic policy method to solve the power allocation and task allocation problems hierarchically, yielding a multi-agent reinforcement learning offloading decision algorithm driven by data-model bi-level optimization. Numerical experiments show that the proposed algorithm adaptively adjusts each agent's task offloading strategy to the time-varying battlefield environment, improving on the performance of traditional algorithms and optimizing complex multi-dimensional objectives.

Key words: UAV cluster, task offloading, multi-agent reinforcement learning, convex optimization, Lyapunov optimization
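
The abstract states that Lyapunov optimization turns the long-term offloading problem into per-slot online decisions, but it does not give the formulation. The sketch below illustrates only the generic drift-plus-penalty idea under stated assumptions and is not the paper's model: a single platform, a long-term average energy budget tracked by a virtual queue, and two hypothetical options (`local`, `offload`) with made-up delay and energy values. The names `Option`, `choose`, `update_queue`, `simulate`, and the weight `v` are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One per-slot choice; the numbers used below are hypothetical."""
    name: str
    delay: float   # per-slot penalty to minimize (assumed: delay)
    energy: float  # per-slot energy use, subject to an average budget


def choose(options, queue, v):
    """Pick the option minimizing the drift-plus-penalty score V*delay + Q*energy."""
    return min(options, key=lambda o: v * o.delay + queue * o.energy)


def update_queue(queue, energy, budget):
    """Virtual queue update: Q(t+1) = max(Q(t) + e(t) - budget, 0)."""
    return max(queue + energy - budget, 0.0)


def simulate(options, budget, v, slots):
    """Run the per-slot rule; return choices, average energy, and final queue."""
    queue = 0.0
    total_energy = 0.0
    history = []
    for _ in range(slots):
        opt = choose(options, queue, v)          # uses only current-slot state
        queue = update_queue(queue, opt.energy, budget)
        total_energy += opt.energy
        history.append(opt.name)
    return history, total_energy / slots, queue


if __name__ == "__main__":
    # Assumed values: offloading is faster but costs more transmit energy.
    options = [Option("local", delay=1.0, energy=0.2),
               Option("offload", delay=0.3, energy=1.0)]
    history, avg_energy, queue = simulate(options, budget=0.5, v=1.0, slots=1000)
    print(history[:8], avg_energy, queue)
```

Each slot's decision depends only on the current queue value, which is what makes the long-term constraint tractable online. A larger `v` favors low delay at the cost of a longer virtual queue, so the energy budget is met more slowly. In the setting the abstract describes, the per-slot choice would be made by the learned multi-agent policy and a convex power-allocation step rather than by the brute-force `min` used here.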
