
Acta Armamentarii, 2024, Vol. 45, Issue (12): 4423-4434. doi: 10.12382/bgxb.2023.1081


Collaborative Regional Information Collection Strategy Based on MLAT-DRL Algorithm

LOU Shuhan1, WANG Chongchong1, GONG Wei1,2,*, DENG Liyuan1, LI Li1,2

  1. School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
  2. Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai 201804, China
  • Received: 2023-11-06 Online: 2024-02-19
  • * Corresponding author
  • Funding:
    National Natural Science Foundation of China (72171172); National Natural Science Foundation of China (92367101); Fundamental Research Funds for the Central Universities (22120220613); Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100)

Abstract:

To address the difficulties that unmanned aerial vehicle (UAV) swarms face in collaborative regional information collection in adversarial environments, such as complex environment structure and blocked swarm communication, a deep reinforcement learning algorithm based on a multi-level hybrid observation space and an attention mechanism (Multi-Level hybrid observation space with Attention-Deep Reinforcement Learning, MLAT-DRL) is proposed for UAV decision making in information collection tasks. The algorithm adopts the centralized training with decentralized execution (CTDE) paradigm, which enables efficient collaboration of the UAV swarm without communication. A multi-level hybrid observation space method is proposed to form multi-scale representations of environmental features and to make efficient use of both global information and local observations. A recurrent neural network (RNN) combined with an attention mechanism is introduced into the network structure, which improves the risk perception ability of the UAV swarm. A prioritized experience replay (PER) strategy is employed to improve sample utilization and reduce training difficulty. Simulation results show that the proposed MLAT-DRL algorithm outperforms the baseline algorithms in both data collection and risk avoidance.

Key words: unmanned aerial vehicle swarm, regional information collection, multi-agent reinforcement learning, multi-level hybrid observation space, attention mechanism

CLC number:
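The abstract names prioritized experience replay as one component of the algorithm but gives no details on this page. As a rough illustration only, the sketch below shows the commonly used proportional variant, in which a transition is sampled with probability proportional to its priority raised to a power and the resulting bias is corrected with importance-sampling weights. The class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta` and `eps`, their default values, and the simple list-based storage are all assumptions made for this example and are not taken from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (hypothetical sketch, O(N) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity = capacity
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def __len__(self):
        return len(self.data)

    def add(self, transition):
        # New transitions receive the current maximum priority so that each
        # one has a good chance of being replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights w_i = (N * P(i))^(-beta),
        # normalised by the largest weight in the sampled batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        w_max = max(weights)
        weights = [w / w_max for w in weights]
        batch = [self.data[i] for i in idxs]
        return idxs, batch, weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, the priority becomes |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps


# Usage with placeholder transitions (the tuples stand in for real experience).
buffer = PrioritizedReplayBuffer(capacity=4)
for step in range(6):
    buffer.add(("obs", step))
indices, batch, is_weights = buffer.sample(3, rng=random.Random(0))
buffer.update_priorities(indices, [0.5, 2.0, 0.1])
```

In a training loop the returned weights would typically scale each sample's loss term, and the priorities would be refreshed from the new temporal-difference errors. A sum-tree is the usual choice when the buffer is large; the linear scan here is kept only for readability.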