Acta Armamentarii, 2023, Vol. 44, Issue 6: 1537-1546. doi: 10.12382/bgxb.2022.0177

Autonomous Decision-making and Intelligent Collaboration of UAV Swarms Based on Reinforcement Learning with Sparse Rewards

LI Chao (1, 2), WANG Ruixing (1, *), HUANG Jianzhong (1), JIANG Feilong (3), WEI Xuemei (1), SUN Yanxin (1)

* Corresponding author

1. Technology Center, Norinco Group Test and Measuring Academy, Xi’an 710116, Shaanxi, China
2. School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
3. School of Astronautics, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Received: 2022-03-21; Online: 2023-06-30

Abstract:

UAV swarms will profoundly shape the form of warfare. To improve the autonomous decision-making capability of UAV swarms, this paper studies autonomous decision-making methods for attack-defense confrontation between heterogeneous UAV swarms. It first gives an overview of the design of the UAV swarm confrontation model and then models the attack-defense confrontation scenario. To address the sparse-reward problem that commonly arises when reinforcement learning is applied to autonomous decision-making for UAV swarms, a reward-setting method based on local reward reshaping is proposed. Prioritized experience replay is then layered on top of this reward mechanism, which effectively alleviates the sparse-reward problem. Program simulations and a demonstration system verify the advantages of the method. The method is expected to accelerate network convergence in reinforcement-learning-based autonomous decision-making algorithms for UAV swarms, which is significant for research on such algorithms.

Key words: multi-agent, UAV intelligence, game confrontation, reinforcement learning, sparse reward
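The abstract combines two ingredients, a reshaped (denser) reward and prioritized experience replay, without stating their formulas. The Python sketch below illustrates how the two can be stacked, under stated assumptions; it is not the authors' implementation. Because the "local reward reshaping" rule is not specified here, standard potential-based reward shaping (Ng et al., 1999) is swapped in, using a hypothetical potential `potential()` equal to the negative distance between an agent and its target. The buffer is the proportional variant of prioritized experience replay (Schaul et al., 2016). All names (`potential`, `shaped_reward`, `Transition`, `PrioritizedReplay`) and default parameters (`gamma`, `alpha`, `beta`, `eps`) are illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Position = Tuple[float, float]


def potential(agent: Position, target: Position) -> float:
    """Hypothetical potential phi(s): higher (less negative) when the agent is closer to its target."""
    return -math.dist(agent, target)


def shaped_reward(reward: float, agent: Position, next_agent: Position,
                  target: Position, gamma: float = 0.99) -> float:
    """Potential-based shaping (assumed stand-in): r + gamma * phi(s') - phi(s).

    Adds a dense per-step term to the sparse environment reward; shaping of this
    form leaves the optimal policy unchanged (Ng et al., 1999).
    """
    return reward + gamma * potential(next_agent, target) - potential(agent, target)


@dataclass
class Transition:
    state: Position
    action: int
    reward: float  # assumed to be the shaped reward
    next_state: Position
    done: bool


class PrioritizedReplay:
    """Proportional prioritized experience replay over a fixed-size ring buffer.

    Sampling is O(N) for readability; large buffers usually use a sum-tree.
    """

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-5):
        self.capacity = capacity
        self.alpha = alpha  # 0 = uniform sampling, 1 = fully priority-proportional
        self.eps = eps      # keeps zero-error transitions sampleable
        self.data: List[Transition] = []
        self.priorities: List[float] = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, t: Transition) -> None:
        # New transitions get the current maximum priority so they are likely to be replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(t)
            self.priorities.append(p)
        else:
            self.data[self.pos] = t
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]  # P(i) = p_i^alpha / sum_k p_k^alpha
        n = len(self.data)
        idxs = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights (N * P(i))^-beta correct the sampling bias;
        # dividing by the batch maximum keeps every weight in (0, 1].
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        return idxs, [self.data[i] for i in idxs], [w / max_w for w in weights]

    def update(self, idxs: Sequence[int], td_errors: Sequence[float]) -> None:
        # Priority is the magnitude of the latest TD error plus a small constant.
        for i, delta in zip(idxs, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

In a training loop, each environment reward would pass through `shaped_reward` before `add`; after each learning step, the TD errors of the sampled batch would be fed back through `update`, and the returned weights would scale each sample's loss. Moving toward the target yields a positive shaping term and moving away a negative one, so the agent receives a learning signal long before the sparse terminal reward arrives.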