Acta Armamentarii ›› 2023, Vol. 44 ›› Issue (S2): 126-134. doi: 10.12382/bgxb.2023.0877

Special topic: Swarm Collaboration and Autonomous Technology

Simulation of Reinforcement Learning-based UAV Swarm Adversarial Strategy Deduction

CAO Zijian (曹子建)1, SUN Zelong (孙泽龙)2, YAN Guochuang (闫国闯)3, FU Yanfang (傅妍芳)1,*, YANG Bo (杨博)1, LI Qinjie (李秦洁)1, LEI Kailin (雷凯麟)1, GAO Linghang (高领航)1

  1 School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, Shaanxi, China
  2 School of Armament Science and Technology, Xi’an Technological University, Xi’an 710021, Shaanxi, China
  3 Test and Measuring Academy of Norinco Group, Huayin 714200, Shaanxi, China
  • Received: 2023-09-05  Online: 2024-01-10

Abstract:

Drone swarms are increasingly used in military, public safety, and commercial applications, but developing efficient strategies in complex and changing adversarial environments remains a challenge. To enable drone swarms to learn autonomously, adapt to changes in the adversarial environment, and improve the efficiency and success rate of task execution, a multi-agent reinforcement learning algorithm framework based on value decomposition is proposed. Swarm behavior in different adversarial scenarios is simulated on a simulation platform, and reinforcement learning is used to train the swarm to make decisions in different situations so as to optimize task objectives. The application of different reinforcement learning algorithms to drone swarm adversarial strategies is discussed and their performance is compared. The experimental results show that the proposed algorithm performs well in a variety of swarm confrontation environments, indicating that it can provide strong support for military drone swarm confrontation.

Key words: drone swarm, adversarial strategy, deep reinforcement learning, value decomposition

CLC Number: