
Acta Armamentarii ›› 2025, Vol. 46 ›› Issue (5): 241146-. doi: 10.12382/bgxb.2024.1146



Path Planning Method for Large-scale UAV Swarms Based on Reinforcement Learning Conflict Resolution

ZHOU Zhenlin1,2, LONG Teng1,2,3,4, LIU Dawei5, SUN Jingliang1,2,3,*, ZHONG Jianxin1,2, LI Junzhi1,2

  1 School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China
    2 Key Laboratory of Dynamics and Control of Flight Vehicle, Ministry of Education, Beijing 100081, China
    3 Beijing Institute of Technology Chongqing Innovation Center, Chongqing 401121, China
    4 National Key Laboratory of Land and Air Based Information Perception and Control, Beijing 100081, China
    5 Research and Development Academy of Machinery Equipment, Beijing 100089, China
  • Received: 2024-12-24 Online: 2025-05-07
  • Supported by: National Science Fund for Distinguished Young Scholars of China (52425211); Young Faculty Academic Start-up Program of Beijing Institute of Technology (XSQD-202201005)


Abstract:

In large-scale unmanned aerial vehicle (UAV) swarm cooperative flight scenarios, frequent path conflicts make swarm path planning computationally expensive. To address this problem, a large-scale UAV swarm path planning method based on reinforcement learning conflict resolution is developed. A dual-layer planning architecture, comprising a high-level conflict resolution layer and a low-level path planning layer, is constructed to reduce the spatial and temporal dimensions of path conflicts. At the high-level conflict resolution layer, a conflict resolution policy network is designed and trained within the Rainbow deep Q-networks (DQN) framework. The network transforms the resolution of each path conflict into the selection of the left or right child node of a binary tree, mapping different conflict resolution orders to their outcomes, thereby reducing the number of tree nodes traversed and improving conflict resolution efficiency. At the low-level path planning layer, the time dimension is incorporated into the spatial collision avoidance strategy, and a re-planning jump point search (ReJPS) method based on a node re-expansion mechanism is proposed, which enlarges the feasible planning domain and enhances the ability to resolve path conflicts. Simulation results indicate that, compared with the conflict-based search (CBS)+A* and CBS+ReJPS path planning methods, the proposed method reduces the average planning time by 86.64% and 19.65%, respectively, while maintaining comparable optimality.
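To make the dual-layer idea concrete, the following Python sketch imitates the high-level conflict-resolution loop under simplified assumptions: each detected path conflict spawns a binary choice (constrain one UAV or the other), and a random stub stands in for the Rainbow DQN policy that would select the left or right branch. The low-level planner is only a 1-D placeholder for ReJPS, only vertex conflicts are checked, and every class, function, and parameter name is illustrative rather than taken from the paper.

# Minimal sketch of the dual-layer planning loop described above (not the
# authors' implementation). A stub policy replaces the Rainbow DQN network,
# and a 1-D greedy planner replaces ReJPS; only vertex conflicts are checked.
import itertools
import random


def low_level_plan(start, goal, constraints, horizon=50):
    """Placeholder low-level planner: move one cell per step along a 1-D
    corridor toward the goal, waiting whenever the next (cell, t) pair is
    forbidden by the high-level constraints."""
    path, cell, t = [(start, 0)], start, 0
    while cell != goal and t < horizon:
        step = 1 if goal > cell else -1
        nxt = cell + step
        if (nxt, t + 1) in constraints:
            nxt = cell  # wait in place instead of entering the forbidden cell
        cell, t = nxt, t + 1
        path.append((cell, t))
    return path


def position(path, t):
    """Cell occupied at time t (a UAV stays at its goal after arriving)."""
    return path[min(t, len(path) - 1)][0]


def first_conflict(paths):
    """Return (uav_i, uav_j, cell, t) for the first vertex conflict, or None."""
    horizon = max(len(p) for p in paths.values())
    for i, j in itertools.combinations(paths, 2):
        for t in range(horizon):
            if position(paths[i], t) == position(paths[j], t):
                return i, j, position(paths[i], t), t
    return None


def choose_branch(conflict):
    """Stub for the learned conflict-resolution policy: decide whether to
    expand the left child (constrain the first UAV) or the right child
    (constrain the second UAV). A trained network would score both options."""
    return random.choice(("left", "right"))


def resolve(tasks, max_iters=100):
    """High-level loop: plan, detect a conflict, let the policy pick one
    branch of the binary constraint tree, add the constraint, and re-plan
    only the constrained UAV. Returns conflict-free paths or None."""
    constraints = {u: set() for u in tasks}
    paths = {u: low_level_plan(s, g, constraints[u]) for u, (s, g) in tasks.items()}
    for _ in range(max_iters):
        conflict = first_conflict(paths)
        if conflict is None:
            return paths
        i, j, cell, t = conflict
        victim = i if choose_branch(conflict) == "left" else j
        constraints[victim].add((cell, t))
        start, goal = tasks[victim]
        paths[victim] = low_level_plan(start, goal, constraints[victim])
    return None


if __name__ == "__main__":
    # Three UAVs sharing a 1-D corridor; uav0 and uav1 cross head-on.
    solution = resolve({"uav0": (0, 4), "uav1": (4, 0), "uav2": (2, 5)})
    if solution is not None:
        for uav, path in sorted(solution.items()):
            print(uav, path)

In the paper, the branch decision is produced by the trained Rainbow DQN policy and the low-level search is ReJPS over a spatio-temporal grid; the sketch only preserves the control flow in which each conflict maps to a left/right expansion of the constraint tree.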

Key words: unmanned aerial vehicle swarm, path planning, deep reinforcement learning, conflict-based search, conflict resolution
