
Acta Armamentarii ›› 2025, Vol. 46 ›› Issue (5): 241146-. doi: 10.12382/bgxb.2024.1146



Path Planning Method for Large-scale UAV Swarms Based on Reinforcement Learning Conflict Resolution

ZHOU Zhenlin1,2, LONG Teng1,2,3,4, LIU Dawei5, SUN Jingliang1,2,3,*, ZHONG Jianxin1,2, LI Junzhi1,2

  1 School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China
    2 Key Laboratory of Dynamics and Control of Flight Vehicle, Ministry of Education, Beijing 100081, China
    3 Beijing Institute of Technology Chongqing Innovation Center, Chongqing 401121, China
    4 National Key Laboratory of Land and Air Based Information Perception and Control, Beijing 100081, China
    5 Research and Development Academy of Machinery Equipment, Beijing 100089, China
  • Received: 2024-12-24 Online: 2025-05-07
  • Supported by: National Science Fund for Distinguished Young Scholars of China (52425211); Young Faculty Academic Start-up Program of Beijing Institute of Technology (XSQD-202201005)


Abstract:

In large-scale unmanned aerial vehicle (UAV) swarm cooperative flight scenarios, frequent path conflicts make swarm path planning computationally expensive. To address this problem, a large-scale UAV swarm path planning method based on reinforcement learning conflict resolution is developed. A dual-layer planning architecture, comprising a high-level conflict resolution layer and a low-level path planning layer, is constructed to reduce the spatial and temporal dimensions of path conflicts. At the high-level conflict resolution layer, a conflict resolution policy network is designed and trained within the Rainbow deep Q-networks (DQN) framework. The network transforms the resolution of each path conflict into the selection of the left or right child node of a binary tree, mapping different conflict resolution orders to their outcomes, thereby reducing the number of tree nodes traversed and improving conflict resolution efficiency. At the low-level path planning layer, the time dimension is incorporated into the spatial collision avoidance strategy, and a re-planning jump point search (ReJPS) method based on a node re-expansion mechanism is proposed, which enlarges the feasible planning domain and enhances the ability to resolve path conflicts. Simulation results indicate that, compared with the conflict-based search (CBS)+A* and CBS+ReJPS path planning methods, the proposed method reduces the average planning time by 86.64% and 19.65%, respectively, while maintaining comparable optimality.
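To make the dual-layer idea concrete, the following Python sketch imitates the high-level conflict-resolution loop under simplified assumptions: each detected path conflict spawns a binary choice (constrain one UAV or the other), and a random stub stands in for the Rainbow DQN policy that would select the left or right branch. The low-level planner is only a 1-D placeholder for ReJPS, only vertex conflicts are checked, and every class, function, and parameter name is illustrative rather than taken from the paper.

# Minimal sketch of the dual-layer planning loop described above (not the
# authors' implementation). A stub policy replaces the Rainbow DQN network,
# and a 1-D greedy planner replaces ReJPS; only vertex conflicts are checked.
import itertools
import random


def low_level_plan(start, goal, constraints, horizon=50):
    """Placeholder low-level planner: move one cell per step along a 1-D
    corridor toward the goal, waiting whenever the next (cell, t) pair is
    forbidden by the high-level constraints."""
    path, cell, t = [(start, 0)], start, 0
    while cell != goal and t < horizon:
        step = 1 if goal > cell else -1
        nxt = cell + step
        if (nxt, t + 1) in constraints:
            nxt = cell  # wait in place instead of entering the forbidden cell
        cell, t = nxt, t + 1
        path.append((cell, t))
    return path


def position(path, t):
    """Cell occupied at time t (a UAV stays at its goal after arriving)."""
    return path[min(t, len(path) - 1)][0]


def first_conflict(paths):
    """Return (uav_i, uav_j, cell, t) for the first vertex conflict, or None."""
    horizon = max(len(p) for p in paths.values())
    for i, j in itertools.combinations(paths, 2):
        for t in range(horizon):
            if position(paths[i], t) == position(paths[j], t):
                return i, j, position(paths[i], t), t
    return None


def choose_branch(conflict):
    """Stub for the learned conflict-resolution policy: decide whether to
    expand the left child (constrain the first UAV) or the right child
    (constrain the second UAV). A trained network would score both options."""
    return random.choice(("left", "right"))


def resolve(tasks, max_iters=100):
    """High-level loop: plan, detect a conflict, let the policy pick one
    branch of the binary constraint tree, add the constraint, and re-plan
    only the constrained UAV. Returns conflict-free paths or None."""
    constraints = {u: set() for u in tasks}
    paths = {u: low_level_plan(s, g, constraints[u]) for u, (s, g) in tasks.items()}
    for _ in range(max_iters):
        conflict = first_conflict(paths)
        if conflict is None:
            return paths
        i, j, cell, t = conflict
        victim = i if choose_branch(conflict) == "left" else j
        constraints[victim].add((cell, t))
        start, goal = tasks[victim]
        paths[victim] = low_level_plan(start, goal, constraints[victim])
    return None


if __name__ == "__main__":
    # Three UAVs sharing a 1-D corridor; uav0 and uav1 cross head-on.
    solution = resolve({"uav0": (0, 4), "uav1": (4, 0), "uav2": (2, 5)})
    if solution is not None:
        for uav, path in sorted(solution.items()):
            print(uav, path)

In the paper, the branch decision is produced by the trained Rainbow DQN policy and the low-level search is ReJPS over a spatio-temporal grid; the sketch only preserves the control flow in which each conflict maps to a left/right expansion of the constraint tree.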

Key words: unmanned aerial vehicle swarm, path planning, deep reinforcement learning, conflict-based search, conflict resolution
