Q-Learning-based Multi-UAV Cooperative Path Planning Method

doi:10.12382/bgxb.2021.0606

Abstract

To solve the path planning problem in which multiple UAVs must arrive at the target simultaneously, a battlefield environment model and a Markov decision process (MDP) model of path planning for a single UAV are established, and the optimal path is calculated with the Q-learning algorithm. The resulting Q-table is used to calculate the shortest path of each UAV and the cooperative range. Time-coordinated paths are then obtained by adjusting the action selection strategy of the UAVs that must fly detours. To handle collision avoidance among multiple UAVs, a partial replanning area is determined by designing retreat parameters, and, based on deep reinforcement learning theory, a neural network replaces the Q-table to replan the partial paths of the UAVs, which avoids the problem of dimensional explosion. For previously unexplored obstacles, an obstacle matrix is designed following the idea of the artificial potential field method and superimposed on the original Q-table, so that the UAVs avoid the unexplored obstacles. Simulation results verify that the proposed reinforcement learning path planning method yields cooperative paths that satisfy both time coordination and collision avoidance, and that it also avoids the previously unexplored obstacles in the simulation. Compared with the A* algorithm, the proposed method is more efficient for online applications.
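To make the single-UAV step concrete, here is a minimal tabular Q-learning sketch on a toy grid, followed by reading a path off the learned Q-table. Everything specific in it is an illustrative assumption rather than the paper's setup: the 5 × 5 map, the reward (−1 per step, +10 on reaching the target), the learning rate, discount factor and exploration rate, and the helper names `train_q_table` and `extract_path`. Only the four-action space (up, down, left, right) mirrors the columns of the Q-tables in Tables 4 and 5.

```python
import random

# Four actions, matching the up/down/left/right columns of the Q-table.
# Assumption: "up" increases y.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def step(grid, state, action):
    """Move one cell; leaving the grid or entering an obstacle keeps the UAV in place."""
    dx, dy = ACTIONS[action]
    x, y = state[0] + dx, state[1] + dy
    if 0 <= y < len(grid) and 0 <= x < len(grid[0]) and grid[y][x] == 0:
        return (x, y)
    return state


def greedy(q, state):
    """Action with the largest Q-value in this state (ties go to the first action)."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train_q_table(grid, goal, episodes=3000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and random start cells."""
    rng = random.Random(seed)
    free = [(x, y) for y, row in enumerate(grid) for x, cell in enumerate(row) if cell == 0]
    q = {(s, a): 0.0 for s in free for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice(free)
        for _ in range(4 * len(free)):  # cap the episode length
            if s == goal:
                break
            a = rng.choice(list(ACTIONS)) if rng.random() < epsilon else greedy(q, s)
            s_next = step(grid, s, a)
            reward = 10.0 if s_next == goal else -1.0  # hypothetical reward design
            future = 0.0 if s_next == goal else max(q[(s_next, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (reward + gamma * future - q[(s, a)])
            s = s_next
    return q


def extract_path(q, grid, start, goal, max_steps=200):
    """Follow the greedy policy stored in the Q-table from start to goal."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        path.append(step(grid, path[-1], greedy(q, path[-1])))
    return path


# Toy map (hypothetical): 0 = free cell, 1 = threat/obstacle cell, indexed grid[y][x].
demo_grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
q_table = train_q_table(demo_grid, goal=(2, 2))
print(extract_path(q_table, demo_grid, start=(0, 0), goal=(2, 2)))
```

Once the table is trained, extracting a path is just a chain of argmax look-ups, which is one plausible reason a trained Q-table can be cheaper to query online than rerunning a search.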

Key words: multiple UAVs, path planning, Q-learning, time coordination, collision avoidance

CLC Number: V249.12

YIN Yiyi, WANG Xiaofang, ZHOU Jian. Q-Learning-based Multi-UAV Cooperative Path Planning Method[J]. Acta Armamentarii, 2023, 44(2): 484-495.

Figures/Tables 23

Fig.1 Diagram of battlefield model

Fig.2 Diagram of action space

Fig.3 Partial path replanning for multiple UAVs

Fig.4 Neural network model of partial path replanning
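The "dimensional explosion" that motivates replacing the Q-table with a neural network is easy to quantify. Assuming, for illustration only, two UAVs replanned jointly on a 20 × 20 grid, the joint state space has 400² = 160,000 states and the joint action space 4² = 16 actions, so a joint Q-table would need 2,560,000 entries. A function approximator instead stores a fixed number of weights. The sketch below is deliberately minimal and is not the network of Fig. 4: it swaps in a single linear layer, Q(s, a) = w_a · φ(s), trained with the semi-gradient Q-learning update. The class name `LinearQ` and the feature vector are hypothetical; a full deep Q-network would add hidden layers, experience replay and a target network.

```python
class LinearQ:
    """Q(s, a) = w_a . phi(s): a single linear layer standing in for a Q-network."""

    def __init__(self, actions, n_features, alpha=0.05, gamma=0.9):
        self.actions = tuple(actions)
        self.w = {a: [0.0] * n_features for a in self.actions}
        self.alpha, self.gamma = alpha, gamma

    def value(self, phi, a):
        """Estimated Q-value of action a for the state with feature vector phi."""
        return sum(w * x for w, x in zip(self.w[a], phi))

    def best_action(self, phi):
        return max(self.actions, key=lambda a: self.value(phi, a))

    def update(self, phi, a, reward, phi_next, done):
        """Semi-gradient Q-learning step on the weights of action a; returns the TD error."""
        target = reward
        if not done:
            target += self.gamma * max(self.value(phi_next, b) for b in self.actions)
        td_error = target - self.value(phi, a)
        self.w[a] = [w + self.alpha * td_error * x for w, x in zip(self.w[a], phi)]
        return td_error


# Hypothetical features, e.g. normalised offsets to the goal and to the other UAV.
net = LinearQ(actions=("up", "down", "left", "right"), n_features=4)
phi = [0.5, -0.2, 0.1, 0.3]
print(net.update(phi, "right", reward=1.0, phi_next=phi, done=True))  # TD error
print(net.value(phi, "right"))
```

For joint replanning the action set would be the 16 joint actions; the same class accepts any action tuple.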

Fig.5 Unexplored obstacle area

Fig.6 Battlefield model with unexplored obstacle

Fig.7 A simulation example of battlefield model

Fig.8 Part of the Q-value variation curves

Table 1 Parameters of UAVs

UAV No.	Initial x	Initial y	Expected X	Expected Y
1	2	1	15	12
2	1	18	15	12
3	16	1	15	12
4	19	20	15	12
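The time-coordination idea can be illustrated with the positions in Table 1. This sketch assumes an obstacle-free grid, four-connected moves and one cell per time step for every UAV, so each shortest path length is the Manhattan distance to the target (15, 12); in the actual battlefield model those lengths come from the Q-table. It also assumes, as a plausible rule rather than the paper's stated one, that the UAV with the longest shortest path sets the common arrival step and the others absorb the difference by detouring. The function name `coordination_plan` is hypothetical.

```python
# Table 1 data: initial positions and the common expected (target) position.
TARGET = (15, 12)
STARTS = {1: (2, 1), 2: (1, 18), 3: (16, 1), 4: (19, 20)}


def manhattan(p, q):
    """Shortest 4-connected path length when no obstacles are in the way."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def coordination_plan(starts, target):
    """Common arrival step and the extra steps each UAV must absorb by detouring.

    Every 4-connected path between two cells has the same parity as their
    Manhattan distance, so an odd number of extra steps cannot be realised by
    detouring alone (it would need a wait/hover action).
    """
    lengths = {uav: manhattan(pos, target) for uav, pos in starts.items()}
    horizon = max(lengths.values())
    plan = {
        uav: {"shortest": n, "extra": horizon - n, "feasible": (horizon - n) % 2 == 0}
        for uav, n in lengths.items()
    }
    return horizon, plan


horizon, plan = coordination_plan(STARTS, TARGET)
print("common arrival step:", horizon)
for uav, info in plan.items():
    print(uav, info)
```

Under these assumptions UAV 1 needs 24 steps, so UAVs 2, 3 and 4 would have to lengthen their paths by 4, 12 and 12 steps; all three differences are even, so pure detours suffice.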

Fig.9 Time-coordinated path 1

Fig.10 Time-coordinated path 2

Fig.11 Time-coordinated paths based on A* algorithm

Table 2 Performance comparison between the Q-learning and A* algorithms

Run	Q-learning time/s	A* time/s
1	0.131	0.278
2	0.135	0.196
3	0.122	0.209
4	0.158	0.194
5	0.128	0.202
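A quick summary of the five timing runs in Table 2 puts the efficiency claim in numbers: the Q-learning times average about 0.135 s and the A* times about 0.216 s, so A* takes roughly 1.6 times as long in these runs. Five runs is a small sample, and the table does not state whether the Q-learning times include training, so the ratio is indicative only.

```python
from statistics import mean, stdev

# Planning times in seconds, copied from Table 2 (five runs).
q_learning = [0.131, 0.135, 0.122, 0.158, 0.128]
a_star = [0.278, 0.196, 0.209, 0.194, 0.202]

q_mean, a_mean = mean(q_learning), mean(a_star)
print(f"Q-learning: mean {q_mean:.4f} s, sd {stdev(q_learning):.4f} s")
print(f"A*:         mean {a_mean:.4f} s, sd {stdev(a_star):.4f} s")
print(f"A* / Q-learning time ratio: {a_mean / q_mean:.2f}")
```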

Fig.12 Partial path replanning area

Table 3 Starting and expected positions of UAVs 3 and 4 in the partial area

UAV No.	Initial x	Initial y	Expected X	Expected Y
1	1	1	6	1
2	1	3	5	2
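The partial replanning area can be sketched as a window around the first conflict between two time-indexed paths. Here the "retreat parameter" is interpreted, as an assumption, as the number of steps to back off before (and look ahead after) the conflict; the bounding box of both UAVs' cells in that window becomes the local area to replan. The function names, the swap-conflict rule and the option to ignore a shared target cell are all hypothetical choices, not the paper's definition.

```python
def position(path, t):
    """Cell occupied at time step t; a UAV that has arrived stays at its last cell."""
    return path[min(t, len(path) - 1)]


def first_conflict(path_a, path_b, ignore=()):
    """First time step at which two UAVs share a cell or swap cells, else None."""
    for t in range(max(len(path_a), len(path_b))):
        a, b = position(path_a, t), position(path_b, t)
        if a == b and a not in ignore:
            return t
        if t > 0 and a != b and a == position(path_b, t - 1) and b == position(path_a, t - 1):
            return t  # head-on swap between consecutive steps
    return None


def replanning_area(path_a, path_b, retreat, ignore=()):
    """Bounding box of both paths from `retreat` steps before to `retreat` steps after the conflict."""
    t = first_conflict(path_a, path_b, ignore)
    if t is None:
        return None
    window = range(max(0, t - retreat), t + retreat + 1)
    cells = [position(p, s) for p in (path_a, path_b) for s in window]
    xs, ys = [c[0] for c in cells], [c[1] for c in cells]
    return t, (min(xs), min(ys)), (max(xs), max(ys))


# Two hypothetical paths that meet at cell (2, 0) at time step 2.
path_1 = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
path_2 = [(2, 2), (2, 1), (2, 0), (1, 0)]
print(replanning_area(path_1, path_2, retreat=1))
```

Passing the shared target cell in `ignore` keeps the simultaneous arrival at the target from being flagged as a collision.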

Fig.13 Average reward curve

Fig.14 Diagram of partial path replanning

Fig.15 Diagram of cooperative path

Fig.16 Battlefield model with unexplored obstacle for simulation

Table 4 Partial Q-table before superposition

Q-table	Up	Down	Left	Right
state₁	0	1.3118	0	1.3118
…	…	…	…	…
state₄₈	6.2560	9.7735	6.2550	9.7734
…	…	…	…	…
state₆₉	9.7734	15.270	9.7734	15.270
state₇₀	0	19.088	12.216	19.088

Table 5 Partial Q-table after superposition

Q-table	Up	Down	Left	Right
state₁	0	1.3118	0	1.3118
…	…	…	…	…
state₄₈	6.2550	9.1334	5.6150	9.3638
…	…	…	…	…
state₆₉	9.3638	14.631	9.1334	15.271
state₇₀	0	19.088	12.216	19.088
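Tables 4 and 5 show the effect of superimposing the obstacle matrix: the down-action entry of state₄₈ falls from 9.7735 to 9.1334 (about 0.64), while state₁ and state₇₀ are unchanged, so the penalty is local to the unexplored obstacle. The sketch below reproduces that mechanism with a textbook artificial-potential-field repulsive term, ½η(1/d − 1/d₀)² for d ≤ d₀, where d is the distance from the cell an action leads to to the nearest obstacle. The penalty form, η, d₀ and the large value used for actions entering the obstacle are assumptions; the paper's exact obstacle matrix is not given here.

```python
import math

# Same four actions as the Q-table columns; "up" is assumed to increase y.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def repulsive_penalty(d, eta=1.0, d0=2.0, blocked=1e6):
    """APF-style repulsive term: huge inside the obstacle, zero beyond radius d0."""
    if d == 0:
        return blocked
    if d <= d0:
        return 0.5 * eta * (1.0 / d - 1.0 / d0) ** 2
    return 0.0


def obstacle_matrix(states, obstacles, **kwargs):
    """Penalty for each (state, action), based on where the action leads."""
    matrix = {}
    for s in states:
        for a, (dx, dy) in ACTIONS.items():
            nxt = (s[0] + dx, s[1] + dy)
            d = min(math.dist(nxt, o) for o in obstacles)
            matrix[(s, a)] = repulsive_penalty(d, **kwargs)
    return matrix


def superimpose(q_table, matrix):
    """New Q-table = original Q-table minus the obstacle penalties."""
    return {key: q - matrix.get(key, 0.0) for key, q in q_table.items()}


# One cell of a trained Q-table, with an unexplored obstacle detected at (1, 0).
q = {((0, 0), a): 1.0 for a in ACTIONS}
q_new = superimpose(q, obstacle_matrix([(0, 0)], [(1, 0)]))
best = max(ACTIONS, key=lambda a: q_new[((0, 0), a)])
print(best, {a: round(q_new[((0, 0), a)], 4) for a in ACTIONS})
```

Because only entries within d₀ of the obstacle change, the rest of the trained table is reused as is, which fits the unchanged rows in Tables 4 and 5.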

Fig.17 Cooperative path 1 with unexplored obstacle

Fig.18 Cooperative path 2 with unexplored obstacle
