Q-Learning-based Multi-UAV Cooperative Path Planning Method

doi:10.12382/bgxb.2021.0606

Abstract

This paper addresses path planning for multiple unmanned aerial vehicles (UAVs) that must arrive at a target simultaneously. A battlefield environment model and a Markov decision process model for single-UAV path planning are established, and the Q-learning algorithm is used to solve for the optimal path with the shortest range. The Q-table (experience matrix) obtained by Q-learning is then used to quickly compute the shortest path of each UAV and the cooperative range. Time-coordinated paths for all UAVs are obtained by adjusting the action selection strategy of the UAVs that have to detour. To avoid collisions among UAVs, a retreat parameter is designed to determine the partial replanning area; based on deep Q-learning theory, a neural network replaces the Q-table to replan the partial paths of the UAVs involved, which avoids the dimension explosion problem. For obstacles that were not previously explored, an obstacle Q matrix is designed following the idea of the artificial potential field method and superimposed on the original Q-table, so that the UAVs avoid these obstacles. Simulation results show that the proposed Q-learning-based cooperative path planning method yields paths that are time-coordinated and collision-free, and that it avoids obstacles not explored during environment modeling. Compared with the A* algorithm, the proposed method achieves higher solution efficiency in online applications.

Key words: multiple UAVs, path planning, Q-learning, time coordination, collision avoidance
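
The single-UAV step described in the abstract (learn a Q-table on a grid, then read the shortest path off it greedily) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 5x5 grid, the reward values, the learning rate, the discount factor, and the function names `step`, `train`, and `greedy_path` are all assumptions made for the example.

```python
import random

# Hypothetical 5x5 battlefield grid (1 = obstacle); not the paper's map.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
# Four moves as (d_row, d_col): up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, action, goal):
    """Apply one move; a blocked move leaves the UAV where it is."""
    r, c = state[0] + action[0], state[1] + action[1]
    inside = 0 <= r < len(GRID) and 0 <= c < len(GRID[0])
    if not inside or GRID[r][c] == 1:
        return state, -5.0      # assumed penalty for a wall or obstacle
    if (r, c) == goal:
        return (r, c), 20.0     # assumed terminal reward
    return (r, c), -1.0         # assumed step cost, favours short paths


def train(goal, episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(GRID))
             for c in range(len(GRID[0])) if GRID[r][c] == 0]
    q = {s: [0.0] * len(ACTIONS) for s in cells}
    for _episode in range(episodes):
        s = rng.choice(cells)
        for _t in range(100):
            if s == goal:
                break
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            s2, reward = step(s, ACTIONS[a], goal)
            # Q-learning target: reward plus discounted best next value.
            target = reward if s2 == goal else reward + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


def greedy_path(q, start, goal, max_len=50):
    """Follow the highest-valued action from start until the goal."""
    path, s = [start], start
    while s != goal and len(path) < max_len:
        a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        s, _reward = step(s, ACTIONS[a], goal)
        path.append(s)
    return path


if __name__ == "__main__":
    q_table = train(goal=(4, 4))
    print(greedy_path(q_table, start=(0, 0), goal=(4, 4)))
```

Once the table is trained for a given target, a path from any start cell is obtained by table lookups alone, which is the property the abstract relies on for fast online solving.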

CLC number: V249.12

Cite as: YIN Yiyi (尹依伊), WANG Xiaofang (王晓芳), ZHOU Jian (周健). Q-Learning-based Multi-UAV Cooperative Path Planning Method[J]. Acta Armamentarii (兵工学报), 2023, 44(2): 484-495.

Figures/Tables (23)

Fig.1 Diagram of the battlefield model

Fig.2 Diagram of the action space

Fig.3 Partial path replanning for multiple UAVs

Fig.4 Neural network model for partial path replanning

Fig.5 Unexplored obstacle area

Fig.6 Battlefield model with the unexplored obstacle superimposed

Fig.7 Battlefield model of the simulation example

Fig.8 Variation curves of selected Q values

Table 1 Parameters of UAVs

UAV No.	Initial position x	Initial position y	Expected position X	Expected position Y
1	2	1	15	12
2	1	18	15	12
3	16	1	15	12
4	19	20	15	12

Fig.9 Time-coordinated path 1

Fig.10 Time-coordinated path 2

Fig.11 Time-coordinated paths based on the A* algorithm

Table 2 Performance comparison between the Q-learning and A* algorithms

Run	Q-learning time/s	A* time/s
1	0.131	0.278
2	0.135	0.196
3	0.122	0.209
4	0.158	0.194
5	0.128	0.202
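
As a quick check of the comparison in Table 2, the per-run timings can be averaged as in the sketch below. The numbers are copied from the table; the helper name `summarize` is hypothetical, and the averages are derived here rather than reported by the source.

```python
from statistics import mean

# Solution times in seconds, copied from Table 2 (five runs each).
q_learning_s = [0.131, 0.135, 0.122, 0.158, 0.128]
a_star_s = [0.278, 0.196, 0.209, 0.194, 0.202]


def summarize(q_times, a_times):
    """Return (mean Q-learning time, mean A* time, A*/Q-learning ratio)."""
    q_mean, a_mean = mean(q_times), mean(a_times)
    return q_mean, a_mean, a_mean / q_mean


q_mean, a_mean, ratio = summarize(q_learning_s, a_star_s)
print(f"Q-learning mean: {q_mean:.4f} s")
print(f"A* mean:         {a_mean:.4f} s")
print(f"A* / Q-learning: {ratio:.2f}")
```

On these five runs the Q-learning lookup is faster in every run, with an average of about 0.135 s against about 0.216 s for A*.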

Fig.12 Partial path replanning area

Table 3 Initial and expected positions of UAVs 3 and 4 in the partial replanning area

UAV No.	Initial position x	Initial position y	Expected position X	Expected position Y
1	1	1	6	1
2	1	3	5	2

Fig.13 Average reward curve

Fig.14 Diagram of partial path replanning

Fig.15 Diagram of cooperative paths

Fig.16 Simulation example of a battlefield with an unexplored obstacle

Table 4 Partial Q-table values before superposition

Q-table	Up	Down	Left	Right
state₁	0	1.3118	0	1.3118
…	…	…	…	…
state₄₈	6.2560	9.7735	6.2550	9.7734
…	…	…	…	…
state₆₉	9.7734	15.270	9.7734	15.270
state₇₀	0	19.088	12.216	19.088

Table 5 Partial Q-table values after superposition

Q-table	Up	Down	Left	Right
state₁	0	1.3118	0	1.3118
…	…	…	…	…
state₄₈	6.2550	9.1334	5.6150	9.3638
…	…	…	…	…
state₆₉	9.3638	14.631	9.1334	15.271
state₇₀	0	19.088	12.216	19.088
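
The effect of the superposition can be read off the two tables by subtracting the rows that changed. The sketch below does this for state₄₈ and state₆₉, using values copied from the tables. Treating the element-wise difference as the obstacle Q matrix entry is an assumption based on the abstract's description, and the names `implied_offset` and `greedy_action` are hypothetical. Differences of ±0.001 may be rounding or transcription noise in the tabulated values.

```python
# Column order used in the Q-table listings.
ACTIONS = ["up", "down", "left", "right"]

# Rows copied from the tables before and after superposition.
before = {
    "state48": [6.2560, 9.7735, 6.2550, 9.7734],
    "state69": [9.7734, 15.270, 9.7734, 15.270],
}
after = {
    "state48": [6.2550, 9.1334, 5.6150, 9.3638],
    "state69": [9.3638, 14.631, 9.1334, 15.271],
}


def implied_offset(state):
    """Element-wise (after - before), rounded to four decimals."""
    return [round(a - b, 4) for a, b in zip(after[state], before[state])]


def greedy_action(row):
    """Action with the largest Q value; the first one wins on ties."""
    return ACTIONS[max(range(len(row)), key=lambda i: row[i])]


for state in before:
    print(state, "offset:", implied_offset(state))
    print(state, "greedy before:", greedy_action(before[state]),
          "| greedy after:", greedy_action(after[state]))
```

For both states the offsets are mostly negative and the greedy action moves from "down" to "right", which is consistent with the superimposed term steering the UAV away from the newly detected obstacle.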

Fig.17 Cooperative path 1 with an unexplored obstacle

Fig.18 Cooperative path 2 with an unexplored obstacle
