
Acta Armamentarii ›› 2023, Vol. 44 ›› Issue (2): 484-495. doi: 10.12382/bgxb.2021.0606



Q-Learning-based Multi-UAV Cooperative Path Planning Method

YIN Yiyi1,2, WANG Xiaofang1,*, ZHOU Jian3

  1. School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China
  2. Beijing Institute of Electronic System Engineering, Beijing 100854, China
  3. Xi'an Modern Control Technology Research Institute, Xi'an 710065, Shaanxi, China
  • Received: 2021-09-05    Online: 2022-06-10


Abstract:

To solve the path planning problem in which multiple UAVs must arrive at the target simultaneously, the battlefield environment model and the Markov decision process model for single-UAV path planning are established, and the range-optimal path is computed with the Q-learning algorithm. The Q-table obtained by this algorithm is then reused to rapidly compute the shortest path of each UAV and the cooperative range, and time-coordinated paths are obtained by adjusting the action selection strategy of the circumventing UAVs. To handle collision avoidance among the UAVs, a local replanning area is determined by designing retreat parameters, and, based on deep Q-learning theory, a neural network replaces the Q-table to re-plan the local paths of the UAVs, which avoids the problem of dimensional explosion. For previously unexplored obstacles, an obstacle Q-matrix is designed following the idea of the artificial potential field method and superimposed on the original Q-table to realize obstacle avoidance. The simulation results show that the proposed Q-learning-based cooperative path planning method yields coordinated paths with both time coordination and collision avoidance, and that obstacles left unexplored during environment modeling can be avoided as well. Compared with the A* algorithm, the proposed method is more efficient for online applications.
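The single-UAV stage described above, learning a shortest path on a gridded battlefield with tabular Q-learning and then reading the path off the learned Q-table, can be illustrated with a minimal sketch. The 5×5 grid, obstacle positions, reward values, and hyperparameters below are illustrative assumptions, not the environment model or parameters used in the paper:

```python
import random

GRID = 5                                    # toy 5x5 battlefield grid (assumption)
OBSTACLES = {(1, 1), (2, 3), (3, 1)}        # known obstacle cells (assumption)
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # E, W, S, N moves

ALPHA, GAMMA, EPS, EPISODES = 0.1, 0.9, 0.2, 3000  # illustrative hyperparameters

# Q-table: one row of action values per grid cell
Q = {(r, c): [0.0] * len(ACTIONS) for r in range(GRID) for c in range(GRID)}

def step(state, a):
    """Apply action a; return (next_state, reward, done)."""
    r, c = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if not (0 <= r < GRID and 0 <= c < GRID) or (r, c) in OBSTACLES:
        return state, -10.0, False           # blocked move: stay put, penalty
    if (r, c) == GOAL:
        return (r, c), 100.0, True           # reached the target
    return (r, c), -1.0, False               # per-step cost favors short paths

random.seed(0)
for _ in range(EPISODES):
    s, done = START, False
    while not done:
        # epsilon-greedy action selection
        a = random.randrange(len(ACTIONS)) if random.random() < EPS \
            else max(range(len(ACTIONS)), key=lambda i: Q[s][i])
        s2, reward, done = step(s, a)
        # standard Q-learning update
        Q[s][a] += ALPHA * (reward + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

# Greedy rollout of the learned Q-table recovers the planned path
path, s = [START], START
while s != GOAL and len(path) < GRID * GRID:
    s, _, _ = step(s, max(range(len(ACTIONS)), key=lambda i: Q[s][i]))
    path.append(s)
print(path)
```

Once trained, the Q-table plays the role of the experience matrix: replanning from a different start cell is just another greedy rollout, with no retraining, which is the property the paper exploits for fast cooperative-range computation.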

Key words: multiple UAVs, path planning, Q-learning, time coordination, collision avoidance
