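1. School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China
2. Beijing Institute of Electronic System Engineering, Beijing 100854, China
3. Xi'an Modern Control Technology Research Institute, Xi'an 710065, Shaanxi, China
*Email: wangxf@bit.edu.cn
Received: 2021-09-05
Published online: 2023-03-10
Published in print: 2023-02-28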
Yiyi YIN, Xiaofang WANG, Jian ZHOU. Q-Learning-based Multi-UAV Cooperative Path Planning Method[J]. Acta Armamentarii, 2023, 44(2): 484-495. DOI: 10.12382/bgxb.2021.0606.
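To address path planning for multiple UAVs that must reach a target simultaneously, a battlefield environment model and a Markov decision model for single-UAV path planning are established, and the optimal (shortest-range) path is solved with the Q-learning algorithm. The experience matrix obtained from Q-learning is then used to quickly compute each UAV's shortest path and the cooperative range, and a set of time-coordinated paths is obtained by adjusting the action selection strategy of the detouring UAVs. To handle collision avoidance among the UAVs, retreat parameters are designed to determine a local replanning region; based on deep Q-learning theory, a neural network replaces the Q-table to replan the local multi-UAV paths, which avoids the dimension explosion problem. For obstacles not detected beforehand, an obstacle Q matrix is designed following the idea of the artificial potential field method and superimposed on the original Q matrix so that the UAVs avoid them. Simulation results show that the proposed Q-learning-based multi-UAV cooperative path planning algorithm yields cooperative paths that are time-coordinated and collision-free, and that it avoids obstacles that were not detected during environment modeling; compared with the A* algorithm, the new algorithm has higher solution efficiency for online applications.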
To solve the path planning problem of multiple UAVs arriving at a target simultaneously, a battlefield environment model and a Markov decision process model of single-UAV path planning are established, and the optimal path is calculated with the Q-learning algorithm. The Q-table obtained by this algorithm is used to calculate the shortest path of each UAV and the cooperative range. The time-coordinated paths are then obtained by adjusting the action selection strategy of the circumventing UAVs. To address collision avoidance among the UAVs, the partial replanning area is determined by designing retreat parameters and, based on deep reinforcement learning theory, a neural network replaces the Q-table to replan the partial paths of the UAVs, which avoids the dimension explosion problem. For previously unexplored obstacles, an obstacle matrix is designed based on the idea of the artificial potential field method and superimposed on the original Q-table to avoid them. The simulation results verify that the proposed reinforcement learning path planning method yields coordinated paths that satisfy time coordination and collision avoidance and that also avoid previously unexplored obstacles. Compared with the A* algorithm, the proposed method achieves higher efficiency in online applications.
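Two of the steps named in the abstract, solving a shortest path with tabular Q-learning and superimposing an obstacle matrix on the learned Q-table, can be illustrated with a minimal single-UAV sketch. It is not the authors' implementation. The 5x5 grid, the obstacle positions, the cost of -1 per move, the hyperparameters, and the function names `train_q_table`, `obstacle_q`, and `greedy_path` are all assumptions made for illustration. The repulsive term is a simplified stand-in for a potential-field design, and the time coordination and deep Q-network replanning steps are not covered.

```python
import numpy as np

# Hypothetical 5x5 grid world; all values below are illustrative assumptions.
ROWS, COLS = 5, 5
START, GOAL = (0, 0), (4, 4)
KNOWN_OBSTACLES = {(1, 3), (3, 2)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic transition: a blocked move leaves the UAV in place."""
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in KNOWN_OBSTACLES:
        return state
    return (r, c)


def train_q_table(episodes=5000, max_steps=50, gamma=0.95, alpha=1.0,
                  epsilon=0.5, seed=0):
    """Tabular Q-learning with a cost of -1 per move (shortest-range objective).

    alpha=1.0 is acceptable here only because the assumed environment is
    deterministic; random start cells and a large epsilon keep exploration broad.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((ROWS, COLS, len(ACTIONS)))
    free = [(r, c) for r in range(ROWS) for c in range(COLS)
            if (r, c) not in KNOWN_OBSTACLES and (r, c) != GOAL]
    for _ in range(episodes):
        state = free[rng.integers(len(free))]
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = int(rng.integers(len(ACTIONS)))
            else:
                action = int(np.argmax(q[state]))
            nxt = step(state, action)
            # No bootstrapping from the terminal goal state.
            target = -1.0 if nxt == GOAL else -1.0 + gamma * q[nxt].max()
            q[state][action] += alpha * (target - q[state][action])
            if nxt == GOAL:
                break
            state = nxt
    return q


def obstacle_q(obstacle, strength=100.0, repulse=0.2, radius=2):
    """Penalty matrix for a newly detected obstacle (simplified repulsive field).

    Entering the obstacle cell is penalised heavily; moves that end within
    `radius` cells of it get a small penalty that decays with distance.
    """
    q_obs = np.zeros((ROWS, COLS, len(ACTIONS)))
    for r in range(ROWS):
        for c in range(COLS):
            for a in range(len(ACTIONS)):
                nr, nc = step((r, c), a)
                dist = abs(nr - obstacle[0]) + abs(nc - obstacle[1])
                if dist == 0:
                    q_obs[r, c, a] = -strength
                elif dist <= radius:
                    q_obs[r, c, a] = -repulse / dist
    return q_obs


def greedy_path(q, start=START, max_steps=50):
    """Follow the greedy action from `start` until the goal or a step limit."""
    path = [start]
    while path[-1] != GOAL and len(path) <= max_steps:
        path.append(step(path[-1], int(np.argmax(q[path[-1]]))))
    return path


q_table = train_q_table()
path = greedy_path(q_table)

# Pretend a previously unknown obstacle appears on the planned path.
blocked = path[2]
new_path = greedy_path(q_table + obstacle_q(blocked))

print("planned path:  ", path)
print("new obstacle:  ", blocked)
print("replanned path:", new_path)
```

In this sketch the learned Q-table is reused unchanged and only the penalty matrix is added, so the replanned path comes from a single greedy rollout with no retraining. This works here because the small repulsive term is kept below the gap between optimal and suboptimal Q-values, so it only breaks ties away from the obstacle. A stronger or wider field could change the greedy policy more drastically and would need separate checking.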