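1. School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, China
2. Swarm Spectrum Intelligence Research Center, Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen 518054, Guangdong, China
3. Laboratory of Electromagnetic Space Cognition and Intelligent Control Technology, Beijing 100089, China
* Email: jinliangshao@126.com
Received: 2023-09-06
Published online: 2024-01-15
Published in print: 2023-12-30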
Song LI, Zhuangzhuang MA, Yunlin ZHANG, et al. Multi-agent Coverage Path Planning Based on Security Reinforcement Learning[J]. Acta Armamentarii, 2023, 44(S2): 101-113. DOI: 10.12382/bgxb.2023.0881.
Coverage path planning aims to find a safe trajectory for each agent that both covers the task area effectively and avoids obstacles and neighboring agents. Coverage tasks inevitably involve large, complex task areas, so it is worth exploring how to strengthen cooperation among agents while keeping them safe, in order to overcome the low task efficiency and limited capability of the swarm. To this end, a discrete mathematical model of coverage path planning is established on a grid map, and a safe multi-agent reinforcement learning algorithm based on value-decomposition networks is proposed, with a theoretical proof of its soundness. By decomposing the group value function, the algorithm avoids spurious rewards for individual agents, which strengthens the learning of cooperative coverage strategies and speeds up convergence. A shield introduced during training corrects unsafe behaviors such as leaving the map boundary and collisions, keeping the agents safe throughout the task. Simulation and semi-physical experiments show that the new algorithm maintains the coverage efficiency of the agents while effectively keeping them safe.
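The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The additive form of the joint value follows the value-decomposition network idea (the group value is the sum of per-agent values). The action set, the function names `joint_q_value`, `is_safe` and `shield`, and the fallback rule (pick the safe action with the highest Q-value, else stay) are assumptions made only for illustration.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) on the grid map

# Hypothetical action set: stay in place or move one cell in four directions.
ACTIONS: Dict[str, Cell] = {
    "stay": (0, 0),
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def joint_q_value(per_agent_q: List[float]) -> float:
    """Additive value decomposition: the group value is the sum of agent values."""
    return sum(per_agent_q)


def is_safe(cell: Cell, rows: int, cols: int, blocked: Set[Cell]) -> bool:
    """A cell is safe if it is inside the grid and not an obstacle or occupied cell."""
    r, c = cell
    return 0 <= r < rows and 0 <= c < cols and cell not in blocked


def shield(position: Cell, proposed: str, q_values: Dict[str, float],
           rows: int, cols: int, blocked: Set[Cell]) -> str:
    """Keep the proposed action if it is safe; otherwise substitute a safe one.

    Assumed fallback rule: choose the safe action with the highest Q-value,
    and return "stay" if no action at all is safe.
    """
    def target(action: str) -> Cell:
        dr, dc = ACTIONS[action]
        return (position[0] + dr, position[1] + dc)

    if is_safe(target(proposed), rows, cols, blocked):
        return proposed
    safe = [a for a in ACTIONS if is_safe(target(a), rows, cols, blocked)]
    if not safe:
        return "stay"
    return max(safe, key=lambda a: q_values[a])


if __name__ == "__main__":
    # Agent in the top-left corner of a 3x3 grid; cell (0, 1) is blocked.
    q = {"stay": 0.0, "up": 0.9, "down": 0.4, "left": 0.1, "right": 0.8}
    print(shield((0, 0), "up", q, rows=3, cols=3, blocked={(0, 1)}))  # down
    print(joint_q_value([0.5, 1.5, -1.0]))  # 1.0
```

In the demo, "up" would leave the map and "right" would enter the blocked cell, so the shield substitutes "down", the best of the remaining safe actions. Here `blocked` is assumed to hold both obstacle cells and the cells occupied by neighboring agents.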