A Path Planning Algorithm for Mobile Robots Based on Angle Searching and Deep Q-Network

doi:10.12382/bgxb.2024.0265

Abstract:

The deep Q-network (DQN) algorithm suffers from long learning times and slow convergence when solving path planning problems. To address this, an algorithm combining angle searching (AS) with DQN, called AS-DQN, is proposed. A search domain is planned to control the search direction of the mobile robot and reduce the traversal of grid nodes, thereby improving the efficiency of path planning. To strengthen cooperation among mobile robots, an Internet of Things information fusion technology (IIFT) model is proposed, which integrates multiple scattered pieces of local environmental information into global information to guide the robots' path planning. Simulation results show that, compared with the standard DQN algorithm, AS-DQN shortens the time a mobile robot needs to find the optimal path to the target point, and that combining the IIFT model with AS-DQN improves path planning efficiency further. Physical experiments show that AS-DQN can be applied to the Turtlebot3 unmanned vehicle and successfully finds the optimal path from the starting point to the target point.

Key words: mobile robot, path planning, deep Q-network, angle searching strategy, Internet of Things information fusion technology
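
The search-domain idea in the abstract can be illustrated with a minimal sketch. It assumes an 8-connected grid and a search domain shaped as an angular sector centred on the bearing from the robot to the target; the function name `actions_in_search_domain` and the parameter `half_angle_deg` are hypothetical and are not taken from the paper, whose actual search-domain construction may differ.

```python
import math

# 8-connected grid moves (dx, dy). This action set is an assumption.
MOVES = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def actions_in_search_domain(robot, target, half_angle_deg=90.0):
    """Return the moves whose heading lies within +/- half_angle_deg of the
    bearing from robot to target (half_angle_deg is a hypothetical parameter)."""
    bearing = math.atan2(target[1] - robot[1], target[0] - robot[0])
    allowed = []
    for dx, dy in MOVES:
        heading = math.atan2(dy, dx)
        # Smallest absolute angular difference, wrapped into [0, pi].
        diff = abs((heading - bearing + math.pi) % (2 * math.pi) - math.pi)
        # Small tolerance so moves exactly on the sector boundary are kept.
        if math.degrees(diff) <= half_angle_deg + 1e-9:
            allowed.append((dx, dy))
    return allowed
```

With a target directly to the right and a 45-degree half-angle, only the three moves that point toward the target remain, so the agent evaluates three grid nodes instead of eight at that step.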

CLC number:

TP183

LI Zonggang, HAN Sen, CHEN Yinjuan, NING Xiaogang. A Path Planning Algorithm for Mobile Robots Based on Angle Searching and Deep Q-Network[J]. Acta Armamentarii, 2025, 46(2): 240265-.

Figures/Tables 33

Fig.1 MRC-OPP model and transformed map

Fig.2 Schematic diagram of the IIFT model

Fig.3 Schematic diagram of the DQN algorithm structure

Fig.4 Positional relationship and search domain diagram

Fig.5 Improved search domain

Fig.6 Observation space formed by four numerical matrices

Fig.7 Selectable grid points for the mobile robot

Fig.8 Neural network structure diagram

Table 1 AS-DQN pseudocode

Algorithm 1 AS-DQN:
Initialization: Initialize the replay memory, the Q network, the target network and the other hyperparameters. Initialize S=0.
for S<S_max do
    if S≠0 then s_k=s_k₊₁
    else get the initial observation s_k
    end if
    if S<pre.step then
        randomly select action a_k
    else
        if μ<$\epsilon$ then randomly select action a_k
        else select a_k=$\arg\max_{a \in M}$ Q(s_k,a;θ)
        end if
    end if
    if coordinate.a_k=FALSE then
        break
    end if
    Store experience e_k=(s_k,a_k,r_k,s_k₊₁)
    if S<decline.step then
        $\epsilon = \epsilon + 0.002$
    end if
    if S>pre.step then
        Calculate the loss (y-Q(s_i,a_i;θ_i))²
        Train and update the Q network's weights θ_i
        Every Z steps copy θ_i to θ_i₊₁
    end if
end for
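
The loss step near the end of Algorithm 1 uses a target y that the pseudocode does not define. The sketch below assumes the standard DQN target, y = r for a terminal transition and y = r + γ·max Q_target(s', a') otherwise, with γ = 0.9 as listed in Table 2. The `dones` flag and the function names are hypothetical; the stored experience tuple in Algorithm 1 has no terminal flag.

```python
GAMMA = 0.9  # discount factor gamma from Table 2


def td_targets(rewards, next_q_target, dones, gamma=GAMMA):
    """Standard DQN targets (an assumption about y in Algorithm 1).

    rewards:        list of r_k
    next_q_target:  list of per-action Q values from the target network at s_{k+1}
    dones:          list of terminal flags (hypothetical, not in the stored tuple)
    """
    return [
        r if done else r + gamma * max(q_next)
        for r, q_next, done in zip(rewards, next_q_target, dones)
    ]


def mse_loss(targets, q_taken):
    """Mean of (y - Q(s_i, a_i))^2 over the sampled batch."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_taken)) / len(targets)
```

In a full training loop the gradient of this loss would update the Q network, and the target network would be refreshed from it every Z steps; both are omitted here to keep the sketch free of framework dependencies.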

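Fig.9 AS-DQN training structure diagram

Table 2 Hyperparameters and values of the neural network

Parameter	Value
Memory pool size	20000
Number of experiences before training starts	100
Batch size	32
Target network update frequency	100
Discount factor γ	0.9
Learning rate	0.001
Experience replay memory value	500
Probability ε of selecting the action with the maximum Q value	0.01
Maximum value of ε	1
Increase rate of ε	0.002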
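
Table 2 describes ε as the probability of selecting the action with the maximum Q value, starting at 0.01 and growing by 0.002 up to a maximum of 1. A minimal sketch of that schedule follows, under two assumptions: ε grows once per step, and the cap at 1 stands in for the `decline.step` cutoff of Algorithm 1, whose value is not given. The function names are hypothetical.

```python
import random

# Values from Table 2.
EPS_START, EPS_MAX, EPS_STEP = 0.01, 1.0, 0.002


def epsilon_at(step):
    """Probability of taking the greedy (max-Q) action after `step` increments."""
    return min(EPS_MAX, EPS_START + EPS_STEP * step)


def select_action(q_values, step, rng=random):
    """With probability epsilon take argmax Q, otherwise explore uniformly."""
    if rng.random() < epsilon_at(step):
        return max(range(len(q_values)), key=q_values.__getitem__)
    return rng.randrange(len(q_values))
```

Under these assumptions ε reaches its maximum after (1 − 0.01) / 0.002 = 495 increments, after which the policy is fully greedy.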

Fig.10 Schematic diagram of the map obstacle model

Fig.11 Schematic diagram of changes in loss values during training

Fig.12 Schematic diagram of path planning on the 8×8 map

Fig.13 Schematic diagram of path planning on the 12×12 map

Fig.14 Schematic diagram of changes in loss values on the 8×8 map

Table 3 Mobile robot data for the 8×8 map

Robot	Convergence steps	Convergence time/s
R₁	28000	372.9
R₂	30000	446.5
R₃	17000	258.6

Fig.15 Schematic diagram of changes in loss values on the 12×12 map

Table 4 Mobile robot data for the 12×12 map

Robot	Convergence steps	Convergence time/s
R₄	109000	1510.6
R₅	103000	1429.8
R₆	75000	1137.2

Fig.16 Schematic diagram of changes in loss values during training

Table 5 Static obstacle model data for the 8×8 map

Algorithm	Convergence steps	Convergence time/s	Time saved/%
DQN	27000	368.5	-
AS-DQN	20500	292.3	20.68

Table 6 Dynamic obstacle model data for the 8×8 map

Algorithm	Convergence steps	Convergence time/s	Time saved/%
DQN	28000	398.1	-
AS-DQN	20000	302.6	23.99

Table 7 Static obstacle model data for the 12×12 map

Algorithm	Convergence steps	Convergence time/s	Time saved/%
DQN	90000	1216.9	-
AS-DQN	75000	1025.3	15.75

Table 8 Dynamic obstacle model data for the 12×12 map

Algorithm	Convergence steps	Convergence time/s	Time saved/%
DQN	109000	1510.6	-
AS-DQN	85500	1260.7	16.54

Fig.17 Schematic diagram of changes in loss values during training

Table 9 Static obstacle model data for the 8×8 map

Algorithm	Convergence steps	Convergence time/s	Time saved vs. DQN/%	Time saved vs. AS-DQN/%
DQN	27000	368.5	-	-
AS-DQN	20500	292.3	-	-
AS-DQN(IIFT)	12000	169.8	53.92	41.91

Table 10 Dynamic obstacle model data for the 8×8 map

Algorithm	Convergence steps	Convergence time/s	Time saved vs. DQN/%	Time saved vs. AS-DQN/%
DQN	28000	398.1	-	-
AS-DQN	20000	302.6	-	-
AS-DQN(IIFT)	12500	188.5	52.65	37.71

Table 11 Static obstacle model data for the 12×12 map

Algorithm	Convergence steps	Convergence time/s	Time saved vs. DQN/%	Time saved vs. AS-DQN/%
DQN	90000	1216.9	-	-
AS-DQN	75000	1025.3	-	-
AS-DQN(IIFT)	56000	726.1	40.33	29.18

Table 12 Dynamic obstacle model data for the 12×12 map

Algorithm	Convergence steps	Convergence time/s	Time saved vs. DQN/%	Time saved vs. AS-DQN/%
DQN	109000	1510.6	-	-
AS-DQN	85500	1260.7	-	-
AS-DQN(IIFT)	65000	973.8	35.54	22.76
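
The time-saved percentages in Tables 5 to 12 are consistent with the relative reduction in convergence time, (baseline − improved) / baseline × 100. The short sketch below reproduces a few of them from the listed convergence times; the function name is hypothetical, and the formula is inferred from the reported numbers rather than stated in the source.

```python
def time_saved_percent(baseline_s, improved_s):
    """Relative reduction in convergence time, in percent, rounded to 2 places."""
    return round((baseline_s - improved_s) / baseline_s * 100, 2)


# 8x8 map, static obstacles (convergence times in seconds from the tables).
dqn, as_dqn, as_dqn_iift = 368.5, 292.3, 169.8

print(time_saved_percent(dqn, as_dqn))          # AS-DQN relative to DQN
print(time_saved_percent(dqn, as_dqn_iift))     # AS-DQN(IIFT) relative to DQN
print(time_saved_percent(as_dqn, as_dqn_iift))  # AS-DQN(IIFT) relative to AS-DQN
```

Read this way, the two percentages listed for the three-algorithm tables are the savings of AS-DQN(IIFT) relative to DQN and relative to AS-DQN, respectively.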

Fig.18 Physical images

Fig.19 Schematic diagram of the rigid body and NOKOV motion capture system

Fig.20 Schematic diagram of the TB3 unmanned vehicle path planning process

Fig.21 TB3 path planning in Rviz
