Research on Robot Navigation Method Integrating Safe Convex Space and Deep Reinforcement Learning

doi:10.12382/bgxb.2023.0982

Abstract

A robot navigation method based on deep reinforcement learning (DRL) is proposed for scenarios in which the global map is unknown and the environment contains both dynamic and static obstacles. Compared with other DRL-based navigation methods for complex dynamic environments, the proposed method improves the design of the action space, the state space, and the reward function. It also decouples the control process from the neural network, which makes results obtained in simulation easier to carry over to real robots. Specifically, the action space is defined as the intersection of the safe convex space, computed from 2D Lidar data, with the kinematic limits of the robot. This intersection narrows the feasible trajectory search space while meeting both short-term dynamic obstacle avoidance and long-term global navigation needs. Reference points are sampled from this action space to form a reference trajectory, which the robot tracks using a model predictive control (MPC) algorithm. The state space and reward function additionally incorporate elements such as the safe convex space and the reference points. Ablation studies demonstrate the superior navigation success rate, reduced time consumption, and robust generalization of the proposed method in various static and dynamic environments.
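The pipeline described above (Lidar scan, safe convex space, intersection with kinematic limits, sampled reference points) can be illustrated with a minimal Python sketch. It is not the authors' implementation: a simple greedy half-plane rule stands in for the convex-region generator used in the paper, the kinematic limit is modeled as a disk of radius `v_max * horizon`, and the action parameterization (heading angle plus distance fraction) as well as the names `safe_convex_region`, `max_travel`, and `action_to_reference` are hypothetical.

```python
import math

import numpy as np


def safe_convex_region(scan_xy, margin=0.2):
    """Build a convex obstacle-free region {x : A @ x <= b} around the robot.

    scan_xy: (N, 2) lidar hit points in the robot frame (robot at the origin).
    Illustrative greedy rule (an assumption, not the paper's generator): take
    the nearest point not yet excluded, add the half-plane perpendicular to its
    ray, pulled back toward the robot by `margin`, and drop every point that
    this half-plane already cuts off.
    """
    pts = np.asarray(scan_xy, dtype=float).reshape(-1, 2)
    dists = np.linalg.norm(pts, axis=1)
    if np.any(dists <= margin):
        raise ValueError("an obstacle point lies within `margin` of the robot")
    order = np.argsort(dists)
    pts, dists = pts[order], dists[order]
    keep = np.ones(len(pts), dtype=bool)   # points not yet outside the region
    normals, offsets = [], []
    for i in range(len(pts)):
        if not keep[i]:
            continue
        n = pts[i] / dists[i]              # unit normal from robot toward the point
        b = dists[i] - margin              # b > 0, so the origin stays inside
        normals.append(n)
        offsets.append(b)
        keep &= pts @ n <= b               # points beyond the plane are excluded
    return np.array(normals).reshape(-1, 2), np.array(offsets)


def max_travel(A, b, direction, reach):
    """Largest t in [0, reach] such that t * direction satisfies A @ x <= b."""
    t = reach
    for a_dot_d, b_i in zip(A @ direction, b):
        if a_dot_d > 1e-12:                # only planes the ray is heading toward
            t = min(t, b_i / a_dot_d)
    return t


def action_to_reference(action, A, b, v_max=1.0, horizon=2.0, n_points=5):
    """Map a normalized action in [-1, 1]^2 to reference points in the region.

    Hypothetical parameterization: action[0] sets the heading angle and
    action[1] the fraction of the admissible travel distance along it. The
    kinematic limit is modeled as a disk of radius v_max * horizon.
    """
    theta = float(np.clip(action[0], -1.0, 1.0)) * math.pi
    frac = (float(np.clip(action[1], -1.0, 1.0)) + 1.0) / 2.0
    direction = np.array([math.cos(theta), math.sin(theta)])
    t_max = max_travel(A, b, direction, v_max * horizon)
    target = frac * t_max * direction
    # Evenly spaced points on the segment origin -> target; convexity keeps them inside.
    return [target * k / n_points for k in range(1, n_points + 1)]


# Usage: a straight wall 2 m ahead of the robot.
wall = np.array([[2.0, y] for y in (-2.0, -1.0, 0.0, 1.0, 2.0)])
A, b = safe_convex_region(wall, margin=0.2)
refs = action_to_reference([0.0, 1.0], A, b, v_max=1.0, horizon=2.0)
```

Because the region contains the robot and is convex, moving along any heading up to the region boundary (capped by the kinematic disk) stays inside the assumed feasible set, so every network output maps to an admissible target. In this sketch an MPC tracker (not shown) would then follow `refs`.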

Key words: mobile robot navigation, deep reinforcement learning, safe convex space, model predictive control, dynamic unknown environment

CLC Number: TP242.6

DONG Mingze, WEN Zhuanglei, CHEN Xiai, YANG Jiongkun, ZENG Tao. Research on Robot Navigation Method Integrating Safe Convex Space and Deep Reinforcement Learning[J]. Acta Armamentarii, 2024, 45(12): 4372-4382.

Figures/Tables 16


Stage	Environment size/m	Static obstacles	Dynamic obstacles	Dynamic obstacle radius/m	Dynamic obstacle speed/(m·s^-1)
1	20×30	0	0	-	-
2	20×30	10	0	-	-
3	20×30	10	5	0.2–0.3	0.3
4	20×30	10	10	0.2–0.3	0.3
5	10×10	0	10	0.1–0.4	0.3–0.6
6	10×10	0	20	0.1–0.4	0.3–0.6
7	10×10	0	30	0.1–0.4	0.3–0.6
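The stage table reads like a curriculum that progresses from empty and static maps to increasingly crowded dynamic ones. A minimal Python encoding of the listed values is sketched below. The uniform sampling of obstacle radius and speed within each listed range, and the names `Stage`, `STAGES`, and `sample_dynamic_obstacles`, are assumptions; the criterion for advancing from one stage to the next is not given in this section.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Stage:
    size_m: Tuple[float, float]                # environment size as listed (e.g. 20×30), m
    n_static: int                              # number of static obstacles
    n_dynamic: int                             # number of dynamic obstacles
    radius_m: Optional[Tuple[float, float]]    # dynamic obstacle radius range, m
    speed_mps: Optional[Tuple[float, float]]   # dynamic obstacle speed range, m/s


# Values copied from the stage table; None where no dynamic obstacles exist.
STAGES = {
    1: Stage((20, 30), 0, 0, None, None),
    2: Stage((20, 30), 10, 0, None, None),
    3: Stage((20, 30), 10, 5, (0.2, 0.3), (0.3, 0.3)),
    4: Stage((20, 30), 10, 10, (0.2, 0.3), (0.3, 0.3)),
    5: Stage((10, 10), 0, 10, (0.1, 0.4), (0.3, 0.6)),
    6: Stage((10, 10), 0, 20, (0.1, 0.4), (0.3, 0.6)),
    7: Stage((10, 10), 0, 30, (0.1, 0.4), (0.3, 0.6)),
}


def sample_dynamic_obstacles(stage_id, rng):
    """Draw (radius, speed) for each dynamic obstacle of a stage.

    Uniform sampling within the listed ranges is an assumption.
    """
    s = STAGES[stage_id]
    if s.n_dynamic == 0:
        return []
    return [(rng.uniform(*s.radius_m), rng.uniform(*s.speed_mps))
            for _ in range(s.n_dynamic)]


# Usage: obstacles for the most crowded stage with a fixed seed.
obstacles = sample_dynamic_obstacles(7, random.Random(0))
```

Keeping the schedule as plain data makes it easy to change a stage without touching the training loop; how the agent is promoted between stages would still have to be specified separately.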

Method	Success rate/%	Navigation time mean/s	Navigation time SD/s	Path length mean/m	Path length SD/m	Speed mean/(m·s^-1)	Speed SD/(m·s^-1)	Acceleration mean/(m·s^-2)	Acceleration SD/(m·s^-2)	Jerk mean/(m·s^-3)	Jerk SD/(m·s^-3)
Design 1	76.0	4.0	2.2	11.3	6.3	2.9	0.3	0	1.0	-0.6	11.8
Design 2	76.0	4.0	2.2	11.3	6.3	2.8	0.2	0	1.5	-0.1	22.5
Design 3	90.3	9.0	5.2	15.2	8.3	2.8	0.2	0.3	1.9	-0.1	7.0
Design 4	83.0	5.0	2.6	11.7	6.3	2.2	0.4	0.5	1.3	-0.2	4.8
Proposed method	89.2	5.0	2.6	12.2	6.6	2.2	0.4	0.3	1.4	-0.5	4.0

Method	Success rate/%	Navigation time mean/s	Navigation time SD/s	Path length mean/m	Path length SD/m	Speed mean/(m·s^-1)	Speed SD/(m·s^-1)	Acceleration mean/(m·s^-2)	Acceleration SD/(m·s^-2)	Jerk mean/(m·s^-3)	Jerk SD/(m·s^-3)
Design 1	80	4.0	2.2	11.4	6.2	2.9	0.3	-0.1	1.3	-0.1	16.9
Design 2	79	4.0	2.1	11.4	6.2	2.9	0.2	-0.1	2.2	0.0	35.9
Design 3	88	8.8	5.0	15.4	8.6	1.8	0.4	0.3	1.8	-0.1	6.6
Design 4	84	4.9	2.4	11.6	6.3	2.2	0.4	0.5	1.3	-0.3	4.9
Proposed method	89	5.9	3.3	13.0	7.3	2.2	0.4	0.3	1.4	-0.5	4.2
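The comparison tables report, per method, the mean and standard deviation of navigation time, path length, speed, acceleration, and jerk. One plausible way to compute these quantities from a logged trajectory is sketched below in Python. The finite-difference definitions, the use of the population standard deviation, and the names `trajectory_metrics` and `_mean_sd` are assumptions; this section does not state how the statistics were aggregated (for example, over time steps or over episodes).

```python
import numpy as np


def _mean_sd(x):
    """Mean and population standard deviation; NaN when there are no samples."""
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return float("nan"), float("nan")
    return float(np.mean(x)), float(np.std(x))


def trajectory_metrics(xy, dt):
    """Per-episode metrics from 2D positions sampled every `dt` seconds.

    Assumed definitions: speed = |displacement| / dt, acceleration = change
    in speed / dt (signed, so it can average near 0), jerk = change in
    acceleration / dt.
    """
    xy = np.asarray(xy, dtype=float)
    seg = np.linalg.norm(np.diff(xy, axis=0), axis=1)   # distance per step, m
    speed = seg / dt
    acc = np.diff(speed) / dt
    jerk = np.diff(acc) / dt
    return {
        "time_s": dt * (len(xy) - 1),
        "path_m": float(seg.sum()),
        "speed": _mean_sd(speed),
        "acc": _mean_sd(acc),
        "jerk": _mean_sd(jerk),
    }


# Usage: constant 1 m/s motion along x, sampled at 10 Hz for 1 s.
metrics = trajectory_metrics([(0.1 * k, 0.0) for k in range(11)], dt=0.1)
```

Defining acceleration and jerk as signed derivatives of speed is consistent with the near-zero and negative means in the tables, but other definitions (for example, vector accelerations or magnitudes) would give different numbers.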