
兵工学报 ›› 2024, Vol. 45 ›› Issue (12): 4372-4382. DOI: 10.12382/bgxb.2023.0982



Research on Robot Navigation Method Integrating Safe Convex Space and Deep Reinforcement Learning

DONG Mingze, WEN Zhuanglei, CHEN Xiai*, YANG Jiongkun, ZENG Tao

  1. College of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou 310018, Zhejiang, China
  • Received: 2023-09-27 Online: 2024-02-27
  • Corresponding author: CHEN Xiai
  • Supported by: National Natural Science Foundation of China (52005472); Exploration Project of Zhejiang Provincial Natural Science Foundation (LQ20E050015)


Abstract:

A robot navigation method based on deep reinforcement learning (DRL) is proposed for navigating a robot in scenarios where the global map is unknown and the environment contains both dynamic and static obstacles. Compared with other DRL-based navigation methods for complex dynamic environments, the proposed method improves the designs of the action space, the state space, and the reward function. In addition, it separates the control process from the neural network, which makes it easier to transfer simulation research to practical applications on various robots. Specifically, the action space is defined as the intersection of the safe convex space, computed from 2D lidar data, with the kinematic limits of the robot. This intersection narrows the sampling space of feasible trajectories while satisfying both short-term dynamic obstacle avoidance and long-term global navigation requirements. Reference position points are sampled from this action space to form a reference path, which the robot then tracks using a model predictive control (MPC) algorithm. The designs of the state space and the reward function additionally incorporate elements such as the safe convex space and long- and short-term reference points. Ablation studies demonstrate that the proposed design achieves a higher navigation success rate and shorter travel time in various static and dynamic environments, and exhibits strong generalization capability.
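To make the action-space construction concrete, the following is a minimal sketch, not the authors' code: a safe convex region is approximated as half-plane constraints derived from 2D lidar points, intersected with a disc representing the robot's kinematic reach, and a reference position is sampled from the intersection. All function names, the margin, and the disc reach model are illustrative assumptions.

```python
# Illustrative sketch of "safe convex space ∩ kinematic limits" as an action space.
# Not the paper's implementation; names and parameters are assumptions.
import numpy as np

def safe_convex_halfplanes(scan_xy, margin=0.2):
    """Build half-plane constraints a·p <= b from lidar points (robot frame).

    For each obstacle point, keep the robot on the near side of the line
    perpendicular to that point, pulled in by a safety margin. This is a
    crude stand-in for a proper convex-decomposition routine.
    """
    constraints = []
    for p in scan_xy:
        d = np.linalg.norm(p)
        if d < 1e-6:
            continue
        a = p / d                      # outward unit normal toward the obstacle
        b = max(d - margin, 0.0)       # allowed travel along that normal
        constraints.append((a, b))
    return constraints

def kinematic_reach(v_max, dt):
    """Radius reachable within one decision step (assumed disc model)."""
    return v_max * dt

def sample_reference_point(constraints, reach, rng, n_trials=100):
    """Rejection-sample a point inside both the convex region and the reach disc."""
    for _ in range(n_trials):
        cand = rng.uniform(-reach, reach, size=2)
        if np.linalg.norm(cand) > reach:
            continue
        if all(a @ cand <= b for a, b in constraints):
            return cand
    return np.zeros(2)                 # fall back to staying in place

rng = np.random.default_rng(0)
scan = np.array([[2.0, 0.5], [1.5, -1.0], [-0.8, 2.2]])  # toy lidar hits
cons = safe_convex_halfplanes(scan)
ref = sample_reference_point(cons, kinematic_reach(v_max=1.0, dt=0.5), rng)
print("reference point:", ref)
```

In the actual method the DRL policy, rather than a random sampler, selects the reference point; the rejection sampler above only illustrates what "sampling from the intersected action space" means geometrically.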
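The reference path is then tracked with MPC. Below is an equally hedged sketch of one simple MPC variant (random-shooting over a unicycle model); the paper does not specify its formulation, so the horizon, bounds, and cost are assumptions.

```python
# Illustrative shooting-style MPC step for tracking a sampled reference point.
# Candidate control sequences are rolled out on a unicycle model and the first
# control of the lowest-cost rollout is applied. Not the paper's controller.
import numpy as np

def rollout(state, controls, dt):
    """Integrate a unicycle model (x, y, heading) under a (v, w) sequence."""
    x, y, th = state
    traj = []
    for v, w in controls:
        th += w * dt
        x += v * np.cos(th) * dt
        y += v * np.sin(th) * dt
        traj.append((x, y))
    return np.array(traj)

def mpc_step(state, ref_point, dt=0.1, horizon=10, n_samples=256, rng=None):
    rng = rng or np.random.default_rng()
    best_cost, best_u = np.inf, (0.0, 0.0)
    for _ in range(n_samples):
        u_seq = np.column_stack([rng.uniform(0.0, 1.0, horizon),    # v in [0, v_max]
                                 rng.uniform(-1.5, 1.5, horizon)])  # w in [-w_max, w_max]
        traj = rollout(state, u_seq, dt)
        cost = np.sum(np.linalg.norm(traj - ref_point, axis=1))     # stay near reference
        if cost < best_cost:
            best_cost, best_u = cost, tuple(u_seq[0])
    return best_u  # apply only the first control, then re-plan

state = np.array([0.0, 0.0, 0.0])
v, w = mpc_step(state, ref_point=np.array([1.0, 0.5]))
print(f"v={v:.2f} m/s, w={w:.2f} rad/s")
```

Because the controller is separate from the network, this block can be swapped for any MPC solver on the target robot without retraining the policy, which is the transferability benefit the abstract claims.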
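Finally, a sketch of the kind of reward shaping the abstract alludes to: progress toward the long- and short-term reference points plus a penalty for leaving the safe convex region. The terms and weights are assumptions, not the paper's actual reward function.

```python
# Hypothetical reward combining progress terms and a safe-convex-space penalty.
import numpy as np

def reward(pos, prev_pos, short_ref, long_ref, constraints,
           w_short=1.0, w_long=0.5, w_unsafe=5.0):
    # Progress terms: how much closer the robot got to each reference point.
    r_short = w_short * (np.linalg.norm(prev_pos - short_ref)
                         - np.linalg.norm(pos - short_ref))
    r_long = w_long * (np.linalg.norm(prev_pos - long_ref)
                       - np.linalg.norm(pos - long_ref))
    # Safety term: penalize violating any half-plane a·p <= b of the convex space.
    violation = max((a @ pos - b for a, b in constraints), default=0.0)
    r_safe = -w_unsafe * max(violation, 0.0)
    return r_short + r_long + r_safe
```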

Key words: mobile robot navigation, deep reinforcement learning, safe convex space, model predictive control, dynamic unknown environment

CLC number: