
Acta Armamentarii (兵工学报)



A Mobile Robot Path Planning Algorithm Based on Ant Colony Optimization-Guided Deep Q-Networks

LI Hailiang1,2, LI Zonggang1,2*, NING Xiaogang1,2, DU Yajiang1,2

  1. School of Mechanical and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China; 2. Robot Institute of Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China
  • Received: 2025-02-28; Revised: 2025-08-05
  • Supported by: the National Natural Science Foundation of China (61663020), the Major Science and Technology Special Project of Gansu Province (24ZDGA014), the Industrial Support Program for Higher Education Institutions of Gansu Province (2022CYZC-33), and the Open Project of the State Key Laboratory of Structural Analysis for Industrial Equipment, Dalian University of Technology (GZ22119)

Abstract: To address the slow convergence and poor path quality of the Deep Q-Network (DQN) algorithm for mobile robot path planning in large-scale, complex, unknown environments, a path planning algorithm combining Ant Colony Optimization (ACO) and DQN, termed ACOG-DQN, is proposed. First, the pheromone mechanism of ACO is introduced to select among the currently feasible paths with the goal of reaching the destination, thereby reducing the number of ineffective explorations of the environment while determining the optimal path. Previous path-selection experiences are filtered by a threshold to form a sample set for training the Q-network, which is then used to determine the optimal path of the mobile robot in the current environment. Finally, taking the optimal paths determined by ACO and by the Q-network, together with the path found by random exploration, as candidates, a path selection mechanism is designed in which the weight of the Q-network's optimal path increases over time; this mechanism selects the current action, so that the path is ultimately decided entirely by the Q-network. Simulations and physical experiments in three different complex environments show that the proposed ACOG-DQN algorithm outperforms DQN in convergence speed, path quality, and algorithm stability, validating the effectiveness of the proposed algorithm.
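For readers who want a concrete picture of the mechanism sketched in the abstract, the following Python fragment illustrates, under stated assumptions, how pheromone-guided candidate selection, a time-increasing Q-network decision weight, and threshold-filtered experience storage could fit together. All function names, the linear weight schedule, the exploration share, and the reward-threshold rule are illustrative assumptions and are not taken from the paper.

import random

# Illustrative sketch (not the paper's exact formulation) of the ideas in the
# abstract: (1) ACO-style pheromone guidance over candidate moves, (2) a hybrid
# action selector whose Q-network weight grows over training so that decisions
# are eventually made by the Q-network alone, and (3) a threshold filter on
# stored experiences.  The schedule and threshold rule below are assumptions.

def aco_guided_action(feasible_actions, pheromone, heuristic, alpha=1.0, beta=2.0):
    """Standard ACO-style choice: P(a) proportional to pheromone^alpha * heuristic^beta."""
    weights = [(pheromone[a] ** alpha) * (heuristic[a] ** beta) for a in feasible_actions]
    return random.choices(feasible_actions, weights=weights, k=1)[0]

def qnet_weight(episode, total_episodes):
    """Assumed monotone schedule: the Q-network's decision weight rises from 0 to 1."""
    return min(1.0, episode / float(total_episodes))

def select_action(episode, total_episodes, aco_action, qnet_action, n_actions,
                  explore_share=0.1):
    """Arbitrate among the ACO-guided, Q-network, and random candidate actions."""
    w_q = qnet_weight(episode, total_episodes)
    r = random.random()
    if r < w_q:                                         # Q-network decides more often over time
        return qnet_action
    if r < w_q + (1.0 - w_q) * (1.0 - explore_share):   # most of the remainder follows ACO guidance
        return aco_action
    return random.randrange(n_actions)                  # residual random exploration

def store_if_useful(buffer, transition, reward_threshold=0.0):
    """Threshold filter (assumed rule): keep only transitions whose reward clears
    the threshold, forming the sample set used to train the Q-network."""
    _state, _action, reward, _next_state, _done = transition
    if reward >= reward_threshold:
        buffer.append(transition)

Because qnet_weight reaches 1 at the final episode, select_action then always returns the Q-network's action, which is one simple way to realize the stated goal that the path is ultimately decided entirely by the Q-network.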

Key words: mobile robot, path planning, Deep Q-Network (DQN), Ant Colony Optimization (ACO), reinforcement learning, algorithm optimization
