
Acta Armamentarii (兵工学报)



A Mobile Robot Path Planning Algorithm Based on Ant Colony Optimization-Guided Deep Q-Networks

LI Hailiang1,2, LI Zonggang1,2*, NING Xiaogang1,2, DU Yajiang1,2

  1. School of Mechanical and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China; 2. Robot Institute of Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China
  • Received: 2025-02-28; Revised: 2025-08-05
  • Supported by: the National Natural Science Foundation of China (61663020), the Major Science and Technology Special Project of Gansu Province (24ZDGA014), the Industrial Support Program for Higher Education Institutions of Gansu Province (2022CYZC-33), and the Open Project of the State Key Laboratory of Structural Analysis for Industrial Equipment, Dalian University of Technology (GZ22119)

Abstract: To address the slow convergence and poor path quality of the Deep Q-Network (DQN) algorithm for mobile robot path planning in large-scale, complex, unknown environments, a path planning algorithm combining Ant Colony Optimization (ACO) and DQN, termed ACOG-DQN, is proposed. First, the pheromone mechanism of ACO is introduced to select among the currently feasible paths with the goal of reaching the destination, thereby reducing the number of ineffective explorations of the environment while determining the optimal path. Previous path-selection experiences are filtered by a threshold to form a sample set for training the Q-network, which is then used to determine the optimal path of the mobile robot in the current environment. Finally, taking the optimal paths determined by ACO and by the Q-network, together with the path found by random exploration, as candidates, a path selection mechanism is designed in which the weight of the Q-network's optimal path increases over time; this mechanism selects the current action, so that the path is ultimately decided entirely by the Q-network. Simulations and physical experiments in three different complex environments show that the proposed ACOG-DQN algorithm outperforms DQN in convergence speed, path quality, and algorithm stability, validating the effectiveness of the proposed algorithm.
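For readers who want a concrete picture of the mechanism sketched in the abstract, the following Python fragment illustrates, under stated assumptions, how pheromone-guided candidate selection, a time-increasing Q-network decision weight, and threshold-filtered experience storage could fit together. All function names, the linear weight schedule, the exploration share, and the reward-threshold rule are illustrative assumptions and are not taken from the paper.

import random

# Illustrative sketch (not the paper's exact formulation) of the ideas in the
# abstract: (1) ACO-style pheromone guidance over candidate moves, (2) a hybrid
# action selector whose Q-network weight grows over training so that decisions
# are eventually made by the Q-network alone, and (3) a threshold filter on
# stored experiences.  The schedule and threshold rule below are assumptions.

def aco_guided_action(feasible_actions, pheromone, heuristic, alpha=1.0, beta=2.0):
    """Standard ACO-style choice: P(a) proportional to pheromone^alpha * heuristic^beta."""
    weights = [(pheromone[a] ** alpha) * (heuristic[a] ** beta) for a in feasible_actions]
    return random.choices(feasible_actions, weights=weights, k=1)[0]

def qnet_weight(episode, total_episodes):
    """Assumed monotone schedule: the Q-network's decision weight rises from 0 to 1."""
    return min(1.0, episode / float(total_episodes))

def select_action(episode, total_episodes, aco_action, qnet_action, n_actions,
                  explore_share=0.1):
    """Arbitrate among the ACO-guided, Q-network, and random candidate actions."""
    w_q = qnet_weight(episode, total_episodes)
    r = random.random()
    if r < w_q:                                         # Q-network decides more often over time
        return qnet_action
    if r < w_q + (1.0 - w_q) * (1.0 - explore_share):   # most of the remainder follows ACO guidance
        return aco_action
    return random.randrange(n_actions)                  # residual random exploration

def store_if_useful(buffer, transition, reward_threshold=0.0):
    """Threshold filter (assumed rule): keep only transitions whose reward clears
    the threshold, forming the sample set used to train the Q-network."""
    _state, _action, reward, _next_state, _done = transition
    if reward >= reward_threshold:
        buffer.append(transition)

Because qnet_weight reaches 1 at the final episode, select_action then always returns the Q-network's action, which is one simple way to realize the stated goal that the path is ultimately decided entirely by the Q-network.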

Key words: mobile robot, path planning, Deep Q-Network (DQN), Ant Colony Optimization (ACO), reinforcement learning, algorithm optimization
