
Acta Armamentarii ›› 2025, Vol. 46 ›› Issue (4): 240300-. doi: 10.12382/bgxb.2024.0300


AUV Obstacle Avoidance and Path Planning Based on Artificial Potential Field and Improved Reinforcement Learning

PAN Yunwei1, LI Min1,*, ZENG Xiangguang1, HUANG Ao1, ZHANG Jiaheng1, REN Wenzhe1, PENG Bei2

  1 School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, Sichuan, China
    2 School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, China
  • Received: 2024-04-17 Online: 2025-04-30
  • Supported by: Key Research and Development Program of the Sichuan Provincial Department of Science and Technology (2023YFG0285); General Program of the National Natural Science Foundation of China (52075456)


Abstract:

Autonomous underwater vehicles (AUVs) are among the most important underwater detection tools and are widely used in various marine military operations. Most existing research on AUV obstacle avoidance and path planning focuses on grid maps and rarely considers the real maneuvering behavior of AUVs under water. To address this problem, an AUV obstacle avoidance and path planning method is proposed that combines an artificial potential field with an improved proximal policy optimization algorithm based on a positive-experience retraining mechanism (positive-experience retraining proximal policy optimization, PR-PPO). A dynamic artificial potential field is constructed from the onboard sensors of the AUV model and the underwater environment in the simulation software. The PR-PPO reinforcement learning algorithm learns a mapping between AUV states and actions by interacting with the environment, so that real-time obstacle avoidance and path planning are achieved without a dynamics model or map information. The results show that, compared with the traditional dueling double deep Q-network (D3QN) and proximal policy optimization (PPO) algorithms, the proposed algorithm not only ensures the task success rate but also shortens model training time and improves convergence.

Key words: autonomous underwater vehicle, reinforcement learning, artificial potential field, obstacle avoidance, path planning

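As a rough sketch of the artificial-potential-field component named in the abstract, the guidance force on the vehicle can be computed as an attraction toward the goal plus repulsion from sensed obstacles. This is an illustration under common APF conventions, not the authors' implementation: the function names, the gains `k_att` and `k_rep`, and the influence radius `d0` are all assumed values.

```python
import numpy as np

def attractive_force(pos, goal, k_att=1.0):
    """Pull the vehicle toward the goal, proportional to the offset vector."""
    return k_att * (goal - pos)

def repulsive_force(pos, obstacles, d0=5.0, k_rep=10.0):
    """Push the vehicle away from each obstacle inside the influence radius d0."""
    force = np.zeros_like(pos)
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 0.0 < d < d0:
            # Classic APF repulsion: magnitude grows sharply as distance shrinks,
            # and vanishes outside the influence radius.
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    return force

def total_force(pos, goal, obstacles):
    """Net guidance force: attraction to the goal plus repulsion from obstacles."""
    return attractive_force(pos, goal) + repulsive_force(pos, obstacles)

# Example: AUV at the origin, goal ahead on the x-axis, one obstacle nearby.
pos = np.array([0.0, 0.0])
goal = np.array([10.0, 0.0])
obstacles = [np.array([2.0, 1.0])]
f = total_force(pos, goal, obstacles)
```

In the method described above this field would be built dynamically from the simulated AUV's own range sensors rather than from known obstacle positions, and the resulting signal shapes the reinforcement-learning agent's behavior instead of steering the vehicle directly.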