
Acta Armamentarii ›› 2025, Vol. 46 ›› Issue (4): 240300-. doi: 10.12382/bgxb.2024.0300


AUV Obstacle Avoidance and Path Planning Based on Artificial Potential Field and Improved Reinforcement Learning

PAN Yunwei1, LI Min1,*, ZENG Xiangguang1, HUANG Ao1, ZHANG Jiaheng1, REN Wenzhe1, PENG Bei2

  1 School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, Sichuan, China
    2 School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, China
  • Received: 2024-04-17 Online: 2025-04-30
  • Supported by: Key Research and Development Program of the Sichuan Provincial Department of Science and Technology (2023YFG0285); General Program of the National Natural Science Foundation of China (52075456)


Abstract:

Autonomous underwater vehicles (AUVs) are among the most important underwater detection tools and are widely used in various marine military operations. Most existing research on AUV obstacle avoidance and path planning focuses on grid maps and rarely considers the real maneuvering behavior of AUVs under water. To address this problem, an AUV obstacle avoidance and path planning method is proposed that combines an artificial potential field with an improved proximal policy optimization algorithm based on a positive-experience retraining mechanism (positive-experience retraining proximal policy optimization, PR-PPO). A dynamic artificial potential field is constructed from the onboard sensors of the AUV model and the underwater environment in the simulation software. The PR-PPO reinforcement learning algorithm learns a mapping between AUV states and actions by interacting with the environment, so that real-time obstacle avoidance and path planning are achieved without a dynamics model or map information. The results show that, compared with the traditional dueling double deep Q-network (D3QN) and proximal policy optimization (PPO) algorithms, the proposed algorithm not only ensures the task success rate but also shortens model training time and improves convergence.

Key words: autonomous underwater vehicle, reinforcement learning, artificial potential field, obstacle avoidance, path planning

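As a rough sketch of the artificial-potential-field component named in the abstract, the guidance force on the vehicle can be computed as an attraction toward the goal plus repulsion from sensed obstacles. This is an illustration under common APF conventions, not the authors' implementation: the function names, the gains `k_att` and `k_rep`, and the influence radius `d0` are all assumed values.

```python
import numpy as np

def attractive_force(pos, goal, k_att=1.0):
    """Pull the vehicle toward the goal, proportional to the offset vector."""
    return k_att * (goal - pos)

def repulsive_force(pos, obstacles, d0=5.0, k_rep=10.0):
    """Push the vehicle away from each obstacle inside the influence radius d0."""
    force = np.zeros_like(pos)
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 0.0 < d < d0:
            # Classic APF repulsion: magnitude grows sharply as distance shrinks,
            # and vanishes outside the influence radius.
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    return force

def total_force(pos, goal, obstacles):
    """Net guidance force: attraction to the goal plus repulsion from obstacles."""
    return attractive_force(pos, goal) + repulsive_force(pos, obstacles)

# Example: AUV at the origin, goal ahead on the x-axis, one obstacle nearby.
pos = np.array([0.0, 0.0])
goal = np.array([10.0, 0.0])
obstacles = [np.array([2.0, 1.0])]
f = total_force(pos, goal, obstacles)
```

In the method described above this field would be built dynamically from the simulated AUV's own range sensors rather than from known obstacle positions, and the resulting signal shapes the reinforcement-learning agent's behavior instead of steering the vehicle directly.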