Path Planning Method for Unmanned Surface Vessel in On-call Submarine Search Based on Improved DQN Algorithm

doi:10.12382/bgxb.2023.0909

Abstract

This paper proposes a path planning method for an unmanned surface vessel (USV) that maneuvers in both course and speed during on-call anti-submarine search. The method is based on an improved deep Q-learning (DQN) algorithm. Building on the on-call submarine search model, it introduces an improved deep reinforcement learning algorithm (Improved-DQN, I-DQN) that obtains an optimal path by jointly adjusting the USV's action space, action selection strategy and reward. The algorithm adopts a time-varying dynamic greedy strategy that adaptively adjusts the USV's action selection according to the environment and the learning progress of the neural network, which improves global search ability and avoids convergence to a local optimum. A piecewise nonlinear reward and penalty function, set according to the surrounding obstacles and the USV's current position, guarantees collision avoidance while speeding up convergence. A Bézier curve algorithm is then applied to smooth the path. Simulation results show that, in the same environments, the proposed method produces better paths than the DQN algorithm, the A* algorithm and the artificial potential field (APF) algorithm, with better stability, convergence and safety.

Key words: unmanned surface vessel, path planning, deep Q-learning algorithm, on-call search
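The abstract describes a piecewise nonlinear reward and penalty function built from the obstacle environment and the USV's current position, but this page gives neither its pieces nor its breakpoints. The Python sketch below shows one plausible shape under stated assumptions: a terminal reward at the goal, a terminal penalty on collision, a dense progress term toward the goal, and a penalty that grows quadratically once the USV is inside an assumed safety distance. The function name and every constant (R_GOAL, SAFE_DIST, K_OBSTACLE and so on) are hypothetical, not the authors' values.

```python
import math

# All constants are hypothetical placeholders; the paper's values are not given here.
R_GOAL = 100.0        # terminal reward for reaching the goal
R_COLLISION = -100.0  # terminal penalty for hitting an obstacle
GOAL_TOL = 0.5        # distance (grid cells) treated as "goal reached"
SAFE_DIST = 2.0       # obstacle distance (grid cells) below which a penalty applies
K_PROGRESS = 1.0      # weight on progress toward the goal
K_OBSTACLE = 5.0      # weight on the nonlinear obstacle-proximity penalty


def piecewise_reward(prev_pos, pos, goal, nearest_obstacle_dist, collided):
    """Reward for one USV step on a grid map (illustrative piecewise form)."""
    # Piece 1: a collision ends the episode with a large penalty.
    if collided:
        return R_COLLISION
    d_goal = math.dist(pos, goal)
    # Piece 2: reaching the goal ends the episode with a large reward.
    if d_goal <= GOAL_TOL:
        return R_GOAL
    # Piece 3: dense shaping, positive when the step reduced the distance to the goal.
    reward = K_PROGRESS * (math.dist(prev_pos, goal) - d_goal)
    # Piece 4: inside the safety distance, subtract a penalty that grows
    # quadratically as the USV approaches the nearest obstacle.
    if nearest_obstacle_dist < SAFE_DIST:
        reward -= K_OBSTACLE * (1.0 - nearest_obstacle_dist / SAFE_DIST) ** 2
    return reward
```

For example, a one-cell step toward the goal far from any obstacle earns 1.0, while the same step taken 1 cell from an obstacle earns 1.0 - 5 × 0.25 = -0.25, so the shaping term alone already discourages hugging obstacles before any collision occurs.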

CLC number: TP24; U674.7

NIU Yilong, YANG Yi, ZHANG Kai, MU Ying, WANG Qi, WANG Yingmin. Path Planning Method for Unmanned Surface Vessel in On-call Submarine Search Based on Improved DQN Algorithm[J]. Acta Armamentarii, 2024, 45(9): 3204-3215.

Figures/Tables: 21

Fig.1 Target submarine position distribution with unknown speed

Fig.2 Target submarine position distribution with known speed
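Figs. 1 and 2 show how the target submarine's possible position spreads depending on whether its speed is known. The distribution model itself is not given on this page. A common on-call search assumption, used here purely as an illustration, is an isotropic Gaussian error in the initial datum (so its radial error is Rayleigh-distributed), a uniformly random course, and either a known speed or a speed drawn uniformly from an assumed range. The Monte Carlo sketch below samples target positions under those assumptions; the function name, its parameters and the default 2 to 8 kn speed range are hypothetical.

```python
import math
import random


def sample_target_positions(n, elapsed_h, sigma_nm, speed_kn=None,
                            speed_range_kn=(2.0, 8.0), seed=0):
    """Monte Carlo samples of target positions (nautical miles) relative to the datum."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # Initial position error: isotropic Gaussian, so its radius is Rayleigh-distributed.
        x0 = rng.gauss(0.0, sigma_nm)
        y0 = rng.gauss(0.0, sigma_nm)
        # Unknown course: uniform over all headings.
        heading = rng.uniform(0.0, 2.0 * math.pi)
        # Known speed gives a fixed run distance; unknown speed is uniform over an assumed range.
        v = speed_kn if speed_kn is not None else rng.uniform(*speed_range_kn)
        run = v * elapsed_h
        points.append((x0 + run * math.cos(heading), y0 + run * math.sin(heading)))
    return points
```

With zero datum error, a known speed puts every sample on a single ring of radius v·t (the ring-shaped pattern of the known-speed case), whereas an unknown speed fills an annulus between v_min·t and v_max·t, which is why the search region grows when the speed is unknown.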

Fig.3 USV state space model

Fig.4 Action space optimization
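Table 1 lists an action space of size 8. On a grid map, a common reading is the eight compass moves (four orthogonal, four diagonal). The sketch below encodes that reading and applies one action to an occupancy grid. The action ordering, the `step` helper and the rule that an illegal move leaves the USV in place are assumptions for illustration; the specific optimization shown in Fig. 4 is not reproduced here.

```python
import math

# Hypothetical encoding: action index -> (dx, dy) grid step, counter-clockwise from east.
ACTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def step(pos, action, grid):
    """Apply an action on an occupancy grid (grid[y][x] == 1 means obstacle).

    Returns (new_pos, step_length, collided). Leaving the map or entering an
    obstacle cell counts as a collision and keeps the USV in place.
    """
    dx, dy = ACTIONS[action]
    x, y = pos[0] + dx, pos[1] + dy
    rows, cols = len(grid), len(grid[0])
    if not (0 <= x < cols and 0 <= y < rows) or grid[y][x] == 1:
        return pos, 0.0, True
    # Orthogonal moves cost 1, diagonal moves cost sqrt(2).
    return (x, y), math.hypot(dx, dy), False
```

Returning the step length alongside the new position lets a reward function charge diagonal moves their true √2 cost, so the agent does not treat diagonals as free shortcuts.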

Fig.5 Schematic diagram of the path before (left) and after (right) expansion
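If the "expansion" in Fig. 5 refers to inflating obstacles by a safety margin before planning (our reading, not confirmed on this page), it can be sketched as a grid dilation: every cell within a chosen radius of an obstacle is also marked as blocked, so a planner that avoids the inflated cells keeps that clearance from the real obstacles. The function name and the Chebyshev (square) neighborhood are assumptions.

```python
def inflate_obstacles(grid, radius):
    """Return a copy of grid where every cell within `radius` cells
    (Chebyshev distance) of an obstacle is also marked as an obstacle."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # leave the input grid unchanged
    for y in range(rows):
        for x in range(cols):
            if grid[y][x] != 1:
                continue
            # Mark the (2r+1) x (2r+1) square around the obstacle, clipped to the map.
            for ny in range(max(0, y - radius), min(rows, y + radius + 1)):
                for nx in range(max(0, x - radius), min(cols, x + radius + 1)):
                    out[ny][nx] = 1
    return out
```

A larger radius gives safer but longer paths and can close narrow passages entirely, so the margin trades safety against feasibility.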

Fig.6 Simulation environment map

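Table 1 Model training parameters

Parameter	Value
Action space size	8
Learning rate α	0.01
Discount factor γ	0.90
Exploration rate ε	[0.01, 0.99]
Number of hidden layers	2
Number of neurons	64
Replay memory size	10 000
Sampling batch size	32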
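To show where the Table 1 values enter a standard DQN update, the sketch below implements an experience replay memory (capacity 10 000, batches of 32) and the usual one-step TD target y = r + γ·max Q_target(s', a') with γ = 0.90. The network itself (2 hidden layers, 64 neurons, learning rate 0.01) is not shown; `target_q` is a hypothetical stand-in for the target network, and the class and function names are illustrative.

```python
import random
from collections import deque

GAMMA = 0.90          # discount factor from Table 1
MEMORY_SIZE = 10_000  # replay memory size from Table 1
BATCH_SIZE = 32       # sampling batch size from Table 1


class ReplayBuffer:
    def __init__(self, capacity=MEMORY_SIZE, seed=0):
        # Oldest transitions are discarded automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=BATCH_SIZE):
        # Uniform sampling without replacement breaks temporal correlation.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q):
    """y = r + GAMMA * max_a' target_q(s')[a'], with no bootstrap on terminal steps.

    `target_q(state)` is a hypothetical target-network stand-in that returns
    one value per action.
    """
    return [r if done else r + GAMMA * max(target_q(s2))
            for (_, _, r, s2, done) in batch]
```

The online network would then be regressed toward these targets on the Q-values of the actions actually taken.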

Fig.7 Convergence curves for different values of ε

Fig.8 Comparison of convergence before and after strategy improvement
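The abstract's time-varying dynamic greedy strategy adjusts exploration according to training time and learning progress; Table 1 bounds ε to [0.01, 0.99]. The exact update rule is not given on this page, so the sketch below is one hypothetical instance: ε decays exponentially with the episode index and is temporarily raised when the mean reward of the latest window of episodes stops improving, a simple proxy for being stuck in a local optimum. The decay rate, window and boost values are assumptions.

```python
import math
import random

EPS_MAX, EPS_MIN = 0.99, 0.01  # exploration-rate bounds from Table 1


def dynamic_epsilon(episode, recent_rewards, decay=0.02, window=10, boost=0.1):
    """Hypothetical time-varying epsilon.

    Exponential decay over episodes, raised by `boost` when the mean reward of
    the last `window` episodes fails to improve on the window before it.
    """
    eps = EPS_MIN + (EPS_MAX - EPS_MIN) * math.exp(-decay * episode)
    if len(recent_rewards) >= 2 * window:
        last = sum(recent_rewards[-window:]) / window
        prev = sum(recent_rewards[-2 * window:-window]) / window
        if last <= prev:  # learning has stalled: explore more
            eps += boost
    return min(EPS_MAX, max(EPS_MIN, eps))


def choose_action(q_values, eps, rng=random):
    """Epsilon-greedy: random action with probability eps, else the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Compared with a fixed ε, the reward-triggered boost lets exploration recover late in training instead of decaying irreversibly to its floor.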

Fig.9 Comparison of planned paths and average reward values in a 10 m × 10 m map

Fig.10 Comparison of planned paths and average reward values in a 20 m × 20 m map

Fig.11 Comparison of planned paths and average reward values in a 30 m × 30 m map

Table 2 Simulation results in environments of different sizes

Algorithm	Environment size/m	Average path length/m	Iterations to stabilize	Turning points	Collision
DQN	10×10	11.66	136	4	Yes
DQN	20×20	19.48	156	6	No
DQN	30×30	21.48	188	7	No
I-DQN	10×10	12.83	141	2	No
I-DQN	20×20	18.24	150	4	No
I-DQN	30×30	20.66	169	2	No
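Table 2 reports the path length and the number of turning points of each planned path. The page does not define how turning points are counted; the sketch below assumes a turning point is any waypoint where the movement direction changes, with collinear segments of different lengths treated as the same direction. The `path_metrics` name is hypothetical.

```python
import math


def path_metrics(path):
    """Return (length, turning_points) for a polyline of grid waypoints.

    A turning point is counted wherever the step direction changes (an
    assumed rule; the counting used for Table 2 is not stated).
    """
    length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    turns = 0
    prev_dir = None
    for a, b in zip(path, path[1:]):
        d = (b[0] - a[0], b[1] - a[1])
        g = math.gcd(abs(d[0]), abs(d[1])) or 1
        d = (d[0] // g, d[1] // g)  # normalize so collinear segments compare equal
        if prev_dir is not None and d != prev_dir:
            turns += 1
        prev_dir = d
    return length, turns
```

For instance, the path (0,0) → (1,0) → (2,0) → (3,1) → (3,2) has length 3 + √2 and two turning points, at (2,0) and (3,1).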

Table 3 Performance improvement of I-DQN compared with DQN

Environment size/m	Average path length reduction/%	Convergence speed improvement/%	Turning points reduced
10×10	-10	-3.68	2
20×20	6	3.85	2
30×30	3.82	10.11	5
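The percentages in Table 3 can be recomputed from Table 2 as relative reductions with DQN as the baseline, (DQN − I-DQN) / DQN × 100, applied to the average path length and to the iterations needed to stabilize; the turning-point column is a plain difference. The snippet below does this arithmetic; the dictionary and function names are illustrative. It yields -10.03, 6.37 and 3.82 for path length and -3.68, 3.85 and 10.11 for convergence, matching Table 3 to the precision shown (the 20×20 path-length entry appears there rounded to 6).

```python
# (average path length in m, iterations to stabilize, turning points), from Table 2
DQN = {"10x10": (11.66, 136, 4), "20x20": (19.48, 156, 6), "30x30": (21.48, 188, 7)}
IDQN = {"10x10": (12.83, 141, 2), "20x20": (18.24, 150, 4), "30x30": (20.66, 169, 2)}


def improvement(base, new):
    """Relative reduction of `new` versus `base`, in percent (positive = better)."""
    return (base - new) / base * 100.0


def table3(dqn=DQN, idqn=IDQN):
    """Rebuild Table 3: (path length %, convergence %, turning points reduced)."""
    rows = {}
    for env, (l0, it0, t0) in dqn.items():
        l1, it1, t1 = idqn[env]
        rows[env] = (round(improvement(l0, l1), 2),
                     round(improvement(it0, it1), 2),
                     t0 - t1)
    return rows
```

A negative entry means I-DQN did worse on that metric; in the 10 × 10 map its path is longer, but Table 2 shows that the shorter DQN path collided.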

Fig.12 Path planning of different algorithms in different obstacle environments

Table 4 Statistical comparison of simulation data in simple and complex obstacle environments

Environment	Algorithm	Average path length/m	Obstacles avoided	Turning points	Average path length reduction/%	Turning points reduced
Simple	DQN	10.54	2	6	-	-
Simple	I-DQN	9.54	3	2	10.5	4
Complex	DQN	10.95	3	5	-	-
Complex	I-DQN	9.53	4	4	14.9	1
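The reduction percentages in Table 4 appear to use a different denominator from Table 3. Dividing the length difference by the I-DQN length reproduces Table 4 (10.5% and 14.9%), whereas dividing by the DQN length, as Table 3 does, gives about 9.49% and 12.97%. The snippet below makes the two conventions explicit; the function name and the `base` switch are illustrative.

```python
# (DQN, I-DQN) average path length in m, from Table 4
TABLE4 = {"simple": (10.54, 9.54), "complex": (10.95, 9.53)}


def shortening_pct(dqn_len, idqn_len, base="dqn"):
    """Path-length reduction of I-DQN relative to DQN, in percent.

    base="dqn" divides by the DQN length (the convention Table 3 follows);
    base="idqn" divides by the I-DQN length.
    """
    denom = dqn_len if base == "dqn" else idqn_len
    return (dqn_len - idqn_len) / denom * 100.0
```

Stating the denominator matters when comparing Tables 3 and 4: the same absolute saving looks larger when expressed relative to the shorter path.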

Fig.13 Algorithm comparison with the start and end points on opposite sides of a T-shaped obstacle

Fig.14 Algorithm comparison for a path passing through a narrow feasible region

Fig.15 Algorithm comparison with the start and end points located near obstacles

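Table 5 Comparison of indicators in different test scenarios

Algorithm	Start and end point placement	Task completed	Path length/m	Collision	Turning points
A*	On opposite sides of a T-shaped obstacle	Yes	17.90	No	5
A*	Through a narrow feasible region	Yes	13.83	Yes	2
A*	Near obstacles	Yes	24.97	Yes	6
APF	On opposite sides of a T-shaped obstacle	No	-	-	-
APF	Through a narrow feasible region	Yes	16.17	No	-
APF	Near obstacles	No	-	-	-
I-DQN	On opposite sides of a T-shaped obstacle	Yes	17.07	No	5
I-DQN	Through a narrow feasible region	Yes	13.83	No	2
I-DQN	Near obstacles	Yes	25.97	No	5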
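Table 5 marks some A* paths as colliding even where the path length matches the collision-free I-DQN path, but the page does not state the collision criterion. One plausible grid-map criterion, shown below as an assumption, checks every waypoint and additionally rejects diagonal moves that squeeze past an obstacle corner. The function name and the corner-cutting rule are hypothetical, and the path is assumed to consist of unit steps that stay inside the map.

```python
def path_collides(path, grid, forbid_corner_cutting=True):
    """Return True if a unit-step grid path hits an obstacle (grid[y][x] == 1).

    With forbid_corner_cutting, a diagonal move between two free cells still
    counts as a collision when either orthogonal neighbour it passes is an
    obstacle (an assumed rule).
    """
    def blocked(x, y):
        return grid[y][x] == 1

    if path and blocked(*path[0]):
        return True
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if blocked(x1, y1):
            return True
        # A diagonal step from (x0, y0) to (x1, y1) passes cells (x1, y0) and (x0, y1).
        if forbid_corner_cutting and x0 != x1 and y0 != y1:
            if blocked(x1, y0) or blocked(x0, y1):
                return True
    return False
```

Under this rule a diagonal step can be "collision-free" at both endpoints yet still clip an obstacle corner, which is exactly the kind of hazard a hull of non-zero width faces in a narrow passage.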

Fig.16 Comparison of path optimization methods
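The abstract states that a Bézier curve is used to smooth the planned path. Whether the authors fit one curve or several piecewise segments is not stated here. The sketch below shows the basic building block, evaluating a Bézier curve by de Casteljau's repeated linear interpolation, and applies a single curve that uses all waypoints as control points. The function names are illustrative, and `samples` must be at least 2.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] by repeated linear interpolation."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]


def smooth_path(waypoints, samples=50):
    """Sample one Bezier curve that uses all waypoints as control points (samples >= 2)."""
    return [de_casteljau(waypoints, i / (samples - 1)) for i in range(samples)]
```

A Bézier curve passes through its first and last control points but generally not the intermediate ones, so the smoothed path cuts inside the grid path's corners. It is therefore worth re-checking the smoothed path against the (inflated) obstacles, or using piecewise low-order segments near obstacles.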
