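School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China
Huaishuling Laboratory, China North Vehicle Research Institute, Beijing 100072, China
Corresponding author email: lixueyuan@bit.edu.cn
Received: 2025-01-08
Published online: 2025-12-25
Print publication: 2026-02-28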
YANG Fan, LI Xueyuan, DU Minggang, et al. Safe-DRL: A Safety-conscious Deep Reinforcement Learning Decision-making Algorithm for Unmanned Platforms[J]. Acta Armamentarii, 2026, 47(2): 250030. DOI: 10.12382/bgxb.2025.0030.
Traditional deep reinforcement learning (DRL) can produce unpredictable behaviors during inference, which raises safety concerns. To address this, the paper proposes a safe DRL algorithm for autonomous driving of unmanned platforms across multi-task scenarios. The algorithm is built on an improved Markov process and introduces an action judgment network that assesses the safety of an action before it is executed. A parallel dual-thread network structure suppresses hazardous driving behaviors, and a new reward function designed around kinematic characteristics balances driving safety and efficiency. Comparative experiments are carried out in the highway-env environment on three typical driving scenarios: single-lane roads, intersections, and roundabouts. The results show that the algorithm significantly improves driving safety and generalization capability, and that it can support unmanned-platform autonomous driving tasks such as remote deployment, cargo transportation, and regional penetration.
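The idea of pre-execution safety assessment can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract describes a learned action judgment network, whereas the sketch below substitutes a hand-written time-to-collision (TTC) rule as the judge. The names `State`, `time_to_collision`, `is_safe`, and `select_safe_action`, the 2-second TTC threshold, and the fallback of hardest braking are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class State:
    """Hypothetical one-dimensional car-following state."""
    gap_m: float       # distance to the lead vehicle, in metres
    ego_speed: float   # ego speed, in m/s
    lead_speed: float  # lead vehicle speed, in m/s


def time_to_collision(state: State, accel: float, dt: float = 1.0) -> float:
    """Rough TTC from the current gap and the ego speed after applying
    `accel` for `dt` seconds; infinite if the ego is not closing in."""
    next_speed = max(0.0, state.ego_speed + accel * dt)
    closing_speed = next_speed - state.lead_speed
    if closing_speed <= 0.0:
        return float("inf")
    return state.gap_m / closing_speed


def is_safe(state: State, accel: float, ttc_min: float = 2.0) -> bool:
    """Stand-in for a learned judge: accept an action if TTC >= ttc_min."""
    return time_to_collision(state, accel) >= ttc_min


def select_safe_action(
    q_values: Sequence[float],
    actions: Sequence[float],
    state: State,
    judge: Callable[[State, float], bool] = is_safe,
) -> float:
    """Return the highest-valued action that the judge accepts.
    If every action is rejected, fall back to the hardest braking."""
    ranked = sorted(range(len(actions)), key=lambda i: q_values[i], reverse=True)
    for i in ranked:
        if judge(state, actions[i]):
            return actions[i]
    return min(actions)


if __name__ == "__main__":
    actions = [-3.0, 0.0, 2.0]   # candidate accelerations in m/s^2
    q_values = [0.1, 0.5, 0.9]   # the policy prefers accelerating
    state = State(gap_m=12.0, ego_speed=20.0, lead_speed=15.0)
    print(select_safe_action(q_values, actions, state))
```

In this example the policy's preferred action (accelerating) is rejected by the judge before execution, and the next-best action that passes the check is returned instead. A learned judge could be plugged in through the `judge` argument without changing the selection logic.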