YANG Fan, LI Xueyuan, DU Minggang, et al. Safe-DRL: A Safety-conscious Deep Reinforcement Learning Decision-making Algorithm for Unmanned Platforms[J]. Acta Armamentarii, 2026, 47(2): 250030. DOI: 10.12382/bgxb.2025.0030.
Safe-DRL: A Safety-conscious Deep Reinforcement Learning Decision-making Algorithm for Unmanned Platforms
To address the safety risks caused by the unpredictable behavior of conventional deep reinforcement learning (DRL) during inference, this paper proposes a safety-enhanced DRL algorithm for autonomous driving of unmanned platforms across multi-task scenarios. The algorithm integrates an improved Markov process with an action recognition network for pre-execution safety assessment, and adopts a parallel dual-thread network architecture to suppress hazardous driving behaviors. Additionally, a novel kinematics-based reward function is designed to account for both driving safety and efficiency. In the highway-env environment, comparative experiments are conducted in three typical driving scenarios: single-lane roads, intersections, and roundabouts. The results show that the proposed algorithm significantly improves driving safety and generalization capability, verifying its effectiveness and its potential to support the application of unmanned platforms in remote deployment, cargo transportation, and regional penetration.
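
The abstract does not give the form of the pre-execution safety assessment or of the kinematics-based reward, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a single-lane car-following setting, a time-to-collision (TTC) check as the safety test, and a reward that trades speed against TTC. All names (`LeadGap`, `ACTIONS`, `predicted_ttc`, `filter_action`, `reward`) and all numeric thresholds are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass
class LeadGap:
    """Hypothetical 1-D state: ego vehicle following a lead vehicle."""
    gap_m: float       # distance to the lead vehicle (m)
    ego_speed: float   # ego speed (m/s)
    lead_speed: float  # lead vehicle speed (m/s)


# Hypothetical discrete actions mapped to longitudinal acceleration (m/s^2).
ACTIONS = {"brake": -4.0, "keep": 0.0, "accelerate": 2.0}


def predicted_ttc(state: LeadGap, accel: float, horizon_s: float = 1.0) -> float:
    """Estimate TTC after applying `accel` for `horizon_s` seconds.

    Assumes the lead vehicle holds its speed. Ego travel uses the average of
    the start and end speeds, which slightly overestimates travel when the
    ego would stop inside the horizon (a conservative approximation).
    """
    ego_next = max(0.0, state.ego_speed + accel * horizon_s)
    ego_travel = 0.5 * (state.ego_speed + ego_next) * horizon_s
    gap_next = state.gap_m + state.lead_speed * horizon_s - ego_travel
    if gap_next <= 0.0:
        return 0.0                      # predicted contact within the horizon
    closing_speed = ego_next - state.lead_speed
    if closing_speed <= 0.0:
        return math.inf                 # gap is not shrinking
    return gap_next / closing_speed


def filter_action(state: LeadGap, proposed: str, min_ttc_s: float = 2.0) -> str:
    """Check the policy's proposed action before it is executed.

    If the predicted TTC is below the threshold, fall back to the action
    with the largest predicted TTC.
    """
    if predicted_ttc(state, ACTIONS[proposed]) >= min_ttc_s:
        return proposed
    return max(ACTIONS, key=lambda name: predicted_ttc(state, ACTIONS[name]))


def reward(state: LeadGap, collided: bool,
           target_speed: float = 25.0, min_ttc_s: float = 2.0) -> float:
    """Combine an efficiency term (speed) with a safety term (current TTC)."""
    if collided:
        return -1.0
    efficiency = min(state.ego_speed / target_speed, 1.0)
    ttc_now = predicted_ttc(state, 0.0, horizon_s=0.0)
    penalty = 0.0 if ttc_now >= min_ttc_s else 1.0 - ttc_now / min_ttc_s
    return 0.5 * efficiency - 0.5 * penalty


if __name__ == "__main__":
    risky = LeadGap(gap_m=10.0, ego_speed=20.0, lead_speed=10.0)
    print(filter_action(risky, "accelerate"))   # proposed action is overridden
    print(reward(LeadGap(100.0, 25.0, 25.0), collided=False))
```

In this sketch the filter sits between the policy output and the environment step, so an unsafe proposal never reaches the simulator. A learned safety model, as the abstract describes, would take the place of the hand-written TTC rule.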