Energy Management of Hybrid Tracked Vehicle Based on Reinforcement Learning with Normalized Advantage Function

doi:10.3969/j.issn.1000-1093.2021.10.011

Abstract

Energy management strategies based on reinforcement learning suffer from the "curse of dimensionality" in high-dimensional problems, because the state and control variables have to be discretized. To address this problem, an energy management algorithm based on deep reinforcement learning with a normalized advantage function is proposed. Two deep neural networks with normalized advantage functions are used to realize continuous control and eliminate the discretization. Based on a model of the powertrain of a series hybrid tracked vehicle, the framework of the deep reinforcement learning energy management algorithm is built, its parameter update process is established, and the algorithm is applied to the series hybrid tracked vehicle. Simulation results show that the algorithm outputs finer-grained control quantities with less output fluctuation and, compared with the deep Q-learning algorithm, improves the fuel economy of the series hybrid tracked vehicle by 3.96%. Hardware-in-the-loop simulation experiments verify the adaptability of the reinforcement learning energy management algorithm and its optimization performance in a real-time control environment.

Key words: series hybrid tracked vehicle, energy management strategy, normalized advantage function, continuous control, hardware-in-the-loop simulation
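
The abstract does not give the network equations, so the following is only a minimal sketch of how a normalized advantage function can make Q-learning continuous, assuming the standard formulation Q(s, a) = V(s) + A(s, a) with A(s, a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)) and P = L L^T. The function name `naf_q_value`, its arguments, and the numeric values are hypothetical; in a real controller `mu`, `l_entries`, and `value` would be outputs of the neural network for the current state.

```python
import numpy as np

def naf_q_value(action, mu, l_entries, value):
    """Return (Q, A) for one state under the assumed NAF formulation.

    action    : continuous action a (sequence of length n)
    mu        : greedy action mu(s) predicted by the network (length n)
    l_entries : n*(n+1)/2 raw entries of the lower-triangular matrix L(s)
    value     : scalar state value V(s)
    """
    action = np.atleast_1d(np.asarray(action, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    n = action.size

    # Fill the lower triangle of L row by row from the flat vector.
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = l_entries

    # Exponentiate the diagonal so that P = L L^T is positive definite.
    diag = np.diag_indices(n)
    L[diag] = np.exp(L[diag])
    P = L @ L.T

    # Quadratic advantage: zero at a = mu, negative everywhere else.
    diff = action - mu
    advantage = -0.5 * diff @ P @ diff
    return value + advantage, advantage

# Hypothetical scalar control command in [0, 1].
q_greedy, a_greedy = naf_q_value([0.3], [0.3], [0.0], 1.0)
q_other, a_other = naf_q_value([0.5], [0.3], [0.0], 1.0)
```

Because the advantage is a concave quadratic in the action, the action that maximizes Q is simply `mu`, so no search over a discretized action grid is needed. This is the property that allows a continuous control output.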

CLC number: TJ810.2

ZOU Yuan, ZHANG Bin, ZHANG Xudong, ZHAO Zhiying, KANG Tieyu, GUO Yufeng, WU Zhe. Energy Management of Hybrid Tracked Vehicle Based on Reinforcement Learning with Normalized Advantage Function[J]. Acta Armamentarii, 2021, 42(10): 2159-2169.

