
Acta Armamentarii (兵工学报) ›› 2021, Vol. 42 ›› Issue (10): 2159-2169. doi: 10.3969/j.issn.1000-1093.2021.10.011

• Paper •

Energy Management of Hybrid Tracked Vehicle Based on Reinforcement Learning with Normalized Advantage Function

ZOU Yuan1, ZHANG Bin1, ZHANG Xudong1, ZHAO Zhiying2, KANG Tieyu2, GUO Yufeng2, WU Zhe1

  1. School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China; 2. Beijing North Vehicle Group Corporation, Beijing 100072, China
  • Online: 2021-11-03
  • About the authors: ZOU Yuan (born 1976), male, professor, doctoral supervisor. E-mail: zouyuanbit@vip.163.com;
    ZHANG Bin (born 1990), male, doctoral candidate. E-mail: 18811528189@163.com
  • Funding:
    National Natural Science Foundation of China (51775039)

Abstract: Energy management strategies based on reinforcement learning discretize the state and control variables, and therefore suffer from the "curse of dimensionality" when applied to high-dimensional problems. To address this, an energy management algorithm based on deep reinforcement learning with a normalized advantage function is proposed. Two deep neural networks with normalized advantage functions are used to achieve continuous control and eliminate discretization. Based on a powertrain model of a series hybrid tracked vehicle, the framework and parameter update process of the deep reinforcement learning energy management algorithm are established, and the algorithm is applied to the series hybrid tracked vehicle. Simulation results show that the proposed algorithm outputs finer-grained control quantities with smaller output fluctuation, and improves the fuel economy of the series hybrid tracked vehicle by 3.96% compared with the deep Q-learning algorithm. Hardware-in-the-loop simulation verifies the adaptability of the algorithm and its optimization performance in a real-time control environment.

Key words: series hybrid tracked vehicle, energy management strategy, normalized advantage function, continuous control, hardware-in-the-loop simulation
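
The abstract does not spell out the normalized advantage function itself. As a minimal sketch, assuming the commonly used formulation Q(s, a) = V(s) - 0.5 (a - mu(s))^T P(s) (a - mu(s)) with P = L L^T, the Q-value can be evaluated as below. The function name `naf_q_value`, the inputs `value`, `mu`, and `l_entries` (stand-ins for network outputs), and the 2-D action are all hypothetical; this is not the authors' implementation.

```python
import math


def naf_q_value(value, mu, l_entries, action):
    """Return Q(s, a) = V(s) + A(s, a) for one state (assumed NAF form).

    value     : state value V(s), a float
    mu        : action mu(s) that maximizes Q, a list of n floats
    l_entries : flat list of n*(n+1)/2 numbers filling the lower triangle
                of L row by row; diagonal entries are exponentiated
    action    : action a to evaluate, a list of n floats
    """
    n = len(mu)
    # Rebuild the lower-triangular matrix L from the flat list.
    L = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            # A positive diagonal makes P = L L^T positive definite.
            L[i][j] = math.exp(l_entries[k]) if i == j else l_entries[k]
            k += 1
    diff = [a - m for a, m in zip(action, mu)]
    # (a - mu)^T L L^T (a - mu) equals the squared norm of L^T (a - mu).
    lt_diff = [sum(L[i][j] * diff[i] for i in range(n)) for j in range(n)]
    advantage = -0.5 * sum(x * x for x in lt_diff)
    return value + advantage


# Hypothetical outputs for a single state with a 2-D continuous action.
value = 2.0
mu = [0.5, -0.2]
l_entries = [0.0, 0.3, 0.0]

q_greedy = naf_q_value(value, mu, l_entries, mu)
q_other = naf_q_value(value, mu, l_entries, [1.5, -0.2])
print(q_greedy, q_other)
```

Because the advantage term is a negative quadratic in (a - mu), the maximizing action is mu(s) in closed form. Under this assumed formulation, that is what lets the controller output a continuous action directly, without searching a discretized action grid as deep Q-learning does.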
