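School of Automation, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
*Corresponding author e-mail: jiezhang_njust@163.com
Received: 2025-06-03
Published online: 2026-02-11
Print publication: 2026-01-31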
CHEN Yaowei, LÜ Ming, ZHANG Jie. Motion Control of Quadruped Robot Based on Deep Reinforcement Learning and Nonlinear Model Predictive Control Technique[J]. Acta Armamentarii, 2026, 47(1): 250565. DOI: 10.12382/bgxb.2025.0565.
In quadruped robot motion control, traditional trajectory tracking controllers based on nonlinear model predictive control (NMPC) are highly sensitive to the design of their weight parameters. To address this, a method combining deep reinforcement learning (DRL) with NMPC, termed DRL-NMPC, is proposed for quadruped robot motion control. A parameterized NMPC trajectory tracking controller is designed on the basis of the single rigid body dynamics model of the quadruped robot to generate optimal ground reaction forces, and a policy network is trained to output the controller's decision variables so that the NMPC controller is adjusted dynamically. An improved proximal policy optimization (PPO) algorithm is introduced to raise learning efficiency, and simulation experiments validate this improvement. Simulation results further show that, compared with traditional control methods, the proposed DRL-NMPC method improves the trajectory tracking performance and disturbance rejection capability of the quadruped robot.
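The central idea in the abstract is that the NMPC cost weights are not hand-tuned constants but decision variables supplied by a trained policy. The Python sketch below illustrates only that idea on a deliberately reduced problem: it keeps just the vertical (translational) part of single rigid body dynamics, uses a one-step horizon so the optimal total vertical ground reaction force has a closed form, and enforces 0 <= F <= f_max as a stand-in for the unilateral contact constraint. The names `Weights`, `weights_from_policy`, and `one_step_vertical_force`, the exponential mapping from policy outputs to weights, and all numerical values are hypothetical. The paper's controller is a multi-step nonlinear MPC over the full single rigid body model and is not reproduced here.

```python
import math
from dataclasses import dataclass

# Toy reduction (assumption): vertical motion only, one-step horizon.
# This is NOT the paper's NMPC; it only shows weights acting as decision variables.

G = 9.81  # gravitational acceleration in m/s^2


@dataclass
class Weights:
    """Cost weights treated as decision variables (hypothetical names)."""
    q_z: float  # weight on height tracking error
    q_v: float  # weight on vertical-velocity tracking error
    r: float    # weight on force deviation from gravity compensation


def weights_from_policy(raw):
    """Hypothetical mapping: exponentiate raw policy outputs so every weight is positive."""
    return Weights(q_z=math.exp(raw[0]), q_v=math.exp(raw[1]), r=math.exp(raw[2]))


def one_step_vertical_force(z, v, z_ref, v_ref, w, mass, dt, f_max):
    """Return the total vertical GRF F that minimizes
        q_z*(z1 - z_ref)**2 + q_v*(v1 - v_ref)**2 + r*(F - mass*G)**2
    under semi-implicit Euler v1 = v + dt*(F/mass - G), z1 = z + dt*v1,
    subject to the unilateral-contact bound 0 <= F <= f_max.
    """
    e_z = z + dt * v - z_ref  # next-step height error if net acceleration were zero
    e_v = v - v_ref           # next-step velocity error if net acceleration were zero
    # Writing F = mass*G + u gives z1 - z_ref = e_z + a*u and v1 - v_ref = e_v + b*u.
    a = dt * dt / mass
    b = dt / mass
    # Setting the derivative of the quadratic cost in u to zero yields the unconstrained optimum.
    u = -(w.q_z * a * e_z + w.q_v * b * e_v) / (w.q_z * a * a + w.q_v * b * b + w.r)
    # For a scalar convex quadratic, clamping the unconstrained optimum gives the constrained one.
    return min(max(mass * G + u, 0.0), f_max)


if __name__ == "__main__":
    mass, dt, f_max = 12.0, 0.02, 400.0  # illustrative values, not taken from the paper
    for label, raw in (("soft", [4.0, 0.0, -10.0]), ("stiff", [8.0, 0.0, -10.0])):
        w = weights_from_policy(raw)
        f = one_step_vertical_force(0.28, 0.0, 0.30, 0.0, w, mass, dt, f_max)
        print(f"{label}: F = {f:.1f} N")
```

Raising the height-error weight (the first policy output) makes the same 2 cm height deficit produce a larger corrective force. A policy that sets these weights online can therefore trade tracking stiffness against control effort per situation, instead of relying on one fixed, hand-tuned set.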
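The abstract does not describe how the improved PPO algorithm differs from standard PPO, so the sketch below shows only the standard PPO-Clip surrogate objective (Schulman et al., 2017) together with generalized advantage estimation (GAE), the baseline that such a variant would modify. It works in plain Python on lists of per-sample log-probabilities and advantages; the function names and the defaults `eps=0.2`, `gamma=0.99`, `lam=0.95` are common choices assumed here, not values from the paper.

```python
import math


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one rollout with no terminal state."""
    advantages = [0.0] * len(rewards)
    next_value = last_value  # bootstrap value of the state after the final step
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * lam * running              # exponentially weighted sum of TD errors
        advantages[t] = running
        next_value = values[t]
    return advantages


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO-Clip surrogate objective, to be maximized with respect to the new policy."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # ratio restricted to [1 - eps, 1 + eps]
        total += min(ratio * adv, clipped * adv)          # pessimistic (lower) bound
    return total / len(advantages)
```

The clipping stops the objective from improving once the probability ratio rises above 1 + eps for a positive advantage or falls below 1 - eps for a negative one, which limits how far a single update can move the policy (in this setting, the policy that outputs the NMPC decision variables).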