Robust Tracking of Quadrotor UAVs Based on Integral Reinforcement Learning

doi:10.12382/bgxb.2022.1051

Abstract:

A novel robust trajectory tracking control method based on integral reinforcement learning (IRL) is proposed for position trajectory tracking of a quadrotor UAV subject to uncertain model dynamics and external disturbances. First, an augmented system combining the original quadrotor UAV system and the reference trajectory is established, which transforms the robust trajectory tracking problem into a stabilization problem. By using a value function with a discount factor, the robust stabilization problem of the augmented system is then transformed into an optimal control problem, so that both the tracking errors and the overall performance of the quadrotor UAV are taken into account. Next, based on the IRL method, a single-network actor-critic structure is constructed to estimate the optimal value function, which allows the quadrotor UAV controller to be solved online. Finally, the stability of the tracking errors and the convergence of the single-network weights are rigorously proved, and simulation results verify the superiority and robustness of the proposed control scheme.

Key words: quadrotor unmanned aerial vehicle, robust tracking control, integral reinforcement learning, optimal control, uncertainties
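
The abstract's core idea, evaluating a discounted value function from trajectory data and then improving the controller, can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the paper's Algorithm 2 or its single-network quadrotor controller: it assumes a scalar linear plant dx/dt = a*x + b*u, a quadratic running cost q*x^2 + r*u^2, and a critic of the form V(x) = p*x^2. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def evaluate_policy(k, a=1.0, b=1.0, q=1.0, r=1.0, gamma=0.1,
                    T=0.05, n_intervals=40, substeps=200, x0=1.0):
    """Least-squares estimate of p in V(x) = p * x**2 for the policy u = -k * x.

    Uses the discounted IRL Bellman equation over each interval [t, t + T]:
        V(x(t)) = integral of exp(-gamma * s) * (q*x^2 + r*u^2) ds
                  + exp(-gamma * T) * V(x(t + T))
    The drift parameter `a` is used only to simulate data (assumed toy plant);
    the estimator itself sees only states and the accumulated cost.
    """
    dt = T / substeps
    x = x0
    num = 0.0
    den = 0.0
    for _ in range(n_intervals):
        x_start = x
        cost = 0.0
        for j in range(substeps):
            u = -k * x
            # Discounted running cost accumulated over the interval.
            cost += math.exp(-gamma * j * dt) * (q * x * x + r * u * u) * dt
            # Forward-Euler step of the assumed plant.
            x += (a * x + b * u) * dt
        # Regressor such that p * phi should equal the measured cost.
        phi = x_start ** 2 - math.exp(-gamma * T) * x ** 2
        num += phi * cost
        den += phi * phi
    return num / den


def policy_iteration(k0=3.0, b=1.0, r=1.0, iterations=6):
    """Alternate IRL policy evaluation with the improvement step k = b * p / r."""
    k = k0
    history = []
    for _ in range(iterations):
        p = evaluate_policy(k, b=b, r=r)
        history.append(p)
        # Minimizing the Hamiltonian gives u = -(b / (2 r)) * dV/dx = -(b p / r) x.
        k = b * p / r
    return history


if __name__ == "__main__":
    for i, p in enumerate(policy_iteration()):
        print(f"iteration {i}: p = {p:.4f}")
```

For this toy plant the estimated weight can be checked against the closed-form solution of the discounted scalar Riccati equation, which is a convenient sanity test before moving to a nonlinear model and a neural-network critic.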

CLC number:

TJ301

YANG Jiaxiu, LI Xinkai, ZHANG Hongli, WANG Hao. Robust Tracking of Quadrotor UAVs Based on Integral Reinforcement Learning[J]. Acta Armamentarii, 2023, 44(9): 2802-2813.

Figures/Tables 16

Fig.1 Quadrotor UAV model

Fig.2 Flowchart of Algorithm 2

Fig.3 Control block diagram of the system based on the actor-critic structure

Fig.4 Pseudo-code of Algorithm 2 for online implementation based on neural networks

Table 1 Nominal parameters of the quadrotor UAV model

Parameter	Nominal value	Parameter
b_x	8	a₁_x,a₂_x
b_y	4.2	a₄_x,a₅_x
b_ψ	3.5	a₁_y,a₂_y
b_z	9.5	a₄_y,a₅_y
a₃_x	9.8	a₁_ψ,a₂_ψ
a₃_y	9.8	a₁_z,a₂_z

Fig.5 Convergence of the neural network weights in Case 1

Fig.6 3-D trajectory tracking curve of the UAV during IRL in Case 1

Fig.7 Position tracking during IRL in Case 1

Fig.8 Attitude response during IRL in Case 1

Fig.9 Control inputs of the four UAV subsystems during IRL in Case 1

Fig.10 Convergence of the neural network weights in Case 2

Fig.11 Position tracking during IRL in Case 2

Fig.12 Attitude response during IRL in Case 2

Fig.13 Control inputs of the four UAV subsystems during IRL in Case 2

Fig.14 3-D trajectory tracking curve of the UAV in Case 2

Fig.15 Tracking errors of the four quadrotor UAV subsystems in Case 2
