School of Electrical Engineering, Xinjiang University, Urumqi 830017, Xinjiang, China
*Email: lxk@xju.edu.cn
Received: 2022-11-12
Published online: 2023-09-25
Published in print: 2023-09-20
Jiaxiu YANG, Xinkai LI, Hongli ZHANG, et al. Robust Tracking of Quadrotor UAVs Based on Integral Reinforcement Learning[J]. Acta Armamentarii, 2023, 44(9): 2802-2813. DOI: 10.12382/bgxb.2022.1051.
A novel robust trajectory tracking control method based on integral reinforcement learning is proposed for position trajectory tracking of quadrotor UAVs subject to uncertain model dynamics and external disturbances. First, an augmented system is constructed from the original quadrotor UAV system and the reference trajectory, which transforms the robust trajectory tracking problem into a stabilization problem. By using a value function with a discount factor, the robust stabilization problem of the augmented system is then transformed into an optimal control problem, so that both the tracking error and the overall performance of the quadrotor UAV are taken into account. Next, based on the integral reinforcement learning method, a single-network actor-critic structure is built to estimate the optimal value function, which in turn allows the quadrotor UAV controller to be solved online. Finally, the stability of the tracking errors and the convergence of the single-network weights are rigorously proven, and simulation results verify the superiority and robustness of the proposed control scheme.
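To make the idea of integral reinforcement learning with a discounted value function concrete, the following is a minimal sketch on a scalar linear system. It is not the controller of this paper: the single-network actor-critic, the augmented quadrotor model, and the robustness terms are all omitted. Everything in it is an assumption chosen for illustration, including the plant dx/dt = a*x + b*u, the cost q*x^2 + r*u^2, the quadratic value guess V(x) = p*x^2, the function names, and the numerical values. The learner fits p from the interval relation V(x(t)) = (discounted cost over [t, t+T]) + exp(-gamma*T) * V(x(t+T)) using measured data only, then improves the feedback gain; it uses the input gain b but never the drift term a.

```python
import math


def simulate_interval(x, k, a, b, q, r, gamma, T, steps=200):
    """Roll dx/dt = a*x + b*u forward over [0, T] under u = -k*x.

    Returns x(T) and the discounted running cost over the interval,
    both integrated jointly with classical RK4. This plays the role of
    the plant; the learner only sees its outputs.
    """
    def deriv(t, x_):
        u = -k * x_
        return a * x_ + b * u, math.exp(-gamma * t) * (q * x_ ** 2 + r * u ** 2)

    dt = T / steps
    t, cost = 0.0, 0.0
    for _ in range(steps):
        k1x, k1c = deriv(t, x)
        k2x, k2c = deriv(t + dt / 2, x + dt / 2 * k1x)
        k3x, k3c = deriv(t + dt / 2, x + dt / 2 * k2x)
        k4x, k4c = deriv(t + dt, x + dt * k3x)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        cost += dt / 6 * (k1c + 2 * k2c + 2 * k3c + k4c)
        t += dt
    return x, cost


def irl_policy_iteration(a, b, q, r, gamma, k0,
                         T=0.1, n_intervals=20, n_iters=15, x0=1.0):
    """Discounted IRL policy iteration with V(x) = p*x^2 (hypothetical setup).

    a is passed only to the simulator; the update rules use data and b.
    k0 must make the discounted cost finite (a - b*k0 < gamma/2).
    """
    k, p = k0, 0.0
    for _ in range(n_iters):
        num = den = 0.0
        x = x0
        for _ in range(n_intervals):
            x_next, cost = simulate_interval(x, k, a, b, q, r, gamma, T)
            # Interval Bellman relation: p * phi = cost
            phi = x ** 2 - math.exp(-gamma * T) * x_next ** 2
            num += phi * cost
            den += phi * phi
            x = x_next
        p = num / den      # least-squares policy evaluation
        k = b * p / r      # policy improvement: u = -(b*p/r) * x
    return p, k


def discounted_riccati(a, b, q, r, gamma):
    """Positive root of (b^2/r)*p^2 - (2a - gamma)*p - q = 0, for reference only."""
    s = b * b / r
    m = 2 * a - gamma
    return (m + math.sqrt(m * m + 4 * s * q)) / (2 * s)


if __name__ == "__main__":
    p, k = irl_policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, gamma=0.1, k0=2.0)
    print("learned p:", p, "learned gain:", k)
    print("model-based p:", discounted_riccati(1.0, 1.0, 1.0, 1.0, 0.1))
```

For this scalar case the learned p can be checked against the root of the discounted Riccati equation, which needs the full model. The method in the paper replaces the single scalar weight with a neural-network approximation of the value function and updates it online.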