A Cooperative Guidance Law for Multi-aircraft Combining Meta-Learning and Reinforcement Learning

doi:10.12382/bgxb.2024.0568

Abstract: For the cooperative guidance problem in which hypersonic re-entry gliding vehicles must hit a target simultaneously at specified attack angles in a complex environment, a cooperative guidance law based on meta-learning and reinforcement learning is proposed. Taking the disturbances of a complex combat environment into account, a Markov decision model of the cooperative guidance problem is established, with the vehicles' motion states as the state space and the proportional navigation coefficient as the action space. The reward function is designed by jointly considering the vehicle-target relative distance, the difference in remaining flight time (time-to-go) among the vehicles, and their overload. Drawing on meta-learning theory, the proximal policy optimization (PPO) algorithm is combined with gated recurrent units (GRUs) so that the policy learns features common to similar cooperative guidance tasks. This improves the hit accuracy of the cooperative guidance policy under complex disturbances, satisfies the attack-angle and attack-time constraints, and improves the policy's adaptability to different combat scenarios. Simulation results show that the proposed law enables multiple vehicles to attack a target simultaneously at specified attack angles in a complex battlefield environment, adapts quickly to new cooperative guidance tasks, and maintains good performance when the cooperative combat scenario changes.

Key words: hypersonic re-entry gliding vehicle, cooperative guidance, meta-learning, reinforcement learning, proximal policy optimization
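The Markov decision model summarized in the abstract uses the vehicles' motion states as the state, the proportional navigation (PN) coefficient as the action, and a reward built from vehicle-target range, time-to-go difference, and overload. A minimal planar sketch of one transition follows. It assumes a stationary target, constant speed, Euler integration, time-to-go estimated as range divided by speed, a squared-overload penalty, and linear weights `w_r`, `w_t`, `w_n`; all of these, and every name in the code (`Vehicle`, `pn_accel`, `reward`, `step`), are illustrative assumptions rather than the paper's formulation. The attack-angle constraint is not modelled here.

```python
import math
from dataclasses import dataclass

G0 = 9.80665  # standard gravity, m/s^2, used to express acceleration as overload


@dataclass
class Vehicle:
    x: float      # horizontal position, m
    y: float      # vertical position, m
    v: float      # speed, m/s (held constant in this sketch)
    gamma: float  # flight-path angle, rad


def pn_accel(s: Vehicle, tx: float, ty: float, n_coef: float) -> float:
    """Planar proportional navigation a = N * Vc * lambda_dot against a fixed target."""
    dx, dy = tx - s.x, ty - s.y
    r = math.hypot(dx, dy)
    vx, vy = s.v * math.cos(s.gamma), s.v * math.sin(s.gamma)
    lam_dot = (dy * vx - dx * vy) / r**2  # line-of-sight angle rate
    v_c = (dx * vx + dy * vy) / r         # closing speed
    return n_coef * v_c * lam_dot


def reward(ranges, t_go, accels, w_r=1e-4, w_t=1.0, w_n=1e-2):
    """Hypothetical shaped reward: mean range, time-to-go spread, mean squared overload."""
    k = len(ranges)
    r_term = -w_r * sum(ranges) / k
    t_term = -w_t * (max(t_go) - min(t_go))  # zero when all vehicles would arrive together
    n_term = -w_n * sum((a / G0) ** 2 for a in accels) / k
    return r_term + t_term + n_term


def step(fleet, target, coefs, dt=0.1):
    """One MDP transition: the action is one PN coefficient per vehicle."""
    accels = [pn_accel(s, target[0], target[1], n) for s, n in zip(fleet, coefs)]
    new_fleet = []
    for s, a in zip(fleet, accels):
        # Euler step; the normal acceleration turns the velocity vector: gamma_dot = a / v
        x = s.x + s.v * math.cos(s.gamma) * dt
        y = s.y + s.v * math.sin(s.gamma) * dt
        new_fleet.append(Vehicle(x, y, s.v, s.gamma + a / s.v * dt))
    ranges = [math.hypot(target[0] - s.x, target[1] - s.y) for s in new_fleet]
    t_go = [r / s.v for r, s in zip(ranges, new_fleet)]  # crude time-to-go estimate
    return new_fleet, reward(ranges, t_go, accels)
```

For example, `step([Vehicle(0.0, 0.0, 100.0, 0.0), Vehicle(0.0, 500.0, 120.0, 0.0)], (1000.0, 1000.0), [4.0, 3.0])` advances two vehicles by one step; in a reinforcement-learning loop the agent would choose `coefs` at every step. Penalizing the spread max(t_go) - min(t_go), rather than each vehicle's time individually, is what rewards a simultaneous arrival.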
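The abstract states that PPO is combined with gated recurrent units so that the policy learns features shared by similar cooperative guidance tasks, but it does not give the architecture. One common way to realize this, assumed here and not confirmed by the abstract, is an RL²-style recurrent policy: a GRU hidden state is carried across time steps, and the policy is trained with PPO's clipped surrogate objective. The pure-Python sketch shows a single GRU step and the clipped objective for a one-dimensional Gaussian action such as the PN coefficient; the parameter layout (`Wz`, `Uz`, `bz`, and so on), the initialization, and the Gaussian policy are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Plain-list matrix-vector product."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def gru_cell(x, h, p):
    """One GRU step, Cho et al. form: h' = z*h + (1 - z)*tanh(Wn x + Un (r*h) + bn)."""
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate applied before the recurrent matrix
    n = [math.tanh(a + b + c) for a, b, c in zip(matvec(p["Wn"], x), matvec(p["Un"], rh), p["bn"])]
    return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def init_params(n_in, n_h, seed=0):
    """Small random weights and zero biases for the three gates (hypothetical layout)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for g in ("z", "r", "n"):
        p["W" + g], p["U" + g], p["b" + g] = mat(n_h, n_in), mat(n_h, n_h), [0.0] * n_h
    return p


def gaussian_logp(a, mu, sigma):
    """Log-density of a 1-D Gaussian policy, e.g. over the PN coefficient."""
    return -0.5 * ((a - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate mean(min(rho*A, clip(rho, 1-eps, 1+eps)*A)); to be maximized."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        rho = math.exp(lp_new - lp_old)  # probability ratio pi_new / pi_old
        rho_clipped = min(max(rho, 1.0 - eps), 1.0 + eps)
        total += min(rho * adv, rho_clipped * adv)
    return total / len(advantages)
```

In a meta-training loop under this assumed design, the same hidden state would be threaded through consecutive episodes of one sampled guidance task and reset only when a new task is drawn, so that adaptation to a new engagement happens in the hidden state rather than through extra gradient steps.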

CLC number: V249.31

Cite this article: WANG Cuncan, WANG Xiaofang, LIN Hai. A Cooperative Guidance Law for Multi-aircraft Combining Meta-Learning and Reinforcement Learning[J]. Acta Armamentarii, doi: 10.12382/bgxb.2024.0568.