
Acta Armamentarii ›› 2025, Vol. 46 ›› Issue (7): 240568. doi: 10.12382/bgxb.2024.0568


A Cooperative Guidance Law Based on Meta-learning and Reinforcement Learning for Multiple Aerial Vehicles

WANG Cuncan, WANG Xiaofang*, LIN Hai

  1. School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2024-07-10 Online: 2025-08-12

Abstract:

To address the cooperative guidance problem in which hypersonic re-entry gliding vehicles must hit a target simultaneously at a specified angle in a complex environment, a cooperative guidance law based on meta-learning and reinforcement learning is proposed. Considering the disturbances of a complex combat environment, a Markov decision model of the cooperative guidance problem is established, with the vehicles' motion states as the state space and the proportional navigation coefficient as the action space. The reward function is designed by jointly considering the vehicle-target relative distance, the difference in remaining flight time, and the overload of the vehicles attacking the target. Based on meta-learning theory and reinforcement learning, the proximal policy optimization algorithm is combined with gated recurrent units to learn the common features of similar cooperative guidance tasks. This improves the hit accuracy of the cooperative guidance strategy under complex disturbances, satisfies the attack angle and attack time constraints, and improves the adaptability of the strategy to different combat scenarios. Simulation results show that the proposed guidance law enables multiple vehicles to attack a target simultaneously at a specified attack angle in a complex battlefield environment, adapts quickly to new cooperative guidance tasks, and maintains good performance when the cooperative combat scenario changes.

Key words: hypersonic re-entry gliding vehicle, cooperative guidance, meta-learning, reinforcement learning, proximal policy optimization
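
The Markov decision formulation summarized in the abstract (action as a proportional navigation coefficient, reward built from relative distance, remaining flight time difference, and overload) can be illustrated with a minimal sketch. The abstract does not give the actual functional forms, so everything below is an assumption made for illustration: the function names `scale_action` and `cooperative_reward`, the coefficient bounds, the weights, and the way the three reward terms are combined are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def scale_action(raw_action, n_min=2.0, n_max=6.0):
    """Map a policy output in [-1, 1] to a proportional navigation coefficient.

    The bounds n_min and n_max are hypothetical, not values from the paper.
    """
    clipped = max(-1.0, min(1.0, raw_action))
    return n_min + 0.5 * (clipped + 1.0) * (n_max - n_min)


def cooperative_reward(prev_ranges, ranges, times_to_go, overloads,
                       w_range=1.0, w_time=1.0, w_load=0.1):
    """Illustrative per-step reward shared by all vehicles.

    prev_ranges, ranges: vehicle-target distances at the previous and current step
    times_to_go: estimated remaining flight time of each vehicle
    overloads: commanded overload of each vehicle
    The weights and the three-term structure are assumptions for illustration.
    """
    # Term 1: reward the mean fractional reduction in vehicle-target distance.
    closing = mean([(p - r) / p for p, r in zip(prev_ranges, ranges)])
    # Term 2: penalize the spread of remaining flight times around their mean,
    # which pushes the vehicles toward simultaneous arrival.
    t_mean = mean(times_to_go)
    time_spread = mean([abs(t - t_mean) for t in times_to_go])
    # Term 3: penalize large overload commands (mean squared overload).
    load = mean([a * a for a in overloads])
    return w_range * closing - w_time * time_spread - w_load * load


if __name__ == "__main__":
    n = scale_action(0.0)
    r = cooperative_reward([1000.0, 2000.0], [900.0, 1800.0],
                           [10.0, 12.0], [2.0, 0.0])
    print(n, r)
```

In this sketch, two vehicles with equal remaining flight times receive a higher reward than two vehicles whose remaining flight times differ, all else being equal, which is the kind of incentive a time-coordination term is meant to provide.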

CLC number: