
Acta Armamentarii (兵工学报)



Hierarchical Decision-Making for UAV Air Combat Based on DDQN-D3PG

WANG Yu1*, LI Yuanpeng1, GUO Zhongyu1, LI Shuo1, REN Tianjun2

  1. School of Automation, Shenyang Aerospace University, Shenyang 110136, Liaoning, China; 2. Xi'an Kewei Industrial Development Co., Ltd., Xi'an 710000, Shaanxi, China
  • Received: 2024-10-21; Revised: 2025-02-08
  • Corresponding author: wangyu@sau.edu.cn
  • Funding: National Natural Science Foundation of China (61906125, 62373261); Fundamental Research Funds for Liaoning Provincial Undergraduate Universities (LJ232410143020, LJ212410143047)


Abstract: Reinforcement learning for unmanned aerial vehicle (UAV) air combat faces a central challenge: rigid reward functions and single models struggle to handle complex tasks in high-dimensional continuous state spaces, which severely limits the generalization of decision-making under dynamic, rapidly changing situations. To address this problem, an autonomous decision-making framework is proposed that combines hierarchical and distributed architectures by integrating the Double Deep Q-Network (DDQN) and Deep Deterministic Policy Gradient (DDPG) algorithms. Based on the advantage differences between the opposing sides in various situations, a series of DDPG models with different reward-function weight combinations is designed to form a bottom-level distributed DDPG (D3PG) decision network. The DDQN algorithm, which excels at handling discrete action spaces, is introduced to construct the top-level decision network, which autonomously selects and switches to the most suitable bottom-level policy model according to real-time situational changes, enabling immediate adjustment and optimization of decisions. To further enhance the realism and difficulty of the combat environment, a self-play mechanism is introduced into DDPG training to construct a highly intelligent enemy decision-making model. The effectiveness and superiority of the proposed method are verified through training against the intelligent opponent and through simulation experiments with random initial situations.
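
To make the described architecture concrete, below is a minimal Python/PyTorch sketch of the decision hierarchy: a set of bottom-level DDPG actors, each associated with a different reward-weight profile, and a top-level DDQN that selects which actor controls the UAV at each step. All dimensions, network sizes, and weight profiles (STATE_DIM, ACTION_DIM, REWARD_WEIGHT_PROFILES, etc.) are hypothetical placeholders for illustration, not values or code from the paper.

```python
# Illustrative sketch of the DDQN-over-D3PG hierarchy described in the
# abstract. All names and numbers are assumptions, not the authors' code.
import torch
import torch.nn as nn

STATE_DIM = 12    # assumed relative-situation state (positions, angles, speeds)
ACTION_DIM = 3    # assumed continuous controls (e.g., load factor, roll, throttle)

# Bottom level: one DDPG actor per reward-weight profile. Each profile weights
# situation-advantage terms (angle, range, altitude, speed) differently; the
# specific values here are placeholders, not the paper's.
REWARD_WEIGHT_PROFILES = {
    0: dict(angle=0.6, range=0.2, altitude=0.1, speed=0.1),  # pursuit-biased
    1: dict(angle=0.2, range=0.6, altitude=0.1, speed=0.1),  # closure-biased
    2: dict(angle=0.1, range=0.1, altitude=0.4, speed=0.4),  # energy-biased
}

def make_ddpg_actor() -> nn.Module:
    """Deterministic policy network mu(s) -> a in [-1, 1]^ACTION_DIM."""
    return nn.Sequential(
        nn.Linear(STATE_DIM, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, ACTION_DIM), nn.Tanh(),
    )

class TopLevelDDQN(nn.Module):
    """Q-network over the discrete choice of bottom-level policy model."""
    def __init__(self, n_options: int):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, n_options),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.q(s)

def ddqn_target(online, target, s_next, reward, gamma=0.99):
    """DDQN update target: the online net selects the greedy option,
    the target net evaluates it (reduces Q-value overestimation)."""
    with torch.no_grad():
        a_star = online(s_next).argmax(dim=-1, keepdim=True)    # select (online)
        q_eval = target(s_next).gather(-1, a_star).squeeze(-1)  # evaluate (target)
        return reward + gamma * q_eval

# Decision loop: the DDQN picks which pretrained DDPG actor controls the UAV
# at each step, so the maneuver policy switches with the real-time situation.
actors = {k: make_ddpg_actor() for k in REWARD_WEIGHT_PROFILES}
selector = TopLevelDDQN(n_options=len(actors))

state = torch.randn(1, STATE_DIM)               # stand-in for a simulator observation
option = selector(state).argmax(dim=-1).item()  # top level: choose a policy model
action = actors[option](state)                  # bottom level: continuous control
print(f"profile {option} -> action {action.squeeze(0).tolist()}")
```

The ddqn_target helper highlights the defining DDQN step: decoupling greedy selection (online network) from evaluation (target network). In the full method, a target of this form would drive the training of the top-level switching network, while each bottom-level actor would be trained with its own weighted reward via standard DDPG and self-play.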

Key words: UAV air combat, hierarchical reinforcement learning, weight switching, double deep Q-network, distributed deep deterministic policy gradient
