
Acta Armamentarii (兵工学报) ›› 2025, Vol. 46 ›› Issue (8): 240978. doi: 10.12382/bgxb.2024.0978



Hierarchical Decision-making for UAV Air Combat Based on DDQN-D3PG

WANG Yu1,*, LI Yuanpeng1, GUO Zhongyu1, LI Shuo1, REN Tianjun2

  1. School of Automation, Shenyang Aerospace University, Shenyang 110136, Liaoning, China
    2. Xi'an Kewei Industrial Development Co., Ltd., Xi'an 710000, Shaanxi, China
  • Received: 2024-10-21  Online: 2025-08-28
  • Supported by: National Natural Science Foundation of China (61906125, 62373261); Fundamental Research Funds for Universities of Liaoning Province (LJ232410143020, LJ212410143047)


Abstract:

The application of reinforcement learning to unmanned aerial vehicle (UAV) air combat faces two challenges: rigid reward functions, and the difficulty a single model has in handling complex tasks in high-dimensional continuous state spaces. These limitations severely restrict the generalization capability of decision-making in dynamic, rapidly changing situations. To address these issues, an autonomous decision-making framework integrating the double deep Q-network (DDQN) and deep deterministic policy gradient (DDPG) algorithms is proposed, combining the strengths of hierarchical and distributed architectures. Based on the advantage differences between the opposing sides in various situations, a series of DDPG models with different reward-function weight combinations is designed to form a bottom-level distributed deep deterministic policy gradient (D3PG) decision-making network. The DDQN algorithm, which excels at handling discrete action spaces, is introduced to build a top-level decision-making network that autonomously selects and switches to the most suitable bottom-level policy model according to real-time situational changes, enabling immediate adjustment and optimization of decisions. To further enhance the realism and difficulty of the close-range air combat environment between the red and blue UAVs, a self-play mechanism is introduced into DDPG training to construct a highly intelligent enemy decision-making model. Experimental results show that the UAV equipped with the proposed algorithm achieves a win rate of up to 96% against intelligent opponents, more than 20% higher than that of baseline algorithms such as D3PG, and it consistently defeats the opponent under various initial situations, fully verifying the effectiveness and superiority of the proposed method.
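The hierarchical loop described in the abstract can be illustrated with a short sketch: a top-level DDQN scores the available bottom-level policies from the current situation, and the selected DDPG actor then produces the continuous maneuver command. This is not the authors' implementation; the PyTorch networks, the state and action dimensions, and the three-policy ensemble (e.g., reward variants weighted toward attack, evasion, or pursuit) are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code) of the DDQN-over-D3PG
# hierarchy: a discrete top-level selector over an ensemble of continuous
# bottom-level DDPG actors, each trained with a different reward weighting.
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Top-level DDQN: maps the situation state to one score per DDPG policy."""
    def __init__(self, state_dim: int, n_policies: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_policies),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


class Actor(nn.Module):
    """Bottom-level DDPG actor: maps the state to a continuous control action."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh(),  # commands scaled to [-1, 1]
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


STATE_DIM, ACTION_DIM, N_POLICIES = 12, 3, 3  # assumed dimensions

top_level = QNet(STATE_DIM, N_POLICIES)
# One DDPG actor per reward-weight combination in the D3PG ensemble.
d3pg_ensemble = [Actor(STATE_DIM, ACTION_DIM) for _ in range(N_POLICIES)]


def act(state: torch.Tensor) -> torch.Tensor:
    """One hierarchical decision step: DDQN picks a policy, that policy acts."""
    with torch.no_grad():
        k = top_level(state).argmax(dim=-1).item()  # discrete policy choice
        return d3pg_ensemble[k](state)              # continuous maneuver command


if __name__ == "__main__":
    print(act(torch.randn(STATE_DIM)))
```

In training, the top-level network would be updated with the double-Q target (the online network selects the greedy policy index, the target network evaluates it), while each DDPG actor is trained separately under its own reward weighting, with self-play supplying the opposing model, as the abstract describes.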

Key words: UAV air combat, reinforcement learning, hierarchical decision-making, double deep Q-network, distributed deep deterministic policy gradient

CLC Number: