Acta Armamentarii ›› 2025, Vol. 46 ›› Issue (8): 240978. DOI: 10.12382/bgxb.2024.0978

Hierarchical Decision-making for UAV Air Combat Based on DDQN-D3PG

WANG Yu1,*, LI Yuanpeng1, GUO Zhongyu1, LI Shuo1, REN Tianjun2

  1. School of Automation, Shenyang Aerospace University, Shenyang 110136, Liaoning, China
    2. Xi'an Kewei Industrial Development Co., Ltd., Xi'an 710000, Shaanxi, China
  • Received: 2024-10-21 Online: 2025-08-28
  • Contact: WANG Yu

Abstract:

The application of reinforcement learning to unmanned aerial vehicle (UAV) air combat faces the challenge that rigid reward functions and single models struggle to handle complex tasks in high-dimensional continuous state spaces, which severely limits the algorithm's decision-making generalization capability in dynamic and varied situations. To address these issues, an autonomous decision-making framework based on the double deep Q-network (DDQN) and deep deterministic policy gradient (DDPG) algorithms is proposed, which integrates the strengths of hierarchical and distributed architectures. Based on the advantage differences between the opposing forces in various situations, a series of DDPG models with different reward-function weight combinations are designed to construct a bottom-level distributed deep deterministic policy gradient (D3PG) decision-making network. The DDQN algorithm, which excels in handling discrete action spaces, is introduced to construct a top-level decision-making network that autonomously selects and switches to the most suitable bottom-level policy model according to real-time situation changes, thereby achieving the instant adjustment and optimization of decisions. To further enhance the realism and challenge of the combat environment, a self-play mechanism is introduced into DDPG training to construct a highly intelligent enemy decision-making model. The experimental results demonstrate that UAVs equipped with the proposed algorithm achieve a maximum win rate of 96% in adversarial engagements against intelligent opponents, an improvement of more than 20% over baseline algorithms such as D3PG. Moreover, the proposed algorithm consistently defeats opponents under various initial conditions, confirming its effectiveness and advancement.
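The hierarchical structure described above can be sketched in miniature: a top-level DDQN-style selector whose discrete actions are indices into a set of bottom-level continuous policies, each associated with a different reward-weight combination. This is an illustrative sketch only; the class names, dimensions, linear function approximators, and reward-weight values are assumptions for demonstration, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class BottomPolicy:
    """Stand-in for one DDPG policy in the bottom-level D3PG network,
    trained (here stubbed as a fixed linear map) under its own reward weights."""
    def __init__(self, state_dim, action_dim, reward_weights):
        self.W = rng.normal(size=(action_dim, state_dim)) * 0.1
        self.reward_weights = reward_weights  # hypothetical (angle, distance, altitude) weights

    def act(self, state):
        # Deterministic continuous action, squashed to [-1, 1] as in DDPG.
        return np.tanh(self.W @ state)

class TopLevelSelector:
    """DDQN-style top-level network; its discrete action = which bottom policy to run."""
    def __init__(self, state_dim, n_policies, lr=0.01, gamma=0.99):
        self.Q = rng.normal(size=(n_policies, state_dim)) * 0.01  # linear Q-function
        self.Q_target = self.Q.copy()
        self.lr, self.gamma = lr, gamma

    def select(self, state, eps=0.1):
        if rng.random() < eps:
            return int(rng.integers(len(self.Q)))
        return int(np.argmax(self.Q @ state))

    def update(self, s, a, r, s_next):
        # Double-DQN target: online net picks the argmax action,
        # target net evaluates it (decoupling selection from evaluation).
        a_star = int(np.argmax(self.Q @ s_next))
        target = r + self.gamma * (self.Q_target[a_star] @ s_next)
        td_error = target - self.Q[a] @ s
        self.Q[a] += self.lr * td_error * s

state_dim, action_dim = 6, 3
policies = [BottomPolicy(state_dim, action_dim, w)
            for w in [(0.6, 0.3, 0.1), (0.2, 0.6, 0.2), (0.1, 0.3, 0.6)]]
top = TopLevelSelector(state_dim, n_policies=len(policies))

s = rng.normal(size=state_dim)
k = top.select(s, eps=0.0)   # top level picks the most suitable bottom policy
a = policies[k].act(s)       # selected DDPG-style policy emits a continuous action
top.update(s, k, r=1.0, s_next=rng.normal(size=state_dim))
```

The design choice illustrated here is the one the abstract emphasizes: the continuous flight-control problem stays inside the bottom-level policies, while the top level only solves the much smaller discrete problem of policy switching.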

Key words: UAV aerial combat, reinforcement learning, hierarchical decision-making, double deep Q-network, distributed deep deterministic policy gradient
