Acta Armamentarii (兵工学报) ›› 2025, Vol. 46 ›› Issue (S1): 250606-. doi: 10.12382/bgxb.2025.0606

A Reinforcement Learning Method for Optimizing Air Combat Threat Assessment via Cross-attention Mechanisms and Expert-guided Reward Shaping

SUN Kang1, XUE Dingrui1,2, FAN Ji1, LIN Yuqing2,*, LI Bo1, WANG Kexin1, LIU Jiancheng1, WEI Siwen3,**

  1 Northwest Institute of Mechanical and Electrical Engineering, Xianyang 712099, Shaanxi, China
  2 School of Information Engineering, Chang’an University, Xi’an 710018, Shaanxi, China
  3 School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
  • Received: 2025-07-08  Online: 2025-11-06

Abstract:

Aerial target threat assessment remains a critical component of modern military operations and is particularly challenging in highly dynamic, uncertain combat environments. Traditional methods struggle to handle multi-target threats, real-time decision-making, and environmental uncertainty. To address these limitations, this paper proposes the Dynamic Cross-Attention-Adaptive Entropy Soft-Actor-Critic-Twin (DCA-AEST) framework, which combines two novel modules: dynamic cross-attention feature extraction (DCAFE) and adaptive entropy SAC-Twin (AEST), a twin-network soft actor-critic (SAC) with adaptive entropy. The DCAFE module uses a hierarchical cross-attention mechanism to dynamically extract high-order feature interactions from complex multi-source battlefield data, improving the accuracy of threat detection and prioritization. The AEST module integrates reinforcement learning with expert-guided reward shaping and adaptive entropy regularization, enabling real-time policy optimization and stability control. The framework is validated experimentally in a high-fidelity adversarial combat simulation environment. The results show that DCA-AEST improves performance by approximately 4.3% to 10.8% across multiple attack strategies, significantly outperforming the comparison algorithms, and that it adapts better and assesses threats more accurately under dynamic attack strategies such as dispersed attacks and surprise raids. These results indicate that the framework improves the decision-making stability and accuracy of threat assessment in complex battlefield environments and has strong value for military applications.

Key words: threat assessment, dynamic cross-attention, expert-guided reward shaping, reinforcement learning
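
The abstract names its building blocks without giving formulas, so the following is only a generic sketch of three standard ingredients it mentions: single-head scaled dot-product cross-attention, a reward shaped by an expert term, and the SAC-style automatic temperature (entropy coefficient) update. The function names, the potential-based form of the shaping term, the learning rate, and the omission of learned projection matrices are all assumptions made for illustration; none of this is taken from the DCAFE or AEST implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (illustrative).

    Queries come from one source (e.g. per-target features); keys and
    values come from another (e.g. sensor or environment features).
    Learned projection matrices are omitted to keep the sketch short.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Attention output: weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


def shaped_reward(env_reward, phi_s, phi_next, gamma=0.99):
    """Assumed potential-based shaping: r + gamma * phi(s') - phi(s).

    phi is a hypothetical expert potential, e.g. a rule-based threat score.
    """
    return env_reward + gamma * phi_next - phi_s


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on the SAC temperature loss

        J(alpha) = mean(-alpha * (log_pi + target_entropy)),

    parameterized by log_alpha so that alpha stays positive.
    """
    alpha = math.exp(log_alpha)
    mean_term = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -alpha * mean_term  # dJ / d(log_alpha)
    return log_alpha - lr * grad
```

In this sketch, `update_log_alpha` raises the temperature when the sampled policy entropy falls below `target_entropy` and lowers it otherwise, which is the usual role of adaptive entropy regularization in SAC. A potential-based shaping term is chosen here because it is a common way to inject expert knowledge without changing the optimal policy; the paper's actual shaping scheme may differ.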