
Acta Armamentarii (兵工学报) ›› 2025, Vol. 46 ›› Issue (3): 240357. doi: 10.12382/bgxb.2024.0357



A Reinforcement Learning-based Radar Jamming Decision-making Method with Adaptive Setting of Exploration Rate

ZHANG Wang, SHAO Xuehui, TANG Huilong, WEI Jianlin, WANG Wei*

  1. School of Intelligent Science and Engineering, Harbin Engineering University, Harbin 150001, Heilongjiang, China
  • Received: 2024-05-10 Online: 2025-03-26
  • Corresponding author: WANG Wei
  • Supported by: National Natural Science Foundation of China (62271163)


Abstract:

Current reinforcement learning-based radar jamming decision-making methods set the exploration-rate parameter according to a single factor and a fixed law, which increases the number of confrontation rounds required for the algorithm to converge. To address this problem, a reinforcement learning-based radar jamming decision-making method with adaptive setting of the exploration rate is proposed. Based on the Metropolis parameter-adjustment criterion of the simulated annealing method, an adaptive exploration-rate setting criterion is derived from the number of radar operating states already recognized by the jammer, the number of successful jamming attempts, the change rate of the algorithm's convergence curve, and the jammer's degree of cognition of the radar during the countermeasure process. Based on the effectiveness of jamming actions, a jamming-action-space pruning strategy is designed to reduce the dimension of the jamming action space and further accelerate convergence. In the simulation experiments, two different radar working-state diagrams are designed, and the method is verified comparatively using the Q-learning algorithm. The simulation results show that the proposed method achieves adaptive setting of the exploration rate even when the radar's working-state transition relationships change. Compared with exploration-rate setting schemes based on the simulated annealing method, a single factor, and a fixed law, the proposed method reduces the number of confrontation rounds required for convergence by 18%, 26% and 45% on the first state diagram and by 42%, 44% and 48% on the second, while also obtaining greater returns and a higher jamming success rate. This provides a new approach to exploration-rate setting for reinforcement learning-based jamming decision-making against multi-functional radars.
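The abstract names the factors that drive the adaptive exploration rate but not the paper's formulas. The following is a minimal Python sketch of how a Metropolis-style rule borrowed from simulated annealing might fold those factors (recognized radar states, jamming successes, convergence-curve slope) into a single exploration rate; the function name, the aggregation into a "cognition" score, and the functional form are all illustrative assumptions, not the authors' method.

```python
import math

def adaptive_epsilon(n_states_seen, n_states_total,
                     n_successes, n_rounds,
                     reward_slope,
                     eps_min=0.05, eps_max=0.9, temperature=1.0):
    """Illustrative Metropolis-style exploration-rate rule (assumed form)."""
    # Fraction of the radar's working states the jammer has identified.
    coverage = n_states_seen / max(n_states_total, 1)
    # Empirical jamming success rate over the rounds played so far.
    success_rate = n_successes / max(n_rounds, 1)
    # A flat convergence curve (slope near zero) suggests learning has
    # stabilized, so exploration can be wound down further.
    stability = math.exp(-abs(reward_slope))
    # Aggregate "cognition" of the radar, kept in [0, 1].
    cognition = (coverage + success_rate + stability) / 3.0
    # Metropolis-style rule: higher cognition -> lower exploration,
    # mirroring simulated annealing's exp(-dE/T) acceptance term.
    eps = eps_max * math.exp(-cognition / temperature)
    return max(eps_min, min(eps_max, eps))
```

In an epsilon-greedy Q-learning loop, a rule of this shape would replace the usual fixed or monotonically decaying schedule, re-evaluated at the start of each confrontation round.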
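Likewise, the abstract describes pruning the jamming-action space by action effectiveness but does not give the rule. Below is a plausible sketch, assuming a running effectiveness estimate is kept per (state, action) pair and actions falling below a threshold are dropped; the threshold, the estimate, and all names are hypothetical. The tabular Q-learning update shown alongside is the standard baseline the paper compares against.

```python
import random
from collections import defaultdict

q_table = defaultdict(float)              # Q(s, a), default 0
effectiveness = defaultdict(lambda: 1.0)  # assumed running effectiveness estimate

def select_action(state, actions, eps, threshold=0.1):
    """Epsilon-greedy choice over a pruned action set (assumed rule).

    Actions whose effectiveness estimate for this radar state has fallen
    below `threshold` are excluded, shrinking the search space.
    """
    viable = [a for a in actions
              if effectiveness[(state, a)] >= threshold] or list(actions)
    if random.random() < eps:
        return random.choice(viable)                       # explore
    return max(viable, key=lambda a: q_table[(state, a)])  # exploit

def q_learning_step(s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update."""
    best_next = max(q_table[(s_next, a2)] for a2 in actions)
    q_table[(s, a)] += alpha * (reward + gamma * best_next - q_table[(s, a)])
```

In such a scheme, `effectiveness` would be updated each round from the jamming success feedback, so that ineffective actions against a given radar state are gradually excluded from exploration.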

Key words: multi-functional radar, radar jamming decision-making, reinforcement learning, exploration rate