
Acta Armamentarii (兵工学报) ›› 2025, Vol. 46 ›› Issue (3): 240357. doi: 10.12382/bgxb.2024.0357



A Reinforcement Learning-based Radar Jamming Decision-making Method with Adaptive Setting of Exploration Rate

ZHANG Wang, SHAO Xuehui, TANG Huilong, WEI Jianlin, WANG Wei*

  1. School of Intelligent Science and Engineering, Harbin Engineering University, Harbin 150001, Heilongjiang, China
  • Received: 2024-05-10 Online: 2025-03-26
  • Corresponding author: WANG Wei
  • Supported by: National Natural Science Foundation of China (62271163)


Abstract:

Current reinforcement learning-based radar jamming decision-making methods set the exploration-rate parameter according to a single factor and a fixed law, which increases the number of confrontation rounds required for the algorithm to converge. To address this problem, a reinforcement learning-based radar jamming decision-making method with adaptive setting of the exploration rate is proposed. Based on the Metropolis parameter-adjustment criterion of the simulated annealing method, an adaptive exploration-rate setting criterion is derived from the number of radar operating states already recognized by the jammer, the number of successful jamming attempts, the change rate of the algorithm's convergence curve, and the jammer's degree of cognition of the radar during the countermeasure process. Based on the effectiveness of jamming actions, a jamming-action-space pruning strategy is designed to reduce the dimension of the jamming action space and further accelerate convergence. In the simulation experiments, two different radar working-state diagrams are designed, and the method is verified comparatively using the Q-learning algorithm. The simulation results show that the proposed method achieves adaptive setting of the exploration rate even when the radar's working-state transition relationships change. Compared with exploration-rate setting schemes based on the simulated annealing method, a single factor, and a fixed law, the proposed method reduces the number of confrontation rounds required for convergence by 18%, 26% and 45% on the first state diagram and by 42%, 44% and 48% on the second, while also obtaining greater returns and a higher jamming success rate. This provides a new approach to exploration-rate setting for reinforcement learning-based jamming decision-making against multi-functional radars.
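The abstract names the factors that drive the adaptive exploration rate but not the paper's formulas. The following is a minimal Python sketch of how a Metropolis-style rule borrowed from simulated annealing might fold those factors (recognized radar states, jamming successes, convergence-curve slope) into a single exploration rate; the function name, the aggregation into a "cognition" score, and the functional form are all illustrative assumptions, not the authors' method.

```python
import math

def adaptive_epsilon(n_states_seen, n_states_total,
                     n_successes, n_rounds,
                     reward_slope,
                     eps_min=0.05, eps_max=0.9, temperature=1.0):
    """Illustrative Metropolis-style exploration-rate rule (assumed form)."""
    # Fraction of the radar's working states the jammer has identified.
    coverage = n_states_seen / max(n_states_total, 1)
    # Empirical jamming success rate over the rounds played so far.
    success_rate = n_successes / max(n_rounds, 1)
    # A flat convergence curve (slope near zero) suggests learning has
    # stabilized, so exploration can be wound down further.
    stability = math.exp(-abs(reward_slope))
    # Aggregate "cognition" of the radar, kept in [0, 1].
    cognition = (coverage + success_rate + stability) / 3.0
    # Metropolis-style rule: higher cognition -> lower exploration,
    # mirroring simulated annealing's exp(-dE/T) acceptance term.
    eps = eps_max * math.exp(-cognition / temperature)
    return max(eps_min, min(eps_max, eps))
```

In an epsilon-greedy Q-learning loop, a rule of this shape would replace the usual fixed or monotonically decaying schedule, re-evaluated at the start of each confrontation round.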
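Likewise, the abstract describes pruning the jamming-action space by action effectiveness but does not give the rule. Below is a plausible sketch, assuming a running effectiveness estimate is kept per (state, action) pair and actions falling below a threshold are dropped; the threshold, the estimate, and all names are hypothetical. The tabular Q-learning update shown alongside is the standard baseline the paper compares against.

```python
import random
from collections import defaultdict

q_table = defaultdict(float)              # Q(s, a), default 0
effectiveness = defaultdict(lambda: 1.0)  # assumed running effectiveness estimate

def select_action(state, actions, eps, threshold=0.1):
    """Epsilon-greedy choice over a pruned action set (assumed rule).

    Actions whose effectiveness estimate for this radar state has fallen
    below `threshold` are excluded, shrinking the search space.
    """
    viable = [a for a in actions
              if effectiveness[(state, a)] >= threshold] or list(actions)
    if random.random() < eps:
        return random.choice(viable)                       # explore
    return max(viable, key=lambda a: q_table[(state, a)])  # exploit

def q_learning_step(s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update."""
    best_next = max(q_table[(s_next, a2)] for a2 in actions)
    q_table[(s, a)] += alpha * (reward + gamma * best_next - q_table[(s, a)])
```

In such a scheme, `effectiveness` would be updated each round from the jamming success feedback, so that ineffective actions against a given radar state are gradually excluded from exploration.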

Key words: multi-functional radar, radar jamming decision-making, reinforcement learning, exploration rate