A Reinforcement Learning-based Radar Jamming Decision-making Method with Adaptive Setting of Exploration Rate

doi:10.12382/bgxb.2024.0357

Abstract

Existing reinforcement learning-based radar jamming decision-making methods set the exploration rate according to a single factor and a fixed law, which increases the number of confrontation rounds the algorithm needs to converge. This paper proposes a reinforcement learning-based radar jamming decision-making method with adaptive setting of the exploration rate. Building on the Metropolis parameter adjustment criterion of simulated annealing, an adaptive setting criterion for the exploration rate is derived from four quantities: the number of radar operating states recognized by the jammer, the number of jamming successes, the rate of change of the algorithm's convergence curve, and the jammer’s cognition of the radar during the countermeasure process. A jamming action space clipping strategy, designed according to the effectiveness of each jamming action, reduces the dimension of the jamming action space and further speeds up convergence. In the simulation experiments, two different radar working state diagrams are designed, and comparisons are carried out using the Q-learning algorithm. The simulation results show that the proposed method sets the exploration rate adaptively when the radar working state transition relationship changes. Compared with the exploration rate setting schemes based on simulated annealing, a single factor, and a fixed law, the proposed method reduces the number of confrontation rounds required for convergence by 18%, 26%, and 45%, respectively, in the first state diagram, and by 42%, 44%, and 48%, respectively, in the second. It also obtains greater benefits and a higher jamming success rate, providing a new approach to exploration rate setting for reinforcement learning-based jamming decision-making against multi-functional radar.
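
The exploration mechanism that the abstract builds on can be sketched as follows. This is a minimal illustration of Q-learning with a plain Metropolis acceptance rule from simulated annealing. It is not the adaptive criterion proposed in the paper, whose formula is not given here. The function names, the temperature parameter, and the learning-rate and discount values are all assumptions made for illustration.

```python
import math
import random


def metropolis_accept(q_greedy, q_proposal, temperature, rng):
    """Metropolis rule: always accept a proposal that is at least as good;
    otherwise accept it with probability exp((q_proposal - q_greedy) / T)."""
    if q_proposal >= q_greedy:
        return True
    return rng.random() < math.exp((q_proposal - q_greedy) / temperature)


def select_action(q_row, temperature, rng):
    """Pick an action index for one state from its row of Q-values."""
    greedy = max(range(len(q_row)), key=q_row.__getitem__)
    proposal = rng.randrange(len(q_row))  # uniformly random candidate action
    if metropolis_accept(q_row[greedy], q_row[proposal], temperature, rng):
        return proposal  # explore (or re-select the greedy action)
    return greedy  # exploit


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update (alpha and gamma are assumed values)."""
    q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])
```

At a high temperature nearly every random proposal is accepted, so the agent explores. As the temperature falls, the rule becomes almost purely greedy. A fixed cooling schedule would play the role of the "fixed law" baseline that the abstract argues against.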

Key words: multi-functional radar, radar jamming decision-making, reinforcement learning, exploration rate

ZHANG Wang, SHAO Xuehui, TANG Huilong, WEI Jianlin, WANG Wei. A Reinforcement Learning-based Radar Jamming Decision-making Method with Adaptive Setting of Exploration Rate[J]. Acta Armamentarii, 2025, 46(3): 240357-.


S \ S'	1	2	3	4	5	6	7	8	9
1	0.65	0.07	0.28	0	0	0	0	0	0
2	0.13	0.35	0.11	0.34	0.07	0	0	0	0
3	0	0.54	0.20	0.11	0	0.15	0	0	0
4	0	0.22	0.17	0.12	0.29	0	0.20	0	0
5	0	0.17	0	0.28	0.09	0	0.23	0	0.23
6	0	0	0.35	0	0	0.53	0.07	0.05	0
7	0	0	0	0.18	0.27	0.32	0.04	0	0.19
8	0	0	0	0	0	0.57	0	0.02	0.41
9	0	0	0	0	0	0	0	0.17	0.83
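
A table of this kind can drive a simple simulation of the radar's working state sequence. The sketch below assumes that each row gives the probabilities of moving from the current state S to each next state S'. The names `P`, `next_state`, and `simulate` are hypothetical and do not come from the paper.

```python
import random

# Transition probabilities copied from the table above (row = S, column = S').
P = [
    [0.65, 0.07, 0.28, 0, 0, 0, 0, 0, 0],
    [0.13, 0.35, 0.11, 0.34, 0.07, 0, 0, 0, 0],
    [0, 0.54, 0.20, 0.11, 0, 0.15, 0, 0, 0],
    [0, 0.22, 0.17, 0.12, 0.29, 0, 0.20, 0, 0],
    [0, 0.17, 0, 0.28, 0.09, 0, 0.23, 0, 0.23],
    [0, 0, 0.35, 0, 0, 0.53, 0.07, 0.05, 0],
    [0, 0, 0, 0.18, 0.27, 0.32, 0.04, 0, 0.19],
    [0, 0, 0, 0, 0, 0.57, 0, 0.02, 0.41],
    [0, 0, 0, 0, 0, 0, 0, 0.17, 0.83],
]


def next_state(state, rng):
    """Sample the next state S' given the current state S (both numbered 1..9)."""
    return rng.choices(range(1, 10), weights=P[state - 1], k=1)[0]


def simulate(start, steps, rng):
    """Return a state sequence of length steps + 1 beginning at `start`."""
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path


if __name__ == "__main__":
    print(simulate(1, 20, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing convergence curves across exploration rate schemes.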

S \ S'	1	2	3	4	5	6	7	8	9
1	0.32	0.35	0.32	0	0	0	0	0	0
2	0	0.32	0	0.16	0.52	0	0	0	0
3	0.02	0	0.18	0.27	0	0.21	0.32	0	0
4	0	0.18	0.31	0.09	0.11	0	0.31	0	0
5	0	0.16	0	0.30	0.35	0	0.2	0	0
6	0	0	0.26	0	0	0.38	0	0.15	0.21
7	0	0	0.35	0.14	0	0	0.16	0	0.35
8	0	0	0	0	0	0.21	0	0.59	0.2
9	0	0	0	0	0	0.41	0.05	0.04	0.50
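
Before a table like this is used in a simulation, it is worth checking that every row is a valid probability distribution. The sketch below does this for the table above. With the values as printed, rows 1 and 5 sum to 0.99 and 1.01, presumably because of rounding, so the sketch also renormalizes each row. The names `P2`, `rows_off_by_rounding`, and `normalize` are hypothetical.

```python
# Second transition table, copied from above (row = S, column = S').
# Entry (3, 6) is taken as 0.21, the value that makes row 3 sum to 1.
P2 = [
    [0.32, 0.35, 0.32, 0, 0, 0, 0, 0, 0],
    [0, 0.32, 0, 0.16, 0.52, 0, 0, 0, 0],
    [0.02, 0, 0.18, 0.27, 0, 0.21, 0.32, 0, 0],
    [0, 0.18, 0.31, 0.09, 0.11, 0, 0.31, 0, 0],
    [0, 0.16, 0, 0.30, 0.35, 0, 0.2, 0, 0],
    [0, 0, 0.26, 0, 0, 0.38, 0, 0.15, 0.21],
    [0, 0, 0.35, 0.14, 0, 0, 0.16, 0, 0.35],
    [0, 0, 0, 0, 0, 0.21, 0, 0.59, 0.2],
    [0, 0, 0, 0, 0, 0.41, 0.05, 0.04, 0.50],
]


def rows_off_by_rounding(matrix, digits=2):
    """Return {state: rounded row sum} for rows whose rounded sum is not 1."""
    sums = {i + 1: round(sum(row), digits) for i, row in enumerate(matrix)}
    return {state: total for state, total in sums.items() if total != 1.0}


def normalize(matrix):
    """Rescale each row so that it sums to 1."""
    return [[p / sum(row) for p in row] for row in matrix]


if __name__ == "__main__":
    print(rows_off_by_rounding(P2))
```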

S \ a	1	2	3	4	5	6	7	8	9
1	37.0	50.1	53.6	0	0	0	0	0	0
2	44.4	56.5	53.6	65.8	81.0	0	0	0	0
3	0	50.6	41.6	61.9	0	65.8	0	0	0
4	0	56.1	-0.6	59.2	78.6	0	81.0	0	0
5	0	54.0	0	63.7	67.4	0	81.0	0	100
6	0	0	49.8	0	0	60.1	-0.5	81	0
7	0	0	0	62.4	78.9	59.8	35.5	0	100
8	0	0	0	0	0	59.3	0	0	100
9	0	0	0	0	0	0	0	0	0
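
If the table above is read as learned state-action values (an assumption, since the page gives no caption for it), a greedy jamming policy can be read off by taking the highest-valued action in each state. The sketch below does this. The names `TABLE` and `greedy_action` are hypothetical, and a row of all zeros is treated as having no preferred action.

```python
# Values copied from the state-action table above (row = state S, column = action a).
TABLE = [
    [37.0, 50.1, 53.6, 0, 0, 0, 0, 0, 0],
    [44.4, 56.5, 53.6, 65.8, 81.0, 0, 0, 0, 0],
    [0, 50.6, 41.6, 61.9, 0, 65.8, 0, 0, 0],
    [0, 56.1, -0.6, 59.2, 78.6, 0, 81.0, 0, 0],
    [0, 54.0, 0, 63.7, 67.4, 0, 81.0, 0, 100],
    [0, 0, 49.8, 0, 0, 60.1, -0.5, 81, 0],
    [0, 0, 0, 62.4, 78.9, 59.8, 35.5, 0, 100],
    [0, 0, 0, 0, 0, 59.3, 0, 0, 100],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def greedy_action(state):
    """Return the 1-based action with the largest value in `state`,
    or None when the whole row is zero (no information for that state)."""
    row = TABLE[state - 1]
    if not any(row):
        return None
    best = max(range(len(row)), key=lambda i: row[i])
    return best + 1


if __name__ == "__main__":
    for s in range(1, 10):
        print(s, greedy_action(s))
```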