A Semi-supervised Learning Method for Intelligent Decision Making of Submarine Maneuvering Evasion

doi:10.12382/bgxb.2023.0684

Abstract

When a submarine defends against incoming torpedoes, it operates in a weakly observable underwater environment, and the target information it obtains is sparse. Setting the maneuvering parameters is a key part of submarine tactical decision-making. Existing methods for setting these parameters inevitably introduce observation errors during modeling and lack a means of responding to the evolving situation; in addition, military experts are scarce, so flexible tactical confrontation samples from them are very expensive to obtain. To address these difficulties, an intelligent tactical decision-making method that combines autoencoding (self-coding) with an active Q-learning strategy is proposed. A contrastive predictive coding autoencoder is introduced to maximize the mutual information between the time series input and its context, which improves the representation of sparse time series input. The learned representation is then combined with an active reinforcement learning task to reduce the agent's label demand rate and improve the environmental feedback ability of parameter setting. Datasets from the God's-eye perspective and the red perspective are constructed from data collected over the past three years. Experiments on these datasets show that the proposed method and its ablated variant without the sparse time series autoencoder reach decision accuracies of 98% and 78%, respectively, with label demand rates of 4% and 44%, respectively. Compared with the comparison models, including the classical time series classification model, the proposed method improves decision accuracy by 14% and 9%; with the label demand rate reduced to 24%~44%, its decision accuracy error relative to real human actions differs from that of the supervised model by only 1%. These results indicate that the proposed model can greatly reduce the label demand while maintaining decision-making accuracy.
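
The mutual-information objective mentioned in the abstract is commonly implemented in contrastive predictive coding as the InfoNCE loss. The following is a minimal NumPy sketch of that loss, not the authors' implementation: the function names, the dot-product scoring, and the use of the other samples in a batch as negatives are all assumptions made for illustration.

```python
import numpy as np

def info_nce_loss(predictions, targets):
    """InfoNCE loss for a batch of (prediction, target) pairs.

    predictions: (N, D) array, context-based predictions of future latents.
    targets:     (N, D) array, encoded future observations. Row i is the
                 positive for prediction i; the other rows act as negatives
                 (an assumption of this sketch).
    """
    logits = predictions @ targets.T                      # (N, N) scores
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability assigned to the positive pairs.
    return float(-np.mean(np.diag(log_softmax)))

def mutual_information_lower_bound(loss, n):
    """Lower bound on mutual information implied by InfoNCE: log(N) - loss."""
    return float(np.log(n) - loss)
```

Minimizing this loss raises the bound log(N) - loss on the mutual information between context and future input, which is the sense in which a contrastive predictive coding encoder "maximizes" mutual information.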

Key words: submarine evasion defense, sparse labels, active Q-learning, self coding, intelligent decision-making

CLC Number: TJ301

YANG Jing, WU Jinping, LIU Jian, WANG Yongjie, DONG Hanquan. A Semi-supervised Learning Method for Intelligent Decision Making of Submarine Maneuvering Evasion[J]. Acta Armamentarii, 2024, 45(10): 3474-3487.

Figures/Tables 13

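Entity	Features
Submarine	Longitude, latitude, depth, heading, speed, detection radius, maneuverability parameters (2), underwater acoustic and related parameters (6), other parameters (6)
Torpedo	Longitude, latitude, depth, heading, speed, maneuverability (2), detection parameters (4), remaining range, maneuvering parameters (3), other parameters (5)
Acoustic decoy	Longitude, latitude, release time, duration, performance parameters (6)
Jammer	Longitude, latitude, release time, duration, performance parameters (6)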

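Sample type	Reward/penalty factors	Training accuracy	Training label demand rate	Test accuracy	Test label demand rate
Complete information	(0.7,-1.0,1.0,-0.7)	0.98	0.04	0.98	0.04
Complete information	(0.7,-5.0,1.0,-0.7)	0.98	0.22	0.98	0.22
Complete information	(0.7,-10,1.0,-0.7)	0.98	0.30	0.98	0.28
Red perspective	(0.7,-1.0,1.0,-0.7)	0.98	0.04	0.97	0.04
Red perspective	(0.7,-5.0,1.0,-0.7)	0.98	0.26	0.98	0.27
Red perspective	(0.7,-10,1.0,-0.7)	0.98	0.32	0.97	0.32
Sparse time series, 5 valid steps	(0.7,-1.0,1.0,-0.7)	0.93	0.14	0.92	0.14
Sparse time series, 5 valid steps	(0.7,-5.0,1.0,-0.7)	0.93	0.42	0.92	0.40
Sparse time series, 5 valid steps	(0.7,-10,1.0,-0.7)	0.93	0.47	0.91	0.47
Sparse time series, 3 valid steps	(0.7,-1.0,1.0,-0.7)	0.85	0.40	0.81	0.40
Sparse time series, 3 valid steps	(0.7,-5.0,1.0,-0.7)	0.89	0.52	0.90	0.53
Sparse time series, 3 valid steps	(0.7,-10,1.0,-0.7)	0.85	0.40	0.86	0.51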
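
The two metrics reported in the table above can be illustrated with a short sketch. The exact definitions used in the paper are not given on this page, so the code assumes one plausible reading: at each step the agent either commits to a strategy or requests an expert label, the label demand rate is the fraction of steps with a request, and accuracy is measured only over the steps where the agent committed. The `REQUEST_LABEL` marker and both function names are hypothetical.

```python
REQUEST_LABEL = "request"  # hypothetical marker for "ask the expert for a label"

def label_demand_rate(actions):
    """Fraction of steps on which the agent requested a label."""
    return sum(a == REQUEST_LABEL for a in actions) / len(actions)

def decision_accuracy(actions, expert_labels):
    """Accuracy over the steps where the agent committed to a decision."""
    decided = [(a, y) for a, y in zip(actions, expert_labels)
               if a != REQUEST_LABEL]
    if not decided:
        return 0.0  # no committed decisions, so nothing to score
    return sum(a == y for a, y in decided) / len(decided)

# Toy episode: five steps, one label request, three of four decisions correct.
actions = ["I", "I", REQUEST_LABEL, "II", "I"]
expert_labels = ["I", "I", "III", "II", "III"]
rate = label_demand_rate(actions)                      # 0.2
accuracy = decision_accuracy(actions, expert_labels)   # 0.75
```

Under this reading, a harsher penalty for wrong decisions would push the agent to request labels more often, which is consistent with the label demand rate rising as the second factor goes from -1.0 to -10 in the table.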

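Decision-maker	Average decision time	Strategy distribution	Defense success probability
Human-in-the-loop	Decides 27 s later	Ⅰ(0.64), Ⅱ(0.12), Ⅲ(0.24)	0.84
Intelligent algorithm	Decides first	Ⅰ(0.96), Ⅰ(0.04)	0.92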