Video Object Detection Method for Tank Fire Control System Based on Spatial-temporal Convolution Feature Memory Model

doi:10.3969/j.issn.1000-1093.2020.09.002

Abstract

Video object detection is an effective means of improving the battlefield object search capability of a tank fire control system. For this task, a video object detection method based on a spatial-temporal convolution feature memory model is proposed. A spatial-temporal convolution feature alignment mechanism is combined with a convolutional gated recurrent unit to construct the memory model, which jointly models the appearance features and motion information of objects across multiple video frames so that object information can be transferred and fused between frames. Deformable convolution networks are incorporated into the feature extraction network and the detection sub-network, and non-maximum suppression over video sequences is applied during detection to improve performance on deformed and occluded objects. A video object detection dataset for tank fire control systems is established; it covers different object types, scales, occlusions and other conditions, and can serve as a test basis for different video object detection methods. Test results show that the mean average precision of the proposed method is higher than those of R-FCN, D&T and MANet, and that the proposed method better meets the application requirements of the equipment.
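To make the memory update concrete, the following is a minimal sketch of a single convolutional gated recurrent unit step in plain Python. It is not the authors' implementation: it assumes a single-channel feature map, 3x3 kernels with zero padding, and the standard GRU gating equations, and it omits the spatial-temporal feature alignment step entirely. The function names (`conv2d_same`, `conv_gru_step`) and the parameter keys (`wz`, `uz`, `wr`, `ur`, `w`, `u`) are hypothetical and chosen only for illustration.

```python
import math


def conv2d_same(x, k):
    """3x3 cross-correlation of a 2D list x with kernel k, zero padded ('same' size)."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj] * k[di + 1][dj + 1]
            out[i][j] = s
    return out


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def _zip_map(fn, *maps):
    """Apply fn element-wise across several equally sized 2D lists."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*maps)]


def conv_gru_step(x, h_prev, p):
    """One ConvGRU update (assumed standard gating, no feature alignment).

    x      : current frame feature map (2D list)
    h_prev : memory from the previous frame (2D list, same size as x)
    p      : dict of hypothetical 3x3 kernels 'wz', 'uz', 'wr', 'ur', 'w', 'u'
    """
    # Update gate z decides how much of the memory is overwritten.
    z = _zip_map(lambda a, b: _sigmoid(a + b),
                 conv2d_same(x, p["wz"]), conv2d_same(h_prev, p["uz"]))
    # Reset gate r decides how much old memory feeds the candidate state.
    r = _zip_map(lambda a, b: _sigmoid(a + b),
                 conv2d_same(x, p["wr"]), conv2d_same(h_prev, p["ur"]))
    rh = _zip_map(lambda a, b: a * b, r, h_prev)
    # Candidate memory built from the current frame and the gated old memory.
    cand = _zip_map(lambda a, b: math.tanh(a + b),
                    conv2d_same(x, p["w"]), conv2d_same(rh, p["u"]))
    # Blend old memory and candidate: h = (1 - z) * h_prev + z * cand.
    return _zip_map(lambda zi, hi, ci: (1.0 - zi) * hi + zi * ci,
                    z, h_prev, cand)


ZERO = [[0.0] * 3 for _ in range(3)]
IDENTITY = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```

Calling `conv_gru_step` once per frame, and feeding each returned map back in as `h_prev`, carries information forward through the sequence. In a full model the previous memory would first be aligned to the current frame, and the kernels would be learned multi-channel weights rather than the fixed `ZERO` and `IDENTITY` placeholders defined here.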

Key words: tank fire control system, video object detection, spatial-temporal convolution feature alignment, memory model, deformable convolution, convolutional gated recurrent unit

CLC Number: TJ810.3⁺76

DAI Wenjun, CHANG Tianqing, CHU Kaixuan, ZHANG Lei, GUO Libin. Video Object Detection Method for Tank Fire Control System Based on Spatial-temporal Convolution Feature Memory Model[J]. Acta Armamentarii, 2020, 41(9): 1708-1718.
