Video Object Detection Method for Tank Fire Control System Based on Spatial-temporal Convolution Feature Memory Model

doi:10.3969/j.issn.1000-1093.2020.09.002

Abstract

Video object detection is an effective means of improving the battlefield target search capability of tank fire control systems. For the video object detection task of tank fire control systems, a video object detection method based on a spatial-temporal convolution feature memory model is proposed. A spatial-temporal convolution feature alignment mechanism is combined with a convolutional gated recurrent unit to build the memory model, which jointly models the appearance features and motion information of objects across multiple video frames so that object information can be propagated and fused between frames. Deformable convolution is incorporated into both the feature extraction network and the detection sub-network, and non-maximum suppression over the video sequence is applied during detection, which improves the detection of deformed and occluded objects. A video object detection dataset for tank fire control systems is constructed, covering multiple object types, scales, occlusion conditions and other factors, to provide a basis for testing different object detection methods. Test results show that the proposed method achieves the highest mean average precision among the compared methods, which include R-FCN, D&T and MANet, and that it can better meet the application requirements of the equipment.

Key words: tank fire control system, video object detection, spatial-temporal convolution feature alignment, memory model, deformable convolution, convolutional gated recurrent unit
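For orientation, the convolutional gated recurrent unit named in the abstract is commonly written in the literature as the update below. This is a sketch under stated assumptions, not the paper's exact formulation: the symbols $x_t$ (feature map of frame $t$), $\hat{h}_{t-1}$ (previous memory after alignment to frame $t$), and the kernels $W_\ast$, $U_\ast$ are illustrative names that do not come from the abstract, and how the alignment mechanism produces $\hat{h}_{t-1}$ is not specified here.

```latex
% Standard ConvGRU update; * is 2-D convolution, \odot is element-wise product.
% \hat{h}_{t-1} is assumed to be the spatially aligned memory from frame t-1.
\begin{aligned}
z_t &= \sigma\left(W_z * x_t + U_z * \hat{h}_{t-1}\right) \\
r_t &= \sigma\left(W_r * x_t + U_r * \hat{h}_{t-1}\right) \\
\tilde{h}_t &= \tanh\left(W_h * x_t + U_h * \left(r_t \odot \hat{h}_{t-1}\right)\right) \\
h_t &= (1 - z_t) \odot \hat{h}_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Under this reading, the update gate $z_t$ controls how much of the aligned memory is carried forward versus overwritten by the current frame, which is one plausible way object information could be propagated and fused across frames.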

CLC number: TJ810.3⁺76

DAI Wenjun, CHANG Tianqing, CHU Kaixuan, ZHANG Lei, GUO Libin. Video Object Detection Method for Tank Fire Control System Based on Spatial-temporal Convolution Feature Memory Model[J]. Acta Armamentarii, 2020, 41(9): 1708-1718. (in Chinese)
