Research on Improved YOLOv5-based Military Target Recognition Algorithm Used in Complex Battlefield Environment

Xiaoru SONG; Kang LIU; Song GAO; Chaobo CHEN; Kun YAN

doi:10.12382/bgxb.2022.0736


- Acta Armamentarii, Vol. 45, Issue 3, Pages: 934-947 (2024)
- Author affiliations:
  1. School of Electronic Information Engineering, Xi'an Technological University, Xi'an 710021, Shaanxi, China
  2. Key Laboratory of Electromechanical Dynamic Control, Xi'an 710065, Shaanxi, China
- Received: 21 August 2022, Published Online: 29 March 2024, Published: 22 March 2024
Xiaoru SONG, Kang LIU, Song GAO, et al. Research on Improved YOLOv5-based Military Target Recognition Algorithm Used in Complex Battlefield Environment[J]. Acta Armamentarii, 2024, 45(3): 934-947. DOI: 10.12382/bgxb.2022.0736.

Abstract

Military target recognition in complex battlefield environments is the foundation of, and key to, improving battlefield intelligence acquisition. To address the high missed-detection and false-detection rates and the poor real-time performance of current military target recognition techniques in such environments, a PB-YOLO military target recognition algorithm based on an improved YOLOv5 model is proposed. The anchor boxes used to recognize military units on the land battlefield are re-clustered to improve the model's adaptability to target size and to accelerate convergence. A channel-spatial parallel attention mechanism is adopted to increase the model's attention to the feature information and location information of targets in complex battlefield environments. BiFPN is used in the feature fusion network to improve the capability and speed of feature fusion. The Alpha_IoU loss function is adopted to accelerate convergence and to resolve the degradation of the IoU calculation when the ground-truth box and the predicted box overlap. Experimental results on a self-built military target dataset show that, compared with mainstream target recognition algorithms, the improved algorithm reaches an mAP of 90.17% while maintaining the model's space complexity. Ablation experiments show that the improved network is 11.57% more accurate than the original model, achieves good recognition performance, and can provide effective technical support for battlefield intelligence acquisition.
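The abstract names the Alpha_IoU loss but does not give its formula or the value of the exponent. As a rough illustration only, the sketch below implements the basic power-IoU form, loss = 1 - IoU^alpha, in plain Python. The helper names `box_iou` and `alpha_iou_loss`, the `(x1, y1, x2, y2)` box layout, and the default `alpha=3.0` are assumptions made for this example; the paper may use a different variant (for instance one with additional penalty terms) or a different exponent.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    The corner-based box layout is an assumption of this sketch.
    """
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alpha_iou_loss(pred, target, alpha=3.0):
    """Basic power-IoU loss: 1 - IoU ** alpha.

    alpha=3.0 is an illustrative default, not a value taken from the abstract.
    With alpha=1.0 this reduces to the ordinary IoU loss, 1 - IoU.
    """
    return 1.0 - box_iou(pred, target) ** alpha


if __name__ == "__main__":
    pred = (0.0, 0.0, 2.0, 2.0)
    target = (1.0, 0.0, 3.0, 2.0)
    print("IoU:", box_iou(pred, target))
    print("IoU loss (alpha=1):", alpha_iou_loss(pred, target, alpha=1.0))
    print("Alpha-IoU loss (alpha=3):", alpha_iou_loss(pred, target))
```

Raising the IoU to a power greater than 1 changes how strongly boxes at different overlap levels are weighted relative to the plain IoU loss, which is the general idea behind this family of losses. A training implementation would operate on batched tensors rather than single tuples.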


