Aerial Vehicle Detection based on Multispectral Feature Fusion in Complex Scene

doi:10.12382/bgxb.2025.0413

Abstract:

To address multispectral feature mismatch, complex-scene interference, and insufficient detection accuracy in vehicle detection from unmanned aerial vehicle (UAV) imagery, this paper proposes the Multi-Scale Gated Fusion Network (MSGF-Net), a YOLOv10-based network that jointly optimizes multiple feature spaces. The network uses dual-stream feature extraction and introduces a gated local-global fusion (GLGF) module into the backbone. This enables effective interaction and joint optimization of visible and infrared features across multiple feature spaces, mitigating the impact of feature mismatch and enhancing feature representation. After the feature pyramid network, a cross-modulation block (CMB) performs pixel-wise weighted fusion of the multi-feature-space information, further improving the complementarity between spectral features. Experiments on the public DroneVehicle dataset show that MSGF-Net achieves 83.4% mAP_0.5 and 63.9% mAP_0.5:0.95, a significant improvement over the single-channel model YOLOv10n. Compared with representative multimodal fusion algorithms such as C²Former and TSDADet, MSGF-Net improves mAP_0.5:0.95 by more than 9 percentage points, indicating better robustness and accuracy in complex scenes such as low light, fog, and occlusion.

Key words: computer vision, multispectral feature fusion, YOLOv10, complex scene, target detection
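
The pixel-wise weighted fusion mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's GLGF or CMB module, whose internals are not given here; it only shows the general idea of a per-pixel gate that blends a visible (RGB) feature map with an infrared (IR) one. The function names and the scalar gate parameters `w_rgb`, `w_ir`, and `bias` are hypothetical stand-ins for weights that a real network would learn.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(rgb, ir, w_rgb=1.0, w_ir=1.0, bias=0.0):
    """Blend two single-channel feature maps pixel by pixel.

    rgb, ir: equally shaped 2D lists of floats (hypothetical feature maps).
    w_rgb, w_ir, bias: hypothetical gate parameters, assumed for illustration.
    """
    if len(rgb) != len(ir) or any(len(a) != len(b) for a, b in zip(rgb, ir)):
        raise ValueError("feature maps must have the same shape")
    fused = []
    for row_rgb, row_ir in zip(rgb, ir):
        fused_row = []
        for v, t in zip(row_rgb, row_ir):
            # The gate leans toward whichever modality responds more strongly.
            g = sigmoid(w_rgb * v - w_ir * t + bias)
            # Convex combination: the result always lies between v and t.
            fused_row.append(g * v + (1.0 - g) * t)
        fused.append(fused_row)
    return fused


# A dark visible pixel next to a strong infrared response (night-like case),
# and a bright visible pixel next to a weak infrared response (day-like case).
rgb_map = [[0.05, 0.90]]
ir_map = [[0.80, 0.10]]
fused_map = gated_fuse(rgb_map, ir_map)
```

With these inputs the first fused pixel lands closer to the infrared value and the second closer to the visible value, which is the complementary behaviour a gated fusion is meant to provide.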

ZHAO Zhijie, SHEN Shiqi, YING Zhanfeng, LI Keting, LI Ruixing, TANG Shiwei. Aerial Vehicle Detection based on Multispectral Feature Fusion in Complex Scene[J]. Acta Armamentarii, 2025, 46(S1): 250413-.

Figures/Tables: 8

Modality	Daytime	Night	Late night
RGB modality
IR modality

Parameter	Setting
Epochs	300
Batch	24
Workers	4
Imgsz	640
Optimizer	SGD
Lrf	0.01
lr0	0.1
Momentum	0.973
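
To make the learning-rate settings concrete, the sketch below collects the table into a dictionary and computes a per-epoch learning rate. It assumes that `lrf` is the final learning rate expressed as a fraction of `lr0` and that the decay is linear over the epochs, as is common in Ultralytics-style YOLO trainers. The source does not state which scheduler was used, so `linear_lr` and `TRAIN_CONFIG` are illustrative names rather than the authors' code.

```python
# Training settings copied from the table above (keys lower-cased).
TRAIN_CONFIG = {
    "epochs": 300,
    "batch": 24,
    "workers": 4,
    "imgsz": 640,
    "optimizer": "SGD",
    "lrf": 0.01,
    "lr0": 0.1,
    "momentum": 0.973,
}


def linear_lr(epoch: int, epochs: int, lr0: float, lrf: float) -> float:
    """Learning rate at a given epoch under an assumed linear decay.

    The rate starts at lr0 and falls linearly to lr0 * lrf at the last epoch.
    """
    remaining = max(1.0 - epoch / epochs, 0.0)
    return lr0 * (remaining * (1.0 - lrf) + lrf)


cfg = TRAIN_CONFIG
start_lr = linear_lr(0, cfg["epochs"], cfg["lr0"], cfg["lrf"])
mid_lr = linear_lr(150, cfg["epochs"], cfg["lr0"], cfg["lrf"])
final_lr = linear_lr(cfg["epochs"], cfg["epochs"], cfg["lr0"], cfg["lrf"])
```

Under this assumption the rate starts at 0.1 and ends at 0.1 × 0.01 = 0.001.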

Model	+MSGF	+GLGF	+CMB	mAP_0.5 / %	mAP_0.5:0.95 / %	Params / M	GFLOPs
A				74.8	52.9	2.266	6.5
B	√			81.4	61.7	4.491	12.7
C	√	√		82.8	62.8	5.337	16.5
D	√	√	√	83.4	64.0	5.424	16.7
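
The incremental effect of each ablation step can be read off the table by simple subtraction. The sketch below copies the four rows into a dictionary and computes the difference between any two configurations. The key names and the `delta` helper are illustrative and do not come from the paper.

```python
# Ablation results copied from the table above.
ABLATION = {
    "A": {"map50": 74.8, "map50_95": 52.9, "params_m": 2.266, "gflops": 6.5},
    "B": {"map50": 81.4, "map50_95": 61.7, "params_m": 4.491, "gflops": 12.7},
    "C": {"map50": 82.8, "map50_95": 62.8, "params_m": 5.337, "gflops": 16.5},
    "D": {"map50": 83.4, "map50_95": 64.0, "params_m": 5.424, "gflops": 16.7},
}


def delta(later: str, earlier: str) -> dict:
    """Metric-by-metric difference between two ablation configurations."""
    return {
        key: round(ABLATION[later][key] - ABLATION[earlier][key], 3)
        for key in ABLATION[later]
    }


# Gain of the full model (D) over the baseline (A), and of each single step.
full_gain = delta("D", "A")
step_gains = {pair: delta(pair[1], pair[0]) for pair in ("AB", "BC", "CD")}
```

Going from A to D raises mAP_0.5 by 8.6 points and mAP_0.5:0.95 by 11.1 points, at a cost of about 3.16 M additional parameters and 10.2 additional GFLOPs. Most of the gain comes from the first step (A to B), while the last step (C to D) adds 0.6 and 1.2 points for only 0.087 M parameters.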