Acta Armamentarii (兵工学报), 2025, Vol. 46, Issue (S1): 250413-. doi: 10.12382/bgxb.2025.0413

Aerial Vehicle Detection Based on Multispectral Feature Fusion in Complex Scenes

ZHAO Zhijie1,*, SHEN Shiqi1, YING Zhanfeng2, LI Keting1, LI Ruixing1, TANG Shiwei1

  1 National Key Laboratory of Transient Physics, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
  2 School of Energy and Power Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
  • Received: 2025-05-27; Online: 2025-11-06

Abstract:

To address multispectral feature mismatch, complex-scene interference, and insufficient detection accuracy in vehicle detection from unmanned aerial vehicle (UAV) imagery, this paper proposes the Multi-Scale Gated Fusion Network (MSGF-Net), a YOLOv10-based network that jointly optimizes multiple feature spaces. The network uses dual-stream feature extraction and introduces a gated local-global fusion (GLGF) module into the backbone. This enables effective interaction and joint optimization of visible and infrared features across multiple feature spaces, which mitigates the impact of feature mismatch and strengthens feature representation. After the feature pyramid network, a cross-modulation block (CMB) performs pixel-wise weighted fusion of the multi-feature-space information, further improving the complementarity between the spectral features. Experiments on the public DroneVehicle dataset show that MSGF-Net achieves 83.4% mAP0.5 and 63.9% mAP0.5:0.95, a significant improvement over the single-channel model YOLOv10n. Compared with representative multimodal fusion algorithms such as C2Former and TSDADet, MSGF-Net improves mAP0.5:0.95 by more than 9 percentage points, indicating better accuracy and robustness in complex scenes such as low light, fog, and occlusion.

Key words: computer vision, multispectral feature fusion, YOLOv10, complex scene, target detection
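
The abstract describes pixel-wise weighted fusion of visible and infrared features but gives no formula. The following is a minimal NumPy sketch of one common way to realize such a fusion, not the authors' CMB or GLGF implementation. It assumes a sigmoid gate computed from the concatenated feature maps by a 1x1 convolution. The function name `gated_pixelwise_fusion`, the kernel `weight`, the `bias`, and the tensor shapes are all hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_pixelwise_fusion(vis, ir, weight, bias):
    """Fuse two (C, H, W) feature maps with a per-pixel, per-channel gate.

    vis, ir : visible and infrared feature maps, both of shape (C, H, W)
    weight  : hypothetical 1x1-convolution kernel of shape (C, 2C)
    bias    : hypothetical bias of shape (C,)
    """
    # Stack both modalities along the channel axis: (2C, H, W).
    stacked = np.concatenate([vis, ir], axis=0)
    # A 1x1 convolution is a channel-mixing matrix applied at every pixel.
    logits = np.einsum("oc,chw->ohw", weight, stacked) + bias[:, None, None]
    gate = sigmoid(logits)
    # Convex combination: gate near 1 favors visible, near 0 favors infrared.
    fused = gate * vis + (1.0 - gate) * ir
    return fused, gate


# Illustrative random inputs (assumed sizes, not taken from the paper).
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
vis = rng.normal(size=(C, H, W))
ir = rng.normal(size=(C, H, W))
weight = rng.normal(scale=0.1, size=(C, 2 * C))
bias = np.zeros(C)

fused, gate = gated_pixelwise_fusion(vis, ir, weight, bias)
print(fused.shape, float(gate.min()), float(gate.max()))
```

Because the gate lies strictly between 0 and 1, each fused value stays between the corresponding visible and infrared values. In a trained network the kernel and bias would be learned, so the gate could lean on the infrared stream in low light and on the visible stream in well-lit regions.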