Acta Armamentarii ›› 2025, Vol. 46 ›› Issue (S1): 250413. doi: 10.12382/bgxb.2025.0413

Aerial Vehicle Detection based on Multispectral Feature Fusion in Complex Scene

ZHAO Zhijie1,*, SHEN Shiqi1, YING Zhanfeng2, LI Keting1, LI Ruixing1, TANG Shiwei1

  1 School of National Key Laboratory of Transient Physics, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
  2 School of Energy and Power Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
  • Received: 2025-05-27 Online: 2025-11-06
  • Contact: ZHAO Zhijie

Abstract:

To address the challenges of multispectral feature mismatch, complex scene interference, and insufficient detection accuracy in vehicle detection from drone aerial imagery, this paper proposes the Multi-Scale Gated Fusion Network (MSGF-Net), a YOLOv10-based multi-feature-space joint optimization network. First, the network employs a dual-stream feature extraction network and introduces a gated local-global fusion (GLGF) module into the backbone. This enables effective interaction and joint optimization of visible and infrared features across multiple feature spaces, thereby mitigating the impact of feature mismatch and enhancing feature representation. Subsequently, after the feature pyramid network, MSGF-Net incorporates a cross-modulation block (CMB) module to perform pixel-wise weighted fusion of multi-feature-space information, further improving the complementarity between different spectral features. Experiments on the public DroneVehicle dataset demonstrate that MSGF-Net achieves 83.4% mAP0.5 and 63.9% mAP0.5:0.95, a significant improvement over the single-channel model YOLOv10n. Furthermore, compared with leading multi-modal fusion algorithms such as C2Former and TSDADet, MSGF-Net increases mAP0.5:0.95 by over 9 percentage points, supporting its superior accuracy and robustness.

Key words: computer vision, multi-spectral feature fusion, YOLOv10, complex scene, target detection
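
The abstract describes a gated, pixel-wise weighted fusion of visible and infrared features but does not give the formulation of the GLGF or CMB modules. As a rough illustration of the general idea only, the sketch below fuses two single-channel feature maps with a per-pixel sigmoid gate. The function name `gated_pixelwise_fusion`, the scalar weights `w_vis`, `w_ir`, and `bias`, and the gate form itself are all assumptions made for this example; they are not taken from the paper, where such weights would be learned and the features would be multi-channel tensors.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_pixelwise_fusion(visible, infrared, w_vis=1.0, w_ir=-1.0, bias=0.0):
    """Fuse two equally sized 2-D feature maps with a per-pixel gate.

    Hypothetical illustration, not the paper's GLGF or CMB module.
    For each pixel, a gate g in (0, 1) is computed from both inputs and
    the output is the convex combination g * visible + (1 - g) * infrared.
    """
    same_rows = len(visible) == len(infrared)
    same_cols = all(len(rv) == len(ri) for rv, ri in zip(visible, infrared))
    if not (same_rows and same_cols):
        raise ValueError("feature maps must have the same shape")

    fused = []
    for row_v, row_i in zip(visible, infrared):
        fused_row = []
        for v, i in zip(row_v, row_i):
            # Assumed gate: a scalar linear map of both modalities, then sigmoid.
            g = sigmoid(w_vis * v + w_ir * i + bias)
            fused_row.append(g * v + (1.0 - g) * i)
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    vis = [[1.0, 0.0]]
    ir = [[1.0, 2.0]]
    print(gated_pixelwise_fusion(vis, ir))
```

Because the gate lies strictly between 0 and 1, each fused value stays between the corresponding visible and infrared values; with the default weights assumed here, a pixel where the infrared response is stronger is pulled toward the infrared value.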