To address the challenges of multispectral feature mismatch, complex scene interference, and insufficient detection accuracy in vehicle detection from drone aerial imagery, this paper proposes the Multi-Scale Gated Fusion Network (MSGF-Net), a YOLOv10-based multi-feature space joint optimization network. First, the network employs a dual-stream feature extraction architecture and introduces a gated local-global fusion (GLGF) module into the backbone. This enables effective interaction and joint optimization of visible and infrared features across multiple feature spaces, thereby mitigating the impact of feature mismatch and enhancing feature representation. Subsequently, after the feature pyramid network, MSGF-Net incorporates a cross-modulation block (CMB) to perform pixel-wise weighted fusion of multi-feature space information, further improving the complementarity between different spectral features. Experiments on the public DroneVehicle dataset demonstrate that MSGF-Net achieves 83.4%
$\mathrm{mAP}_{0.5}$ and 63.9% $\mathrm{mAP}_{0.5:0.95}$, showing a significant improvement compared to the single-channel model YOLOv10n. Furthermore, compared to leading multi-modal fusion algorithms like C2Former and TSDADet, MSGF-Net increases the
$\mathrm{mAP}_{0.5:0.95}$ by over 9 percentage points, providing compelling evidence of its superior accuracy and robustness.
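
The idea of pixel-wise weighted fusion of visible and infrared features can be illustrated with a minimal NumPy sketch. This is not the GLGF or CMB module of MSGF-Net, whose internals are not given here. It is a generic gated fusion under stated assumptions: both feature maps share the shape (C, H, W), and a single gate value per pixel is produced by a hypothetical 1x1 convolution (the names `pixelwise_gated_fusion`, `w_gate`, and `b_gate` are illustrative only).

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping gate logits into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def pixelwise_gated_fusion(f_vis, f_ir, w_gate, b_gate=0.0):
    """Fuse visible and infrared feature maps with a per-pixel gate.

    Illustrative assumption, not the paper's module:
      f_vis, f_ir : arrays of shape (C, H, W)
      w_gate      : array of shape (2C,), weights of a hypothetical 1x1
                    convolution that maps the concatenated features to
                    one gate logit per pixel
      b_gate      : scalar bias of that convolution
    Returns the fused map (C, H, W) and the gate map (H, W).
    """
    if f_vis.shape != f_ir.shape:
        raise ValueError("feature maps must have the same shape")

    # Concatenate along the channel axis: (2C, H, W).
    stacked = np.concatenate([f_vis, f_ir], axis=0)

    # A 1x1 convolution is a weighted sum over channels at each pixel.
    logits = np.tensordot(w_gate, stacked, axes=([0], [0])) + b_gate

    # Gate in (0, 1): the weight given to the visible stream at each pixel.
    gate = sigmoid(logits)

    # Convex combination of the two modalities, pixel by pixel.
    fused = gate[None, :, :] * f_vis + (1.0 - gate[None, :, :]) * f_ir
    return fused, gate


# Usage with random stand-in features (C=4, H=W=8).
rng = np.random.default_rng(0)
f_vis = rng.normal(size=(4, 8, 8))
f_ir = rng.normal(size=(4, 8, 8))
w_gate = rng.normal(size=8)
fused, gate = pixelwise_gated_fusion(f_vis, f_ir, w_gate)
```

Because the gate lies strictly between 0 and 1, each fused value is a convex combination of the two inputs at that location, so the fusion can favor infrared features in dark regions and visible features elsewhere once the gate weights are learned.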