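China North Artificial Intelligence & Innovation Research Institute (中兵智能创新研究院有限公司), Beijing 100072
Collective Intelligence & Collaboration Laboratory (群体协同与自主实验室), Beijing 100072
China North Vehicle Research Institute (中国北方车辆研究所), Beijing 100072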
Corresponding author email: yangwangeducn@gmail.com
Received: 2025-06-30
Published online: 2025-12-25
Print publication: 2026-04
CHU Huazhen, XIE Tian, PAN Bei, et al. A Multimodal 3D Object Detection Method with Integrated Statistical Features[J]. Acta Armamentarii, 2026, 47(4): 250578. DOI: 10.12382/bgxb.2025.0578.
3D object detection is a critical component of autonomous driving and military intelligent perception, and its performance directly affects situational awareness and decision quality in complex environments. Although multimodal fusion has made progress, weak-texture target recognition, sparse point cloud representation, and cross-modal registration errors remain the main bottlenecks. To address these issues, a multimodal 3D object detection framework that incorporates statistical features is proposed to improve the model's adaptability and robustness across diverse real-world scenes. Point-level semantic features are acquired by constructing a refined projection mapping between the point cloud and the image. During voxelization, the mean and standard deviation of the local point cloud are introduced as statistical priors, which improves the consistency and stability of the feature representation. The fused multimodal features are then fed into a 3D detection backbone network for object recognition and localization. Experiments on the KITTI dataset show that the proposed method performs particularly well on small objects, improving average precision by 0.87% over the PV-RCNN model. These results support the practicality and robustness of the method in complex scenes and offer a new approach for 3D object detection.
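The abstract does not specify how the per-voxel statistics are laid out, so the following is only a minimal sketch of the idea of attaching a local mean and standard deviation to each voxel. It assumes cubic voxels, points given as plain (x, y, z) tuples, and the population standard deviation; the function name `voxel_statistics` and its return layout are hypothetical and are not the authors' implementation.

```python
from collections import defaultdict
from math import floor, sqrt


def voxel_statistics(points, voxel_size):
    """Group (x, y, z) points into cubic voxels and compute per-voxel statistics.

    Returns a dict mapping each integer voxel index to a tuple
    (mean_xyz, std_xyz, point_count). Hypothetical layout for illustration.
    """
    # Assign every point to the voxel whose integer index contains it.
    buckets = defaultdict(list)
    for p in points:
        key = tuple(floor(c / voxel_size) for c in p)
        buckets[key].append(p)

    stats = {}
    for key, pts in buckets.items():
        n = len(pts)
        # Per-axis mean of the points that fall inside this voxel.
        mean = tuple(sum(p[i] for p in pts) / n for i in range(3))
        # Per-axis population standard deviation; a one-point voxel yields 0.
        std = tuple(
            sqrt(sum((p[i] - mean[i]) ** 2 for p in pts) / n) for i in range(3)
        )
        stats[key] = (mean, std, n)
    return stats


if __name__ == "__main__":
    cloud = [(0.25, 0.5, 0.5), (0.75, 0.5, 0.5), (1.5, 0.0, 0.0)]
    for index, (mean, std, n) in sorted(voxel_statistics(cloud, 1.0).items()):
        print(index, mean, std, n)
```

In a pipeline of the kind the abstract describes, such statistics would presumably be concatenated with the projected image semantics before the detection backbone; that wiring is not shown here because the abstract does not detail it.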