
Institute of Computer Application Technology, China North Industries Group (NORINCO), Beijing 100089, China
Received: 01 September 2025
Online First: 03 February 2026
Published: 2025
Chunpu LIU, Yonghong GUO, Quanfa XIU, et al. A Metric Learning-based Supervision Method for Monocular Depth Estimation Models[J]. Acta Armamentarii, 2025, 46(S2): 250798. DOI: 10.12382/bgxb.2025.0798.
Monocular depth estimation aims to predict the depth values of a scene from a single image. The field still lacks convenient and effective feature-level supervision methods, which makes it difficult to learn a well-structured feature space for monocular depth estimation models during training. To address this problem, this paper proposes a metric learning-based supervision method for monocular depth estimation models. The method introduces a depth difference-based sample identification scheme to distinguish positive and negative feature samples in the feature space. Based on the regression nature of monocular depth estimation, a multi-range strategy is used to refine the handling of negative feature samples, thereby strengthening their supervisory effect on the feature space. Experiments across different scene types show that the proposed method provides effective supervision of feature-space learning during model training and significantly improves the prediction accuracy of the trained monocular depth estimation models.
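The abstract does not give the loss formula, the depth thresholds, or the margins, so the following is only a minimal sketch of how depth difference-based sample labelling and a multi-range treatment of negatives could be combined. It assumes a classical margin-based contrastive loss and is not the authors' formulation. The function name `multi_range_contrastive_loss`, the parameters `range_edges` and `margins`, and all numeric values are hypothetical and chosen for illustration.

```python
import numpy as np


def multi_range_contrastive_loss(features, depths,
                                 range_edges=(0.1, 0.5, 1.0),
                                 margins=(0.5, 1.0, 2.0)):
    """Illustrative pairwise loss over N feature samples.

    features: (N, D) feature vectors sampled from the model's feature map.
    depths:   (N,) ground-truth depth of each sample.
    range_edges: increasing depth-difference edges (hypothetical values).
        Pairs with a depth difference below range_edges[0] are positives;
        the remaining pairs are negatives, split into len(range_edges) ranges.
    margins: one margin per negative range (hypothetical values); a larger
        depth difference gets a larger required feature distance.
    """
    features = np.asarray(features, dtype=np.float64)
    depths = np.asarray(depths, dtype=np.float64)
    n = features.shape[0]

    # Pairwise Euclidean distance in feature space, shape (N, N).
    diff = features[:, None, :] - features[None, :, :]
    feat_dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Pairwise absolute depth difference, shape (N, N).
    depth_diff = np.abs(depths[:, None] - depths[None, :])

    # Range index per pair: 0 = positive, 1..K = negative range 1..K.
    range_idx = np.digitize(depth_diff, range_edges)
    off_diag = ~np.eye(n, dtype=bool)  # ignore pairs of a sample with itself
    pos_mask = (range_idx == 0) & off_diag
    neg_mask = (range_idx > 0) & off_diag

    # Look up the margin of each pair (index 0 is unused for positives).
    margin_table = np.array((0.0,) + tuple(margins))
    pair_margin = margin_table[range_idx]

    # Positives are pulled together; negatives are pushed apart up to
    # the margin of their depth-difference range.
    pos_loss = (feat_dist ** 2)[pos_mask]
    neg_loss = (np.maximum(0.0, pair_margin - feat_dist) ** 2)[neg_mask]

    count = int(pos_mask.sum() + neg_mask.sum())
    return float((pos_loss.sum() + neg_loss.sum()) / max(count, 1))
```

In this sketch, two samples with identical features are penalised more when their depths differ by 2.0 than when they differ by 0.3, which is one simple way to reflect the continuous, regression-type nature of depth in the handling of negatives. In practice such a term would be added to the usual depth regression loss with a weighting factor, which the abstract also leaves unspecified.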