
兵工学报 ›› 2024, Vol. 45 ›› Issue (10): 3631-3641. doi: 10.12382/bgxb.2023.0740


Multi-view Stereo Reconstruction Method Fusing an Attention Mechanism and Multi-layer Dynamic Deformable Convolution

SUN Kai1, ZHANG Cheng1,*, ZHAN Tian1, SU Di2

  1 Key Laboratory of Dynamics and Control of Flight Vehicle, Ministry of Education, School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China
    2 National Institute of Extremely-Weak Magnetic Field Infrastructure, Hangzhou 310051, Zhejiang, China

Multi-view Stereo Vision Reconstruction Network with Fusion Attention Mechanism and Multi-layer Dynamic Deformable Convolution

SUN Kai1, ZHANG Cheng1,*, ZHAN Tian1, SU Di2

  1 Key Laboratory of Dynamics and Control of Flight Vehicle, Ministry of Education, School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China
    2 National Institute of Extremely-Weak Magnetic Field Infrastructure, Hangzhou 310051, Zhejiang, China
  Received: 2023-08-10; Online: 2023-10-19

Abstract:

To address the problems that existing multi-view stereo (MVS) methods extract insufficient feature information from weakly textured regions and non-Lambertian surfaces and yield unsatisfactory reconstruction results, an AMDC-PatchmatchNet method that fuses an attention mechanism with multi-layer dynamic deformable convolution is proposed. A feature extraction network incorporating coordinate attention is constructed, which captures the edge shapes and texture features of the reconstructed object more accurately. An adaptive receptive field module based on dynamic deformable convolution is also fused into the network; it adaptively adjusts the size and shape of the receptive field according to features at different scales, yielding feature representations that combine global context with fine detail. Test results on the DTU dataset show that, compared with mainstream MVS methods, the proposed method improves the overall point-cloud reconstruction metric by 2.8%, and the generalization ability of the model is further verified on an aerial image dataset.
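The coordinate-attention feature extraction is only summarized at the abstract level. As a minimal, hypothetical PyTorch sketch of such a block (the class name, reduction ratio, and pooling layout are assumptions following the common coordinate-attention design, not details taken from the paper), attention factorized along the height and width directions looks roughly like this:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Illustrative coordinate-attention block: attention weights are computed
    separately along the height and width axes so that positional information
    (e.g., object edges) is preserved. All sizes here are assumptions."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Pool along each spatial direction to keep direction-aware position cues.
        x_h = x.mean(dim=3, keepdim=True)                      # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        return x * a_h * a_w  # reweight features with both attention maps
```

In a multi-scale feature pyramid of the kind used by PatchmatchNet-style networks, a block like this would typically be inserted after each scale's convolution stage; whether the paper places it exactly there cannot be inferred from the abstract alone.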

Key words: multi-view stereo vision, attention mechanism, dynamic deformable convolution, deep learning

Abstract:

Existing multi-view stereo (MVS) methods do not adequately extract feature information from weakly textured regions and non-Lambertian surfaces, and their reconstruction results are unsatisfactory. To address these problems, an AMDC-PatchmatchNet method that fuses an attention mechanism with multi-layer dynamic deformable convolution is proposed. In this method, a feature extraction network integrating coordinate attention is constructed, which captures the edge shapes and texture features of reconstructed objects more accurately. An adaptive receptive field module based on dynamic deformable convolution is also integrated into the feature extraction network, so that the size and shape of the receptive field are adjusted adaptively according to features at different scales, producing feature representations with both global context and fine detail. Test results on the DTU dataset show that the overall point-cloud reconstruction metric of the proposed method is improved by 2.8% compared with those of mainstream MVS methods, and the generalization ability of AMDC-PatchmatchNet is further verified on aerial image datasets.
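The adaptive receptive field module is likewise described only qualitatively. A hedged sketch of how a dynamic deformable convolution can adapt the receptive field, using torchvision's DeformConv2d (the module name AdaptiveReceptiveField, channel counts, and zero offset initialization are illustrative assumptions, not the paper's implementation), might look as follows:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AdaptiveReceptiveField(nn.Module):
    """Illustrative dynamic deformable convolution: a small conv predicts
    per-pixel sampling offsets, so the effective receptive field can shift
    and stretch toward informative structures instead of a fixed 3x3 grid."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Two offsets (dy, dx) are predicted for every kernel tap at every pixel.
        self.offset_pred = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        nn.init.zeros_(self.offset_pred.weight)  # start from the regular grid
        nn.init.zeros_(self.offset_pred.bias)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offset = self.offset_pred(x)   # (n, 2*k*k, h, w), content-dependent
        return self.deform(x, offset)  # sample features at the shifted taps

# Toy usage on a dummy feature map.
feat = torch.randn(1, 32, 64, 80)
out = AdaptiveReceptiveField(32)(feat)
print(out.shape)  # torch.Size([1, 32, 64, 80])
```

Applying such a layer at several scales of the feature pyramid is presumably what the "multi-layer" in the method's name refers to; the exact number and placement of layers are not specified in the abstract.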

Key words: multi-view stereo vision, attention mechanism, dynamic deformable convolution, deep learning
