Abstract:
Camera relative pose estimation computes the relative position and orientation of a camera between two views of a scene. It is a key problem in computer vision, with applications in image mosaicking, 3D reconstruction, SLAM, and so on. To obtain accurate results, traditional algorithms require repeated iteration, which makes them computationally expensive and time-consuming. Most existing deep learning methods take the left and right images as input and estimate the pose parameters from pixel-level semantic features, so they must process large amounts of data with complex model structures. To address these problems, a new deep learning network for camera relative pose estimation is proposed that takes point correspondences as input. After the correspondences between the two images are obtained, a classification network first divides them into inliers (correspondences with small matching error) and outliers (the remaining correspondences with large matching error). Then, taking all inliers as input, a pose-estimation network quickly computes the relative rotation and translation parameters of the camera. Experimental results show that the proposed method is 1.9 times faster than the traditional algorithm while achieving lower error; it is more accurate than existing deep learning algorithms based on pixel-level semantic features, yet processes less data with a lighter network structure; and the designed network structure can accommodate varying numbers of input correspondences.
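A minimal NumPy sketch of the idea behind such a correspondence-based pose network (all names, layer sizes, and the quaternion-plus-translation output are illustrative assumptions, not the paper's actual architecture): each correspondence is a 4-vector (x1, y1, x2, y2), the same per-correspondence weights are applied to every row, and a pooling step collapses the variable-length set into a fixed-size feature, which is how one network can accept any number of correspondences.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical shared weights: per-correspondence layer (4 -> 64)
# and a pose head (64 -> 7); sizes are illustrative only.
W1 = rng.standard_normal((4, 64)) * 0.1
W2 = rng.standard_normal((64, 7)) * 0.1

def pose_from_correspondences(matches):
    """matches: (N, 4) array of point correspondences -> 7-vector
    (illustratively: 4 quaternion + 3 translation parameters)."""
    feats = relu(matches @ W1)   # same weights for every correspondence
    pooled = feats.max(axis=0)   # pooling: output independent of N
    return pooled @ W2           # relative rotation + translation

# The output size is fixed regardless of how many inliers are supplied.
print(pose_from_correspondences(rng.standard_normal((100, 4))).shape)  # (7,)
print(pose_from_correspondences(rng.standard_normal((500, 4))).shape)  # (7,)
```

Because the weights are shared across correspondences and the pooling is taken over the set, the parameter count does not grow with the number of matches, which is consistent with the lighter network structure claimed in the abstract.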