Journal Article Details
PATTERN RECOGNITION, Vol. 102
PoseConvGRU: A Monocular Approach for Visual Ego-motion Estimation by Learning
Article
Zhai, Guangyao [1]; Liu, Liang [1]; Zhang, Linjian [1,2]; Liu, Yong [1]; Jiang, Yunliang [3]
[1] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Peoples R China
[2] Netease Inc, Fuxi AI Lab, Guangzhou, Peoples R China
[3] Huzhou Univ, Sch Informat Engn, Huzhou, Peoples R China
Keywords: Ego-motion; Pose estimation; Deep learning; Recurrent Convolutional Neural Networks; Data augmentation
DOI: 10.1016/j.patcog.2019.107187
Source: Elsevier
Abstract

Visual ego-motion estimation is a longstanding problem: estimating the movement of a camera from images. Learning-based ego-motion estimation methods have seen increasing attention owing to their desirable properties of robustness to image noise and independence from camera calibration. In this work, we propose a data-driven, learning-based approach to visual ego-motion estimation for a monocular camera. We use an end-to-end learning approach that allows the model to learn a mapping from input image pairs to the corresponding ego-motion, parameterized as a 6-DoF transformation matrix. We introduce a two-module Long-term Recurrent Convolutional Neural Network called PoseConvGRU. The feature-encoding module encodes the short-term motion feature in an image pair, while the memory-propagating module captures the long-term motion feature across consecutive image pairs. The visual memory is implemented with convolutional gated recurrent units, which allow information to propagate over time. At each time step, two consecutive RGB images are stacked into a 6-channel tensor, from which the feature-encoding module learns to extract motion information and estimate poses. The sequence of output feature maps is then passed through the memory-propagating module to generate the relative transformation pose for each image pair. In addition, we design a series of data augmentation methods to avoid overfitting and to improve the model's performance in challenging scenarios such as high-speed or reverse driving. We evaluate the proposed approach on the KITTI Visual Odometry benchmark and the Malaga 2013 dataset. The experiments show that the proposed method performs competitively with state-of-the-art monocular geometric and learning-based methods, and they encourage further exploration of learning-based approaches to camera ego-motion estimation, even though geometric methods already demonstrate promising results. (C) 2020 Elsevier Ltd. All rights reserved.
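To make the two-module pipeline described in the abstract concrete, below is a minimal PyTorch sketch of the idea: a small CNN encodes each stacked 6-channel image pair, a convolutional GRU cell propagates a spatial memory across time steps, and a regression head outputs a 6-DoF relative pose per pair. All layer widths, kernel sizes, and names (ConvGRUCell, PoseConvGRUSketch) are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class ConvGRUCell(nn.Module):
    """Standard convolutional GRU cell: the gates are computed with 2-D
    convolutions so the hidden state keeps its spatial layout.
    Hyperparameters here are illustrative, not taken from the paper."""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        pad = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=pad)  # update & reset gates
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=pad)       # candidate state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde


class PoseConvGRUSketch(nn.Module):
    """Two-module pipeline from the abstract: a feature-encoding CNN on each
    stacked image pair (6-channel tensor), a memory-propagating ConvGRU over
    time, and a head regressing a 6-DoF relative pose per pair."""

    def __init__(self, hid_ch: int = 128):
        super().__init__()
        # feature-encoding module (an illustrative small CNN)
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, hid_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # memory-propagating module
        self.gru = ConvGRUCell(hid_ch, hid_ch)
        # pose head: 3 translation + 3 rotation parameters
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(hid_ch, 6)
        )

    def forward(self, pairs):
        # pairs: (B, T, 6, H, W) -- T pairs of stacked consecutive RGB frames
        B, T = pairs.shape[:2]
        h, poses = None, []
        for t in range(T):
            f = self.encoder(pairs[:, t])
            h = torch.zeros_like(f) if h is None else h  # zero-initialized memory
            h = self.gru(f, h)
            poses.append(self.head(h))
        return torch.stack(poses, dim=1)  # (B, T, 6) relative poses


# Usage sketch: 2 sequences of 5 image pairs at an arbitrary 64x192 resolution.
model = PoseConvGRUSketch()
out = model(torch.randn(2, 5, 6, 64, 192))  # -> shape (2, 5, 6)
```

Keeping the GRU state convolutional (rather than flattening features first) preserves spatial structure in the memory, which is the property the abstract attributes to the convolutional gated recurrent units.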

License

Free   

Preview
Attachments
File                                  Size    Format  View
10_1016_j_patcog_2019_107187.pdf      4577 KB PDF     download
Document Metrics
Downloads: 1    Views: 0