Thesis Details
3D Ground Truth Generation Using Pre-Trained Deep Neural Networks
Machine Learning;Computer Vision;Autonomous Driving;Deep Learning;Object Detection;Data Mining
Lee, Jungwook ; advisor: Waslander, Steven ; affiliation: Faculty of Engineering
University of Waterloo
Keywords: Object Detection; Autonomous Driving; Master Thesis; Data Mining; Computer Vision; Machine Learning; Deep Learning
Others  :  https://uwspace.uwaterloo.ca/bitstream/10012/14720/3/Lee_Jungwook.pdf
Canada | English
Source: UWSPACE Waterloo Institutional Repository
PDF
【 Abstract 】

Training 3D object detectors on publicly available data has been limited to small datasets due to the large amount of effort required to generate annotations. The difficulty of labeling in 3D using 2.5D sensors, such as LIDAR, is attributed to the high spatial reasoning skills required to deal with occlusion and partial viewpoints. Additionally, current methods to label 3D objects are cognitively demanding due to frequent task switching. Reducing both task complexity and the amount of task switching done by annotators is key to reducing the effort and time required to generate 3D bounding box annotations. We therefore seek to reduce the burden on annotators by leveraging existing 3D object detectors based on deep neural networks.

This work introduces a novel ground truth generation method that combines human supervision with pre-trained neural networks to generate per-instance 3D point cloud segmentation, 3D bounding boxes, and class annotations. The annotators provide object anchor clicks which act as seeds to generate instance segmentation results in 3D. The points belonging to each instance are then used to regress object centroids, bounding box dimensions, and object orientations. The deep neural network model used to generate the segmentation masks and bounding box parameters is based on the PointNet architecture.

We develop our approach using the KITTI dataset to analyze the quality of the generated ground truth. The neural network model is trained on the KITTI training split, and the 3D bounding box outputs are generated using annotation clicks collected from the validation split. The validation split of the KITTI detection dataset contains 3712 frames of point cloud and image scenes, and it took 16.35 hours to label with the proposed method. Based on these results, our approach is 19 times faster than the latest published 3D object annotation scheme.
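The two stages described above (a click seeding an instance, and the instance's points yielding box parameters) can be illustrated with a minimal sketch. The function names and the geometric heuristics here are hypothetical stand-ins: in the thesis, both the instance segmentation and the box regression are performed by a PointNet-based network, not by the radius crop and axis-aligned extents shown below.

```python
import numpy as np

def crop_around_click(points, click_xyz, radius=4.0):
    """Keep LIDAR points within `radius` meters of the annotator's anchor click.

    Hypothetical stand-in for the click-seeded instance segmentation step;
    the actual method feeds the click and local points to a PointNet-style
    network that predicts a per-point instance mask.
    """
    dist = np.linalg.norm(points - click_xyz, axis=1)
    return points[dist < radius]

def regress_box(instance_points):
    """Toy box regression: centroid plus axis-aligned extents.

    The actual model regresses centroid, dimensions, and orientation;
    this sketch only illustrates the inputs and outputs of that stage
    (orientation is omitted here).
    """
    centroid = instance_points.mean(axis=0)
    dims = instance_points.max(axis=0) - instance_points.min(axis=0)
    return centroid, dims

# Example: three points form the clicked object; two are background clutter.
pts = np.array([[10.1, 0.0, 0.0],
                [ 9.9, 0.2, 0.0],
                [10.0, 0.0, -0.1],
                [ 0.0, 0.0, 0.0],
                [50.0, 5.0, 1.0]])
instance = crop_around_click(pts, np.array([10.0, 0.0, 0.0]))
centroid, dims = regress_box(instance)
```

The key design point the sketch mirrors is that one anchor click per object replaces drawing a full 3D box by hand, which is where the reported reduction in annotation effort comes from.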
Additionally, it is found that annotators spent less time per object as the number of objects in a scene increased, making the method very efficient for multi-object labeling. Furthermore, the quality of the 3D bounding boxes generated with the labeling method is compared against the KITTI ground truth. It is shown that the model performs on par with current state-of-the-art 3D detectors and that the labeling procedure does not negatively impact the output quality of the bounding boxes. Lastly, the proposed scheme is applied to previously unseen data from the Autonomoose self-driving vehicle to demonstrate the generalization capabilities of the network.

【 Preview 】
Attachment List
Files Size Format View
3D Ground Truth Generation Using Pre-Trained Deep Neural Networks 18016KB PDF download
Document Metrics
Downloads: 65    Views: 64