An image is nothing but a projection of the physical world around us, where objects do not occur randomly but follow certain spatial rules. Many existing computer vision approaches tend to ignore this aspect of understanding images. In this work, we build representations and propose strategies for exploiting such constraints towards extracting a 3D understanding of a scene from its single image. We model a scene in terms of its spatial layout abstracted as a box, object cuboids, camera viewpoint, and interactions between them. We take a supervised approach towards estimation, and learn models from training data that is fully annotated with the 3D spatial extent of objects, walls, and floor. We assume the world is populated with axis-aligned objects and surfaces, and exploit constrained appearance models which use geometric cues from the scene. Our methods are tailored towards indoor scenes that are highly structured and require careful spatial reasoning. We show that our box layout representation is able to capture the full spatial extent of a 3D scene, which we can successfully estimate even for heavily cluttered rooms. Similarly, by exploiting the geometric constraints offered by the scene, we can approximate the extent of the objects as cuboids in 3D. The box layout provides rich contextual information for detecting objects. We show that modeling the 3D interactions between object cuboids and scene layout improves object detection. Finally, we show how to use our 3D spatial layout models together with object cuboid models to predict the free space in the scene.
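To make the scene abstraction concrete, the sketch below shows one possible way to represent a room layout box and axis-aligned object cuboids, and to approximate free space by carving the cuboids out of the room volume on a coarse voxel grid. This is only an illustrative sketch, not the estimation method described in this work (which learns appearance models and geometric cues from annotated images); all class names, parameters, and the example dimensions are assumptions introduced here.

```python
# Illustrative sketch of the scene abstraction: a room as an axis-aligned 3D box,
# objects as axis-aligned cuboids, and free space approximated by voxel carving.
from dataclasses import dataclass

import numpy as np


@dataclass
class Cuboid:
    """Axis-aligned 3D box given by its min/max corners (in meters)."""
    min_corner: np.ndarray  # shape (3,)
    max_corner: np.ndarray  # shape (3,)

    def contains(self, points: np.ndarray) -> np.ndarray:
        """Boolean mask over points of shape (N, 3) lying inside the cuboid."""
        return np.all((points >= self.min_corner) & (points <= self.max_corner), axis=1)


@dataclass
class Scene:
    """Room layout box plus object cuboids, mirroring the box/cuboid abstraction above."""
    room: Cuboid
    objects: list  # list of Cuboid

    def free_space(self, resolution: float = 0.1) -> float:
        """Approximate free-space volume (m^3): room voxels not covered by any object cuboid."""
        lo, hi = self.room.min_corner, self.room.max_corner
        # Sample voxel centers on a regular grid inside the room box.
        axes = [np.arange(l + resolution / 2, h, resolution) for l, h in zip(lo, hi)]
        grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
        occupied = np.zeros(len(grid), dtype=bool)
        for obj in self.objects:
            occupied |= obj.contains(grid)
        return float(np.count_nonzero(~occupied)) * resolution ** 3


# Hypothetical example: a 4m x 3m x 2.5m room containing one bed-sized cuboid.
room = Cuboid(np.array([0.0, 0.0, 0.0]), np.array([4.0, 3.0, 2.5]))
bed = Cuboid(np.array([0.5, 0.5, 0.0]), np.array([2.5, 1.9, 0.6]))
print(Scene(room, [bed]).free_space())  # ~28.3 (30 m^3 room minus 1.68 m^3 cuboid)
```

In the actual setting described above, the room box and object cuboids would first have to be estimated from a single image using the learned layout and cuboid models before such a free-space computation could be applied.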