Dissertation Details
Structured visual understanding, generation and reasoning
Keywords: Scene graph; Structured visual understanding; Visual generation; Reasoning; Vision and language
Author: Yang, Jianwei
Advisor: Parikh, Devi
Committee: Batra, Dhruv; Crandall, David; Lee, Stefan; Hoffman, Judy
University: Georgia Institute of Technology
Department:Interactive Computing
Full text (PDF): https://smartech.gatech.edu/bitstream/1853/62744/1/YANG-DISSERTATION-2020.pdf
Country: United States | Language: English
Source: SMARTech Repository
【 Abstract 】

The world around us is highly structured. In the real world, a single object usually consists of multiple components organized in a particular structure (e.g., a person has different body parts), and multiple objects usually coexist in a scene and interact with each other in predictable ways (e.g., a man playing basketball). This structure manifests itself both in the visual data that captures the world around us and in the text describing it, and can therefore provide a strong inductive bias for various vision tasks. In this thesis, we focus on exploiting the structures present in visual data to improve visual understanding, generation and reasoning. Specifically, for visual understanding, we model structure at different levels to improve image classification, scene graph generation and representation learning. For visual generation, we exploit the foreground-background structure of images to generate them layer by layer, which reduces blending artifacts between foreground and background. Finally, we use structured visual representations as an intermediate interface that bridges visual perception and reasoning, and apply them to vision and language tasks including image captioning and visual question generation. Through extensive experiments, we demonstrate that leveraging structure in visual data not only improves model performance but also makes vision and language models more grounded and interpretable.
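The layer-wise generation mentioned in the abstract rests on pasting a foreground layer onto a background through a mask, so that each pixel is decided by one layer or a controlled mix of both rather than by an unstructured blend. The snippet below is a minimal sketch of that compositing step only, assuming grayscale images stored as nested lists of intensities in [0, 1] and an already-predicted soft mask. The function names `composite_layers` and `generate_layerwise` are hypothetical; this is plain alpha compositing, not the generator architecture used in the thesis.

```python
from typing import List, Sequence, Tuple

# Hypothetical image type: a grayscale image as rows of intensities in [0, 1].
Image = List[List[float]]


def composite_layers(background: Image, foreground: Image, mask: Image) -> Image:
    """Paste a foreground layer onto a background using a soft mask.

    Each output pixel is m * f + (1 - m) * b, where m in [0, 1] is the mask
    value: m = 1 keeps the foreground pixel, m = 0 keeps the background
    pixel, and fractional values blend the two.
    """
    if not (len(background) == len(foreground) == len(mask)):
        raise ValueError("layers must have the same height")
    out: Image = []
    for b_row, f_row, m_row in zip(background, foreground, mask):
        if not (len(b_row) == len(f_row) == len(m_row)):
            raise ValueError("layers must have the same width")
        out.append([m * f + (1.0 - m) * b for b, f, m in zip(b_row, f_row, m_row)])
    return out


def generate_layerwise(background: Image, layers: Sequence[Tuple[Image, Image]]) -> Image:
    """Composite (foreground, mask) layers onto a background, back to front."""
    canvas = [row[:] for row in background]  # copy so the input is not mutated
    for foreground, mask in layers:
        canvas = composite_layers(canvas, foreground, mask)
    return canvas


if __name__ == "__main__":
    bg = [[0.2, 0.2], [0.2, 0.2]]
    fg = [[1.0, 1.0], [1.0, 1.0]]
    mask = [[1.0, 0.0], [0.5, 0.0]]
    print(generate_layerwise(bg, [(fg, mask)]))
```

In this toy example the top-left pixel comes entirely from the foreground, the right column stays background, and the bottom-left pixel is an even mix. Keeping background and foreground as separate layers means the mask alone controls where they meet, which is the intuition behind the reduced blending artifacts.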

【 Attachments 】
Structured visual understanding, generation and reasoning (PDF, 43908 KB)