Dissertation Details
Learning and adapting visual models for multiple specialized tasks
Mallya, Arun Mohanray
Keywords: Action Recognition, Visual Relationship Detection, Image Situations, Multi Task Training
Other: https://www.ideals.illinois.edu/bitstream/handle/2142/101314/MALLYA-DISSERTATION-2018.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

A key requirement for any agent that wishes to interact with the visual world is the ability to understand the behavior of objects in the scene, primarily through visual means. We humans, through our cognitive system, are able to localize other people and objects in scenes, understand their relationship to the surrounding environment, and reason about not only their actions and attributes, but also about concepts which require knowledge beyond what is afforded by the pixels in visual input, such as possible future states, motion, a person’s motivations, and so on. In this thesis, we outline work that takes small steps towards solving this daunting task of replicating the human visual cognitive system.

This dissertation presents methods for predicting actions, interactions with objects, and increasingly structured scenarios from single images. We devise simple methods that make use of a variety of cues by taking into account the structure inherent in the tasks we aim to solve. We show that by solving these tasks as an intermediate step and using their outputs as features, we can develop methods that operate on visual and language inputs to improve performance on tasks that require high-level image information, such as answering questions about images and producing captions for images.

One issue that accompanies the learning of multiple tasks with separate deep networks, such as the work described above, is the need to store separate models, which increases storage requirements and affects scalability. We formulate and present two novel methods that draw inspiration from network pruning and weight quantization that can reuse parts of an existing network for learning new tasks with minimal additional overhead, without hurting performance on tasks that were learned earlier.

【 Preview 】
Attachments
Files Size Format View
Learning and adapting visual models for multiple specialized tasks 48091KB PDF download
Document metrics
Downloads: 16   Views: 27