Prior to deep learning, it was common to approach computer vision problems by describing a model that could be learned from a relatively small amount of data by incorporating domain knowledge. For example, image prediction tasks such as intrinsic image decomposition were approached by thinking about what reflectance and shading look like: reflectance as a Mondrian-like image, shading as a smooth image. The difficult part was formalizing this prior domain knowledge into a model (the classical formulation is sketched at the end of this section).

Deep learning has changed this paradigm. While deep learning hasn't eliminated the value of domain knowledge, for many problems we now think in terms of model architectures and losses instead. A choice of model architecture limits the types of results possible, but neural networks tend to be less task-dependent than domain-specific methods. In fact, for almost any problem there is a fairly simple recipe for using neural networks to get good results: (1) collect labeled data, (2) choose a network architecture, and (3) define a loss and train (a minimal sketch of this recipe follows below). However, there are still tasks where we cannot collect a lot of labeled data of a particular form (Grave OCR), tasks where we cannot easily describe an unambiguous loss on easily collected data (intrinsic image decomposition; image correction, including rain, cracks, and glare), and tasks where we want to solve many similar problems without training each one independently (face adjustment).

A unifying theme of my work is that generic representations can be learned from data, and that those learned representations can make otherwise under-constrained problems tractable. Before deep learning, this generic representation took the form of a LEARCH-based model; more recent work builds on autoencoder representations. For authoring decompositions and for removing rain, cracks, and glare, autoencoder models are learned from synthetic data and then shown to be applicable to real images. For learning to decompose rainy images, cycle-consistency losses are incorporated so that the model can learn without examples of de-rained images (see the sketch below). In face-to-face transformation, an attribute-sensitive image-to-image representation is pretrained, and a low-dimensional representation for image attribute transformations is then constructed on top of it. In Grave OCR, we learn to generate data and learn the image decomposition model simultaneously, allowing us to predict image annotations without labeled data. Finally, in evaluating intrinsic image decomposition, we explore evaluating intrinsic image models using human perception annotations. We show that evaluation against human annotations has some issues and does not appear to differentiate between qualitatively different models. We propose a new task-specific procedure for evaluating intrinsic image decomposition using re-painting and re-shading, and show that it can identify differences between models that current evaluations do not detect.
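
To make the classical intrinsic image prior concrete, the standard textbook formulation (stated here for illustration, not taken from any specific model above) writes the observed image as a pointwise product of reflectance and shading:

\[
I(x) = R(x)\, S(x),
\]

where the reflectance \(R\) is encouraged to be piecewise constant (Mondrian-like) and the shading \(S\) to vary smoothly; in practice one often works in the log domain, where the product becomes a sum, \(\log I = \log R + \log S\).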
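As an illustration of the three-step recipe, here is a minimal, hypothetical sketch using PyTorch; the random tensors stand in for collected labeled data, and the tiny classifier and hyperparameters are placeholders rather than any architecture used in this work.

```python
import torch
import torch.nn as nn

# Step 1: collect labeled data (random stand-ins for a real dataset).
images = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 10, (256,))

# Step 2: choose a network architecture (a toy classifier).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Step 3: define a loss and train.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```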
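For learning to de-rain without paired examples, a cycle-consistency objective can be stated in a few lines. The sketch below is illustrative only: `derain` and `add_rain` are hypothetical translation networks, and the L1 round-trip penalty is one common choice, not necessarily the exact loss used in the work above.

```python
import torch

def cycle_consistency_loss(rainy, clean, derain, add_rain):
    # rainy and clean are unpaired batches drawn from each domain.
    # A rainy image sent through derain and back through add_rain
    # should reconstruct the original rainy image, and vice versa.
    rainy_cycle = add_rain(derain(rainy))
    clean_cycle = derain(add_rain(clean))
    return (torch.mean(torch.abs(rainy_cycle - rainy))
            + torch.mean(torch.abs(clean_cycle - clean)))

# Toy check with identity "networks": the loss is exactly zero.
x = torch.rand(1, 3, 8, 8)
print(cycle_consistency_loss(x, x, lambda t: t, lambda t: t))
```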