Journal Article Details
Sensors
Extracting Effective Image Attributes with Refined Universal Detection
Xinyu Xiao [1], Lifei Song [1], Chunhong Pan [1], Qiang Yu [1], Chunxia Zhang [2]
[1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
[2] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Keywords: attribute extraction; Refined Universal Detection; word tree; image captioning
DOI  :  10.3390/s21010095
Source: DOAJ
Abstract

Recently, image attributes containing high-level semantic information have been widely used in computer vision tasks, including visual recognition and image captioning. Existing attribute extraction methods map visual concepts to the probabilities of frequently used words by directly applying Convolutional Neural Networks (CNNs). Two main problems typically arise in these methods. First, words of different parts of speech (POSs) are handled in the same way, yet non-nominal words can hardly be mapped to visual regions through CNNs alone. Second, synonymous nominal words are treated as independent, distinct words, so their similarities are ignored. In this paper, a novel Refined Universal Detection (RUDet) method is proposed to solve these two problems. Specifically, a Refinement (RF) module is designed to extract refined attributes of non-nominal words based on the attributes of nominal words and visual features. In addition, a Word Tree (WT) module is constructed to integrate synonymous nouns, which ensures that similar words hold similar and more accurate probabilities. Moreover, a Feature Enhancement (FE) module is adopted to strengthen the mining of visual concepts at different scales. Experiments conducted on the large-scale Microsoft (MS) COCO dataset illustrate the effectiveness of the proposed method.
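The Word Tree idea described above — integrating synonymous nouns so that similar words hold similar probabilities — can be sketched roughly as follows. The synonym grouping, the max-merge rule, and all function and variable names here are illustrative assumptions, not the paper's actual algorithm:

```python
def merge_synonyms(word_probs, synonym_groups):
    """Share the strongest detection probability within each synonym group.

    word_probs: dict mapping each word to its detected probability.
    synonym_groups: list of lists, each listing words treated as synonyms
    (e.g. nodes under one concept in a word tree). Merge rule (max) is an
    assumption for illustration.
    """
    merged = dict(word_probs)
    for group in synonym_groups:
        members = [w for w in group if w in word_probs]
        if not members:
            continue
        # Assign every synonym the group's maximum probability,
        # so rare synonyms inherit the score of their common counterpart.
        p = max(word_probs[w] for w in members)
        for w in members:
            merged[w] = p
    return merged

probs = {"cat": 0.9, "kitty": 0.2, "dog": 0.7}
print(merge_synonyms(probs, [["cat", "kitty"]]))
```

Under this sketch, "kitty" inherits the higher probability detected for "cat", while unrelated words such as "dog" are untouched — the effect the WT module aims for, though the real module operates over a full tree of noun concepts.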

License

Unknown
