CAAI Transactions on Intelligence Technology
Learning DALTS for cross-modal retrieval
Article
Zheng Yu [1]; Wenmin Wang [1]
[1] School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University
Keywords: information retrieval; text analysis; natural language processing; image segmentation; image retrieval; recurrent neural nets; cross-modal retrieval; DALTS; domain-adaptive limited text space; image space; Flickr8K; Flickr30K; MSCOCO; text space features; B6135 (Optical, image and video signal processing); C5260B (Computer vision and image processing techniques); C5290 (Neural computing techniques); C6130D (Document processing techniques); C6180N (Natural language processing); C7250R (Information retrieval techniques)
DOI: 10.1049/trit.2018.1051
Subject classification: Mathematics (General)
Source: Wiley
【 Abstract 】
Cross-modal retrieval aims to find an appropriate subspace in which the similarity across different modalities, such as image and text, can be directly measured. In this study, unlike most existing works, the authors propose a novel model for cross-modal retrieval based on a domain-adaptive limited text space (DALTS) rather than a common space or an image space. Experimental results on three widely used datasets, Flickr8K, Flickr30K and Microsoft Common Objects in Context (MSCOCO), show that the proposed method, dubbed DALTS, is able to learn superior text space features that effectively capture the information necessary for cross-modal retrieval. Meanwhile, DALTS achieves promising improvements in cross-modal retrieval accuracy over current state-of-the-art methods.
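To make the retrieval setting concrete, the sketch below shows the generic pipeline the abstract describes: both modalities are embedded into one shared (here, text-like) space and ranked by cosine similarity. This is a minimal illustration, not the authors' DALTS model; the encoders are hypothetical random projections, and the feature sizes and `EMBED_DIM` are assumptions.

```python
# Minimal sketch of similarity-based cross-modal retrieval: embed both
# modalities into a shared space, then rank by cosine similarity.
# The projections below are hypothetical stand-ins, NOT the DALTS model.
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 256  # assumed dimensionality of the learned text space

# Hypothetical learned projections standing in for the image encoder and
# the text encoder that map each modality into the shared text space.
W_image = rng.standard_normal((4096, EMBED_DIM))  # e.g. from CNN features
W_text = rng.standard_normal((300, EMBED_DIM))    # e.g. from sentence features

def embed(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Project modality-specific features into the shared space, L2-normalised."""
    z = features @ projection
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def retrieve(query: np.ndarray, gallery: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k gallery items most similar to the query.

    With L2-normalised embeddings, the dot product equals cosine similarity.
    """
    scores = gallery @ query
    return np.argsort(-scores)[:k]

# Toy usage: image-to-text retrieval over a gallery of 100 captions.
image_feat = rng.standard_normal(4096)           # one image's raw features
caption_feats = rng.standard_normal((100, 300))  # raw features of 100 captions

q = embed(image_feat, W_image)
gallery = embed(caption_feats, W_text)
print(retrieve(q, gallery))  # indices of the 5 best-matching captions
```

Text-to-image retrieval is the symmetric case: embed a sentence as the query and a gallery of images with `W_image`.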
【 License 】
CC BY | CC BY-ND | CC BY-NC | CC BY-NC-ND
【 Preview 】
| Files | Size | Format | View |
|---|---|---|---|
| RO202107100000062ZK.pdf | 323 KB | PDF | |