Journal Article Details
Volume: 142
Multi-scale image-text matching network for scene and spatio-temporal images
Article
Keywords: RETRIEVAL; SYSTEM
DOI: 10.1016/j.future.2023.01.004
Source: SCIE
【 Abstract 】

In recent years, advances in deep learning have driven significant progress in computer vision and natural language processing, and linking the two fields has attracted growing attention. The main focus areas are spatio-temporal images taken by satellites or aircraft and scene images containing people and other objects. Existing methods have yielded excellent results in image-text matching, but there is still room for improvement in effectively using coarse- and fine-grained information. We propose a method that addresses this problem using multi-scale graph convolutional neural networks. We extract multi-scale features of images and texts separately for matching. Global matching computes the overall image-sentence similarity, and local matching computes the image-word similarity. Local matching is divided into two stages. First, node-level matching learns the correspondence between regions and words. Next, structure-level matching learns the correspondence between regions and phrases, making the matching more comprehensive. Finally, we verify our model on the Flickr30k, MSCOCO and RSICD datasets. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
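
The split between global (image-sentence) and local (image-word) similarity described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it omits the graph convolutional layers and the structure-level (region-phrase) stage. The mean pooling, the max-over-regions rule, the weight `alpha`, and all function names are assumptions made purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def global_similarity(regions, words):
    """Image-sentence score: cosine of mean-pooled region and word features (assumed pooling)."""
    return cosine(mean_vector(regions), mean_vector(words))

def local_similarity(regions, words):
    """Image-word score: each word takes its best-matching region, then scores are averaged (assumed rule)."""
    return sum(max(cosine(r, w) for r in regions) for w in words) / len(words)

def combined_similarity(regions, words, alpha=0.5):
    """Weighted sum of global and local scores; alpha is a hypothetical mixing weight."""
    g = global_similarity(regions, words)
    l = local_similarity(regions, words)
    return alpha * g + (1.0 - alpha) * l

# Toy features: two image regions and two words in a shared 2-D embedding space.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [0.0, 2.0]]
score = combined_similarity(regions, words)
```

In this toy input every word has a perfectly aligned region, so the local score is at its maximum, while the global score is slightly lower because mean pooling blurs the per-word alignment. That gap is the kind of coarse versus fine-grained difference the abstract refers to.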

【 License 】

Free   
