Journal article

【Abstract】

The goal of person text-image matching is to retrieve images of specific pedestrians using natural language descriptions. Although considerable progress has been made in person text-image matching, existing methods still face two challenges. First, because the semantic information in the features is ambiguous, aligning textual features with their corresponding image features is difficult. Second, the absence of semantic information in each local pedestrian feature makes it hard for the network to extract robust features that match across both modalities. To address these issues, we propose a model for explicit semantic feature extraction and effective information supplement. On the one hand, by attaching consistent and clear semantic information to the textual and image features, coarse-grained alignment between the textual features and the corresponding image features is achieved. On the other hand, an information supplement network is proposed, which captures the relationships between the local features of each modality and supplements them to obtain more complete local features with semantic information. Finally, the local features are concatenated into a comprehensive global feature, which is capable of precisely aligning the textual features with the features of the described image. We conducted extensive experiments on the CUHK-PEDES and RSTPReid datasets, and the experimental results show that our method achieves better performance. Additionally, the ablation experiments demonstrate the effectiveness of each module designed in this paper.
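
The abstract names the pipeline but not its mechanics, so the following is only a minimal sketch of the general idea: local features of one modality are supplemented with information from the other local features, concatenated into a global feature, and compared across modalities. Everything here is an assumption made for illustration, not the authors' network: the function names are hypothetical, scaled dot-product attention stands in for the unspecified relation-capturing step, and cosine similarity stands in for the matching score. It assumes at least two local features per sample.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def supplement_local_features(local):
    """Add to each local feature a weighted sum of the other local features.

    local: array of shape (K, D), K >= 2 local features of dimension D.
    The relation weights come from scaled dot-product attention, which is
    an assumed stand-in for the relation module described in the abstract.
    """
    k, d = local.shape
    scores = local @ local.T / np.sqrt(d)   # (K, K) pairwise relations
    np.fill_diagonal(scores, -np.inf)       # each part looks only at the others
    weights = softmax(scores, axis=1)       # rows sum to 1
    return local + weights @ local          # supplemented local features


def global_feature(local):
    """Concatenate supplemented local features and L2-normalize the result."""
    g = supplement_local_features(local).reshape(-1)
    return g / np.linalg.norm(g)


def match_score(text_local, image_local):
    """Cosine similarity between the text and image global features."""
    return float(global_feature(text_local) @ global_feature(image_local))
```

In a retrieval setting, `match_score` would be evaluated between one text query and every gallery image, and the images ranked by the score; a trained model would learn the features and relations rather than use raw dot products.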

Attachments
Files	Size	Format	View
RO202310105722143ZK.pdf	2174KB	PDF	download

Frontiers in Physics
Feature semantic alignment and information supplement for Text-based person search
Physics
Hang Zhou¹, Xuening Tian², Yuling Huang³, Fan Li⁴
[1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; LTH Engineering College at Campus Helsingborg, Lund University, Lund, Sweden; School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China
Keywords: cross-modal retrieval; neural network; Text-based person search; deep learning; Text-based image retrieval
DOI : 10.3389/fphy.2023.1192412
Received 2023-03-23, accepted 2023-04-18, published 2023
Source: Frontiers