Frontiers in Big Data
Ki-Cook: clustering multimodal cooking representations through knowledge-infused learning
Saini Rohan Rao1, Revathy Venkataramanan2, Amit Sheth2, Ronak Kaoshik3, Anirudh Sundara Rajan4, Swati Padhee5
1. Department of Computational Science and Engineering, Technical University of Munich, Munich, Germany
2. Department of Computer Science, Artificial Intelligence Research Institute, University of South Carolina, Columbia, SC, United States
3. Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, United States
4. Department of Computer Science, University of Wisconsin Madison, Madison, WI, United States
5. Department of Computer Science, Wright State University, Dayton, OH, United States
Keywords: cooking process modeling; cross-modal retrieval; ingredient prediction; knowledge-infused learning; multimodal learning; representation learning; clustering
DOI: 10.3389/fdata.2023.1200840
Received: 2023-04-05; Accepted: 2023-06-26; Published: 2023
Source: Frontiers
Abstract
Cross-modal recipe retrieval has gained prominence due to its ability to retrieve a text representation given an image representation and vice versa. Clustering these recipe representations based on similarity is essential to retrieve relevant information about unknown food images. Existing studies cluster similar recipe representations in the latent space based on class names. Due to inter-class similarity and intra-class variation, associating a recipe with a class name does not provide sufficient knowledge about recipes to determine similarity. However, recipe title, ingredients, and cooking actions provide detailed knowledge about recipes and are a better determinant of similar recipes. In this study, we utilized this additional knowledge of recipes, such as ingredients and recipe title, to identify similar recipes, with particular attention to rare ingredients. To incorporate this knowledge, we propose a knowledge-infused multimodal cooking representation learning network, Ki-Cook, built on the procedural attribute of the cooking process. To the best of our knowledge, this is the first study to adopt a comprehensive recipe similarity determinant to identify and cluster similar recipe representations. The proposed network also incorporates ingredient images to learn multimodal cooking representations. Since the motivation for clustering similar recipes is to retrieve relevant information for an unknown food image, we evaluated the ingredient retrieval task. Our empirical analysis shows that the proposed model improves Coverage of Ground Truth by 12% and Intersection over Union by 10% compared to the baseline models. On average, the representations learned by our model contain an additional 15.33% of rare ingredients compared to the baseline models. Owing to this difference, our qualitative evaluation shows a 39% improvement in clustering similar recipes in the latent space compared to the baseline models, with an inter-annotator agreement (Fleiss' kappa) of 0.35.
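The two retrieval metrics named above can be read as set comparisons between predicted and ground-truth ingredient lists. A minimal sketch of how such metrics are typically computed (the function and variable names here are illustrative, not taken from the Ki-Cook codebase):

```python
def iou(predicted, truth):
    """Intersection over Union of two ingredient sets."""
    predicted, truth = set(predicted), set(truth)
    return len(predicted & truth) / len(predicted | truth)

def coverage(predicted, truth):
    """Fraction of ground-truth ingredients that were retrieved."""
    predicted, truth = set(predicted), set(truth)
    return len(predicted & truth) / len(truth)

# Example: 3 of 4 ground-truth ingredients retrieved, 1 spurious prediction.
pred = {"flour", "sugar", "butter", "saffron"}
gt = {"flour", "sugar", "butter", "eggs"}
print(round(iou(pred, gt), 2))       # 0.6  (3 shared / 5 in union)
print(round(coverage(pred, gt), 2))  # 0.75 (3 shared / 4 in ground truth)
```

Coverage rewards recall of the true ingredients, while IoU additionally penalizes over-prediction, which is why the paper reports both.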
License: Unknown
Copyright © 2023 Venkataramanan, Padhee, Rao, Kaoshik, Sundara Rajan and Sheth.