Journal Article Details
Journal of Big Data
Task-agnostic representation learning of multimodal Twitter data for downstream applications
Amit K. Roy-Chowdhury [1]; Ryan Rivas [1]; Vagelis Hristidis [1]; Evangelos E. Papalexakis [1]; Sudipta Paul [1]
[1] Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, USA
Keywords: Joint embedding; Machine learning; Twitter; Deep learning; Multimodal data
DOI: 10.1186/s40537-022-00570-x
Source: Springer
【 Abstract 】

Twitter is a frequent target for machine learning research and applications. Many problems, such as sentiment analysis, image tagging, and location prediction, have been studied on Twitter data. Much of the prior work that addresses these problems within the context of Twitter focuses on a subset of the types of data available, e.g., only text, or text and image. However, a tweet can have several additional components, such as the location and the author, that can also provide useful information for machine learning tasks. In this work, we explore the problem of jointly modeling several tweet components in a common embedding space via task-agnostic representation learning, which can then be used to tackle various machine learning applications. To address this problem, we propose a deep neural network framework that combines text, image, and graph representations to learn joint embeddings for five tweet components: body, hashtags, images, user, and location. In our experiments, we use a large dataset of tweets to learn a joint embedding model and use it in multiple tasks to evaluate its performance against state-of-the-art baselines specific to each task. Our results show that our proposed generic method performs comparably to or better than specialized application-specific approaches, including accuracy of 52.43% vs. 48.88% for location prediction and recall of up to 15.93% vs. 12.12% for hashtag recommendation.
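The abstract describes the framework only at a high level. As an illustration of the general technique, the sketch below shows one plausible way to learn a joint embedding over several tweet components: a projection head per component maps precomputed features into a shared space, trained to pull together the components of the same tweet. The component names match the abstract, but the encoders, dimensions, and pairwise alignment loss here are assumptions for illustration, not the authors' actual architecture.

```python
# Hypothetical sketch of a joint tweet-component embedding model.
# NOT the paper's architecture: projection heads, feature dimensions,
# and the pairwise alignment loss are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointTweetEmbedder(nn.Module):
    """Projects per-component feature vectors into one shared embedding space."""
    def __init__(self, feature_dims, embed_dim=256):
        super().__init__()
        # One linear projection head per tweet component
        # (body, hashtags, image, user, location).
        self.heads = nn.ModuleDict(
            {name: nn.Linear(dim, embed_dim) for name, dim in feature_dims.items()}
        )

    def forward(self, features):
        # L2-normalize so cosine similarity reduces to a dot product.
        return {name: F.normalize(self.heads[name](x), dim=-1)
                for name, x in features.items()}

def pairwise_alignment_loss(embeddings):
    """Pull embeddings of components belonging to the same tweet together."""
    names = list(embeddings)
    loss, pairs = 0.0, 0
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            # Cosine distance between matched component embeddings.
            sim = (embeddings[names[i]] * embeddings[names[j]]).sum(-1)
            loss = loss + (1 - sim).mean()
            pairs += 1
    return loss / pairs

# Example: a batch of 4 tweets with precomputed component features, e.g.
# text features from a pretrained language model, image features from a CNN,
# and user/location features from a graph embedding (dimensions assumed).
dims = {"body": 768, "hashtags": 300, "image": 2048, "user": 128, "location": 64}
model = JointTweetEmbedder(dims)
batch = {name: torch.randn(4, d) for name, d in dims.items()}
loss = pairwise_alignment_loss(model(batch))
loss.backward()
```

Once trained, such a shared space would let downstream tasks (e.g., hashtag recommendation or location prediction) be cast as nearest-neighbor retrieval between component embeddings.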

【 授权许可】

CC BY   

【 Preview 】
Attachment List
File                     Size     Format
RO202202177940595ZK.pdf  2570 KB  PDF