Journal Article Details
Applied Sciences
Towards Robust Word Embeddings for Noisy Texts
Yerai Doval [1], Carlos Gómez-Rodríguez [2], Jesús Vilares [2]
[1] Grupo COLE, Escola Superior de Enxeñaría Informática, Universidade de Vigo, 36310 Vigo, Spain
[2] Universidade da Coruña, CITIC. Grupo LyS, Departamento de Ciencias da Computación e Tecnoloxías da Información, 15071 A Coruña, Spain
Keywords: natural language processing; semantics; word embeddings; noisy texts; social media
DOI: 10.3390/app10196893
Source: DOAJ
【 Abstract 】

Research on word embeddings has mainly focused on improving their performance on standard corpora, disregarding the difficulties posed by noisy texts such as tweets and other types of non-standard writing from social media. In this work, we propose a simple extension to the skipgram model that introduces bridge-words: artificial words added to the model to strengthen the similarity between standard words and their noisy variants. Our new embeddings outperform baseline models on noisy texts across a wide range of intrinsic and extrinsic evaluation tasks, while retaining good performance on standard texts. To the best of our knowledge, this is the first explicit approach to dealing with these types of noisy texts at the word embedding level that goes beyond support for out-of-vocabulary words.

【 License 】

Unknown   
