Journal article details
Frontiers in Big Data
A World Full of Stereotypes? Further Investigation on Origin and Gender Bias in Multi-Lingual Word Embeddings
Keywords: word embeddings; fairness; digital ethics; natural language processing; training data; language models
DOI: 10.3389/fdata.2021.625290
Source: DOAJ
Abstract

Publicly available off-the-shelf word embeddings, which are often used in production natural language processing applications, have been found to be biased. We have previously shown that this bias can take different forms depending on the language and the cultural context. In this work, we extend that earlier work and further investigate how bias varies across languages. We examine Italian and Swedish word embeddings for gender and origin bias, and demonstrate that German word embeddings contain an origin bias concerning local migrant groups in Switzerland. We propose BiasWords, a method for automatically detecting new forms of bias. Finally, we discuss how cultural and linguistic aspects bear on the impact of bias in applications and on potential mitigation measures.
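The abstract does not describe how bias in an embedding is quantified. For orientation only, one widely used measure of such associations is the effect size of the Word Embedding Association Test (WEAT) of Caliskan et al. (2017): for target word sets X and Y and attribute word sets A and B, each word w gets an association score s(w, A, B) equal to its mean cosine similarity to A minus its mean cosine similarity to B, and the effect size is the difference of the mean scores of X and Y divided by the standard deviation of the scores over X and Y combined. The following is a minimal sketch under stated assumptions. It is not the authors' method and not BiasWords. The function names (`cosine`, `association`, `effect_size`) and the two-dimensional toy vectors are hypothetical stand-ins for lookups in a real embedding.

```python
import math
from statistics import mean, stdev
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w: Vector, A: Sequence[Vector], B: Sequence[Vector]) -> float:
    """s(w, A, B): mean similarity of w to attribute set A minus mean similarity to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def effect_size(X, Y, A, B) -> float:
    """WEAT-style effect size; this sketch uses the sample standard deviation over X and Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy "embeddings" (NOT real data): target sets for career vs. family
# words, attribute sets for male vs. female words.
career = [(0.8, 0.2), (0.7, 0.3)]
family = [(0.2, 0.8), (0.3, 0.7)]
male = [(1.0, 0.0), (0.9, 0.1)]
female = [(0.0, 1.0), (0.1, 0.9)]

if __name__ == "__main__":
    # A positive value means "career" words sit closer to "male" than "family" words do.
    print(effect_size(career, family, male, female))
```

Swapping the two target sets, or the two attribute sets, flips the sign of the effect size without changing its magnitude, which is a quick sanity check when applying such a measure to real embeddings in different languages.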

License

Unknown   
