Applied Sciences | |
Towards a Universal Semantic Dictionary | |
Eszter Iklódi1  Gábor Borbély1  Gábor Recski1  MariaJose Castro-Bleda2  | |
[1] Budapest University of Technology and Economics, 1111 Budapest, Hungary;VRAIN Valencian Research Institute for Artificial Intelligence, Universitat Politècnica de València, 46022 Valencia, Spain; | |
关键词: natural language processing; semantics; word embeddings; multilingual embeddings; translation; artificial neural networks; | |
DOI : 10.3390/app9194060 | |
来源: DOAJ |
【 摘 要 】
A novel method for finding linear mappings among word embeddings for several languages, taking as pivot a shared, multilingual embedding space, is proposed in this paper. Previous approaches learned translation matrices between two specific languages, while this method learns translation matrices between a given language and a shared, multilingual space. The system was first trained on bilingual, and later on multilingual corpora as well. In the first case, two different training data were applied: Dinu’s English−Italian benchmark data, and English−Italian translation pairs extracted from the PanLex database. In the second case, only the PanLex database was used. The system performs on English−Italian languages with the best setting significantly better than the baseline system given by Mikolov, and it provides a comparable performance with more sophisticated systems. Exploiting the richness of the PanLex database, the proposed method makes it possible to learn linear mappings among an arbitrary number of languages.
【 授权许可】
Unknown