Conference Paper Details
2019 The 5th International Conference on Electrical Engineering, Control and Robotics
Word-to-word Machine Translation: Bilateral Similarity Retrieval for Mitigating Hubness
Radio Electronics; Computer Science
Luo, Mengting^1^2 ; He, Linchao^1^2 ; Guo, Mingyue^1^2 ; Han, Fei^1^2 ; Tian, Long^1^2 ; Pu, Haibo^1^2 ; Zhang, Dejun^3
Lab of Agricultural Information Engineering, Sichuan Agricultural University, Yaan
0086-625014, China^1
Key Laboratory of Agricultural Information Engineering of Sichuan Province, Yaan
0086-625014, China^2
Faculty of Information Engineering, China University of Geosciences, Wuhan
0086-430074, China^3
Keywords: Feature space; High dimensions; K-nearest neighbors; Machine translations; Parallel data; Semantic Space; Similarity retrieval; Word translation
Others  :  https://iopscience.iop.org/article/10.1088/1757-899X/533/1/012051/pdf
DOI  :  10.1088/1757-899X/533/1/012051
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】

Nearest neighbor search plays a critical role in machine word translation because it can obtain the lingual labels of source word embeddings by searching for the k nearest neighbor (kNN) target embeddings in a shared bilingual semantic space. However, aligning two language distributions into a shared space usually requires large amounts of target labels, and kNN retrieval causes the hubness problem in high-dimensional feature spaces. Although most best-k retrieval methods remove hubs from the list of translation candidates to mitigate the hubness problem, eliminating hubs is flawed, because a hub also has a correct source word query corresponding to it and should not be crudely excluded. In this paper, we introduce an unsupervised machine word translation model based on Generative Adversarial Nets (GANs) with Bilingual Similarity retrieval, namely Unsupervised-BSMWT. Our model addresses three main challenges: (1) reducing the dependence on parallel data by using GANs in a fully unsupervised way; (2) significantly decreasing the training time of the adversarial game; (3) proposing a novel Bilingual Similarity retrieval that mitigates hubness pollution regardless of whether a candidate is a hub. Our model efficiently achieves competitive results in 74 min, exceeding previous GAN-based models.
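
As a rough illustration of the baseline the abstract refers to, the sketch below performs plain cosine-similarity kNN retrieval in a shared embedding space and counts how often each target embedding appears in the top-k lists (the k-occurrence count commonly used to spot hubs). It is not the paper's Bilingual Similarity retrieval, whose definition is not given in the abstract. The function names, the random embeddings, and the choice of cosine similarity are all assumptions made for the example.

```python
import numpy as np

def normalize_rows(x):
    # Scale each embedding to unit length so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def knn_translate(src, tgt, k=1):
    # Hypothetical helper: for every source embedding, return the indices of
    # the k most similar target embeddings, most similar first.
    sims = normalize_rows(src) @ normalize_rows(tgt).T
    return np.argsort(-sims, axis=1)[:, :k]

def k_occurrence(neighbors, n_targets):
    # Count how many top-k lists each target index appears in.
    # Targets with unusually high counts are hub candidates.
    return np.bincount(neighbors.ravel(), minlength=n_targets)

# Random stand-ins for aligned source and target embeddings (assumption).
rng = np.random.default_rng(0)
src = rng.standard_normal((200, 50))
tgt = rng.standard_normal((300, 50))

neighbors = knn_translate(src, tgt, k=5)
counts = k_occurrence(neighbors, tgt.shape[0])
hub_candidates = np.argsort(-counts)[:10]  # the ten most frequently retrieved targets
```

With real aligned embeddings, a strongly skewed `counts` distribution would indicate hubness: a few targets are retrieved for many unrelated source queries while many targets are never retrieved.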
