Journal Article Details
Jisuanji kexue
Construction Method of Parallel Corpus for Minority Language Machine Translation
LIU Yan, XIONG De-yi [1]
[1] College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
Keywords: parallel corpus | minority language | neural machine translation
DOI: 10.11896/jsjkx.210900012
Source: DOAJ
【 Abstract 】

The performance of neural machine translation depends heavily on the scale and quality of the parallel corpus used for training. Unlike the situation for widely used languages, the construction of high-quality parallel corpora between Chinese and minority languages has lagged behind. Existing minority language parallel corpora are mostly built from web resources using automatic sentence alignment, which limits both their domain coverage and their quality. High-quality parallel corpora can be constructed manually, but relevant experience and established methods are lacking. From the perspective of machine translation practitioners and researchers, this article introduces a cost-effective method for manually constructing parallel corpora between minority languages and Chinese, covering the overall goals, the implementation process, the engineering details, and the final result. The article records the experience accumulated during construction and distills it into a summary of methods and suggestions for building parallel corpora from minority languages to Chinese. In the end, the work constructs 0.5 million high-quality parallel entries for Persian to Chinese, Hindi to Chinese, and Indonesian to Chinese. Experimental results confirm the quality of the constructed corpora and show that they improve the performance of neural machine translation models for these minority languages.

【 License 】

Unknown   
