Journal Article Details
IEEE Access, Volume 10
RESHAPE: Reverse-Edited Synthetic Hypotheses for Automatic Post-Editing
Baikjin Jung [1], Jong-Hyeok Lee [1], Wonkee Lee [1], Jaehun Shin [1]
[1] Department of Computer Science and Engineering, POSTECH, Pohang, Republic of Korea
Keywords: Automatic post-editing; back-translation; decoding strategy; machine translation; synthetic data generation
DOI: 10.1109/ACCESS.2022.3154768
Source: DOAJ
【 Abstract 】

Synthetic training data has been extensively used to train Automatic Post-Editing (APE) models in many recent studies because the quantity of human-created data has been considered insufficient. However, the most widely used synthetic APE dataset, eSCAPE, does not respect the minimal-editing property of genuine data, and this defect may have been a limiting factor for the performance of APE models. To create a new synthetic APE dataset, RESHAPE, this article proposes adapting back-translation to APE to constrain edit distance while using stochastic sampling in decoding to maintain the diversity of outputs. Our experiments show that (1) RESHAPE contains more samples resembling genuine APE data than eSCAPE does, and (2) using RESHAPE as new training data improves APE models’ performance substantially over using eSCAPE.
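
The minimal-editing property mentioned in the abstract can be illustrated with a word-level edit rate between a machine-translation output and its post-edit: genuine APE pairs tend to differ by only a few edits. The following is a minimal sketch, not the paper's own metric or pipeline. The function names `word_edit_distance` and `edit_rate` are hypothetical, tokenization is plain whitespace splitting, and the measure is a simplified TER-like ratio without shift operations.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the hyp tokens seen so far and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_rate(mt, pe):
    """Edits needed to turn `mt` into `pe`, normalized by the length of `pe`.

    Assumption: whitespace tokenization; no shift operation as in full TER.
    """
    mt_tokens, pe_tokens = mt.split(), pe.split()
    if not pe_tokens:
        return 0.0 if not mt_tokens else float("inf")
    return word_edit_distance(mt_tokens, pe_tokens) / len(pe_tokens)


if __name__ == "__main__":
    mt = "the cat sat on mat"
    pe = "the cat sat on the mat"
    print(f"edit rate: {edit_rate(mt, pe):.3f}")
```

Under these assumptions, a low edit rate indicates a pair that needs only light correction, which is the kind of sample the abstract describes as resembling genuine APE data.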

【 License 】

Unknown   
