Journal article details
BMC Bioinformatics
Bilingual term alignment from comparable corpora in English discharge summary and Chinese discharge summary
Research Article
Yi Qian1  Eric I-Chao Chang2  Junichi Tsujii2  Luoxin Chen3  Junsheng Wei3  Yubo Fan3  Yan Xu4  Sophia Ananiadou5 
[1] Jinhua People’s Hospital, Jinhua, China;Microsoft Research Asia, Beijing, China;State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics and Mechanobiology of Ministry of Education, Beihang University, Beijing, China;State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics and Mechanobiology of Ministry of Education, Beihang University, Beijing, China;Microsoft Research Asia, Beijing, China;The National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK;
Keywords: Test Word; Label Propagation; Name Entity Recognition; Parallel Corpus; Language Pair
DOI  :  10.1186/s12859-015-0606-0
Received 2014-09-14, accepted 2015-04-29, published 2015
Source: Springer
【 Abstract 】

Background: Electronic medical record (EMR) systems have become widely used throughout the world to improve the quality of healthcare and the efficiency of hospital services. A bilingual Chinese-English medical lexicon is needed to meet the demand for multilingual and multinational treatment. We extract a bilingual lexicon from English and Chinese discharge summaries, starting from a small seed lexicon. The lexical terms fall into two categories: single-word terms (SWTs) and multi-word terms (MWTs). For SWTs, we use a context-based label propagation (LP) method to extract candidate translation pairs. For MWTs, which are pervasive in the medical domain, we propose a term alignment method that first obtains translation candidates for each component word of a Chinese MWT and then generates their combinations, from which the system selects a set of plausible translation candidates.

Results: We compare our LP method with a baseline method based on simple context similarity. The LP-based method outperforms the baseline, with accuracies of 4.44% Acc1, 24.44% Acc10, and 62.22% Acc100, where AccN denotes top-N accuracy. For MWTs, the accuracy of the LP method drops to 5.41% Acc10 and 8.11% Acc20. Our experiments show that the method based on term alignment improves the performance for MWTs to 16.22% Acc10 and 27.03% Acc20.

Conclusions: We constructed a framework for building an English-Chinese term dictionary from discharge summaries in the two languages. Our experiments show that the LP-based method, augmented with the term alignment method, helps reduce the manual work required to compile a bilingual dictionary of clinical terms.
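As a rough illustration of two ideas named in the abstract, the sketch below combines per-component translation candidates of a multi-word term and computes top-N accuracy (AccN). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy candidate lists, and the gold pairs are all hypothetical, and the abstract does not specify how candidates are ranked or how plausible combinations are selected, so combinations are simply kept in enumeration order.

```python
from itertools import product


def combine_component_translations(component_candidates):
    """Join every combination of per-component candidates into one phrase.

    component_candidates: one list of candidate translations per component
    word of the source MWT, in source word order (an assumption; the
    abstract does not describe reordering).
    """
    return [" ".join(combo) for combo in product(*component_candidates)]


def top_n_accuracy(ranked_candidates, gold, n):
    """Fraction of source terms whose gold translation is in the top n candidates."""
    if not gold:
        return 0.0
    hits = sum(
        1
        for term, answer in gold.items()
        if answer in ranked_candidates.get(term, [])[:n]
    )
    return hits / len(gold)


# Hypothetical toy data, invented for illustration only.
component_candidates = [["chronic", "slow"], ["bronchitis", "tracheitis"]]
mwt_candidates = combine_component_translations(component_candidates)

ranked = {
    "慢性支气管炎": mwt_candidates,
    "高血压": ["hypotension", "hypertension"],
}
gold = {"慢性支气管炎": "chronic bronchitis", "高血压": "hypertension"}

acc1 = top_n_accuracy(ranked, gold, 1)
acc2 = top_n_accuracy(ranked, gold, 2)
print(f"Acc1 = {acc1:.2%}, Acc2 = {acc2:.2%}")
```

In this toy setup a term counts as correct at N if its gold translation appears anywhere in the first N candidates, which is the usual reading of "top N accuracy".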

【 License 】

© Xu et al.; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
