Conference Paper Details
AMIA 2012 Annual Symposium
Active Learning-Based Corpus Annotation — The PATHOJEN Experience
Udo Hahn ; Elena Beisswanger ; Ekaterina Buyko ; Erik Faessler
PID  :  129386
Source: CEUR
【 Abstract 】
We report on basic design decisions and novel annotation procedures underlying the development of PATHOJEN, a corpus of MEDLINE abstracts annotated for pathological phenomena, including diseases as a proper subclass. This named entity type is known to be hard to delineate and capture by annotation guidelines. We here propose a two-category encoding schema where we distinguish short from long mention spans, the first covering standardized terminology (e.g. diseases), the latter accounting for less structured descriptive statements about norm-deviant states, as well as criteria and observations that might signal pathologies. The second design decision relates to the way annotation instances are sampled. Here we subscribe to an Active Learning-based approach which is known to save annotation costs without sacrificing annotation quality by means of a sample bias. By design, Active Learning picks up ‘hard’ to annotate instances for human annotators, whereas ‘easier’ ones are passed over to the automatic classifier.