Journal Article Details
BMC Bioinformatics
A structural SVM approach for reference parsing
Research
Daniel X Le [1], Jie Zou [1], George R Thoma [1], Xiaoli Zhang [1]
[1] Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
Keywords: Support Vector Machine; Sequence Learning; Conditional Random Field; Support Vector Machine Method; Binary Feature
DOI: 10.1186/1471-2105-12-S3-S7
Source: Springer
【 Abstract 】

Background: Automated extraction of bibliographic data, such as article titles, author names, abstracts, and references, is essential to the affordable creation of large citation databases. References, which typically appear at the end of journal articles, can also provide valuable information for extracting other bibliographic data. Parsing individual references to extract author, title, journal, year, etc. is therefore sometimes a necessary preprocessing step in building citation-indexing systems. The regular structure of references allows us to treat reference parsing as a sequence learning problem and to study the structural Support Vector Machine (structural SVM), a newly developed structured learning algorithm, for parsing references.

Results: In this study, we implemented structural SVM and used two types of contextual features to compare structural SVM with conventional SVM. Both methods achieve above 98% token classification accuracy and above 95% overall chunk-level accuracy for reference parsing. We also compared SVM and structural SVM to Conditional Random Field (CRF). The experimental results show that structural SVM and CRF achieve similar accuracies at the token and chunk levels.

Conclusions: When only basic observation features are used for each token, structural SVM achieves higher performance than SVM because it utilizes the contextual label features. However, when the contextual observation features from neighboring tokens are combined, SVM performance improves greatly and, after the second-order contextual observation features are added, is close to that of structural SVM. The comparison of these two methods with CRF using the same set of binary features shows that both structural SVM and CRF perform better than SVM, indicating their stronger sequence learning ability in reference parsing.
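
The contextual observation features and the two accuracy levels mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not list the actual features or define chunk-level accuracy, so everything below is an assumption made for illustration: the feature names (`is_digit`, `looks_like_year`, `init_cap`, `is_initial`, `is_punct`), the offset-tagging scheme, the reading of "order" as a window size in tokens, and the definition of a chunk as a maximal run of identical labels that counts as correct only when its label and both boundaries match.

```python
import re

def basic_features(token):
    """Binary observation features for one token (hypothetical feature set)."""
    feats = set()
    if token.isdigit():
        feats.add("is_digit")
    if re.fullmatch(r"(19|20)\d{2}", token):
        feats.add("looks_like_year")
    if token[:1].isupper():
        feats.add("init_cap")
    if re.fullmatch(r"[A-Z]\.?", token):
        feats.add("is_initial")
    if token in {",", ".", ";", ":", "(", ")"}:
        feats.add("is_punct")
    return feats

def contextual_features(tokens, order):
    """Add the basic features of neighbors up to `order` positions away.

    Each feature is tagged with its relative offset ("0:", "-1:", "+2:", ...)
    so a per-token classifier can tell a token's own features from context.
    """
    out = []
    for i in range(len(tokens)):
        feats = {f"0:{f}" for f in basic_features(tokens[i])}
        for d in range(1, order + 1):
            for j, sign in ((i - d, "-"), (i + d, "+")):
                if 0 <= j < len(tokens):
                    feats |= {f"{sign}{d}:{f}" for f in basic_features(tokens[j])}
        out.append(feats)
    return out

def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def chunks(labels):
    """Maximal runs of identical labels as (label, start, end) spans."""
    spans, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            spans.append((labels[start], start, i))
            start = i
    return spans

def chunk_accuracy(gold, pred):
    """Fraction of gold chunks reproduced exactly (label and boundaries)."""
    gold_spans = chunks(gold)
    pred_spans = set(chunks(pred))
    return sum(s in pred_spans for s in gold_spans) / len(gold_spans)
```

With these assumed definitions, a single mislabeled token at a field boundary costs one token but can break two chunks, which is one way to see why chunk-level accuracy tends to be lower than token-level accuracy.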

【 License 】

CC BY   
© The Author(s) 2011. This article is published under license to BioMed Central Ltd. This article is in the public domain.
