| BMC Bioinformatics | |
| Semi-automated screening of biomedical citations for systematic reviews | |
| Research Article | |
| Christopher H Schmid1  Thomas A Trikalinos2  Joseph Lau2  Carla Brodley3  Byron C Wallace4  | |
| [1] Biostatistics Research Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA; Center for Clinical Evidence Synthesis, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA; Department of Computer Science, Tufts University, Medford, MA, USA; Department of Computer Science, Tufts University, Medford, MA, USA; Center for Clinical Evidence Synthesis, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA | |
| Keywords: Chronic Obstructive Pulmonary Disease; Support Vector Machine; Active Learning; Minority Class; Unified Medical Language System | |
| DOI : 10.1186/1471-2105-11-55 | |
| Received 2009-08-10, accepted 2010-01-26, published 2010 | |
| Source: Springer | |
【 Abstract 】
Background: Systematic reviews address a specific clinical question by unbiasedly assessing and analyzing the pertinent literature. Citation screening is a time-consuming and critical step in systematic reviews. Typically, reviewers must evaluate thousands of citations to identify articles eligible for a given review. We explore the application of machine learning techniques to semi-automate citation screening, thereby reducing the reviewers' workload.

Results: We present a novel online classification strategy for citation screening to automatically discriminate "relevant" from "irrelevant" citations. We use an ensemble of Support Vector Machines (SVMs) built over different feature-spaces (e.g., abstract and title text), and trained interactively by the reviewer(s).

Semi-automating the citation screening process is difficult because any such strategy must identify all citations eligible for the systematic review. This requirement is made harder still due to class imbalance; there are far fewer "relevant" than "irrelevant" citations for any given systematic review. To address these challenges we employ a custom active-learning strategy developed specifically for imbalanced datasets. Further, we introduce a novel undersampling technique. We provide experimental results over three real-world systematic review datasets, and demonstrate that our algorithm is able to reduce the number of citations that must be screened manually by nearly half in two of these, and by around 40% in the third, without excluding any of the citations eligible for the systematic review.

Conclusions: We have developed a semi-automated citation screening algorithm for systematic reviews that has the potential to substantially reduce the number of citations reviewers have to manually screen, without compromising the quality and comprehensiveness of the review.
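To make two ideas from the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes binary labels (1 for "relevant", 0 for "irrelevant"). The function `undersample_majority` shows plain random undersampling, a generic baseline for class imbalance and explicitly not the novel undersampling technique the abstract mentions. The function `screening_summary` computes the two quantities the abstract reports on: the share of relevant citations retained and the share of citations that did not need manual screening. Both function names and the toy data are hypothetical.

```python
import random


def undersample_majority(labels, seed=0):
    """Plain random undersampling (generic baseline, assumed for illustration).

    Keeps every minority item (label 1) and draws, without replacement,
    at most as many majority items (label 0). Returns sorted indices.
    """
    rng = random.Random(seed)
    relevant = [i for i, y in enumerate(labels) if y == 1]
    irrelevant = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(relevant), len(irrelevant))
    return sorted(relevant + rng.sample(irrelevant, k))


def screening_summary(labels, auto_excluded):
    """Summarize a screening run.

    labels: true labels, 1 = relevant, 0 = irrelevant.
    auto_excluded: indices the classifier excluded without manual review.
    Returns (recall of relevant citations, fraction of workload saved).
    """
    total = len(labels)
    n_relevant = sum(labels)
    missed = sum(1 for i in auto_excluded if labels[i] == 1)
    recall = (n_relevant - missed) / n_relevant if n_relevant else 1.0
    workload_saved = len(auto_excluded) / total if total else 0.0
    return recall, workload_saved


# Toy data (hypothetical): 10 citations, 2 of them relevant.
toy_labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

# Balanced training subset: both relevant items plus 2 random irrelevant ones.
balanced = undersample_majority(toy_labels, seed=0)

# Suppose the classifier auto-excluded five irrelevant citations.
recall, saved = screening_summary(toy_labels, {1, 2, 3, 4, 6})
print(recall, saved)  # 1.0 0.5
```

In this toy run half of the citations are excluded automatically while every relevant citation is retained, which is the kind of outcome the abstract describes for two of the three datasets.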
【 License 】
CC BY
© Wallace et al; licensee BioMed Central Ltd. 2010