Systematic Reviews
Ensemble of deep learning language models to support the creation of living systematic reviews for the COVID-19 literature
Research
Aziz Mert Ipekci [1], Nicola Low [1], Diana Buitrago-Garcia [1], Leonie Heron [1], Hira Imeri [1], Michel Counotte [2], Quentin Haas [3], Poorya Amini [4], Julien Knafou [5], Nikolay Borissov [6], Douglas Teodoro [7]
[1] Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland; Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland; Wageningen Bioveterinary Research, Wageningen University & Research, Wageningen, The Netherlands; Risklick AG, Bern, Switzerland; Risklick AG, Bern, Switzerland; CTU Bern, University of Bern, Bern, Switzerland; University of Applied Sciences and Arts of Western Switzerland (HES-SO), Rue de la Tambourine 17, 1227, Geneva, Switzerland; University of Applied Sciences and Arts of Western Switzerland (HES-SO), Rue de la Tambourine 17, 1227, Geneva, Switzerland; CTU Bern, University of Bern, Bern, Switzerland; University of Applied Sciences and Arts of Western Switzerland (HES-SO), Rue de la Tambourine 17, 1227, Geneva, Switzerland; Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
Keywords: COVID-19; Living systematic review; Literature screening; Text classification; Language model; Deep learning; Transfer learning
DOI: 10.1186/s13643-023-02247-9
Received 2022-07-25, accepted 2023-04-24, published 2023
Source: Springer
【 Abstract 】
Background: The COVID-19 pandemic has led to an unprecedented number of scientific publications, growing at a pace never seen before. Multiple living systematic reviews have been developed to assist professionals with up-to-date and trustworthy health information, but it is increasingly challenging for systematic reviewers to keep up with the evidence in electronic databases. We aimed to investigate deep learning-based machine learning algorithms to classify COVID-19-related publications to help scale up the epidemiological curation process.

Methods: In this retrospective study, five different pre-trained deep learning-based language models were fine-tuned on a dataset of 6365 publications manually classified into two classes, three subclasses, and 22 sub-subclasses relevant for epidemiological triage purposes. In a k-fold cross-validation setting, each standalone model was assessed on a classification task and compared against an ensemble, which takes the standalone model predictions as input and uses different strategies to infer the optimal article class. A ranking task was also considered, in which the model outputs a ranked list of sub-subclasses associated with the article.

Results: The ensemble model significantly outperformed the standalone classifiers, achieving an F1-score of 89.2 at the class level of the classification task. The difference between the standalone and ensemble models increases at the sub-subclass level, where the ensemble reaches a micro F1-score of 70% against 67% for the best-performing standalone model. For the ranking task, the ensemble obtained the highest recall@3, with a performance of 89%. Using a unanimity voting rule, the ensemble can provide predictions with higher confidence on a subset of the data, detecting original papers with an F1-score of up to 97% on a subset of 80% of the collection, compared with 93% on the whole dataset.

Conclusion: This study shows the potential of using deep learning language models to perform triage of COVID-19 references efficiently and to support epidemiological curation and review. The ensemble consistently and significantly outperforms any standalone model. Fine-tuning the voting strategy thresholds is an interesting alternative for annotating a subset with higher predictive confidence.
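
The voting strategies and the recall@3 metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the tie-breaking rule, and the toy labels ("original", "non_original", and the letters used for ranking) are assumptions made only for illustration. It shows how a unanimity rule abstains when the standalone models disagree, which is why it covers only a subset of the collection.

```python
from collections import Counter
from typing import List, Optional, Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def unanimity_vote(predictions: Sequence[str]) -> Optional[str]:
    """Return the shared label if every model agrees, otherwise None (abstain)."""
    first = predictions[0]
    return first if all(p == first for p in predictions) else None


def recall_at_k(rankings: Sequence[Sequence[str]], gold: Sequence[str], k: int = 3) -> float:
    """Fraction of articles whose gold label appears among the top-k ranked labels."""
    hits = sum(1 for ranking, label in zip(rankings, gold) if label in ranking[:k])
    return hits / len(gold)


# Hypothetical predictions of five standalone models for three articles.
votes: List[List[str]] = [
    ["original"] * 5,
    ["original", "non_original", "original", "original", "non_original"],
    ["non_original"] * 5,
]

majority = [majority_vote(v) for v in votes]
unanimous = [unanimity_vote(v) for v in votes]
# Coverage: share of articles on which the unanimity rule does not abstain.
coverage = sum(label is not None for label in unanimous) / len(unanimous)

print("majority:", majority)
print("unanimous:", unanimous)
print("coverage:", coverage)
```

In this sketch, articles for which `unanimity_vote` returns `None` would be left for manual screening, trading coverage for higher confidence on the remaining subset.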
【 License 】
CC BY
© The Author(s) 2023