Journal Article Details
BMC Bioinformatics
A pipeline for the retrieval and extraction of domain-specific information with application to COVID-19 immune signatures
Software
Steven H. Kleinstein [1], Adam J. H. Newton [2], Robert A. McDougal [3], David Chartash [4]
[1] Department of Pathology, Yale School of Medicine, Yale University, 06511, New Haven, CT, USA;Department of Immunobiology, Yale School of Medicine, Yale University, 06511, New Haven, CT, USA;Program in Computational Biology and Bioinformatics, Yale University, 06511, New Haven, CT, USA;Department of Physiology and Pharmacology, SUNY Downstate Health Sciences University, 11203, Brooklyn, NY, USA;Yale Center for Medical Informatics, Yale School of Medicine, Yale University, 06511, New Haven, CT, USA;Department of Biostatistics, Yale School of Public Health, Yale University, 06511, New Haven, CT, USA;Department of Pathology, Yale School of Medicine, Yale University, 06511, New Haven, CT, USA;Yale Center for Medical Informatics, Yale School of Medicine, Yale University, 06511, New Haven, CT, USA;Department of Biostatistics, Yale School of Public Health, Yale University, 06511, New Haven, CT, USA;Program in Computational Biology and Bioinformatics, Yale University, 06511, New Haven, CT, USA;Yale Center for Medical Informatics, Yale School of Medicine, Yale University, 06511, New Haven, CT, USA;Department of Biostatistics, Yale School of Public Health, Yale University, 06511, New Haven, CT, USA;School of Medicine, University College Dublin - National University of Ireland, Dublin, Co. Dublin, Republic of Ireland;
Keywords: COVID-19; Biomarkers; Data mining; Immunity; Knowledge bases
DOI: 10.1186/s12859-023-05397-8
Received 2023-01-26, accepted 2023-06-23, published 2023
Source: Springer
【 Abstract 】

Background: The accelerating pace of biomedical publication has made it impractical to manually and systematically identify papers containing specific information and to extract that information. This is especially challenging when the information itself resides beyond titles or abstracts. For emerging science, with a limited set of known papers of interest and an incomplete information model, this is of pressing concern. A timely example in retrospect is the identification of immune signatures (coherent sets of biomarkers) driving differential SARS-CoV-2 infection outcomes.

Implementation: We built a classifier to identify papers containing domain-specific information from the document embeddings of the title and abstract. To train this classifier with limited data, we developed an iterative process leveraging pre-trained SPECTER document embeddings, SVM classifiers and web-enabled expert review to iteratively augment the training set. This training set was then used to create a classifier to identify papers containing domain-specific information. Finally, information was extracted from these papers through a semi-automated system that directly solicited the paper authors to respond via a web-based form.

Results: We demonstrate a classifier that retrieves papers with human COVID-19 immune signatures with a positive predictive value of 86%. The type of immune signature (e.g., gene expression vs. other types of profiling) was also identified with a positive predictive value of 74%. Semi-automated queries to the corresponding authors of these publications requesting signature information achieved a 31% response rate.

Conclusions: Our results demonstrate the efficacy of using an SVM classifier with document embeddings of the title and abstract to retrieve papers with domain-specific information, even when that information is rarely present in the abstract. Targeted author engagement based on classifier predictions offers a promising pathway to build a semi-structured representation of such information. Through this approach, partially automated literature mining can help rapidly create semi-structured knowledge repositories for automatic analysis of emerging health threats.
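
The positive predictive values reported in the abstract are the fraction of classifier-flagged papers that expert review confirms as truly relevant. The following is a minimal sketch of that calculation in Python, not the authors' code. The function name `positive_predictive_value` and the label counts are hypothetical, chosen only so that the arithmetic lands on 0.86; they are not data from the paper.

```python
from typing import Sequence


def positive_predictive_value(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """PPV = TP / (TP + FP): the share of predicted positives that are truly positive."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    flagged = true_pos + false_pos
    if flagged == 0:
        raise ValueError("no positive predictions; PPV is undefined")
    return true_pos / flagged


# Hypothetical labels (NOT from the paper): the classifier flags 50 papers,
# and expert review confirms 43 of them; 10 further papers are not flagged.
predicted = [True] * 50 + [False] * 10
actual = [True] * 43 + [False] * 7 + [True] * 2 + [False] * 8

ppv = positive_predictive_value(predicted, actual)
print(f"PPV = {ppv:.2f}")  # prints "PPV = 0.86"
```

PPV ignores papers the classifier did not flag, so it says how trustworthy the retrieved set is but not how many relevant papers were missed.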

【 License 】

CC BY   
© The Author(s) 2023
