Journal Article Details
Journal of Biomedical Semantics
Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature
Research
Lang Li [1], Weixin Xie [1], Kunjie Fan [1], Shijun Zhang [1]
[1] Department of Biomedical Informatics, Ohio State University, 43210, Columbus, OH, USA;
Keywords: Active learning; Deep learning; Drug-drug interaction; Information retrieval; Random negative sampling; Positive sampling; Similarity sampling; Uncertainty sampling
DOI: 10.1186/s13326-023-00287-7
Received 2022-03-09, accepted 2023-04-29, published 2023
Source: Springer
【 Abstract 】

Background: Drug-drug interaction (DDI) information retrieval (IR) from the PubMed literature is an important natural language processing (NLP) task. For the first time, active learning (AL) is studied in DDI IR analysis. DDI IR analysis of PubMed abstracts faces the challenge of relatively few positive DDI samples among an overwhelmingly large number of negative samples. Random negative sampling and positive sampling are purposely designed to improve the efficiency of AL analysis. The consistency of random negative sampling and positive sampling is shown in the paper.

Results: PubMed abstracts are divided into two pools. The screened pool contains all abstracts that pass the DDI keyword query in PubMed, while the unscreened pool includes all other abstracts. DDI IR precision is evaluated and compared at a prespecified recall of 0.95. In the screened pool analysis using a support vector machine (SVM), similarity sampling plus uncertainty sampling improves precision over uncertainty sampling alone, from 0.89 to 0.92. In the unscreened pool analysis, integrating random negative sampling, positive sampling, and similarity sampling improves precision over uncertainty sampling alone, from 0.72 to 0.81. When the SVM is replaced with a deep learning method, all sampling schemes consistently improve DDI AL analysis in both the screened and unscreened pools. Deep learning significantly improves precision over SVM: 0.96 vs. 0.92 in the screened pool and 0.90 vs. 0.81 in the unscreened pool.

Conclusions: Integrating various sampling schemes and deep learning algorithms into AL significantly improves DDI IR analysis from the literature. Random negative sampling and positive sampling are highly effective at improving AL analysis when positive and negative samples are extremely imbalanced.
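The two quantities the abstract relies on, uncertainty sampling and precision at a prespecified recall, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `most_uncertain` and `precision_at_recall` are hypothetical, the classifier is assumed to output a probability of the positive (DDI) class for each abstract, and uncertainty is assumed to mean closeness of that probability to 0.5 (for an SVM, distance to the decision boundary would play the same role).

```python
import math


def most_uncertain(probs, k):
    """Return indices of the k items whose positive-class probability
    is closest to 0.5 (assumed definition of uncertainty)."""
    order = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return order[:k]


def precision_at_recall(scores, labels, target_recall=0.95):
    """Precision at the shortest ranked cutoff whose recall reaches
    target_recall. labels are 1 (positive) or 0 (negative)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank items from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    # Number of positives that must be retrieved to reach the target recall.
    needed = math.ceil(target_recall * n_pos)
    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos >= needed:
            return true_pos / k
    raise RuntimeError("unreachable: all positives are in the ranking")


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    probs = [0.9, 0.52, 0.1, 0.45]
    print(most_uncertain(probs, 2))
    print(precision_at_recall([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 0]))
```

In an AL loop, the indices returned by `most_uncertain` would be the abstracts sent for labeling in the next round, and `precision_at_recall` would be computed on a held-out labeled set after each round.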

【 License 】

CC BY   
© The Author(s) 2023
