Journal Article Details
BMC Bioinformatics
Protein-protein interaction extraction with feature selection by evaluating contribution levels of groups consisting of related features
Research
Takenao Ohkawa [1], Thi Thanh Thuy Phan [1]
[1] Department of Information Science, Graduate School of System Informatics, Kobe University, 1-1, Rokkodai, Nada, 657-8501, Kobe, Japan
Keywords: Biomedical text mining; Information extraction; k; Protein-protein interaction
DOI: 10.1186/s12859-016-1100-z
Source: Springer
【 Abstract 】

Background: Protein-protein interaction (PPI) extraction from published scientific articles is a key issue in biological research because of its importance in understanding biological processes. Despite considerable recent advances in automatic PPI extraction from articles, demand remains to improve the performance of existing methods.

Results: Our feature-based method combines the strengths of many diverse kinds of features, such as lexical and word context features derived from sentences, syntactic features derived from parse trees, and features using existing patterns, to extract PPIs automatically from articles. We assemble these features into four groups of related features and define a contribution level (CL) for each group. Our method consists of two steps. First, we divide the training set into subsets based on the structure of the sentence and the existence of significant keywords (SKs), and apply sentence patterns given in advance to each subset. Second, we automatically perform feature selection based on the CL values of the four groups and the k-nearest neighbor algorithm (k-NN) through three approaches: (1) focusing on the group with the best contribution level (BEST1G); (2) unoptimized combination of three groups with the best contribution levels (U3G); (3) optimized combination of two groups with the best contribution levels (O2G).

Conclusions: Our method outperforms other state-of-the-art PPI extraction systems in terms of F-score on the HPRD50 corpus and achieves promising results comparable with those systems on other corpora. Further, on every corpus our method obtains a better F-score than k-NN alone, that is, without exploiting the CLs of the groups of related features.
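The group-level selection described in the abstract can be illustrated with a small sketch. The abstract does not define how the contribution level is computed, so the code below assumes, purely for illustration, that a group's CL is the leave-one-out F-score of k-NN restricted to that group's features. The function names (`contribution_level`, `rank_groups`), the group names, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def f_score(y_true, y_pred, positive=1):
    """F-score (harmonic mean of precision and recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def contribution_level(X, y, columns, k=3):
    """ASSUMED definition: leave-one-out F-score of k-NN using only `columns`."""
    Xg = [[row[c] for c in columns] for row in X]
    preds = []
    for i in range(len(Xg)):
        # Hold out sample i and classify it from the remaining samples.
        train_X = Xg[:i] + Xg[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(train_X, train_y, Xg[i], k))
    return f_score(y, preds)


def rank_groups(X, y, groups, k=3):
    """Return (group name, CL) pairs sorted from highest to lowest CL."""
    scored = [(name, contribution_level(X, y, cols, k)) for name, cols in groups.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: 8 candidate protein pairs, 4 feature columns, label 1 = interaction.
X = [
    [0.90, 0.0, 0.5, 0.0],
    [1.00, 1.0, 0.5, 0.0],
    [1.10, 0.0, 0.5, 1.0],
    [0.95, 1.0, 0.5, 1.0],
    [0.00, 0.0, 0.5, 0.0],
    [0.10, 1.0, 0.5, 0.0],
    [0.05, 0.0, 0.5, 1.0],
    [-0.10, 1.0, 0.5, 1.0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical grouping of feature columns into four groups of related features.
groups = {"lexical": [0], "context": [1], "syntactic": [2], "pattern": [3]}

ranked = rank_groups(X, y, groups)
best_group = ranked[0][0]  # a BEST1G-style choice keeps only this group's columns
```

Under these assumptions, a BEST1G-style selection keeps the columns of `ranked[0]`, and a U3G-style selection would concatenate the columns of the three highest-ranked groups. How O2G optimizes its two-group combination is not specified in the abstract and is not modeled here.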

【 License 】

CC BY   
© Phan and Ohkawa. 2016
