Journal Article Details
BMC Genomics
Triage of documents containing protein interactions affected by mutations using an NLP based machine learning approach
Jian Wang [1], Zhe He [2], Tingting Zhao [3], Jinfeng Zhang [4], Jinchan Qu [4], Jie Hao [4], Dongrui Zhong [4], Albert Steppi [5], Pei-Yau Lung [6]
[1] CloudMedx, 94301, Palo Alto, CA, USA
[2] College of Communication and Information, Florida State University, 32306, Tallahassee, FL, USA
[3] Department of Geography, Florida State University, 32306, Tallahassee, FL, USA
[4] Department of Statistics, Florida State University, 32306, Tallahassee, FL, USA
[5] Laboratory of Systems Pharmacology at Harvard Medical School, 02115, Boston, MA, USA
[6] Verisk – Insurance Solutions, 06457, Middletown, CT, USA
Keywords: Protein-protein interactions; Mutations; Text mining; Biomedical literature retrieval; Protein interactions affected by mutations
DOI: 10.1186/s12864-020-07185-7
Source: Springer
【 Abstract 】

Background: Information on protein-protein interactions affected by mutations is very useful for understanding the biological effects of mutations and for developing treatments that target those interactions. In this study, we developed a natural language processing (NLP) based machine learning approach for extracting such information from the literature. Our aim is to identify journal abstracts or paragraphs in full-text articles that contain at least one occurrence of a protein-protein interaction (PPI) affected by a mutation.

Results: Our system makes use of the latest NLP methods together with a large number of engineered features, including some based on pre-trained word embeddings. Our final model achieved satisfactory performance in the Document Triage Task of the BioCreative VI Precision Medicine Track, with the highest recall and a comparable F1-score.

Conclusions: The performance of our method indicates that it is ideally suited to being combined with manual annotation. Our machine learning framework and engineered features will also help other researchers further improve this and related biological text mining tasks, using either traditional machine learning or deep learning based methods.

【 License 】

CC BY   
