Thesis Details
Text and Network Mining for Literature-Based Scientific Discovery in Biomedicine.
Keywords: Information Extraction; Natural Language Processing; Text Mining; Bioinformatics; Literature-based Discovery; Network Analysis; Computer Science; Engineering; Science; Computer Science & Engineering
Ozgur, Arzucan; Jagadish, Hosagrahar V.
University of Michigan
Others: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/78956/ozgur_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】
Most new and important findings in biomedicine are available only in the text of published scientific articles. The first goal of this thesis is to design methods based on natural language processing and machine learning to extract information about genes, proteins, and their interactions from text. We introduce a dependency tree kernel-based relation extraction method to identify the interacting protein pairs in a sentence. We propose two kernel functions based on cosine similarity and edit distance among the dependency tree paths connecting the protein names. Using these kernel functions with supervised and semi-supervised machine learning methods, we report a significant improvement (59.96% F-measure on the AIMED data set) over previous results in the literature.
We also address the problem of distinguishing factual information from speculative information. Unlike previous methods that formulate the problem as a sentence classification task, we propose a two-step method to identify the speculative fragments of sentences. First, we use supervised classification to identify the speculation keywords using a diverse set of linguistic features that represent their contexts. Next, we use the syntactic structures of the sentences to resolve the linguistic scopes of these keywords. Our results show that the method is effective in identifying speculative portions of sentences. The speculation keyword identification results are close to the upper bound of human inter-annotator agreement.
The second goal of this thesis is to generate new scientific hypotheses using the literature-mined protein/gene interactions. We propose a literature-based discovery approach, where we start with a set of genes known to be related to a given concept and integrate text mining with network centrality analysis to predict novel concept-related genes. We present the application of the proposed approach to two different problems, namely predicting gene-disease associations and predicting genes that are important for vaccine development. Our results provide new insights and hypotheses worth future investigation in these domains and show the effectiveness of the proposed approach for literature-based discovery.
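The abstract mentions similarity functions based on cosine similarity and edit distance over the dependency tree paths that connect two protein names. The following is a minimal sketch of what such path similarities could look like, not the thesis's actual kernel definitions, which the abstract does not give. It assumes each path is a list of tokens (words and dependency labels); the function names, the bag-of-tokens representation, the length-based normalization of the edit distance, and the two example paths are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(path_a, path_b):
    """Cosine similarity between two paths treated as bags of tokens."""
    counts_a, counts_b = Counter(path_a), Counter(path_b)
    dot = sum(counts_a[tok] * counts_b[tok] for tok in counts_a)
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty path shares nothing with any other path
    return dot / (norm_a * norm_b)


def edit_distance(path_a, path_b):
    """Token-level Levenshtein distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(path_b) + 1))
    for i, tok_a in enumerate(path_a, start=1):
        curr = [i]
        for j, tok_b in enumerate(path_b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_similarity(path_a, path_b):
    """Map edit distance to [0, 1]; this normalization is an assumption."""
    longest = max(len(path_a), len(path_b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(path_a, path_b) / longest


# Hypothetical dependency paths between two protein mentions.
path_1 = ["PROTEIN1", "nsubj", "interacts", "prep_with", "PROTEIN2"]
path_2 = ["PROTEIN1", "nsubj", "binds", "prep_to", "PROTEIN2"]

if __name__ == "__main__":
    print(cosine_similarity(path_1, path_2))
    print(edit_distance(path_1, path_2))
    print(edit_similarity(path_1, path_2))
```

Unlike the cosine measure, the edit-distance measure is sensitive to token order, which is one plausible reason to consider both. A similarity of this kind is not automatically a valid (positive semidefinite) kernel, so using it inside a kernel machine would need additional care.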
Attachments
Files | Size | Format | View
Text and Network Mining for Literature-Based Scientific Discovery in Biomedicine. | 2238KB | PDF | download
Document Metrics
Downloads: 21    Views: 54