Dissertation Details
Automated Natural-Language Processing for Integration and Functional Annotation of Complex Biological Systems.
Santos, Carlos F.; Omenn, Gilbert S.
University of Michigan
Keywords: Natural Language Processing; Wnt Signaling; Genomic Functional Annotation; Prostate Cancer; Epigenetics; Literature Summarization; Engineering; Health Sciences; Science; Bioinformatics
Others: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/58394/csantos_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
Abstract

This dissertation discusses the use of automated natural language processing (NLP) to characterize biomolecular events in signal transduction pathway databases. I also discuss the use of a dynamic map engine for efficiently navigating large biomedical document collections and for functionally annotating high-throughput genomic data. I present an application in which NLP software, starting from genomic expression data, automatically identifies and joins disparate experimental observations that support biochemical interactions between candidate genes in the Wnt signaling pathway. I discuss the need for accurate resolution of named entities to biological sequence databases and show how sequence-based approaches can rapidly and unambiguously link automatically extracted assertions to their respective biomolecules. I then demonstrate a search engine, BioSearch-2D, which renders the contents of large biomedical document collections as a single, dynamic map. Using this engine, I analyze the prostate cancer epigenetics literature and demonstrate that the resulting summarization map closely matches the summaries provided by expert human review articles. Examples include displays that prominently feature genes such as the androgen receptor and glutathione S-transferase P1, together with National Library of Medicine Medical Subject Headings (MeSH) descriptions that match the roles described for those genes in the human review articles. In a second application, I demonstrate BioSearch-2D as a context-specific functional annotation system for cancer-related gene signatures. For six cancer-related gene signatures, our engine matches the annotation produced by a Gene Ontology (GO)-based annotation engine. In addition, it assigns highly significant MeSH terms as annotations for the gene lists that the GO-based engine does not produce.
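The abstract says BioSearch-2D assigns "highly significant" MeSH terms to a gene list but does not state which statistic it uses. A common way to score that kind of term over-representation is a one-sided hypergeometric test, so the sketch below assumes that test. The helper names `hypergeom_sf` and `enrich_terms`, and the toy term-to-gene mapping, are hypothetical illustrations and are not taken from the dissertation or from BioSearch-2D.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probability of seeing i annotated genes for every i >= k.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrich_terms(gene_list, term_to_genes, background_size):
    """Hypothetical helper: rank annotation terms by over-representation.

    gene_list       -- the ad hoc gene set to annotate
    term_to_genes   -- mapping of term name -> set of genes annotated with it
    background_size -- number of genes in the background population
    """
    genes = set(gene_list)
    n = len(genes)
    results = []
    for term, annotated in term_to_genes.items():
        K = len(annotated)
        k = len(genes & annotated)  # overlap between the gene set and the term
        if k == 0:
            continue
        results.append((term, k, hypergeom_sf(k, background_size, K, n)))
    # Smallest p-value (most over-represented term) first.
    results.sort(key=lambda r: r[2])
    return results


# Toy, invented inputs purely for illustration (not data from the dissertation).
background_size = 100
term_to_genes = {
    "TermA": {f"G{i}" for i in range(1, 6)},            # G1..G5
    "TermB": {"G3"} | {f"G{i}" for i in range(10, 30)},  # G3, G10..G29
}
results = enrich_terms(["G1", "G2", "G3"], term_to_genes, background_size)
for term, k, p in results:
    print(f"{term}: {k} genes, p = {p:.3g}")
```

Here TermA, which contains all three query genes, ranks far above TermB, which contains only one. A real annotation run would also correct for testing many terms at once (for example with Benjamini-Hochberg); that step is omitted to keep the sketch short.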
I find that the BioSearch-2D display both facilitates the exploration of large biomedical document collections and provides users with an accurate annotation engine for ad hoc gene sets. In the future, large-scale biomedical literature summarization engines and automated protein-protein interaction discovery software could greatly assist the manual, expensive data curation efforts required to describe complex biological processes and disease states.

Attachments
Automated Natural-Language Processing for Integration and Functional Annotation of Complex Biological Systems. (PDF, 690 KB)