Thesis Details
Declarative Querying For Biological Sequences.
Tata, Sandeep; Ann Arbor
University of Michigan
Keywords: Sequence Databases; Querying; Suffix Trees; Selectivity Estimation; Computer Science; Engineering; Computer Science & Engineering
Others: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/55670/tatas_1.pdf?sequence=2&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Life science research labs today manage increasing volumes of sequence data. Much of the data management and querying today is accomplished procedurally using Perl, Python, or Java programs that integrate data from different sources and query tools. The dangers of this procedural approach are well known to the database community: a) severe limitations on the ability to rapidly express queries and b) inefficient query plans due to the lack of sophisticated optimization tools. This situation is likely to get worse with advances in high-throughput technologies that make it easier to quickly produce vast amounts of sequence data. The need for a declarative and efficient system to manage and query biological sequence data is urgent. To address this need, we designed the Periscope/SQ system. Periscope/SQ extends current relational systems to enable sophisticated queries on sequence data and can optimize and execute these queries efficiently.

This thesis describes the problems that need to be solved to make it possible to build the Periscope/SQ system. First, we describe the algebraic framework which forms the backbone of Periscope/SQ. Second, we describe algorithms to construct large-scale suffix tree indexes for efficiently answering sequence queries. Third, we describe techniques for selectivity estimation and optimization in the context of queries over biological sequences. Next, we demonstrate how some of the techniques developed for Periscope/SQ can be applied to produce a powerful mining algorithm that we call FLAME. Finally, we describe GeneFinder, a biological application built on top of Periscope/SQ. GeneFinder is currently being used to predict the targets of transcription factors.

Today, genomic and proteomic sequences are the most abundantly available source of high-quality biological data. By making it possible to declaratively and efficiently query vast amounts of sequence data, Periscope/SQ opens the door to vast improvements in the pace of bioinformatics research.

【 Preview 】
Attachment list
Files Size Format View
Declarative Querying For Biological Sequences. 1425KB PDF download
Document metrics
Downloads: 4; Views: 9