Thesis details
Techniques to improve genome assembly quality
Nihalani, Rahul; Aluru, Srinivas; Computational Science and Engineering; Vuduc, Richard; Jordan, King; Wang, May Dongmei; Catalyurek, Umit V.; Aluru, Srinivas
University: Georgia Institute of Technology
Department: Computational Science and Engineering
Keywords: Genome assembly; High performance computing
Others: https://smartech.gatech.edu/bitstream/1853/61272/1/NIHALANI-DISSERTATION-2019.pdf
United States | English
Source: SMARTech Repository
【 Abstract 】

De novo genome assembly is an important problem in genomics. Discovering and analysing the genomes of different species has numerous applications. For humans, it can lead to early detection of disease traits and timely prevention of diseases such as cancer. It is also useful for discovering the genomes of unknown species. Although the problem has received enormous attention over the last couple of decades, it remains unsolved to a satisfactory level, as shown in various scientific studies. Paired-end sequencing is a technology that sequences pairs of short strands from a genome, called reads. The reads in a pair originate from nearby genomic locations, and are commonly used to determine the genomic location of individual reads more accurately and to resolve repeats in genome assembly. In this thesis, we describe the genome assembly problem and the key challenges involved in solving it. We discuss related work, describing the two most popular models for approaching the problem, de Bruijn graphs and overlap graphs, along with their pros and cons. We then describe our proposed techniques to improve the quality of genome assembly. Our main contribution is a de Bruijn graph based assembly algorithm that effectively utilizes paired reads to improve genome assembly quality. We also discuss how our algorithm tackles some of the key challenges involved in genome assembly. We adapt this algorithm into a parallel strategy that obtains high-quality assemblies for large datasets, such as rice, within a reasonable time frame. In addition, we describe our work on probabilistically estimating overlap graphs for large short-read datasets. We discuss the results obtained, and conclude with some future work.
