Dissertation details
Improving quality of high-throughput sequencing reads
Heo, Yun
Keywords: Next-Generation Sequencing (NGS); Third-Generation Sequencing (TGS); error correction; variant calling; germline variant; somatic variant
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/88064/HEO-DISSERTATION-2015.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Rapid advances in high-throughput sequencing (HTS) technologies have led to an exponential increase in the amount of sequencing data. HTS reads, however, contain far more errors than data collected through traditional sequencing methods. Errors in HTS reads degrade the quality of downstream analyses, and correcting them has been shown to improve the quality of these analyses.

Correcting errors in sequencing data is a time-consuming and memory-intensive process. Although many methods for correcting errors in HTS data have been developed, none could correct errors with high accuracy while using a small amount of memory and a short running time. Another problem is that no standard, comprehensive method is yet available to evaluate the accuracy and effectiveness of these error correction methods.

To alleviate these limitations and analyze error correction outputs, this dissertation presents three novel methods. The first, BLESS (Bloom-filter-based error correction solution for high-throughput sequencing reads), is a new error correction method that uses a Bloom filter as its main data structure. Compared to previous methods, it corrects errors with the highest accuracy while reducing memory usage by 40× on average. BLESS is parallelized using hybrid OpenMP and MPI programming, which makes it one of the fastest error correction tools. The second, SPECTACLE (Software Package for Error Correction Tool Assessment on Nucleic Acid Sequences), supplies a standard way to evaluate error correction methods. SPECTACLE is a comprehensive method that can (1) perform a quantitative analysis of both DNA and RNA corrected reads from any sequencing platform and (2) handle diploid genomes and differentiate heterozygous alleles from sequencing errors.

Lastly, this research analyzes the effect of sequencing errors on variant calling, which is one of the most important clinical applications of HTS data. For this purpose, an environment for tracing the effect of sequencing errors on germline and somatic variant calling was developed. Using this environment, this research studies how sequencing errors degrade the results of variant calling and how the results can be improved. Based on the new findings, ROOFTOP (RemOve nOrmal reads From TumOr samPles) was developed; it can improve the accuracy of somatic variant calling by removing reads from normal cells in tumor samples.

The series of studies on sequencing errors in this dissertation should help clarify how sequencing errors degrade downstream analysis outputs and how the quality of sequencing data can be improved by removing errors from the data.
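
To make the Bloom-filter idea mentioned in the abstract concrete, the following is a minimal sketch of how a Bloom filter can store frequently occurring ("solid") k-mers and flag k-mers in a read that are likely to contain errors. It is not the BLESS implementation: the class `BloomFilter`, the helper functions, the hashing scheme, the exact in-memory k-mer counting, and all parameter values (`k`, `min_count`, `num_bits`, `num_hashes`) are assumptions chosen for illustration only.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Bit array probed by several hashes: false positives possible, false negatives not."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Illustrative choice: derive each hash by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(read: str, k: int) -> list[str]:
    """All overlapping substrings of length k, in read order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_solid_filter(reads, k, min_count, num_bits=1 << 16, num_hashes=3):
    """Insert every k-mer seen at least min_count times into a Bloom filter.

    Counting with an exact Counter is a simplification for this sketch;
    it is not meant to be memory-efficient.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    bloom = BloomFilter(num_bits, num_hashes)
    for km, count in counts.items():
        if count >= min_count:
            bloom.add(km)
    return bloom


def weak_kmer_positions(read, k, bloom):
    """Start indices of k-mers absent from the filter (candidate error regions)."""
    return [i for i, km in enumerate(kmers(read, k)) if km not in bloom]


reads = ["ACGTACGTAC"] * 3 + ["ACGTTCGTAC"]  # last read has one substitution
solid = build_solid_filter(reads, k=4, min_count=2)
print(weak_kmer_positions("ACGTTCGTAC", 4, solid))
```

Because a Bloom filter never reports a stored item as missing, every k-mer of an error-free read that passed the count threshold is found in the filter; the cost of the compact representation is that an erroneous k-mer can occasionally be reported as present.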

【 Preview 】
Attachments
Files | Size | Format | View
Improving quality of high-throughput sequencing reads | 4840KB | PDF | download