Dissertation

【Abstract】

The rise of next-generation sequencing has produced an abundance of data with almost limitless analysis applications. As sequencing costs fall and throughput rises, the volume of available data is quickly outpacing improvements in processor speed. Analysis methods must therefore scale as well to remain computationally tractable. At the same time, larger datasets and the availability of population-wide data offer a broader context that can be used to improve accuracy.

This thesis presents three tools that improve the scalability of sequencing data storage and analysis. First, a lossy compression method for RNA-seq alignments achieves extreme size reduction without compromising the downstream accuracy of isoform assembly and quantitation. Second, I describe a graph genome analysis tool that filters population variants to optimize aligner performance. Finally, I present several methods for improving the accuracy of copy number variant (CNV) segmentation, including borrowing strength across samples to overcome the limitations of low coverage. Together, these methods form a practical toolkit for increasing the computational power of genomic analysis.

Attachments
Files	Size	Format
Methods for Identifying Variation in Large-Scale Genomic Data	3230 KB	PDF


Methods for Identifying Variation in Large-Scale Genomic Data
Keywords: computational genomics; compression; graph genome; alignment; copy number analysis; Computer Science
Pritt, Mark Jacob; Langmead, Benjamin
Johns Hopkins University
Full text: https://jscholarship.library.jhu.edu/bitstream/handle/1774.2/60131/PRITT-DISSERTATION-2018.pdf?sequence=1&isAllowed=y
Country: United States | Language: English
Source: Johns Hopkins DSpace Repository


Metrics
Downloads: 40	Views: 70
