Thesis Details
Functional Interpretation of High-Throughput Sequencing Data.
Lee, Chee; Parker, Stephen CJ
University of Michigan
Keywords: bioinformatics; next-generation sequencing; gene set enrichment testing; functional interpretation; ChIP-seq; RNA-seq; Genetics; Molecular, Cellular and Developmental Biology; Science (General); Statistics and Numeric Data; Health Sciences; Science; Bioinformatics
Others: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/120731/cheelee_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Functional interpretation of high-throughput sequencing (HTS) data provides insight into biological systems, including important pathways in the context under study. A common approach is gene set enrichment (GSE) testing. GSE emerged in the age of microarrays as a way to biologically interpret long lists of differentially expressed genes (DEGs). However, HTS data has characteristics not present in microarray data that can bias GSE results. My thesis focuses on identifying, characterizing, and accounting for biases to improve functional interpretation in HTS data. In this thesis, I present GSE tests designed for ChIP-seq data and RNA-seq data. Our tests have applications beyond HTS data, which we show by using them to analyze genomic features, including mappability and repeat content. ChIP-Enrich is a GSE test for ChIP-seq data. It includes a database of locus definitions to annotate peaks to different gene loci (such as exons, introns, promoters, and other intergenic regions), which allows for biological discovery unique to different regions. ChIP-Enrich empirically adjusts for the observed bias due to the varying lengths of these gene loci in its enrichment test. RNA-Enrich is a GSE test for RNA-seq data. RNA-Enrich corrects for the selection bias often observed in RNA-seq data, where long and highly expressed genes are more likely to be identified as DEGs. Unlike other GSE tests for RNA-seq data, RNA-Enrich does not require permutations or a cut-off to define DEGs, and works well with small sample sizes. For both ChIP-Enrich and RNA-Enrich, we show well-calibrated type I error compared to competing methods. Finally, we characterize sequence mappability, which is one potential bias in the interpretation of HTS data. We characterize properties of the main contributors of low mappability (transposons and segmental duplications), overall mappability, and their relationship with gene locus length and function.
Across different transcribed and regulatory regions, certain gene functions showed unique signatures involving significantly more/fewer associated repeats, higher/lower mappability, and longer/shorter locus length. Our analyses provide insight into evolutionary selection pressures that maintain complexity of gene regulation. Overall, we demonstrate that considering characteristics of the human genome is essential to improving functional interpretation of HTS data.

【 Preview 】
Attachments
Files | Size | Format | View
Functional Interpretation of High-Throughput Sequencing Data. | 6945 KB | PDF | download