Dissertation Details
The Road to Identifying Disease Causing Genes: Association Tests, Genotype Imputations, and Sampling Strategies for Sequencing Studies.
Zhang, Peng; Burmeister, Margit
University of Michigan
Keywords: Genotype Imputation; Statistical Genetics; Bioinformatics; Next-generation Sequencing; Phylogenetic Diversity; Study Design; Genetics; Molecular; Cellular and Developmental Biology; Neurosciences; Psychiatry; Public Health; Statistics and Numeric Data; Health Sciences; Science
Others: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/99798/penzhang_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Technological advances now allow investigators to use sequencing data to identify genetic risk variants for complex diseases. However, it is still expensive to sequence a large sample of individuals. While genotype imputation can augment sequencing studies, challenges remain, such as imputation with population or family structures and imputation of rare variants. This dissertation aims to tackle these two challenges.
The first project considers imputation with family structures, extending an existing imputation program that assumes unrelated individuals in a sample. I propose a strategy for imputing data with family structures and apply it to a family-based association study for bipolar disorder. The results suggest the involvement of ion channelopathy in bipolar pathogenesis.
The second and third projects provide sampling strategies for next-generation sequencing. The goal is to select a subset of a study sample that captures the maximal number of variants when sequenced, that achieves maximal imputation accuracy when the sequences of the rest of the study sample are imputed from the sequenced subset, or both. In the second project, I propose the “most diverse panel” by adapting the concept of phylogenetic diversity. This strategy assumes that the panel with the greatest overall tree length in the phylogenetic tree represents the longest evolutionary time, allowing the maximal number of mutation events to occur. Sequencing such a panel can thus identify the maximal number of variants. In the third project, I propose the “most representative panel” by considering both the selected and unselected haplotypes. The goal is to identify at least one optimal selected reference haplotype for each unselected haplotype. Because an exhaustive search is computationally infeasible for a large sample, I develop a hill-climbing algorithm that updates a randomly selected panel for a predefined number of iterations or until it converges.
Using simulated sequence data and real sequence data from the 1000 Genomes Project, I compare the two proposed panels to randomly selected panels and provide suggestions on which algorithm to use when planning sequencing studies with specific study samples.
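The hill-climbing search described for the most representative panel can be illustrated with a minimal sketch. The abstract does not define how an "optimal" reference haplotype is scored or how the panel is updated, so everything specific here is an assumption made for illustration: Hamming distance as the score, a single-swap update, and the hypothetical names `hamming`, `panel_cost`, and `hill_climb`. It is not the dissertation's implementation.

```python
import random
from itertools import product


def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def panel_cost(haps, panel):
    """Assumed objective: for every unselected haplotype, take the distance
    to its closest selected haplotype, and sum these distances."""
    selected = [haps[i] for i in panel]
    return sum(
        min(hamming(h, s) for s in selected)
        for i, h in enumerate(haps)
        if i not in panel
    )


def hill_climb(haps, k, max_iters=100, seed=0):
    """Start from a random panel of k haplotype indices and repeatedly apply
    the first single swap that lowers the cost. Stops after max_iters
    accepted swaps or when no swap improves the cost (convergence)."""
    rng = random.Random(seed)
    panel = set(rng.sample(range(len(haps)), k))
    cost = panel_cost(haps, panel)
    for _ in range(max_iters):
        improved = False
        outside = [i for i in range(len(haps)) if i not in panel]
        for out_idx, in_idx in product(sorted(panel), outside):
            candidate = (panel - {out_idx}) | {in_idx}
            candidate_cost = panel_cost(haps, candidate)
            if candidate_cost < cost:
                panel, cost, improved = candidate, candidate_cost, True
                break
        if not improved:
            break  # local optimum under the swap neighbourhood
    return panel, cost
```

Calling `hill_climb(haps, k=2)` on a list of 0/1 tuples returns the chosen indices and the final cost. Like any hill-climbing method, this sketch can stop at a local optimum, so in practice one might rerun it from several random starting panels.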

【 Preview 】
Attachments
Files | Size | Format | View
The Road to Identifying Disease Causing Genes: Association Tests, Genotype Imputations, and Sampling Strategies for Sequencing Studies. | 4856KB | PDF | download
Document metrics
Downloads: 15; Views: 43