Conference paper details
Genetic Analysis Workshop 17
PROCEEDINGS Open Access Genome-wide association analysis of GAW17 data using an empirical Bayes variable selection
Biological sciences; medicine and health
Vitara Pungpapong ; Libo Wang ; Yanzhu Lin ; Dabao Zhang ; Min Zhang*
Others  :  http://www.biomedcentral.com/content/pdf/1753-6561-5-S9-S5.pdf
PID  :  43877
Source: CEUR
【 Abstract 】

Next-generation sequencing technologies enable us to explore rare functional variants. However, most current statistical techniques are too underpowered to capture signals of rare variants in genome-wide association studies. We propose a supervised coalescing of single-nucleotide polymorphisms to obtain gene-based markers that can stably reveal possible genetic effects related to rare alleles. We use a newly developed empirical Bayes variable selection algorithm to identify associations between studied traits and genetic markers. Using our novel method, we analyzed the three continuous phenotypes in the GAW17 data set across 200 replicates, with intriguing results.

Background

With the advent of next-generation sequencing, rare variants such as single-nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) less than 5% are getting more attention in genome-wide association studies (GWAS). Because of the small variance at a locus with a single rare allele, it is difficult to detect the allele’s association with the phenotype of interest. One approach to tackling this problem is to collapse multiple rare SNPs within a defined region and treat them as a single predictor in the model. Known genetic regions are used in the collapsing process to get gene-based markers. Penalized orthogonal-components regression (POCRE) [1] is used to perform this task.

Genome-wide association studies are challenged by the “curse of dimensionality”; that is, a large number of SNPs are genotyped (large p) from a small number of biological samples (small n). As a result, an increasing effort has been devoted to selecting variables in high-dimensional data. One strategy for dealing with variable selection is through the thresholding concept. Empirical Bayes thresholding [2,3] was proposed to estimate sparse sequences observed in Gaussian white noise. Here, we use the empirical Bayes thresholding method to select variables in linear regressions with efficient implementation.
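The thresholding idea mentioned above can be illustrated with a small sketch. This is not the empirical Bayes procedure of [2,3], which chooses its threshold from the data; as a simpler stand-in it applies hard thresholding at the fixed universal threshold sqrt(2 log n) to a sparse sequence observed in unit-variance Gaussian noise. The function names, the simulated signal positions, and the signal size are all hypothetical and chosen only for illustration.

```python
import math
import random


def universal_threshold(n, sigma=1.0):
    """Universal threshold sigma * sqrt(2 log n) for n noisy observations."""
    return sigma * math.sqrt(2.0 * math.log(n))


def hard_threshold(z, t):
    """Keep observations whose magnitude exceeds t; set the rest to zero."""
    return [zi if abs(zi) > t else 0.0 for zi in z]


def simulate_sparse_sequence(n, signal_index, signal_size, seed=0):
    """Hypothetical test data: a sparse mean vector observed in N(0, 1) noise."""
    rng = random.Random(seed)
    mu = [0.0] * n
    for i in signal_index:
        mu[i] = signal_size
    z = [m + rng.gauss(0.0, 1.0) for m in mu]
    return mu, z


# Three nonzero means hidden among 1000 observations (assumed values).
mu, z = simulate_sparse_sequence(n=1000, signal_index=[3, 250, 777], signal_size=10.0)
t = universal_threshold(len(z))
estimate = hard_threshold(z, t)

# Positions that survive thresholding play the role of "selected variables".
selected = [i for i, e in enumerate(estimate) if e != 0.0]
print("threshold:", round(t, 3))
print("selected positions:", selected)
```

A fixed threshold is the crudest choice; a data-driven threshold of the kind the text describes would adapt to how sparse the sequence actually is.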
Final models are obtained by entering gene-based markers and environmental factors possibly associated with the phenotype of interest. All analyses are based on three continuous phenotypes in the GAW17 data set across 200 replicates.

Methods

Data set

The genome-wide association of the three continuous phenotypes (Q1, Q2, and Q4) in the GAW17 data set [4] was investigated. All analyses presented here are based on the genotype of 697 unrelated individuals. The genotype data were recoded into counts of minor alleles using PLINK [5]. The other three traits (Age, Sex, and Smoke) were used in the model to consider the environmental effects. The analyses were performed for all 200 replicates.

Supervised coalescing of SNPs in a genetic region

The GAW17 data consist of 3,205 autosomal genes with 24,487 SNPs, where only 3,132 SNPs (12.79%) have MAF ≥ 0.05. A large proportion of these rare variants present challenges for statistical analyses to detect their associations to a phenotype of interest wh
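As a rough illustration of working with minor-allele counts, the sketch below computes the MAF of each SNP and collapses the rare SNPs of a gene into one marker by summing their counts per individual. This unweighted sum is a plain burden-style collapse, not the supervised coalescing via POCRE that the paper describes. The SNP and gene identifiers, the toy genotypes, and the function names are hypothetical; only the 0.05 cutoff and the SNP totals come from the text.

```python
def minor_allele_frequency(counts):
    """MAF of one SNP from per-individual minor-allele counts (0, 1, or 2)."""
    return sum(counts) / (2.0 * len(counts))


def collapse_rare_snps(genotypes, snp_to_gene, maf_cutoff=0.05):
    """Sum minor-allele counts of rare SNPs within each gene, per individual.

    genotypes: dict of SNP id -> list of minor-allele counts, one per individual
    snp_to_gene: dict of SNP id -> gene id
    Returns a dict of gene id -> list of collapsed scores.
    """
    gene_scores = {}
    for snp, counts in genotypes.items():
        if minor_allele_frequency(counts) >= maf_cutoff:
            continue  # common SNP: left out of the collapsed marker
        gene = snp_to_gene[snp]
        scores = gene_scores.setdefault(gene, [0] * len(counts))
        for i, c in enumerate(counts):
            scores[i] += c
    return gene_scores


# Toy data for 20 individuals (hypothetical, not GAW17 genotypes).
n = 20
genotypes = {
    "s1": [1] + [0] * (n - 1),            # MAF 1/40, rare
    "s2": [0] * 5 + [1] + [0] * (n - 6),  # MAF 1/40, rare
    "s3": [1] * 10 + [0] * (n - 10),      # MAF 10/40, common
    "s4": [0, 1] + [0] * (n - 2),         # MAF 1/40, rare
}
snp_to_gene = {"s1": "geneA", "s2": "geneA", "s3": "geneA", "s4": "geneB"}

gene_scores = collapse_rare_snps(genotypes, snp_to_gene)
print(gene_scores["geneA"])
print(gene_scores["geneB"])

# Share of SNPs with MAF >= 0.05, using the totals quoted in the text.
common_share = round(100 * 3132 / 24487, 2)
print(common_share)
```

Summing counts treats every rare allele in a gene as acting in the same direction with equal weight, which is the main limitation a supervised weighting scheme is meant to address.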

【 Preview 】
Attachment list
Files Size Format View
PROCEEDINGS Open Access Genome-wide association analysis of GAW17 data using an empirical Bayes variable selection 398KB PDF download
  Document metrics
  Downloads: 7; Views: 4