Journal article details
BMC Bioinformatics
Generating samples for association studies based on HapMap data
Jing Li [1], Yixuan Chen [1]
[1] Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, OH 44106, USA
DOI  :  10.1186/1471-2105-9-44
Received 2007-03-22, accepted 2008-01-24, published 2008
【 Abstract 】

Background

With the completion of the HapMap project, a variety of computational algorithms and tools have been proposed for haplotype inference, tag SNP selection and genome-wide association studies. Simulated data are commonly used to evaluate these newly developed approaches. In addition to simulations based on population models, empirical data generated by perturbing real data have also been used, because they may inherit specific properties of the real data. However, no publicly available tool generates large-scale simulated variation data while taking into account knowledge from the HapMap project.

Results

A computer program (gs) was developed to quickly generate a large number of samples based on real data. Such samples are useful for a variety of purposes, including evaluating methods for haplotype inference, tag SNP selection and association studies. Two approaches have been implemented to generate dense SNP haplotype/genotype data whose local linkage disequilibrium (LD) patterns are similar to those in human populations. The first approach takes haplotype pairs from samples as input, and the second takes patterns of haplotype block structures as input. Both quantitative and qualitative traits have been incorporated in the program. Phenotypes are generated based on a disease model or on the effect of a quantitative trait nucleotide, both of which can be specified by users. In addition to single-locus disease models, two-locus disease models have been implemented that can incorporate any degree of epistasis. Users can specify all nine parameters in a 3 × 3 penetrance table. For several commonly used two-locus disease models, the program can automatically calculate penetrances from the population prevalence and marginal effects of a disease, which users can conveniently specify.
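To make the role of the 3 × 3 penetrance table concrete, the following minimal Python sketch draws a binary disease status from a two-locus genotype and computes the population prevalence the table implies. It is an illustration only, not the gs implementation: the function names and the example table values are hypothetical, and the prevalence calculation assumes Hardy-Weinberg equilibrium at each locus and independence between the two loci.

```python
import random


def hwe_genotype_freqs(p):
    """Frequencies of genotypes carrying 0, 1 or 2 copies of an allele
    with frequency p, assuming Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def prevalence(penetrance, p_a, p_b):
    """Population prevalence implied by a 3 x 3 penetrance table.

    penetrance[i][j] is P(affected | i risk alleles at locus A,
    j risk alleles at locus B). Assumes the two loci are independent.
    """
    freq_a = hwe_genotype_freqs(p_a)
    freq_b = hwe_genotype_freqs(p_b)
    return sum(
        freq_a[i] * freq_b[j] * penetrance[i][j]
        for i in range(3)
        for j in range(3)
    )


def draw_status(g_a, g_b, penetrance, rng):
    """Return 1 (affected) with probability penetrance[g_a][g_b], else 0."""
    return 1 if rng.random() < penetrance[g_a][g_b] else 0


if __name__ == "__main__":
    # Hypothetical table with elevated risk only when both loci carry
    # at least one risk allele (values chosen for illustration).
    table = [
        [0.01, 0.01, 0.01],
        [0.01, 0.05, 0.05],
        [0.01, 0.05, 0.20],
    ]
    rng = random.Random(2008)
    print("implied prevalence:", prevalence(table, 0.3, 0.2))
    print("status for genotype (2, 2):", draw_status(2, 2, table, rng))
```

Setting all nine entries by hand corresponds to the fully user-specified case described above; deriving the entries from a target prevalence and marginal effects would require inverting `prevalence` under a chosen model, which this sketch does not attempt.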

Conclusion

The program gs can effectively generate large-scale genetic and phenotypic variation data that can be used to evaluate newly developed approaches. It is freely available from the authors' web site at http://www.eecs.case.edu/~jxl175/gs.html.

【 License 】

   
2008 Li and Chen; licensee BioMed Central Ltd.

【 Figures 】

Figures 1 to 6 (images not reproduced here).