期刊论文详细信息
BMC Genetics
Sequencing genes in silico using single nucleotide polymorphisms
Lue Ping Zhao1  John A Hansen2  Xin Huang1  Shuying Sue Li1  Bo Zhang1  Xinyi Cindy Zhang1 
[1] Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA;School of Medicine, University of Washington, Seattle, WA, 98195, USA
关键词: imputation;    multi-allelic gene;    1000 Genomes Project;    SNPs;    In silico;   
Others  :  1122532
DOI  :  10.1186/1471-2156-13-6
 received in 2011-10-31, accepted in 2012-01-30,  发布年份 2012
PDF
【 摘 要 】

Background

The advent of high throughput sequencing technology has enabled the 1000 Genomes Project Pilot 3 to generate complete sequence data for more than 906 genes and 8,140 exons representing 697 subjects. The 1000 Genomes database provides a critical opportunity for further interpreting disease associations with single nucleotide polymorphisms (SNPs) discovered from genetic association studies. Currently, direct sequencing of candidate genes or regions on a large number of subjects remains both cost- and time-prohibitive.

Results

To accelerate the translation from discovery to functional studies, we propose an

    i
n
    s
ilico gene
    s
equencing method (ISS), which predicts phased sequences of intragenic regions, using SNPs. The key underlying idea of our method is to infer diploid sequences (a pair of phased sequences/alleles) at every functional locus utilizing the deep sequencing data from the 1000 Genomes Project and SNP data from the HapMap Project, and to build prediction models using flanking SNPs. Using this method, we have developed a database of prediction models for 611 known genes. Sequence prediction accuracy for these genes is 96.26% on average (ranges 79%-100%). This database of prediction models can be enhanced and scaled up to include new genes as the 1000 Genomes Project sequences additional genes on additional individuals. Applying our predictive model for the KCNJ11 gene to the Wellcome Trust Case Control Consortium (WTCCC) Type 2 diabetes cohort, we demonstrate how the prediction of phased sequences inferred from GWAS SNP genotype data can be used to facilitate interpretation and identify a probable functional mechanism such as protein changes.

Conclusions

Prior to the general availability of routine sequencing of all subjects, the ISS method proposed here provides a time- and cost-effective approach to broadening the characterization of disease associated SNPs and regions, and facilitating the prioritization of candidate genes for more detailed functional and mechanistic studies.

【 授权许可】

   
2012 Zhang et al; licensee BioMed Central Ltd.

【 预 览 】
附件列表
Files Size Format View
20150214021723686.pdf 379KB PDF download
Figure 6. 17KB Image download
Figure 5. 43KB Image download
Figure 4. 41KB Image download
Figure 3. 48KB Image download
Figure 2. 32KB Image download
Figure 1. 27KB Image download
【 图 表 】

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

【 参考文献 】
  • [1]Manolio TA: Genomewide Association Studies and Assessment of the Risk of Disease. New England Journal of Medicine 2010, 363(2):166-176.
  • [2]Hawn TR, Verbon A, Lettinga KD, Zhao LP, Li SS, Laws RJ, Skerrett SJ, Beutler B, Schroeder L, Nachman A, Ozinsky A, Smith KD, Aderem A: A common dominant TLR5 stop codon polymorphism abolishes flagellin signaling and is associated with susceptibility to legionnaires' disease. Journal of Experimental Medicine 2003, 198(10):1563-1572.
  • [3]Li SS, Wang H, Smith A, Zhang B, Zhang XC, Schoch G, Geraghty D, Hansen JA, Zhao LP: Predicting Multiallelic Genes Using Unphased and Flanking Single Nucleotide Polymorphisms. Genetic Epidemiology 2011, 35(2):85-92.
  • [4]Howie BN, Donnelly P, Marchini J: A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 2009, 5(6):e1000529.
  • [5]Marchini J, Howie B, Myers S, McVean G, Donnelly P: A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 2007, 39(7):906-913.
  • [6]Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR: MaCH: Using Sequence and Genotype Data to Estimate Haplotypes and Unobserved Genotypes. Genetic Epidemiology 2010, 34(8):816-834.
  • [7]Browning SR: Multilocus association mapping using variable-length Markov chains. Am J Hum Genet 2006, 78(6):903-913.
  • [8]Li SS, Khalid N, Carlson C, Zhao LP: Estimating haplotype frequencies and standard errors for multiple single nucleotide polymorphisms. Biostatistics 2003, 4(4):513-522.
  • [9]Excoffier L, Slatkin M: Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 1995, 12(5):921-927.
  • [10]Hawley ME, Kidd KK: HAPLO: a program using the EM algorithm to estimate the frequencies of multi-site haplotypes. J Hered 1995, 86(5):409-411.
  • [11]Long JC, Williams RC, Urbanek M: An E-M algorithm and testing strategy for multiple-locus haplotypes. American Journal of Human Genetics 1995, 56(3):799-810.
  • [12]Stephens JC, Schneider JA, Tanguay DA, Choi J, Acharya T, Stanley SE, Jiang R, Messer CJ, Chew A, Han JH, Duan J, Carr JL, Lee MS, Koshy B, Kumar AM, Zhang G, Newell WR, Windemuth A, Xu C, Kalbfleisch TS, Shaner SL, Arnold K, Schulz V, Drysdale CM, Nandabalan K, Judson RS, Ruano G, Vovis GF: Haplotype variation and linkage disequilibrium in 313 human genes. Science 2001, 293(5529):489-493.
  • [13]Akaike H: New Look at Statistical-Model Identification. Ieee Transactions on Automatic Control 1974, Ac19(6):716-723.
  • [14]Lubbe JCAvd: Information Theory. Cambridge: Cambridge University Press; 1997.
  • [15]Zhang XC, Li SS, Wang H, Hansen JA, Zhao LP: Empirical evaluations of analytical issues arising from predicting HLA alleles using multiple SNPs. BMC Genet 2011, 12:39.
  • [16]Altshuler DL, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Collins FS, De la Vega FM, Donnelly P, Egholm M, Flicek P, Gabriel SB, Gibbs RA, Knoppers BM, Lander ES, Lehrach H, Mardis ER, McVean GA, Nickerson D, Peltonen L, Schafer AJ, Sherry ST, Wang J, Wilson RK, Gibbs RA, Deiros D, Metzker M, Muzny D, Reid J, Wheeler D, et al.: A map of human genome variation from population-scale sequencing. Nature 2010, 467(7319):1061-1073.
  • [17]Burton PR, Clayton DG, Cardon LR, Craddock N, Deloukas P, Duncanson A, Kwiatkowski DP, McCarthy MI, Ouwehand WH, Samani NJ, Todd JA, Donnelly P, Barrett JC, Davison D, Easton D, Evans D, Leung HT, Marchini JL, Morris AP, Spencer CCA, Tobin MD, Attwood AP, Boorman JP, Cant B, Everson U, Hussey JM, Jolley JD, Knight AS, Koch K, Meech E, et al.: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007, 447(7145):661-678.
  • [18]Gloyn AL, Weedon MN, Owen KR, Turner MJ, Knight BA, Hitman G, Walker M, Levy JC, Sampson M, Halford S, McCarthy MI, Hattersley AT, Frayling TM: Large-scale association studies of variants in genes encoding the pancreatic beta-cell KATP channel subunits Kir6.2 (KCNJ11) and SUR1 (ABCC8) confirm that the KCNJ11 E23K variant is associated with type 2 diabetes. Diabetes 2003, 52(2):568-572.
  • [19]Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES: High-resolution haplotype structure in the human genome. Nat Genet 2001, 29(2):229-232.
  • [20]The International HapMap C: The International HapMap Project. Nature 2003, 426(6968):789-796.
  • [21]Zhao H, Pfeiffer R, Gail MH: Haplotype analysis in population genetics and association studies. Pharmacogenomics 2003, 4(2):171-178.
  • [22]Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA: Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet 2002, 70(2):425-434.
  • [23]Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi JM, Hacker CR, Kautzer CR, Lee DH, Marjoribanks C, McDonough DP, Nguyen BT, Norris MC, Sheehan JB, Shen N, Stern D, Stokowski RP, Thomas DJ, Trulson MO, Vyas KR, Frazer KA, Fodor SP, Cox DR: Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 2001, 294(5547):1719-1723.
  • [24]Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D: The structure of haplotype blocks in the human genome. Science 2002, 296(5576):2225-2229.
  文献评价指标  
  下载次数:34次 浏览次数:12次