BMC Genomics
Functional annotation signatures of disease susceptibility loci improve SNP association analysis
Alvaro NA Monteiro [1], Merlise A Clyde [2], Gary Lipton [2], Edwin S Iversen [2]
[1] Cancer Epidemiology Program, H. Lee Moffitt Cancer Center & Research Institute, 12902 Magnolia Drive, 33612 Tampa, FL, USA
[2] Department of Statistical Science, Duke University, Box 90251, 27708–0251 Durham, NC, USA
Keywords: ENCODE project; Bayesian analysis; Functional annotations; SNPs; GWAS; Association study
Others: 1217199
DOI: 10.1186/1471-2164-15-398
Received 2013-12-20, accepted 2014-05-13, published 2014
【 Abstract 】
Background
Genetic association studies are conducted to discover genetic loci that contribute to an inherited trait, identify the variants behind these associations and ascertain their functional role in determining the phenotype. To date, functional annotations of the genetic variants have rarely played more than an indirect role in assessing evidence for association. Here, we demonstrate how these data can be systematically integrated into an association study’s analysis plan.
Results
We developed a Bayesian statistical model for the prior probability of phenotype–genotype association that incorporates data from past association studies and publicly available functional annotation data regarding the susceptibility variants under study. The model takes the form of a binary regression of association status on a set of annotation variables whose coefficients were estimated through an analysis of associated SNPs in the GWAS Catalog (GC). The functional predictors examined included measures that have been demonstrated to correlate with the association status of SNPs in the GC and some whose utility in this regard is speculative: summaries of the UCSC Human Genome Browser ENCODE super-track data, dbSNP function class, sequence conservation summaries, proximity to genomic variants in the Database of Genomic Variants and known regulatory elements in the Open Regulatory Annotation database, PolyPhen-2 probabilities and RegulomeDB categories. Because we expected that only a fraction of the annotations would contribute to predicting association, we employed a penalized likelihood method to reduce the impact of non-informative predictors and evaluated the model’s ability to predict GC SNPs not used to construct the model. We show that the functional data alone are predictive of a SNP’s presence in the GC. Further, using data from a genome-wide study of ovarian cancer, we demonstrate that their use as prior data when testing for association is practical at the genome-wide scale and improves power to detect associations.
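To make the idea of a penalized binary regression of association status on annotations concrete, here is a minimal sketch. It is not the authors' model: the abstract does not specify the penalty or the fitting procedure, so this sketch assumes an L1 (lasso) penalty fitted by proximal gradient descent, and the three binary annotations and the toy training table are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.02, step=1.0, n_iter=2000):
    """Minimize mean logistic loss + lam * sum(|beta_j|) by proximal gradient.

    The intercept b0 is left unpenalized. The L1 penalty is an assumption
    standing in for the unspecified 'penalized likelihood method'.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual: fitted probability minus observed association status.
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= step * g0
        beta = [soft_threshold(b - step * gj, step * lam)
                for b, gj in zip(beta, g)]
    return b0, beta

def prior_probability(b0, beta, x):
    # Model-based prior probability of association for annotation vector x.
    return sigmoid(b0 + sum(b * xj for b, xj in zip(beta, x)))

# Hypothetical training table: one row per SNP, three binary annotations.
# Each annotation pattern occurs 8 times; 5 of 8 SNPs are "associated" when
# annotation 0 is present and 1 of 8 otherwise. Annotations 1 and 2 carry
# no signal by construction.
X, y = [], []
for a0 in (0.0, 1.0):
    for a1 in (0.0, 1.0):
        for a2 in (0.0, 1.0):
            n_assoc = 5 if a0 == 1.0 else 1
            for k in range(8):
                X.append([a0, a1, a2])
                y.append(1.0 if k < n_assoc else 0.0)

b0, beta = fit_l1_logistic(X, y)
p_with = prior_probability(b0, beta, [1.0, 0.0, 0.0])
p_without = prior_probability(b0, beta, [0.0, 0.0, 0.0])
print("coefficients:", [round(b, 3) for b in beta])
print("prior with annotation 0:", round(p_with, 3))
print("prior without annotation 0:", round(p_without, 3))
```

In this toy setting the penalty shrinks the coefficients of the two uninformative annotations to zero while retaining the informative one, which is the behaviour the abstract attributes to penalization in general terms.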
Conclusions
We show how diverse functional annotations can be efficiently combined to create ‘functional signatures’ that predict the a priori odds of a variant’s association with a trait and how these signatures can be integrated into a standard genome-wide-scale association analysis, resulting in improved power to detect truly associated variants.
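As an illustration of how an a priori odds value can enter an association test, the sketch below applies Bayes' rule in odds form: posterior odds equal the Bayes factor times the prior odds. The abstract does not state the exact computation the authors use, so the function name, the prior probabilities and the Bayes factor here are hypothetical.

```python
def posterior_probability(prior_prob, bayes_factor):
    """Posterior probability of association from a prior probability
    (strictly between 0 and 1) and a Bayes factor in favour of association."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Two hypothetical SNPs with identical evidence from the study data
# (same Bayes factor) but different annotation-based priors.
bf = 100.0
post_low_prior = posterior_probability(1e-4, bf)   # weak functional signature
post_high_prior = posterior_probability(1e-3, bf)  # strong functional signature
print(post_low_prior, post_high_prior)
```

With equal evidence from the genotype data, the SNP with the stronger functional signature ends up with the higher posterior probability, which is one way a signature could raise power for truly associated variants.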
【 License 】
2014 Iversen et al.; licensee BioMed Central Ltd.