BioData Mining
Diverse convergent evidence in the genetic analysis of complex disease: coordinating omic, informatic, and experimental evidence to better identify and validate risk factors
Timothy H Ciesielski [5], Sarah A Pendergrass [1], Marquitta J White [2], Nuri Kodaman [2], Rafal S Sobota [2], Minjun Huang [4], Jacquelaine Bartlett [5], Jing Li [4], Qinxin Pan [4], Jiang Gui [3], Scott B Selleck [1], Christopher I Amos [3], Marylyn D Ritchie [1], Jason H Moore [3], Scott M Williams [5]
[1] Department of Biochemistry & Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
[2] Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232-0700, USA
[3] Community and Family Medicine, Section of Biostatistics & Epidemiology, Geisel School of Medicine, Hanover, NH 03766, USA
[4] Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
[5] Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH 03755, USA
Keywords: False positives; False negatives; Type 1 error; Type 2 error; Omics; GWAS; Heterogeneity; Complex disease; Validation; Replication
Others: 1084060; DOI: 10.1186/1756-0381-7-10
Received 2013-09-18, accepted 2014-06-08, published 2014
【 Abstract 】
In omic research, such as genome-wide association studies, researchers seek to repeat their results in other datasets to reduce false positive findings and thus provide evidence for the existence of true associations. Unfortunately, this standard validation approach cannot completely eliminate false positive conclusions, and it can also mask many true associations that might otherwise advance our understanding of pathology. These issues raise the question: How can we increase the amount of knowledge gained from high-throughput genetic data? To address this challenge, we present an approach that complements standard statistical validation methods by drawing attention to both potential false negative and false positive conclusions, and by providing broad information for directing future research. The Diverse Convergent Evidence approach (DiCE) we propose integrates information from multiple sources (omics, informatics, and laboratory experiments) to estimate the strength of the available corroborating evidence supporting a given association. This process is designed to yield an evidence metric that has utility when etiologic heterogeneity, variable risk factor frequencies, and a variety of observational data imperfections might lead to false conclusions. We provide proof-of-principle examples in which DiCE identified strong evidence for associations that have established biological importance, when standard validation methods alone did not provide support. If used as an adjunct to standard validation methods, this approach can leverage multiple distinct data types to improve genetic risk factor discovery and validation, promote effective science communication, and guide future research directions.
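The abstract names three evidence sources (omics, informatics, and laboratory experiments) but does not state how they are combined. The Python sketch below illustrates only the general idea of convergence across distinct evidence types. It is not the DiCE scoring scheme: the `Evidence` class, the `convergence_tally` function, the rule of counting categories with at least one supporting item, and all example entries are assumptions made for illustration.

```python
from dataclasses import dataclass

# The three evidence sources named in the abstract.
CATEGORIES = ("omic", "informatic", "experimental")


@dataclass(frozen=True)
class Evidence:
    category: str   # one of CATEGORIES
    source: str     # free-text label (hypothetical in this sketch)
    supports: bool  # whether this item corroborates the association


def convergence_tally(items):
    """Count distinct evidence categories with at least one supporting item.

    Illustrative rule only; it is not the metric defined in the paper.
    """
    supported = set()
    for item in items:
        if item.category not in CATEGORIES:
            raise ValueError(f"unknown category: {item.category!r}")
        if item.supports:
            supported.add(item.category)
    return len(supported)


# Hypothetical evidence items for a single candidate association.
example = [
    Evidence("omic", "hypothetical GWAS cohort A", True),
    Evidence("omic", "hypothetical GWAS cohort B", False),
    Evidence("informatic", "hypothetical pathway annotation", True),
    Evidence("experimental", "hypothetical knockout assay", False),
]

print(convergence_tally(example))
```

In this toy rule, a failed replication in a second cohort does not erase support from the first, which loosely mirrors the abstract's point that standard validation can mask true associations. Any real use would follow the scoring defined in the full article.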
【 License 】
2014 Ciesielski et al.; licensee BioMed Central Ltd.