BMC Medical Genetics
Data mining of high density genomic variant data for prediction of Alzheimer's disease risk
Valentin Dinu [1], Natalia Briones [2]
[1] Department of Biomedical Informatics, Arizona State University, Mayo Clinic, Samuel C. Johnson Research Bldg., 13212 East Shea Boulevard, Scottsdale, Arizona 85259, USA; Computational Biosciences Program, School of Mathematics and Statistical Sciences, Arizona State University, 1711 South Rural Road, Tempe, Arizona 85287-1804, USA
Keywords: Random Forest; SNPs; GWAS; Late-Onset Alzheimer's Disease
DOI: 10.1186/1471-2350-13-7
Received 2011-07-22; accepted 2012-01-25; published 2012
【 Abstract 】
Background
Discovering genetic associations is an important step toward understanding human illness and deriving disease pathways. Identifying multiple interacting genetic mutations associated with disease remains a challenge in studying the etiology of complex diseases. Genome-wide association studies (GWAS) have recently confirmed new single nucleotide polymorphisms (SNPs), at genes implicated in immune response, cholesterol/lipid metabolism, and cell membrane processes, as associated with late-onset Alzheimer's disease (LOAD); nevertheless, part of the heritability of Alzheimer's disease (AD) remains unexplained. We sought to identify additional genetic variants that may influence LOAD risk using data mining methods.
Methods
Two approaches were devised to select SNPs associated with LOAD in a publicly available GWAS data set consisting of three cohorts. In both approaches, single-locus analysis (logistic regression) was used to filter the data at a p-value threshold less conservative than the Bonferroni threshold; the resulting subset of SNPs was then used in multi-locus analysis with a random forest (RF) classifier. In the second approach, we also took prior biological knowledge into account and performed sample stratification and linkage disequilibrium (LD) analysis, in addition to logistic regression, to preselect the loci used as input to the RF classifier construction step.
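The single-locus filtering step can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the relaxed cutoff or the number of SNPs tested, so `relaxed`, `n_tests`, the SNP identifiers, and the p-values below are all hypothetical values chosen for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def prefilter_snps(p_values, cutoff):
    """Return the ids of SNPs whose single-locus p-value is below the cutoff."""
    return sorted(snp for snp, p in p_values.items() if p < cutoff)


# Hypothetical single-locus p-values (e.g. from per-SNP logistic regression).
p_values = {"snp_a": 2e-9, "snp_b": 4e-5, "snp_c": 3e-3, "snp_d": 0.4}

n_tests = 300_000  # assumed number of SNPs tested; illustration only
strict = bonferroni_threshold(0.05, n_tests)
relaxed = 1e-3     # assumed relaxed cutoff; not taken from the paper

strict_hits = prefilter_snps(p_values, strict)    # passes Bonferroni
relaxed_hits = prefilter_snps(p_values, relaxed)  # passed on to multi-locus analysis
```

The relaxed cutoff keeps SNPs with weaker marginal signals so that a multi-locus method still has a chance to pick up joint effects that a Bonferroni filter would discard.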
Results
The first approach yielded 199 SNPs, most of them associated with genes involved in calcium signaling, cell adhesion, endocytosis, immune response, and synaptic function. Together with APOE and GAB2 SNPs, these SNPs formed a predictive subset for LOAD status with an average error of 9.8% under 10-fold cross-validation (CV) in RF modeling. The second approach identified nineteen variants in LD with ST5, TRPC1, ATG10, ANO3, NDUFA12, and NISCH, genes linked directly or indirectly with neurobiology. These variants were part of a model that also included APOE and GAB2 SNPs to predict LOAD risk, which produced a 10-fold CV average error of 17.5% in the classification modeling.
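For readers unfamiliar with the metric, the following sketch shows how an average k-fold CV error is computed. It is an illustration under stated assumptions: the classifier is a simple per-genotype majority-vote lookup standing in for a random forest, and the genotype data, labels, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cv_average_error(X, y, fit, predict, k=10, seed=0):
    """Mean misclassification rate over the k held-out folds."""
    errors = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(model, X[i]) != y[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / len(errors)


# Stand-in classifier (NOT a random forest): majority label per genotype value.
def fit_lookup(X_train, y_train):
    counts = defaultdict(Counter)
    for x, label in zip(X_train, y_train):
        counts[x[0]][label] += 1
    # Ties are resolved toward label 1 (case).
    return {g: max(c, key=lambda lab: (c[lab], lab)) for g, c in counts.items()}


def predict_lookup(model, x):
    return model.get(x[0], 0)  # unseen genotype -> predict control


# Toy data: one hypothetical SNP coded as risk-allele count; 1 = case, 0 = control.
X = [[0], [1], [2], [0], [1], [2], [0], [1], [2], [0]] * 2
y = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0] * 2

avg_error = cv_average_error(X, y, fit_lookup, predict_lookup, k=10)
```

Each sample is held out exactly once, so the reported figure reflects performance on data the model did not see during fitting.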
Conclusions
With the two proposed approaches, we identified a large subset of SNPs in genes mostly clustered around specific pathways/functions and a smaller set of SNPs, within or in proximity to five genes not previously reported, that may be relevant for the prediction/understanding of AD.
【 License 】
2012 Briones and Dinu; licensee BioMed Central Ltd.