BMC Genomics
The application of network label propagation to rank biomarkers in genome-wide Alzheimer’s data
Shyam Visweswaran [2], M Ilyas Kamboh [1], M Michael Barmada [1], Matthew E Stokes [2]
[1] Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA; [2] The Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
Keywords: Alzheimer’s disease; Single nucleotide polymorphism; Reproducibility; Prediction; Label propagation; Feature ranking; Genome-wide association study; Bioinformatics
DOI: 10.1186/1471-2164-15-282
Received 2013-06-12; accepted 2014-03-25; published 2014.
Abstract
Background
Ranking and identifying biomarkers that are associated with disease from genome-wide measurements holds significant promise for understanding the genetic basis of common diseases. However, the large number of single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWAS) makes this task computationally challenging when SNPs are ranked in a multivariate fashion. This paper evaluates the performance of a multivariate graph-based method called label propagation (LP) that efficiently ranks SNPs in genome-wide data.
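The abstract does not specify how LP turns genotype data into a SNP ranking. The sketch below is an illustration under stated assumptions, not the authors' implementation: it applies the iterative update from Zhou et al.'s 2004 "learning with local and global consistency" method, F ← αSF + (1 − α)Y, to a hypothetical bipartite graph in which each individual is linked to each SNP with a weight equal to its minor-allele count. Cases are seeded with +1, controls with −1, and SNPs are ranked by the magnitude of their propagated score. The graph construction, the function name `label_propagation_rank`, and the defaults `alpha=0.9` and `n_iter=100` are all assumptions chosen for illustration.

```python
import numpy as np


def label_propagation_rank(genotypes, labels, alpha=0.9, n_iter=100):
    """Rank SNPs by propagating case/control labels over a sample-SNP graph.

    genotypes: (n_samples, n_snps) minor-allele counts (0, 1 or 2).
    labels:    (n_samples,) array, +1 for cases and -1 for controls.
    Returns (order, snp_scores): SNP indices sorted by |score|, and the raw scores.
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(labels, dtype=float)
    n, p = X.shape

    # Hypothetical bipartite graph: nodes 0..n-1 are individuals, n..n+p-1 are SNPs.
    W = np.zeros((n + p, n + p))
    W[:n, n:] = X
    W[n:, :n] = X.T

    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}, as in Zhou et al. (2004).
    # Isolated nodes (e.g. a SNP nobody carries) get degree 1 to avoid dividing by zero.
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Seed vector Y: labels on individuals, zero on SNP nodes.
    Y = np.concatenate([y, np.zeros(p)])

    # Iterate F <- alpha * S F + (1 - alpha) * Y. For 0 < alpha < 1 this
    # approaches a unique fixed point; n_iter simply truncates the iteration.
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y

    snp_scores = F[n:]
    # Positive scores lean towards cases, negative towards controls;
    # ranking uses the magnitude so both directions count as informative.
    order = np.argsort(-np.abs(snp_scores), kind="stable")
    return order, snp_scores
```

The dense (n + p) × (n + p) matrix keeps the sketch short. At genome-wide scale only the two off-diagonal genotype blocks carry information, so a practical version would store them as sparse matrices and never form W explicitly.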
Results
The performance of LP was evaluated on a synthetic dataset and on two late-onset Alzheimer’s disease (LOAD) genome-wide datasets, and compared with that of three control methods: the chi-squared test, a commonly used univariate method, and two multivariate ranking methods, a Relief-based method called SWRF and sparse logistic regression (SLR). Performance was measured by evaluating the top-ranked SNPs in terms of classification performance, reproducibility between the two datasets, and prior evidence of association with LOAD.
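The chi-squared control scores each SNP independently of all others. A minimal sketch follows, assuming a genotypic 2 × 3 test of case/control status against genotype (0, 1 or 2 minor alleles); the abstract does not say whether the allelic or the genotypic form was used, and the function names `chi2_statistic` and `chi2_rank` are hypothetical.

```python
import numpy as np


def chi2_statistic(table):
    """Pearson chi-squared statistic (no continuity correction) for a 2 x k table."""
    table = np.asarray(table, dtype=float)
    # Drop genotype columns nobody carries; they would give 0/0 expected counts.
    table = table[:, table.sum(axis=0) > 0]
    expected = (table.sum(axis=1, keepdims=True)
                * table.sum(axis=0, keepdims=True) / table.sum())
    return float(((table - expected) ** 2 / expected).sum())


def chi2_rank(genotypes, is_case):
    """Rank SNPs by a genotypic chi-squared test, largest statistic first.

    genotypes: (n_samples, n_snps) minor-allele counts (0, 1 or 2).
    is_case:   (n_samples,) booleans; both cases and controls must be present.
    """
    G = np.asarray(genotypes)
    case = np.asarray(is_case, dtype=bool)
    scores = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        # Row 0: genotype counts among cases; row 1: among controls.
        table = [[np.sum(G[case, j] == g) for g in (0, 1, 2)],
                 [np.sum(G[~case, j] == g) for g in (0, 1, 2)]]
        scores[j] = chi2_statistic(table)
    order = np.argsort(-scores, kind="stable")
    return order, scores
```

Ranking by the statistic rather than the p-value is a simplification: the two orderings can differ when some SNPs lack a genotype class and therefore have fewer degrees of freedom.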
On the synthetic data, LP performed comparably to the control methods. On the GWAS data, LP had significantly better classification performance than chi-squared and SWRF for 10 to 1000 top-ranked SNPs in both datasets, and was not significantly different from SLR. LP also had greater ranking reproducibility than chi-squared, SWRF, and SLR. Among the 25 top-ranked SNPs identified by LP, 14 SNPs in one dataset and 10 in the other had evidence in the literature of association with LOAD, more than for any of the other methods.
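One simple way to quantify how reproducible a ranking is across two datasets is the fraction of top-k SNPs the two rankings share. The abstract does not define the exact measure the authors used, so this is an assumed, illustrative metric. It assumes both rankings are lists over the same SNP identifiers (for example, SNPs genotyped in both datasets); the function names `top_k_overlap` and `overlap_curve` are hypothetical.

```python
from collections.abc import Sequence


def top_k_overlap(rank_a: Sequence[str], rank_b: Sequence[str], k: int) -> float:
    """Fraction of the top-k SNPs that two rankings share (1.0 means identical sets)."""
    if not 0 < k <= min(len(rank_a), len(rank_b)):
        raise ValueError("k must be between 1 and the length of the shorter ranking")
    shared = set(rank_a[:k]) & set(rank_b[:k])
    return len(shared) / k


def overlap_curve(rank_a, rank_b, ks=(10, 100, 1000)):
    """Top-k overlap at several cut-offs, e.g. across the 10 to 1000 range."""
    return {k: top_k_overlap(rank_a, rank_b, k) for k in ks}
```

Evaluating at several cut-offs rather than one gives a curve that shows whether agreement between datasets holds only at the very top of the ranking or persists deeper into it.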
Conclusion
LP performed considerably better than three control methods in ranking SNPs in two high-dimensional genome-wide datasets. It performed better on the evaluation measures we used, and it is computationally efficient enough to be applied in practice to data from genome-wide studies. These results support including LP among the methods used to rank SNPs in genome-wide datasets.
Copyright
2014 Stokes et al.; licensee BioMed Central Ltd.
Figures: Figure 1, Figure 2, Figure 3, Figure 4.