BMC Bioinformatics
Down-weighting overlapping genes improves gene set analysis
Adi Laurentiu Tarca [2], Sorin Draghici [3], Gaurav Bhatti [1], Roberto Romero [1]
[1] Perinatology Research Branch, NICHD/NIH/DHHS, Bethesda, MD, and Detroit, MI, USA
[2] Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
[3] Department of Clinical and Translational Science, Wayne State University, Detroit, MI, USA
Keywords: Overlapping gene sets; Pathway analysis; Gene set analysis; Gene expression
Others: 1088234; DOI: 10.1186/1471-2105-13-136
Received 2012-02-14; accepted 2012-05-18; published 2012
【 Abstract 】
Background
Identifying the gene sets that are significantly impacted in a given condition, based on microarray data, is a crucial step in current life science research. Most gene set analysis methods treat all genes equally, regardless of how specific they are to a given gene set.
Results
In this work we propose a new gene set analysis method that computes a gene set score as the mean of the absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize genes that appear in few gene sets over genes that appear in many gene sets. We demonstrate the usefulness of the method by analyzing gene sets that correspond to the KEGG pathways, and hence we call our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Most gene set analysis methods are validated through the analysis of 2-3 data sets followed by a human interpretation of the results. In contrast, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessment of the analysis results.
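The scoring idea described above can be illustrated with a small sketch. The abstract does not give the weight formula, so the form used here, w(g) = 1 + sqrt((f_max - f(g)) / (f_max - f_min)) with f(g) the number of gene sets containing gene g, is an assumption chosen so that weights fall between 1 and 2. The function names and toy data are hypothetical, and the moderated t-scores are taken as given inputs rather than computed. The sketch covers only the raw score and omits any significance assessment.

```python
from collections import Counter
from math import sqrt


def gene_weights(gene_sets):
    """Weight each gene by how rarely it occurs across the gene sets.

    Assumed form (not stated in the abstract):
        w(g) = 1 + sqrt((f_max - f(g)) / (f_max - f_min))
    Genes found in the fewest sets get weight 2, the most shared get 1.
    """
    # Count, for every gene, the number of gene sets that contain it.
    freq = Counter(g for genes in gene_sets.values() for g in set(genes))
    f_max, f_min = max(freq.values()), min(freq.values())
    if f_max == f_min:
        # All genes are equally frequent, so no gene is down-weighted.
        return {g: 1.0 for g in freq}
    return {g: 1.0 + sqrt((f_max - f) / (f_max - f_min))
            for g, f in freq.items()}


def gene_set_score(genes, t_scores, weights):
    """Mean of the absolute weighted t-scores over genes with a score."""
    vals = [abs(weights[g] * t_scores[g]) for g in genes if g in t_scores]
    return sum(vals) / len(vals) if vals else float("nan")


# Toy example: g1 is shared by all three sets, g3/g4/g5 are set-specific.
gene_sets = {
    "set_A": ["g1", "g2", "g3"],
    "set_B": ["g1", "g4"],
    "set_C": ["g1", "g2", "g5"],
}
# Hypothetical moderated t-scores, e.g. as produced by a limma-style analysis.
t_scores = {"g1": 2.0, "g2": -1.0, "g3": 3.0, "g4": 0.5, "g5": -2.5}

weights = gene_weights(gene_sets)
scores = {name: gene_set_score(genes, t_scores, weights)
          for name, genes in gene_sets.items()}
```

In this toy example the ubiquitous gene g1 keeps weight 1 while the set-specific genes receive weight 2, so a set's score is driven more by the genes that are specific to it.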
Conclusions
PADOG significantly improves gene set ranking and boosts the sensitivity of the analysis using information already available in the gene expression profiles and in the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at http://bioinformaticsprb.med.wayne.edu/PADOG/ or http://www.bioconductor.org.
【 License 】
2012 Tarca et al.; licensee BioMed Central Ltd.