BioData Mining
Using random walks to identify cancer-associated modules in expression data
Neil Abernethy [1], John Gennari [1], Ali Shojaie [2], Deanna Petrochilos [1]
[1]Biomedical and Health Informatics, Dept of Biomedical Informatics and Medical Education, University of Washington, Box 357240, 1959 NE Pacific Street, HSB I-264, Seattle, WA 98195-7240, USA
[2]Department of Biostatistics, University of Washington, Box 357232, F-650 Health Sciences Bldg, Seattle, WA, USA
Keywords: Walktrap; Random walk; Interactions; Graph theory; Modules; Cancer; Network analysis
DOI: 10.1186/1756-0381-6-17
Received 2012-12-17, accepted 2013-09-24, published 2013
【 Abstract 】

Background

The etiology of cancer involves a complex series of genetic and environmental conditions. To better represent and study the intricate genetics of cancer onset and progression, we construct a network of biological interactions to search for groups of genes that compose cancer-related modules. Three cancer expression datasets are investigated to prioritize genes and interactions associated with cancer outcomes. Using a graph-based approach to search for communities of phenotype-related genes in microarray data, we find modules of genes associated with cancer phenotypes in a weighted interaction network.

Results

We implement Walktrap, a random-walk-based community detection algorithm, to identify biological modules predisposing to tumor growth in 22 hepatocellular carcinoma samples (GSE14520), adenoma development in 32 colorectal cancer samples (GSE8671), and prognosis in 198 breast cancer patients (GSE7390). For each study, we find the best-scoring partitions under a maximum cluster size of 200 nodes. Significant modules highlight groups of genes that are functionally related to cancer and show promise as therapeutic targets; these include interactions among transcription factors (SPIB, RPS6KA2, and RPS6KA6), cell-cycle regulatory genes (BRSK1, WEE1, and CDC25C), modulators of the cell cycle and proliferation (CBLC and IRS2), and genes that regulate and participate in the MAP kinase pathway (MAPK9, DUSP1, DUSP9, and RIPK2). To assess the performance of Walktrap in finding genomic modules (Walktrap-GM), we evaluate our results against other tools recently developed to discover disease modules in biological networks. Compared with two other highly cited module-finding tools, jActiveModules and Matisse, Walktrap-GM shows strong performance in the discovery of modules enriched with known cancer genes.
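
As a rough illustration of the random-walk clustering idea described above, the following is a minimal pure-Python sketch, not the Walktrap-GM implementation. It assumes unit self-loops on every node, enforces the size cap by skipping merges that would exceed `max_size`, and uses modularity as a stand-in for the paper's partition score; the function name `walktrap_sketch`, the toy gene labels, and the edge weights are all hypothetical.

```python
from itertools import combinations


def walktrap_sketch(edges, t=4, max_size=200):
    """Simplified Walktrap-style clustering of an undirected weighted graph.

    edges: list of (u, v, weight). Returns a list of node sets.
    The size cap and the modularity-based choice of partition are assumptions.
    """
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    # Weighted adjacency; every node gets a unit self-loop for the walk.
    adj = {n: {n: 1.0} for n in nodes}
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    deg = {n: sum(adj[n].values()) for n in nodes}

    def walk(start):
        # Distribution of a t-step random walk started at `start`.
        p = {start: 1.0}
        for _ in range(t):
            nxt = {}
            for i, pi in p.items():
                for j, w in adj[i].items():
                    nxt[j] = nxt.get(j, 0.0) + pi * w / deg[i]
            p = nxt
        return p

    comms = {i: {n} for i, n in enumerate(nodes)}
    probs = {i: walk(n) for i, n in enumerate(nodes)}

    def merge_cost(a, b):
        # Size-weighted squared distance between the two walk distributions.
        pa, pb = probs[a], probs[b]
        r2 = sum((pa.get(k, 0.0) - pb.get(k, 0.0)) ** 2 / deg[k]
                 for k in set(pa) | set(pb))
        na, nb = len(comms[a]), len(comms[b])
        return na * nb / (na + nb) * r2 / len(nodes)

    def adjacent(a, b):
        # Two communities may merge only if at least one edge joins them.
        return any(v in comms[b] for u in comms[a] for v in adj[u])

    def modularity(partition):
        # Computed on the input edges only (self-loops excluded).
        total = sum(w for _, _, w in edges)
        q = 0.0
        for c in partition:
            inside = sum(w for u, v, w in edges if u in c and v in c)
            strength = sum(w for u, v, w in edges for n in (u, v) if n in c)
            q += inside / total - (strength / (2 * total)) ** 2
        return q

    best = [set(c) for c in comms.values()]
    best_q = modularity(best)
    while True:
        candidates = [(merge_cost(a, b), a, b)
                      for a, b in combinations(comms, 2)
                      if len(comms[a]) + len(comms[b]) <= max_size
                      and adjacent(a, b)]
        if not candidates:
            break
        _, a, b = min(candidates)  # cheapest merge first
        na, nb = len(comms[a]), len(comms[b])
        pa, pb = probs[a], probs.pop(b)
        # The merged community's distribution is the size-weighted average.
        probs[a] = {k: (na * pa.get(k, 0.0) + nb * pb.get(k, 0.0)) / (na + nb)
                    for k in set(pa) | set(pb)}
        comms[a] |= comms.pop(b)
        q = modularity(list(comms.values()))
        if q > best_q:
            best, best_q = [set(c) for c in comms.values()], q
    return best


# Toy network: two tightly connected triangles joined by one weak edge.
toy_edges = [("g1", "g2", 1.0), ("g1", "g3", 1.0), ("g2", "g3", 1.0),
             ("g4", "g5", 1.0), ("g4", "g6", 1.0), ("g5", "g6", 1.0),
             ("g3", "g4", 0.1)]
modules = walktrap_sketch(toy_edges, t=4, max_size=200)
```

In practice, a library implementation such as the Walktrap routine in igraph would be the natural choice for networks of realistic size; this sketch recomputes all pairwise merge costs at every step and is meant only to make the merging logic concrete.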

Conclusions

These results demonstrate that the Walktrap-GM algorithm identifies modules that are significantly enriched with cancer genes and that capture their joint effects along with promising candidate genes. The approach performs well when evaluated against similar tools, and its smaller overall module size allows for more specific functional annotation and facilitates interpretation of the modules.
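
One common way to quantify whether a module is enriched with known cancer genes is an upper-tail hypergeometric test. The abstract does not state which test the authors used, so the sketch below is an assumption for illustration only; the function name `enrichment_pvalue` and the example counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(module_size, hits, background_size, background_hits):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    module_size:      number of genes in the module
    hits:             known cancer genes found in the module
    background_size:  number of genes in the whole network
    background_hits:  known cancer genes in the whole network
    """
    upper = min(module_size, background_hits)
    total = comb(background_size, module_size)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, module_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: a 3-gene module drawn from a 10-gene network with 3 cancer genes.
p = enrichment_pvalue(module_size=3, hits=3, background_size=10, background_hits=3)
```

When many modules are tested, the resulting p-values would normally be adjusted for multiple testing before any module is called significant.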

【 License 】

2013 Petrochilos et al.; licensee BioMed Central Ltd.
