Journal article details
BMC Bioinformatics
A network-assisted co-clustering algorithm to discover cancer subtypes based on gene expression
Yiyi Liu [3], Quanquan Gu [2], Jack P Hou [1], Jiawei Han [2], Jian Ma [4]
[1] Medical Scholars Program, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
[2] Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
[3] Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
[4] Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
Keywords: Gene expression; Clustering; Cancer subtype
DOI  :  10.1186/1471-2105-15-37
Received: 2013-06-10; accepted: 2014-01-31; published: 2014
【 Abstract 】

Background

Cancer subtype information is critically important for understanding tumor heterogeneity. Existing methods for identifying cancer subtypes have relied primarily on generic clustering algorithms (such as hierarchical clustering) applied to gene expression data. The network-level interaction among genes, which is key to understanding the molecular perturbations in cancer, has rarely been considered during the clustering process. The motivation of our work is to develop a method that effectively incorporates molecular interaction networks into the clustering process to improve cancer subtype identification.

Results

We have developed a new clustering algorithm for cancer subtype identification, called “network-assisted co-clustering for the identification of cancer subtypes” (NCIS). NCIS incorporates gene network information to simultaneously group samples and genes into biologically meaningful clusters. Prior to clustering, we assign weights to genes based on their impact in the network. A new weighted co-clustering algorithm based on semi-nonnegative matrix tri-factorization is then applied. We evaluated the effectiveness of NCIS on simulated datasets as well as on large-scale breast cancer and glioblastoma multiforme patient samples from The Cancer Genome Atlas (TCGA) project. Compared with consensus hierarchical clustering, NCIS better separated the patient samples into clinically distinct subtypes and achieved higher accuracy on the simulated datasets in the presence of noise.
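
The two steps described above (network-based gene weighting, then a weighted tri-factorization) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the weighting scheme or the objective, so the PageRank-style score, the function names `network_gene_weights` and `weighted_trifactor_error`, the damping value, and the row-weighted squared-error form are all assumptions made for illustration.

```python
import numpy as np


def network_gene_weights(adjacency, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style importance score per gene (an assumed weighting scheme)."""
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    degree = A.sum(axis=0)
    safe_degree = np.where(degree > 0, degree, 1.0)
    # Column-normalise: each gene spreads its score evenly over its neighbours.
    # Genes with no neighbours spread their score uniformly over all genes.
    P = np.where(degree > 0, A / safe_degree, 1.0 / n)
    w = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        w_next = damping * (P @ w) + (1.0 - damping) / n
        converged = np.abs(w_next - w).sum() < tol
        w = w_next
        if converged:
            break
    return w / w.sum()


def weighted_trifactor_error(X, F, S, G, w):
    """Gene-weighted squared error of X ~ F @ S @ G.T (hypothetical objective).

    X: genes x samples, F: genes x gene-clusters,
    S: gene-clusters x sample-clusters, G: samples x sample-clusters.
    """
    residual = X - F @ S @ G.T
    return float(np.sum(w[:, None] * residual ** 2))


# Toy network: gene 0 is a hub linked to genes 1, 2 and 3.
star = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
])
weights = network_gene_weights(star)

# Toy factorization: mixed-sign F and S, nonnegative sample indicator G.
F = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
S = np.array([[2.0, 0.0], [0.0, -1.0]])
G = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_exact = F @ S @ G.T
X_noisy = X_exact.copy()
X_noisy[0, 0] += 1.0  # perturb one entry of the hub gene

error_exact = weighted_trifactor_error(X_exact, F, S, G, weights)
error_noisy = weighted_trifactor_error(X_noisy, F, S, G, weights)
```

In this toy setting the hub gene receives the largest weight, so a reconstruction error in its row is penalised more heavily than the same error in a peripheral gene's row. That is the intuition behind weighting genes by network impact before clustering, under the assumptions stated above.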

Conclusions

The weighted co-clustering approach in NCIS provides a unique solution to incorporate gene network information into the clustering process. Our tool will be useful to comprehensively identify cancer subtypes that would otherwise be obscured by cancer heterogeneity, using high-throughput and high-dimensional gene expression data.

【 License 】

   
2014 Liu et al.; licensee BioMed Central Ltd.
