BioData Mining
Semi-supervised consensus clustering for gene expression data analysis
Yunli Wang [1], Youlian Pan [2]
[1] National Research Council Canada, 46 Dineen Dr., Fredericton, Canada
[2] National Research Council Canada, 1200 Montreal Rd., Ottawa, Canada
Keywords: Gene expression; Semi-supervised consensus clustering; Consensus clustering; Semi-supervised clustering
DOI: 10.1186/1756-0381-7-7
Received 2013-10-18, accepted 2014-04-05, published 2014
【 Abstract 】

Background

Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis, but they cannot cope with the noise and high dimensionality of microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge into the clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and domain knowledge.

Methods

We proposed semi-supervised consensus clustering (SSCC), which integrates consensus clustering with semi-supervised clustering, for analyzing gene expression data. We investigated the roles of consensus clustering and prior knowledge in improving the quality of clustering. SSCC was compared with one semi-supervised clustering algorithm, one consensus clustering algorithm, and k-means. Experiments on eight gene expression datasets were performed using h-fold cross-validation.
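
The general idea of combining the two techniques can be illustrated with a minimal sketch. The choices here are assumptions and are not taken from the abstract: constrained (seeded) k-means stands in for the semi-supervised base clusterer, random feature subsets generate the ensemble, and a per-sample majority vote serves as the consensus function. The names `seeded_kmeans` and `sscc_sketch` and the toy data are hypothetical, and the algorithms actually used in the paper may differ.

```python
import random
from collections import Counter


def seeded_kmeans(X, seeds, k, features, iters=20):
    """Constrained k-means on a feature subset (an assumed base clusterer).

    seeds maps sample index -> known class in range(k); every class needs
    at least one seed. Seeded samples keep their known label throughout.
    """
    def centroid(members):
        return [sum(X[i][f] for i in members) / len(members) for f in features]

    def dist2(i, c):
        return sum((X[i][f] - cj) ** 2 for f, cj in zip(features, c))

    labels = dict(seeds)
    # Initial centroids come from the labeled samples of each class.
    cents = [centroid([i for i, c in seeds.items() if c == j]) for j in range(k)]
    for _ in range(iters):
        new = {
            i: seeds[i] if i in seeds
            else min(range(k), key=lambda j: dist2(i, cents[j]))
            for i in range(len(X))
        }
        if new == labels:
            break
        labels = new
        cents = [centroid([i for i, c in labels.items() if c == j])
                 for j in range(k)]
    return [labels[i] for i in range(len(X))]


def sscc_sketch(X, seeds, k, n_runs=15, frac=0.6, rng=None):
    """Run the base clusterer on random feature subsets and take a majority vote.

    Cluster j is always initialized from the seeds of class j, so labels are
    aligned across runs and can be voted on directly.
    """
    rng = rng or random.Random(0)
    d = len(X[0])
    m = max(1, int(frac * d))
    votes = [Counter() for _ in X]
    for _ in range(n_runs):
        features = rng.sample(range(d), m)
        for i, lab in enumerate(seeded_kmeans(X, seeds, k, features)):
            votes[i][lab] += 1
    return [v.most_common(1)[0][0] for v in votes]


# Toy data: two well-separated groups of four samples, one labeled sample each.
X = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0],
     [10, 9, 10, 9], [9, 10, 9, 10], [10, 10, 9, 9], [9, 9, 10, 10]]
seeds = {0: 0, 4: 1}
consensus = sscc_sketch(X, seeds, k=2)
print(consensus)
```

Majority voting is only valid here because the seeds fix which cluster corresponds to which class in every run. With an unsupervised base clusterer, the labels would first have to be aligned, or a co-association or graph-based consensus function used instead.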

Results

Using prior knowledge improved the clustering quality by reducing the impact of noise and high dimensionality in microarray data. Integrating consensus clustering with semi-supervised clustering improved performance compared with using either consensus clustering or semi-supervised clustering alone. Our SSCC method outperformed the other methods tested in this paper.

【 License 】

2014 Wang and Pan; licensee BioMed Central Ltd.
