Journal Article Details
BMC Bioinformatics
Improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection
Yin Liu [3], Peng Qiu [2], F Anthony San Lucas [3], Zixing Wang [1]
[1] Department of Neurobiology and Anatomy, University of Texas Health Science Center at Houston, Houston, Texas, USA
[2] Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA
[3] University of Texas Graduate School of Biomedical Sciences, Houston, Texas, USA
Keywords: Gene module discovery; Sample clustering; Gene co-expression network; Variable selection
Others  :  818546
DOI  :  10.1186/1471-2105-15-153
Received 2013-12-16, accepted 2014-05-14, published 2014
【 Abstract 】

Background

Many variable selection techniques have been proposed for the clustering of gene expression data. While these methods tend to filter out irrelevant genes and identify informative genes that contribute to a clustering solution, they are based on criteria that do not consider the potential interactive influence among individual genes. Motivated by ensemble clustering, there is a strong interest in leveraging the structure of gene networks for gene selection, so that the relationship information between genes can be effectively utilized, while the selected genes are expected to preserve all the possible clustering structures in the data.

Results

We present a new filter method that uses gene connectivity in the gene co-expression network as the evaluation criterion for variable selection. Gene connectivity measures the importance of a gene in terms of its expression similarity with the other genes in the co-expression network. Hard-threshold and soft-threshold transformations are employed to construct the gene co-expression networks. Both simulation studies and real data analysis show that the network based on soft thresholding is more effective in selecting relevant variables and provides better clustering results than the hard-thresholding transformation and two other canonical filter methods for variable selection. Furthermore, a new module analysis approach is proposed to reveal the higher-order organization of the gene space, where the genes of a module share significant topological similarity and are associated with a consensus partition of the sample space. We demonstrate that the identified modules can lead to biologically meaningful sample partitions that might be missed by other methods.
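The connectivity-based filter described above can be sketched as follows. The abstract gives no formulas, so this is a minimal illustration under assumptions: the adjacency is taken as the absolute Pearson correlation raised to a power `beta` (soft threshold) or cut at `tau` (hard threshold), which is the common convention in weighted co-expression analysis; the function names and the default values of `beta` and `tau` are hypothetical and are not taken from the authors' program.

```python
import numpy as np

def coexpression_adjacency(expr, mode="soft", beta=6.0, tau=0.8):
    """Build a gene-by-gene adjacency matrix from a genes x samples array.

    Assumed convention (not stated in the abstract):
      soft: a_ij = |cor(x_i, x_j)| ** beta
      hard: a_ij = 1 if |cor(x_i, x_j)| >= tau else 0
    """
    corr = np.abs(np.corrcoef(expr))  # rows of expr are genes
    if mode == "soft":
        adj = corr ** beta
    elif mode == "hard":
        adj = (corr >= tau).astype(float)
    else:
        raise ValueError("mode must be 'soft' or 'hard'")
    np.fill_diagonal(adj, 0.0)  # ignore self-connections
    return adj

def connectivity(adj):
    """Connectivity of each gene: the sum of its adjacencies to all other genes."""
    return adj.sum(axis=1)

def select_top_genes(expr, n_keep, **kwargs):
    """Return the indices of the n_keep genes with the highest connectivity."""
    k = connectivity(coexpression_adjacency(expr, **kwargs))
    return np.argsort(k)[::-1][:n_keep]
```

The selected rows of `expr` would then be passed to any sample clustering routine. The soft threshold keeps a continuous weight for every gene pair, whereas the hard threshold discards all information below `tau`, which is one plausible reason the two transformations rank genes differently.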

Conclusions

By leveraging the structure of the gene co-expression network, we first propose a variable selection method that selects the individual genes with the highest connectivity. Both simulation studies and a real data application demonstrate that our method performs better in terms of the reliability of the selected genes and the sample clustering results. In addition, we propose a module recovery method that can help discover novel sample partitions that might be hidden when clustering analyses are performed using all available genes. The source code of our program is available at http://nba.uth.tmc.edu/homepage/liu/netVar/.
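The module recovery step groups genes by topological similarity, but the abstract does not define the measure. As an illustration only, the sketch below computes the topological overlap commonly used in weighted co-expression analysis; treating it as the similarity used in the paper is an assumption, and the function name is hypothetical.

```python
import numpy as np

def topological_overlap(adj):
    """Topological overlap for a symmetric adjacency matrix with entries in [0, 1].

    Assumed definition (a common convention, not confirmed by the abstract):
      TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)
    where k_i is the connectivity of gene i and the diagonal of adj is zero.
    """
    adj = np.asarray(adj, dtype=float).copy()
    np.fill_diagonal(adj, 0.0)            # no self-connections
    k = adj.sum(axis=1)                   # connectivity per gene
    shared = adj @ adj                    # weight of neighbours shared by i and j
    min_k = np.minimum.outer(k, k)
    tom = (shared + adj) / (min_k + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)            # each gene fully overlaps with itself
    return tom
```

Genes with high mutual overlap could then be grouped into candidate modules, for example by hierarchical clustering on `1 - tom`, and each module used on its own to partition the samples.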

【 License 】

   
2014 Wang et al.; licensee BioMed Central Ltd.
