Journal article details
BioMedical Engineering OnLine
Clustering gene expression data using a diffraction‐inspired framework
Steven C Dinger [3], Michael A Van Wyk [1], Sergio Carmona [2], David M Rubin [3]
[1] Systems & Control Research Group in the School of Electrical & Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
[2] National Health Laboratory Service and Department of Molecular Medicine and Haematology, University of the Witwatersrand, Johannesburg, South Africa
[3] Biomedical Engineering Research Group in the School of Electrical & Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
Keywords: Gene‐expression data; Clustering; Diffraction
DOI: 10.1186/1475-925X-11-85
Received 2012-08-13, accepted 2012-11-12, published 2012
【 Abstract 】

Background

Recent developments in microarray technology have allowed for the simultaneous measurement of gene expression levels. The large amount of captured data challenges conventional statistical tools for analysing and finding inherent correlations between genes and samples. Unsupervised clustering is often used for this purpose, and a wide variety of algorithms has been developed as a result. Typical clustering algorithms require the user to select certain parameters, for instance the number of expected clusters, and to define a similarity measure that quantifies the distance between data points. The diffraction‐based clustering algorithm, however, is designed to remove this need for user‐defined parameters, as it automatically searches the data for any underlying structure.
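To make the parameter dependence of conventional methods concrete, the following is a minimal k‐means sketch in plain Python. It is an illustration only and is not taken from the paper: the function names `euclidean` and `kmeans`, the choice of the first `k` points as initial centroids, and the toy data are all assumptions made for this example. Note that both the number of clusters `k` and the `distance` function must be supplied by the user.

```python
import math


def euclidean(a, b):
    """One possible user-chosen similarity measure: straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, distance=euclidean, n_iter=100):
    """Illustrative Lloyd-style k-means; k and distance are user-defined."""
    # Assumption for this sketch: the first k points seed the centroids,
    # which keeps the example deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Toy data (hypothetical): two well-separated groups of three points each.
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
              (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```

The partition returned depends on the value of `k` and on the distance function passed in, which is the kind of user input the abstract says the diffraction‐based approach is designed to avoid.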

Methods

The diffraction‐based clustering algorithm presented in this paper is tested using five well‐known expression datasets pertaining to cancerous tissue samples. The clustering results are then compared with those obtained from conventional algorithms such as k‐means, fuzzy c‐means, the self‐organising map, hierarchical clustering, the Gaussian mixture model and density‐based spatial clustering of applications with noise (DBSCAN). The performance of each algorithm is measured using an average external criterion and an average validity index.
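The abstract does not state which external criterion is used. As a hedged illustration of what an external criterion measures, the sketch below computes the Rand index, one common choice, which scores the agreement between a clustering and known class labels. The function name `rand_index` and the example labels are assumptions for this illustration, not the paper's evaluation code.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two points
    in the same group, or both put them in different groups.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two points to form a pair")
    pairs = list(combinations(range(n), 2))
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical labels: the same partition under different cluster names
# scores 1.0, because only pair co-membership matters.
perfect = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that cuts across the true classes agrees on 2 of 6 pairs.
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the score depends only on pair co‐membership, it is unaffected by how an algorithm happens to number its clusters, which makes it suitable for comparing different algorithms against the same reference labels.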

Results

The diffraction‐based clustering algorithm is shown to be independent of the number of clusters, as the algorithm searches the feature space and requires no form of parameter selection. The results show that the diffraction‐based clustering algorithm performs significantly better on the real biological datasets than the other existing algorithms.

Conclusion

The results of the diffraction‐based clustering algorithm presented in this paper suggest that the method can provide researchers with a new tool for successfully analysing microarray data.

【 License 】

2012 Dinger et al.; licensee BioMed Central Ltd.
