BMC Bioinformatics
Interpolation based consensus clustering for gene expression time series
Tai-Yu Chiu¹, Ting-Chieh Hsu¹, Chia-Cheng Yen¹, Jia-Shung Wang¹
¹ Department of Computer Science, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, HsinChu 30013, Taiwan
Keywords: Interpolation; Affinity propagation; Consensus clustering; Gene expression; Microarray data analyses
DOI: 10.1186/s12859-015-0541-0
Received 2013-09-18; accepted 2015-02-27; published 2015.
Abstract

Background

Unsupervised analyses such as clustering are essential tools for interpreting time-series expression data from microarrays, and several clustering algorithms have been developed for gene expression data. Early methods such as k-means, hierarchical clustering, and self-organizing maps are popular for their simplicity. However, because of noise and measurement uncertainty, these common algorithms often achieve low accuracy. Moreover, because gene expression is a temporal process, the relationship between successive time points should be taken into account in the analysis. Finally, biological processes are generally continuous, but time-series experiments sample them at only a limited number of time points; the resulting datasets therefore often have too few data points, and compensating for missing or unobserved values becomes an issue.

Results

An affinity propagation-based clustering algorithm for time-series gene expression data is proposed. The algorithm explores the relationships between genes with a sliding-window mechanism that extracts a large number of features. In addition, the time-course datasets are resampled with spline interpolation to predict the unobserved values. Finally, a consensus process is applied to make the method more robust. Several real gene expression datasets were analyzed to demonstrate the accuracy and efficiency of the algorithm.
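
The two preprocessing steps named above, spline resampling of each time course and sliding-window feature extraction, can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a natural cubic spline (the interpolating cubic B-spline whose second derivative vanishes at both ends), and it assumes, hypothetically, that the feature for a pair of genes in each window is their Pearson correlation, averaged over windows into one similarity. The function names `natural_cubic_spline`, `resample`, `window_features`, and `window_similarity` are illustrative only.

```python
import math


def natural_cubic_spline(xs, ys):
    """Return f(x) evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing. Assumption: natural end conditions
    (zero second derivative at both ends) are used for the interpolant.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots; m[0] = m[-1] = 0
    if n > 2:
        # Tridiagonal system for the interior second derivatives m[1..n-2].
        sub = [h[i - 1] for i in range(1, n - 1)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        sup = [h[i] for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        # Thomas algorithm: forward elimination ...
        for j in range(1, len(diag)):
            w = sub[j] / diag[j - 1]
            diag[j] -= w * sup[j - 1]
            rhs[j] -= w * rhs[j - 1]
        # ... then back substitution.
        sol = [0.0] * len(diag)
        sol[-1] = rhs[-1] / diag[-1]
        for j in range(len(diag) - 2, -1, -1):
            sol[j] = (rhs[j] - sup[j] * sol[j + 1]) / diag[j]
        m[1:n - 1] = sol

    def f(x):
        i = 0
        while i < n - 2 and x > xs[i + 1]:  # locate the interval [xs[i], xs[i+1]]
            i += 1
        width = xs[i + 1] - xs[i]
        a = (xs[i + 1] - x) / width
        b = (x - xs[i]) / width
        return (a * ys[i] + b * ys[i + 1]
                + ((a ** 3 - a) * m[i] + (b ** 3 - b) * m[i + 1]) * width ** 2 / 6.0)

    return f


def resample(times, values, num_points):
    """Evaluate the spline on num_points evenly spaced time points."""
    f = natural_cubic_spline(times, values)
    step = (times[-1] - times[0]) / (num_points - 1)
    return [f(times[0] + j * step) for j in range(num_points)]


def pearson(u, v):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)


def window_features(u, v, width):
    """Hypothetical features: correlation of two profiles in each sliding window."""
    return [pearson(u[s:s + width], v[s:s + width]) for s in range(len(u) - width + 1)]


def window_similarity(u, v, width):
    """Hypothetical gene-gene similarity: mean of the per-window correlations."""
    feats = window_features(u, v, width)
    return sum(feats) / len(feats)


if __name__ == "__main__":
    times = [0, 10, 20, 30, 40]  # toy sampling times
    gene_a = resample(times, [0.1, 1.2, 0.4, -0.8, 0.3], 17)
    gene_b = resample(times, [0.2, 1.0, 0.5, -0.6, 0.2], 17)
    print(window_similarity(gene_a, gene_b, width=5))
```

Resampling onto a dense, evenly spaced grid first means every window covers the same span of time, so windows remain comparable even when the original sampling times are uneven.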

Conclusion

The proposed algorithm benefits from cubic B-spline interpolation, a sliding window, affinity propagation, a gene relativity graph, and a consensus process, and as a result clusters time-series gene expression data both appropriately and effectively. The method was tested on gene expression data from the Yeast galactose dataset, the Yeast cell-cycle dataset (Y5), and the Yeast sporulation dataset, and the results illustrated relationships between the expressed genes that may give insight into the underlying biological processes.
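
The affinity propagation and consensus steps can be illustrated with the sketch below. It implements the standard affinity propagation message updates of Frey and Dueck (damped responsibilities and availabilities, with a shared "preference" on the diagonal controlling how many exemplars emerge) and then combines several runs through a co-association matrix. The consensus rule shown, linking two genes that share a cluster in at least half of the runs and taking connected components, is an assumption chosen for illustration and is not necessarily the paper's procedure; the gene relativity graph is not modeled. The defaults `damping=0.5` and `max_iter=200`, the toy positions, and the preference values are hypothetical.

```python
def affinity_propagation(S, preference, damping=0.5, max_iter=200):
    """Standard affinity propagation on similarity matrix S (list of lists).

    Returns labels[i] = index of the exemplar chosen by point i.
    """
    n = len(S)
    s = [row[:] for row in S]
    for k in range(n):
        s[k][k] = preference  # self-similarity: higher -> more exemplars
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            first = vals[best]
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                new = s[i][k] - (second if k == best else first)
                r[i][k] = damping * r[i][k] + (1 - damping) * new
        # a(k,k) = sum_{i' != k} max(0, r(i',k));
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # no positive evidence: keep the single best candidate
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    labels = [max(exemplars, key=lambda k: s[i][k]) for i in range(n)]
    for k in exemplars:
        labels[k] = k
    return labels


def co_association(labelings):
    """Fraction of runs in which each pair of points shares a cluster."""
    n, runs = len(labelings[0]), len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / runs for j in range(n)]
            for i in range(n)]


def consensus_clusters(labelings, threshold=0.5):
    """Hypothetical consensus rule: connected components of pairs co-clustered
    in at least `threshold` of the runs (union-find)."""
    m = co_association(labelings)
    n = len(m)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if m[i][j] >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    positions = [0, 1, 2, 50, 51, 52]  # toy one-dimensional "profiles"
    S = [[-(p - q) ** 2 for q in positions] for p in positions]
    runs = [affinity_propagation(S, pref) for pref in (-10, -20, -50)]
    print(runs, consensus_clusters(runs))
```

Running affinity propagation several times (here with different preferences; in practice perhaps on similarity matrices built with different window sizes) and voting over the co-association matrix is one common way to make the final partition less sensitive to any single run.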

License

© 2015 Chiu et al.; licensee BioMed Central.

