BMC Research Notes
Finding gene clusters for a replicated time course study
Steven G Self [1], Linda Breeden [2], Li-Xuan Qin [3]
[1] Division of Public Health Science, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
[2] Division of Basic Science, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
[3] Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
Keywords: Time course; Replications; Regression; Microarray; Clustering
DOI: 10.1186/1756-0500-7-60
Received 2013-07-25; accepted 2014-01-15; published 2014
【 Abstract 】

Background

Finding genes that share similar expression patterns across samples is a common goal in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering group genes directly on the observed measurements and do not take into account the experimental design under which the microarray data were collected. A newer model-based clustering method, the clustering of regression models method, takes the design of the microarray study into account and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters in studies with complicated designs, such as replicated time course studies.
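
To make the idea of clustering on "how genes are related to sample covariates" concrete, the following is a minimal sketch on simulated data. It is not the clustering of regression models method itself: it swaps in a simpler two-step approximation (per-gene least-squares regression, then K-means on the fitted coefficients). The design matrix, the simulated coefficients, and the helper names `fit_gene_regressions` and `kmeans` are all assumptions made for illustration.

```python
import numpy as np

def fit_gene_regressions(expr, design):
    """Least-squares coefficients of each gene (rows of expr) on a shared design matrix."""
    # expr: genes x samples, design: samples x covariates
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    return coef.T  # genes x covariates

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Hypothetical design: 2 genotypes x 4 time points x 2 technical replicates.
time = np.tile(np.repeat(np.arange(4.0), 2), 2)
mutant = np.repeat([0.0, 1.0], 8)
design = np.column_stack([np.ones(16), time, mutant])  # intercept, time, genotype

# Simulated genes: 20 rising over time, 20 falling with a shift in the mutant.
rng = np.random.default_rng(1)
true_coef = np.vstack([
    np.tile([0.0, 1.0, 0.0], (20, 1)),
    np.tile([0.0, -1.0, 2.0], (20, 1)),
])
expr = true_coef @ design.T + rng.normal(scale=0.1, size=(40, 16))

coef = fit_gene_regressions(expr, design)  # each gene summarized by its coefficients
labels = kmeans(coef, k=2)                 # cluster genes in coefficient space
```

Because the clustering operates on regression coefficients rather than raw measurements, replicates and genotype enter through the design matrix instead of being treated as interchangeable columns.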

Findings

In this paper, we applied the clustering of regression models method to data from a time course study of yeast with two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with those of K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns changed in YOX1 mutant yeast compared with wild type yeast.

Conclusions

The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism.

【 License 】

   
2014 Qin et al.; licensee BioMed Central Ltd.
