BMC Bioinformatics
Separate-channel analysis of two-channel microarrays: recovering inter-spot information
Gordon K Smyth[1], Naomi S Altman[2]
[1] Department of Mathematics and Statistics, University of Melbourne, Vic 3010, Australia
[2] Department of Statistics, The Pennsylvania State University, University Park, PA 16802-2111, USA
Keywords: Efficiency; Power; False discovery rate; Intraclass correlation; Reference design; Unconnected design; Loop design
DOI: 10.1186/1471-2105-14-165
Received 2012-07-06; accepted 2013-05-10; published 2013
Abstract
Background
Two-channel (or two-color) microarrays are cost-effective platforms for comparative analysis of gene expression. They are traditionally analysed in terms of the log-ratios (M-values) of the two channel intensities at each spot, but this analysis does not use all the information available in the separate-channel observations. Mixed models have been proposed to analyse the intensities from the two channels as separate observations, but such models can be complex to use, and their gain in efficiency over the log-ratio analysis is difficult to quantify. Mixed models also yield test statistics whose null distributions can be specified only approximately, and some mixed-model approaches do not borrow strength between genes.
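The M-value and its companion A-value (average log-intensity, used in the Results below) are simple transforms of the two log-intensities at a spot. A minimal Python sketch, assuming positive, background-corrected intensities; the function name `ma_values` is illustrative and not taken from any package:

```python
import numpy as np


def ma_values(red, green):
    """Convert two-channel spot intensities to M-values and A-values.

    M = log2(R) - log2(G)        log-ratio for each spot
    A = (log2(R) + log2(G)) / 2  average log-intensity for each spot
    Assumes positive, background-corrected intensities.
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    if np.any(red <= 0) or np.any(green <= 0):
        raise ValueError("intensities must be positive to take logs")
    log_r = np.log2(red)
    log_g = np.log2(green)
    m = log_r - log_g
    a = (log_r + log_g) / 2.0
    return m, a


# Three spots: m -> [2, 0, -1], a -> [9, 8, 6.5]
m, a = ma_values([1024.0, 256.0, 64.0], [256.0, 256.0, 128.0])
print(m, a)
```

Because log2(R) = A + M/2 and log2(G) = A - M/2, the pair (M, A) is a one-to-one transform of the two channels, so an analysis of M alone discards whatever information the A-values carry.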
Results
This article reformulates the mixed model to clarify its relationship with the traditional log-ratio analysis, to facilitate information borrowing between genes, and to obtain an exact distributional theory for the resulting test statistics. The mixed model is transformed to operate on the M-values and A-values (average log-expression for each spot) instead of on the log-expression values. The log-ratio analysis is shown to ignore information contained in the A-values, and its relative efficiency is shown to depend on the size of the intra-spot correlation. A new separate-channel analysis method is proposed that assumes a constant intra-spot correlation coefficient across all genes. This assumption allows the mixed model to be transformed into an ordinary linear model, so that the data analysis can use a well-understood empirical Bayes pipeline for linear modeling of microarray data. The result is statistically powerful test statistics with an exact distributional theory. The log-ratio, mixed-model and common-correlation methods are compared using three case studies. The results show that separate-channel analyses that borrow strength between genes are more powerful than log-ratio analyses, and that the common-correlation analysis is the most powerful of all.
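A worked version of this argument, under a simple random spot-effect model assumed here for illustration (the article's own formulation may differ in detail): the two log-expression values y_i1 and y_i2 of spot i share a spot effect s_i with variance sigma_s^2 and have independent channel errors e_ik with variance sigma_e^2, so that sigma^2 = sigma_s^2 + sigma_e^2 and the intra-spot correlation is rho = sigma_s^2 / sigma^2.

```latex
\begin{aligned}
y_{ik} &= x_{ik}^{\top}\beta + s_i + e_{ik}, \qquad k = 1, 2,\\
M_i &= y_{i1} - y_{i2} = (x_{i1} - x_{i2})^{\top}\beta + e_{i1} - e_{i2},\\
A_i &= \tfrac{1}{2}(y_{i1} + y_{i2}) = \tfrac{1}{2}(x_{i1} + x_{i2})^{\top}\beta + s_i + \tfrac{1}{2}(e_{i1} + e_{i2}),\\
\operatorname{Var}(M_i) &= 2\sigma_e^2 = 2\sigma^2(1 - \rho),\\
\operatorname{Var}(A_i) &= \sigma_s^2 + \tfrac{1}{2}\sigma_e^2 = \tfrac{1}{2}\sigma^2(1 + \rho),\\
\operatorname{Cov}(M_i, A_i) &= \tfrac{1}{2}\sigma_e^2 - \tfrac{1}{2}\sigma_e^2 = 0.
\end{aligned}
```

The spot effect cancels from M_i but not from A_i, and the two are uncorrelated, so A_i is a second, uncorrelated source of information about beta whenever its design row depends on the treatments. A log-ratio analysis discards it; how much that costs depends on rho, because a high intra-spot correlation makes Var(M_i) small relative to Var(A_i). With rho assumed common to all genes, weighting M_i by 1/{2(1 - rho)} and A_i by 2/(1 + rho) turns the mixed model into an ordinary weighted linear model.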
Conclusions
The common correlation method proposed in this article for separate-channel analysis of two-channel microarray data is no more difficult to apply in practice than the traditional log-ratio analysis. It provides an intuitive and powerful means to conduct analyses and make comparisons that might otherwise not be possible.
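Once the intra-spot correlation rho is fixed, the separate-channel fit is an ordinary weighted least-squares problem. The Python sketch below illustrates this for a single gene in a hypothetical common-reference design with two groups; the design, its parameterization and the function names are assumptions made for illustration, not the article's case studies. It assumes each spot's two log-intensities share a random spot effect, so that M and A are uncorrelated with Var(M) = 2 sigma^2 (1 - rho) and Var(A) = sigma^2 (1 + rho) / 2, and it compares the variance of the group-2-minus-group-1 estimate with that of a log-ratio-only fit. It does not estimate rho from data or perform the empirical Bayes step; in the limma package those steps are provided by functions such as `intraspotCorrelation()` and `lmscFit()`.

```python
import numpy as np


def contrast_variance(X, weights, c):
    """Variance of the weighted least-squares estimate of c'beta, in units of sigma^2."""
    W = np.diag(weights)
    cov = np.linalg.inv(X.T @ W @ X)
    return float(c @ cov @ c)


def ma_design_rows(groups):
    """M-row and A-row design matrices for a hypothetical common-reference design.

    Channel-level parameters (illustrative assumption): beta = (mu_ref, d1, d2).
    Reference channel x = (1, 0, 0); group-1 sample (1, 1, 0); group-2 sample (1, 0, 1).
    Each array hybridizes one group sample (channel 1) against the reference
    (channel 2), so the M-row design is x1 - x2 and the A-row design is (x1 + x2) / 2.
    """
    ref = np.array([1.0, 0.0, 0.0])
    sample = {1: np.array([1.0, 1.0, 0.0]), 2: np.array([1.0, 0.0, 1.0])}
    xm = np.array([sample[g] - ref for g in groups])
    xa = np.array([(sample[g] + ref) / 2.0 for g in groups])
    return xm, xa


def log_ratio_efficiency(groups, rho):
    """Var(separate-channel estimate) / Var(log-ratio estimate) for group 2 minus group 1."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    xm, xa = ma_design_rows(groups)
    n = len(groups)
    c = np.array([0.0, -1.0, 1.0])
    # Weights are inverse relative variances: Var(M) = 2(1 - rho), Var(A) = (1 + rho) / 2.
    w_m = 1.0 / (2.0 * (1.0 - rho))
    w_a = 2.0 / (1.0 + rho)
    # Log-ratio analysis: mu_ref cancels from every M-value, so its column is dropped.
    var_m = contrast_variance(xm[:, 1:], np.full(n, w_m), c[1:])
    # Separate-channel analysis: M and A are uncorrelated, so stacking them with
    # these weights gives one ordinary weighted linear model.
    x_all = np.vstack([xm, xa])
    w_all = np.concatenate([np.full(n, w_m), np.full(n, w_a)])
    var_sc = contrast_variance(x_all, w_all, c)
    return var_sc / var_m


groups = [1, 1, 1, 2, 2, 2]
for rho in (0.0, 0.5, 0.9):
    print(rho, log_ratio_efficiency(groups, rho))
```

For this particular design the ratio works out to (1 + rho)/2, approximately 0.5, 0.75 and 0.95 for the three printed values, so the log-ratio analysis loses little when the intra-spot correlation is high and up to half the information when it is near zero. Other designs give other ratios.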
License
2013 Smyth and Altman; licensee BioMed Central Ltd.
Figures
Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.
Figure 6.