BMC Bioinformatics
Systematic noise degrades gene co-expression signals but can be corrected
Research Article
Saskia Freytag, Melanie Bahlo, Terence P. Speed, Johann Gagnon-Bartsch
Affiliations: Bioinformatics Division, Walter + Eliza Hall Institute, 1G Royal Parade, 3050, Melbourne, Australia; Department of Mathematics and Statistics, University of Melbourne, 3010, Melbourne, Australia; Department of Medical Biology, University of Melbourne, 3010, Melbourne, Australia; Department of Statistics, University of California, 367 Evans Hall, 94720, Berkeley, USA
Keywords: Gene co-expression; Data cleaning; Removal of unwanted variation; Human brain; Epilepsy
DOI: 10.1186/s12859-015-0745-3
Received 2015-05-11, accepted 2015-09-16, published 2015
Source: Springer
【 Abstract 】
Background: In the past decade, the identification of gene co-expression has become a routine part of the analysis of high-dimensional microarray data. Gene co-expression, which is mostly detected via the Pearson correlation coefficient, has played an important role in the discovery of molecular pathways and networks. Unfortunately, the presence of systematic noise in high-dimensional microarray datasets corrupts estimates of gene co-expression. Removing systematic noise from microarray data is therefore crucial. Many cleaning approaches for microarray data exist; however, these methods are aimed at improving differential expression analysis, and their performance has been tested primarily for this application. To our knowledge, the performance of these approaches has never been systematically compared in the context of gene co-expression estimation.
Results: Using simulations, we demonstrate that standard cleaning procedures, such as background correction and quantile normalization, fail to adequately remove systematic noise that affects gene co-expression and at times further degrade true gene co-expression. In contrast, we show that a global version of removal of unwanted variation (RUV), a data-driven approach, not only removes systematic noise but also allows the estimation of the true underlying gene-gene correlations. We compare the performance of all noise removal methods when applied to five large published datasets on gene expression in the human brain. RUV retrieves the highest gene co-expression values for sets of genes known to interact and also provides the greatest consistency across all five datasets. We apply the method to prioritize epileptic encephalopathy candidate genes.
Conclusions: Our work raises serious concerns about the quality of many published gene co-expression analyses. RUV provides an efficient and flexible way to remove systematic noise from high-dimensional microarray datasets when the objective is gene co-expression analysis.
The RUV method, as applicable to gene-gene correlation estimation, is available as a Bioconductor package: RUVcorr.
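The effect described in the abstract can be illustrated with a toy simulation. The sketch below is not the RUVcorr API and not the authors' global RUV procedure; it is a minimal one-factor illustration under stated assumptions: a single hypothetical unwanted factor shared by all genes, two genes with no true co-expression, and a set of negative control genes used to estimate the factor. All names (`simulate_gene`, `regress_out`, `w_true`, `w_hat`) and all parameter values are invented for the example.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def regress_out(y, w):
    """Residuals of y after least-squares regression on w (with intercept)."""
    n = len(y)
    my, mw = sum(y) / n, sum(w) / n
    beta = sum((a - mw) * (b - my) for a, b in zip(w, y)) / sum(
        (a - mw) ** 2 for a in w
    )
    return [b - my - beta * (a - mw) for a, b in zip(w, y)]


rng = random.Random(0)
n_samples = 500

# Assumption: one unwanted factor (e.g. a batch-like effect), one value per sample.
w_true = [rng.gauss(0, 1) for _ in range(n_samples)]


def simulate_gene(loading):
    """Expression = loading * unwanted factor + independent noise (no biology)."""
    return [loading * w + rng.gauss(0, 1) for w in w_true]


# Two genes with zero true co-expression, both affected by the unwanted factor.
gene_a = simulate_gene(1.5)
gene_b = simulate_gene(1.5)

# Assumption: 50 negative control genes that carry only unwanted variation.
controls = [simulate_gene(1.0) for _ in range(50)]

# Estimate the unwanted factor as the per-sample mean of the controls.
w_hat = [sum(c[i] for c in controls) / len(controls) for i in range(n_samples)]

r_raw = pearson(gene_a, gene_b)
r_clean = pearson(regress_out(gene_a, w_hat), regress_out(gene_b, w_hat))

print(f"correlation before cleaning: {r_raw:.2f}")
print(f"correlation after cleaning:  {r_clean:.2f}")
```

In this setup the raw correlation is driven entirely by the shared unwanted factor, and regressing out the factor estimated from the controls brings the estimate back towards the true value of zero. A real analysis would need to handle several unwanted factors and choose negative controls carefully, which this sketch does not attempt.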
【 License 】
CC BY
© Freytag et al. 2015