BMC Genomics
deGPS is a powerful tool for detecting differential expression in RNA-sequencing studies
Yan Lu [4], Pengyuan Liu [4], Mingyu Liang [3], Allen W. Cowley [3], Enguo Chen [1], Yaning Yang [5], Xing Hua [3], Zhaoben Fang [5], Chen Chu [2]
[1] Division of Respiratory Medicine, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, Zhejiang, China
[2] Department of Gynecologic Oncology, The Affiliated Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou 310029, Zhejiang, China
[3] Department of Physiology, Medical College of Wisconsin, Milwaukee 53226, WI, USA
[4] Institute for Translational Medicine, School of Medicine, Zhejiang University, Hangzhou 310029, Zhejiang, China
[5] Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, Anhui, China
Keywords: RNA-Seq; Generalized Poisson; Differential expression; Next-generation sequencing
DOI: 10.1186/s12864-015-1676-0
Received: 2014-11-17; accepted: 2015-06-01; published: 2015
【 Abstract 】
Background
The advent of next-generation sequencing (NGS) technologies has permitted profiling of whole-genome transcriptomes (i.e., RNA-Seq) at unprecedented speed and very low cost. RNA-Seq measures transcript levels and their isoforms far more precisely than other methods such as microarrays. A fundamental goal of RNA-Seq is to better identify expression changes between different biological or disease conditions. However, existing methods for detecting differential expression from RNA-Seq count data have not been comprehensively evaluated on large-scale RNA-Seq datasets. Many of them suffer from inflated type I error and fail to control the false discovery rate, especially in the presence of abnormally high sequence read counts in RNA-Seq experiments.
Results
To address these challenges, we propose a powerful and robust tool, termed deGPS, for detecting differential expression in RNA-Seq data. The framework comprises new normalization methods based on a generalized Poisson distribution for modeling sequence count data, followed by permutation-based differential expression tests. We systematically evaluated the new tool on simulated datasets derived from several large-scale TCGA RNA-Seq projects, on unbiased benchmark data from the compcodeR package, and on real RNA-Seq data from the developmental transcriptome of Drosophila. deGPS precisely controls type I error and false discovery rate in the detection of differential expression, and it is robust to abnormally high sequence read counts in RNA-Seq experiments.
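As background for the normalization model, the generalized Poisson (GP) distribution is commonly written in Consul's parameterization, shown below. We assume deGPS uses this form or an equivalent one; the method-of-moments estimates in the last line only illustrate how the dispersion parameter λ relates to observed counts (sample mean x̄, sample variance s²) and are not necessarily the estimator used in the paper.

```latex
\begin{aligned}
P(X = x) &= \frac{\theta(\theta + \lambda x)^{x-1} e^{-\theta - \lambda x}}{x!},
  \quad x = 0, 1, 2, \ldots;\ \theta > 0,\ 0 \le \lambda < 1 \\
E[X] &= \frac{\theta}{1 - \lambda}, \qquad
\operatorname{Var}[X] = \frac{\theta}{(1 - \lambda)^3}, \qquad
\frac{\operatorname{Var}[X]}{E[X]} = \frac{1}{(1 - \lambda)^2} \ge 1 \\
\hat{\lambda} &= 1 - \sqrt{\bar{x}/s^2}, \qquad
\hat{\theta} = \bar{x}\sqrt{\bar{x}/s^2}
\end{aligned}
```

Setting λ = 0 recovers the ordinary Poisson distribution, while λ > 0 adds the extra-Poisson variability (overdispersion) typical of read counts; the moment estimates are valid only when s² ≥ x̄.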
Conclusions
Software implementing deGPS was released as an R package supporting parallel computation (https://github.com/LL-LAB-MCW/deGPS). deGPS is a powerful and robust tool for data normalization and for detecting differential expression in RNA-Seq experiments. Beyond RNA-Seq, deGPS has the potential to significantly enhance future data analysis for many other high-throughput platforms such as ChIP-Seq, MBD-Seq and RIP-Seq.
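A generic sketch of a per-gene permutation test followed by false discovery rate control is shown below. It is not the deGPS implementation or API: it assumes expression values have already been normalized, uses a Welch t statistic as a stand-in test statistic, and applies the Benjamini-Hochberg procedure as an illustrative FDR step. The function names `welch_t`, `permutation_pvalue` and `benjamini_hochberg`, and the gene values in the usage section, are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def welch_t(a, b):
    """Welch t statistic for the difference in means of groups a and b."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for one gene.

    x, y: normalized expression values in conditions 1 and 2 (each needs
    at least two samples). Condition labels are shuffled n_perm times; the
    p-value is the share of shuffles whose |t| is at least the observed |t|,
    with a +1 correction so it can never be exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:n_x], pooled[n_x:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical normalized values for three genes, four samples per condition.
    genes = {
        "geneA": ([10.2, 11.0, 10.8, 11.5], [5.1, 4.8, 5.5, 5.0]),
        "geneB": ([7.0, 7.4, 6.9, 7.2], [7.1, 7.3, 6.8, 7.0]),
        "geneC": ([3.0, 3.6, 2.8, 3.3], [4.1, 4.4, 3.9, 4.6]),
    }
    raw = [permutation_pvalue(x, y) for x, y in genes.values()]
    for name, p, q in zip(genes, raw, benjamini_hochberg(raw)):
        print(f"{name}: permutation p = {p:.4f}, BH-adjusted = {q:.4f}")
```

With four samples per condition there are only C(8, 4) = 70 distinct label splits, and each split gives the same |t| as its complement, so the smallest attainable two-sided p-value is about 2/70; permutation tests of this kind need enough replicates to reach small p-values.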
【 License 】
2015 Chu et al.
【 Figures 】
Fig. 1 to Fig. 7