Journal Article Details
BMC Genomics
deGPS is a powerful tool for detecting differential expression in RNA-sequencing studies
Yan Lu [4], Pengyuan Liu [4], Mingyu Liang [3], Allen W. Cowley [3], Enguo Chen [1], Yaning Yang [5], Xing Hua [3], Zhaoben Fang [5], Chen Chu [2]
[1] Division of Respiratory Medicine, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, Zhejiang, China
[2] Department of Gynecologic Oncology, The Affiliated Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou 310029, Zhejiang, China
[3] Department of Physiology, Medical College of Wisconsin, Milwaukee 53226, WI, USA
[4] Institute for Translational Medicine, School of Medicine, Zhejiang University, Hangzhou 310029, Zhejiang, China
[5] Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, Anhui, China
Keywords: RNA-Seq; Generalized Poisson; Differential expression; Next-generation sequencing
DOI: 10.1186/s12864-015-1676-0
Received 2014-11-17; accepted 2015-06-01; published 2015
【 Abstract 】

Background

The advent of next-generation sequencing (NGS) technologies has permitted profiling of whole-genome transcriptomes (i.e., RNA-Seq) at unprecedented speed and very low cost. RNA-Seq provides a far more precise measurement of transcript levels and their isoforms than other methods such as microarrays. A fundamental goal of RNA-Seq is to better identify expression changes between different biological or disease conditions. However, existing methods for detecting differential expression from RNA-Seq count data have not been comprehensively evaluated in large-scale RNA-Seq datasets. Many of them suffer from inflated type I error and fail to control the false discovery rate, especially in the presence of abnormally high sequence read counts in RNA-Seq experiments.

Results

To address these challenges, we propose a powerful and robust tool, termed deGPS, for detecting differential expression in RNA-Seq data. This framework contains new normalization methods based on generalized Poisson modeling of sequence count data, followed by permutation-based differential expression tests. We systematically evaluated our new tool in simulated datasets from several large-scale TCGA RNA-Seq projects, unbiased benchmark data from the compcodeR package, and real RNA-Seq data from the developmental transcriptome of Drosophila. deGPS can precisely control type I error and false discovery rate for the detection of differential expression and is robust in the presence of abnormally high sequence read counts in RNA-Seq experiments.
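As a rough illustration of the ingredients named above, the following Python sketch shows a generalized Poisson log-probability (Consul's two-parameter form), a two-group permutation test on a single gene, and a Benjamini-Hochberg adjustment for false discovery rate control. This is not the deGPS implementation, which is an R package: the function names, the difference-in-means test statistic, and the parameter names `theta` and `lam` are assumptions made for this example only.

```python
import math
import random


def gp_log_pmf(x, theta, lam):
    """Log-pmf of the generalized Poisson distribution (theta > 0, 0 <= lam < 1).

    P(X = x) = theta * (theta + x*lam)**(x - 1) * exp(-theta - x*lam) / x!
    With lam = 0 this reduces to the ordinary Poisson with mean theta.
    """
    return (math.log(theta) + (x - 1) * math.log(theta + x * lam)
            - theta - x * lam - math.lgamma(x + 1))


def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    The test statistic is an illustrative choice, not the one used by deGPS.
    """
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def stat(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        if stat(pooled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In practice one would compute a permutation p-value per gene on normalized counts and pass the resulting list to `benjamini_hochberg`; genes whose adjusted value falls below the chosen threshold would be reported as differentially expressed.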

Conclusions

Software implementing deGPS has been released as an R package with parallel computation support (https://github.com/LL-LAB-MCW/deGPS). deGPS is a powerful and robust tool for data normalization and for detecting differential expression in RNA-Seq experiments. Beyond RNA-Seq, deGPS has the potential to significantly enhance future data analysis efforts for many other high-throughput platforms such as ChIP-Seq, MBD-Seq and RIP-Seq.

【 License 】

   
2015 Chu et al.
