BMC Genomics
Experimental validation of methods for differential gene expression analysis and sample pooling in RNA-seq
Anto P. Rajkumar³, Per Qvist³, Ross Lazarus⁵, Francesco Lescai³, Jia Ju¹, Mette Nyegaard³, Ole Mors², Anders D. Børglum⁴, Qibin Li¹, Jane H. Christensen³
¹ Beijing Genomics Institute, Shenzhen 518083, China
² Research Department P, Aarhus University Hospital, Risskov, Denmark
³ Center for Integrative Sequencing, iSEQ, Aarhus University, Aarhus 8000, Denmark
⁴ Translational Neuropsychiatry Unit, Aarhus University, Aarhus 8240, Denmark
⁵ Computational Biology, Baker IDI Heart and Diabetes Institute, Victoria 8008, Australia
Keywords: Sensitivity and specificity; Quantitative real-time polymerase chain reaction; Predictive value of tests; Next-generation RNA sequencing; Gene expression
DOI: 10.1186/s12864-015-1767-y
Received: 2014-09-01; Accepted: 2015-07-10; Published: 2015
【 Abstract 】
Background
Massively parallel cDNA sequencing (RNA-seq) is gradually superseding microarrays for quantitative gene expression profiling. However, many biologists are uncertain which differentially expressed gene (DEG) analysis method to choose and whether cost-saving sample pooling strategies are valid for their RNA-seq experiments. We therefore experimentally validated the DEGs identified by Cuffdiff2, edgeR, DESeq2 and the Two-stage Poisson Model (TSPM) in an RNA-seq experiment on mouse amygdala micro-punches, using high-throughput qPCR on independent biological replicate samples. We also sequenced pooled RNA samples and compared the results with those obtained by sequencing the corresponding individual RNA samples.
Results
Cuffdiff2 had a high false-positive rate, and DESeq2 and TSPM had high false-negative rates. Among the four DEG analysis methods investigated, edgeR showed relatively high sensitivity and specificity. We documented pooling bias and found that DEGs identified in pooled samples had low positive predictive values.
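The performance measures named above all derive from a 2×2 table that cross-classifies each tested gene by whether an RNA-seq method called it a DEG and whether qPCR confirmed differential expression. The sketch below shows that arithmetic, assuming qPCR is treated as the reference standard and using the textbook definitions (false-positive rate = FP/(FP+TN), false-negative rate = FN/(FN+TP)); the study's exact operational definitions may differ. The `ValidationCounts` class, the `validation_metrics` helper and the toy counts are hypothetical illustrations, not data or code from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """Hypothetical confusion-matrix counts for one DEG analysis method.

    Assumption: qPCR on independent replicates is the reference standard.
    """
    tp: int  # called DEG by the method and confirmed by qPCR
    fp: int  # called DEG by the method but not confirmed by qPCR
    tn: int  # not called, and not differentially expressed by qPCR
    fn: int  # not called, but differentially expressed by qPCR


def ratio(num: int, den: int) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def validation_metrics(c: ValidationCounts) -> dict[str, float]:
    """Compute standard diagnostic-accuracy measures from the counts."""
    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),          # true-positive rate
        "specificity": ratio(c.tn, c.tn + c.fp),          # true-negative rate
        "false_positive_rate": ratio(c.fp, c.fp + c.tn),  # 1 - specificity
        "false_negative_rate": ratio(c.fn, c.fn + c.tp),  # 1 - sensitivity
        "ppv": ratio(c.tp, c.tp + c.fp),                  # positive predictive value
    }


if __name__ == "__main__":
    # Toy counts for illustration only; not results from the study.
    toy = ValidationCounts(tp=8, fp=2, tn=15, fn=5)
    for name, value in validation_metrics(toy).items():
        print(f"{name}: {value:.3f}")
```

With these toy counts, sensitivity is 8/13 and the positive predictive value is 8/10. A method can therefore look acceptable on one measure while performing poorly on another, which is why the results above report error rates and predictive values separately.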
Conclusions
Our results highlight the need to combine more sensitive DEG analysis methods with high-throughput validation of the identified DEGs in future RNA-seq experiments. They also indicate that sample pooling strategies have limited utility for RNA-seq in similar setups, and they support increasing the number of biological replicate samples.
【 License 】
2015 Rajkumar et al.
【 Figures 】
Fig. 1.
Fig. 2.