BMC Research Notes
Impact of RNA-seq attributes on false positive rates in differential expression analysis of de novo assembled transcriptomes | |
Simon Joly [1], Emmanuel González [2]
[1] Montreal Botanical Garden, 4101 Sherbrooke E, Montréal, QC H1X 2B2, Canada; Institut de recherche en biologie végétale, Université de Montréal, 4101 Sherbrooke E, Montréal, QC H1X 2B2, Canada
Keywords: RNA-seq; False positive rates; Non-model organisms; Differential isoform expression; Differential gene expression; de novo transcriptome assembly
Others: 1140599; DOI: 10.1186/1756-0500-6-503
Received 2013-06-18; accepted 2013-11-20; published 2013.
Abstract
Background
High-throughput RNA sequencing studies are becoming increasingly popular, and differential expression studies represent an important downstream analysis that often follows de novo transcriptome assembly. Although much attention has been given to bioinformatics tools for differential gene expression, little has yet been given to the impact of the sequence data itself used in these pipelines.
Results
We tested how using reads that differ from those used to assemble a de novo transcriptome (in both length and pairing attributes) could affect differential expression (DE) results. To investigate this, we created artificial datasets from the long paired-end RNA-seq datasets initially used to build the assembly. All datasets were compared via DE analyses, and because all samples come from the same sequencing run, any DE of genes or isoforms can be interpreted as a false positive resulting from sequence attributes. While the false positive rate for differential gene expression does not seem to be strongly affected by sequencing strategy (maximum of 3.5%), it could reach 12.2% or 28.1% for differential isoform expression, depending on the pipeline used. The choice of a paired-end vs. single-end strategy had a much greater impact on false positives than sequence length.
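The logic of this design can be sketched in Python: artificial datasets are derived from long paired-end reads by shortening the reads and/or discarding one mate, and because every sample comes from the same sequencing run, each DE call counts as a false positive. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Read` class and the `trim`, `derive_dataset` and `false_positive_rate` helpers are hypothetical, reads are assumed to be truncated from their 5' end, and single-end data are assumed to be mimicked by keeping only mate 1. The actual read lengths, tools and file handling used in the study are not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Read:
    """A single FASTQ record held in memory (hypothetical helper type)."""
    name: str
    seq: str
    qual: str


def trim(read: Read, length: int) -> Read:
    """Truncate sequence and qualities to at most `length` bases.

    Assumption: bases are kept from the 5' end of the read.
    """
    return Read(read.name, read.seq[:length], read.qual[:length])


def derive_dataset(pairs: Iterable[Tuple[Read, Read]],
                   length: Optional[int] = None,
                   paired: bool = True) -> List[Tuple[Read, ...]]:
    """Build an artificial dataset from long paired-end reads.

    length -- if given, truncate every read to this many bases
    paired -- if False, drop mate 2 to mimic a single-end library
              (assumption: mate 1 is the one retained)
    """
    out: List[Tuple[Read, ...]] = []
    for r1, r2 in pairs:
        if length is not None:
            r1, r2 = trim(r1, length), trim(r2, length)
        out.append((r1, r2) if paired else (r1,))
    return out


def false_positive_rate(n_called_de: int, n_tested: int) -> float:
    """Fraction of tested genes or isoforms called DE.

    This equals the false positive rate only when no true differences
    exist, e.g. when all compared samples come from the same run.
    """
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_called_de / n_tested


if __name__ == "__main__":
    # Toy paired-end record (illustrative data, not from the study).
    pairs = [(Read("r1/1", "ACGTACGTAC", "IIIIIIIIII"),
              Read("r1/2", "TTGGCCAATT", "IIIIIIIIII"))]
    short_se = derive_dataset(pairs, length=4, paired=False)
    print(short_se[0][0].seq)            # ACGT
    print(false_positive_rate(12, 400))  # 0.03
```

Keeping the derivation separate from the DE step mirrors the comparison described above: each derived dataset (short vs. long, single-end vs. paired-end) goes through the same DE pipeline, and the resulting number of DE calls divided by the number of features tested gives that pipeline's false positive rate. The counts in the demo are illustrative only.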
Conclusion
In light of the false positive rate results, we recommend using paired-end rather than single-end sequences in differential expression studies, even though the impact is less serious for differential gene expression.
License
© 2013 González and Joly; licensee BioMed Central Ltd.
Figures
Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.
Figure 6.