Journal Article Details
BMC Research Notes
The bench scientist's guide to statistical analysis of RNA-Seq data
Jyothi Thimmapuram [1], Elizabeth A Ainsworth [2], Craig R Yendrek [3]
[1] Current address: Bioinformatics Core, Discovery Park, Purdue University, West Lafayette, IN 47907, USA
[2] Department of Plant Biology, University of Illinois, Urbana-Champaign, Urbana, IL 61801, USA
[3] USDA ARS Global Change and Photosynthesis Research Unit, 1201 W. Gregory Drive, Urbana, IL 61801, USA
Keywords: Statistical analysis; Differential expression; RNA-Seq
DOI: 10.1186/1756-0500-5-506
Received 2012-02-28; accepted 2012-09-10; published 2012
【 Abstract 】

Background

RNA sequencing (RNA-Seq) is emerging as a highly accurate method to quantify transcript abundance. However, analyses of the large data sets obtained by sequencing the entire transcriptome of organisms have generally been performed by bioinformatics specialists. Here we provide a step-by-step guide and outline a strategy using currently available statistical tools that results in a conservative list of differentially expressed genes. We also discuss potential sources of error in RNA-Seq analysis that could alter interpretation of global changes in gene expression.

Findings

When statistical tools were compared, the negative binomial distribution-based methods edgeR and DESeq identified 11,995 and 11,317 differentially expressed genes, respectively, in an RNA-Seq dataset generated from leaf tissue of soybean grown in elevated ozone (O3). However, only 10,535 genes were common to both methods, leaving 2,242 genes that were determined to be differentially expressed by only one method. Analysis of the non-significant genes revealed several limitations of these analytic tools, including evidence of overly stringent parameters for determining the statistical significance of differentially expressed genes, as well as increased type II error for high-abundance transcripts.
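
The overlap figures in this paragraph can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts reported above. The variable names are illustrative and do not come from the paper.

```python
# Counts reported in the Findings paragraph.
edger_total = 11_995  # genes called differentially expressed by edgeR
deseq_total = 11_317  # genes called differentially expressed by DESeq
shared = 10_535       # genes called by both methods

# Genes called by exactly one of the two methods.
edger_only = edger_total - shared
deseq_only = deseq_total - shared
single_method = edger_only + deseq_only

# Genes called by at least one method.
union = shared + single_method

print(f"edgeR only:        {edger_only}")
print(f"DESeq only:        {deseq_only}")
print(f"Single method:     {single_method}")
print(f"Called by either:  {union}")
```

The sum of the two single-method counts, 1,460 and 782, matches the 2,242 genes stated in the text.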

Conclusions

Because of the high variability between methods for determining differential expression of RNA-Seq data, we suggest using several bioinformatics tools, as outlined here, to ensure that a conservative list of differentially expressed genes is obtained. We also conclude that despite these analytical limitations, RNA-Seq provides highly accurate transcript abundance quantification that is comparable to qRT-PCR.
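
One way to read "a conservative list" is to keep only the genes that every tool calls significant. The sketch below assumes each tool's results have been exported as a mapping from gene ID to adjusted p-value. The gene IDs, p-values, function names, and the 0.05 cutoff are placeholders chosen for illustration, not values or methods taken from the paper.

```python
from typing import Dict, Set


def significant(adj_pvalues: Dict[str, float], alpha: float = 0.05) -> Set[str]:
    """Return gene IDs whose adjusted p-value is below alpha."""
    return {gene for gene, p in adj_pvalues.items() if p < alpha}


def conservative_list(*results: Dict[str, float], alpha: float = 0.05) -> Set[str]:
    """Keep only genes called significant in every result set supplied."""
    sets = [significant(r, alpha) for r in results]
    return set.intersection(*sets) if sets else set()


# Hypothetical adjusted p-values from two tools (made-up data).
tool_a = {"geneA": 0.001, "geneB": 0.03, "geneC": 0.20}
tool_b = {"geneA": 0.004, "geneB": 0.08, "geneC": 0.01}

consensus = conservative_list(tool_a, tool_b)
print(sorted(consensus))
```

Taking the intersection trades sensitivity for specificity: genes supported by only one tool are dropped, so the list shrinks as more tools are added.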

【 License 】

   
2012 Yendrek et al.; licensee BioMed Central Ltd.
