Journal article details
BMC Bioinformatics
An integrative method to normalize RNA-Seq data
Cyril Filloux [1], Cédric Meersseman [2], Romain Philippe [1], Lionel Forestier [1], Christophe Klopp [3], Dominique Rocha [2], Abderrahman Maftah [1], Daniel Petit [1]
[1] Université de Limoges, UMR1061, Unité Génétique Moléculaire Animale, 123 avenue Albert Thomas, F-87060 Limoges Cedex, France
[2] AgroParisTech, UMR1313, Unité Génétique Animale et Biologie Intégrative, Domaine de Vilvert, F-78352 Jouy-en-Josas, France
[3] INRA, Sigenae, Chemin de Borde-Rouge, Auzeville, BP 52627 31326 Castanet-Tolosan Cedex, France
Keywords: qRT-PCR; RNA-sequencing; Normalization; Gene expression
Others  :  818438
DOI  :  10.1186/1471-2105-15-188
Received 2013-12-11, accepted 2014-06-09, published 2014
【 Abstract 】

Background

Transcriptome sequencing is a powerful tool for measuring gene expression but, as with other technologies, various artifacts and biases affect quantification. To correct some of them, several normalization approaches have emerged, differing both in the statistical strategy employed and in the type of bias corrected. However, there is no clear standard normalization method.

Results

We present a novel methodology to normalize RNA-Seq data that takes into account transcript size, GC content, and sequencing depth, the major quantification-related biases. First, we found that the expression level of transcripts shorter than 600 bp is underestimated, whereas that of longer transcripts is overestimated, and increasingly so as their length increases. Second, as is well known, the higher the GC content (>50%), the more transcript expression is underestimated. Third, we demonstrated that sequencing depth affects the size bias, and we propose a correction that allows expression levels to be compared among many samples. The efficiency of our approach was then tested by comparing the correlation between normalized RNA-Seq data and qRT-PCR expression measurements. All steps are automated in a program written in Perl, available on request.
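The evaluation step described above (correlating normalized RNA-Seq values with qRT-PCR measurements) can be illustrated with a minimal Python sketch. It is not the authors' Perl program and does not implement their size, GC, or depth corrections. As a stand-in baseline it uses the standard RPKM formula, which corrects only for transcript length and sequencing depth. The function names, the pseudocount, and the assumption that qRT-PCR values are relative expression levels (not raw Ct values) are all hypothetical choices made for illustration.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Standard baseline normalization, used here only as a stand-in;
    it is NOT the correction proposed in the article.
    """
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def log2_correlation(rnaseq_values, qpcr_values, pseudocount=1.0):
    """Correlate log2-transformed expression values.

    Assumes both inputs are relative expression levels on a linear
    scale; the pseudocount (hypothetical choice) avoids log2(0).
    """
    log_rnaseq = [math.log2(v + pseudocount) for v in rnaseq_values]
    log_qpcr = [math.log2(v + pseudocount) for v in qpcr_values]
    return pearson(log_rnaseq, log_qpcr)
```

Under these assumptions, a normalization method would be judged better when `log2_correlation` over the same set of genes is closer to 1 than for a competing method.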

Conclusions

The methodology presented in this article identifies and corrects different biases that influence RNA-Seq quantification, and provides more accurate estimates of gene expression levels. The method can be applied to compare expression quantifications from many samples, preferably from the same tissue. To compare samples from different tissues, a calibration using several reference genes will be required.

【 License 】

   
2014 Filloux et al.; licensee BioMed Central Ltd.

