Journal Article Details
BMC Bioinformatics
Bias detection and correction in RNA-Sequencing data
Research Article
Lisa M Chung1  Wei Zheng2  Hongyu Zhao3 
[1] Biostatistics Division, Yale School of Public Health, 300 George Street, 06510, New Haven, Connecticut, USA; Biostatistics Resource, Keck Laboratory, Yale University, 300 George Street, 06510, New Haven, Connecticut, USA
Keywords: Gene Length; Read Count; Digital Gene Expression; Dinucleotide Frequency; Bias Pattern
DOI: 10.1186/1471-2105-12-290
Received 2011-03-06, accepted 2011-07-19, published 2011
Source: Springer
【 Abstract 】

Background: High-throughput sequencing technology provides unprecedented opportunities to study transcriptome dynamics. Compared to microarray-based gene expression profiling, RNA-Seq has many advantages, such as high resolution, low background, and the ability to identify novel transcripts. Moreover, for genes with multiple isoforms, the expression of each isoform may be estimated from RNA-Seq data. Despite these advantages, recent work revealed that base-level read counts from RNA-Seq data may not be randomly distributed and can be affected by local nucleotide composition. It was not clear, however, how this base-level read count bias affects gene-level expression estimates.

Results: Using five published RNA-Seq data sets from different biological sources and with different data preprocessing schemes, we showed that commonly used estimates of gene expression levels from RNA-Seq data, such as reads per kilobase of gene length per million reads (RPKM), are biased with respect to gene length, GC content, and dinucleotide frequencies. We examined the biases directly at the gene level and proposed a simple approach based on a generalized additive model to correct the different sources of bias simultaneously. Compared to previously proposed base-level correction methods, our method reduces bias in gene-level expression estimates more effectively.

Conclusions: Our method identifies and corrects different sources of bias in gene-level expression measures from RNA-Seq data, and provides more accurate estimates of gene expression levels. This method should prove useful in meta-analyses of gene expression levels across different platforms or experimental protocols.
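For readers unfamiliar with the RPKM measure named in the abstract, the following is a minimal sketch of the standard definition (reads per kilobase of gene length per million mapped reads). The function name `rpkm` and the example numbers are hypothetical and chosen only for illustration; they are not taken from the article or its data sets.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads.

    Standard definition: 1e9 * C / (N * L), where C is the number of reads
    mapped to the gene, N is the total number of mapped reads in the sample,
    and L is the gene length in base pairs.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp gene in a sample
# with 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
print(example_value)  # 25.0
```

Because RPKM divides by gene length, any residual dependence of the value on length, GC content, or dinucleotide frequency is the kind of gene-level bias the abstract describes.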

【 License 】

© Zheng et al; licensee BioMed Central Ltd. 2011. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
