Journal article details
BMC Bioinformatics
ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data
Christopher R Cabanski [5], Keary Cavin [3], Chris Bizon [3], Matthew D Wilkerson [1], Joel S Parker [4], Kirk C Wilhelmsen [4], Charles M Perou [4], JS Marron [1], D Neil Hayes [2]
[1] Lineberger Comprehensive Cancer Center, Chapel Hill, NC, USA
[2] Department of Internal Medicine, Division of Medical Oncology, Multidisciplinary Thoracic Oncology Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
[3] Renaissance Computing Center, Chapel Hill, NC, USA
[4] Department of Genetics, Chapel Hill, NC, USA
[5] Department of Statistics and Operations Research, Chapel Hill, NC, USA
Keywords: Bioconductor; Bioinformatics; Recalibration; Quality score; Next-generation sequencing
DOI: 10.1186/1471-2105-13-221
Received 2012-05-04; accepted 2012-08-22; published 2012
【 Abstract 】

Background

Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results.

Results

Here we present ReQON, a tool that uses logistic regression to recalibrate the base quality scores in an input BAM file of aligned sequencing data. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that the quality scores ReQON produces are both more accurate, in the sense that they correspond more closely to the probability of a sequencing error, and better at discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy.
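
The idea of recalibrating quality scores with logistic regression can be illustrated with a small sketch. This is not ReQON's code or API (ReQON is an R package). It is a toy Python example that assumes the reported quality score is the only covariate and that each base in a training set has already been labeled as an error or a non-error. The function names, the toy counts, and the gradient-descent fit are all assumptions made for illustration. The only standard piece is the Phred relation Q = -10 log10(p) between a quality score Q and an error probability p.

```python
import math

def phred_to_error_prob(q):
    """Phred quality score -> probability that the base call is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Error probability -> Phred quality score, rounded to an integer."""
    return round(-10 * math.log10(p))

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(groups, lr=0.5, epochs=5000):
    """Fit P(error | x) = logistic(b0 + b1 * x) by batch gradient descent.

    groups: list of (x, n_bases, n_errors) tuples, i.e. grouped binary data.
    """
    b0 = b1 = 0.0
    total = sum(n for _, n, _ in groups)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, n, k in groups:
            resid = n * logistic(b0 + b1 * x) - k  # expected minus observed errors
            g0 += resid
            g1 += resid * x
        b0 -= lr * g0 / total
        b1 -= lr * g1 / total
    return b0, b1

def scale(q):
    """Center and scale the reported quality so gradient descent behaves well."""
    return (q - 20) / 10

# Hypothetical training data: reported quality -> (bases observed, errors observed).
# Here the reported scores are overconfident: Q20 claims 1% errors but shows 10%.
counts = {10: (100, 30), 20: (100, 10), 30: (100, 3)}
groups = [(scale(q), n, k) for q, (n, k) in counts.items()]

b0, b1 = fit_logistic(groups)

def recalibrate(q):
    """Map a reported quality score to a recalibrated one via the fitted model."""
    return error_prob_to_phred(logistic(b0 + b1 * scale(q)))

recalibrated = {q: recalibrate(q) for q in counts}
```

In this toy setting the recalibrated scores come out lower than the reported ones, because the observed error rates are higher than the reported scores imply. A real recalibrator would typically use more covariates than the reported score alone and would need a principled way to decide which mismatches count as sequencing errors.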

Conclusion

ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration.

【 License 】

   
2012 Cabanski et al.; licensee BioMed Central Ltd.
