BMC Genomics
SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing
Masao Nagasaki [1], Takahiro Mimori [1], Mamoru Takahashi [1], Yosuke Kawai [1], Yumi Yamaguchi-Kabata [1], Naoki Nariai [1], Kaname Kojima [1], Yukuto Sato [1]
[1] Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2–1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
Keywords: NGS; MiSeq; Illumina HiSeq; Data cleaning; Automated analysis
DOI: 10.1186/1471-2164-15-664
Received: 2014-04-16; accepted: 2014-08-04; published: 2014
Abstract

Background

Next-generation sequencers (NGSs) have become one of the main tools of current biology. To obtain useful insights from NGS data, it is essential to control for low-quality portions of the data that are affected by technical errors, such as air bubbles in the sequencing fluidics.

Results

We developed SUGAR (subtile-based GUI-assisted refiner), software that handles ultra-high-throughput data through a user-friendly graphical user interface (GUI) with interactive analysis capabilities. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to spot possible signs of technical errors that occurred during sequencing. Sequencing data generated from error-affected regions of a flowcell can be selectively removed, either by automated analysis or by GUI-assisted operations implemented in SUGAR. We applied the automated data-cleaning function, which is based on sequence read quality (Phred) scores, to publicly available whole human genome sequencing data and showed that overall mapping quality improved.

Conclusion

The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. The software should therefore be especially useful for controlling the quality of variant calls for low-frequency cell populations, e.g., cancer cells, in samples affected by technical errors in the sequencing procedure.

License

© 2014 Sato et al.; licensee BioMed Central Ltd.

Figures

Figure 1.
