BMC Research Notes
ASAP: an environment for automated preprocessing of sequencing data
Chun Li[3], Bingshan Li[1], Eric S Torstenson[2]
[1] Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, USA; Center for Human Genetics Research, Vanderbilt University, Nashville, USA; Center for Human Genetics Research, Vanderbilt University Medical Center, 519 Light Hall, Nashville, TN, 37212-0700, USA
Keywords: Computer cluster; Automation; Data processing; Next-generation sequencing
DOI: 10.1186/1756-0500-6-5
Received 2012-07-16; accepted 2012-12-21; published 2013
Abstract

Background

Next-generation sequencing (NGS) has yielded an unprecedented amount of data for genetics research. Processing these data from raw sequence reads to variant calls is a daunting task, and doing it manually can significantly delay downstream analysis and increase the possibility of human error. The research community has produced tools to properly prepare sequence data for analysis and has established guidelines on how to apply those tools to achieve the best results. However, existing pipeline programs that automate the process in its entirety are either inaccessible to investigators or web-based and require a certain amount of administrative expertise to set up.

Findings

Advanced Sequence Automated Pipeline (ASAP) was developed to provide a framework for automating the translation of sequencing data into annotated variant calls, with the goal of minimizing user involvement without requiring dedicated hardware or administrative rights. ASAP runs on both computer clusters and standalone machines with minimal human involvement and maintains high data integrity, while allowing complete control over the configuration of its component programs. It offers an easy-to-use interface for submitting and tracking jobs and for resuming failed jobs. It also provides tools for quality checking and for dividing jobs into pieces to maximize throughput.
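The abstract does not describe how ASAP resumes failed jobs or divides them into pieces. The following is only a minimal Python sketch of one common way to get both behaviours, assuming (1) the genome is split into fixed-size windows per contig and (2) each completed stage of a chunk leaves an empty `.done` marker file, so a re-run skips finished stages. The names `Step`, `split_regions`, `run_chunk`, the marker-file convention and the window size are hypothetical illustrations, not ASAP's interface; a real pipeline would invoke an aligner or variant caller inside each step's `action`.

```python
# Hypothetical sketch: not ASAP's code or interface.
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    """One pipeline stage (e.g. alignment, variant calling) applied to a chunk."""
    name: str
    action: Callable[[str], None]  # receives the chunk ID; raises on failure


def split_regions(contig_lengths: Dict[str, int],
                  chunk_size: int) -> List[Tuple[str, int, int]]:
    """Divide each contig into half-open [start, end) windows of at most chunk_size bp."""
    regions = []
    for contig, length in contig_lengths.items():
        for start in range(0, length, chunk_size):
            regions.append((contig, start, min(start + chunk_size, length)))
    return regions


def run_chunk(workdir: Path, chunk_id: str, steps: List[Step]) -> List[str]:
    """Run steps in order for one chunk, skipping steps already marked done.

    A step's marker file is written only after it succeeds, so re-running
    after a failure resumes at the first unfinished step. Returns the names
    of the steps executed by this call.
    """
    executed = []
    for step in steps:
        marker = workdir / f"{chunk_id}.{step.name}.done"
        if marker.exists():
            continue  # finished in an earlier run
        step.action(chunk_id)  # an exception propagates; no marker is written
        marker.touch()
        executed.append(step.name)
    return executed


if __name__ == "__main__":
    regions = split_regions({"chr20": 2_500_000}, 1_000_000)
    print(regions)
    # [('chr20', 0, 1000000), ('chr20', 1000000, 2000000), ('chr20', 2000000, 2500000)]

    attempts = {"n": 0}

    def flaky_call(chunk_id: str) -> None:
        # Fails on the first attempt only, to mimic a transient node failure.
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise RuntimeError("simulated node failure")

    steps = [Step("align", lambda cid: None), Step("call_variants", flaky_call)]
    with tempfile.TemporaryDirectory() as d:
        workdir = Path(d)
        chunk_id = "{}_{}_{}".format(*regions[0])  # "chr20_0_1000000"
        try:
            run_chunk(workdir, chunk_id, steps)
        except RuntimeError as err:
            print("first run failed:", err)
        print("resumed, ran:", run_chunk(workdir, chunk_id, steps))
        # resumed, ran: ['call_variants']
```

Because each chunk keeps its own markers, independent chunks can be submitted as separate cluster jobs, and a failure in one chunk only requires re-running the unfinished stages of that chunk.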

Conclusions

ASAP provides an environment for building an automated pipeline for NGS data preprocessing. The environment is flexible both to use and to extend in future development. It is freely available at http://biostat.mc.vanderbilt.edu/ASAP.

License

© 2013 Torstenson et al.; licensee BioMed Central Ltd.

Figures

Figure 1.
