BMC Bioinformatics
Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline
Jeffrey G Reid [4], Andrew Carroll [1], Narayanan Veeraraghavan [4], Mahmoud Dahdouli [4], Andreas Sundquist [1], Adam English [4], Matthew Bainbridge [4], Simon White [4], William Salerno [4], Christian Buhay [4], Fuli Yu [3], Donna Muzny [4], Richard Daly [1], Geoff Duyk [1], Richard A Gibbs [3], Eric Boerwinkle [2]
[1] DNAnexus, Mountain View, CA 94040, USA
[2] Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
[3] Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
[4] Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
Keywords: Cloud computing; Clinical sequencing; Annotation; Variant calling; NGS data
DOI  :  10.1186/1471-2105-15-30
Received 2013-09-17; accepted 2014-01-20; published 2014
Abstract

Background

Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results.

Results

To address these challenges, we have developed the Mercury analysis pipeline and deployed it both on local hardware and in the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts.
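The abstract describes Mercury only at a high level. As a hedged illustration of what an "automated, flexible, and extensible" per-sample workflow can look like, the Python sketch below chains named stages and applies them identically to each sample in a cohort. Everything in it is an assumption for illustration: the names `Workflow`, `SampleState`, and `placeholder_stage`, and the stage names `align`, `call_variants`, and `annotate`, are hypothetical. The sketch does not reproduce Mercury's code, the DNAnexus API, or any real aligner or variant caller.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SampleState:
    """Artifacts produced so far for one sample (hypothetical structure)."""
    sample_id: str
    artifacts: Dict[str, str] = field(default_factory=dict)
    completed: List[str] = field(default_factory=list)


# A stage reads the sample state and returns the path of the artifact it made.
Stage = Callable[[SampleState], str]


def placeholder_stage(suffix: str) -> Stage:
    """Hypothetical stand-in for a real tool; it only names an output file."""
    def run(state: SampleState) -> str:
        return f"{state.sample_id}.{suffix}"
    return run


class Workflow:
    """An ordered, extensible list of stages applied identically to every sample."""

    def __init__(self) -> None:
        self.stages: List[Tuple[str, Stage]] = []

    def add_stage(self, name: str, stage: Stage) -> "Workflow":
        # Extending the workflow means appending another named stage.
        self.stages.append((name, stage))
        return self

    def run_sample(self, sample_id: str) -> SampleState:
        # Stages always run in the same fixed order, so reruns are repeatable.
        state = SampleState(sample_id)
        for name, stage in self.stages:
            state.artifacts[name] = stage(state)
            state.completed.append(name)
        return state

    def run_cohort(self, sample_ids: List[str], workers: int = 4) -> List[SampleState]:
        # Samples share no state, so they can be processed in parallel;
        # Executor.map returns results in input order.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(self.run_sample, sample_ids))


if __name__ == "__main__":
    wf = (
        Workflow()
        .add_stage("align", placeholder_stage("bam"))
        .add_stage("call_variants", placeholder_stage("vcf"))
        .add_stage("annotate", placeholder_stage("annotated.vcf"))
    )
    for result in wf.run_cohort(["S1", "S2", "S3"]):
        print(result.sample_id, result.artifacts)
```

Because every sample passes through the same ordered stages and samples share no state, adding a stage extends every run, and the per-sample work can in principle be spread across as many machines as peak demand requires. The local thread pool here only stands in for that kind of scale-out.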

Conclusions

By implementing Mercury on the DNAnexus platform to take advantage of cloud computing, we have demonstrated a powerful combination of a robust, fully validated software pipeline and a scalable computational resource, which to date we have applied to more than 10,000 whole-genome and whole-exome samples.

License

   
© 2014 Reid et al.; licensee BioMed Central Ltd.

Figures

Figure 1.

Figure 2.
