Journal Article Details
BMC Genomics
Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing
Software
Kurt Prenger [1], Thomas Messina [1], Hongtao Fan [2], Susan Stephens [2], Shanrong Zhao [2], Lance Smith [3], Edward Jaeger [3]
[1] Application Services Research & Development, Johnson & Johnson Services, Inc., New Brunswick, NJ, USA; Systems Pharmacology and Biomarkers, Janssen Research & Development, LLC, 3210 Merryfield Row, 92121, San Diego, CA, USA; Translational Informatics IT, Janssen Research & Development, LLC, 3210 Merryfield Row, 92121, San Diego, CA, USA
Keywords: Cloud computing; Whole genome sequencing; Single nucleotide polymorphism; SNP; Next generation sequencing; Software
DOI: 10.1186/1471-2164-14-425
Received 2013-01-08; accepted 2013-06-14; published 2013
Source: Springer
【 Abstract 】

Background: Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources required for large-scale WGS data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses.

Results: Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by Amazon Web Services. This time includes the import and export of the data using the Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log running metrics during data processing and to monitor multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies.

Conclusions: Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced on either the Illumina HiSeq 2000 or HiSeq 2500 platform, Rainbow can be used straight out of the box. Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html.
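
Improvement (2) above, splitting large sequence files for better load balance, can be illustrated with a minimal sketch. This is not Rainbow's own code: the function name `split_fastq_records`, the chunk size, and the assumption that every FASTQ record occupies exactly four lines are all hypothetical choices made for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_fastq_records(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines holding at most `reads_per_chunk` reads each.

    Assumption (not from the paper): each read is a strict 4-line record
    (header, sequence, '+', qualities), with no wrapped sequences.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be at least 1")
    it = iter(lines)
    while True:
        # Pull the next block of up to 4 * reads_per_chunk lines.
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4 != 0:
            raise ValueError("truncated FASTQ record at end of input")
        yield chunk


# Usage: five toy reads split into chunks of at most two reads.
reads: List[str] = []
for i in range(5):
    reads += [f"@read{i}\n", "ACGT\n", "+\n", "IIII\n"]

chunks = list(split_fastq_records(reads, reads_per_chunk=2))
chunk_sizes = [len(c) // 4 for c in chunks]  # reads per chunk
```

Chunks of roughly equal size could then be handed to separate workers, so that one very large input file does not leave a single node with most of the work.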

【 License 】

CC BY   
© Zhao et al.; licensee BioMed Central Ltd. 2013
