Journal Article Details
BMC Bioinformatics
Accelerating genomic workflows using NVIDIA Parabricks
Research
Thad B. Carlson [1], Josh J. Catana [2], Haley T. Engelken [2], Kyle A. O’Connell [2], Collin J. Lobb [2], Laura M. Gorrell [2], Zelaikha B. Yosufzai [2], Ross A. Campbell [2], Juergen A. Klenk [2], Dina Mikdadi [2], Vivien R. Bonazzi [2]
[1] Cloud Managed Services, Deloitte Consulting LLP, 48226, Detroit, MI, USA; [2] Health Data and AI, Deloitte Consulting LLP, 22009, Arlington, VA, USA
Keywords: GPU acceleration; NVIDIA Parabricks; Cloud computing; Amazon Web Services; Google Cloud Platform
DOI: 10.1186/s12859-023-05292-2
Received 2022-07-20, accepted 2023-04-15, published 2023
Source: Springer
【 Abstract 】

Background: As genome sequencing becomes better integrated into scientific research, government policy, and personalized medicine, the primary challenge for researchers is shifting from generating raw data to analyzing these vast datasets. Although much work has been done to reduce compute times using various configurations of traditional CPU computing infrastructure, Graphics Processing Units (GPUs) offer opportunities to accelerate genomic workflows by orders of magnitude. Here we benchmark one GPU-accelerated software suite, NVIDIA Parabricks, on Amazon Web Services (AWS), Google Cloud Platform (GCP), and an NVIDIA DGX cluster. We benchmarked six variant calling pipelines, including two germline callers (HaplotypeCaller and DeepVariant) and four somatic callers (Mutect2, Muse, LoFreq, SomaticSniper).

Results: We achieved up to 65× acceleration with germline variant callers, bringing HaplotypeCaller runtimes down from 36 h to 33 min on AWS, 35 min on GCP, and 24 min on the NVIDIA DGX. Somatic callers showed more variation across GPU counts and computing platforms. On cloud platforms, GPU-accelerated germline callers resulted in cost savings compared with CPU runs, whereas some somatic callers were more expensive than CPU runs because their GPU acceleration was not sufficient to offset the higher cost of GPU instances.

Conclusions: Germline variant callers scaled well with the number of GPUs across platforms, whereas for somatic variant callers the number of GPUs that gave the fastest runtime varied more. This suggests that, at least with the version of Parabricks used here, these workflows are less GPU-optimized and require benchmarking on the platform of choice before being deployed at production scale. Our study demonstrates that GPUs can be used to greatly accelerate genomic workflows, bringing urgent societal advances in biosurveillance and personalized medicine within closer reach.
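The speedup and cost reasoning in the abstract can be illustrated with a minimal Python sketch. It assumes the 36 h CPU runtime is the baseline for the AWS and GCP figures; the hourly prices and the 5× somatic-caller case are hypothetical placeholders, not values from the study. The function names are likewise illustrative.

```python
def speedup(cpu_minutes: float, gpu_minutes: float) -> float:
    """Ratio of CPU runtime to GPU runtime (higher means faster on GPU)."""
    return cpu_minutes / gpu_minutes


def run_cost(runtime_minutes: float, price_per_hour: float) -> float:
    """Cost of one run billed at an hourly instance price."""
    return runtime_minutes / 60.0 * price_per_hour


def gpu_is_cheaper(cpu_minutes: float, gpu_minutes: float,
                   cpu_price_per_hour: float, gpu_price_per_hour: float) -> bool:
    """True when the GPU run costs less than the CPU run.

    Equivalent to: speedup > gpu_price_per_hour / cpu_price_per_hour.
    """
    return (run_cost(gpu_minutes, gpu_price_per_hour)
            < run_cost(cpu_minutes, cpu_price_per_hour))


# Runtimes reported in the abstract for HaplotypeCaller.
CPU_MINUTES = 36 * 60  # 36 h CPU baseline (assumed to apply to both clouds)
GPU_MINUTES = {"AWS": 33, "GCP": 35}

# Hypothetical hourly prices, for illustration only.
CPU_PRICE = 2.0
GPU_PRICE = 30.0

if __name__ == "__main__":
    for platform, minutes in GPU_MINUTES.items():
        s = speedup(CPU_MINUTES, minutes)
        cheaper = gpu_is_cheaper(CPU_MINUTES, minutes, CPU_PRICE, GPU_PRICE)
        print(f"{platform}: speedup {s:.1f}x, GPU cheaper: {cheaper}")

    # Hypothetical somatic caller with only a 5x speedup (600 min -> 120 min).
    print("5x case, GPU cheaper:",
          gpu_is_cheaper(600, 120, CPU_PRICE, GPU_PRICE))
```

Under these placeholder prices the break-even speedup is 15×, so a caller that gains less than that costs more on GPU even though it finishes sooner, which is the pattern the abstract reports for some somatic callers.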

【 License 】

CC BY   
© The Author(s) 2023
