Journal Article Details
BMC Bioinformatics
CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU
Research Article
Hanyu Jiang [1], Narayan Ganesan [1]
[1] Department of Elec. and Comp. Engg, Stevens Institute of Technology, 07030, Hoboken, NJ, USA
Keywords: SIMT; SIMD; CUDA; Hidden Markov model; Parallelism; Single segment Viterbi; Multiple segment Viterbi; Viterbi
DOI: 10.1186/s12859-016-0946-4
Received 2015-08-12, accepted 2016-02-15, published 2016
Source: Springer
【 Abstract 】

Background: The HMMER software suite is widely used for analysis of homologous protein and nucleotide sequences with high sensitivity. The latest version of hmmsearch in HMMER 3.x uses a heuristic pipeline consisting of the MSV/SSV (Multiple/Single ungapped Segment Viterbi) stage, the P7Viterbi stage and the Forward scoring stage to accelerate homology detection. Since the latest version is highly optimized for performance on modern multi-core CPUs with SSE capabilities, only a few acceleration attempts report speedup. However, the most compute-intensive tasks within the pipeline (viz., the MSV/SSV and P7Viterbi stages) still stand to benefit from the computational capabilities of massively parallel processors.

Results: A Multi-Tiered Parallel Framework (CUDAMPF), implemented on CUDA-enabled GPUs and presented here, offers finer-grained parallelism for the MSV/SSV and Viterbi algorithms. We couple the SIMT (Single Instruction Multiple Threads) mechanism with SIMD (Single Instruction Multiple Data) video instructions under warp-synchronism to achieve high-throughput processing and eliminate thread idling. We also propose a hardware-aware optimal allocation scheme for scarce resources such as on-chip memory and caches in order to boost the performance and scalability of CUDAMPF. In addition, runtime compilation via NVRTC, available with CUDA 7.0, is incorporated into the framework. It not only helps unroll the innermost loop to yield up to a 2- to 3-fold speedup over static compilation but also enables dynamic loading and switching of kernels depending on the query model size, in order to achieve optimal performance.

Conclusions: CUDAMPF is designed as a hardware-aware parallel framework for accelerating computational hotspots within the hmmsearch pipeline as well as other sequence alignment applications. It achieves significant speedup by exploiting hierarchical parallelism on a single GPU and takes full advantage of limited resources based on their own performance features. In addition to exceeding the performance of other acceleration attempts, comprehensive evaluations against high-end CPUs (Intel i5, i7 and Xeon) show that CUDAMPF yields up to 440 GCUPS for SSV, 277 GCUPS for MSV and 14.3 GCUPS for P7Viterbi, all with 100 % accuracy, which translates to a maximum speedup of 37.5-, 23.1- and 11.6-fold for MSV, SSV and P7Viterbi, respectively. The source code is available at https://github.com/Super-Hippo/CUDAMPF.
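
The throughput figures in the abstract are given in GCUPS (giga cell updates per second). As a rough illustration, the sketch below assumes the common convention that the number of dynamic-programming cells equals the query model length times the total number of database residues. The function names and the sample numbers are hypothetical and are not taken from the paper or from the CUDAMPF source code.

```python
def gcups(model_length: int, total_residues: int, seconds: float) -> float:
    """Giga cell updates per second.

    Assumes one DP cell per (model position, database residue) pair,
    which is the usual convention for this metric.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return model_length * total_residues / seconds / 1e9


def speedup(accelerated_gcups: float, baseline_gcups: float) -> float:
    """Fold speedup of one implementation over a baseline."""
    if baseline_gcups <= 0:
        raise ValueError("baseline_gcups must be positive")
    return accelerated_gcups / baseline_gcups


if __name__ == "__main__":
    # Hypothetical run: a 400-state model searched against
    # 200 million residues in 2 seconds.
    rate = gcups(model_length=400, total_residues=200_000_000, seconds=2.0)
    print(f"{rate:.1f} GCUPS")
    # Hypothetical baseline of 4 GCUPS on the same workload.
    print(f"{speedup(rate, 4.0):.1f}-fold speedup")
```

Under this convention, a fold speedup is simply the ratio of two GCUPS values measured on the same model and database.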

【 License 】

CC BY   
© Jiang and Ganesan. 2016
