期刊论文详细信息
BMC Bioinformatics
CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions
Yongchao Liu1  Adrianto Wirawan1  Bertil Schmidt1 
[1] Institut für Informatik, Johannes Gutenberg Universität Mainz, Mainz, Germany
关键词: Concurrent execution;    PTX SIMD instructions;    GPU;    CUDA;    Smith-Waterman;   
Others  :  1087918
DOI  :  10.1186/1471-2105-14-117
 received in 2012-12-21, accepted in 2013-03-15,  发布年份 2013
PDF
【 摘 要 】

Background

The maximal sensitivity for local alignments makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to a quadratic time complexity. Corresponding runtimes are further compounded by the rapid growth of sequence databases.

Results

We present CUDASW++ 3.0, a fast Smith-Waterman protein database search algorithm, which couples CPU and GPU SIMD instructions and carries out concurrent CPU and GPU computations. For the CPU computation, this algorithm employs SSE-based vector execution units as accelerators. For the GPU computation, we have investigated for the first time a GPU SIMD parallelization, which employs CUDA PTX SIMD video instructions to gain more data parallelism beyond the SIMT execution model. Moreover, sequence alignment workloads are automatically distributed over CPUs and GPUs based on their respective compute capabilities. Evaluation on the Swiss-Prot database shows that CUDASW++ 3.0 gains a performance improvement over CUDASW++ 2.0 up to 2.9 and 3.2, with a maximum performance of 119.0 and 185.6 GCUPS, on a single-GPU GeForce GTX 680 and a dual-GPU GeForce GTX 690 graphics card, respectively. In addition, our algorithm has demonstrated significant speedups over other top-performing tools: SWIPE and BLAST+.

Conclusions

CUDASW++ 3.0 is written in CUDA C++ and PTX assembly languages, targeting GPUs based on the Kepler architecture. This algorithm obtains significant speedups over its predecessor: CUDASW++ 2.0, by benefiting from the use of CPU and GPU SIMD instructions as well as the concurrent execution on CPUs and GPUs. The source code and the simulated data are available at http://cudasw.sourceforge.net webcite.

【 授权许可】

   
2013 Liu et al.; licensee BioMed Central Ltd.

【 预 览 】
附件列表
Files Size Format View
20150117055214138.pdf 857KB PDF download
Figure 7. 24KB Image download
Figure 6. 32KB Image download
Figure 5. 47KB Image download
Figure 4. 53KB Image download
Figure 3. 38KB Image download
Figure 2. 54KB Image download
Figure 1. 26KB Image download
【 图 表 】

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

【 参考文献 】
  • [1]Smith T, Waterman M: Identification of common molecular subsequences. J Mol Biol 1981, 147:195-197.
  • [2]Gotoh O: An improved algorithm for matching biological sequences. J Mol Biol 1982, 162:707-708.
  • [3]Thompson JD, Higgins DG, Gibson TJ: CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22:4673-4680.
  • [4]Liu Y, Schmidt B, Maskell DL: MSA-CUDA: Multiple Sequence Alignment on Graphics Processing Units with CUDA. 20th IEEE International Conference on Application-specific Systems, Architectures and Processors; 2009:121-128.
  • [5]Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009, 25(14):1755-1760.
  • [6]Liu Y, Schmidt B, Maskell DL: CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform. Bioinformatics 2012, 28(14):1830-1837.
  • [7]Pearson WR, Lipman DJ: Improved tools for biological sequence comparison. Proc. Nat. Acad. Sci. USA 1988, 85(8):2444-2448.
  • [8]Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basical local alignment search tool. J Mol Biol 1990, 215(3):403-410.
  • [9]Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.
  • [10]Qiu J, Ekanayake J, Gunarathne T, Choi JY, Bae SH, Li H, Zhang B, Wu TL, Ruan Y, Ekanayake S, Hughes A, Fox G: Hybrid cloud and cluster computing paradigms for life science applications. BMC Bioinformatics 2010, 11(Suppl12):S3.
  • [11]Liu Y, Maskell DL, Schmidt B: CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units. BMC Research Notes 2009, 2:73. BioMed Central Full Text
  • [12]Oliver T, Schmidt B, Nathan D, Clemens R, Maskell D: Using reconfigurable hardware to accelerate multiple sequence alignment with ClustalW. Bioinformatics 2005, 21(16):3431-3432.
  • [13]Oliver T, Schmidt B, Maskell DL: Reconfigurable architectures for bio-sequence database scanning on FPGAs. IEEE Trans. Circuit Syst. II 2005, 52:851-855.
  • [14]Li TI, Shum W, Truong K: 160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA). BMC Bioinformatics 2007, 8:I85. BioMed Central Full Text
  • [15]Wozniak A: Using video-oriented instructions to speed up sequence comparison. Comput Appl Biosci 1997, 13(2):145-150.
  • [16]Rognes T, Seeberg E: Six-fold speedup of Smith-Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics 2000, 16(8):699-706.
  • [17]Farrar M: Striped Smith-Waterman speeds database searches six times over other SIMD implementations. Bioinformatics 2007, 23(2):156-161.
  • [18]Alpern B, Carter L, Gatlin KS: Microparallelism and high performance protein matching. USA: Proceedings of the 1995 ACM/IEEE Supercomputing Conference; 1995.
  • [19]Rognes T: Faster Smith-Waterman database searches with inter-sequence SIMD parallelization. BMC Bioinformatics 2011, 12:221. BioMed Central Full Text
  • [20]Wirawan A, Kwoh CK, Hieu NT, Schmidt B: CBESW: Sequence Alignment on Playstation 3. BMC Bioinformatics 2008, 9:377. BioMed Central Full Text
  • [21]Szalkowski A, Ledergerber C, Krahenbuhl P, Dessimoz C: SWPS3 – fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2. BMC Research Notes 2008, 1:107. BioMed Central Full Text
  • [22]Farrar MS: Optimizing Smith-Waterman for the Cell broadband engine. http://cudasw.sourceforge.net/sw-cellbe.pdf webcite
  • [23]Liu W, Schmidt B, Voss G, Muller-Wittig W: Streaming algorithms for biological sequence alignment on GPUs. IEEE Trans Parallel Distr Syst 2007, 18(9):1270-1281.
  • [24]Manavski SA, Valle G: CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinformatics 2008, 9(Suppl2):S10.
  • [25]Ligowski L, Rudnicki W: An efficient implementation of Smith Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases. IEEE International Symposium on Parallel & Distributed Processing 2009, 2009:1-8.
  • [26]Liu Y, Schmidt B, Maskel DL: CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions. BMC Research Notes 2010, 3:93. BioMed Central Full Text
  • [27]Khajeh-Saeed A, Poole S, Perot J: Acceleration of the Smith–Waterman algorithm using single and multiple graphics processors. J Comput Phys 2010, 229:4247-4258.
  • [28]Blazewicz J, Frohmberg W, Kierzynka M, Pesch E, Wojciechowski P: Protein alignment algorithms with an efficient backtracking routine on multiple GPUs. BMC Bioinformatics 2011, 12:181. BioMed Central Full Text
  • [29]Hains D, Cashero Z, Ottenberg M, Bohm W, Rajopadhye S: Improving CUDASW++, a parallelization of Smith-Waterman for CUDA enabled devices. IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum 2011, 2011:490-501.
  • [30]Alachiotis N, Berger SA, Stamatakis A: Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel. BMC Bioinformatics 2012, 13:196. BioMed Central Full Text
  • [31]Liu CM, Wong T, Wu E, Luo R, Yiu SM, Li Y, Wang B, Yu C, Chu X, Zhao K, Li R, Lam TW: SOAP3: ultra-fast GPU-based parallel alignment tool for short reads. Bioinformatics 2011, 28(6):878-789.
  • [32]Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL: BLAST+: architecture and applications. BMC Bioinformatics 2009, 10:421. BioMed Central Full Text
  • [33]Lindholm E, Nickolls J, Oberman S, Montrym J: NVIDIA Tesla: a unified graphics and computing architecture. IEEE Micro 2008, 28(2):39-55.
  • [34]NVIDIA: NVIDIA’s next generation CUDA compute architecture: Fermi. USA: NVIDIA Corporation Whitepaper; 2009.
  • [35]NVIDIA: NVIDIA’s Next Generation CUDA Compute Architecture: Kepler GK110. USA: NVIDIA Corporation Whitepaper; 2012.
  • [36]NVIDIA: Tuning CUDA Applications for Kepler. http://docs.nvidia.com/cuda/kepler-tuning-guide/index.html webcite
  • [37]NVIDIA: Parallel thread execution ISA version 3.1. USA: NVIDIA Corporation Whitepaper; 2012.
  文献评价指标  
  下载次数:111次 浏览次数:11次