期刊论文详细信息
BMC Bioinformatics
Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel
Nikolaos Alachiotis1  Simon A Berger1  Alexandros Stamatakis1 
[1] The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
关键词: GPU;    SIMT;    SIMD;    SSE;    OpenCL;    PaPaRa;    Dynamic programming;    Alignment kernel;   
Others  :  1088174
DOI  :  10.1186/1471-2105-13-196
 received in 2011-12-19, accepted in 2012-07-11,  发布年份 2012
PDF
【 摘 要 】

Background

Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to reference alignments and corresponding evolutionary reference trees. The algorithm aligns short reads to phylogenetic profiles that correspond to the branches of such a reference tree. The algorithm needs to perform an immense number of pairwise alignments. Therefore, we explore vector intrinsics and GPUs to accelerate the PaPaRa alignment kernel.

Results

We optimized and parallelized PaPaRa on CPUs and GPUs. Via SSE 4.1 SIMD (Single Instruction, Multiple Data) intrinsics for x86 SIMD architectures and multi-threading, we obtained a 9-fold acceleration on a single core as well as linear speedups with respect to the number of cores. The peak CPU performance amounts to 18.1 GCUPS (Giga Cell Updates per Second) using all four physical cores on an Intel i7 2600 CPU running at 3.4 GHz. The average CPU performance (averaged over all test runs) is 12.33 GCUPS. We also used OpenCL to execute PaPaRa on a GPU SIMT (Single Instruction, Multiple Threads) architecture. A NVIDIA GeForce 560 GPU delivered peak and average performance of 22.1 and 18.4 GCUPS respectively. Finally, we combined the SIMD and SIMT implementations into a hybrid CPU-GPU system that achieved an accumulated peak performance of 33.8 GCUPS.

Conclusions

This accelerated version of PaPaRa (available athttp://www.exelixis-lab.org/software.html webcite) provides a significant performance improvement that allows for analyzing larger datasets in less time. We observe that state-of-the-art SIMD and SIMT architectures deliver comparable performance for this dynamic programming kernel when the “competing programmer approach” is deployed. Finally, we show that overall performance can be substantially increased by designing a hybrid CPU-GPU system with appropriate load distribution mechanisms.

【 授权许可】

   
2012 Alachiotis et al.; licensee BioMed Central Ltd.

【 预 览 】
附件列表
Files Size Format View
20150117082747886.pdf 347KB PDF download
Figure 5. 17KB Image download
Figure 4. 43KB Image download
Figure 3. 38KB Image download
Figure 2. 38KB Image download
Figure 1. 28KB Image download
【 图 表 】

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

【 参考文献 】
  • [1]Berger SA, Stamatakis A: Aligning short Reads to Reference Alignments and Trees. Bioinformatics (Oxford, England) 2011, btr320. [ http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btr320v1 webcite]
  • [2]Eddy S: Profile hidden Markov models. Bioinformatics 1998, 14(9):755-763.
  • [3]Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res 2004, 32(5):1792-1797.
  • [4]Katoh K, Kuma K, Toh H, Miyata T: MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucl Acids Res 2005, 33:511-518.
  • [5]Alpern B, Carter L, Su Gatlin K: Microparallelism and high-performance protein matching. New York: ACM Press; 1995. [ http://portal.acm.org/ft_gateway.cfm?id=224222&type=html webcite]
  • [6]Wozniak A: Using video-oriented instructions to speed up sequence comparison. Bioinformatics 1997, 13(2):145-150. [ http://bioinformatics.oxfordjournals.org/cgi/content/abstract/13/2/145 webcite]
  • [7]Rognes T, Seeberg E: Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics 2000, 16(8):699-706. [ http://bioinformatics.oxfordjournals.org/cgi/content/abstract/16/8/699 webcite]
  • [8]Farrar M: Striped Smith-Waterman speeds database searches six times over other SIMD implementations. Bioinformatics (Oxford, England) 2007, 23(2):156-61. [ http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/2/156 webcite]
  • [9]Rognes Tr: Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation. BMC bioinformatics 2011., 12(221) [ http://www.biomedcentral.com/1471-2105/12/221 webcite]
  • [10]Wiguo L, Schmidt B, Voss G, Muller-Wittig W: Streaming Algorithms for Biological Sequence Alignment on GPUs. IEEE Trans Parallel Distributed Syst 2007, 18(9):1270-1281. [ http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4288126 webcite]
  • [11]Manavski SA, Valle G: CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC bioinformatics 2008, 9(Suppl 2):S10. [ http://www.biomedcentral.com/1471-2105/9/S2/S10 webcite]
  • [12]Liu Y, Maskell DL, Schmidt B: CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units. BMC Res Notes 2009, 2:73. [ http://www.biomedcentral.com/1756-0500/2/73 webcite]
  • [13]Liu Y, Schmidt B, Maskell DL: CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions. BMC Res Notes 2010, 3:93. [ http://www.biomedcentral.com/1756-0500/3/93 webcite]
  • [14]Alachiotis N, Berger SA, Stamatakis A: Accelerating Phylogeny-Aware Short DNA Read Alignment with FPGAs. In 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, Salt Lake City, USA. New York: IEEE; 2011:226-233. [ http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5771278 webcite]
  • [15]Alachiotis N, Stamatakis A: FPGA Acceleration of the Phylogenetic Parsimony Kernel? Int Conference Field Programmable Logic App 2011, 0:417-422.
  • [16]Fitch W, Margoliash E: Construction of phylogenetic trees. Science 1967, 155(3760):279-284.
  • [17]Sankoff D: Minimal mutation trees of sequences. SIAM J Appl Math 1975, 28:35-42.
  • [18]Karow J: Survey: Illumina, SOLiD, and 454 Gain Ground in Research Labs; Most Users Mull Additional Purchases. 2010. [ http: //www.genomeweb.com/sequencing/survey-illumina-solid-and-454-gain-ground-research-labs-most-users-mull-addition webcite]
  • [19]Mirarab S, Nguyen N, Warnow T: SEPP: SATe-Enabled Phylogenetic Placement. 2011. [ http://www.cs.utexas.edu/tandy/warnow-psb2012.pdf webcite]
  文献评价指标  
  下载次数:169次 浏览次数:77次