Algorithms for Molecular Biology | |
Faster algorithms for RNA-folding using the Four-Russians method | |
Balaji Venkatachalam1  Dan Gusfield1  Yelena Frid1  | |
[1] Department of Computer Science, University of California, Davis, 1 Shields Ave, Davis, CA, USA | |
关键词: GPU; Parallel algorithm; Algorithms; CUDA; Four-Russians; RNA-folding; | |
Others : 793016 DOI : 10.1186/1748-7188-9-5 |
|
received in 2013-12-04, accepted in 2014-02-18, 发布年份 2014 | |
【 摘 要 】
Background
The secondary structure that maximizes the number of non-crossing matchings between complimentary bases of an RNA sequence of length n can be computed in O(n3) time using Nussinov’s dynamic programming algorithm. The Four-Russians method is a technique that reduces the running time for certain dynamic programming algorithms by a multiplicative factor after a preprocessing step where solutions to all smaller subproblems of a fixed size are exhaustively enumerated and solved. Frid and Gusfield designed an View MathML"> algorithm for RNA folding using the Four-Russians technique. In their algorithm the preprocessing is interleaved with the algorithm computation.
Theoretical results
We simplify the algorithm and the analysis by doing the preprocessing once prior to the algorithm computation. We call this the two-vector method. We also show variants where instead of exhaustive preprocessing, we only solve the subproblems encountered in the main algorithm once and memoize the results. We give a simple proof of correctness and explore the practical advantages over the earlier method.
The Nussinov algorithm admits an O(n2) time parallel algorithm. We show a parallel algorithm using the two-vector idea that improves the time bound to View MathML">.
Practical results
We have implemented the parallel algorithm on graphics processing units using the CUDA platform. We discuss the organization of the data structures to exploit coalesced memory access for fast running times. The ideas to organize the data structures also help in improving the running time of the serial algorithms. For sequences of length up to 6000 bases the parallel algorithm takes only about 2.5 seconds and the two-vector serial method takes about 57 seconds on a desktop and 15 seconds on a server. Among the serial algorithms, the two-vector and memoized versions are faster than the Frid-Gusfield algorithm by a factor of 3, and are faster than Nussinov by up to a factor of 20. The source-code for the algorithms is available at http://github.com/ijalabv/FourRussiansRNAFolding webcite.
【 授权许可】
2014 Venkatachalam et al.; licensee BioMed Central Ltd.
【 预 览 】
Files | Size | Format | View |
---|---|---|---|
20140705042524873.pdf | 724KB | download | |
Figure 3. | 47KB | Image | download |
Figure 2. | 47KB | Image | download |
Figure 1. | 40KB | Image | download |
【 图 表 】
Figure 1.
Figure 2.
Figure 3.
【 参考文献 】
- [1]Akutsu T: Approximation and exact algorithms for RNA secondary structure prediction and recognition of stochastic context-free languages. J Comb Optim 1999, 3(2–3):321-336.
- [2]Zakov S, Tsur D, Ziv-Ukelson M: Reducing the worst case running times of a family of RNA and CFG problems, using Valiant’s approach. WABI 2010, 65-77.
- [3]Arlazarov V, Dinic E, Kronrod M, Faradzev I: On economical construction of the transitive closure of a directed graph (in Russian). Dokl Akad Nauk 1970, 194(11):1209-1210.
- [4]Frid Y, Gusfield D: A simple, practical and complete O(n3)-time algorithm for RNA folding using the Four-Russians Speedup. Algorithms Mol Biol 2010, 5:13. BioMed Central Full Text
- [5]Chang DJ, Kimmer C, Ouyang M: Accelerating the Nussinov RNA folding algorithm with CUDA/GPU. In ISSPIT. IEEE; 2010:120-125.
- [6]Rizk G, Lavenier D: GPU accelerated RNA folding algorithm. In International Conference on, Computational Science, vol. 5544. Edited by Gabrielle A, Nabrzyski J, Seidel E, Albada GD, Dongarra J, Sloot PMA. Berlin Heidelberg: Springer; 2009:1004-1013.
- [7]Stojanovski M, Gjorgjevikj D, Madjarov G: Parallelization of dynamic programming in Nussinov RNA folding algorithm on the CUDA GPU. In ICT Innovations, vol. 150. Edited by Kocarev L. Berlin Heidelberg: Springer; 2011:279-289.
- [8]Nussinov R, Pieczenik G, Griggs JR, Kleitman DJ: Algorithms for loop matchings. SIAM J Appl Math 1978, 35:68-82.
- [9]Nussinov R, Jacobson AB: Fast algorithm for predicting the secondary structure of single-stranded RNA. PNAS 1980, 77(11):6309-6313.
- [10]Zuker M, Stiegler P: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res 1981, 9:133-148.
- [11]Tinoco I, Borer PN, Dengler B, Levine MD, Uhlenbeck OC, Crothers DM, Gralla J: Improved estimation of secondary structure in ribonucleic-acids. Nat-New Biol 1973, 246(150):40-41.
- [12]Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M Turner: Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. PNAS 2004, 101(19):7287-7292.
- [13]Markham NR, Zuker M: UNAFold. Bioinformatics 2008, 453:3-31.
- [14]Zuker M: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 2003, 31(13):3406-3415.
- [15]Hofacker IL: Vienna RNA secondary structure server. Nucleic Acids Res 2003, 31(13):3429-3431.
- [16]Reuter J, Mathews D: RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 2010, 11:129. BioMed Central Full Text
- [17]Durbin R, Eddy SR, Krogh A, Mitchison G: Biological Sequence Analysis. Probabilistic Models of Proteins and Nucleic Acids:Cambridge University Press; 1998
- [18]Dowell R, Eddy S: Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics 2004, 5:71. BioMed Central Full Text
- [19]Lu ZJ, Gloor JW, Mathews DH: Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA 2009, 15(10):1805-1813.
- [20]Knudsen B, Hein J: Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res 2003, 31(13):3423-3428.
- [21]Do CB, Woods DA, Batzoglou S: CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 2006, 22(14):e90-e98.
- [22]Ding Y, Lawrence CE: A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res 2003, 31(24):7280-7301.
- [23]Hamada M, Kiryu H, Sato K, Mituyama T, Asai K: Prediction of RNA secondary structure using generalized centroid estimators. Bioinformatics 2009, 25(4):465-473.
- [24]Andronescu M, Condon A, Hoos HH, Mathews DH, Murphy KP: Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics 2007, 23(13):i19-i28.
- [25]Andronescu M, Condon A, Hoos HH, Mathews DH, Murphy KP: Computational approaches for RNA energy parameter estimation. RNA 2010, 16(12):2304-2318.
- [26]Zakov S, Goldberg Y, Elhadad M, Ziv-Ukelson M: Rich parameterization improves RNA, structure prediction. In Research in Computational Molecular Biology (RECOMB), Volume 6577, Lecture Notes in, Computer Science. Edited by Bafna V, Sahinalp SC. Springer; 2011:546-562.
- [27]Wexler Y, Zilberstein CBZ, Ziv-Ukelson M: A study of accessible motifs and RNA folding complexity. J Comput Biol 2007, 14(6):856-872.
- [28]Backofen R, Tsur D, Zakov S, Ziv-Ukelson M: Sparse RNA folding: time and space efficient algorithms. J Discrete Algorithms 2011, 9:12-31.
- [29]Venkatachalam B, Frid Y, Gusfield D: Faster algorithms for RNA-folding using the Four-Russians method. UC Davis Technical report 2013
- [30]Frid Y, Gusfield D: A worst-case and practical speedup for the RNA co-folding problem using the Four-Russians idea. WABI 2010, 1-12.
- [31]Frid Y, Gusfield D: Speedup of RNA pseudoknotted secondary structure recurrence computation with the Four-Russians method. In COCOA, vol. 7402 Edited by Lin G. 2012, 176-187.