期刊论文详细信息
BMC Bioinformatics
ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs
Christina Otto2  Mathias Möhl2  Steffen Heyne4  Mika Amit6  Gad M Landau3  Rolf Backofen5  Sebastian Will1 
[1] Bioinformatics, Department of Computer Science, University of Leipzig, Leipzig, Germany
[2] Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany
[3] Department of Computer Science and Engineering, NYU-Poly, Brooklyn, NY, USA
[4] Max Planck Institute of Immunobiology and Epigenetics, Stuebeweg 51, Freiburg 79108, Germany
[5] Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C DK-1870, Denmark
[6] Department of Computer Science, University of Haifa, Mount Carmel, Haifa, Israel
关键词: Sparsification;    Structure-based comparison of RNA;    RNA bioinformatics;   
Others  :  1089048
DOI  :  10.1186/s12859-014-0404-0
 received in 2014-08-04, accepted in 2014-12-01,  发布年份 2014
PDF
【 摘 要 】

Background

Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existent approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predicting them computationally is no rescue, since single sequence structure prediction is highly unreliable.

Results

The novel algorithm ExpaRNA-P computes exactly matching sequence-structure motifs in entire Boltzmann-distributed structure ensembles of two RNAs; thereby we match and fold RNAs simultaneously, analogous to the well-known “simultaneous alignment and folding” of RNAs. While this implies much higher flexibility compared to ExpaRNA, ExpaRNA-P has the same very low complexity (quadratic in time and space), which is enabled by its novel structure ensemble-based sparsification. Furthermore, we devise a generalized chaining algorithm to compute compatible subsets of ExpaRNA-P’s sequence-structure motifs. Resulting in the very fast RNA alignment approach ExpLoc-P, we utilize the best chain as anchor constraints for the sequence-structure alignment tool LocARNA. ExpLoc-P is benchmarked in several variants and versus state-of-the-art approaches. In particular, we formally introduce and evaluate strict and relaxed variants of the problem; the latter makes the approach sensitive to compensatory mutations. Across a benchmark set of typical non-coding RNAs, ExpLoc-P has similar accuracy to LocARNA but is four times faster (in both variants), while it achieves a speed-up over 30-fold for the longest benchmark sequences (≈400nt). Finally, different ExpLoc-P variants enable tailoring of the method to specific application scenarios. ExpaRNA-P and ExpLoc-P are distributed as part of the LocARNA package. The source code is freely available at http://www.bioinf.uni-freiburg.de/Software/ExpaRNA-P webcite.

Conclusions

ExpaRNA-P’s novel ensemble-based sparsification reduces its complexity to quadratic time and space. Thereby, ExpaRNA-P significantly speeds up sequence-structure alignment while maintaining the alignment quality. Different ExpaRNA-P variants support a wide range of applications.

【 授权许可】

   
2014 Otto et al.; licensee BioMed Central.

【 预 览 】
附件列表
Files Size Format View
20150124011612794.pdf 1879KB PDF download
Figure 7. 44KB Image download
Figure 6. 41KB Image download
Figure 5. 151KB Image download
Figure 4. 58KB Image download
Figure 3. 39KB Image download
Figure 2. 35KB Image download
Figure 1. 75KB Image download
【 图 表 】

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

【 参考文献 】
  • [1]The FANTOM Consortium: The transcriptional landscape of the mammalian genome Science 2005, 309(5740):1559-63.
  • [2]Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, Sementchenko V, Piccolboni A, Bekiranov S, Bailey DK, Ganesh M, Ghosh S, Bell I, Gerhard DS, Gingeras TR: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 2005, 308:1149-1154.
  • [3]Bertone P, Stoc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S, Gerstein M, Snyder M: Global identification of human transcribed sequences with genome tiling arrays. Science 2004, 306:2242-2246.
  • [4]The ENCODE Project Consortium: An integrated encyclopedia of DNA elements in the human genome Nature 2012, 489(7414):57-74.
  • [5]Mattick JS, Taft RJ, Faulkner GJ: A global view of genomic information - moving beyond the gene and the master regulator. Trends Genet 2010, 26(1):21-8.
  • [6]Bompfünewerer Consortium AF, Backofen R, Bernhart SH, Flamm C, Fried C, Fritzsch G, Hackermüller J, Hertel J, Hofacker IL, Missal K, Mosig A, Prohaska SJ, Rose D, Stadler PF, Tanzer A, Washietl S: Will S: RNAs everywhere: genome-wide annotation of structured RNAs. J Exp Zoolog B Mol Dev Evol 2007, 308(1):1-25.
  • [7]Smith MA, Gesell T, Stadler PF, Mattick JS: Widespread purifying selection on RNA structure in mammals. Nucleic Acids Res 2013, 41(17):8220-36. doi:10.1093/nar/gkt596
  • [8]Rivas E, Eddy SR: Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics 2001, 2(1):8. BioMed Central Full Text
  • [9]Coventry A, Kleitman DJ, Berger B: MSARI: multiple sequence alignments for statistical detection of RNA secondary structure. Proc Natl Acad Sci USA 2004, 101(33):12102-7.
  • [10]Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D: Identification and Classification of Conserved RNA Secondary Structures in the Human Genome. PLoS Comput Biol 2006, 2(4):33.
  • [11]Washietl S, Hofacker IL: Identifying structural noncoding RNAs using RNAz. Curr Protoc Bioinformatics 2007, 19:12.7.1-12.7.18.
  • [12]Will S, Yu M, Berger B: Structure-based whole-genome realignment reveals many novel noncoding RNAs. Genome Res 2013, 23(6):1018-1027.
  • [13]Will S, Reiche K, Hofacker IL, Stadler PF, Backofen R: Inferring non-coding RNA families and classes by means of genome-scale structure-based clustering. PLOS Comput Biol 2007, 3(4):65.
  • [14]Kaczkowski B, Torarinsson E, Reiche K, Havgaard JH, Stadler PF, Gorodkin J: Structural profiles of human miRNA families from pairwise clustering. Bioinformatics 2009, 25(3):291-4.
  • [15]Parker BJ, Moltke I, Roth A, Washietl S, Wen J, Kellis M, Breaker R, Pedersen JS: New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes. Genome Res 2011, 21(11):1929-43. doi:10.1101/gr.112516.110
  • [16]Burge SW, Daub J, Eberhardt R, Tate J, Barquist L, Nawrocki EP, Eddy SR, Gardner PP, Bateman A: Rfam 11.0: 10 years of RNA families. Nucleic Acids Res 2013, 41(Database issue):226-32.
  • [17]Sankoff D: Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J Appl Math 1985, 45(5):810-825.
  • [18]Gorodkin J, Heyer L, Stormo G: Finding the most significant common sequence and structure motifs in a set of RNA sequences. Nucleic Acids Res 1997, 25(18):3724-32.
  • [19]Mathews DH, Turner DH: Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J Mol Biol 2002, 317(2):191-203.
  • [20]Holmes I: Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics 2005, 6:73. doi:10.1186/1471-2105-6-73 BioMed Central Full Text
  • [21]Havgaard JH, Lyngso RB, Stormo GD, Gorodkin J: Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics 2005, 21(9):1815-24.
  • [22]Dowell RD, Eddy SR: Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics 2006, 7:400. BioMed Central Full Text
  • [23]Bradley RK, Pachter L, Holmes I: Specific alignment of structured RNA: stochastic grammars and sequence annealing. Bioinformatics 2008, 24(23):2677-83.
  • [24]Harmanci AO, Sharma G, Mathews DH: Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign. BMC Bioinformatics 2007, 8:130. doi:10.1186/1471-2105-8-130 BioMed Central Full Text
  • [25]Hofacker IL, Bernhart SH, Stadler PF: Alignment of RNA base pairing probability matrices. Bioinformatics 2004, 20(14):2222-7.
  • [26]McCaskill JS: The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 1990, 29(6-7):1105-19.
  • [27]Torarinsson E, Havgaard JH, Gorodkin J: Multiple structural alignment and clustering of RNA sequences. Bioinformatics 2007, 23(8):926-32.
  • [28]Bauer M, Klau GW, Reinert K: Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization. BMC Bioinformatics 2007, 8:271. BioMed Central Full Text
  • [29]Do CB, Foo C-S, Batzoglou S: A max-margin model for efficient simultaneous alignment and folding of RNA sequences. Bioinformatics 2008, 24(13):68-76.
  • [30]Heyne S, Will S, Beckstette M, Backofen R: Lightweight comparison of RNAs based on exact sequence-structure matches. Bioinformatics 2009, 25(16):2095-2102.
  • [31]Backofen R, Siebert S: Fast detection of common sequence structure patterns in RNAs. J Discrete Algorithms 2007, 5(2):212-228.
  • [32]Höchsmann M, Töller T, Giegerich R, Kurtz S: Local similarity in RNA secondary structures. In Proceedings of Computational Systems Bioinformatics (CSB 2003). Volume 2 . IEEE Computer Society, Washington; 2003:159-168.
  • [33]Siebert S, Backofen R: MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons. Bioinformatics 2005, 21(16):3352-9.
  • [34]Amit M, Backofen R, Heyne S, Landau G. M, Möhl M, Otto C, Will S: Local exact pattern matching for non-fixed RNA structures. IEEE/ACM Trans Comput Biol Bioinformatics 2014, 11:1-12.
  • [35]Schmiedl C, Möhl M, Heyne S, Amit M, Landau G. M, Will S, Backofen R: Exact pattern matching for RNA structure ensembles. In Proceedings of the 16th International Conference on Research in Computational Molecular Biology (RECOMB 2012). LNCS, Volume 7262 . Springer, Berlin Heidelberg; 2012:245-260.
  • [36]Wexler Y, Zilberstein C, Ziv-Ukelson M: A study of accessible motifs and RNA folding complexity. J Comput Biol 2007, 14(6):856-72.
  • [37]Ziv-Ukelson M, Gat-Viks I, Wexler Y, Shamir R: A faster algorithm for RNA co-folding. In WABI 2008. Lecture Notes in Computer Science. Volume 5251 . Edited by Crandall KA, Lagergren J. Springer, Berlin Heidelberg; 2008:174-185.
  • [38]Backofen R, Tsur D, Zakov S, Ziv-Ukelson M: Sparse RNA folding: Time and space efficient algorithms. In Proc. 20th Symp. Combinatorial Pattern Matching. LNCS, Volume 5577 . Edited by Kucherov G, Ukkonen E. Springer, Berlin Heidelberg; 2009:249-262.
  • [39]Time and space efficient RNA-RNA interaction prediction via sparse folding . Springer, Berlin Heidelberg; 2010.
  • [40]Will S, Joshi T, Hofacker IL, Stadler PF, Backofen R: LocARNA-P: Accurate boundary prediction and improved detection of structural RNAs. RNA 2012, 18(5):900-14.
  • [41]Backofen R, Will S: Local sequence-structure motifs in RNA. J Bioinformatics Comput Biol (JBCB) 2004, 2(4):681-698.
  • [42]Otto W, Will S, Backofen R: Structure local multiple alignment of RNA. In Proceedings of German Conference on Bioinformatics (GCB’2008). Lecture Notes in Informatics (LNI), Volume P-136 . Bonn, Gesellschaft für Informatik (GI); 2008:178-188.
  • [43]Bompfünewerer AF, Backofen R, Bernhart SH, Hertel J, Hofacker IL, Stadler PF, Will S: Variations on RNA folding and alignment: lessons from Benasque. J Math Biol 2008, 56(1-2):129-144.
  • [44]Wilm A, Mainz I, Steger G: An enhanced RNA alignment benchmark for sequence alignment programs. Algorithms Mol Biol 2006, 1:19. BioMed Central Full Text
  • [45]Gardner PP, Wilm A, Washietl S: A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res 2005, 33(8):2433-9.
  • [46]Cleveland WS: Lowess: A program for smoothing scatterplots by robust locally weighted regression. Am Stat 1981, 35:(54).
  文献评价指标  
  下载次数:116次 浏览次数:14次