期刊论文详细信息
Algorithms for Molecular Biology
On the combinatorics of sparsification
Fenix WD Huang1  Christian M Reidys1 
[1] Department of Mathematic and Computer science, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark
关键词: Dynamic programming;    Generating function;    Sparsification;   
Others  :  794889
DOI  :  10.1186/1748-7188-7-28
 received in 2011-12-31, accepted in 2012-10-11,  发布年份 2012
PDF
【 摘 要 】

Background

We study the sparsification of dynamic programming based on folding algorithms of RNA structures. Sparsification is a method that improves significantly the computation of minimum free energy (mfe) RNA structures.

Results

We provide a quantitative analysis of the sparsification of a particular decomposition rule, Λ. This rule splits an interval of RNA secondary and pseudoknot structures of fixed topological genus. Key for quantifying sparsifications is the size of the so called candidate sets. Here we assume mfe-structures to be specifically distributed (see Assumption 1) within arbitrary and irreducible RNA secondary and pseudoknot structures of fixed topological genus. We then present a combinatorial framework which allows by means of probabilities of irreducible sub-structures to obtain the expectation of the Λ-candidate set w.r.t. a uniformly random input sequence. We compute these expectations for arc-based energy models via energy-filtered generating functions (GF) in case of RNA secondary structures as well as RNA pseudoknot structures. Furthermore, for RNA secondary structures we also analyze a simplified loop-based energy model. Our combinatorial analysis is then compared to the expected number of Λ-candidates obtained from the folding mfe-structures. In case of the mfe-folding of RNA secondary structures with a simplified loop-based energy model our results imply that sparsification provides a significant, constant improvement of 91% (theory) to be compared to an 96% (experimental, simplified arc-based model) reduction. However, we do not observe a linear factor improvement. Finally, in case of the “full” loop-energy model we can report a reduction of 98% (experiment).

Conclusions

Sparsification was initially attributed a linear factor improvement. This conclusion was based on the so called polymer-zeta property, which stems from interpreting polymer chains as self-avoiding walks. Subsequent findings however reveal that the O(n) improvement is not correct. The combinatorial analysis presented here shows that, assuming a specific distribution (see Assumption 1), of mfe-structures within irreducible and arbitrary structures, the expected number of Λ-candidates is Θ(n2). However, the constant reduction is quite significant, being in the range of 96%. We furthermore show an analogous result for the sparsification of the Λ-decomposition rule for RNA pseudoknotted structures of genus one. Finally we observe that the effect of sparsification is sensitive to the employed energy model.

【 授权许可】

   
2012 Huang and Reidys; licensee BioMed Central Ltd.

【 预 览 】
附件列表
Files Size Format View
20140705074224614.pdf 994KB PDF download
Figure 9. 17KB Image download
Figure 9. 17KB Image download
Figure 9. 17KB Image download
Figure 8. 13KB Image download
Figure 7. 13KB Image download
Figure 7. 13KB Image download
Figure 7. 13KB Image download
Figure 6. 17KB Image download
Figure 6. 17KB Image download
Figure 6. 17KB Image download
Figure 5. 18KB Image download
Figure 4. 16KB Image download
Figure 3. 14KB Image download
Figure 3. 14KB Image download
Figure 3. 14KB Image download
Figure 2. 30KB Image download
Figure 3. 14KB Image download
Figure 2. 28KB Image download
Figure 1. 43KB Image download
【 图 表 】

Figure 1.

Figure 2.

Figure 3.

Figure 2.

Figure 3.

Figure 3.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 6.

Figure 6.

Figure 7.

Figure 7.

Figure 7.

Figure 8.

Figure 9.

Figure 9.

Figure 9.

【 参考文献 】
  • [1]Bailor MH, Sun X, Al-Hashimi HM: Topology Links RNA Secondary Structure with Global Conformation, Dynamics, and Adaptation. Science 2010, 327:202-206.
  • [2]Tabaska JE, Cary RB, Gabow HN, Stormo GD: An RNA folding method capable of identifying pseudoknots and base triples. Bioinformatics 1998, 14:691-699.
  • [3]Loebl M, Moffatt I: The chromatic polynomial of fatgraphs and its categorification. Adv. Math. 2008, 217:1558-1587.
  • [4]Penner RC, Knudsen M, Wiuf C, Andersen JE: Fatgraph models of proteins. Comm Pure Appl Math 2010, 63:1249-1297.
  • [5]Massey WS: Algebraic Topology: An Introduction. Springer-Veriag, New York; 1967.
  • [6]Penner RC, Waterman MS: Spaces of RNA secondary structures. Adv. Math 1993, 101:31-49.
  • [7]Penner RC: Cell decomposition and compactification of Riemann’s moduli space in decorated Teichmü, ller theory. In Woods Hole Mathematics-perspectives in math and physics. Edited by Tongring N, Penner RC. Singapore; World Scientific 2004:263-301. [ArXiv: math. GT/0306190]
  • [8]Mathews D, Sabina J, Zuker M, Turner D: Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol 1999, 288:911-940.
  • [9]Reidys CM, Huang FWD, Andersen JE, Penner RC, Stadler PF, Nebel ME: Topology and prediction of RNA pseudoknots. Bioinformatics 2011, 27:1076-1085.
  • [10]Bon M, Vernizzi G, Orland H, Zee A: Topological Classification of RNA Structures. J Mol Biol 2008, 379:900-911.
  • [11]Andersen JE, Penner RC, Reidys CM, Waterman MS: Topological classification and enumeration of RNA structrues by genus. J. Math. Biol 2011. [Prepreint]
  • [12]Smith T, Waterman M: RNA secondary structure. Math. Biol 1978, 42:31-49.
  • [13]Zuker M: On finding all suboptimal foldings of an RNA molecule. Science 1989, 244:48-52.
  • [14]Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P: Fast Folding and Comparison of RNA Secondary Structures. Monatsh Chem 1994, 125:167-188.
  • [15]Rivas E, Eddy SR: A Dynamic Programming Algorithm for RNA Structure Prediction Including Pseudoknots. J Mol Biol 1999, 285:2053-2068.
  • [16]Uemura Y A Hasegawa, Kobayashi S, Yokomori T: Tree adjoining grammars for RNA structure prediction. Theor Comp Sci 1999, 210:277-303.
  • [17]Akutsu T: Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discr Appl Math 2000, 104:45-62.
  • [18]Lyngsø RB, Pedersen CN: RNA pseudoknot prediction in energy-based models. J Comp Biol 2000, 7:409-427.
  • [19]Cai L, Malmberg RL, Wu Y: Stochastic modeling of RNA pseudoknotted structures: a grammatical approach. Bioinformatics 2003, 19 S1:i66-i73.
  • [20]Dirks RM, Pierce NA: A partition function algorithm for nucleic acid secondary structure including pseudoknots. J Comput Chem 2003, 24:1664-1677.
  • [21]Deogun JS, Donis R, Komina O, Ma F: RNA secondary structure prediction with simple pseudoknots. Proceedings of the second conference on Asia-Pacific bioinformatics (APBC 2004), Australian Computer Society 2004, 239-246.
  • [22]Reeder J, Giegerich R: Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinformatics 2004, 5:104. BioMed Central Full Text
  • [23]Li H, Zhu D: A New Pseudoknots Folding Algorithm for RNA Structure Prediction. In COCOON 2005, Volume 3595. Edited by Wang L. Springer, Berlin; 2005:94-103.
  • [24]Matsui H, Sato K, Sakakibara Y: Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures. Bioinformatics 2005, 21:2611-2617.
  • [25]Kato Y, Seki H, Kasami T: RNA Pseudoknotted Structure Prediction Using Stochastic Multiple Context-Free Grammar. IPSJ Digital Courier 2006, 2:655-664.
  • [26]Chen HL, Condon A, Jabbari H: An O(n5) Algorithm for MFE Prediction of Kissing Hairpins and 4-Chains in Nucleic Acids. J Comp Biol 2009, 16:803-815.
  • [27]Waterman MS: Secondary structure of single-stranded nucleic acids. Adv Math (Suppl Studies) 1978, 1:167-212.
  • [28]Orland H, Zee A: RNA folding and large N matrix theory. Nuclear Physics B 2002, 620:456-476.
  • [29]Wexler Y, Zilberstein C, Ziv-Ukelson M: A study of accessible motifs and RNA complexity. J Comput Biol 2007, 14(6):856-872.
  • [30]Salari R, Möhl M, Will S, Sahinalp C, Backofen R: Time and space efficient RNA-RNA interaction prediction via sparse folding. Proc of RECOMB 2010, 6044:473-490.
  • [31]Backofen R, Tsur D, Zakov S, Ziv-Ukelson M: Sparse RNA folding: Time and space efficient algorithms. J Disc Algor 2011, 9(1):12-31.
  • [32]Möhl M, Salari R, Will S, Backofen R, Sahinalp SC: Sparsification of RNA structure prediction including pseudoknots. Algorithms Mol Biol 2010, 5:39. BioMed Central Full Text
  • [33]McCaskill JS: The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 1990, 29:1105-1119.
  • [34]Dimitrieva S, Bucher P: Practicality and time complexity of a sparsified RNA folding algorithm. J Bioinfo Comput Biol 2012, 10(2):1241007.
  • [35]Kafri Y, Mukamel D, Peliti L: Why is the DNA Denaturation Transition First Order? Phys Rev Lett 2000, 85:4988-4991.
  • [36]Kabakcioglu A, Stella AL: A scale-free network hidden in the collapsing polymer. Phys Rev E 2005, 72:055102.
  • [37]Vanderzande C: Lattic models of polymers. Cambridge University Press, New York; 1998.
  • [38]NCBI database [ http://www.ncbi.nlm.nih.gov/guide/dna-rna/#downloads_ webcite]
  • [39]Nussinov R, Piecznik G, Griggs JR, Kleitman DJ: Algorithms for Loop Matching. SIAM J Appl Math 1978, 35:68-82.
  • [40]Nebel ME: Investigation of the Bernoulli model for RNA secondary structures. Bull math biol 2003, 66(5):925-964.
  • [41]Zagier D: On the distribution of the number of cycles of elements in symmetric groups. Nieuw Arch Wisk IV 1995, 13:489-495.
  • [42]Flajolet P, Sedgewick R: Analytic Combinatorics. Cambridge University Press, New York; 2009.
  • [43]Han HSW, Reidys CM: The 5’-3’ distance of RNA secondary structures. J Comput Biol 2012, 19(7):867-878.
  文献评价指标  
  下载次数:231次 浏览次数:8次