期刊论文

【摘要】

Background

Random biological sequences are a topic of great interest in genome analysis since, according to a powerful paradigm, they represent the background noise from which the actual biological information must differentiate. Accordingly, the generation of random sequences has been investigated for a long time. Similarly, random object of a more complicated structure like RNA molecules or proteins are of interest.

Results

In this article, we present a new general framework for deriving algorithms for the non-uniform random generation of combinatorial objects according to the encoding and probability distribution implied by a stochastic context-free grammar. Briefly, the framework extends on the well-known recursive method for (uniform) random generation and uses the popular framework of admissible specifications of combinatorial classes, introducing weighted combinatorial classes to allow for the non-uniform generation by means of unranking. This framework is used to derive an algorithm for the generation of RNA secondary structures of a given fixed size. We address the random generation of these structures according to a realistic distribution obtained from real-life data by using a very detailed context-free grammar (that models the class of RNA secondary structures by distinguishing between all known motifs in RNA structure). Compared to well-known sampling approaches used in several structure prediction tools (such as SFold) ours has two major advantages: Firstly, after a preprocessing step in time $<a onClick=$ View MathML"> for the computation of all weighted class sizes needed, with our approach a set of m random secondary structures of a given structure size n can be computed in worst-case time complexity $<a onClick=$ View MathML"> while other algorithms typically have a runtime in $<a onClick=$ View MathML">. Secondly, our approach works with integer arithmetic only which is faster and saves us from all the discomforting details of using floating point arithmetic with logarithmized probabilities.

Conclusion

A number of experimental results shows that our random generation method produces realistic output, at least with respect to the appearance of the different structural motifs. The algorithm is available as a webservice at http://wwwagak.cs.uni-kl.de/NonUniRandGen webcite and can be used for generating random secondary structures of any specified RNA type. A link to download an implementation of our method (in Wolfram Mathematica) can be found there, too.

【授权许可】

2011 Nebel et al; licensee BioMed Central Ltd.

【预览】

附件列表
Files	Size	Format	View
20140705084225362.pdf	1425KB	PDF	download
Figure 7.	18KB	Image	download
Figure 7.	18KB	Image	download
Figure 7.	18KB	Image	download
Figure 7.	18KB	Image	download
Figure 7.	18KB	Image	download
Figure 6.	17KB	Image	download
Figure 6.	17KB	Image	download
Figure 6.	17KB	Image	download
Figure 6.	17KB	Image	download
Figure 6.	17KB	Image	download
Figure 5.	51KB	Image	download
Figure 5.	51KB	Image	download
Figure 5.	51KB	Image	download
Figure 5.	51KB	Image	download
Figure 5.	51KB	Image	download
Figure 4.	48KB	Image	download
Figure 4.	48KB	Image	download
Figure 4.	48KB	Image	download
Figure 4.	48KB	Image	download
Figure 4.	48KB	Image	download
Figure 3.	21KB	Image	download
Figure 2.	24KB	Image	download
Figure 1.	39KB	Image	download

【图表】

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 4.

Figure 4.

Figure 4.

Figure 4.

Figure 5.

Figure 5.

Figure 5.

Figure 5.

Figure 5.

Figure 6.

Figure 6.

Figure 6.

Figure 6.

Figure 6.

Figure 7.

Figure 7.

Figure 7.

Figure 7.

Figure 7.

【参考文献】

[1]Flajolet P, Fusy E, Pivoteau C: Boltzmann Sampling of Unlabelled Structures. In Proceedings of ANALCO'07 (Analytic Combinatorics and Algorithms) Conference. SIAM Press; 2007:201-211.
[2]Fitch WM: Random sequences. Journal of Molecular Biology 1983, 163:171-176.
[3]Altschul SF, Erickson BW: Significance of nucleotide sequence alignments: a method for random sequence permutation that preserves dinucleotide and codon usage. Mol Biol Evol 1985, 2(6):256-538.
[4]Denise A, Ponty Y, Termier M: Random Generation of structured genomic sequences. Proceedings of RECOMB 2003 2003, 3. (poster)
[5]Waterman MS: Secondary Structure of Single-Stranded Nucleic Acids. Advances in Mathematics Supplementary Studies 1978, 1:167-212.
[6]Ding Y, Lawrence CE: A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Research 2003, 31(24):7280-7301.
[7]Ponty Y: Efficient sampling of RNA secondary structures from the Boltzmann ensemble of low-energy: the boustrophedon method. Journal of Mathematical Biology 2008, 56:107-127.
[8]Allali J, d'Aubenton Carafa Y, Chauve C, Denise A, Drevet C, Ferraro P, Gautheret D, Herrbach C, Leclerc F, de Monte A, Ouangraoua A, Sagot MF, Saule C, Termier M, Thermes C, Touzet H: Benchmarking RNA secondary structures comparison algorithms. Actes des Journées Ouvertes de Biologie, Informatique et Mathématiques - JOBIM'08 2008, 67-68.
[9]Wuchty S, Fontana W, Hofacker I, Schuster P: Complete Suboptimal Folding of RNA and the Stability of Secondary Structures. Biopolymers 1999, 49:145-165.
[10]Zuker M: On Finding All Suboptimal Foldings of an RNA Molecule. Science 1989, 244:48-52.
[11]Zuker M: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 2003, 31(13):3406-3415.
[12]Hofacker IL: The Vienna RNA secondary structure server. Nucleic Acids Research 2003, 31(13):3429-3431.
[13]Dowell RD, Eddy SR: Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics 2004, 5:71. BioMed Central Full Text
[14]Knudsen B, Hein J: RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 1999, 15(6):446-454.
[15]Knudsen B, Hein J: Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Research 2003, 31(13):3423-3428.
[16]Pedersen J, Meyer I, Forsberg R, Simmonds P, Hein J: A comparative method for finding and folding RNA secondary structures in protein-coding regions. Nucleic Acids Reserach 2004, 32:4925-4936.
[17]Pedersen JS, Forsberg R, Meyer IM, Hein J: An Evolutionary Model for Protein-Coding Regions with Conserved RNA Structure. Molecular Biology and Evolution 2004, 21:1913-1922.
[18]Wiebe NJP, Meyer IM: ¡sc¿Transat¡/sc¿A Method for Detecting the Conserved Helices of Functional RNA Structures, Including Transient, Pseudo-Knotted and Alternative Structures. PLoS Comput Biol 2010, 6(6):e1000823.
[19]Gesell T, von Haeseler A: In silico sequence evolution with site-specific interactions along phylogenetic trees. Bioinformatics 2006, 22:716-722.
[20]Weinberg F, Nebel ME: Non Uniform Generation of Combinatorial Objects. Tech. rep., Technische Universität Kaiserslautern; 2010.
[21]Nebel ME, Scheid A: Analysis of the Free Energy in a Stochastic RNA Secondary Structure Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010.
[22]Xia T, SantaLucia J Jr, Burkard ME, Kierzek R, Schroeder SJ, Jiao X, Cox C, Turner DH: Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 1998, 37:14719-14735.
[23]Mathews DH, Sabina J, Zuker M, Turner DH: Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secondary Structure. J Mol Biol 1999, 288:911-940.
[24]Nijenhuis A, Wilf HS: Combinatorial Algorithms. 2nd edition. 1978. Academic Press
[25]Flajolet P, Zimmermann P, Van Cutsem B: A Calculus for the Random Generation of Combinatorial Structures. Theoretical Computer Science 1994, 132(2):1-35.
[26]Duchon P, Flajolet P, Louchard G, Schaeffer G: Boltzmann Samplers for the Random Generation of Combinatorial Structures. Combinatorics, Probability, and Computing, Volume 13 2004, 577-625. [Special issue on Analysis of Algorithms]
[27]Flajolet P, Sedgewick R: Analytic Combinatorics. Cambridge University Press; 2009.
[28]Harrison MA: Introduction to Formal Language Theory. Addison-Wesley; 1978.
[29]Stein PR, Waterman MS: On some new sequences generalizing the Catalan and Motzkin numbers. Discrete Mathematics 1978, 26:216-272.
[30]Viennot G, Chaumont MVD: Enumeration of RNA Secondary Structures by Complexity. Mathematics in medicine and biology, Lecture Notes in Biomathematics 1985, 57:360-365.
[31]Nebel ME: Combinatorial Properties of RNA Secondary Structures. Journal of Computational Biology 2002, 9(3):541-574.
[32]Hofacker IL, Schuster P, Stadler PF: Combinatorics of RNA secondary structures. Discrete Applied Mathematics 1998, 88:207-237.
[33]Nebel ME: Investigation of the Bernoulli-Model of RNA Secondary Structures. Bulletin of Mathematical Biology 2004, 66:925-964.
[34]Zuker M, Sankoff D: RNA Secondary Structures and their Prediction. Bull Mathematical Biology 1984, 46:591-621.
[35]Nebel ME: On a statistical filter for RNA secondary structures. Tech. rep., Frankfurter Informatik-Berichte; 2002.
[36]Nebel ME: Identifying Good Predictions of RNA Secondary Structure. Proceedings of the Pacific Symposium on Biocomputing 2004, 423-434.
[37]Molinero X: Ordered Generation of Classes of Combinatorial Structures. PhD thesis. Universitat Politècnica de Catalunya; 2005.
[38]Fu KS, Huang T: Stochastic Grammars and Languages. International Journal of Computer and Information Sciences 1972, 1(2):135-170.
[39]Huang T, Fu KS: On Stochastic Context-Free Languages. Information Sciences 1971, 3:201-224.
[40]Sakakibara Y, Brown M, Hughey R, Mian IS, Sjölander K, Underwood RC, Haussler D: Stochastic context-free grammars for tRNA modeling. Nucleic Acids Research 1994, 22:5112-5120.
[41]Liebehenschel J: Ranking and unranking of lexicographically ordered words: an average-case analysis. J Autom Lang Comb 1998, 2(4):227-268.
[42]Weinberg F, Nebel ME: Extending Stochastic Context-Free Grammars for an Application in Bioin-formatics. 4th International Conference on Language and Automata Theory and Applications (LATA2010) 2010.
[43]Nawrocki EP, Eddy SR: Query-Dependent Banding (QDB) for Faster RNA Similarity Searches. PLoS Comput Biol 2007, 3:e56.
[44]Martínez C, Molinero X: A generic approach for the unranking of labeled combinatorial classes. Random Struct. Algorithms 2001, 19(3-4):472-497.
[45]Wuyts J, Rijk PD, de Peer YV, Winkelmans T, Wachter RD: The European Large Subunit Ribosomal RNA Database. Nucleic Acids Research 2001, 29:175-177.
[46]Wuyts J, de Peer YV, Winkelmans T, Wachter RD: The European Database on Small Subunit Ribosomal RNA. Nucleic Acids Research 2002, 30:183-185.
[47]Salomaa A, Soittola M: Automata-theoretic aspects of formal power series. Springer; 1978.
[48]Mann H, Whitney D: On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics 1947, 18:50-60.
[49]Wilcoxon F: Individual Comparisons by Ranking Methods. Biometrics Bulletin 1945, 1:80-83.

Algorithms for Molecular Biology
Random generation of RNA secondary structures according to native distributions

Markus E Nebel¹ Anika Scheid¹ Frank Weinberg¹
[1] Department of Computer Science, University of Kaiserslautern, Germany
关键词: RNA secondary structures; stochastic context-free grammars; Random generation;
Others : 795286 DOI : 10.1186/1748-7188-6-24

received in 2011-04-20, accepted in 2011-10-12, 发布年份 2011
PDF


	文献评价指标
	下载次数：313次	浏览次数：7次

【 摘 要 】

Background

Results

Conclusion

【 授权许可】

【 预 览 】

【 图 表 】

【 参考文献 】

【摘要】

【授权许可】

【预览】

【图表】

【参考文献】