Journal article details
BMC Bioinformatics
Frnakenstein: multiple target inverse RNA folding
Rune B Lyngsø [4], James WJ Anderson [4], Elena Sizikova [3], Amarendra Badugu [2], Tomas Hyland [1], Jotun Hein [4]
[1] Mathematics Institute, University of Oxford, Oxford OX1 3LB, UK
[2] ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
[3] Department of Computer Science, University of Oxford, Oxford OX1 3QD, UK
[4] Department of Statistics, University of Oxford, Oxford OX1 3TG, UK
Keywords: Riboswitch; Genetic algorithm; Inverse folding; RNA
DOI  :  10.1186/1471-2105-13-260
Received 2012-07-07, accepted 2012-09-23, published 2012
【 Abstract 】

Background

RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three-dimensional conformation. The inverse problem of designing a sequence that folds into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem has, from an algorithmic viewpoint, an elegant and efficient solution, the inverse RNA folding problem appears to be hard.

Results

In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development were to address a hitherto mostly ignored extension of the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods on several data sets totalling 769 real and predicted single-structure targets, and on 292 two-structure targets. It performed as well as or better than all existing methods at finding sequences that fold in silico into the target structure, without the heavy bias towards CG base pairs that was observed for all other top-performing methods. On the two-structure targets it also performed well, generating a perfect design for about 80% of the targets.
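
To make the general idea concrete, the following is a minimal sketch of a genetic algorithm for single-target inverse folding. It is not Frnakenstein's algorithm: the function names (`pair_set`, `toy_fold`, `design`), the population size, the mutation and crossover operators, and the selection scheme are all assumptions made for illustration. In particular, `toy_fold` is a Nussinov-style base-pair maximisation used as a stand-in for the energy-based folding a real tool would call.

```python
import random

# Canonical and wobble pairs allowed by the stand-in folding model (assumption).
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def pair_set(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs


def toy_fold(seq, min_loop=3):
    """Stand-in folding: maximise the number of base pairs (Nussinov-style)."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: k pairs with j
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


def design(target, pop_size=30, generations=60, seed=0):
    """Evolve a sequence whose toy fold is close to `target` (length >= 2)."""
    rng = random.Random(seed)
    goal = pair_set(target)

    def cost(seq):
        # Base-pair distance between the predicted and the target structure.
        return len(pair_set(toy_fold(seq)) ^ goal)

    def mutate(seq):
        pos = rng.randrange(len(seq))
        return seq[:pos] + rng.choice("ACGU") + seq[pos + 1:]

    def crossover(a, b):
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:]

    population = ["".join(rng.choice("ACGU") for _ in target)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        if cost(population[0]) == 0:
            break
        parents = population[: pop_size // 2]  # keep the better half
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    winner = min(population, key=cost)
    return winner, cost(winner)
```

Calling `design("((((....))))")` returns a candidate sequence together with its base-pair distance to the target under the toy model; a distance of 0 corresponds to what the abstract calls a perfect design. A multi-target variant would, under the same assumptions, sum such a cost over several target structures.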

Conclusions

Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions. The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two-structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein.
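
One way to quantify the base pair and unpaired base distributions mentioned above is to tabulate them for a designed sequence against its target structure. The sketch below is an illustration only; the function name `composition` and the choice to report fractions are assumptions, not the evaluation procedure used in the paper.

```python
from collections import Counter


def composition(seq, structure):
    """Return (pair_fractions, unpaired_fractions) for a design.

    `structure` is a dot-bracket string of the same length as `seq`.
    Pair types are reported order-independently, e.g. "CG" covers C-G and G-C.
    """
    stack, pairs, unpaired = [], Counter(), Counter()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            pairs["".join(sorted(seq[i] + seq[j]))] += 1
        else:
            unpaired[seq[j]] += 1

    def normalise(counts):
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()} if total else {}

    return normalise(pairs), normalise(unpaired)
```

For example, `composition("GGAUAAAAAUCC", "((((....))))")` reports half CG and half AU pairs, whereas a design such as `"GGGGAAAACCCC"` for the same structure consists entirely of CG pairs, the kind of bias the abstract refers to.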

【 License 】

   
2012 Lyngsø et al.; licensee BioMed Central Ltd.
