BMC Genomics
IPred - integrating ab initio and evidence based gene predictions to improve prediction accuracy
Bernhard Y Renard [1], Franziska Zickmann [1]
[1] Research Group Bioinformatics (NG4), Robert Koch-Institute, Berlin, Germany
Keywords: RNA-Seq integration; Genome annotation; Gene finder combination; Gene prediction
Others: 1135437
DOI: 10.1186/s12864-015-1315-9
Received 2014-09-02, accepted 2015-02-03, published 2015
【 Abstract 】
Background
Gene prediction is a challenging but crucial part of most genome analysis pipelines. Various methods have evolved that predict genes either ab initio on reference sequences or evidence based, with the help of additional information such as RNA-Seq reads or EST libraries. However, none of these strategies is bias-free, and one method alone does not necessarily provide a complete set of accurate predictions.
Results
We present IPred (Integrative gene Prediction), a method that integrates ab initio and evidence based gene identifications to combine the advantages of different prediction strategies. IPred builds on the output of gene finders and generates a new combined set of gene identifications that represents the integrated evidence of the single method predictions.
Conclusion
We evaluate IPred in simulations and real data experiments on Escherichia coli and human data. We show that IPred improves the prediction accuracy in comparison to single method predictions and to existing methods for prediction combination.
【 License 】
2015 Zickmann and Renard; licensee BioMed Central.