BMC Research Notes
Employing whole genome mapping for optimal de novo assembly of bacterial genomes
Surbhi Malhotra-Kumar2  Herman Goossens2  Henri de Greve3  Jean-Pierre Hernalsteens1  Moons Pieter2  Julia Sabirova2  Basil Britto Xavier2
[1] Viral Genetics Research Group, Vrije Universiteit Brussel, Brussels, Belgium; [2] Department of Medical Microbiology, Vaccine & Infectious Disease Institute, Universiteit Antwerpen, Antwerp, Belgium; [3] Structural Biology Brussels, Flanders Institute for Biotechnology (VIB), Vrije Universiteit Brussel, Brussels, Belgium
Keywords: SPAdes and Velvet; de Bruijn graph; Microbial genomes; k-mer; Whole genome mapping; De novo assembly
DOI: 10.1186/1756-0500-7-484
Received 2014-04-08, accepted 2014-07-21, published 2014
【 Abstract 】
Background
De novo genome assembly can be challenging due to inherent properties of the reads, even when using current state-of-the-art assembly tools based on de Bruijn graphs. Users are often not bioinformaticians and, taking a black-box approach, rely on assembly metrics such as contig length and N50 to generate whole genome sequences, which can result in mis-assemblies.
Findings
Utilising several assembly tools based on de Bruijn graphs like Velvet, SPAdes and IDBA, we demonstrate that at the optimal N50, mis-assemblies do occur, even when using the multi-k-mer approaches of SPAdes and IDBA. We demonstrate that whole genome mapping can be used to identify these mis-assemblies and can guide the selection of the best k-mer size which yields the highest N50 without mis-assemblies.
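The selection rule described here (highest N50 among assemblies without mis-assemblies) can be sketched as follows. This is only an illustration under stated assumptions: the `AssemblyResult` record, the `select_best_k` helper and all numbers are hypothetical, and the mis-assembly count is assumed to come from comparing contigs against the whole genome map in a separate step.

```python
from dataclasses import dataclass


@dataclass
class AssemblyResult:
    k: int              # k-mer size used for this assembly
    n50: int            # N50 of the resulting contigs, in bp
    misassemblies: int  # conflicts with the whole genome map (assumed input)


def select_best_k(results):
    """Return the result with the highest N50 among assemblies that
    show no mis-assemblies, or None if every assembly has at least one."""
    clean = [r for r in results if r.misassemblies == 0]
    if not clean:
        return None
    return max(clean, key=lambda r: r.n50)


# Hypothetical values, not data from the paper.
results = [
    AssemblyResult(k=31, n50=90_000, misassemblies=0),
    AssemblyResult(k=51, n50=150_000, misassemblies=0),
    AssemblyResult(k=71, n50=210_000, misassemblies=2),
]
best = select_best_k(results)
naive = max(results, key=lambda r: r.n50)
```

With these made-up numbers, picking on N50 alone chooses k = 71 despite its two mis-assemblies, while the map-guided rule chooses k = 51.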
Conclusions
We demonstrate the utility of whole genome mapping (WGM) as a tool to identify mis-assemblies and to guide k-mer selection and higher quality de novo genome assembly of bacterial genomes.
【 License 】
2014 Xavier et al.; licensee BioMed Central Ltd.
【 Figures 】
Figure 1.