BMC Bioinformatics
Sealer: a scalable gap-closing application for finishing draft genomes
Software
Daniel Paulino [1], Benjamin P. Vandervalk [1], Anthony Raymond [1], René L. Warren [1], Shaun D. Jackman [1], Inanç Birol [2]
[1] Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, V5Z 4S6, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, V6H 3N1, Vancouver, BC, Canada
Keywords: Gap closing; Genome finishing; Sealer; Next-generation sequencing; Bloom filters
DOI: 10.1186/s12859-015-0663-4
Received 2015-02-24, accepted 2015-07-07, published 2015
Source: Springer
【 Abstract 】
Background: While next-generation sequencing technologies have made sequencing genomes faster and more affordable, deciphering the complete genome sequence of an organism remains a significant bioinformatics challenge, especially for large genomes. Low sequence coverage, repetitive elements and short read length make de novo genome assembly difficult, often resulting in sequence and/or fragment “gaps”: uncharacterized nucleotide (N) stretches of unknown or estimated lengths. Some of these gaps can be closed by re-processing latent information in the raw reads. Although several gap-closing tools exist, they do not easily scale to genomes of a billion base pairs or more.

Results: Here we describe Sealer, a tool designed to close gaps within assembly scaffolds by navigating de Bruijn graphs represented by space-efficient Bloom filter data structures. We demonstrate that it scales to close 50.8% and 13.8% of gaps in human (3 Gbp) and white spruce (20 Gbp) draft assemblies in under 30 and 27 h, respectively, a feat that is not possible with other leading tools given the breadth of data used in our study.

Conclusion: Sealer is an automated finishing application that uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, including those of very large genomes. We expect Sealer to have broad utility for finishing genomes across the tree of life, from bacterial genomes to large plant genomes and beyond. Sealer is available for download at https://github.com/bcgsc/abyss/tree/sealer-release.
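The idea summarized in the abstract, walking a de Bruijn graph whose k-mers are stored in a Bloom filter in order to bridge the two flanks of a gap, can be illustrated with a small Python sketch. This is not Sealer's implementation: the names `BloomFilter`, `build_filter` and `close_gap`, the single fixed k, the breadth-first search, the filter size and the toy sequence are all assumptions made for illustration, and reverse complements are ignored for brevity.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative, not Sealer's)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_filter(reads, k):
    """Insert every k-mer of every read into a Bloom filter."""
    bloom = BloomFilter()
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])
    return bloom


def close_gap(bloom, left_flank, right_flank, k, max_gap):
    """Search for a k-mer path from the left flank to the right flank.

    Returns the bases filling the gap, or None if no path is found
    within max_gap bases. Assumes the flanks do not overlap.
    """
    start, goal = left_flank[-k:], right_flank[:k]
    parent = {start: None}  # also serves as the visited set
    queue = deque([(start, 0)])
    while queue:
        kmer, steps = queue.popleft()
        if kmer == goal and steps >= k:
            bases = []
            node = kmer
            while parent[node] is not None:
                bases.append(node[-1])
                node = parent[node]
            path = start + "".join(reversed(bases))
            return path[k:len(path) - k]
        if steps >= max_gap + k:
            continue
        # Graph edges are implicit: a successor exists if the filter has it.
        for base in "ACGT":
            nxt = kmer[1:] + base
            if nxt in bloom and nxt not in parent:
                parent[nxt] = kmer
                queue.append((nxt, steps + 1))
    return None


# Toy example: an 8-base gap in a 29-base sequence, closed from reads.
genome = "ACGTTGCATGCCGATAGCTTACGGATCCA"
reads = [genome[i:i + 12] for i in range(0, len(genome) - 11, 3)]
bloom = build_filter(reads, k=5)
print(close_gap(bloom, genome[:10], genome[18:], k=5, max_gap=20))
```

Because the graph's edges are never stored and are instead recovered by querying the filter for the four possible successor k-mers, memory use is governed by the size of the bit array rather than by the number of edges. The price is occasional false-positive k-mers, which a practical tool has to tolerate or filter out.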
【 License 】
© Paulino et al. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.