Journal article details
GigaScience
LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads
Inanç Birol [1], Steven J. M. Jones [1], Albert Lagman [1], Bahar Behsaz [1], Benjamin P. Vandervalk [1], Chen Yang [1], René L. Warren [1]
[1] BC Cancer Agency, Michael Smith Genome Sciences Centre, Vancouver V5Z 4S6, British Columbia, Canada
Keywords: LINKS; Next-generation sequencing; Genome assembly; Scaffolding; Nanopore sequencing
Others  :  1222606
DOI  :  10.1186/s13742-015-0076-3
Received 2015-05-28, accepted 2015-07-29, published 2015
【 Abstract 】

Background

Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problematic regions. In this regard, established and emerging long read technologies show great promise, but their current associated higher error rates typically require computational base correction and/or additional bioinformatics pre-processing before they can be of value.

Results

We present LINKS, the Long Interval Nucleotide K-mer Scaffolder algorithm, a method that uses the sequence properties of nanopore sequence data and other error-containing sequence data to scaffold high-quality genome assemblies without the need for read alignment or base correction. Here, we show how the contiguity of an ABySS Escherichia coli K-12 genome assembly can be increased more than five-fold by the use of beta-released Oxford Nanopore Technologies Ltd. long reads, and how LINKS leverages long-range information in Saccharomyces cerevisiae W303 nanopore reads to yield assemblies whose contiguity and correctness are on par with or better than those of competing applications. We also present the re-scaffolding of the colossal white spruce (Picea glauca) draft assembly (PG29, 20 Gbp) and demonstrate how LINKS scales to larger genomes.
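
One way to picture alignment-free scaffolding with long-interval k-mers is the toy Python sketch below. It is an illustration under stated assumptions, not the LINKS implementation: the function names and the parameters `k`, `distance`, and `step` are hypothetical, reverse complements are ignored, and a k-mer is only used if it occurs exactly once across the contigs. The idea shown is that two k-mers taken a fixed distance apart on one long read can suggest a link between two contigs when each k-mer matches a different contig exactly, so no read alignment or base correction is involved.

```python
from collections import defaultdict


def kmer_pairs(read, k, distance, step):
    """Yield (first, second) k-mers whose start positions are `distance` apart."""
    last_start = len(read) - distance - k
    for i in range(0, last_start + 1, step):
        yield read[i:i + k], read[i + distance:i + distance + k]


def index_contigs(contigs, k):
    """Map every k-mer that occurs exactly once in the contigs to its contig name."""
    first_seen = {}
    repeated = set()
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_seen:
                repeated.add(kmer)  # ambiguous k-mer, excluded below
            else:
                first_seen[kmer] = name
    return {kmer: name for kmer, name in first_seen.items() if kmer not in repeated}


def count_links(contigs, long_reads, k, distance, step):
    """Count k-mer pairs whose two halves match two different contigs exactly."""
    index = index_contigs(contigs, k)
    links = defaultdict(int)
    for read in long_reads:
        for first, second in kmer_pairs(read, k, distance, step):
            if first in index and second in index:
                a, b = index[first], index[second]
                if a != b:
                    links[(a, b)] += 1
    return dict(links)


# Toy data (assumed for illustration): one error-free read spanning two contigs.
contigs = {"c1": "AACCGTTGCA", "c2": "TCTAGGATCG"}
read = contigs["c1"] + "AT" + contigs["c2"]
links = count_links(contigs, [read], k=4, distance=12, step=1)
```

In this sketch a sequencing error inside a k-mer simply makes that pair fail the exact lookup, so an error-containing read contributes fewer pairs rather than wrong ones; contig pairs supported by many k-mer pairs would be the candidates for joining.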

Conclusions

This study highlights the present utility of nanopore reads for genome scaffolding in spite of their current limitations, which are expected to diminish as the nanopore sequencing technology advances. We expect LINKS to have broad utility in harnessing the potential of long reads in connecting high-quality sequences of small and large genome assembly drafts.

【 License 】

2015 Warren et al.
