| PeerJ | |
| StrainSeeker: fast identification of bacterial strains from raw sequencing reads using user-provided guide trees | |
| article | |
| Märt Roosaare [1], Mihkel Vaher [1], Lauris Kaplinski [1], Märt Möls [1], Reidar Andreson [1], Maarja Lepamets [1], Triinu Kõressaar [1], Paul Naaber [3], Siiri Kõljalg [4], Maido Remm [1] | |
| [1] Department of Bioinformatics, University of Tartu;Institute of Mathematical Statistics, University of Tartu;Synlab Eesti;Department of Microbiology, Institute of Biomedicine and Translational Medicine, University of Tartu;United Laboratories, Tartu University Clinics | |
| Keywords: k-mer; Clade; Strain identification; Species identification; Diagnostics | |
| DOI : 10.7717/peerj.3353 | |
| Subject classification: Social sciences, humanities and arts (general) | |
| Source: Inra | |
【 Abstract 】
Background: Fast, accurate and high-throughput identification of bacterial isolates is in great demand. The present work investigates whether isolates can be identified from unassembled next-generation sequencing reads using custom-made guide trees.

Results: A tool named StrainSeeker was developed that constructs a list of specific k-mers for each node of any given Newick-format tree and enables the identification of bacterial isolates in 1–2 min. It uses a novel algorithm that analyses the observed and expected fractions of node-specific k-mers to test for the presence of each node in the sample. This allows StrainSeeker to determine where the isolate branches off the guide tree and to assign it to a clade, whereas other tools assign each read to a reference genome. Using a dataset of 100 Escherichia coli isolates, we demonstrate that StrainSeeker predicts the clade of E. coli isolates with 92% accuracy and assigns the correct tree branch with 98% accuracy. Twenty-five thousand Illumina HiSeq reads are sufficient for identification of the strain.

Conclusion: StrainSeeker is a software program that identifies bacterial isolates by assigning them to nodes or leaves of a custom-made guide tree. StrainSeeker’s web interface and pre-computed guide trees are available at http://bioinfo.ut.ee/strainseeker. Source code is stored at GitHub: https://github.com/bioinfo-ut/StrainSeeker.
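The idea of node-specific k-mers and their observed fraction can be illustrated with a minimal Python sketch. This is not StrainSeeker's implementation: the toy genomes, the function names, and the definition of "node-specific" used here (k-mers shared by every genome under a node and absent from every genome outside it) are assumptions made for illustration only, and the statistical comparison against the expected fraction is not reproduced.

```python
from typing import Dict, Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def node_specific_kmers(leaves: Set[str], genomes: Dict[str, str], k: int) -> Set[str]:
    """Assumed definition: k-mers present in every genome under the node
    and in no genome outside it."""
    inside = [kmers(genomes[name], k) for name in leaves]
    shared = set.intersection(*inside)
    for name, seq in genomes.items():
        if name not in leaves:
            shared -= kmers(seq, k)
    return shared


def observed_fraction(node_kmers: Set[str], reads: Iterable[str], k: int) -> float:
    """Fraction of the node's k-mers that occur at least once in the reads."""
    if not node_kmers:
        return 0.0
    seen: Set[str] = set()
    for read in reads:
        seen |= kmers(read, k) & node_kmers
    return len(seen) / len(node_kmers)


# Hypothetical toy data: a guide tree ((A,B),C) with three short "genomes".
genomes = {
    "A": "ACGTACGGA",
    "B": "ACGTACCTT",
    "C": "TTGGCCAAT",
}
k = 4

# k-mers specific to the internal node that groups A and B.
node_ab = node_specific_kmers({"A", "B"}, genomes, k)

# A single toy read covering two of the three node-specific k-mers.
fraction = observed_fraction(node_ab, ["ACGTA"], k)
print(sorted(node_ab), fraction)
```

In a real analysis the observed fraction for each node would be compared with the fraction expected at the given sequencing depth, and the tree would be traversed from the root towards the leaves; both steps are left out of this sketch.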
【 License 】
CC BY