BMC Genomics
KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes
Sebastien Gagneux [1], Sonia Borrell [1], Mireia Coscolla [1], David Stucki [1], Andreas Steiner [1]
[1] University of Basel, Basel, Switzerland
Keywords: Mycobacterium tuberculosis; In-silico SNP-typing; Single nucleotide polymorphisms; FastQ; Whole genome sequencing
DOI: 10.1186/1471-2164-15-881
Received 2014-02-06, accepted 2014-10-03, published 2014
【 Abstract 】
Background
High-throughput DNA sequencing produces vast amounts of data: millions of short reads that usually have to be either mapped to a reference genome or assembled de novo. Both reference-based mapping and de novo assembly are computationally intensive, generate large intermediary data files, and require bioinformatics skills that are often lacking in the laboratories producing the data. Moreover, many research and practical applications in microbiology require only a small fraction of the whole genome data.
Results
We developed KvarQ, a new tool that directly scans fastq files of bacterial genome sequences for known variants, such as single nucleotide polymorphisms (SNPs), bypassing the need for mapping all sequencing reads to a reference genome or for de novo assembly. Instead, KvarQ loads “testsuites” that define specific SNPs or short regions of interest in a reference genome, and directly synthesizes the relevant results based on the occurrence of these markers in the fastq files. KvarQ has a versatile command line interface and a graphical user interface. KvarQ currently ships with two “testsuites” for Mycobacterium tuberculosis, but new “testsuites” for other organisms can easily be created and distributed. In this article, we demonstrate how KvarQ can be used to successfully detect all main drug resistance mutations and phylogenetic markers in 880 bacterial whole genome sequences. The average scanning time per genome sequence was two minutes. The variant calls of a subset of these genomes were validated with a standard bioinformatics pipeline and showed >99% congruency.
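The general idea of scanning reads directly for known markers, without mapping or assembly, can be illustrated with a minimal Python sketch. It is not KvarQ's implementation, testsuite format, or API: the marker layout (left flank, reference base, alternative base, right flank), the names `scan`, `read_fastq`, `DEMO_MARKERS`, and `DEMO_FASTQ`, and the exact-substring matching are all assumptions made for illustration. Base qualities are ignored, and sequencing errors are not tolerated.

```python
from collections import Counter
from io import StringIO

# Translation table used to complement unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fastq(handle):
    """Yield the sequence line of each four-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality string (ignored in this sketch)
        yield seq


def scan(handle, markers):
    """Count reads supporting the reference or alternative allele.

    `markers` maps a hypothetical marker name to a tuple
    (left_flank, ref_base, alt_base, right_flank). Each read is
    checked on both strands by exact substring search.
    """
    counts = {name: Counter() for name in markers}
    for seq in read_fastq(handle):
        for strand in (seq, reverse_complement(seq)):
            for name, (left, ref, alt, right) in markers.items():
                if left + ref + right in strand:
                    counts[name]["ref"] += 1
                if left + alt + right in strand:
                    counts[name]["alt"] += 1
    return counts


# Hypothetical marker and toy reads, invented for this example only.
DEMO_MARKERS = {"demo_snp": ("ACGTAC", "G", "T", "TTGACA")}
DEMO_FASTQ = (
    "@read1\nGGACGTACTTTGACACC\n+\nIIIIIIIIIIIIIIIII\n"
    "@read2\nAATGTCAACGTACGTAA\n+\nIIIIIIIIIIIIIIIII\n"
    "@read3\nGGACGTACTTTGACAGG\n+\nIIIIIIIIIIIIIIIII\n"
)

if __name__ == "__main__":
    print(scan(StringIO(DEMO_FASTQ), DEMO_MARKERS))
```

Because only the short sequences around each marker are searched, the work per read is independent of genome size, which is the intuition behind skipping a full alignment step. A practical tool would additionally need to handle base qualities, mismatches, and minimum coverage thresholds.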
Conclusion
KvarQ is a user-friendly tool that directly extracts relevant information from fastq files. This enables researchers and laboratory technicians with limited bioinformatics expertise to scan and analyze raw sequencing data in a matter of minutes. KvarQ is open-source, and pre-compiled packages with a graphical user interface are available at http://www.swisstph.ch/kvarq.
【 License 】
2014 Steiner et al.; licensee BioMed Central Ltd.