BMC Genomics
A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes
Martin CJ Maiden [2], Julian Parkhill [1], Keith A Jolley [2], Craig Corton [1], Holly B Bratcher [2]
[1] The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
[2] Department of Zoology, University of Oxford, Oxford, UK
Keywords: Bacterial population genomics; rST; rMLST; cgMLST; Gene-by-gene analysis; BIGSdb; de novo assembly; Neisseria meningitidis
DOI: 10.1186/1471-2164-15-1138
Received 2014-10-02; accepted 2014-12-04; published 2014
【 Abstract 】
Background
Highly parallel, ‘second generation’ sequencing technologies have rapidly expanded the number of bacterial whole genome sequences available for study, permitting the emergence of the discipline of population genomics. Most of these data are publicly available as unassembled short-read sequence files that require extensive processing before they can be used for analysis. It is therefore necessary to provide the data in a uniform format that can be easily assessed for quality, linked to provenance and phenotype, and used for analysis.
Results
The performance of de novo short-read assembly followed by automatic annotation using the pubMLST.org Neisseria database was evaluated for 108 diverse, representative, and well-characterised Neisseria meningitidis isolates. High-quality sequences were obtained for >99% of known meningococcal genes among the de novo assembled genomes and four resequenced genomes, and fewer than 1% of reassembled genes had sequence discrepancies or misassembled sequences. A core genome of 1600 loci, present in at least 95% of the population, was determined using the Genome Comparator tool. Genealogical relationships compatible with, but at a higher resolution than, those identified by multilocus sequence typing were obtained with core genome comparisons and ribosomal protein gene analysis, which revealed a genomic structure for a number of previously described phenotypes. This unified system for cataloguing Neisseria genetic variation in the genome was implemented and used for multiple analyses, and the data are publicly available in the PubMLST Neisseria database.
Conclusions
De novo assembly, combined with automated gene-by-gene annotation, generates high-quality draft genomes in which the majority of protein-encoding genes are present with high accuracy. The approach catalogues diversity efficiently, permits analyses of a single genome or comparisons of multiple genomes, and is a practical way of interpreting WGS data for large bacterial population samples. The method generates novel insights into the biology of the meningococcus and improves our understanding of the whole population structure, not just disease-causing lineages.
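The core genome criterion mentioned in the Results (loci present in at least 95% of the population) can be illustrated with a minimal sketch. This is not the Genome Comparator implementation; the function name `core_loci`, the input layout (a mapping from isolate name to the set of loci found in that genome), and the toy locus and isolate names are all assumptions made for illustration.

```python
def core_loci(presence, threshold=0.95):
    """Return the sorted loci present in at least `threshold` of isolates.

    `presence` is a hypothetical input: a dict mapping each isolate name
    to the set of locus identifiers detected in its assembled genome.
    """
    n_isolates = len(presence)
    if n_isolates == 0:
        return []
    # Count how many isolates carry each locus.
    counts = {}
    for loci in presence.values():
        for locus in loci:
            counts[locus] = counts.get(locus, 0) + 1
    # Keep loci whose presence fraction meets the threshold.
    return sorted(
        locus for locus, count in counts.items()
        if count / n_isolates >= threshold
    )


# Toy data (not from the paper): 20 isolates, three loci.
toy = {f"isolate_{i}": {"locus_A"} for i in range(20)}
for i in range(19):
    toy[f"isolate_{i}"].add("locus_B")  # present in 19/20 = 95%
for i in range(18):
    toy[f"isolate_{i}"].add("locus_C")  # present in 18/20 = 90%

print(core_loci(toy))  # ['locus_A', 'locus_B']
```

With a 95% threshold, a locus missing from a small number of draft assemblies can still count as core, which makes the definition tolerant of occasional assembly gaps.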
【 License 】
2014 Bratcher et al.; licensee BioMed Central.
【 Figures 】
Figures 1 to 4.