BMC Genomics
Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes
Research Article
Frank M Aarestrup [1], Rolf S Kaas [1], Carsten Friis [1], David W Ussery [2]
[1] DTU Food, The Technical University of Denmark, Kgs Lyngby, Denmark; [2] Department of Systems Biology, Center for Biological Sequence Analysis, The Technical University of Denmark, Kgs Lyngby, Denmark
Keywords: Escherichia coli; Core-genome; Pan-genome; Phylogeny; Whole genome sequencing; Genetic variation; Comparative genomics; MLST typing; Phylotyping
DOI: 10.1186/1471-2164-13-577
Received 2012-09-10, accepted 2012-10-22, published 2012
Source: Springer
【 Abstract 】
Background: Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful for creating better phylogenies, for determining molecular clocks, and for improving typing techniques.

Results: We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment, and a pan-genome tree, based on gene presence/absence, map the relatedness of the 186 sequenced E. coli genomes. The core-gene tree displays high confidence, divides the E. coli strains into the observed MLST type clades, and also separates defined phylotypes.

Conclusion: The results of comparing a large and diverse E. coli dataset support the theory that reliable, well-resolved phylogenies can be inferred from the core-genome. The results further suggest that the resolution at the isolate level may subsequently be improved by targeting more variable genes. The use of whole genome sequencing will make it possible to eliminate, or at least reduce, the need for several typing steps used in traditional epidemiology.
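The distinction drawn in the abstract between the strict core (clusters present in 100% of genomes), the 'soft core' (present in at least 95%), and the pan-genome (every cluster seen in any genome) can be illustrated with a minimal sketch. It assumes gene clusters have already been computed and are available as a presence/absence mapping. The function name `classify_clusters`, the parameter `soft_threshold`, and the toy data are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Dict, Set, Tuple


def classify_clusters(
    presence: Dict[str, Set[str]],
    n_genomes: int,
    soft_threshold: float = 0.95,
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene clusters into strict core, soft core, and pan-genome.

    presence maps a gene cluster ID to the set of genome IDs that contain it
    (hypothetical input format, assumed for illustration).
    """
    strict_core: Set[str] = set()
    soft_core: Set[str] = set()
    pan: Set[str] = set()
    for cluster, genomes in presence.items():
        if not genomes:
            continue  # a cluster found in no genome is not part of the pan-genome
        pan.add(cluster)
        if len(genomes) / n_genomes >= soft_threshold:
            soft_core.add(cluster)  # present in at least 95% of genomes by default
        if len(genomes) == n_genomes:
            strict_core.add(cluster)  # present in every genome
    return strict_core, soft_core, pan


# Toy data: 20 genomes, three clusters with different prevalence.
genome_ids = [f"g{i}" for i in range(20)]
toy_presence = {
    "clusterA": set(genome_ids),       # 20 of 20 genomes
    "clusterB": set(genome_ids[:19]),  # 19 of 20 genomes (95%)
    "clusterC": set(genome_ids[:3]),   # 3 of 20 genomes
}
strict, soft, pan = classify_clusters(toy_presence, n_genomes=20)
print(len(strict), len(soft), len(pan))
```

By construction the strict core is a subset of the soft core, which is a subset of the pan-genome. This matches the ordering of the counts reported in the abstract (1,702, then 3,051, then 16,373).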
【 License 】
CC BY
© Kaas et al.; licensee BioMed Central Ltd. 2012