Frontiers in Microbiology
Combination of whole genome sequencing and supervised machine learning provides unambiguous identification of eae-positive Shiga toxin-producing Escherichia coli
Microbiology
Fabien Vorimore [1], Mai-Lan Tran [2], Patrick Fach [2], Sabine Delannoy [2], Sandra Jaudou [2], Hugues Richard [3]
[1] ANSES, Laboratory for Food Safety, Genomics Platform IdentyPath, Maisons-Alfort, France; [2] ANSES, Laboratory for Food Safety, COLiPATH Unit, Maisons-Alfort, France; [3] Bioinformatics Unit, Genome Competence Center (MF1), Robert Koch Institute, Berlin, Germany
Keywords: machine learning; Escherichia coli; food safety; metagenomics; raw milk
DOI: 10.3389/fmicb.2023.1118158
Received 2022-12-07, accepted 2023-04-21, published 2023
Source: Frontiers
【 Abstract 】
Introduction: The objective of this study was to develop, using a genome-wide machine learning approach, an unambiguous model to predict the presence of highly pathogenic STEC in E. coli read assemblies derived from complex samples potentially containing multiple E. coli strains. Our approach takes into account the high genomic plasticity of E. coli and uses the stratification of STEC and E. coli pathogroups, classified by serotype and virulence factors, to identify specific combinations of biomarkers for improved characterization of eae-positive STEC (also named EHEC, for enterohemorrhagic E. coli), which are associated with bloody diarrhea and hemolytic uremic syndrome (HUS) in humans.

Methods: A machine learning (ML) approach was applied to a large curated dataset composed of 1,493 E. coli genome sequences and 1,178 coding sequences (CDS). Feature selection was performed using eight classification algorithms, reducing the number of CDS to six. From this reduced dataset, the eight ML models were trained with hyper-parameter tuning and cross-validation steps.

Results and discussion: Remarkably, using only these six genes, EHEC can be clearly identified in E. coli read assemblies obtained from in silico mixtures and from complex samples such as milk metagenomes. These various combinations of discriminative biomarkers can be implemented as novel marker genes for the unambiguous characterization of EHEC in mixtures of different E. coli strains as well as in raw milk metagenomes.
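The workflow summarized in the Methods (select a small set of discriminative CDS, then evaluate a classifier by cross-validation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic presence/absence values, the ranking score is a simple difference in presence frequency, and a single nearest-centroid classifier stands in for the eight algorithms used in the study. Hyper-parameter tuning is not shown, and all function names are hypothetical.

```python
import random

N_INFORMATIVE = 6  # toy stand-in for the six CDS retained in the study


def make_toy_dataset(n_genomes=200, n_cds=50, seed=0):
    """Build a synthetic CDS presence/absence matrix (NOT the study's data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_genomes):
        label = rng.randint(0, 1)  # 1 = EHEC, 0 = other E. coli (toy labels)
        row = []
        for j in range(n_cds):
            if j < N_INFORMATIVE:
                p = 0.9 if label == 1 else 0.1  # informative CDS
            else:
                p = 0.5  # uninformative CDS
            row.append(1 if rng.random() < p else 0)
        X.append(row)
        y.append(label)
    return X, y


def rank_features(X, y):
    """Rank CDS by |P(present | EHEC) - P(present | other)|, best first."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        freq_pos = sum(row[j] for row in pos) / len(pos)
        freq_neg = sum(row[j] for row in neg) / len(neg)
        scores.append((abs(freq_pos - freq_neg), j))
    scores.sort(reverse=True)
    return [j for _, j in scores]


def fit_centroids(X, y, features):
    """Mean presence profile of each class over the selected CDS."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroids[label]))
    return min(centroids, key=sq_dist)


def cross_validate(X, y, k_features=N_INFORMATIVE, n_folds=5):
    """k-fold cross-validation; feature selection is redone inside each fold."""
    accuracies = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        X_train = [X[i] for i in range(len(X)) if i not in test_idx]
        y_train = [y[i] for i in range(len(X)) if i not in test_idx]
        features = rank_features(X_train, y_train)[:k_features]
        centroids = fit_centroids(X_train, y_train, features)
        hits = sum(predict(centroids, X[i], features) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies


if __name__ == "__main__":
    X, y = make_toy_dataset()
    print("selected CDS indices:", sorted(rank_features(X, y)[:N_INFORMATIVE]))
    print("per-fold accuracy:", cross_validate(X, y))
```

Repeating the feature selection inside each fold keeps the held-out genomes from influencing which CDS are chosen, which would otherwise inflate the accuracy estimate.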
【 License 】
Unknown
Copyright © 2023 Vorimore, Jaudou, Tran, Richard, Fach and Delannoy.