Journal Article Details
BMC Bioinformatics
PlasForest: a homology-based random forest classifier for plasmid detection in genomic datasets
Stéphanie Bedhomme1  Léa Pradier1  Tazzio Tissot2  Anna-Sophie Fiston-Lavier3 
[1]Centre d’Ecologie Fonctionnelle et Evolutive, CNRS, Université de Montpellier, Université Paul Valéry Montpellier 3, Ecole Pratique des Hautes Etudes, Institut de Recherche Pour le Développement, 34000, Montpellier, France
[2]Genomics, Bioinformatics and Evolution. Departament de Genètica i Microbiologia, Universitat Autònoma de Barcelona, 08193, Cerdanyola del Vallès, Spain
[3]Centre de Recerca Matemàtica, 08193, Cerdanyola del Vallès, Spain
[4]Institut des Sciences de l’Evolution de Montpellier (ISE-M), Equipe Evolution, Vecteurs, Adaptation et Symbiose, UMR 5554, CNRS-Université Montpellier, 34090, Montpellier Cedex 05, France
Keywords: Plasmid identification; Homology; Random forest classifier; Genomic datasets
DOI: 10.1186/s12859-021-04270-w
Source: Springer
【 Abstract 】
Background: Plasmids are mobile genetic elements that often carry accessory genes and act as vectors for horizontal transfer between bacterial genomes. Detecting plasmids in large genomic datasets is crucial for analyzing their spread and quantifying their role in bacterial adaptation, particularly in the propagation of antibiotic resistance. Bioinformatics methods have been developed to detect plasmids. However, they suffer from low sensitivity (i.e., most plasmids remain undetected) or low precision (i.e., chromosomes are identified as plasmids), and overall they are not suited to identifying plasmids in whole genomes that are not fully assembled (contigs and scaffolds).
Results: We developed PlasForest, a homology-based random forest classifier that identifies bacterial plasmid sequences in partially assembled genomes. Without knowing the taxonomic origin of the samples, PlasForest identifies contigs as plasmids or chromosomes with an F1 score of 0.950. Notably, it can detect 77.4% of plasmid contigs below 1 kb with 2.8% false positives, and 99.9% of plasmid contigs over 50 kb with 2.2% false positives.
Conclusions: PlasForest outperforms other currently available tools on genomic datasets by being both sensitive and precise. The performance of PlasForest on metagenomic assemblies is currently well below that of k-mer-based methods, and we discuss how homology-based approaches could improve plasmid detection in such datasets.
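To make the evaluation terms used in the abstract concrete, the following is a minimal sketch of how sensitivity, precision, and the F1 score are conventionally computed from contig-level classification counts. The function names and the example counts are illustrative assumptions; they are not taken from PlasForest or from the paper's results.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of contigs called 'plasmid' that really are plasmids."""
    called = tp + fp
    return tp / called if called else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true plasmid contigs that were detected (recall)."""
    actual = tp + fn
    return tp / actual if actual else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity."""
    p = precision(tp, fp)
    r = sensitivity(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, for illustration only:
# tp = plasmid contigs called plasmid
# fp = chromosome contigs called plasmid
# fn = plasmid contigs called chromosome
print(f1_score(tp=90, fp=10, fn=10))
```

A tool with low sensitivity has a large `fn`, while a tool with low precision has a large `fp`; the F1 score drops in either case, which is why it is a common single-number summary when both matter.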
【 License 】

CC BY   
