Journal Article Details
Frontiers in Microbiology
plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph
Microbiology
Janik Sielemann [1], Tomáš Vinař [2], Broňa Brejová [3], Cedric Chauve [4], Katharina Sielemann [5]
[1] Computational Biology, Faculty of Biology, Center for Biotechnology & Graduate School Digital Infrastructures for the Life Sciences (DILS), Bielefeld Institute for Bioinformatics Infrastructure, Bielefeld University, Bielefeld, Germany
[2] Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia
[3] Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia
[4] Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
[5] Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology & Graduate School Digital Infrastructures for the Life Sciences (DILS), Bielefeld Institute for Bioinformatics Infrastructure, Bielefeld University, Bielefeld, Germany
Keywords: bioinformatics; machine learning (ML); classification; plasmids; assembly graph
DOI: 10.3389/fmicb.2023.1267695
Received 2023-07-26, accepted 2023-09-08, published 2023
Source: Frontiers
【 Abstract 】

Identification of plasmids from sequencing data is an important and challenging problem related to the spread of antimicrobial resistance and other One-Health issues. We provide a new architecture for identifying plasmid contigs in fragmented genome assemblies built from short-read data. We employ graph neural networks (GNNs) and the assembly graph to propagate information from nearby nodes, which leads to more accurate classification, especially for short contigs that are difficult to classify based on sequence features or database searches alone. We trained plASgraph2 on a data set of samples from the ESKAPEE group of pathogens. plASgraph2 either outperforms or performs on par with a wide range of state-of-the-art methods on testing sets of independent ESKAPEE samples and samples from related pathogens. On the one hand, our study provides a new, accurate, and easy-to-use tool for contig classification in bacterial isolates; on the other hand, it serves as a proof of concept for the use of GNNs in genomics. Our software is available at https://github.com/cchauve/plasgraph2 and the training and testing data sets are available at https://github.com/fmfi-compbio/plasgraph2-datasets.
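
To illustrate the general idea of propagating information from nearby nodes of an assembly graph, the following is a minimal sketch and not the plASgraph2 architecture. It performs one round of plain neighbor averaging instead of a trained GNN layer. The contig names, edges, initial plasmid scores, and the `self_weight` parameter are all hypothetical values chosen for illustration.

```python
# Toy illustration only: NOT the plASgraph2 model.
# Contig names, edges, and scores below are hypothetical.

def propagate(scores, edges, self_weight=0.5):
    """One round of neighbor averaging on an undirected graph.

    scores: dict mapping contig name -> initial plasmid score in [0, 1]
    edges:  list of (contig, contig) pairs from the assembly graph
    """
    neighbors = {node: [] for node in scores}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = {}
    for node, score in scores.items():
        if neighbors[node]:
            mean = sum(scores[n] for n in neighbors[node]) / len(neighbors[node])
            # Blend the contig's own score with the mean of its neighbors.
            updated[node] = self_weight * score + (1 - self_weight) * mean
        else:
            # Isolated contigs keep their original score.
            updated[node] = score
    return updated


# c2 stands in for a short, ambiguous contig between two plasmid-like contigs.
scores = {"c1": 0.9, "c2": 0.5, "c3": 0.8, "c4": 0.1}
edges = [("c1", "c2"), ("c2", "c3")]
smoothed = propagate(scores, edges)
```

In this toy graph the ambiguous contig `c2` is pulled toward the scores of its neighbors, while the isolated contig `c4` is unchanged. A real GNN learns how to combine node features and neighbor messages rather than using a fixed average.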

【 License 】

Unknown   
Copyright © 2023 Sielemann, Sielemann, Brejová, Vinař and Chauve.
