Journal article details
PeerJ
Deconvoluting simulated metagenomes: the performance of hard- and soft- clustering algorithms applied to metagenomic chromosome conformation capture (3C)
article
Matthew Z. DeMaere [1], Aaron E. Darling [1]
[1] ithree institute, University of Technology Sydney
Keywords: 3C; HiC; Chromosome conformation capture; Microbial ecology; Synthetic microbial communities; Clustering; Soft clustering; External index; Metagenomics; Read mapping; Simulation pipeline
DOI: 10.7717/peerj.2676
Subject classification: Social Sciences, Humanities and Arts (General)
Source: Inra
【 Abstract 】
Background: Chromosome conformation capture, coupled with high-throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occurring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised.
Methods: We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (2 hard, 3 soft) using an adaptation of the extended B-cubed validation measure.
Results: When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance.
Discussion: Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios; however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development.
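The abstract names the extended B-cubed measure but does not define it. As a rough illustration only, the sketch below computes the standard (unadapted) extended B-cubed precision, recall, and F1 for overlapping clusterings. It is not the authors' adaptation, whose details are not given here. The function name `extended_bcubed`, the dictionary-of-sets input layout, and the toy contig and genome labels are all assumptions made for this example.

```python
from typing import Dict, Hashable, Set, Tuple

# Hypothetical input layout: item -> set of cluster (or true genome) labels.
Assignment = Dict[Hashable, Set[Hashable]]


def extended_bcubed(clusters: Assignment, truth: Assignment) -> Tuple[float, float, float]:
    """Standard extended B-cubed (precision, recall, F1) for soft clusterings.

    Each item may belong to several clusters and several true classes.
    This is the generic measure, not the adaptation used in the paper.
    """
    items = list(truth)
    if not items:
        return 0.0, 0.0, 0.0
    prec_sum = 0.0
    rec_sum = 0.0
    for e in items:
        p_terms = []  # pairwise precision over items sharing a cluster with e
        r_terms = []  # pairwise recall over items sharing a true class with e
        for o in items:
            shared_c = len(clusters.get(e, set()) & clusters.get(o, set()))
            shared_l = len(truth[e] & truth[o])
            m = min(shared_c, shared_l)
            if shared_c > 0:
                p_terms.append(m / shared_c)
            if shared_l > 0:
                r_terms.append(m / shared_l)
        # An item with no cluster (or no class) contributes zero.
        prec_sum += sum(p_terms) / len(p_terms) if p_terms else 0.0
        rec_sum += sum(r_terms) / len(r_terms) if r_terms else 0.0
    precision = prec_sum / len(items)
    recall = rec_sum / len(items)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Toy example: contigs a and b come from genome g1, contig c from genome g2,
# but the clustering lumps all three into a single cluster k.
truth = {"a": {"g1"}, "b": {"g1"}, "c": {"g2"}}
lumped = {"a": {"k"}, "b": {"k"}, "c": {"k"}}
precision, recall, f1 = extended_bcubed(lumped, truth)
```

In this toy case recall is perfect because every pair of contigs from the same genome ends up together, while precision is penalised for merging the two genomes. A contig that is shared between strains would simply carry two labels in `truth`, which is the situation soft clustering is meant to capture.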
【 License 】

CC BY   
