Journal article

【Abstract】

To improve the performance of individual DNA sequencing results, researchers often use replicates from the same individual together with various statistical clustering models to reconstruct a high-performance callset. Here, three technical replicates of genome NA12878 were considered and five model types were compared (consensus, latent class, Gaussian mixture, Kamila-adapted k-means, and random forest) on four performance indicators: sensitivity, precision, accuracy, and F1-score. Compared with using no combination model, i) the consensus model improved precision by 0.1%; ii) the latent class model improved precision by 1% (from 97% to 98%) without compromising sensitivity (98.9%); iii) the Gaussian mixture model and random forest provided callsets with higher precision (both >99%) but lower sensitivity; iv) Kamila increased precision (>99%) while keeping a high sensitivity (98.8%), and showed the best overall performance. According to the precision and F1-score indicators, the compared non-supervised clustering models that combine multiple callsets are able to improve sequencing performance vs. previously used supervised models. Among the models compared, the Gaussian mixture model and Kamila offered non-negligible precision and F1-score improvements. These models may thus be recommended for callset reconstruction (from either biological or technical replicates) for diagnostic or precision medicine purposes.
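
The four indicators named in the abstract, and the general idea of combining replicate callsets, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `consensus_calls` and `performance`, the toy variant identifiers, and the majority rule (a variant is kept when at least two of three replicates report it). The abstract does not specify the consensus rule used in the study, so this should not be read as the authors' implementation.

```python
from collections import Counter


def consensus_calls(replicates, min_support=2):
    """Keep variants reported by at least `min_support` replicate callsets.

    Assumption: a simple majority rule; the study's exact rule is not
    given in the abstract.
    """
    counts = Counter(v for calls in replicates for v in set(calls))
    return {v for v, n in counts.items() if n >= min_support}


def performance(called, truth, universe):
    """Sensitivity, precision, accuracy and F1-score of a callset.

    `universe` is the set of all candidate sites evaluated, so that
    true negatives can be counted for accuracy.
    """
    tp = len(called & truth)             # called and truly variant
    fp = len(called - truth)             # called but not variant
    fn = len(truth - called)             # variant but missed
    tn = len(universe - called - truth)  # correctly not called
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Toy data (hypothetical identifiers, not real NA12878 variants).
universe = {"v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"}
truth = {"v1", "v2", "v3", "v4"}
rep1 = {"v1", "v2", "v3", "v5"}
rep2 = {"v1", "v2", "v4", "v6"}
rep3 = {"v1", "v3", "v4", "v5"}

consensus = consensus_calls([rep1, rep2, rep3])
metrics = performance(consensus, truth, universe)
single = performance(rep1, truth, universe)

if __name__ == "__main__":
    print("consensus:", sorted(consensus))
    print("consensus metrics:", metrics)
    print("replicate 1 alone:", single)
```

In this toy case the combined callset recovers a true variant that replicate 1 alone misses, which is the kind of gain a combination model aims for; the clustering models compared in the paper replace the fixed vote threshold with a fitted model.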

Frontiers in Genetics
Performance comparisons between clustering models for reconstructing NGS results from technical replicates
Genetics
Maxime Vallée¹ Yue Zhai² Pascal Roy³ Jean Iwaz³ Claire Bardel⁴
Affiliations: Cellule Bioinformatique de La Plateforme de Séquençage Haut Débit NGS-HCL, Hospices Civils de Lyon, Bron, France; Université Lyon 1, Lyon, France; Université de Lyon, Lyon, France; Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France; Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France; Service de Génétique, Hospices Civils de Lyon, Bron, France
Keywords: next-generation sequencing; performance evaluation; clustering model; replicate analysis; sensitivity
DOI: 10.3389/fgene.2023.1148147
Received 2023-01-19; accepted 2023-03-06; published 2023
Source: Frontiers