Journal article details
Frontiers in Genetics
Performance comparisons between clustering models for reconstructing NGS results from technical replicates
Genetics
Maxime Vallée [1], Yue Zhai [2], Pascal Roy [3], Jean Iwaz [3], Claire Bardel [4]
[1] Cellule Bioinformatique de La Plateforme de Séquençage Haut Débit NGS-HCL, Hospices Civils de Lyon, Bron, France;Université Lyon 1, Lyon, France;Université de Lyon, Lyon, France;Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France;Université Lyon 1, Lyon, France;Université de Lyon, Lyon, France;Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France;Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France;Université Lyon 1, Lyon, France;Université de Lyon, Lyon, France;Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France;Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France;Service de Génétique, Hospices Civils de Lyon, Bron, France;
Keywords: next-generation sequencing; performance evaluation; clustering model; replicate analysis; sensitivity
DOI  :  10.3389/fgene.2023.1148147
Received 2023-01-19, accepted 2023-03-06, publication year 2023
Source: Frontiers
【 Abstract 】

To improve the performance of individual DNA sequencing results, researchers often use replicates from the same individual and various statistical clustering models to reconstruct a high-performance callset. Here, three technical replicates of genome NA12878 were considered and five model types were compared (consensus, latent class, Gaussian mixture, Kamila-adapted k-means, and random forest) on four performance indicators: sensitivity, precision, accuracy, and F1-score. Compared with using no combination model, i) the consensus model improved precision by 0.1%; ii) the latent class model improved precision by 1% (from 97% to 98%) without compromising sensitivity (98.9%); iii) the Gaussian mixture model and random forest yielded callsets with higher precision (both >99%) but lower sensitivity; iv) Kamila increased precision (>99%) while keeping a high sensitivity (98.8%), and showed the best overall performance. According to the precision and F1-score indicators, the compared unsupervised clustering models that combine multiple callsets can improve sequencing performance relative to previously used supervised models. Among the models compared, the Gaussian mixture model and Kamila offered non-negligible precision and F1-score improvements. These models may thus be recommended for callset reconstruction (from either biological or technical replicates) for diagnostic or precision medicine purposes.
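
The four indicators named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes that the simplest "consensus" rule is a majority vote across the three replicates (the abstract does not specify the rule), that variants are represented as hypothetical (chromosome, position, ref, alt) tuples, and that accuracy is computed over an assumed universe of candidate sites so that true negatives are defined.

```python
from collections import Counter


def consensus_callset(replicates, min_support=2):
    """Keep variants reported by at least `min_support` replicate callsets.

    Majority voting is an assumption made for illustration; the paper's
    consensus model may be defined differently.
    """
    counts = Counter(v for calls in replicates for v in set(calls))
    return {v for v, n in counts.items() if n >= min_support}


def performance(called, truth, universe):
    """Return sensitivity, precision, accuracy and F1-score for one callset.

    `universe` is the (assumed) set of all candidate sites, needed to
    count true negatives for accuracy.
    """
    tp = len(called & truth)             # called and truly variant
    fp = len(called - truth)             # called but not in the truth set
    fn = len(truth - called)             # truly variant but missed
    tn = len(universe - called - truth)  # neither called nor variant
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Toy data only: hypothetical variants, not taken from NA12878.
A = ("chr1", 100, "A", "G")
B = ("chr1", 200, "C", "T")
C = ("chr2", 300, "G", "A")
D = ("chr2", 400, "T", "C")
E = ("chr3", 500, "A", "T")
F = ("chr3", 600, "G", "C")
G = ("chr4", 700, "C", "A")
H = ("chr4", 800, "T", "G")

replicates = [{A, B, C}, {A, B, D}, {A, C, E}]
truth = {A, B, F}
universe = {A, B, C, D, E, F, G, H}

combined = consensus_callset(replicates)
print(performance(combined, truth, universe))
```

Raising `min_support` to 3 keeps only variants seen in every replicate, which tends to trade sensitivity for precision; that is the kind of trade-off the abstract reports across the compared models.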

【 License 】

Unknown   
Copyright © 2023 Zhai, Bardel, Vallée, Iwaz and Roy.
