| PeerJ | |
| Performance of methods for SARS-CoV-2 variant detection and abundance estimation within mixed population samples | |
| article | |
| Tunc Kayikcioglu1  Jasmine Amirzadegan1  Hugh Rand1  Bereket Tesfaldet1  Ruth E. Timme4  James B. Pettengill1  | |
| [1] Biostatistics and Bioinformatics Staff, Office of Analytics and Outreach, Center for Food Safety and Applied Nutrition, US Food and Drug Administration, College Park;Joint Institute for Food Safety and Applied Nutrition, University of Maryland College Park, College Park;Oak Ridge Institute for Science and Education;Division of Microbiology, Office of Regulatory Science, Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park | |
| Keywords: SARS-CoV-2; Bioinformatics; Deconvolution; Wastewater surveillance | |
| DOI : 10.7717/peerj.14596 | |
| Subject classification: Social Sciences, Humanities and Arts (General) | |
| Source: Inra | |
【 Abstract 】
Background: The accurate identification of SARS-CoV-2 (SC2) variants and estimation of their abundance in mixed population samples (e.g., air or wastewater) is imperative for successful surveillance of community-level trends. Assessing the performance of SC2 variant composition estimators (VCEs) should improve our confidence in public health decision making. Here, we introduce a linear regression-based VCE and compare its performance to four other VCEs: two re-purposed DNA sequence read classifiers (Kallisto and Kraken2), a maximum-likelihood-based method (Lineage deComposition for Sars-Cov-2 pooled samples (LCS)), and a regression-based method (Freyja).

Methods: We simulated DNA sequence datasets of known variant composition from both Illumina and Oxford Nanopore Technologies (ONT) platforms and assessed the performance of each VCE. We also evaluated VCE performance using publicly available empirical wastewater samples collected for SC2 surveillance efforts. Bioinformatic analyses were performed with a custom Nextflow workflow (C-WAP, CFSAN Wastewater Analysis Pipeline). Relative root mean squared error (RRMSE) was used as a measure of performance with respect to the known abundance, and the concordance correlation coefficient (CCC) was used to measure agreement between pairs of estimators.

Results: Based on our results from simulated data, Kallisto was the most accurate estimator, as it had the lowest RRMSE, followed by Freyja. Kallisto and Freyja had the most similar predictions, reflected by the highest CCC metrics. We also found that accuracy depended on the sequencing platform and the amplicon panel. For example, the accuracy of Freyja was significantly higher with Illumina data than with ONT data, and the performance of Kallisto was best with ARTICv4. However, when analyzing empirical data, there was poor agreement among methods and variation in the number of variants detected (e.g., Freyja ARTICv4 had a mean of 2.2 variants while Kallisto ARTICv4 had a mean of 10.1 variants).

Conclusion: This work provides an understanding of the differences in performance of a number of VCEs and how accurate they are in capturing the relative abundance of SC2 variants within a mixed sample (e.g., wastewater). Such information should help officials gauge the confidence they can have in such data for informing public health decisions.
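The two metrics named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not state how RRMSE is normalized, so dividing the RMSE by the mean true abundance is an assumption made here, and the function names and abundance values are hypothetical. The CCC follows Lin's standard definition using population (1/n) moments.

```python
import math


def rrmse(estimated, truth):
    """RMSE divided by the mean true abundance.

    ASSUMPTION: this normalization is one common convention; the
    abstract does not specify which one the study used.
    """
    n = len(truth)
    mse = sum((e - t) ** 2 for e, t in zip(estimated, truth)) / n
    return math.sqrt(mse) / (sum(truth) / n)


def ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical relative abundances for three variants in one sample.
truth = [0.5, 0.3, 0.2]
estimator_a = [0.45, 0.35, 0.20]
estimator_b = [0.60, 0.20, 0.20]

if __name__ == "__main__":
    print("RRMSE of A vs. truth:", rrmse(estimator_a, truth))
    print("RRMSE of B vs. truth:", rrmse(estimator_b, truth))
    print("CCC between A and B:", ccc(estimator_a, estimator_b))
```

Under this convention, a lower RRMSE means an estimator is closer to the known composition, while a CCC near 1 means two estimators agree with each other, whether or not either one is accurate.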
【 License 】
CC BY
【 Preview 】
| Files | Size | Format | View |
|---|---|---|---|
| RO202307100002723ZK.pdf | 365KB | PDF | |