BMC Genomics, 2017
David S. Campo, June Zhang, Sumathi Ramachandran, Yury Khudyakov
LicenseType: Unknown
BMC Genomics, 2017
Sriram P. Chockalingam, Yury Khudyakov, Yueli Zheng, Seth Sims, David S. Campo, Amanda Sue, Inna Rytsareva, Cansu Tetik, Jain Chirag, Sharma V. Thankachan, Srinivas Aluru
LicenseType: CC BY
Background: Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Molecular analysis has frequently been used to study HCV outbreaks and transmission chains: a cluster of sequences is identified as linked by transmission if their genetic distances are below a previously defined threshold. However, HCV exists as a population of numerous variants in each infected individual, and minority variants in the source are often the ones responsible for transmission. This precludes the use of a single sequence per individual, because many such transmissions would be missed. The use of Next-Generation Sequencing immensely increases the sensitivity of transmission detection but brings a considerable computational challenge, because all sequences need to be compared among all pairs of samples.
Methods: We developed a three-step strategy that filters pairs of samples according to different criteria: (i) a k-mer Bloom filter, (ii) a Levenshtein filter and (iii) a filter of identical sequences. We applied these three filters to a set of samples that covers the spectrum of genetic relationships among HCV cases, from being part of the same transmission cluster to belonging to different subtypes.
Results: Our three-step filtering strategy rapidly removes 85.1% of all pairwise sample comparisons and 91.0% of all pairwise sequence comparisons, accurately establishing which pairs of HCV samples are below the relatedness threshold.
Conclusions: We present a fast and efficient three-step filtering strategy that removes most sequence comparisons and accurately establishes transmission links for any threshold-based method. This highly efficient workflow will allow a faster response and greater molecular detection capacity, improving the rate of detection of viral transmissions with molecular data.
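The three filters named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the order in which the filters are applied, the use of an exact k-mer multiset (collections.Counter) as a stand-in for a Bloom filter, the q-gram lemma as the k-mer rejection rule, and the hypothetical parameters k and max_edits (the abstract does not state the threshold).

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Multiset of all k-mers in seq (exact stand-in for a Bloom filter)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def passes_kmer_filter(a, b, k, max_edits):
    """Cheap pre-check based on the q-gram lemma.

    If the edit distance is at most max_edits, the two sequences share at
    least max(len) - k + 1 - k * max_edits k-mers (counted as a multiset).
    Pairs below that bound cannot be within the threshold.
    """
    needed = max(len(a), len(b)) - k + 1 - k * max_edits
    if needed <= 0:
        return True  # bound is vacuous, so the filter cannot reject
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return shared >= needed


def levenshtein_within(a, b, max_edits):
    """True if the Levenshtein distance between a and b is <= max_edits."""
    if abs(len(a) - len(b)) > max_edits:
        return False
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        if min(cur) > max_edits:
            return False  # row minima never decrease, so stop early
        prev = cur
    return prev[-1] <= max_edits


def samples_linked(sample_a, sample_b, k=4, max_edits=1):
    """Decide whether any sequence pair across two samples is within threshold."""
    # Identical-sequence check: a shared variant has distance 0.
    if set(sample_a) & set(sample_b):
        return True
    for a, b in product(sample_a, sample_b):
        # k-mer check first, exact distance only for surviving pairs.
        if passes_kmer_filter(a, b, k, max_edits) and levenshtein_within(a, b, max_edits):
            return True
    return False


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    source = ["ACGTACGTAC", "ACGTTCGTAC"]
    recipient = ["ACGTACGAAC"]
    unrelated = ["TTTTTTTTTT"]
    print(samples_linked(source, recipient))
    print(samples_linked(source, unrelated))
```

The point of the ordering in this sketch is cost: set intersection and k-mer counting are much cheaper than the quadratic edit-distance computation, so the expensive step runs only on pairs that the cheaper checks cannot rule out.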
BMC Genomics, 2017
Yury Khudyakov, Sumathi Ramachandran, David S. Campo, June Zhang
LicenseType: CC BY
Background: Intra-host hepatitis C virus (HCV) populations are genetically heterogeneous and organized in subpopulations. With the exception of blood transfusions, transmission of HCV occurs via a small number of genetic variants, the effect of which is frequently described as a bottleneck. Stochasticity of transmission associated with the bottleneck is usually used to explain genetic differences among HCV populations identified in the source and recipient cases. These differences may be further exacerbated by intra-host HCV evolution and by the differential biological capacity of HCV variants to successfully establish a population in a new host.
Results: Transmissibility was formulated as a property that can be measured from experimental Ultra-Deep Sequencing (UDS) data. The UDS data were obtained from one large hepatitis C outbreak involving an epidemiologically defined source and 18 recipient cases. k-Step networks of HCV variants were constructed and used to identify a potential association between transmissibility and network centrality of individual HCV variants from the source. An additional dataset obtained from nine other HCV outbreaks with known directionality of transmission was used for validation. Transmissibility was not found to depend on high frequency of variants in the source, supporting earlier observations of transmission of minority variants. Among all tested measures of centrality, the highest correlation with transmissibility was found for Hamming centrality (r = 0.720; p = 1.57E-71). The correlation between genetic distances and differences in transmissibility among HCV variants from the source was 0.3276 (Mantel test, p = 9.99E-5), indicating an association between genetic proximity and transmissibility. A strong correlation, ranging from 0.565 to 0.947, was observed between Hamming centrality and transmissibility in 7 of the 9 additional transmission clusters (p < 0.05).
Conclusions: Transmission is not an exclusively stochastic process. Transmissibility, as formally measured in this study, is associated with certain biological properties that also define the location of variants in the genetic space occupied by the HCV strain from the source. The measure may also be applicable to other highly heterogeneous viruses. Besides improving the accuracy of outbreak investigations, this finding helps in understanding the molecular mechanisms that contribute to the establishment of chronic HCV infection.
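The Mantel test cited in the Results compares two distance matrices by correlating their off-diagonal entries and then judging the correlation against random relabelings of one matrix. The sketch below is a generic, one-sided permutation version in plain Python, not the authors' code. The variant sequences and the transmissibility scores are invented for illustration, and taking the second matrix as the absolute difference in scores is an assumption, since the abstract does not define how transmissibility differences were computed.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Off-diagonal entries of a symmetric matrix after relabeling by order."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # relabel rows and columns of d2 together
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    # The observed labeling counts as one permutation, hence the +1 terms.
    return r_obs, (hits + 1) / (permutations + 1)


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


if __name__ == "__main__":
    # Hypothetical variants and scores, invented for illustration only.
    variants = ["ACGTAC", "ACGTAT", "ACGAAT", "TCGAAT", "TCGAAG"]
    scores = [0.9, 0.8, 0.5, 0.3, 0.1]
    genetic = [[hamming(a, b) for b in variants] for a in variants]
    score_diff = [[abs(s - t) for t in scores] for s in scores]
    r, p = mantel(genetic, score_diff)
    print(f"r = {r:.3f}, p = {p:.4f}")
```

With this formula the smallest attainable p-value is 1 / (permutations + 1), so the number of permutations sets the resolution of the test.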
BMC Genomics, 2017
Mahder Teka, James Lara, Yury Khudyakov
LicenseType: CC BY
Background: Identification of acute or recent hepatitis C virus (HCV) infections is important for detecting outbreaks and devising timely public health interventions to interrupt transmission. Epidemiological investigations and chemistry-based laboratory tests are the 2 main approaches available for identifying acute HCV infection. However, owing to their complexity, neither approach is efficient. Here, we describe a new sequence alignment-free method to discriminate between recent (R) and chronic (C) HCV infection using next-generation sequencing (NGS) data derived from the HCV hypervariable region 1 (HVR1).
Results: Using dinucleotide auto correlation (DAC), we identified physical-chemical (PhyChem) features of HVR1 variants. Significant (p < 9.58 × 10^-4) differences in the means and frequency distributions of PhyChem features were found between HVR1 variants sampled from patients with recent vs chronic (R/C) infection. Moreover, the R-associated variants were found to occupy distinct and discrete PhyChem spaces. A radial basis function neural network classifier trained on the PhyChem features of intra-host HVR1 variants accurately classified R/C-HVR1 variants in 10-fold cross-validation (classification accuracy (CA) = 94.85%; area under the ROC curve (AUROC) = 0.979). The classifier was accurate in assigning individual HVR1 variants to R/C-classes in the testing set (CA = 84.15%; AUROC = 0.912) and in detecting infection duration (R/C-class) in patients (CA = 88.45%). Statistical tests and evaluation of the classifier on randomly labeled datasets indicate that the classifier's CA is robust (p < 0.001) and unlikely to be due to random correlations (CA = 59.04% and AUROC = 0.50 on the randomly labeled data).
Conclusions: The PhyChem features of intra-host HVR1 variants are strongly associated with the duration of HCV infection. Application of the PhyChem biomarkers to models for detecting the R/C-state of HCV infection in patients offers a new opportunity for detection of outbreaks and for molecular surveillance. The method will be available at https://webappx.cdc.gov/GHOST/ to the authenticated users of Global Hepatitis Outbreak and Surveillance Technology (GHOST) for further testing and validation.
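As a rough illustration of how a dinucleotide auto correlation feature can be computed without aligning sequences, the sketch below follows one common formulation of DAC: each overlapping dinucleotide is mapped to a numeric property value, and the feature for a given lag is the mean product of centered values that are lag positions apart. The abstract does not give the formula or the property tables used, so both are assumptions here. In particular, gc_count (the number of G or C bases in a dinucleotide) is a hypothetical stand-in for a real physical-chemical property index.

```python
def gc_count(dinucleotide):
    """Stand-in property: number of G/C bases in the dinucleotide (0, 1 or 2)."""
    return sum(base in "GC" for base in dinucleotide)


def dac(seq, lag, prop=gc_count):
    """Dinucleotide auto correlation of seq at the given lag for one property.

    values[i] is the property of the dinucleotide starting at position i.
    The result is the average of (values[i] - mean) * (values[i + lag] - mean)
    over all positions where both dinucleotides exist.
    """
    values = [prop(seq[i:i + 2]) for i in range(len(seq) - 1)]
    if lag < 1 or lag >= len(values):
        raise ValueError("lag must be between 1 and len(seq) - 2")
    mean = sum(values) / len(values)
    n = len(values) - lag
    return sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n)) / n


def dac_features(seq, max_lag, props=(gc_count,)):
    """Feature vector: one DAC value per (property, lag) combination."""
    return [dac(seq, lag, p) for p in props for lag in range(1, max_lag + 1)]


if __name__ == "__main__":
    # Toy sequence, invented for illustration only.
    print(dac_features("AAGGAACCGGTT", max_lag=3))
```

Because each sequence yields a vector of fixed length (number of properties times max_lag) regardless of its own length, vectors of this kind can be passed directly to a classifier, which is what makes the approach alignment-free.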