BMC Genomics
VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering
Methodology Article
Csaba Szalai, András Gézsi, Péter Antal, Peter Sarkozy, Bence Bolgár, Péter Marx
Affiliations: Department of Genetics, Cell- and Immunobiology, Semmelweis University, Nagyvárad tér 4, H-1089, Budapest, Hungary; Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2, H-1117, Budapest, Hungary
Keywords: Next-generation sequencing; Variant calling; Support Vector Machine
DOI: 10.1186/s12864-015-2050-y
Received 2015-06-24; accepted 2015-10-06; published 2015
Source: Springer
Abstract

Background: The low concordance between different variant calling methods still poses a challenge for the widespread application of next-generation sequencing in research and clinical practice. A wide range of variant annotations can be used to filter call sets and improve the precision of the variant calls, but choosing appropriate filtering thresholds is not straightforward. Variant quality score recalibration provides an alternative to hard filtering, but it requires large-scale genomic data.

Results: We evaluated germline variant calling pipelines based on the BWA and Bowtie 2 aligners in combination with the GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes and SAMtools variant callers, using simulated and real benchmark sequencing data (NA12878 with Illumina Platinum Genomes). We argue that these pipelines are not merely discordant but extract complementary, useful information. We introduce VariantMetaCaller to test the hypothesis that the automated fusion of measurement-related information performs better than the recommended hard-filtering settings, than recalibration, and than fusing the individual call sets without using annotations. VariantMetaCaller uses Support Vector Machines to combine multiple information sources generated by variant calling pipelines and estimates the probabilities of variants. This novel method had significantly higher sensitivity and precision than the individual variant callers at all target region sizes, ranging from a few hundred kilobases to whole exomes. We also demonstrated that VariantMetaCaller supports quantitative, precision-based filtering of variants under a wider range of conditions. Specifically, the computed probabilities can be used to order the variants, and for a given threshold, they can be used to estimate precision. Precision can then be translated directly into the number of true called variants, or equivalently, into the number of false calls, which allows a problem-specific balance to be found between sensitivity and precision.

Conclusions: VariantMetaCaller can be applied to small target regions as well as whole exomes, and it can be used for organisms for which highly accurate variant call sets are not yet available. Therefore, it can be a viable alternative to hard filtering in cases where variant quality score recalibration cannot be used. VariantMetaCaller is freely available at http://bioinformatics.mit.bme.hu/VariantMetaCaller.
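The precision-based filtering described in the Results can be illustrated with a short Python sketch. It assumes that every call carries a well-calibrated probability of being a true variant (such as the per-variant probabilities VariantMetaCaller estimates). Under that assumption, the expected number of true calls kept at a threshold is the sum of their probabilities, and the estimated precision is their mean. This is one straightforward way to turn probabilities into precision estimates, not necessarily the authors' exact procedure, and the names `summarize_threshold` and `select_for_target_precision` are hypothetical rather than part of VariantMetaCaller's interface.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FilterSummary:
    """Expected outcome of keeping every call with probability >= threshold."""
    threshold: float
    n_called: int
    expected_true: float
    expected_false: float
    estimated_precision: float


def summarize_threshold(probs: Sequence[float], threshold: float) -> FilterSummary:
    # Keep calls whose estimated probability of being real reaches the threshold.
    kept = [p for p in probs if p >= threshold]
    n = len(kept)
    # Assuming calibration, each kept call contributes p expected true calls.
    expected_true = sum(kept)
    expected_false = n - expected_true
    # Precision is undefined when nothing is kept; report NaN in that case.
    precision = expected_true / n if n else float("nan")
    return FilterSummary(threshold, n, expected_true, expected_false, precision)


def select_for_target_precision(probs: Sequence[float], target: float) -> Optional[float]:
    """Lowest threshold whose estimated precision is still >= target, or None."""
    ranked = sorted(probs, reverse=True)
    best: Optional[float] = None
    running = 0.0
    for k, p in enumerate(ranked, start=1):
        running += p
        # The mean of a descending prefix never increases, so stop at the first failure.
        if running / k < target:
            break
        # Only cut at the end of a run of tied probabilities, so that
        # summarize_threshold(probs, p) keeps exactly these k calls.
        if k == len(ranked) or ranked[k] < p:
            best = p
    return best


if __name__ == "__main__":
    probs = [0.99, 0.95, 0.9, 0.6, 0.3]  # hypothetical per-variant probabilities
    t = select_for_target_precision(probs, 0.9)
    print(t, summarize_threshold(probs, t))
```

With these hypothetical probabilities and a target precision of 0.9, the sketch keeps the three calls with probability of at least 0.9. That gives about 2.84 expected true calls, about 0.16 expected false calls and an estimated precision of about 0.947. Cutting only at the end of a run of tied probabilities keeps the chosen threshold consistent with the `>=` filter in `summarize_threshold`.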
License
CC BY
© Gézsi et al. 2015
Figures
Fig. 4