Journal article details
BMC Research Notes
Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance
Karsten Suhre [1], Alice Abdel Aleem [3], Mahmoud F Elsaid [2], Nader Chalhoub [3], Wadha Ahmed Al Muftah [3], Mashael Al-Shafai [3], Pankaj Kumar [3]
[1] Institute of Bioinformatics and System Biology, Helmholtz Zentrum Munchen, German Research Center of Environmental Health, Neuherberg, Germany; [2] Neuropediatrics Department, Hamad Medical Corporation, Doha, Qatar; [3] Weill Cornell Medical College in Qatar, Education City, Doha, Qatar
Keywords: Illumina; Trios; Variant; Genotype calling; Multi-sample calling; Qatari population; Mendelian inheritance; WGS pipeline; CASAVA; GATK; NGS
DOI: 10.1186/1756-0500-7-747
Received 2014-05-06, accepted 2014-10-03, published 2014
【 Abstract 】

Background

With the diminishing cost of next generation sequencing (NGS), whole genome analysis is becoming a standard tool for identifying genetic causes of inherited diseases. Commercial NGS service providers generally deliver not only raw genomic reads but also SNP calls to their clients. The user must then decide whether to use the SNP data as is or to process the raw sequencing data further through SNP calling pipelines with more advanced algorithms.

Results

Here we compare SNPs called with the popular GATK multiple-sample calling protocol against the variants provided by the Illumina CASAVA pipeline, delivered as part of a 40x whole genome sequencing project by Illumina Inc. covering 171 human genomes of Arab descent (108 unrelated Qatari genomes, 19 trios, and 2 families with rare diseases). GATK multi-sample calling identifies more variants than the CASAVA pipeline. The additional GATK variants are robust in terms of Mendelian consistency but weak in terms of statistical parameters such as the transition/transversion (Ts/Tv) ratio. However, these additional variants make no difference in detecting the causative variants for the studied phenotype.
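
The two quality measures named above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the in-memory genotype representation (a pair of alleles per individual), and the restriction to autosomal single-base substitutions are assumptions made only for illustration.

```python
from itertools import product

# Illustrative helpers only; names and data layout are hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps):
    """Ts/Tv ratio over an iterable of (ref, alt) single-base substitutions.

    Entries where ref equals alt are skipped because they are not variants.
    """
    ts = tv = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")


def is_mendelian_consistent(father, mother, child):
    """Check one autosomal site in a trio.

    Each genotype is a pair of alleles, e.g. ("A", "G"). The child's
    genotype is consistent if it can be formed from one paternal and
    one maternal allele.
    """
    target = sorted(child)
    return any(sorted((f, m)) == target for f, m in product(father, mother))


def mendelian_consistency_rate(trio_sites):
    """Fraction of (father, mother, child) genotype triples that are consistent."""
    sites = list(trio_sites)
    if not sites:
        return float("nan")
    consistent = sum(is_mendelian_consistent(f, m, c) for f, m, c in sites)
    return consistent / len(sites)
```

For example, `ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")])` counts two transitions and one transversion, and `is_mendelian_consistent(("A", "A"), ("A", "G"), ("G", "G"))` is false because the father carries no G allele. A real analysis would read genotypes from VCF files and would need to handle missing calls and sex chromosomes, which this sketch ignores.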

Conclusion

Both pipelines, GATK multi-sample calling and Illumina CASAVA single-sample calling, perform very similarly in SNP calling at the level of putatively causative variants.

【 License 】

2014 Kumar et al.; licensee BioMed Central Ltd.
